The Bad Data Conundrum

Dirty data could be a bigger problem in market research than many people realize, and the drive is on to keep it in check.

As the pandemic forced market research onto a more online footing for its day-to-day activities, alarm bells sounded in some quarters over the quality of the data being received.

A 2020 report by Grey Matter Research and Harmon Research highlighted significant issues with data collection from popular online survey panels used in the USA.

By placing a test questionnaire with five of the ten largest online panels, they were able to interrogate the data produced. Their findings are alarming and should be noted by anyone in MR with an interest in the integrity of data. 

How widespread is the problem?

While the Pew Research Center estimates that 4-7% of opt-in survey panel respondents are bogus, the Grey Matter and Harmon test actually threw out 46% of all respondents once more rigorous quality-control methods were applied. As well as removing respondents who failed the CAPTCHA test, they ran an algorithm to remove ‘speeders’ and nonsensical replies. They also removed those who gave simplistic “good” or “I like it” answers, according to their report, before taking a more forensic, line-by-line approach to each respondent. Once someone had made four or more errors, they were removed.
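Screening steps of this kind can be expressed as a simple filter. The Python sketch below is purely illustrative, not the researchers' actual code: the field names, the 120-second speeder cutoff, the low-effort answer list and the error-count field are all assumptions.

```python
# Illustrative quality-control filter modelled on the screening steps above.
# Field names, the speeder cutoff and the low-effort list are assumptions.

LOW_EFFORT = {"good", "i like it"}   # simplistic open-end answers
MIN_SECONDS = 120                    # assumed 'speeder' threshold
MAX_ERRORS = 3                       # four or more errors means removal

def is_valid(resp):
    """Return True if a respondent survives all four screening steps."""
    if not resp["passed_captcha"]:                       # CAPTCHA failures
        return False
    if resp["duration_seconds"] < MIN_SECONDS:           # speeders
        return False
    if any(a.strip().lower() in LOW_EFFORT
           for a in resp["open_ends"]):                  # low-effort replies
        return False
    return resp["error_count"] <= MAX_ERRORS             # line-by-line errors

responses = [
    {"passed_captcha": True, "duration_seconds": 310,
     "open_ends": ["Reminded me of my usual brand"], "error_count": 1},
    {"passed_captcha": True, "duration_seconds": 40,
     "open_ends": ["good"], "error_count": 0},
]
valid = [r for r in responses if is_valid(r)]   # keeps only the first respondent
```

Running every response through one function like this also makes the rejection criteria auditable, which matters when you later need to justify the cull to a client.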

Of the 1,909 people who answered the survey, they ended up with 1,029 valid responses; 880 were removed.

Those bogus responses can skew findings and insights significantly, leading to false client confidence, potentially large financial losses, and a loss of confidence in MR as a whole.

What can be done about dirty data?

As companies begin to reinvest in MR following a year of uncertainty, the onus is on MR to provide the kind of insights clients can confidently act upon. While AI tools can be used to harvest data from qualitative interviews and panels, the risk of false positives and corrupted data remains.

Unfortunately, much of the work rests on the researchers themselves. Diligent, manual oversight of the data is now more crucial than ever. Double-checking, cross-referencing and a deeper dive into the data sources and their credibility may be the only way to ensure that what you present to a client is the true picture. Anything less risks your future relationship.

Keeping it clean

There are obvious methods of cleaning survey data, of course, such as applying filters to weed out the speed runs and respondents who fall outside criteria such as age range or location. Another way is to inspect the numerical data as a graphic; this can often highlight the speedsters and ‘Christmas tree’ style respondents, who reply in a visual pattern rather than answer the questions themselves.
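That pattern-spotting can also be automated before anyone looks at a chart. Here is a minimal sketch that flags flat or short repeating patterns in a row of 1-5 grid ratings; the function name and thresholds are hypothetical, not a standard from the industry.

```python
# Hypothetical check for flat or short repeating patterns in grid ratings.
# Thresholds (min_unique, max_cycle) are illustrative assumptions.

def looks_patterned(ratings, min_unique=2, max_cycle=3):
    """Flag a row of grid answers that is flat or repeats with a short cycle."""
    if len(set(ratings)) < min_unique:            # straight-lining: 3,3,3,3,...
        return True
    for period in range(2, max_cycle + 1):        # zig-zags: 1,5,1,5 or 1,3,5,...
        if len(ratings) >= 2 * period and all(
            ratings[i] == ratings[i % period] for i in range(len(ratings))
        ):
            return True
    return False

flags = [looks_patterned(r) for r in (
    [3, 3, 3, 3, 3, 3],    # straight-liner
    [1, 5, 1, 5, 1, 5],    # zig-zag 'Christmas tree' pattern
    [2, 4, 3, 5, 1, 4],    # plausible genuine answers
)]
# flags -> [True, True, False]
```

A check like this is a screening aid, not a verdict: a flagged row still deserves the manual review described above, since genuine respondents occasionally give uniform answers honestly.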

Finally, don’t delete the bad responses. Keeping them allows you to present them at a later date as proof that you were working to obtain the best insights, and to flag them for possible follow-up or case studies of your own on data integrity.
