Data Verification Case Study
© 2013 K2 Case Studies, www.k2reimagine.com

Managing for Complexity in the Crowd

We delegated to the crowd an assignment that could have been completed easily by any data collection analyst. The problem with assigning this sort of work to an in-house analyst, however, is that this kind of task is generally "low value": a business would not want to spend its expert staff's time on it. Yet the job required some amount of cognitive ability and could not be handed off to a machine. It was the perfect job for the crowd. We took the approach that most data managers would take in engaging the crowd, and it did not go well. Here is what we learned.

The Task

We asked the crowd to validate company websites. We provided a Company Name and a Company Description (which was actually the description of the SIC industry classification that had been assigned to the business) and asked the crowd workers to confirm whether the link we gave them was the correct website for the company. We asked the contributors to select one of the following answers:

1 – Yes, the Name and SIC match the site
2 – No, neither the Name nor the SIC match the site
3 – Maybe, the Name matches but the SIC does not
4 – Bad link

As you can see from our answer choices, we were actually trying to get more than one answer from the workers. In addition to providing the SIC code as a means to confirm that the website belonged to the business (matching not only the Company Name but also the Company Description to the website), we also hoped to validate the industry classification previously assigned to the business (answer choice "Maybe").

Our instructions explained how to find the Company Name on the website and where workers might easily locate a description of services that could be compared against the Company Description. We distributed the work to U.S.-based crowdworkers and paid $0.04 per unit. The entire file included 391 records along with 13 test questions (or "gold" records) that were used to gauge the trust level of the workers on this task. Contributors were paid for every judgment (answer) provided, and, apart from the test questions, which were included in each task and used to determine whether a worker could continue working on the job, the accuracy of the work was not checked until the entire file had been completed.

Findings

Overall, the quality of the work we received was not great. This did not surprise us, as we knew there were flaws in the design of our task. We will discuss the mistakes we made and our takeaways in the next section. First, we share our analysis of the results file:

- The same universe of workers completed both the "golds" and the full-file records, with an average trust level of 89.9%.
- The quality-checked sample from the full file was not randomly selected. We included more units where the correct answer was not likely to be "Yes" in order to gauge the quality of judgments where the correct answer was "No" or "Maybe", as these judgments had a higher error rate in the gold record results.
- Based on the golds and the sample checked from the full file, the correct answer was "Yes" more than half of the time.

TIP 1: Avoid overloading questions. Instead, break complex questions into multiple steps.

TIP 2: If there is a predominant correct answer to your question, select test questions where the answer is not the easiest guess.
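The gold-record mechanism described above (scoring workers on test questions and removing those whose accuracy falls too low) is straightforward to prototype. The following is a minimal sketch in Python, not the platform's actual implementation; the data layout, the gold answer values, and the 70% cutoff are assumptions made for illustration.

    # Illustrative sketch of per-worker trust scoring on gold records.
    # Assumptions: judgments arrive as (worker_id, unit_id, answer) tuples,
    # gold answers are known for the test units, and a 0.70 trust threshold
    # gates further work. None of these specifics come from the case study.

    from collections import defaultdict

    GOLD_ANSWERS = {      # unit_id -> correct answer code (1-4); hypothetical data
        "gold-001": 1,
        "gold-002": 3,
        # ... remaining gold units would be listed here
    }
    TRUST_THRESHOLD = 0.70  # assumed cutoff, not stated in the case study

    def worker_trust(judgments):
        """Return each worker's trust level (accuracy on gold units)."""
        seen = defaultdict(int)
        correct = defaultdict(int)
        for worker_id, unit_id, answer in judgments:
            if unit_id in GOLD_ANSWERS:
                seen[worker_id] += 1
                if answer == GOLD_ANSWERS[unit_id]:
                    correct[worker_id] += 1
        return {w: correct[w] / seen[w] for w in seen}

    def may_continue(worker_id, trust):
        """Gate a worker out of the job once their gold accuracy drops too low."""
        return trust.get(worker_id, 1.0) >= TRUST_THRESHOLD

Averaging the values returned by worker_trust over all contributors would reproduce a figure comparable to the 89.9% average trust level reported in the findings.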
