Fundamentals of Quality Data - Anthony Ndungu

Research Methods Group

http://worldagroforestry.org/research-methods/

Fundamentals of data quality control

Antony Karanja Ndungu

Science Week 2013

Uploaded by world-agroforestry-centre-icraf on 14-Aug-2015



Outline

• Why quality/consistent data?
• Where do we start?
• Levels of data quality check
• Standard data cleaning procedure
• Storage, sharing and archiving

Introduction

• Why quality data?
  • Data quality is crucial for quality research results:
    – Factual and representative of the real world
    – Accuracy, reliability and replicability
    – Integrity, credibility and reputation
    – Avoiding your paper being rejected
• A study can only be as good as its data.

Introduction

• Where does quality data start? At the research design stage:
  – Questionnaire design/data collection tool/research methods used
  – Data collection
  – Data entry management
  – Data cleaning

Data Quality Check Stages

Field Data Collection

• Data collection tool design
• Personnel training and research objectivity
• Pretest/piloting
• Back check (field audit): going back to the same sample surveyed
  – Identify the source of error (respondent, enumerator, you)
  – Take the necessary action

Field Data Collection

• Enumerator/data collection clerk training
  o Research objective/target
  o Survey tool contents mastered
  o "Survey interviewing is a story that needs to be followed in a directed manner and with an objective"
  o What and how to ask to obtain exact data
  o Survey questions shouldn't be altered (even in translation to the local language)
  o Flow of sections and the content of each section mastered
  o Follow survey instructions well
  o "Given 6 hours to cut down a tree": 4 hours sharpening the tools (axe), 2 hours cutting

Field Data Collection

• Back checks/field audit
  – Proposed protocol: back check 5%-10% of surveys, selected randomly across the team
  – Every team and every surveyor is back checked as soon as possible
  – Compare results and act accordingly
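The sampling rule above can be sketched in a few lines. This is a minimal sketch, not a prescribed tool. The record keys `survey_id` and `surveyor` are hypothetical. Sampling separately within each surveyor's surveys is one assumed way to make the selection random while still guaranteeing that every surveyor is back checked.

```python
import random
from collections import defaultdict

def select_back_checks(surveys, rate=0.10, seed=2013):
    """Pick roughly `rate` of each surveyor's completed surveys for back checking.

    `surveys` is a list of dicts with hypothetical keys "survey_id" and "surveyor".
    Sampling within each surveyor guarantees every surveyor gets at least one back check.
    """
    if not 0.05 <= rate <= 0.10:
        raise ValueError("rate should be within the proposed 5%-10% range")
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced and audited
    by_surveyor = defaultdict(list)
    for s in surveys:
        by_surveyor[s["surveyor"]].append(s["survey_id"])
    selected = []
    for surveyor in sorted(by_surveyor):
        ids = by_surveyor[surveyor]
        k = max(1, round(len(ids) * rate))  # at least one back check per surveyor
        selected.extend(rng.sample(ids, k))  # random, without replacement
    return selected
```

Because of the `max(1, ...)` floor, a surveyor with only a few surveys is back checked at more than 10%. The code accepts that deliberately so that nobody is skipped.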

Back Check protocol

• How to do a back check?
  – Develop a plan before you start surveying
  – Select your back check questions
  – Select your back check team
  – Executing the back check
  – Dealing with the results
  – Back checks in the context of electronic surveying
• See the Back Check manual for specific details on each of these steps.
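One way to handle "dealing with the results" is to compare the back check answers with the original answers and compute a discrepancy rate for each enumerator. The sketch below assumes that approach. The field names (`enumerator`, `hh_size`, `owns_land`) are hypothetical. A high rate does not say who made the error: respondent, enumerator or you. It only flags where to look.

```python
from collections import defaultdict

def back_check_discrepancies(original, back_check, questions):
    """Share of back check questions whose answers differ, per enumerator.

    original:   dict survey_id -> {"enumerator": ..., <question>: answer, ...}
    back_check: dict survey_id -> {<question>: answer, ...}
    questions:  the back check questions selected in the plan (hypothetical names)
    """
    compared = defaultdict(int)
    differing = defaultdict(int)
    for sid, bc_answers in back_check.items():
        orig = original[sid]
        enumerator = orig["enumerator"]
        for q in questions:
            compared[enumerator] += 1
            if orig[q] != bc_answers[q]:
                differing[enumerator] += 1
    return {e: differing[e] / compared[e] for e in compared}
```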

Data Quality Check Stages

• Research design
• Data collection tool
• Rigorous training and pretest
• Data quality collection
• Field audit/back check
• Sit-ins/spot checks
• Physical editing
• Structuring data collection protocols

What next?…

Managing the Data Entry/Capture Process

Data Entry Level

• Field data collection stage done!
• We collect tons of data through surveys.
• How do we convert them into a form we can analyze?
• Simple answer: create data sets.
• This applies to both paper-based surveys and digital data collection (DDC)/computer-assisted interviewing (CAI).

Data Entry Level

Surveys

type, type, type…

Data!

…but is it?

DISASTER is just waiting to happen

1. Unorganized surveys. Misplaced an entire village. Lost data.

2. Sent data to another project site. Truck crashed. Lost data.

3. Server crashed. No backup. Lost data.

4. No one checked data quality. Turns out, there’s no ID variable. Lost data.

5. No one monitored the data entry contractor. Turns out, they copy-pasted data and changed the IDs. Lost data.

Lost surveys and blank surveys mean lost data, with implications for the statistical power of the results.

Rules for Data Entry

• Double blind entry
• Enter PII (personally identifiable information) separately and encrypt it
• Two unique identifiers
• Data cleaning
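The "enter PII separately" rule can be sketched as splitting each record into two parts that share only the ID. This is a minimal sketch. The column names in `PII_FIELDS` and the ID field `hhid` are hypothetical. Python's standard library has no symmetric file encryption, so encrypting the PII part is assumed to happen in a separate, dedicated tool.

```python
# Hypothetical PII columns; adapt to your own questionnaire.
PII_FIELDS = ("respondent_name", "phone", "gps_lat", "gps_lon")

def split_pii(rows, id_field="hhid", pii_fields=PII_FIELDS):
    """Split each record into a PII part and an analysis part, linked only by id_field."""
    pii_rows, analysis_rows = [], []
    for row in rows:
        pii = {id_field: row[id_field]}
        pii.update({f: row[f] for f in pii_fields if f in row})
        pii_rows.append(pii)  # stored (and encrypted) separately
        analysis_rows.append({k: v for k, v in row.items() if k not in pii_fields})
    return pii_rows, analysis_rows
```

The analysis file can then be shared with the team. The PII file stays encrypted with restricted access.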

Double data entry

[Figure: first entry and second entry of the same questionnaire]

Double data entry

• The gold standard for professional data entry (what is collected is what is coded/entered).

• The two data sets are compared, differences are examined and corrections are made.

• "Garbage in, garbage out." Don't enter garbage data: if you want any analysis of your data to be valid, the data itself must be valid.

• Use a program designed for data entry (CSPro, MS Access/MySQL, Excel, SPSS, Epi Info, EpiData, etc.) and ensure double (blind) data entry is done.
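The comparison step can be sketched as follows. This is a minimal stand-in for what dedicated tools do, not a reimplementation of any of them. It lists every (ID, variable) pair where the two entries disagree. Each item is then resolved against the paper questionnaire. It assumes IDs are already unique within each entry, because a duplicated ID would silently overwrite an earlier record. `hhid` is a hypothetical ID field.

```python
def compare_entries(first, second, id_field="hhid"):
    """List (id, variable, first_value, second_value) wherever the two entries disagree.

    Records present in only one entry are reported with None for the missing side.
    """
    first_by_id = {r[id_field]: r for r in first}
    second_by_id = {r[id_field]: r for r in second}
    discrepancies = []
    for rid in sorted(first_by_id.keys() | second_by_id.keys()):
        a = first_by_id.get(rid, {})
        b = second_by_id.get(rid, {})
        for var in sorted((a.keys() | b.keys()) - {id_field}):
            if a.get(var) != b.get(var):
                discrepancies.append((rid, var, a.get(var), b.get(var)))
    return discrepancies
```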

Double Data entry Flow

[Figure: 1st entry and 2nd entry → discrepancies → reconciliation, with reference to the questionnaire → final dataset]

Tools: in Stata, cfout, readreplace and cf; the comparison can also be done in CSPro and Access.

A data audit (3rd entry) is done after this; the normally accepted error rate is 0.5%.

Error rate committed as a result of data entry: 1.25% (10 conflicting entries).
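The slide does not say how many entries were compared. Assuming the error rate is conflicting entries divided by entries compared, 1.25% from 10 conflicting entries implies 10 / 800 = 1.25%, i.e. 800 entries. The sketch below works that arithmetic and checks the result against the 0.5% acceptance level.

```python
def entry_error_rate(conflicting, total_compared):
    """Data entry error rate as a percentage of all entries compared (assumed definition)."""
    if total_compared <= 0:
        raise ValueError("total_compared must be positive")
    return 100.0 * conflicting / total_compared

ACCEPTED_RATE = 0.5  # percent; the normally accepted rate quoted above

rate = entry_error_rate(10, 800)  # 800 entries compared is inferred, not stated
verdict = "acceptable" if rate <= ACCEPTED_RATE else "needs further checking"
print(f"{rate:.2f}% -> {verdict}")
```

At 1.25%, the entry in this example is above the accepted level and would need further checking.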

Data Entry Level

o Double entered and verified… what do you do next?
o Data cleaning
o There is no one standard cleaning process, but it is very common to do the following tasks on every dataset:

Standard Data Cleaning Procedures

a) Labeling variables and labeling variable values (scale responses or pre-coded responses)

[Screenshots: labeling examples in Stata and R]
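In Stata this step is typically done with `label variable`, `label define` and `label values`. In R it is typically done with `factor(x, levels = ..., labels = ...)`. The same idea can be sketched language-neutrally. A codebook holds a label for each variable and a label for each code, and unknown codes are rejected rather than passed through. This is a minimal sketch. The variables and codes in `CODEBOOK` are hypothetical.

```python
# Hypothetical codebook: variable labels plus value labels for pre-coded responses.
CODEBOOK = {
    "q5_gender": {
        "label": "Gender of household head",
        "values": {1: "Male", 2: "Female"},
    },
    "q9_satisfaction": {
        "label": "Satisfaction with extension services",
        "values": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                   4: "Satisfied", 5: "Very satisfied"},
    },
}

def apply_value_labels(record, codebook=CODEBOOK):
    """Return a copy of `record` with coded answers replaced by their value labels."""
    labeled = dict(record)
    for var, spec in codebook.items():
        code = labeled.get(var)
        if code is None:  # variable absent or left blank
            continue
        if code not in spec["values"]:  # surfaces mistyped or undocumented codes
            raise ValueError(f"{var}: code {code!r} is not in the codebook")
        labeled[var] = spec["values"][code]
    return labeled
```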

b) Unique identifiers, skip pattern checks (data logical tests). Maintain a codebook!


b) Unique identifiers, skip pattern checks (data logical tests): advanced

c) Unique identifiers, skip pattern checks (data logical tests): advanced, splitting
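Two of the checks above can be sketched as follows. `duplicate_ids` can be run on each identifier in turn. `skip_pattern_violations` tests one skip rule, which is hypothetical: if `owns_livestock` is "no", `num_cattle` must be blank, and if it is "yes", `num_cattle` must be answered. Real rules would come from the questionnaire and the codebook.

```python
from collections import Counter

NO_BUT_ANSWERED = "num_cattle answered but owns_livestock is 'no'"
YES_BUT_MISSING = "num_cattle missing but owns_livestock is 'yes'"

def duplicate_ids(records, id_field):
    """Return the values of id_field that occur more than once."""
    counts = Counter(r[id_field] for r in records)
    return sorted(v for v, n in counts.items() if n > 1)

def skip_pattern_violations(records, id_field="hhid"):
    """Flag records that break the hypothetical owns_livestock -> num_cattle skip rule."""
    problems = []
    for r in records:
        owns, cattle = r.get("owns_livestock"), r.get("num_cattle")
        if owns == "no" and cattle is not None:
            problems.append((r[id_field], NO_BUT_ANSWERED))
        elif owns == "yes" and cattle is None:
            problems.append((r[id_field], YES_BUT_MISSING))
    return problems
```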

[Screenshot: example data]

d) *Massaging* data, used for data cleaning and analysis (extracting datasets):
  o Reshaping,
  o Collapsing,
  o Merging, or
  o Appending datasets
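Three of these operations can be sketched with plain lists of records, under hypothetical variable names (`hhid`, `yield_2012`, `yield_2013`, `village`). The sketch shows reshaping from wide to long, collapsing to group totals, and a one-to-one merge that keeps every record from the left dataset. Appending datasets with the same variables is simple list concatenation (`first + second`). In Stata the corresponding commands would be `reshape`, `collapse`, `merge` and `append`.

```python
from collections import defaultdict

def reshape_long(rows, id_field, stub, suffixes):
    """Wide -> long: columns like yield_2012, yield_2013 become one row per (id, year)."""
    long_rows = []
    for r in rows:
        for s in suffixes:
            long_rows.append({id_field: r[id_field], "year": s, stub: r[f"{stub}_{s}"]})
    return long_rows

def collapse_sum(rows, by, value):
    """Collapse to one row per `by` group, summing `value`."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[by]] += r[value]
    return [{by: k, value: totals[k]} for k in sorted(totals)]

def merge_one_to_one(left, right, key):
    """Keep every left record, adding fields from the matching right record if any."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key.get(l[key], {})} for l in left]
```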

[Figure: data cleaning workflow linking the database/data on server, data cleaning scripts (data inconsistencies/errors in the data to be corrected), data quality/quality data, and indicator extraction and analysis]

Storage, sharing and archiving

Help: RMG (Research Methods Group)

Thank you!

http://worldagroforestry.org/research-methods

www.worldagroforestry.org