5/21/2014 DATA PREPARATION AND PROFILING: STRATEGIES, CHALLENGES, AND EXPERIENCES TIM NORRIS AND MARK LUNDGREN

Page 1:

DATA PREPARATION AND PROFILING:

STRATEGIES, CHALLENGES, AND EXPERIENCES

TIM NORRIS AND MARK LUNDGREN

Page 2:

TODAY’S AGENDA

• Introductions
• Data Profiling and Readiness
• Lessons Learned
• Future Direction

Page 3:

ABOUT THE P20W DATA WAREHOUSE

• Statewide longitudinal data system

• De-identified data about people's early childhood, kindergarten through 12th grade, higher education, and workforce experiences and performances

• Collected and linked from existing state agency data systems.

• It includes data about the kinds of services they receive, programs in which they participate, and their academic performance and program or degree completion.

• It also includes demographic data, so we can look at a variety of different groups of people.

• Personally identifiable information, such as names, social security numbers, addresses, and other data which can identify a person as an individual, are not part of the research database.

Page 4:

[Diagram: data sources feeding ERDC, and ERDC outputs]

Data Sources:
• ECEAP students
• K-12 students
• K-12 teachers
• CTC students
• Baccalaureate students
• National Student Clearinghouse
• Workforce
• IPEDS Financial data

ERDC (Data Management, Governance):
• Standards, confidentiality, security
• Critical questions
• Data dictionary, matching, longitudinal linking, cross-sector derived elements
• P-20/W datasets

Output:
• Research
• Data to partner agencies
• Collaborative research
• Ad-hoc requests (data and research) for partners and legislature
• External requests for data
• Feedback reports (on behalf of agencies)

Other labels in the diagram: PCHEES, LEAP, OFM

Page 5:

DATA FLOW PROCESS

•Chart of data flow goes here

Page 6:

DATA SOURCE CHARACTERISTICS

• Over 20 source data feeds
• Data systems being developed in parallel
• Some migrated historic data, some didn’t

Page 7:

DATA PREPARATION: DATA PROFILING

• Do it early, do it often
• Verification of data dictionary
• Descriptive statistics
• Distinct counts and percentages
• Zero, blanks and nulls
• Minimum and maximum values
• Patterns of data
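The profiling measures listed on this slide can be illustrated with a small sketch. This is not the presenters' tooling (the deck names SAS, Informatica Data Analyst, and Excel); it is a minimal standard-library Python example, and the function names `profile_column` and `value_pattern`, as well as the sample grade-level values, are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a value counts as numeric if it is an optionally signed decimal.
NUMERIC = re.compile(r"[-+]?\d+(\.\d+)?")


def value_pattern(text):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", text))


def profile_column(values):
    """Profile one column given as a list of raw strings (None means null)."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    blanks = sum(1 for v in values if v is not None and v.strip() == "")
    present = [v.strip() for v in values if v is not None and v.strip() != ""]

    # Values that look numeric feed the zero count and the min/max.
    numbers = [float(v) for v in present if NUMERIC.fullmatch(v)]

    # Distinct values with (count, percent of all records).
    distinct = {
        value: (count, round(100.0 * count / total, 1))
        for value, count in Counter(present).items()
    }

    return {
        "records": total,
        "nulls": nulls,
        "blanks": blanks,
        "zeros": sum(1 for n in numbers if n == 0),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
        "distinct": distinct,
        "patterns": dict(Counter(value_pattern(v) for v in present)),
    }


# Hypothetical grade-level column as it might arrive in a source feed.
grade_levels = ["12", "07", "0", "", None, "12", "K"]
report = profile_column(grade_levels)
print(report["nulls"], report["blanks"], report["zeros"])
print(report["patterns"])
```

Running the same function on every column of every feed, each time a new extract arrives, is one way to follow the "do it early, do it often" advice; the pattern counts tend to surface unexpected formats (such as a letter code in a numeric field) quickly.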

Page 8:

DATA PREPARATION: DATA PROFILING

• Dataset validation checks
• Counts of records by time, institution
• Values and codes over time
• Systematic changes (0,1 to Y,N)
• Values defined in data dictionary
• Quality of data
• Names and identifiers
• Data elements
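The dataset validation checks above can be sketched as follows. This is a hedged, standard-library Python illustration rather than the process the presenters used; the record layout (`year`, `institution`, `enrolled`) and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def counts_by_year_and_institution(records):
    """Count records per (year, institution) to spot gaps or sudden jumps."""
    return Counter((r["year"], r["institution"]) for r in records)


def codes_by_year(records, field):
    """Collect the set of codes observed for a field in each year."""
    seen = defaultdict(set)
    for r in records:
        seen[r["year"]].add(r[field])
    return dict(seen)


def coding_changes(records, field):
    """List consecutive years whose code sets differ (e.g. 0,1 to Y,N)."""
    by_year = codes_by_year(records, field)
    years = sorted(by_year)
    return [
        (prev, curr, by_year[prev], by_year[curr])
        for prev, curr in zip(years, years[1:])
        if by_year[prev] != by_year[curr]
    ]


def undefined_values(records, field, allowed):
    """Count values that the data dictionary does not define."""
    return Counter(r[field] for r in records if r[field] not in allowed)


# Hypothetical feed in which the enrollment flag switches coding in 2012.
records = [
    {"year": 2011, "institution": "A", "enrolled": "0"},
    {"year": 2011, "institution": "A", "enrolled": "1"},
    {"year": 2011, "institution": "B", "enrolled": "1"},
    {"year": 2012, "institution": "A", "enrolled": "Y"},
    {"year": 2012, "institution": "B", "enrolled": "N"},
]

print(counts_by_year_and_institution(records))
print(coding_changes(records, "enrolled"))
# Assumed data dictionary entry: only Y and N are defined for this field.
print(undefined_values(records, "enrolled", {"Y", "N"}))
```

Comparing code sets year over year is a deliberately coarse test: it flags any change, including legitimate new codes, so each hit still needs a follow-up with the data provider.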

Page 9:

DATA PREPARATION: DATA PROFILING

• Toolset varied by analyst
• SAS
• Informatica Data Analyst
• Excel
• Goal of understanding the data
• Constraints
• Completeness, patterns over time
• Values of each data element

Page 10:

DATA PREPARATION: DATA READINESS

• Document and expand results of profiling process
• Generate the “go-to” resource for follow-up questions
• Resource to begin data loading
• Content that feeds the data dictionary

Page 11:

DATA PREPARATION: DATA READINESS

• Information about:
• Data provider
• Data file
• Data elements

Page 12:

READINESS CONTENT ITEMS

Dataset elements              Data element
Number of records             Name and description
Years provided                Acceptable values
Primary key                   Data format/length
Business owner and steward    Business rules
Update frequency              Identity matching flag
Extract process               Field/record level data rules
Known issues                  Security category
Dataset level rules           Notes
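One way to capture these readiness items in a structured form is sketched below. The slides do not show the actual template, so this is only an assumed representation: the class names, field names, and the `missing_items` helper are hypothetical, chosen to mirror the two columns of the list above.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DataElementReadiness:
    """Readiness items recorded for a single data element."""
    name: str
    description: str = ""
    acceptable_values: List[str] = field(default_factory=list)
    data_format: str = ""            # format and length, e.g. "char(2)"
    business_rules: str = ""
    identity_matching_flag: bool = False
    data_rules: str = ""             # field/record level data rules
    security_category: str = ""
    notes: str = ""


@dataclass
class DatasetReadiness:
    """Readiness items recorded for a whole dataset (one source feed)."""
    name: str
    number_of_records: Optional[int] = None
    years_provided: List[int] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)
    business_owner: str = ""
    business_steward: str = ""
    update_frequency: str = ""
    extract_process: str = ""
    known_issues: List[str] = field(default_factory=list)
    dataset_level_rules: str = ""
    elements: List[DataElementReadiness] = field(default_factory=list)

    def missing_items(self):
        """Names of dataset-level items that are still empty.

        An empty list or string counts as "not yet filled in", so a dataset
        with no known issues would need an explicit entry saying so.
        """
        return [
            f.name for f in fields(self)
            if getattr(self, f.name) in (None, "", [])
        ]


# Hypothetical partially completed readiness document.
doc = DatasetReadiness(
    name="k12_enrollment",
    number_of_records=1000,
    years_provided=[2010, 2011],
    primary_key=["student_key", "year"],
)
doc.elements.append(
    DataElementReadiness(name="grade_level", acceptable_values=["K", "01", "02"])
)
print(doc.missing_items())
```

Keeping the readiness document as structured data rather than free text makes it straightforward to report which items still need an answer from the data provider, and to feed completed entries into the data dictionary.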

Page 13:

DATA READINESS TEMPLATE


Page 14:

WHAT WE’VE LEARNED

• Customers need to be involved
• Dictionaries don’t match data
• Educate our analysts on the data, the customer on the vision of the database
• Avoid custom extracts
• More time required up front

Page 15:

TOWARD THE FUTURE

• Empower the provider by offering guidance and tools for profiling
• Develop a feedback process for data quality and edits back to the customer
• Open and transparent

Page 16:

QUESTIONS?