dataframevalidation in python · 2018-06-08 · tdda applying test driven development (tdd)...

DataFrame Validation In Python

Post on 14-Jul-2020

DataFrame Validation In Python

Sounds Familiar? Credit: Anaconda, Inc.

AnacondaCON 2018

https://www.youtube.com/watch?v=UXd0EDy7aTY

About Me

Motivation

Why do we need data validation?

Data Quality Dimensions

Valid Accurate Complete

Consistent Uniform Unique
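
A minimal sketch of how a few of these dimensions could be turned into checks. The records, column names, and the 0 to 120 age range are made up for illustration, and a plain list of dicts stands in for a DataFrame.

```python
# Hypothetical records standing in for DataFrame rows.
rows = [
    {"id": 1, "age": 34, "country": "IL"},
    {"id": 2, "age": None, "country": "IL"},
    {"id": 2, "age": 210, "country": "il"},
]


def check_quality(rows):
    """Map each checked quality dimension to the indexes of offending rows."""
    problems = {"complete": [], "valid": [], "unique": [], "uniform": []}
    seen_ids = set()
    for i, row in enumerate(rows):
        # Complete: no missing values in the row.
        if any(value is None for value in row.values()):
            problems["complete"].append(i)
        # Valid: age must fall in an assumed plausible range.
        age = row["age"]
        if age is not None and not 0 <= age <= 120:
            problems["valid"].append(i)
        # Unique: an id may appear only once.
        if row["id"] in seen_ids:
            problems["unique"].append(i)
        seen_ids.add(row["id"])
        # Uniform: country codes are assumed to be upper case.
        if row["country"] != row["country"].upper():
            problems["uniform"].append(i)
    return problems


report = check_quality(rows)
```

Accuracy and consistency usually need a trusted reference or cross-field rules, so they are left out of this sketch.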

It can happen to all of us

1 Perfect World

2 Model Deterioration

3 Accidental Discovery

4 Ignorance Is Bliss

Let’s See Some Tools?

Voluptuous

Voluptuous is a Python data validation library.

• Simplicity.
• Support for complex data structures.
• Useful error messages.
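
To make "complex data structures" and "useful error messages" concrete, here is a rough stdlib-only sketch of schema-style validation. It is not Voluptuous's API: the `validate` helper, the schema layout, and the sample record are all hypothetical.

```python
# A hypothetical mini-validator, NOT Voluptuous's API. A schema is a dict that
# maps each key to a type, a nested schema dict, or a predicate function.
def validate(data, schema, path=""):
    """Return a list of human-readable error messages (empty if valid)."""
    errors = []
    for key, rule in schema.items():
        where = f"{path}.{key}" if path else key
        if key not in data:
            errors.append(f"{where}: required key missing")
            continue
        value = data[key]
        if isinstance(rule, dict):
            # Nested schema: recurse and keep the dotted path for the message.
            if isinstance(value, dict):
                errors.extend(validate(value, rule, where))
            else:
                errors.append(f"{where}: expected a mapping")
        elif isinstance(rule, type):
            if not isinstance(value, rule):
                errors.append(
                    f"{where}: expected {rule.__name__}, got {type(value).__name__}"
                )
        elif not rule(value):
            errors.append(f"{where}: value {value!r} failed check")
    return errors


schema = {
    "name": str,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "address": {"city": str, "zip": str},
}
record = {"name": "Ada", "age": 150, "address": {"city": "London", "zip": 12345}}
errors = validate(record, schema)
```

Each message carries the path to the offending field, so a failure inside a nested structure can be located without inspecting the whole record.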

Engarde

• Great for flat files like csv

• As decorators, which are most useful in .py scripts
• Interactively at the interpreter

https://github.com/TomAugspurger/engarde
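
The decorator style can be sketched as follows. This illustrates the pattern only and is not Engarde's implementation: `require_no_missing` and `load_prices` are hypothetical names, and a list of dicts stands in for the DataFrame a pipeline step would return.

```python
import functools


def require_no_missing(columns):
    """Hypothetical decorator: assert the returned rows have no None in `columns`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            rows = func(*args, **kwargs)
            for i, row in enumerate(rows):
                for col in columns:
                    assert row.get(col) is not None, f"row {i}: missing {col!r}"
            # Hand the data back unchanged so pipeline steps can be chained.
            return rows
        return wrapper
    return decorator


@require_no_missing(["id", "price"])
def load_prices(raw):
    # Stand-in for a pipeline step that would normally return a DataFrame.
    return [{"id": r[0], "price": r[1]} for r in raw]


clean = load_prices([(1, 9.5), (2, 3.0)])
```

Calling `load_prices([(3, None)])` raises an `AssertionError`, so bad data stops the script at the step that produced it.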

TDDA

Applying Test Driven Development (TDD) principles to data analysis.

• Correctness

• Regression detection

• Specification, Design and Documentation

• Refactoring

• Portability

Test Driven Data Analysis
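
One way to picture regression detection on data: infer simple constraints from a known-good dataset, then verify later batches against them. The sketch below is stdlib-only and is not the tdda library's API. `discover`, `verify`, and the sample rows are hypothetical, and it assumes every column has at least one non-null reference value.

```python
def discover(rows):
    """Infer simple per-column constraints from known-good rows."""
    constraints = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        constraints[col] = {
            "type": type(present[0]).__name__,
            "min": min(present),
            "max": max(present),
            "allow_null": len(present) < len(values),
        }
    return constraints


def verify(rows, constraints):
    """Return (row index, column, reason) for every violated constraint."""
    failures = []
    for i, row in enumerate(rows):
        for col, c in constraints.items():
            value = row.get(col)
            if value is None:
                if not c["allow_null"]:
                    failures.append((i, col, "null"))
            elif type(value).__name__ != c["type"]:
                failures.append((i, col, "type"))
            elif not c["min"] <= value <= c["max"]:
                failures.append((i, col, "range"))
    return failures


reference = [{"age": 25, "score": 0.4}, {"age": 61, "score": 0.9}]
constraints = discover(reference)

new_batch = [
    {"age": 40, "score": 0.5},
    {"age": 17, "score": None},
    {"age": "33", "score": 0.7},
]
failures = verify(new_batch, constraints)
```

Discovered constraints are only a starting point. Ranges inferred from a small reference set are likely too tight and would normally be reviewed by hand before use.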

Credit - Practical Data Cleaning with Python

http://kjamistan.com/

“Quality is never an accident; it is always the result of intelligent effort.”

─ John Ruskin

https://github.com/pyotam/Dataframe-Validation