DATA SCIENCE
MIS0855 | Spring 2016
Data Cleansing
David Schuff
http://community.mis.temple.edu/dschuff
Discuss (5 minutes)
• Have you fallen victim to any of Taber’s “stupid data corruption tricks”?
• From the readings, what are the best tips for cleaning data?
Cleaning Data
Consider this Excel spreadsheet of sales in Pennsylvania, New Jersey, and Delaware for the years 2009 through 2013.
Identify two problems with this data set.
And the problems show up during analysis…
How do you find the “errors” and fix them?
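One way to surface candidate errors is to run a few basic checks over every row. The Python sketch below is only an illustration: the rows are made up (the spreadsheet from the slide is not reproduced here), and the state codes, the `find_problems` helper, and the specific checks are assumptions, not the problems the slide has in mind.

```python
# Hypothetical rows; the real spreadsheet is not reproduced here.
rows = [
    {"state": "PA", "year": 2009, "sales": "1200"},
    {"state": "Pennsylvania", "year": 2010, "sales": "1350"},
    {"state": "NJ", "year": 2011, "sales": ""},
    {"state": "DE", "year": 2031, "sales": "980"},
    {"state": "NJ", "year": 2012, "sales": "1,100"},
]

VALID_STATES = {"PA", "NJ", "DE"}   # assumed coding scheme for the three states
VALID_YEARS = range(2009, 2014)     # 2009 through 2013, as stated on the slide


def find_problems(rows):
    """Return (row_index, description) pairs for rows that fail a basic check."""
    problems = []
    for i, row in enumerate(rows):
        # Inconsistent labels: the same state written two different ways.
        if row["state"] not in VALID_STATES:
            problems.append((i, f"unexpected state label: {row['state']!r}"))
        # Values outside the period the data set is supposed to cover.
        if row["year"] not in VALID_YEARS:
            problems.append((i, f"year out of range: {row['year']}"))
        # Blank cells or numbers stored as formatted text.
        try:
            float(row["sales"])
        except ValueError:
            problems.append((i, f"sales is not a plain number: {row['sales']!r}"))
    return problems


problems = find_problems(rows)
for index, description in problems:
    print(index, description)
```

Checks like these only point at suspicious rows. Deciding how to fix each one still takes judgment about what the data is supposed to mean.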
The problem of outliers
Do you correct this by…
• Removing the data point?
• Using the average of the other data points?
• Guessing at the right value?
And is this an error or just an anomaly?
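The first two options can be compared side by side in a short Python sketch. Everything here is an assumption for illustration: the sales figures are invented, `flag_outliers` is a hypothetical helper, and the rule it uses (distance from the median greater than `k` times the median absolute deviation, with `k=5`) is just one common heuristic, not a method prescribed by the slides.

```python
from statistics import mean, median

# Hypothetical yearly sales; 98000 stands in for the suspicious point.
sales = [1020, 980, 1100, 98000, 1050]


def flag_outliers(values, k=5.0):
    """Flag values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return [False] * len(values)
    return [abs(v - med) > k * mad for v in values]


flags = flag_outliers(sales)

# Option 1: remove the flagged data point.
removed = [v for v, is_outlier in zip(sales, flags) if not is_outlier]

# Option 2: replace the flagged point with the average of the other points.
others_mean = mean(removed)
imputed = [others_mean if is_outlier else v for v, is_outlier in zip(sales, flags)]

print(flags)
print(removed)
print(imputed)
```

Neither option answers the last question on the slide. If the point is a real anomaly rather than an error, both removing it and averaging it away throw out information. Guessing at the right value has no mechanical equivalent.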