Data Science MIS0855 | Spring 2016 | Data Cleansing | David Schuff

Page 1: DATA SCIENCE MIS0855 | Spring 2016 Data Cleansing David Schuff

DATA SCIENCE MIS0855 | Spring 2016
Data Cleansing

David Schuff

http://community.mis.temple.edu/dschuff

Page 2

Discuss (5 minutes)

Have you fallen victim to any of Taber’s “stupid data corruption tricks”?

From the readings, what are the best tips for cleaning data?

Page 3

Cleaning Data

Consider this Excel spreadsheet of sales in Pennsylvania, New Jersey, and Delaware for the years 2009 through 2013.

Identify two problems with this data set.

Page 4

And the problems show up during analysis…

How do you find the “errors” and fix them?
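
One way to start finding such errors is to check every row against what the data set is supposed to contain. The sketch below assumes the only valid states are Pennsylvania, New Jersey, and Delaware and the only valid years are 2009 through 2013, as described on the previous slide. The rows, the column names (`state`, `year`, `sales`), and the `find_problems` helper are hypothetical and are not taken from the actual spreadsheet.

```python
# Allowed values, taken from the description of the data set.
VALID_STATES = {"Pennsylvania", "New Jersey", "Delaware"}
VALID_YEARS = range(2009, 2014)  # 2009 through 2013 inclusive

# Hypothetical rows for illustration only; not the spreadsheet from the slide.
rows = [
    {"state": "Pennsylvania", "year": 2009, "sales": 1200.0},
    {"state": "PA", "year": 2010, "sales": 1350.0},          # inconsistent label
    {"state": "New Jersey", "year": 2031, "sales": 980.0},   # year out of range
    {"state": "Delaware", "year": 2012, "sales": None},      # missing value
]


def find_problems(rows):
    """Return (row index, column, offending value) for every failed check."""
    problems = []
    for i, row in enumerate(rows):
        if row["state"] not in VALID_STATES:
            problems.append((i, "state", row["state"]))
        if row["year"] not in VALID_YEARS:
            problems.append((i, "year", row["year"]))
        if row["sales"] is None:
            problems.append((i, "sales", None))
    return problems


problems = find_problems(rows)
for problem in problems:
    print(problem)
```

Checks like these only locate suspicious cells. Deciding how to fix each one (for example, whether "PA" should become "Pennsylvania") still takes judgment about what the data is meant to represent.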

Page 5

The problem of outliers

Do you correct this by…
• Removing the data point?
• Using the average of the other data points?
• Guessing at the right value?

And is this an error or just an anomaly?
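
The first two options above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended procedure: the sales figures are made up, the helper names are hypothetical, and the rule used to flag an outlier (more than `k` median absolute deviations from the median, with `k=5.0`) is one assumed choice among many. Whether the flagged value is an error or a real anomaly is still a judgment call the code cannot make.

```python
from statistics import mean, median


def flag_outliers(values, k=5.0):
    """Return indexes of values more than k MADs away from the median."""
    mid = median(values)
    mad = median([abs(v - mid) for v in values])  # median absolute deviation
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values) if abs(v - mid) > k * mad]


def drop_outliers(values, flagged):
    """Option 1: remove the flagged data points."""
    return [v for i, v in enumerate(values) if i not in flagged]


def replace_with_mean_of_others(values, flagged):
    """Option 2: replace flagged points with the average of the rest."""
    fill = mean(drop_outliers(values, flagged))
    return [fill if i in flagged else v for i, v in enumerate(values)]


# Hypothetical sales figures; the last one looks suspicious.
sales = [102.0, 98.5, 110.2, 105.3, 1040.0]

flagged = flag_outliers(sales)
removed = drop_outliers(sales, flagged)
averaged = replace_with_mean_of_others(sales, flagged)

print(flagged)
print(removed)
print(averaged)
```

Removing the point shrinks the data set, while averaging keeps its size but invents a value that was never observed. Either way, keeping a record of which points were changed makes the choice reviewable later.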