what is data science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfwhat data...

22
What is Data Science?

Upload: others

Post on 03-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

What is Data Science?

Page 2: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Business efficiency: Wal-Mart

http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html

Page 3: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Business Marketing: Target

http://tinyurl.com/7jbntx3

Page 4: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

• In October 2006 Netflix held a competition for the best

algorithm to predict user ratings of movies.

• The winner must improve Netflix’ own algorithm (Cinematch) by at

least 10%

• Award was given in September 2009

• Based on Collaborative Filtering

• Difficult movies to predict:

“Napoleon Dynamite” ,“Lost in Translation”, “Fahrenheit 9/11”, “Kill Bill: Volume 1”

http://www2.research.att.com/~volinsky/netflix/bpc.html

Recommendations:

Page 5: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Sports Analytics

Page 6: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Beyond Moneyball: The defensive shift

http://www.sporttechie.com/2014/11/11/sports/mlb/beyond-moneyball-how-big-data-is-changing-baseball/

Page 7: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Lesson for Data Scientists:

- Question your assumptions (be especially skeptical when predicting a rare event with limited history using human behavior.

- Examine data quality - in this election polls were not reaching all likely voters - Beware of your own biases: many pollsters were likely Clinton supporters and did not want to question the results that favored their candidate

Page 8: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

• Physician John

Snow links the

outbreak to a

contaminated

well by plotting

number of

cases on a map

• Started the

science of

epidemiology

Cholera outbreak in London 1854

Page 9: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

a.k.a. Domesday Book

• Commissioned in 1085 by

William the Conqueror

• Record of the Great

Survey of England

• Last used to settle dispute

in court in the 1960s!

http://www.domesdaybook.co.uk/

The Book of Winchester (1086)

Page 10: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

What problems were solved?

• Engineering: design of machines

• Sciences: formulation of theories

How were problems solved?

• Empirically

• Theories

• Computation

Data in the 20th century

Page 11: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Data in the 21st Century

How is today different?

• More data is available

• More data is digital

• More data is observed, rather than

generated by a designed experiment

Page 12: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Data in the 21st Century

What problems are solved today?

• Spell checking

• Face recognition

• Sentiment analysis

• Optimal routing

• High-frequency trading algorithms

• just to name a few …

Page 13: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Data in the 21st Century

How are problems solved today?

• Empirically

• Theories

• Computation

• Data exploration

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

Page 14: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

For Example

Network security:

• 20th century: based on rules and signatures

• 21st century: data mining traffic logs

http://www.bro.org/

Artificial Intelligence:

VS.

Page 15: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

IBM Watson: The Jeopardy Challenge

Not everything is perfect!

ITS LARGEST AIRPORT IS NAMED FOR A WORLDWAR II HERO: ITS SECOND LARGEST, FOR A WORLD WAR II BATTLE.

Category: U.S. Cities

Page 16: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

A good question

So, what is data science?

Page 17: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Who are the Data Scientists?https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

Skills:

• Make discoveries while swimming in data

• Don’t allow technical limitations to bog down solutions

• Often fashion their own tools

• Skilled in storytelling with data

Some data-driven companies:

Google, Wal-Mart, Twitter, LinkedIn, Amazon

Page 18: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

What data scientists do

• Ask a question• Get relevant data• Prepare data for analysis

- outliers, missing values, incorrect values• Explore data

- understand the world as it is (was)• Statistical model

- estimate/train and validate model- predict what will (likely) happen

• Communicate results- tell a story- recommend

Page 19: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

The Data Science Process

Data Extraction

Exploratory Data Analysis

Machine Learning,Statistical Models

Data Cleaning

Communicate andReport Findings

Build Data Product

Page 20: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Data Scientist skills

• Computer science- programming, hacking skills

• Statistics- probability, distributions, modelling

• Mathematics- linear algebra, calculus, optimization

• Domain expertise- storytelling, pose question, interpret result

• Communication- presentation, data visualization

Page 21: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

Drew Conway’s Venn diagram

• Real world motivating questions• Hypothesis Testing

• Extract insight• Familiarity with statistical

tools• Understand algorithms • Interpret results

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

• Acquire and clean data• Text file manipulation• Think algorithmically

Page 22: What is Data Science?olgabaysal.com/.../winter17/slides/lecture1_data_science.pdfWhat data scientists do • Ask a question • Get relevant data • Prepare data for analysis - outliers,

IBM Predictive Analytics for Asset Management

https://www.youtube.com/watch?v=b9LrXxG5SjY