exploratory data analysis with r - nccu

63
Exploratory Data Analysis with R @MatthewRenze #PrDC16

Upload: others

Post on 03-Oct-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exploratory Data Analysis with R - NCCU

Exploratory Data Analysis with R

@MatthewRenze

#PrDC16

Page 2: Exploratory Data Analysis with R - NCCU

Motivation

The ability to take data—to be able to understand it, toprocess it, to extract value from it, to visualize it, tocommunicate it—that’s going to be a hugely importantskill in the next decades, … because now we really dohave essentially free and ubiquitous data. So thecomplimentary scarce factor is the ability to understandthat data and extract value from it.

Hal Varian, Google’s Chief EconomistThe McKinsey Quarterly, Jan 2009

Page 3: Exploratory Data Analysis with R - NCCU

Source: Dice 2014 Tech Salary Survey Results

Page 4: Exploratory Data Analysis with R - NCCU

A Flood of Data is Coming...

Sink or Swim

Source: http://www.dot.gov.nt.ca/ Source: Wikipedia

Page 5: Exploratory Data Analysis with R - NCCU

Overview

• Introduction to R

• Data Munging

• Descriptive Statistics

• Data Visualization

• Beyond R and EDA

Page 6: Exploratory Data Analysis with R - NCCU

How Does This Apply to Me?

• As a software developer, I often: Perform log file analysis

Analyze software performance

Analyze code metrics for code quality

Detect anomalies in source data

Transform or clean data files to make them usable

Help decision makers make decisions based on data

Page 7: Exploratory Data Analysis with R - NCCU

About Me

• Independent software consultant

• Education• B.S. in Computer Science

• B.A. in Philosophy

• Community• Pluralsight Author

• ASPInsider

• Public Speaker

• Open-Source Software

Page 8: Exploratory Data Analysis with R - NCCU

SPONSORS

Page 9: Exploratory Data Analysis with R - NCCU

Introduction to R

Page 10: Exploratory Data Analysis with R - NCCU

What is R?

• Open source

• Language and environment

• Numerical and graphical analysis

• Cross platform

Page 11: Exploratory Data Analysis with R - NCCU

What is R?

• Active development

• Large user community

• Modular and extensible

• 6700+ extensions

and best of all…

Page 12: Exploratory Data Analysis with R - NCCU

FREE

Page 13: Exploratory Data Analysis with R - NCCU

FREE

Page 14: Exploratory Data Analysis with R - NCCU

Source: http://redmonk.com/sogrady/2014/06/13/language-rankings-6-14/

Page 15: Exploratory Data Analysis with R - NCCU
Page 16: Exploratory Data Analysis with R - NCCU

Source: www.rstudio.com/ide/

Page 17: Exploratory Data Analysis with R - NCCU
Page 18: Exploratory Data Analysis with R - NCCU

Code Demo

Page 19: Exploratory Data Analysis with R - NCCU

Data Munging

Page 20: Exploratory Data Analysis with R - NCCU

Data Munging

• Transforming data

• Raw data to usable data

• Data must be cleaned first

Page 21: Exploratory Data Analysis with R - NCCU

Data Munging Tasks

• Renaming variables

• Data type conversion

• Encoding, decoding or recoding data

• Merging data sets

• Transforming data

• Handling missing data (imputing)

• Handling anomalous values

Page 22: Exploratory Data Analysis with R - NCCU

Loading Data in R

• File-based data

• Web-based data

• Databases

• Statistical data

• And many more…

CSV XML

Page 23: Exploratory Data Analysis with R - NCCU

Cleaning Data

• This step is often the:• Most difficult

• Most time consuming

• TIP: Record all steps

Page 24: Exploratory Data Analysis with R - NCCU
Page 25: Exploratory Data Analysis with R - NCCU
Page 26: Exploratory Data Analysis with R - NCCU

• Column with wrong name

• Rows with missing values

• Runtime column has units

• Revenue in multiple scales

• Wrong file format

Page 27: Exploratory Data Analysis with R - NCCU

Code Demo

Page 28: Exploratory Data Analysis with R - NCCU
Page 29: Exploratory Data Analysis with R - NCCU

Descriptive Statistics

Page 30: Exploratory Data Analysis with R - NCCU

Descriptive Statistics

• Describe data

• Provides a summary

• aka: Summary statistics

Movie Runtime

Statistic Value (minutes)

Minimum 38

1st Quartile 93

Median 101

Mean 104

3rd Quartile 113

Maximum 219

Page 31: Exploratory Data Analysis with R - NCCU

Statistical Terms

• Observations

• Variables

• Qualitative variable

• Quantitative variable

ID Date Customer Product Quantity

1 2015-08-27 John Pizza 2

2 2015-08-27 John Soda 2

3 2015-08-27 Jill Salad 1

4 2015-08-27 Jill Milk 1

5 2015-08-28 Miko Pizza 3

6 2015-08-28 Miko Soda 2

7 2015-08-28 Sam Pizza 1

8 2015-08-28 Sam Milk 1

Page 32: Exploratory Data Analysis with R - NCCU

Types of Analysis

• Number of variables • Univariate

• Bivariate

• Multivariate

• Type of variables• Qualitative

• Quantitative

QualitativeUnivariate

Analysis

QuantitativeUnivariate

Analysis

QualitativeBivariateAnalysis

QuantitativeBivariateAnalysis

Qualitative & Quantitative

BivariateAnalysis

Type of Variable(s)

Nu

mb

er

of

Var

iab

les

MultivariateAnalysis

Page 33: Exploratory Data Analysis with R - NCCU

Univariate Analysis

• One variable

• Qualitative• Frequency

• Quantitative• Central tendency

• Dispersion

Page 34: Exploratory Data Analysis with R - NCCU

Bivariate Analysis

• Qualitative• Joint frequency

• Quantitative• Two variables

• Predictor

• Outcome

• Measures• Covariance

• Correlation

Page 35: Exploratory Data Analysis with R - NCCU
Page 36: Exploratory Data Analysis with R - NCCU

Cowboys &

The Musical

Space Invaders:

Extended Edition

Page 37: Exploratory Data Analysis with R - NCCU

Code Demo

Page 38: Exploratory Data Analysis with R - NCCU

Cowboys &

The Musical

Space Invaders:

Extended Edition

Page 39: Exploratory Data Analysis with R - NCCU

Data Visualization

Page 40: Exploratory Data Analysis with R - NCCU

Data Visualization

• Visual data representation

• For human pattern recognition

• Map dimensions to visual characteristics

Page 41: Exploratory Data Analysis with R - NCCU

Data Visualization

ID Date Customer Product Quantity

1 2015-08-27 John Pizza 2

2 2015-08-27 John Soda 2

3 2015-08-27 Jill Salad 1

4 2015-08-27 Jill Milk 1

5 2015-08-28 Miko Pizza 3

6 2015-08-28 Miko Soda 2

7 2015-08-28 Sam Pizza 1

8 2015-08-28 Sam Milk 1

Page 42: Exploratory Data Analysis with R - NCCU

Types of Data Visualizations

• Number of variables • Univariate

• Bivariate

• Multivariate

• Type of variable(s)• Qualitative

• Quantitative

QualitativeUnivariate

Analysis

QuantitativeUnivariate

Analysis

QualitativeBivariateAnalysis

QuantitativeBivariateAnalysis

Qualitative & Quantitative

BivariateAnalysis

Type of Variable(s)

Nu

mb

er

of

Var

iab

les

MultivariateAnalysis

Page 43: Exploratory Data Analysis with R - NCCU

Cowboys &

The Musical

Space Invaders:

Extended Edition

Page 44: Exploratory Data Analysis with R - NCCU

Code Demo

Page 45: Exploratory Data Analysis with R - NCCU

Feature Length PG

Warlordof the

Rings :

The

An Unexpected Adventure

Page 46: Exploratory Data Analysis with R - NCCU

Beyond R and EDA

Page 47: Exploratory Data Analysis with R - NCCU

This is just the tip of the iceberg!This is just the tip of the iceberg!

Page 48: Exploratory Data Analysis with R - NCCU

Advanced Data Analysis with R

• Cluster Analysis

• Statistical Modeling

• Dimensionality Reduction

• Analysis of Variance (ANOVA)

Page 49: Exploratory Data Analysis with R - NCCU

Source: Nathan Yau (www.flowingdata.com)

Page 50: Exploratory Data Analysis with R - NCCU

Data Mining and Machine Learning with R

Page 51: Exploratory Data Analysis with R - NCCU

Alternatives to R for EDA

Page 52: Exploratory Data Analysis with R - NCCU

Source: Microsoft

Page 53: Exploratory Data Analysis with R - NCCU
Page 54: Exploratory Data Analysis with R - NCCU
Page 55: Exploratory Data Analysis with R - NCCU
Page 56: Exploratory Data Analysis with R - NCCU
Page 57: Exploratory Data Analysis with R - NCCU
Page 58: Exploratory Data Analysis with R - NCCU

Where to Go Next…

• R website: http://www.cran.r-project.org

• R Studio: http://www.rstudio.com

• Revolutions: http://blog.revolutionanalytics.com

• Flowing Data: http://flowingdata.com

• R-Blogger: http://www.r-bloggers.com

Page 59: Exploratory Data Analysis with R - NCCU

www.pluralsight.com/authors/matthew-renze

Page 60: Exploratory Data Analysis with R - NCCU

Conclusion

Page 61: Exploratory Data Analysis with R - NCCU

Conclusion

• Introduction to R

• Data munging

• Descriptive statistics

• Data visualization

• Beyond R & EDA

Page 62: Exploratory Data Analysis with R - NCCU

Feedback

• Feedback is very important to me!

• One thing you liked?

• One thing I could improve?

Page 63: Exploratory Data Analysis with R - NCCU

Contact Info

Matthew Renze

• Twitter: @matthewrenze

• Email: [email protected]

• Website: www.matthewrenze.com