Machine learning- key concepts

Machine learning: Applications, types and key concepts

Upload: amir-ziai

Post on 12-Jan-2017

Page 1: Machine learning- key concepts

Machine learning

Applications, types and key concepts

Page 2: Machine learning- key concepts

Outline

● What is machine learning?

● Applications

● Types

● Terminology

● Key concepts

Page 3: Machine learning- key concepts

Next classes

1. Key concepts

2. A tour of machine learning (linear algebra, probability theory, calculus)

3. Machine learning pipelines (pre-processing, model training, and evaluation in Python and Scala)

4. Machine learning case studies (Python and Scala examples)

a. Sentiment analysis (Natural Language Processing, NLTK)

b. Spam classifier

c. Stock price prediction (regression)

d. Image recognition, deep learning (TensorFlow, keras)

e. Recommendation engine

5. Machine learning at scale (algorithms, linear algebra, probability, Spark MLLib, Vowpal Wabbit, scikit-learn)

Page 4: Machine learning- key concepts

Next classes

             Key concepts   Tour   Pipelines   Case studies   Scale
Concepts     ০০০            ০      ০০          ০০             ০০০
Code         ০              ০০     ০০০         ০০০            ০০০
Math/stats   ০              ০০০    ০           ০              ০০০

Page 5: Machine learning- key concepts

What is machine learning?

● Learn from data (past experiences)

● Generalize (find the signal/pattern)

● Predict, forward looking

● Observational data

Page 6: Machine learning- key concepts

What is machine learning?

Relationship to data science and deep learning

[Figure: diagram with regions labeled Data Science, ML (machine learning) and DL (deep learning)]

Page 7: Machine learning- key concepts

Applications

● Autonomous cars

● Siri

● Facial recognition

● People who bought this also bought...

● Spam filters

● Targeted advertising

● ...

Page 8: Machine learning- key concepts

Types of machine learning

● Supervised learning

○ Classification

○ Regression

● Unsupervised learning

● Reinforcement learning

Page 9: Machine learning- key concepts

Types of machine learning

● Supervised

○ Cancer diagnosis

○ Stock market prediction

○ Customer churn

○ Recommendation engine

○ Anomaly detection

● Unsupervised

○ Dimensionality reduction

○ Clustering

○ PageRank

○ Anomaly detection

● Reinforcement learning

○ Self-driving cars

○ AlphaGo

Page 10: Machine learning- key concepts

(Linear) Regression

● Predict a continuous variable (e.g. price)

● y = mx + b

● Ordinary Least Squares

● Analytical solution

● Geometric model
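The "analytical solution" bullet can be illustrated with a minimal sketch for the one-feature case y = mx + b, where Ordinary Least Squares has a closed form: the slope is the covariance of x and y divided by the variance of x. The function name `fit_simple_ols` and the sample points are made up for illustration and are not from the slides.

```python
def fit_simple_ols(xs, ys):
    """Return (m, b) minimizing the sum of squared residuals for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: (unnormalized) covariance of x and y over the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_simple_ols(xs, ys)
print(m, b)
```

No iterative training loop is needed here; the parameters come directly from sums over the data, which is what makes the solution analytical.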

Page 11: Machine learning- key concepts

(Linear) Regression

● Can use multiple variables (multiple regression, often loosely called multivariate regression)

● Relationships are not always linear

Page 12: Machine learning- key concepts

(Linear) Regression example

● Boston housing dataset

● Median value of houses (MV) vs. average number of rooms (RM)

from sklearn.linear_model import LinearRegression

# housing: a DataFrame holding the Boston housing data, loaded beforehand
model = LinearRegression()
x, y = housing[['RM']], housing['MV']
model.fit(x, y)
model.score(x, y)  # R² on the training data

R² = 0.48
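The figure reported by `model.score` is the coefficient of determination, R². As a rough sketch of what that number measures, it can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r_squared` and the sample values are illustrative assumptions, not part of the slides.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error left over after using the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets, scored against two extreme sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)       # predictions match exactly
baseline = r_squared(y_true, [2.5] * 4)   # always predict the mean
print(perfect, baseline)
```

A model that predicts perfectly scores 1, and one that does no better than predicting the mean scores 0, so a value such as 0.48 sits roughly halfway between those two reference points.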

Page 13: Machine learning- key concepts

(Linear) Regression example

● Boston housing dataset

● Median value of houses (MV) vs. average number of rooms (RM) and industrial zoning proportions (INDUS)

from sklearn.linear_model import LinearRegression

# housing: a DataFrame holding the Boston housing data, loaded beforehand
model = LinearRegression()
x, y = housing[['RM', 'INDUS']], housing['MV']
model.fit(x, y)
model.score(x, y)  # R² on the training data

R² = 0.53

Page 14: Machine learning- key concepts

(Linear) Regression example

● Intuition breaks down in high dimensions (more than 3)

● Interpretability goes down

● Real-world data is usually non-linear

Page 15: Machine learning- key concepts

Terminology

● Feature (a.k.a. input, variable, predictor, explanatory variable, independent variable)

● Output (a.k.a. target, label, class, dependent variable)

● Training instance (a.k.a. observation, row)

● Training dataset

● Training (a.k.a. learning, modeling, fitting)

● Model validation and testing

RM      INDUS   ZN     ...   MV
6.575   2.31    18.0   ...   24.0
6.421   7.07    0.0    ...   21.6
...     ...     ...    ...   ...
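To connect the vocabulary to the table, here is a small sketch that separates the two visible rows into features and output. It assumes, as in the regression example, that MV is the output and the remaining columns are features; the variable names `rows`, `X` and `y` are illustrative, and only the columns shown in the table are included.

```python
# The two training instances (rows) visible in the table.
rows = [
    {"RM": 6.575, "INDUS": 2.31, "ZN": 18.0, "MV": 24.0},
    {"RM": 6.421, "INDUS": 7.07, "ZN": 0.0, "MV": 21.6},
]

feature_names = ["RM", "INDUS", "ZN"]  # features (inputs, predictors)
target_name = "MV"                     # output (target, label)

# X holds one list of feature values per training instance;
# y holds the matching output for each instance.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[target_name] for row in rows]

print(X)
print(y)
```

Together, `X` and `y` make up the training dataset that a call such as `model.fit(X, y)` consumes during training.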

Page 21: Machine learning- key concepts

Statistical learning

● The true underlying function is not known

● Usually can't observe all features (e.g. policy impact, global trends, etc.)

● Most interesting phenomena are neither deterministic nor stationary

● No guarantee that a set of variables is predictive of the outcome

[Figure: spectrum from 100% deterministic (F = ma) to 100% stochastic (coin flip), with a region labeled "Machine learning territory"]

Page 22: Machine learning- key concepts

Classification

● Target variable, qualitative, classes

● Binary classification

○ Cancer patient: positive class (class of interest)

○ Healthy patient: negative class

Page 23: Machine learning- key concepts

Classification

● Linear vs. non-linear decision boundaries

● Model complexity, training time, and latency
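The difference between a linear and a non-linear decision boundary can be sketched with two toy classifiers in two dimensions. The first labels a point by which side of a straight line it falls on; the second labels it by whether it falls inside a circle. The function names, weights and radius are hypothetical choices for illustration only.

```python
def predict_linear(weights, bias, point):
    """Positive class (1) if the point lies on the positive side of the line."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else 0


def predict_circular(radius, point):
    """Positive class (1) if the point lies inside a circle around the origin."""
    squared_distance = sum(x ** 2 for x in point)
    return 1 if squared_distance < radius ** 2 else 0


# Hypothetical linear boundary: the line x1 + x2 = 3.
weights, bias = [1.0, 1.0], -3.0
above_line = predict_linear(weights, bias, [2.0, 2.0])
below_line = predict_linear(weights, bias, [1.0, 1.0])

# Hypothetical non-linear boundary: a circle of radius 1.
inside = predict_circular(1.0, [0.5, 0.5])
outside = predict_circular(1.0, [2.0, 0.0])

print(above_line, below_line, inside, outside)
```

No single straight line can reproduce the inside/outside split of the circle, which is why non-linear boundaries call for more flexible, and usually more complex, models.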

Page 24: Machine learning- key concepts

Bias

○ Cancer patient: positive class (class of interest)

○ Healthy patient: negative class

Page 26: Machine learning- key concepts

Bias

Page 27: Machine learning- key concepts

Variance/overfit

● Learning the wrong things, memorizing

● Modeling the noise and not the signal

○ Model 1: if GPA > 3.8 and hours studied > 5, then passed

○ Model 2: if student ID != 2, then passed

○ New record: student ID = 4, hours studied = 5.5, GPA = 3.82; passed?

Student ID   Hours studied   GPA    ...   Passed
1            10              4.00   ...   Yes
2            0               2.71   ...   No
3            6               3.95   ...   Yes
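The two rules can be checked against the three training rows with a short sketch. Both fit the training data perfectly, and both answer "passed" for the new record, but Model 2 does so only because it has memorized which student ID failed. The second test record below (student ID 5) is invented here to make that visible and does not appear in the slides.

```python
training = [
    {"id": 1, "hours": 10, "gpa": 4.00, "passed": True},
    {"id": 2, "hours": 0, "gpa": 2.71, "passed": False},
    {"id": 3, "hours": 6, "gpa": 3.95, "passed": True},
]


def model_1(student):
    # Uses features plausibly related to passing.
    return student["gpa"] > 3.8 and student["hours"] > 5


def model_2(student):
    # Memorizes the one failing student ID: noise, not signal.
    return student["id"] != 2


def accuracy(model, records):
    correct = sum(model(r) == r["passed"] for r in records)
    return correct / len(records)


train_acc_1 = accuracy(model_1, training)
train_acc_2 = accuracy(model_2, training)

# New record from the slide.
new_record = {"id": 4, "hours": 5.5, "gpa": 3.82}
# Hypothetical record (not from the slide): no studying, low GPA.
weak_record = {"id": 5, "hours": 0, "gpa": 2.0}

print(train_acc_1, train_acc_2)
print(model_1(new_record), model_2(new_record))
print(model_1(weak_record), model_2(weak_record))
```

Training accuracy alone cannot tell the two models apart; only records the models have not seen expose that Model 2 predicts "passed" for any student whose ID is not 2.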

Page 28: Machine learning- key concepts

Bias/variance

Page 29: Machine learning- key concepts

Guarding against overfitting

● Split into train, validation and test

● Cross-validation
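Both safeguards come down to holding data back from training. The sketch below, using only the standard library, shows one possible way to do a three-way split and to generate k-fold cross-validation indices. The function names, the 60/20/20 proportions and the fold count are assumptions for illustration; libraries such as scikit-learn provide their own utilities for this.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve off test and validation portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index is held out exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


train, val, test = train_val_test_split(range(10))
folds = list(k_fold_indices(10, 3))
print(len(train), len(val), len(test))
print([len(val_idx) for _, val_idx in folds])
```

The model is fit on the training portion, tuned against the validation portion, and scored once on the test portion; cross-validation rotates the validation role across folds so that every record is used for validation exactly once.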

Page 30: Machine learning- key concepts

Summary

Increasing model complexity generally:

● Increases model fit

● Decreases interpretability

● Increases chance of overfitting

● Increases training time

● Increases model latency

Page 31: Machine learning- key concepts

Remember that...

Page 32: Machine learning- key concepts

Next class: a tour of machine learning

Page 33: Machine learning- key concepts

Preparation for next class

1. Test your understanding: http://bit.ly/mlseries1

2. Check this out: Visual intro to ML

Want to pursue machine learning more seriously?

● Read "A few useful things to know about machine learning"

● Theory and intuition: Python Machine Learning (book)

● Hands-on experience: Kaggle (start with Titanic)

● The Elements of Statistical Learning (advanced)