machine learning, hype or hit?

ANP126 Machine Learning: Hype or Hit? Fred Verheul

Upload: fredverheul

Post on 21-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Machine Learning, hype or hit?

ANP126 Machine Learning: Hype or Hit? Fred Verheul

Page 2

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

Page 3

Agenda (now: 1. Introduction: Hype or Hit?!)

Page 4

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

Page 5

What is Machine Learning?

Traditional Programming: Data + Program -> Computer -> Output

Machine Learning: Data + Output -> Computer -> Program
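As a minimal sketch of this contrast, assuming a toy one-dimensional problem (the data and the function names `is_large_by_hand`, `learn_threshold` and `learned_program` are hypothetical, not from the talk): the traditional program hard-codes a threshold, while the learning version derives the threshold from example inputs together with their desired outputs.

```python
def is_large_by_hand(x: float) -> bool:
    """Traditional programming: a human chooses the rule."""
    return x > 5.0  # hand-picked threshold


def learn_threshold(xs: list[float], labels: list[bool]) -> float:
    """Toy 'learning': pick the threshold t for the rule x > t
    that misclassifies the fewest training examples."""
    points = sorted(xs)
    # Candidates: one threshold below all points, plus midpoints between neighbours.
    candidates = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t: float) -> int:
        return sum((x > t) != y for x, y in zip(xs, labels))

    return min(candidates, key=errors)


# Hypothetical training data: inputs (Data) plus the desired Output.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [False, False, False, True, True, True]
t = learn_threshold(xs, labels)


def learned_program(x: float) -> bool:
    """The 'Program' that comes out of the learning step."""
    return x > t


print(t, learned_program(6.0))
```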

Page 6

Examples: Recommender systems

Page 7

Examples: Natural Language Processing

Siri

Google Translate

Page 8

Examples, continued…

Spam filtering

Handwriting recognition

Page 9

ML in the news: IBM Watson

Page 10

ML in the news: DeepMind’s AlphaGo

Page 11

ML in the news: business example

Page 12

Vendor Platforms…

Page 13

Tricking a neural network…

A cat! Surely also a cat?!

More examples and explanation by Julia Evans (@b0rk)

Page 14

Machine Learning gone wrong

Page 15

Data Mining Fail (by Carina C. Zona)

Page 16

Prediction is hard…

Page 17

Agenda (now: 2. Machine Learning)

Page 18

CRISP-DM: data mining process

(Figure: CRISP-DM process diagram; two of its phases are marked “ML important”.)

Page 19

Data: terminology

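• feature: an input variable (a column of the data table)
• target / label: the variable to be predicted
• instance: a single example (a row of the data table)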
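A small sketch of this terminology, using a hypothetical customer table (all column names and values are invented for illustration): each dict is one instance, `churned` is the target / label, and every other key is a feature.

```python
# Hypothetical data set: each dict is one instance (a row of the table).
instances = [
    {"age": 34, "income": 52000, "owns_house": "yes", "churned": False},
    {"age": 51, "income": 61000, "owns_house": "no", "churned": True},
    {"age": 23, "income": 28000, "owns_house": "no", "churned": False},
]

target = "churned"  # the target / label we want to predict
features = [name for name in instances[0] if name != target]  # all other columns

X = [[row[f] for f in features] for row in instances]  # feature values per instance
y = [row[target] for row in instances]                 # one label per instance

print(features, X, y)
```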

Page 20

Examples of ML tasks

Supervised learning (a target is given)

• Regression: target is numeric
• Classification: target is categorical

Unsupervised learning (no target)

• Clustering
• Dimensionality reduction
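To make the unsupervised case concrete, here is a minimal k-means sketch. The slide does not name a clustering algorithm; k-means (Lloyd's algorithm) is swapped in as one common choice, and the points and starting centroids are hypothetical. Note that the data has no target column: the algorithm only groups similar instances.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: alternate an assignment step and an update step."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: two visually obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

In practice the starting centroids are chosen randomly (often with several restarts), because k-means can end in a poor local optimum.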

Page 21

Exploratory Data Analysis

Page 22

Data preparation

• Data Cleaning

• Missing Data

• Feature Engineering
  • Normalization
  • Categorical data -> Numerical features
  • Log-based features or target
  • Date/time-related features
  • Combine features, e.g. by +, -, x, /
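A minimal sketch of several of these feature-engineering steps in plain Python. The helper names and example values are hypothetical; min-max scaling and one-hot encoding are shown as common choices for "normalization" and "categorical data -> numerical features", not as the only options.

```python
import math
from datetime import date


def min_max_normalize(values):
    """Normalization: rescale numbers linearly to [0, 1] (assumes not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Categorical data -> numerical features: one 0/1 column per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def date_features(d):
    """Date/time-related features derived from a single date."""
    return {"year": d.year, "month": d.month, "weekday": d.weekday()}  # Monday == 0


incomes = [20000, 40000, 100000]
household_sizes = [1, 2, 4]

normalized = min_max_normalize(incomes)               # normalization
log_incomes = [math.log(v) for v in incomes]          # log-based feature
colours, encoded = one_hot(["red", "green", "red"])   # categorical -> numerical
when = date_features(date(2017, 4, 21))               # date/time-related features
income_per_person = [i / n for i, n in zip(incomes, household_sizes)]  # combined (ratio)

print(normalized, colours, encoded, when, income_per_person)
```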

Page 23

Modeling: so many algorithms…

Page 24

ML Algorithms: by Representation

Representation: collection of candidate models/programs, aka hypothesis space

Decision trees

Instance-based

Neural networks

Model ensembles
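As an example of the instance-based family, a minimal k-nearest-neighbours classifier (k-NN is a standard instance-based method; the slide does not name a specific one). The training points and labels are hypothetical. No model is built up front: the stored instances themselves are searched at prediction time.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by a majority vote among its k nearest training instances."""
    # Sort the stored instances by Euclidean distance to the query point.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    # Count the labels of the k closest instances and return the most common one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2, 2)), knn_predict(train_X, train_y, (7, 7)))
```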

Page 25

ML Algorithms: by Evaluation

Evaluation: Quality measure for a model

Regression

Example metric: Root Mean Squared Error

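RMSE = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the actual and $\hat{y}_i$ the predicted value of instance $i$, and $n$ is the number of instances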

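Binary classification: confusion matrix

Example: medical test for a disease

                      Predicted: disease   Predicted: no disease
Actual: disease            8 (TP)               2 (FN)
Actual: no disease        19 (FP)             971 (TN)

Accuracy: (8 + 971) / 1000 = 97.9%

Accuracy is misleading here: only 10 of the 1000 people tested actually have the disease, so a test that always answers "no disease" would score 99%. Better evaluation metrics:
• Precision: 8 / (8 + 19) ≈ 29.6%
• Recall: 8 / (8 + 2) = 80%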
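A minimal sketch of these metrics in plain Python. The confusion-matrix counts (TP = 8, FN = 2, FP = 19, TN = 971) are the ones implied by the slide's accuracy, precision and recall formulas; the RMSE inputs are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # share of positive predictions that are correct
        "recall": tp / (tp + fn),     # share of actual positives that are found
    }


metrics = classification_metrics(tp=8, fn=2, fp=19, tn=971)
print(metrics)
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```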

Page 26

ML Algorithms: by Optimization

Optimization: how the algorithm ‘learns’; depends on representation and evaluation

Greedy Search, an example of combinatorial optimization

Gradient Descent (or in general: Convex Optimization)

Linear Programming (or in general: Constrained/Nonlinear Optimization)
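A minimal sketch of gradient descent, assuming a linear-regression representation (y ≈ w·x + b) and mean squared error as the evaluation function. The learning rate, step count and data are illustrative choices, not values from the talk.

```python
def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Fit y ≈ w*x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Move both parameters a small step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated by y = 2x + 1.
w, b = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(w, b)
```

The learning rate matters: for this data a much larger value (above roughly 0.15) makes the updates overshoot and diverge, while a much smaller one needs many more steps.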

Page 27

Algorithms by Optimization: Heuristics

• Hill climbing

• Simulated Annealing

• Nelder-Mead Simplex Method

• Artificial Bee Colony Optimization

• Genetic Algorithms

• Particle Swarm Optimization

• Ant Colony Optimization
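A minimal sketch of the first heuristic in the list, stochastic hill climbing, on a hypothetical one-dimensional objective whose maximum is at x = 3. The step size, iteration count and seed are illustrative assumptions.

```python
import random


def hill_climb(f, x0, step=0.1, iterations=10_000, seed=0):
    """Stochastic hill climbing: try a random neighbour, keep it only if f improves."""
    rng = random.Random(seed)  # seeded for reproducibility
    x, fx = x0, f(x0)
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        fc = f(candidate)
        if fc > fx:  # only ever move uphill
            x, fx = candidate, fc
    return x, fx


best_x, best_f = hill_climb(lambda x: -(x - 3) ** 2, x0=0.0)
print(best_x, best_f)
```

Because it only moves uphill, hill climbing can get stuck on a local optimum. Simulated annealing, the next item in the list, differs mainly in sometimes accepting a worse candidate, with a probability that decreases over time.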

Page 28

Choice of ML-algorithm, considerations

• Size & Dimensionality of training set

• Computational efficiency

• Model building, number of parameters:
  • Eager vs lazy learning
  • Online vs batch
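To illustrate online vs batch learning, a minimal sketch of an online learner: a hypothetical class that fits y ≈ w·x + b with one stochastic-gradient step per incoming instance, so the full data set never has to be held in memory. A batch learner would instead process the whole training set before producing a model. The method name borrows the `partial_fit` convention used by some scikit-learn estimators, but this class is standalone and not a scikit-learn API.

```python
class OnlineLinearRegressor:
    """Online (incremental) learning of y ≈ w*x + b, one instance at a time."""

    def __init__(self, lr=0.05):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def partial_fit(self, x, y):
        # One stochastic-gradient step on the squared error of this single instance
        # (the constant factor 2 is folded into the learning rate).
        error = self.w * x + self.b - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

    def predict(self, x):
        return self.w * x + self.b


model = OnlineLinearRegressor()
# Hypothetical stream of instances following y = 2x + 1, arriving one by one.
for step in range(10_000):
    x = step % 5
    model.partial_fit(x, 2 * x + 1)

print(model.w, model.b, model.predict(10))
```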

• Interpretability

Page 29

Evaluation: training vs test data

5-fold cross validation
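A minimal sketch of k-fold cross-validation in plain Python (k = 5 by default, as on the slide). `fit` and `error` are hypothetical caller-supplied functions: every instance is used for testing exactly once, and the reported score is the average test error over the folds.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds; yield (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(xs, ys, fit, error, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(error(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


folds = list(k_fold_indices(12, 5))
print([len(test) for _, test in folds])
```

In practice the instances are usually shuffled (or stratified by label) before splitting; this sketch keeps them in order for readability.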

Page 30

Training error vs test error

Page 31

Overfitting
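A small, deliberately exaggerated sketch of training error vs test error. Both models and the data are hypothetical: a "memorizer" that stores every training pair (and predicts 0 for anything unseen) reaches zero training error but fails on new data, while a simple least-squares line through the origin has a small, similar error on both.

```python
def fit_memorizer(xs, ys):
    """Overfits: memorises the training pairs, predicts 0.0 for unseen inputs."""
    table = dict(zip(xs, ys))
    return lambda x: table.get(x, 0.0)


def fit_line_through_origin(xs, ys):
    """Simple model y = w * x, with w fitted by least squares."""
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: w * x


def mse(model, xs, ys):
    """Mean squared error of a model on the given instances."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noisy data roughly following y = 2x.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

for fit in (fit_memorizer, fit_line_through_origin):
    model = fit(train_x, train_y)
    print(fit.__name__, mse(model, train_x, train_y), mse(model, test_x, test_y))
```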

Page 32

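Distance metrics

(Figure: grid with the points P and Q at (2, 2) and (6, 5). The Euclidean distance follows the straight line through both points; the Manhattan distance follows the line y = 2 from x = 2 to x = 6, then the vertical line x = 6 from y = 2 to y = 5.)

For P = (p₁, p₂) and Q = (q₁, q₂):

Chebyshev distance (L∞-norm: $\|\cdot\|_\infty$)

$\|P - Q\|_\infty = \max(|p_1 - q_1|, |p_2 - q_2|)$

Number of moves of a King on a chessboard ;-)

Manhattan distance (L1-norm: $\|\cdot\|_1$)

$\|P - Q\|_1 = |p_1 - q_1| + |p_2 - q_2|$

Euclidean distance (L2-norm: $\|\cdot\|_2$)

$\|P - Q\|_2 = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}$ (the length of the line segment from P to Q)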

Many more: Cosine distance, Edit distance (aka Levenshtein distance), …
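The three norms can be checked numerically for the slide's points (2, 2) and (6, 5) with a minimal pure-Python sketch. Cosine distance and Levenshtein (edit) distance, which the slide only names, are included as straightforward textbook versions; the helper names are hypothetical.

```python
import math


def euclidean(p, q):  # L2 norm of p - q
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):  # L1 norm of p - q
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):  # L-infinity norm of p - q
    return max(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):  # 1 minus the cosine of the angle between p and q
    dot = sum(a * b for a, b in zip(p, q))
    return 1 - dot / (math.hypot(*p) * math.hypot(*q))


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


P, Q = (2, 2), (6, 5)
print(euclidean(P, Q), manhattan(P, Q), chebyshev(P, Q))
```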

Page 33

Agenda (now: 3. Demo, SAP ICN)

Page 34

Agenda (now: 4. Skill set for aspiring ML experts)

Page 35

So you want to be a Data Scientist?

Page 36

CRISP-DM: data mining process

Page 37

Hacking skills

• Programming languages:

• Libraries (examples):
  • TensorFlow, Caffe, Theano, Keras
  • SciPy & scikit-learn
  • Spark MLlib (Scala/Java/Python)

Page 38

Math skills: Statistics

Source: http://xkcd.com/552/

Page 39

More math skills that may be needed…

Calculus, Linear Algebra

Page 40

Data Science for Business

• Focuses more on general principles than specific algorithms

• Not math-heavy, though it does contain some math

• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do

• Book website: http://data-science-for-biz.com/DSB/Home.html

Page 41

Agenda (now: 5. Take-aways)

Page 42

What has NOT been covered

• Deep learning / Neural Networks

• Specifics of ML-algorithms

• Tools / Libraries / Code

• SAP Products, like HANA / Predictive Analytics / Vora / …

• Hardware

• …

Page 43

Take-aways

• Goal of ML: generalize from training data (not optimization!!)

• Part of ‘Data Mining Process’, not a goal in and of itself

• No magic! Just some clever algorithms…

• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency

Page 44

Thank you! @SOAPEOPLE

Fred Verheul
Big Data Consultant
+31 6 3919 …