machine learning 101 sit hvr

Post on 19-Mar-2017


Machine Learning 101

Fred Verheul

2

What we won’t cover…

• Deep learning / Neural Networks

• Specifics of ML-algorithms

• Tools / Libraries / Code

• SAP Products, like HANA / Predictive Analytics / Vora / …

• Ethics, algorithmic transparency & fairness

• Hardware

3

Examples: Recommender systems

4

Examples, continued…

SPAM-filtering

Handwriting recognition

5

ML in the news: Deepmind’s AlphaGo

6

7

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

8

What is Machine Learning?

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
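The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the spam rule, the toy data and the names `is_spam_handwritten`, `learn_threshold` and `is_spam_learned` are all hypothetical and do not come from the slides. In the first function a person writes the rule; in the second, the rule (a threshold) is derived from data together with the known outputs.

```python
# Traditional programming: a human writes the rule (the "program").
def is_spam_handwritten(exclamation_marks: int) -> bool:
    return exclamation_marks > 3  # threshold picked by the programmer


# Machine learning: the rule is derived from data plus known outputs.
def learn_threshold(values, labels):
    """Return the midpoint threshold that misclassifies the fewest examples."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t):
        # Count examples where the rule "value > t" disagrees with the label.
        return sum((v > t) != y for v, y in zip(values, labels))

    return min(candidates, key=errors)


# Hypothetical toy data: exclamation marks per message, and whether it was spam.
counts = [0, 1, 1, 2, 5, 6, 7, 9]
is_spam = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(counts, is_spam)


def is_spam_learned(exclamation_marks: int) -> bool:
    return exclamation_marks > threshold
```

With this toy data the learned threshold separates the two groups without anyone having typed the number in by hand, which is the point of the diagram: the output of learning is itself a program.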

9

Sweet spot for Machine Learning

• It’s impossible to write down the rules in code:

    • Too many rules

    • Too many factors influencing the rules

    • Too finely tuned

    • We just don’t know the rules (image recognition)

• Lots of labeled data (examples) available (e.g. historical data)

10

Basic Machine Learning ‘workflow’

Training Phase: Training data → Feature Vectors + Labels → Machine Learning Algorithm → Predictive Model

Operational Phase: New data → Feature Vectors → Predictive Model → Prediction
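The two phases of the workflow can be sketched as follows. This is a minimal illustration, assuming a nearest-neighbour classifier (an instance-based method) as the learning algorithm; the fruit data and the names `to_feature_vector`, `train` and `predict` are hypothetical and not taken from the slides.

```python
import math


def to_feature_vector(fruit):
    # Hypothetical feature extraction: (weight in grams, diameter in cm).
    return (fruit["weight_g"], fruit["diameter_cm"])


def train(feature_vectors, labels):
    """Training phase: for nearest neighbour, 'learning' just stores the examples."""
    return list(zip(feature_vectors, labels))


def predict(model, feature_vector):
    """Operational phase: return the label of the closest stored example."""
    nearest = min(model, key=lambda example: math.dist(example[0], feature_vector))
    return nearest[1]


# Training phase: training data + labels -> predictive model.
training_data = [
    {"weight_g": 150, "diameter_cm": 7.0},
    {"weight_g": 170, "diameter_cm": 7.5},
    {"weight_g": 8, "diameter_cm": 2.0},
    {"weight_g": 10, "diameter_cm": 2.2},
]
labels = ["apple", "apple", "cherry", "cherry"]
model = train([to_feature_vector(f) for f in training_data], labels)

# Operational phase: new data -> feature vector -> prediction.
new_fruit = {"weight_g": 160, "diameter_cm": 7.2}
prediction = predict(model, to_feature_vector(new_fruit))
```

Note that the same feature extraction is applied in both phases; a model trained on one representation cannot be queried with another.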

11

Training Phase in more detail

Raw data → Data preparation → Feature Vectors → Training Data + Test data

Data preparation: data cleansing, data transformation, normalization, feature extraction

Training Data → Model Building (by ML algorithm), aka ‘learning’ → Model Evaluation (on Test data) → Predictive Model

Feedback loop
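Two of the steps named here, normalization and the split into training and test data, can be sketched in plain Python. This is one possible approach under stated assumptions: min-max scaling is just one common normalization, and the 25% test fraction, the fixed seed and the function names `min_max_normalize` and `train_test_split` are illustrative choices, not something the slides prescribe.

```python
import random


def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1].

    Assumes the column contains at least two distinct values.
    """
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical raw column, e.g. ages in years.
normalized = min_max_normalize([10, 20, 30])

# Hypothetical dataset of 8 rows; 6 go to training, 2 are held out for testing.
train_rows, test_rows = train_test_split(list(range(8)))
```

The held-out rows play no part in model building; they are used only for model evaluation.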

12

CRISP-DM: data mining process

[Figure: CRISP-DM process diagram, with two phases marked ‘ML important’]

13

Examples of ML tasks

Supervised learning

• Regression: target is numeric

• Classification: target is categorical

Unsupervised learning

• Clustering

• Dimensionality reduction

14

Modeling: so many algorithms…

15

ML Algorithms: by Representation

Representation: Collection of candidate models/programs, aka hypothesis space

Decision trees

Instance-based

Neural networks

Model ensembles

ML Algorithms: by Evaluation

Evaluation: Quality measure for a model

16

Regression

Example metric: Root Mean Squared Error

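RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}

where y_i is the actual value, \hat{y}_i the predicted value, and n the number of examples.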
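Root Mean Squared Error is the square root of the average squared difference between predicted and actual values. A minimal sketch, with a hypothetical function name `rmse` and made-up numbers:

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: every prediction is off by exactly 1.
uniform_error = rmse([1, 2, 3, 4], [2, 3, 4, 5])

# Hypothetical example: errors of 1, 0 and 2 give sqrt((1 + 0 + 4) / 3).
mixed_error = rmse([3, 5, 7], [2, 5, 9])
```

Because the errors are squared before averaging, one large miss weighs more heavily than several small ones.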

Binary classification: confusion matrix

Example: medical test for a disease

Accuracy: (8 + 971) / 1000 = 97.9%

Better evaluation metrics:

• Precision: 8 / (8 + 19) ≈ 29.6%

• Recall: 8 / (8 + 2) = 80%
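The three metrics can be computed from the four cells of the confusion matrix. The sketch below assumes the counts implied by the fractions on this slide: 8 true positives, 19 false positives, 2 false negatives and 971 true negatives, 1000 patients in total. The matrix itself did not survive in the transcript, so these counts are an inference, and the function names are illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of the cases flagged positive, the share that really are positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of the truly positive cases, the share that was found."""
    return tp / (tp + fn)


# Counts inferred from the slide's fractions (an assumption, see above).
TP, FP, FN, TN = 8, 19, 2, 971

test_accuracy = accuracy(TP, FP, FN, TN)   # 0.979
test_precision = precision(TP, FP)         # about 0.296
test_recall = recall(TP, FN)               # 0.8

# Baseline that always predicts "healthy": 10 sick patients are all missed.
baseline_accuracy = accuracy(tp=0, fp=0, fn=10, tn=990)  # 0.99
baseline_recall = recall(tp=0, fn=10)                    # 0.0
```

Under these counts, a test that never reports the disease would score 99% accuracy while finding no patients at all, which is why precision and recall say more than accuracy when one class is rare.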

17

ML Algorithms: by Optimization

Optimization: how the algorithm ‘learns’, depends on representation and evaluation

Greedy Search (an example of combinatorial optimization)

Gradient Descent (or in general: Convex Optimization)

Linear Programming (or in general: Constrained/Nonlinear Optimization)
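Gradient descent can be illustrated on a very small convex problem: fitting a single weight w so that w * x approximates y, with mean squared error as the evaluation measure. This is a sketch under stated assumptions; the data, the learning rate, the step count and the function names are all hypothetical choices made for the example.

```python
def mse_gradient(w, xs, ys):
    """Derivative of mean((w * x - y)^2) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.01, steps=200, w=0.0):
    """Repeatedly move w a small step against the gradient of the error."""
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


# Hypothetical data generated by y = 2 * x, so the best weight is 2.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

learned_w = gradient_descent(xs, ys)
```

Here the representation is "a line through the origin", the evaluation is mean squared error, and the optimization is gradient descent; changing any one of the three gives a different learning algorithm.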

18

Training error vs test error
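The gap between training error and test error can be shown with a small, fully deterministic example. The sketch assumes a k-nearest-neighbour classifier on one-dimensional data; the data points (including one deliberately mislabelled training example at x = 3) and the function names are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of (x, label) pairs in data that the model gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


# True rule: label is 1 when x > 5. The point (3, 1) is label noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.5, 0), (2.9, 0), (3.2, 0), (6.5, 1), (8.5, 1)]

# k = 1 memorises the training data, noise included.
train_error_k1 = error_rate(train, train, k=1)  # 0.0
test_error_k1 = error_rate(train, test, k=1)    # 0.4

# k = 3 smooths over the noisy point.
train_error_k3 = error_rate(train, train, k=3)  # 0.125
test_error_k3 = error_rate(train, test, k=3)    # 0.0
```

The model with the lowest training error is the one that does worst on unseen data, which is why a model is judged on test data rather than on the data it was trained on.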

19

Data Science for Business

• Focuses more on general principles than specific algorithms

• Not math-heavy, but does contain some math

• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do

• Book website: http://data-science-for-biz.com/DSB/Home.html

20

Take-aways

• Goal of ML: generalize from training data (not optimization!!)

• Part of ‘Data Mining Process’, not a goal in and of itself

• No magic! Just some clever algorithms…

• Increasingly important non-technical aspects:

    • Ethics

    • Algorithmic transparency

Thank You

www.soapeople.com
info@soapeople.com
@SOAPEOPLE

Fred Verheul
Big Data Consultant
+31 6 3919 2986
fred.verheul@soapeople.com
@fredverheul
