
Page 1: Machine Learning, Saarland University, SS 2007

Machine Learning
Saarland University, SS 2007

Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Lecture 1, Friday, April 19th, 2007 (basics and example applications)

Page 2: Overview of this Lecture

Overview of this Lecture

Machine Learning Basics

– Classification

– Objects as feature vectors

– Regression

– Clustering

Example applications

– Surface reconstruction

– Preference learning

– Netflix challenge (how to earn $1,000,000)

– Text search

Page 3: Classification

Classification

Given a set of points, each labeled + or –

– learn something from them …

– … in order to predict the label of new points

[Figure: a cloud of + points and a cloud of – points, plus a new point marked ? whose label is to be predicted]

this is an instance of supervised learning
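The slide leaves the learning method open; as a minimal concrete sketch, a 1-nearest-neighbor rule predicts the label of a new point from the labeled points (the coordinates below are hypothetical):

```python
import numpy as np

# Hypothetical labeled points standing in for the slide's +/- picture.
points = np.array([[1.0, 2.0], [1.5, 1.8], [4.0, 4.2], [4.5, 3.9]])
labels = np.array(["+", "+", "-", "-"])

def predict_1nn(x):
    """Label a new point by the label of its nearest training point."""
    distances = np.linalg.norm(points - x, axis=1)
    return labels[np.argmin(distances)]

print(predict_1nn(np.array([1.2, 2.1])))  # -> +
```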

Page 4: Classification — Quality

Classification — Quality

Which classifier is better?

– the answer requires a model of where the data comes from

– and a measure of quality/accuracy

[Figure: the same labeled + and – points with a new point marked ?; different classifiers would label it differently]

Page 5: Classification — Outliers and Overfitting

Classification — Outliers and Overfitting

We have to find a balance between two extremes

– oversimplification (→ large classification error)

– overfitting (→ lack of regularity)

– again: this requires a model of the data

[Figure: the labeled points with a single + outlier among the – points; a smooth boundary misclassifies it, a contorted boundary overfits]

Page 6: Classification — Point Transformation

Classification — Point Transformation

If a classifier does not work for the original data

– try it on a transformation of the data

– typically: make points linearly separable by a suitable mapping to a higher-dimensional space

[Figure: points on a line, – near 0 and + on both sides, not linearly separable in 1D; after the mapping below, a line in 2D separates them]

map x to (x, |x|)
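A minimal sketch of this mapping (the coordinates are hypothetical, chosen to match the figure's layout):

```python
import numpy as np

# 1D points: "-" near 0, "+" further out; not linearly separable on the line.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array(["+", "+", "-", "-", "-", "+", "+"])

# Map each point x to (x, |x|): in 2D the classes become linearly separable.
X2 = np.column_stack([x, np.abs(x)])

# Any horizontal line |x| = c with 0.5 < c < 2.5 now separates the classes.
for point, label in zip(X2, y):
    print(point, label)
```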

Page 7: Classification — more labels

Classification — more labels

[Figure: three groups of points, labeled +, –, and o]

Typically:

– first, a basic technique for binary classification

– then, an extension to more labels
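The slide does not name a specific extension; one common scheme (an assumption here) is one-vs-rest, which trains one binary classifier per label and predicts the label whose classifier scores highest. A sketch, with a hypothetical centroid-distance score standing in for the binary classifier:

```python
import numpy as np

def train_binary(points, is_positive):
    """'Train' a binary scorer: distance to the centroid of the positives."""
    centroid = points[is_positive].mean(axis=0)
    return lambda x: -np.linalg.norm(x - centroid)  # higher = more positive

def train_one_vs_rest(points, labels):
    """One-vs-rest: one binary classifier per label."""
    classifiers = {lbl: train_binary(points, labels == lbl) for lbl in set(labels)}
    # Predict the label whose binary classifier scores highest.
    return lambda x: max(classifiers, key=lambda lbl: classifiers[lbl](x))

points = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [0, 5], [1, 5]], dtype=float)
labels = np.array(["+", "+", "-", "-", "o", "o"])
predict = train_one_vs_rest(points, labels)
print(predict(np.array([0.2, 0.8])))  # -> +
```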

Page 8: Objects as Feature Vectors

Objects as Feature Vectors

But why learn something about points?

General Idea:

– represent objects as points in a space of fixed dimension

– each dimension corresponds to a so-called feature of the object

Crucial:

– the selection of features

– the normalization of the vectors

Page 9: Objects as Feature Vectors

Objects as Feature Vectors

Example: Objects with attributes

– features = attribute values

– normalize by a reference value for each feature

          Person 1   Person 2   Person 3   Person 4
height    188 cm     181 cm     190 cm     172 cm
weight    75 kg      90 kg      77 kg      55 kg
age       36         33         34         24

Feature vectors (height, weight, age):

(188, 75, 36)   (181, 90, 33)   (190, 77, 34)   (172, 55, 24)

Normalized by reference values (height/180, weight/80, age/40):

(1.04, 0.94, 0.90)   (1.01, 1.13, 0.83)   (1.06, 0.96, 0.85)   (0.96, 0.69, 0.60)
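A minimal sketch of this normalization, using the reference values 180 cm, 80 kg, and 40 years from the slide:

```python
import numpy as np

# Feature vectors from the slide: (height in cm, weight in kg, age in years).
persons = np.array([
    [188, 75, 36],
    [181, 90, 33],
    [190, 77, 34],
    [172, 55, 24],
], dtype=float)

# Divide each feature by its reference value (180 cm, 80 kg, 40 years)
# so that all features live on a comparable scale around 1.0.
reference = np.array([180.0, 80.0, 40.0])
normalized = persons / reference

print(np.round(normalized, 2))  # matches the slide's table up to rounding
```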

Page 10: Objects as Feature Vectors

Objects as Feature Vectors

Example: Images

– features = pixels (with grey values)

– often fine without further normalization

Image 1 (3×3 grey values):      Image 2 (3×3 grey values):

    2 8 2                           1 6 1
    8 5 8                           6 6 6
    2 7 2                           1 6 1

Feature vectors, one dimension per pixel (1,1), (1,2), …, (3,3):

Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
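A minimal sketch of turning the 3×3 grey-value images into feature vectors by flattening them row by row:

```python
import numpy as np

# Grey-value images from the slide, as 3x3 arrays.
image1 = np.array([[2, 8, 2],
                   [8, 5, 8],
                   [2, 7, 2]])
image2 = np.array([[1, 6, 1],
                   [6, 6, 6],
                   [1, 6, 1]])

# Flatten row by row: pixel (i, j) becomes one dimension of the vector.
v1 = image1.flatten()
v2 = image2.flatten()

print(v1)  # [2 8 2 8 5 8 2 7 2]
print(v2)  # [1 6 1 6 6 6 1 6 1]
```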

Page 11: Objects as Feature Vectors

Objects as Feature Vectors

Example: Text documents

– features = words

– normalize to unit norm (next slide)

Vocabulary: Learning, Machine, SS, Statistical, Theory, 2006, 2007

Doc 1: "Machine Learning SS 2007"            → (1, 1, 1, 0, 0, 0, 1)
Doc 2: "Statistical Learning Theory SS 2007" → (1, 0, 1, 1, 1, 0, 1)
Doc 3: "Statistical Learning Theory SS 2006" → (1, 0, 1, 1, 1, 1, 0)

Page 12: Objects as Feature Vectors

Objects as Feature Vectors

Example: Text documents

– features = words

– normalize to unit norm

Vocabulary: Learning, Machine, SS, Statistical, Theory, 2006, 2007

Doc 1: "Machine Learning SS 2007"            → (0.5, 0.5, 0.5, 0, 0, 0, 0.5)
Doc 2: "Statistical Learning Theory SS 2007" → (0.45, 0, 0.45, 0.45, 0.45, 0, 0.45)
Doc 3: "Statistical Learning Theory SS 2006" → (0.45, 0, 0.45, 0.45, 0.45, 0.45, 0)

(each vector is divided by its Euclidean norm: 1/√4 = 0.5 for Doc 1, 1/√5 ≈ 0.45 for Docs 2 and 3)
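A minimal sketch of building these binary bag-of-words vectors and normalizing them to unit norm:

```python
import numpy as np

# Documents and vocabulary from the slide.
vocabulary = ["Learning", "Machine", "SS", "Statistical", "Theory", "2006", "2007"]
docs = [
    "Machine Learning SS 2007",
    "Statistical Learning Theory SS 2007",
    "Statistical Learning Theory SS 2006",
]

for doc in docs:
    words = set(doc.split())
    # Binary bag-of-words vector: 1 if the word occurs, 0 otherwise.
    v = np.array([1.0 if w in words else 0.0 for w in vocabulary])
    # Normalize to unit Euclidean norm.
    v = v / np.linalg.norm(v)
    print(np.round(v, 2))  # the three vectors from the slide
```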

Page 13: Regression

Regression

Learn a function that maps objects to values

Similar trade-off as for classification:

– risk of oversimplification vs. risk of overfitting

[Figure: points marked x in the plane, with a straight-line fit and a new position on the horizontal axis marked ?]

horizontal axis: given value (typically multi-dimensional)
vertical axis: value to learn (typically a real number)
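A minimal sketch of the trade-off, fitting polynomials of low and high degree to noisy points (the data and degrees are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy samples of an underlying linear relationship.
x = np.linspace(0, 1, 8)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=x.shape)

# Degree 1: simple model; degree 7: interpolates every point (overfitting).
simple = np.polynomial.Polynomial.fit(x, y, deg=1)
overfit = np.polynomial.Polynomial.fit(x, y, deg=7)

# The overfit polynomial matches the training points almost exactly,
# but oscillates between them; the simple one generalizes better.
x_new = 0.55
print(simple(x_new), overfit(x_new))
```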


Page 15: Clustering

Clustering

Partition a given set of points into clusters

Similar problems as for classification:

– follow the data distribution, but not too closely

– a transformation often helps (next slide)

[Figure: two separate groups of points marked x, forming two clusters]

this is an instance of unsupervised learning
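The slide does not name a clustering algorithm; a common choice (an assumption here) is k-means, sketched below:

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: alternate assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Move each center to the mean of its cluster.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign, centers

# Two well-separated groups of points.
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
assign, centers = kmeans(points, k=2)
print(assign)  # e.g. [0 0 0 1 1 1]
```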


Page 19: Clustering — Transformation

Clustering — Transformation

For clustering, dimension reduction typically helps

– whereas for classification, an embedding into a higher-dimensional space typically helps

Term-document matrix (one row per word, one column per document):

           doc1  doc2  doc3  doc4  doc5
internet     1     0     1     0     0
web          1     1     0     0     0
surfing      1     1     1     1     0
beach        0     0     0     1     1

The vectors for documents 2, 3, and 4 are pairwise equally dissimilar.

Projected to 2 dimensions:

           doc1  doc2  doc3  doc4  doc5
dim 1      0.9   0.8   0.8   0.0   0.0
dim 2     -0.1   0.0   0.0   1.1   0.9

A 2-clustering would work fine now.
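The slide does not say which projection was used; a standard choice (an assumption here) is a rank-2 SVD of the term-document matrix, which yields a similar 2D picture:

```python
import numpy as np

# Term-document matrix from the slide (rows: internet, web, surfing, beach).
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Rank-2 SVD projection: represent each document (column) by its
# coordinates along the top 2 singular directions.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = S[:2, None] * Vt[:2, :]  # 2 x 5: one column per document

# Documents 1-3 end up close together, and so do documents 4-5,
# so a 2-clustering separates them cleanly.
print(np.round(docs_2d, 1))
```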

Page 21: Application Example: Text Search

Application Example: Text Search

676 abstracts from the Max-Planck-Institut für Informatik

– for example:

We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.

– vocabulary of 3283 words (stop words like and, or, this, … removed)

– abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases

– reduce to 10 concepts

No dictionary, no training, only the plain text itself!
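The lecture does not specify the method here; a sketch of one plausible pipeline (vectorize, reduce to 10 concepts, then search by similarity) using scikit-learn's TfidfVectorizer and TruncatedSVD, with a hypothetical toy corpus in place of the 676 abstracts:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: one abstract per string.
abstracts = [
    "linear programming approaches for the Steiner problem",
    "graph transformation and local cuts for benchmark instances",
    "query processing in relational database systems",
]

# Words like "and", "or", "this" are removed via a stop word list.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

# Reduce the word space to 10 concepts (fewer if the toy corpus is
# too small); this is the dimension-reduction step of the slide.
n_concepts = min(10, X.shape[0] - 1)
svd = TruncatedSVD(n_components=n_concepts)
X_concepts = svd.fit_transform(X)

# Search: embed the query the same way and rank abstracts by similarity.
query = svd.transform(vectorizer.transform(["steiner problem cuts"]))
scores = cosine_similarity(query, X_concepts).ravel()
print(np.argsort(-scores))  # abstracts ranked by relevance
```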