
Page 1: Machine Learning, Saarland University, SS 2007

Machine Learning
Saarland University, SS 2007

Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Lecture 1, Friday, April 19th, 2007 (basics and example applications)

Page 2: Overview of this Lecture

Overview of this Lecture

Machine Learning Basics

– Classification

– Objects as feature vectors

– Regression

– Clustering

Example applications

– Surface reconstruction

– Preference learning

– Netflix challenge (how to earn $1,000,000)

– Text search

Page 3: Classification

Classification

Given a set of points, each labeled + or –

– learn something from them …

– … in order to predict the label of new points

[Figure: a cloud of + points and a cloud of – points, plus a new point marked ? whose label is to be predicted]

this is an instance of supervised learning
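The slide leaves the learning method open; as a minimal concrete sketch, a 1-nearest-neighbor rule predicts the label of a new point from the labeled points (the coordinates below are hypothetical):

```python
import numpy as np

# Hypothetical labeled points standing in for the slide's +/- picture.
points = np.array([[1.0, 2.0], [1.5, 1.8], [4.0, 4.2], [4.5, 3.9]])
labels = np.array(["+", "+", "-", "-"])

def predict_1nn(x):
    """Label a new point by the label of its nearest training point."""
    distances = np.linalg.norm(points - x, axis=1)
    return labels[np.argmin(distances)]

print(predict_1nn(np.array([1.2, 2.1])))  # -> +
```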

Page 4: Classification — Quality

Classification — Quality

Which classifier is better?

– the answer requires a model of where the data comes from

– and a measure of quality/accuracy

[Figure: the same labeled + and – points with a new point marked ?; different classifiers would label it differently]

Page 5: Classification — Outliers and Overfitting

Classification — Outliers and Overfitting

We have to find a balance between two extremes

– oversimplification (→ large classification error)

– overfitting (→ lack of regularity)

– again: this requires a model of the data

[Figure: the labeled points with a single + outlier among the – points; a smooth boundary misclassifies it, a contorted boundary overfits]

Page 6: Classification — Point Transformation

Classification — Point Transformation

If a classifier does not work for the original data

– try it on a transformation of the data

– typically: make points linearly separable by a suitable mapping to a higher-dimensional space

[Figure: points on a line, – near 0 and + on both sides, not linearly separable in 1D; after the mapping below, a line in 2D separates them]

map x to (x, |x|)
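A minimal sketch of this mapping (the coordinates are hypothetical, chosen to match the figure's layout):

```python
import numpy as np

# 1D points: "-" near 0, "+" further out; not linearly separable on the line.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array(["+", "+", "-", "-", "-", "+", "+"])

# Map each point x to (x, |x|): in 2D the classes become linearly separable.
X2 = np.column_stack([x, np.abs(x)])

# Any horizontal line |x| = c with 0.5 < c < 2.5 now separates the classes.
for point, label in zip(X2, y):
    print(point, label)
```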

Page 7: Classification — more labels

Classification — more labels

[Figure: three groups of points, labeled +, –, and o]

Typically:

– first, a basic technique for binary classification

– then, an extension to more labels
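The slide does not name a specific extension; one common scheme (an assumption here) is one-vs-rest, which trains one binary classifier per label and predicts the label whose classifier scores highest. A sketch, with a hypothetical centroid-distance score standing in for the binary classifier:

```python
import numpy as np

def train_binary(points, is_positive):
    """'Train' a binary scorer: distance to the centroid of the positives."""
    centroid = points[is_positive].mean(axis=0)
    return lambda x: -np.linalg.norm(x - centroid)  # higher = more positive

def train_one_vs_rest(points, labels):
    """One-vs-rest: one binary classifier per label."""
    classifiers = {lbl: train_binary(points, labels == lbl) for lbl in set(labels)}
    # Predict the label whose binary classifier scores highest.
    return lambda x: max(classifiers, key=lambda lbl: classifiers[lbl](x))

points = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [0, 5], [1, 5]], dtype=float)
labels = np.array(["+", "+", "-", "-", "o", "o"])
predict = train_one_vs_rest(points, labels)
print(predict(np.array([0.2, 0.8])))  # -> +
```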

Page 8: Objects as Feature Vectors

Objects as Feature Vectors

But why learn something about points?

General Idea:

– represent objects as points in a space of fixed dimension

– each dimension corresponds to a so-called feature of the object

Crucial:

– the selection of features

– the normalization of the vectors

Page 9: Objects as Feature Vectors

Objects as Feature Vectors

Example: Objects with attributes

– features = attribute values

– normalize by a reference value for each feature

          Person 1   Person 2   Person 3   Person 4
height    188 cm     181 cm     190 cm     172 cm
weight    75 kg      90 kg      77 kg      55 kg
age       36         33         34         24

Feature vectors (height, weight, age):

(188, 75, 36)   (181, 90, 33)   (190, 77, 34)   (172, 55, 24)

Normalized by reference values (height/180, weight/80, age/40):

(1.04, 0.94, 0.90)   (1.01, 1.13, 0.83)   (1.06, 0.96, 0.85)   (0.96, 0.69, 0.60)
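A minimal sketch of this normalization, using the reference values 180 cm, 80 kg, and 40 years from the slide:

```python
import numpy as np

# Feature vectors from the slide: (height in cm, weight in kg, age in years).
persons = np.array([
    [188, 75, 36],
    [181, 90, 33],
    [190, 77, 34],
    [172, 55, 24],
], dtype=float)

# Divide each feature by its reference value (180 cm, 80 kg, 40 years)
# so that all features live on a comparable scale around 1.0.
reference = np.array([180.0, 80.0, 40.0])
normalized = persons / reference

print(np.round(normalized, 2))  # matches the slide's table up to rounding
```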

Page 10: Objects as Feature Vectors

Objects as Feature Vectors

Example: Images

– features = pixels (with grey values)

– often fine without further normalization

Image 1 (3×3 grey values):      Image 2 (3×3 grey values):

    2 8 2                           1 6 1
    8 5 8                           6 6 6
    2 7 2                           1 6 1

Feature vectors, one dimension per pixel (1,1), (1,2), …, (3,3):

Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
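A minimal sketch of turning the 3×3 grey-value images into feature vectors by flattening them row by row:

```python
import numpy as np

# Grey-value images from the slide, as 3x3 arrays.
image1 = np.array([[2, 8, 2],
                   [8, 5, 8],
                   [2, 7, 2]])
image2 = np.array([[1, 6, 1],
                   [6, 6, 6],
                   [1, 6, 1]])

# Flatten row by row: pixel (i, j) becomes one dimension of the vector.
v1 = image1.flatten()
v2 = image2.flatten()

print(v1)  # [2 8 2 8 5 8 2 7 2]
print(v2)  # [1 6 1 6 6 6 1 6 1]
```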

Page 11: Objects as Feature Vectors

Objects as Feature Vectors

Example: Text documents

– features = words

– normalize to unit norm (next slide)

Vocabulary: Learning, Machine, SS, Statistical, Theory, 2006, 2007

Doc 1: "Machine Learning SS 2007"            → (1, 1, 1, 0, 0, 0, 1)
Doc 2: "Statistical Learning Theory SS 2007" → (1, 0, 1, 1, 1, 0, 1)
Doc 3: "Statistical Learning Theory SS 2006" → (1, 0, 1, 1, 1, 1, 0)

Page 12: Objects as Feature Vectors

Objects as Feature Vectors

Example: Text documents

– features = words

– normalize to unit norm

Vocabulary: Learning, Machine, SS, Statistical, Theory, 2006, 2007

Doc 1: "Machine Learning SS 2007"            → (0.5, 0.5, 0.5, 0, 0, 0, 0.5)
Doc 2: "Statistical Learning Theory SS 2007" → (0.45, 0, 0.45, 0.45, 0.45, 0, 0.45)
Doc 3: "Statistical Learning Theory SS 2006" → (0.45, 0, 0.45, 0.45, 0.45, 0.45, 0)

(each vector is divided by its Euclidean norm: 1/√4 = 0.5 for Doc 1, 1/√5 ≈ 0.45 for Docs 2 and 3)
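A minimal sketch of building these binary bag-of-words vectors and normalizing them to unit norm:

```python
import numpy as np

# Documents and vocabulary from the slide.
vocabulary = ["Learning", "Machine", "SS", "Statistical", "Theory", "2006", "2007"]
docs = [
    "Machine Learning SS 2007",
    "Statistical Learning Theory SS 2007",
    "Statistical Learning Theory SS 2006",
]

for doc in docs:
    words = set(doc.split())
    # Binary bag-of-words vector: 1 if the word occurs, 0 otherwise.
    v = np.array([1.0 if w in words else 0.0 for w in vocabulary])
    # Normalize to unit Euclidean norm.
    v = v / np.linalg.norm(v)
    print(np.round(v, 2))  # the three vectors from the slide
```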

Page 13: Regression

Regression

Learn a function that maps objects to values

Similar trade-off as for classification:

– risk of oversimplification vs. risk of overfitting

[Figure: points marked x in the plane, with a straight-line fit and a new position on the horizontal axis marked ?]

horizontal axis: given value (typically multi-dimensional)
vertical axis: value to learn (typically a real number)
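A minimal sketch of the trade-off, fitting polynomials of low and high degree to noisy points (the data and degrees are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy samples of an underlying linear relationship.
x = np.linspace(0, 1, 8)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=x.shape)

# Degree 1: simple model; degree 7: interpolates every point (overfitting).
simple = np.polynomial.Polynomial.fit(x, y, deg=1)
overfit = np.polynomial.Polynomial.fit(x, y, deg=7)

# The overfit polynomial matches the training points almost exactly,
# but oscillates between them; the simple one generalizes better.
x_new = 0.55
print(simple(x_new), overfit(x_new))
```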


Page 15: Clustering

Clustering

Partition a given set of points into clusters

Similar problems as for classification:

– follow the data distribution, but not too closely

– a transformation often helps (next slide)

[Figure: two separate groups of points marked x, forming two clusters]

this is an instance of unsupervised learning
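The slide does not name a clustering algorithm; a common choice (an assumption here) is k-means, sketched below:

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: alternate assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Move each center to the mean of its cluster.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign, centers

# Two well-separated groups of points.
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
assign, centers = kmeans(points, k=2)
print(assign)  # e.g. [0 0 0 1 1 1]
```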


Page 19: Clustering — Transformation

Clustering — Transformation

For clustering, dimension reduction typically helps

– whereas for classification, an embedding into a higher-dimensional space typically helps

Term-document matrix (one row per word, one column per document):

           doc1  doc2  doc3  doc4  doc5
internet     1     0     1     0     0
web          1     1     0     0     0
surfing      1     1     1     1     0
beach        0     0     0     1     1

The vectors for documents 2, 3, and 4 are pairwise equally dissimilar.

Projected to 2 dimensions:

           doc1  doc2  doc3  doc4  doc5
dim 1      0.9   0.8   0.8   0.0   0.0
dim 2     -0.1   0.0   0.0   1.1   0.9

A 2-clustering would work fine now.
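The slide does not say which projection was used; a standard choice (an assumption here) is a rank-2 SVD of the term-document matrix, which yields a similar 2D picture:

```python
import numpy as np

# Term-document matrix from the slide (rows: internet, web, surfing, beach).
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Rank-2 SVD projection: represent each document (column) by its
# coordinates along the top 2 singular directions.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = S[:2, None] * Vt[:2, :]  # 2 x 5: one column per document

# Documents 1-3 end up close together, and so do documents 4-5,
# so a 2-clustering separates them cleanly.
print(np.round(docs_2d, 1))
```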

Page 21: Application Example: Text Search

Application Example: Text Search

676 abstracts from the Max-Planck-Institut für Informatik

– for example:

We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.

– vocabulary of 3283 words (stop words like and, or, this, … removed)

– abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases

– reduce to 10 concepts

No dictionary, no training, only the plain text itself!
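The lecture does not specify the method here; a sketch of one plausible pipeline (vectorize, reduce to 10 concepts, then search by similarity) using scikit-learn's TfidfVectorizer and TruncatedSVD, with a hypothetical toy corpus in place of the 676 abstracts:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: one abstract per string.
abstracts = [
    "linear programming approaches for the Steiner problem",
    "graph transformation and local cuts for benchmark instances",
    "query processing in relational database systems",
]

# Words like "and", "or", "this" are removed via a stop word list.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

# Reduce the word space to 10 concepts (fewer if the toy corpus is
# too small); this is the dimension-reduction step of the slide.
n_concepts = min(10, X.shape[0] - 1)
svd = TruncatedSVD(n_components=n_concepts)
X_concepts = svd.fit_transform(X)

# Search: embed the query the same way and rank abstracts by similarity.
query = svd.transform(vectorizer.transform(["steiner problem cuts"]))
scores = cosine_similarity(query, X_concepts).ravel()
print(np.argsort(-scores))  # abstracts ranked by relevance
```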