University of São Paulo, Institute of Mathematics and Statistics, Computer Science Department. Introduction to Pattern Recognition: A Bioinformatics Viewpoint. Roberto Marcondes Cesar Junior (IME-USP), http://www.ime.usp.br/~cesar/ [email protected]

Post on 13-Dec-2015


Page 1:

University of São Paulo

Institute of Mathematics and Statistics

Computer Science Department

Introduction to Pattern Recognition

A Bioinformatics Viewpoint

Roberto Marcondes Cesar Junior (IME-USP)

http://www.ime.usp.br/~cesar/

[email protected]

Page 2: Organization

Introduction

Case Study

Generalizing the Concepts

Concluding Remarks

Page 3: Introduction

Pattern Recognition

To recognize is to classify.

To classify an object is to label the object.

An object is anything we want to recognize.

Applications: computer vision, speech recognition, bioinformatics, ...

Page 4: Case Study

We are interested in studying some disease, which we will call disease X.

Hypothesis: There are several different types of disease X, which will be called A, B, C, ...

Question: What is the expression behaviour of a given set of genes g1, g2, ..., gn with respect to A, B, C, ...?

Page 5: Case Study

First step: gathering some sick people

[Figure: six cases, labeled C1, C2, C3, C4, C5, C6]

Page 6: Case Study

Second step: Each case will be analyzed based on the gene expression with respect to g1, g2, ..., gn.

Therefore, we have to measure the expression of the genes of interest for each case C1, C2, ..., C6.

Example: microarrays

Page 7: Case Study

[Figure: the six cases, labeled 1 to 6]

Page 8: Case Study

[Figure: matrix of expression values measured for one case; its rows include (..., 1, 7.0, 2) and (..., 0, 9, 20)]

Page 9: Case Study

[Figure: one expression matrix per case, labeled M1, M2, M3, M4, M5, M6]

Page 10: Case Study

Expression vector: stacking the array lines

[Figure: the matrix rows (..., 1, 7.0, 2) and (..., 0, 9, 20) stacked into the single vector (..., 1, 7.0, 2, ..., 0, 9, 20)]
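The stacking step can be sketched in a few lines of Python. This is only an illustration: the function name `stack_rows` and the small 2x3 array are assumptions made here, and the array merely borrows the entries that remain visible on the slide (the elided "..." parts are left out).

```python
def stack_rows(matrix):
    """Flatten a 2-D list row by row into one expression vector."""
    return [value for row in matrix for value in row]


# Hypothetical 2x3 array of expression values (not the full slide data).
array = [
    [1.0, 7.0, 2.0],
    [0.0, 9.0, 20.0],
]

vector = stack_rows(array)
print(vector)  # [1.0, 7.0, 2.0, 0.0, 9.0, 20.0]
```

Each position in the resulting vector still refers to one fixed gene, provided every array is stacked in the same row order.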

Page 11: Case Study

[Figure: one expression vector per case; the first two are (..., 1, 7.0, 2, ..., 0, 9, 20) and (..., 4.0, 1, 20, ..., 1, 5, 30)]

Page 12: Case Study

In brief:

Each case C1, ..., C6 is represented by a vector v1, v2, ..., v6.

Each coordinate of an expression vector corresponds to the expression of a given gene gi.

Page 13: Case Study

Some PR terminology:

[Figure: the expression vector (..., 1, 7.0, 2, ..., 0, 9, 20), with one coordinate labeled "Feature" and the whole vector labeled "Feature Vector"]

Page 14: Case Study

Training Set

Sample

Page 15: Case Study

Let's simplify things: we are only interested in two genes, g1 and g2.

Feature vectors (g1, g2):

v1 = (15, 13), v2 = (12, 16), v3 = (17, 15), v4 = (1, 4), v5 = (5, 2), v6 = (4, 1)

Page 16: Case Study

[Figure: feature space with axes g1 and g2, both from 0 to 20, showing the samples v1, ..., v6 grouped into two classes, Type A and Type B]

Feature Space

Classes

Page 17: Case Study: the classifier

Input: Training set with unlabelled samples

[Figure: the samples plotted in the feature space, both axes from 0 to 20]

Output: Classes of the feature space

[Figure: the same feature space divided into classes]

Unsupervised classifier: clustering algorithm

Page 18: Case Study: Linkage Algorithm

[Figure: the samples plotted in the feature space, both axes from 0 to 20]

Page 19: Case Study: Linkage Algorithm

[Figure: the samples in the feature space and the corresponding dendrogram; the labels appear in the order v2, v1, v3, v4, v6, v5]

Dendrogram
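A linkage algorithm of this kind can be sketched as follows. The slides do not say which linkage criterion or distance is used, so this sketch assumes single linkage with Euclidean distance; the function name `single_linkage` is hypothetical, and the coordinates are the six two-gene feature vectors as read from the earlier slide (the order of g1 and g2 within each pair is an assumption, but it does not change any distance).

```python
import math
from itertools import combinations


def cluster_distance(points, a, b):
    """Single-linkage distance: closest pair of members across a and b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def single_linkage(points):
    """Agglomerative clustering; returns the list of merges.

    points maps a label to a coordinate tuple. Each merge is
    (cluster_a, cluster_b, distance), with clusters as tuples of labels.
    """
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are closest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, pair[0], pair[1]),
        )
        merges.append((a, b, cluster_distance(points, a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Assumed reading of the six feature vectors from the two-gene slide.
samples = {
    "v1": (15, 13), "v2": (12, 16), "v3": (17, 15),
    "v4": (1, 4), "v5": (5, 2), "v6": (4, 1),
}

for a, b, d in single_linkage(samples):
    print(a, "+", b, f"at distance {d:.2f}")
```

With these six points the two closest samples, v5 and v6, merge first, and the final merge joins {v1, v2, v3} with {v4, v5, v6}. The merge distances are the heights at which a dendrogram would draw each junction.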

Page 20: Case Study: Visualization

Intermezzo: vectors as signals

[Figure: an expression vector (..., 0, 9, 20) drawn as a signal, with the horizontal axis from 0 to 18 and the vertical axis from 0 to 20]

Page 21: Case Study: Visualization

Intermezzo: signals as images

[Figure: the same signal, horizontal axis from 0 to 18 and vertical axis from 0 to 20, shown as an image]

Page 22: Generalizing the concepts

Putting it all together: data mining

Page 23: Concluding remarks

Supervised classification

Which classifier should be used?

Be careful: clustering algorithms always find clusters!

Normalization issues
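To illustrate the normalization issue: when one gene is measured on a much larger scale than another, it dominates any distance-based clustering. One common remedy, shown here only as a sketch, is to standardize each feature to zero mean and unit variance (z-scores). The function name `zscore_columns` and the data are hypothetical, and the slides do not prescribe this particular normalization.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize every feature (column) to zero mean, unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        tuple(
            (x - mu) / sigma if sigma > 0 else 0.0  # constant column -> 0
            for x, mu, sigma in zip(row, mus, sigmas)
        )
        for row in rows
    ]


# Hypothetical data: the second gene is on a roughly 1000x larger scale.
raw = [(1.0, 2000.0), (2.0, 1000.0), (3.0, 3000.0)]
scaled = zscore_columns(raw)

for row in scaled:
    print(tuple(round(x, 3) for x in row))
```

After scaling, both genes contribute on a comparable scale to the distances a clustering algorithm computes.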

Page 24: Concluding remarks

A key problem: which genes should be used?

Or: which features should be selected?

Well-known problem in PR: Dimensionality Reduction
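As a very simple illustration of choosing features, the sketch below ranks genes by the variance of their expression across cases and keeps the k most variable ones. This variance filter is just one basic heuristic picked for the example; it is not the method of the slides, and the function name `top_variance_features` and the data are hypothetical.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the indices of the k columns with the largest variance."""
    columns = list(zip(*rows))
    ranked = sorted(
        range(len(columns)),
        key=lambda j: pvariance(columns[j]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical expression values: 3 cases (rows) by 3 genes (columns).
# The middle gene is constant, so it cannot separate the cases.
cases = [
    (1.0, 5.0, 10.0),
    (2.0, 5.0, 30.0),
    (3.0, 5.0, 20.0),
]

print(top_variance_features(cases, 2))  # [0, 2]
```

A filter like this ignores class labels and interactions between genes, so it is only a starting point for the feature selection problem raised above.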

Page 25: Concluding remarks

[Figure: plot with labels Y1 and Y2]

Page 26: Concluding remarks

Feature space 1

Page 27: Concluding remarks

Feature space 2

Page 28: Concluding remarks

Feature space 3