
Upload: shantell-allen

Post on 03-Jan-2016


Diagnosis of multiple cancer types by shrunken centroids of gene expression

Course: 550.635 Topics in Bioinformatics
Presenter: Ting Yang
Teacher: Professor Geman

By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu


Nearest Centroid Classification

Example: small round blue cell tumors of childhood

• 63 training samples, 25 testing samples

• 4 classes: BL, EWS, NB, RMS

• Figure 1

• Nearest centroid classification

• Disadvantage
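The plain nearest centroid rule can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names and the toy expression values are made up, and squared Euclidean distance is assumed as the measure of closeness.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """X: (n_samples, n_genes) expression matrix; y: class label per sample."""
    classes = np.unique(y)
    # One centroid per class: the mean expression of each gene in that class.
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def nearest_centroid_predict(X_new, classes, centroids):
    # Squared Euclidean distance from every new sample to every centroid.
    d2 = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Toy data (made up): 2 classes, 3 genes.
X = np.array([[1.0, 0.0, 5.0], [1.2, 0.1, 4.8],
              [3.0, 2.0, 5.1], [3.2, 2.1, 4.9]])
y = np.array(["EWS", "EWS", "NB", "NB"])
classes, centroids = nearest_centroid_fit(X, y)
pred = nearest_centroid_predict(np.array([[1.1, 0.0, 5.0], [3.1, 2.2, 5.0]]),
                                classes, centroids)
```

Every gene contributes to the distance here, including genes that carry no class information.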


Nearest Shrunken Centroids

• A modification of the nearest centroid method

• Idea: first normalize the class centroids by the within-class standard deviation of each gene, then shrink each class centroid toward the overall centroid.


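Details:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k (s_i + s_0)}, \qquad m_k = \sqrt{\frac{1}{n_k} + \frac{1}{n}} $$

• $\bar{x}_{ik}$: mean expression value in class k for gene i

• $\bar{x}_i$: ith component of the overall centroid

• $s_i$: pooled within-class standard deviation for gene i

• $d_{ik}$: t statistic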


t statistic:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k (s_i + s_0)} $$

• It compares the mean expression of gene i in class k with the mean expression of gene i over all classes combined.

• Idea: a gene that discriminates one class from the rest will have a statistic of large absolute value.
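As a rough sketch, the t statistics can be computed for all genes and classes at once. Several details here are assumptions rather than something stated on the slides: the function name, the toy data, setting s_0 to the median of the s_i, the divisor n - K in the pooled variance, and m_k = sqrt(1/n_k + 1/n).

```python
import numpy as np

def centroid_t_statistics(X, y, s0=None):
    """Return the K x p matrix d of t statistics and the pieces used to build it.

    X: (n_samples, n_genes) expression matrix; y: class label per sample.
    """
    classes, inv, counts = np.unique(y, return_inverse=True, return_counts=True)
    n, K = X.shape[0], len(classes)
    overall = X.mean(axis=0)  # overall centroid, one value per gene
    centroids = np.vstack([X[inv == k].mean(axis=0) for k in range(K)])
    # Pooled within-class standard deviation s_i (assumed divisor: n - K).
    s = np.sqrt(((X - centroids[inv]) ** 2).sum(axis=0) / (n - K))
    if s0 is None:
        s0 = np.median(s)  # assumption: s_0 is the median of the s_i
    m = np.sqrt(1.0 / counts + 1.0 / n)  # assumption: m_k = sqrt(1/n_k + 1/n)
    d = (centroids - overall) / (m[:, None] * (s + s0)[None, :])
    return d, classes, overall, s, s0, m

# Toy data (made up): gene 0 separates the classes, gene 1 does not.
X = np.array([[0.0, 1.0], [0.2, 1.1], [4.0, 0.9], [4.2, 1.0]])
y = np.array(["A", "A", "B", "B"])
d, classes, overall, s, s0, m = centroid_t_statistics(X, y)
```

In this toy example the discriminating gene 0 gets a statistic of much larger absolute value than gene 1 in both classes.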


• Shrink each t statistic toward zero to eliminate the genes that do not provide sufficient information to discriminate the classes.

• ‘De-noising’ step

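$$ d'_{ik} = \operatorname{sign}(d_{ik}) \, (|d_{ik}| - \Delta)_+ $$

Here $\Delta$ is the shrinkage amount and $(t)_+ = \max(t, 0)$.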
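The shrinkage step is a soft threshold, and a minimal sketch of it follows. The function names and the toy statistics are made up for illustration. The second function assumes the shrunken centroid is recovered by inverting the t statistic, that is, shrunken centroid = overall centroid + m_k (s_i + s_0) d'_ik.

```python
import numpy as np

def soft_threshold(d, delta):
    """Move each value toward zero by delta; values within delta of zero become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def shrunken_centroids(overall, m, s, s0, d_shrunk):
    """Assumed inversion of the t statistic: overall + m_k * (s_i + s_0) * d'_ik."""
    return overall[None, :] + m[:, None] * (s + s0)[None, :] * d_shrunk

# Made-up t statistics for 2 classes (rows) and 3 genes (columns).
d = np.array([[-9.3, 0.3, 2.5],
              [9.3, -0.3, -1.0]])
d_shrunk = soft_threshold(d, delta=2.0)
# A gene stays active if its shrunken statistic is nonzero in at least one class.
active = np.any(d_shrunk != 0, axis=0)
```

With a shrinkage amount of 2.0, the middle gene is zeroed in both classes and drops out, while the other two genes remain active.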


Choosing the amount of shrinkage

• The shrinkage amount is allowed to vary over a wide range.

• 10-fold cross-validation (choose the shrinkage amount with the smallest error rate):

• Divide the samples at random into 10 parts of roughly equal size, with the classes distributed proportionally among the 10 parts.

• Fit the model on 90% of the samples, then predict the class labels of the remaining 10% (the test samples).

• Repeat 10 times, once for each part, and add the errors together to get the overall error.

• Figure 2

• Figure 1
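The cross-validation procedure above can be sketched as follows. This is an illustrative, self-contained version and not the authors' software: all function names, the synthetic data, and the grid of shrinkage amounts are hypothetical, and the fit assumes s_0 = median(s_i), m_k = sqrt(1/n_k + 1/n), and class priors estimated from the class frequencies.

```python
import numpy as np

def fit_nsc(X, y, delta):
    """Fit shrunken centroids for one shrinkage amount delta."""
    classes, inv, counts = np.unique(y, return_inverse=True, return_counts=True)
    n, K = X.shape[0], len(classes)
    overall = X.mean(axis=0)
    cents = np.vstack([X[inv == k].mean(axis=0) for k in range(K)])
    s = np.sqrt(((X - cents[inv]) ** 2).sum(axis=0) / (n - K))
    sd = s + np.median(s)                       # s_i + s_0 (assumed s_0)
    scale = np.sqrt(1.0 / counts + 1.0 / n)[:, None] * sd[None, :]
    d = (cents - overall) / scale
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    return classes, overall + scale * d_shrunk, sd, np.log(counts / n)

def predict_nsc(model, X_new):
    """Assign each sample to the class with the smallest score."""
    classes, cents, sd, log_prior = model
    z = (X_new[:, None, :] - cents[None, :, :]) / sd
    score = (z ** 2).sum(axis=2) - 2.0 * log_prior[None, :]
    return classes[score.argmin(axis=1)]

def stratified_folds(y, n_folds, rng):
    """Random fold index per sample, spreading each class across the folds."""
    folds = np.empty(len(y), dtype=int)
    for k in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == k))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

def cv_errors(X, y, deltas, n_folds=10, seed=0):
    """Total number of held-out misclassifications for each delta."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, n_folds, rng)
    errors = np.zeros(len(deltas), dtype=int)
    for f in range(n_folds):
        test = folds == f
        for j, delta in enumerate(deltas):
            model = fit_nsc(X[~test], y[~test], delta)
            errors[j] += np.sum(predict_nsc(model, X[test]) != y[test])
    return errors

# Synthetic data (made up): 3 classes x 20 samples, 50 genes,
# with 5 informative genes per class shifted by 3 units.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 50))
for k in range(3):
    X[y == k, 5 * k:5 * k + 5] += 3.0
deltas = [0.0, 1.0, 2.0, 50.0]
errors = cv_errors(X, y, deltas)
best_delta = deltas[int(np.argmin(errors))]
```

A very large shrinkage amount collapses every class centroid onto the overall centroid, so its error is high. In practice the grid would be much finer than the four values used here.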


More Figures

• Figure 3

• Figure 4


Classification

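• A new sample is classified by comparing its expression profile with each shrunken centroid over the 43 active genes, that is, the genes that survive the shrinkage.

• Distance function: a standardized squared distance to each shrunken centroid, with the class prior probabilities included.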
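A small sketch of such a distance function follows. It assumes the score for class k is the sum over genes of the squared difference between the sample and the shrunken centroid, divided by (s_i + s_0)^2, minus 2 log(prior_k), and that the sample goes to the class with the smallest score. The function name and the numbers are made up.

```python
import numpy as np

def discriminant_scores(x_new, shrunken_centroids, s, s0, priors):
    """One score per class for a single sample; smaller means closer."""
    z = (x_new[None, :] - shrunken_centroids) / (s + s0)
    return (z ** 2).sum(axis=1) - 2.0 * np.log(priors)

# Toy setup (made up): 2 classes, 2 genes, unit standard deviations.
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
s, s0 = np.array([1.0, 1.0]), 0.0
x_new = np.array([0.9, 0.0])

# With equal priors the sample goes to the nearer centroid (class 0).
equal = discriminant_scores(x_new, centroids, s, s0, np.array([0.5, 0.5]))
# A strong prior for class 1 can override a small difference in distance.
skewed = discriminant_scores(x_new, centroids, s, s0, np.array([0.1, 0.9]))
```

A gene whose shrunken centroid equals the overall centroid in every class adds the same amount to every class score, so only the active genes affect which class wins.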


Statistical details:

• t-statistic:

$$ d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k (s_i + s_0)} $$

• Estimates of the class probabilities (Figure 5)
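One way to turn per-class scores into class probability estimates is sketched below. It assumes the probability of class k is proportional to exp(-score_k / 2), by analogy with Gaussian linear discriminant analysis. The function name and the scores are hypothetical.

```python
import numpy as np

def class_probabilities(scores):
    """Convert per-class scores (smaller = closer) into probabilities."""
    a = -0.5 * np.asarray(scores, dtype=float)
    a -= a.max()              # subtract the maximum for numerical stability
    w = np.exp(a)
    return w / w.sum()

# Made-up scores for the 4 classes of one sample; the first class is closest.
probs = class_probabilities([3.0, 25.0, 40.0, 31.0])
tied = class_probabilities([5.0, 5.0])
```

Subtracting the maximum before exponentiating does not change the result, but it avoids underflow when all scores are large.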
