A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns

M.A.Sc. Candidate: Qianren (Tim) Xu
Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama
2
Highlight

• STFS can be used generally for any supervised pattern recognition problem
• Very good performance has been obtained on several benchmark datasets, especially with a large number of features
Significance Test-Based Feature Selection (STFS):
Proteomic Pattern Analysis for Prostate Cancer Detection
STFS → Neural Networks → ROC analysis
• Sensitivity 97.1%, Specificity 96.8%
• Suggestion of mistaken label by prostatic biopsy
3
Outline of Part I
Significance Test-Based Feature Selection (STFS) on Supervised Pattern Recognition
• Introduction
• Methodology
• Experiment Results on Benchmark Datasets
• Comparison with MIFS
4
Introduction

Problems with features:

• Large number
• Irrelevance
• Noise
• Correlation

These increase computational complexity and reduce the recognition rate.
5
Mutual Information Feature Selection

• One of the most important heuristic feature selection methods; it can be very useful in any classification system.

• But estimation of the mutual information is difficult:

    I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy

  • Large number of features and large number of classes
  • Continuous data
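One concrete way to see the difficulty: mutual information between continuous variables is usually estimated by discretizing the data, which makes the estimate sensitive to the binning and to the sample size. A minimal histogram-based sketch (NumPy assumed; the `bins` value is an illustrative choice, not from the thesis):

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based estimate of I(X;Y) in nats.

    Discretizes two continuous variables and applies the discrete
    MI formula; the result depends on the bin count, which is one
    reason MI estimation is hard in practice.
    """
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                      # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
print(mutual_information(x, x + 0.1 * rng.normal(size=5000)))  # strongly dependent: large
print(mutual_information(x, rng.normal(size=5000)))            # independent: near zero
```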
6
Problems with Feature Selection Methods

Two key issues:

• Computational complexity
• Suboptimality
7
Proposed Method

Criterion of feature selection:

    Significance of feature = Significant difference × Independence

where significant difference is the pattern separability on individual candidate features, and independence is the noncorrelation between the candidate feature and the already-selected features.
8
Measurement of Pattern Separability of Individual Features

Statistical significant difference, chosen by data type and number of classes:

  Continuous data, normal distribution:
    two classes → t-test
    more than two classes → ANOVA
  Continuous data, non-normal distribution, or rank data:
    two classes → Mann-Whitney test
    more than two classes → Kruskal-Wallis test
  Categorical data:
    Chi-square test
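The table above can be wrapped in a small dispatcher. A sketch using SciPy's standard implementations of these tests (the function name `significant_difference` and the `data_type` argument are illustrative, not from the thesis):

```python
from scipy import stats

def significant_difference(groups, data_type="normal"):
    """Pick the significance test from the table above.

    groups: for the continuous/rank cases, a list of per-class samples
    of one feature; for the categorical case, a contingency table.
    Returns the p-value (smaller = more separable).
    """
    if data_type == "normal":                       # continuous, normal
        if len(groups) == 2:
            stat, p = stats.ttest_ind(*groups)      # t-test
        else:
            stat, p = stats.f_oneway(*groups)       # ANOVA
    elif data_type == "rank":                       # non-normal or rank data
        if len(groups) == 2:
            stat, p = stats.mannwhitneyu(*groups)   # Mann-Whitney
        else:
            stat, p = stats.kruskal(*groups)        # Kruskal-Wallis
    else:                                           # categorical data
        stat, p, dof, expected = stats.chi2_contingency(groups)
    return p
```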
9
Independence

Independence measure, chosen by data type:

  Continuous data, normal distribution:           Pearson correlation
  Continuous data, non-normal, or rank data:      Spearman rank correlation
  Categorical data:                               Pearson contingency coefficient

    independence = 1 − correlation²
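A small sketch of this measure, assuming the reconstructed formula independence = 1 − correlation² and SciPy's correlation functions (the function name and `data_type` argument are illustrative):

```python
from scipy import stats

def independence(f_candidate, f_selected, data_type="normal"):
    """independence = 1 - correlation^2 between a candidate feature
    and an already-selected feature, with the correlation coefficient
    chosen per the table above."""
    if data_type == "normal":
        r, _ = stats.pearsonr(f_candidate, f_selected)   # continuous, normal
    else:
        r, _ = stats.spearmanr(f_candidate, f_selected)  # non-normal or rank
    return 1.0 - r ** 2
```

A candidate that duplicates a selected feature scores 0 and is effectively suppressed; an uncorrelated candidate scores near 1 and keeps its full significance.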
10
Selecting Procedure
MSDI: Maximum Significant Differenceand Independence Algorithm
MIC: Monotonically Increasing Curve Strategy
11
Maximum Significant Difference and Independence (MSDI) Algorithm

1. Compute the significant difference (sd) of every initial feature
2. Select the feature with maximum sd as the first feature
3. Compute the independence level (ind) between every candidate feature and the already-selected feature(s)
4. Select the feature with maximum feature significance (sf = sd × ind) as the new feature, and repeat from step 3
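The steps above can be sketched for two-class continuous data, assuming the magnitude of the two-sample t statistic as the significant-difference measure and 1 − r² as independence (the thesis' exact implementation may differ in these choices):

```python
import numpy as np

def msdi_select(X, y, k):
    """Greedy MSDI sketch for two-class continuous data.

    sd  : |two-sample t statistic| per feature (significant difference)
    ind : 1 - r^2 against the already-selected features (independence)
    sf  : sd * ind, the feature-significance criterion
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    # Welch-style t statistic for every feature at once
    sd = np.abs(X0.mean(0) - X1.mean(0)) / np.sqrt(
        X0.var(0, ddof=1) / n0 + X1.var(0, ddof=1) / n1)
    selected = [int(np.argmax(sd))]          # step 2: maximum sd first
    while len(selected) < k:
        sf = np.full(X.shape[1], -np.inf)
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # independence against the most-correlated selected feature
            r2 = max(np.corrcoef(X[:, j], X[:, s])[0, 1] ** 2
                     for s in selected)
            sf[j] = sd[j] * (1.0 - r2)       # step 4: sf = sd * ind
        selected.append(int(np.argmax(sf)))
    return selected
```

Note how a near-duplicate of an already-selected feature is skipped even if its own separability is high, because its independence factor collapses toward zero.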
12
Monotonically Increasing Curve (MIC) Strategy

[Figure: performance curve — rate of recognition (0.4–1) vs. number of features (0–30)]
• Start from the feature subset selected by MSDI
• Plot the performance curve
• Delete the features that contribute nothing to the increase in recognition
• Repeat until the curve is monotonically increasing
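A single-pass sketch of the pruning idea, assuming a user-supplied `evaluate` function that returns the recognition rate for a feature subset (the thesis' exact pruning loop may differ; this shows only the "keep a feature iff it lifts the curve" principle):

```python
def mic_prune(ranked_features, evaluate):
    """Monotonically Increasing Curve sketch.

    ranked_features: feature indices in the order produced by MSDI.
    evaluate(subset): recognition rate of a classifier trained on
    that subset (assumed, supplied by the caller).
    Returns the subset whose performance curve increases monotonically.
    """
    kept, best = [], -float("inf")
    for f in ranked_features:
        rate = evaluate(kept + [f])
        if rate > best:          # keep only features that lift the curve
            kept.append(f)
            best = rate
    return kept
```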
13
Example I: Handwritten Digit Recognition

• 32×32 bitmaps are divided into 8×8 = 64 blocks
• The set pixels in each block are counted
• Thus an 8×8 matrix is generated, i.e. 64 features
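The block-counting step above can be written directly with a reshape; a small sketch:

```python
import numpy as np

def block_features(bitmap):
    """Count set pixels in each 4x4 block of a 32x32 binary bitmap,
    producing the 8x8 = 64 features described above."""
    b = np.asarray(bitmap).reshape(8, 4, 8, 4)   # 8x8 grid of 4x4 blocks
    return b.sum(axis=(1, 3)).ravel()            # 64 block counts

demo = np.zeros((32, 32), dtype=int)
demo[:4, :4] = 1                                 # fill the top-left block
print(block_features(demo)[0])                   # 16 set pixels in block 0
```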
14
Performance Curve

[Figure: rate of recognition (0.4–1) vs. number of features (0–60) for MSDI, MIFS (β = 0.2, 0.4, 0.6, 0.8, 1.0), and random ranking]
Battiti’s MIFS criterion:

    I(C; f) − β Σ_{s∈S} I(f; s)

β needs to be determined.

MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector
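A sketch of Battiti's greedy MIFS selection for discretized (integer-coded) features, illustrating that β must be supplied by the user; the plug-in histogram MI estimator here is a simple illustrative choice:

```python
import numpy as np

def discrete_mi(a, b):
    """I(a;b) in nats for discrete (integer-coded) variables."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1
    joint /= joint.sum()
    pa = joint.sum(1, keepdims=True)
    pb = joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def mifs_select(X, y, k, beta=0.5):
    """Greedily maximize I(C; f) - beta * sum_{s in S} I(f; s).
    beta trades relevance against redundancy and must be chosen
    by the user, as the slide notes."""
    n_feat = X.shape[1]
    relevance = [discrete_mi(X[:, j], y) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = [
            -np.inf if j in selected else
            relevance[j] - beta * sum(discrete_mi(X[:, j], X[:, s])
                                      for s in selected)
            for j in range(n_feat)
        ]
        selected.append(int(np.argmax(scores)))
    return selected
```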
15
Computational Complexity

Selecting 15 features from the 64-feature original set:

  MSDI: 24 seconds
  Battiti’s MIFS: 1110 seconds (5 values of β searched in the range 0–1)
16
Example II: Handwritten Digit Recognition

The 649 features are distributed over the following six feature sets:

• 76 Fourier coefficients of the character shapes
• 216 profile correlations
• 64 Karhunen-Loève coefficients
• 240 pixel averages in 2×3 windows
• 47 Zernike moments
• 6 morphological features
17
Performance Curve

[Figure: rate of recognition (0.2–1) vs. number of features (0–50) for MSDI + MIC, MSDI, and random ranking]

MSDI: Maximum Significant Difference and Independence
MIC: Monotonically Increasing Curve
18
Comparison with MIFS

[Figure: rate of recognition (0.4–1) vs. number of features (0–50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)]
• MSDI is much better with a large number of features
• MIFS is better with a small number of features

MSDI: Maximum Significant Difference and Independence
MIFS: Mutual Information Feature Selector
19
Summary on Comparing MSDI with MIFS

• MSDI is much more computationally efficient:
  • MIFS needs to estimate the pdfs
  • Even the computationally efficient criterion (Battiti’s MIFS) still needs β to be determined
  • MSDI involves only simple statistical calculation
• MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on relevant statistical models
• MIFS is more suitable for a small volume of data and a small feature subset
    I(x_v; y) = ∫∫ p(x_v, y) log [ p(x_v, y) / (p(x_v) p(y)) ] dx_v dy

    I′(x_v; y) = I(x_v; y) − β Σ_{u=1}^{p} I(x_v; x_u)
20
Outline of Part II

Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer

• Problem Statement
• Methods
  • Feature selection
  • Classification
  • Optimization
• Results and Discussion
21
Problem Statement

1. Very large number of features
2. Electronic and chemical noise
3. Biological variability of human disease
4. Little knowledge of the proteomic mass spectrum
[Figure: four mass spectra, intensity vs. M/Z (0–18000), each with 15154 points (features): mean of the non-cancer training set, the same with the n = 100 most significant features marked, mean of the cancer training set, and the same with the n = 100 most significant features marked]
22
The System of Proteomic Pattern Analysis

Training dataset (initial features > 10^4)
  → most significant features selected by STFS
  → RBFNN / PNN learning
  → trained neural classifier
  → optimization of the size of the feature subset and the parameters of the classifier by minimizing the ROC distance
  → mature classifier

STFS: Significance Test-Based Feature Selection
PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network
23
Feature Selection: STFS

MSDI: Significance of feature = Significant difference × Independence
      (here: Student’s t-test and Pearson correlation)
MIC: Monotonically Increasing Curve strategy

STFS: Significance Test-Based Feature Selection
MSDI: Maximum Significant Difference and Independence Algorithm
MIC: Monotonically Increasing Curve Strategy
24
Classification: PNN / RBFNN

[Figure: PNN and RBFNN network structures — inputs x1 … xn, two pattern pools with summation units S1 and S2, and outputs y(1), y(2), y_d]
PNN is a standard structure with four layers; RBFNN is a modified four-layer structure.

PNN: Probabilistic Neural Network
RBFNN: Radial Basis Function Neural Network
    y_d = y(1)  if ||y(1)|| ≥ λ ||y(2)||
          y(2)  otherwise
25
Optimization: ROC Distance

Minimize the ROC distance to optimize:
• the feature subset size m
• the Gaussian spread σ
• the RBFNN pattern decision weight λ
ROC: Receiver Operating Characteristic
[Figure: ROC curve — true positive rate (sensitivity) vs. false positive rate (1 − specificity); d_ROC is the distance from the operating point to the ideal corner (0, 1)]
    d_ROC = min √[ (1 − TP)² + FP² ]
          = min √[ (1 − sensitivity)² + (1 − specificity)² ]
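The distance and its minimization over candidate operating points can be sketched as follows; the candidate parameter values and their sensitivity/specificity pairs are made up for illustration:

```python
import math

def roc_distance(sensitivity, specificity):
    """Distance from the ideal ROC point (0, 1):
    d_ROC = sqrt((1 - sensitivity)^2 + (1 - specificity)^2)."""
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)

# Hypothetical classifier settings mapped to (sensitivity, specificity);
# pick the setting whose operating point is closest to the ideal corner.
candidates = {0.3: (0.99, 0.80), 0.5: (0.97, 0.96), 0.7: (0.88, 0.99)}
best = min(candidates, key=lambda t: roc_distance(*candidates[t]))
print(best, roc_distance(*candidates[best]))
```

The same minimization can run jointly over the feature subset size m, the Gaussian spread σ, and the decision weight λ by enumerating candidate combinations.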
26
Results: Sensitivity and Specificity

                    Sensitivity   Specificity
  Our results       97.1%         96.8%
  Petricoin (2002)  94.7%         75.9%
  DRE               55–68%        6–33%
  PSA               29–80%        --
27
Pattern Distribution

[Figure: histograms of RBFNN output for non-cancer and cancer samples, with the decision cut-point marked]

Pattern recognized by RBFNN, against the labels from biopsies:

                        Recognized non-cancer    Recognized cancer
  Labelled non-cancer   True negative 96.8%      False positive 3.2%
  Labelled cancer       False negative 2.9%      True positive 97.1%
28
The Possible Causes of the Unrecognizable Samples

1. The classifier algorithm is not able to recognize all the samples
2. The proteomic data do not provide enough information
3. The prostatic biopsy mistakenly labels the cancer status
29
Possibility of Mistaken Diagnosis by Prostatic Biopsy

[Figure: histograms of RBFNN output with the cut-point marked, separating true non-cancer and false non-cancer samples on one side, and false cancer and true cancer samples on the other]
• Biopsy has limited sensitivity and specificity
• The proteomic classifier has very high sensitivity and specificity as measured against biopsy
• The results of the proteomic classifier are not exactly the same as those of biopsy
• All unrecognizable samples are outliers
31
Summary (1)

Significance Test-Based Feature Selection (STFS):

• STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the minimum possible feature subset that achieves the maximum recognition rate
• Feature significance (the selection criterion) is estimated from statistical models chosen in accordance with the properties of the data
• Advantages:
  • Computationally effective
  • Optimality
32
Summary (2)

Proteomic Pattern Analysis for Detection of Prostate Cancer

• The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance
• With sensitivity of 97.1% and specificity of 96.8%, it would be an asset for detecting prostate cancer early and accurately, and for sparing a large number of aging men unnecessary prostatic biopsies
• The suggestion of mistaken labels by prostatic biopsy through pattern analysis may lead to a novel direction in prostate cancer diagnostic research
33
Thanks for your time
Questions?