enabling reproducible gene expression analysis using biological pathways limsoon wong 7 april 2011...

48
Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Upload: valentine-crawford

Post on 16-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Enabling Reproducible Gene Expression Analysis Using

Biological Pathways

Limsoon Wong

7 April 2011

(Joint work with Donny Soh, Difeng Dong, Yike Guo)

Page 2: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

2

Plan

• An issue in gene expression analysis

• Comparing pathway sources: Comprehensiveness, Consistency, Compatibility

• Matching pathways in different sources

• Finding more consistent disease subnetworks

Page 3: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

An Issue in Gene Expression Analysis

Page 4: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

4

First, the good news..

Page 5: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

5

• The subtypes look similar

• Conventional diagnosis– Immunophenotyping– Cytogenetics– Molecular diagnosticsÞ Unavailable in

developing countries

Childhood Acute Lymphoblastic Leukemia

• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50

• Diff subtypes respond differently to same Tx

• Over-intensive Tx – Development of

secondary cancers– Reduction of IQ

• Under-intensiveTx – Relapse

Page 6: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

6

Fold Change

T-test

x – Microarray value after drugy – Microarray value before drugi – Gene

x – Log2 value of treatmenty – Log2 value of controls – Standard errori – Gene

Individual Gene Testing

Golub et al, Science, 1999

Page 7: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

7

Yeoh et al, Cancer Cell 2002

Page 8: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

8

Conventional Tx:• intermediate intensity to allÞ 10% suffers relapseÞ 50% suffers side effectsÞ costs US$150m/yr

Our optimized Tx:• high intensity to 10%• intermediate intensity to 40%• low intensity to 50%• costs US$100m/yr

Copyright © 2004 by Jinyan Li and Limsoon Wong

•High cure rate of 80%• Less relapse

• Less side effects• Save US$51.6m/yr

Impact

Page 9: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

9

Now, the bad news..

Page 10: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

10

Percentage of Overlapping Genes

• Low % of overlapping genes from diff expt in general

– Prostate cancer• Lapointe et al, 2004• Singh et al, 2002

– Lung cancer• Garber et al, 2001• Bhattacharjee et al,

2001– DMD

• Haslett et al, 2002• Pescatori et al, 2007

Datasets DEG POG

ProstateCancer

Top 10 0.30

Top 50 0.14

Top100 0.15

LungCancer

Top 10 0.00

Top 50 0.20

Top100 0.31

DMDTop 10 0.20

Top 50 0.42

Top100 0.54

Zhang et al, Bioinformatics, 2009

Page 11: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

11

Gene Regulatory Circuits

• Each disease subtype has underlying cause

• There is a unifying biological theme for genes that are truly associated with a disease subtype

• Uncertainty in selected genes can be reduced by considering biological processes of the genes

• The unifying biological theme is basis for inferring the underlying cause of disease subtype

Page 12: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

12

Towards More Meaningful Genes

• ORA – Khatri et al– Genomics, 2002

• FCS– Pavlidis & Noble– PSB 2002

• GSEA– Subramanian et al– PNAS, 2005

• Pathway Express– Draghici et al – Genome Res, 2007

Page 13: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

13

All of these newer methods rely on gene group or pathway information.

But how good are the available sources of pathway information?

Page 14: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Comparing Pathway Sources:Comprehensiveness, Consistency, &

Compatibility

Page 15: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

15

Data Sources

• KEGG– Curated by a single lab– Long famous history– Used by many people

• Wikipathways– Community effort– new curation model

• Ingenuity– Commercial effort– Used by many biopharma’s

Page 16: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

16

Low Comprehensiveness of Pathway Sources

0

50

100

150

200

250

300

350

400

450

500

Unified KEGG Ingenuity Wiki

# of Pathways

0

10000

20000

30000

40000

50000

60000

70000

Unified KEGG Ingenuity Wiki

# of Genes Pairs

0

5000

10000

15000

20000

25000

Unified KEGG Ingenuity Wiki

# of Genes

Page 17: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

17

Gene Pair Overlap

Gene Overlap

Wiki vs KEGG Wiki vs Ingenuity KEGG vs Ingenuity

Wiki vs KEGG Wiki vs Ingenuity KEGG vs Ingenuity

Low Consistencyof Pathway Sources

Page 18: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

18

Example: Apoptosis Pathway

Page 19: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

19

Would Unifying Pathway Sources Help?

• Incompatibility Issues!• Data extraction method variations

• Format variations

• Data differences

• Gene/GeneID name differences

• Pathway name differences

Page 20: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

20

The preceding analyses hide an intricate issue…

The same pathways in the different sources are often given different names.

So how do we even know two pathways are the same and should be compared / merged?

Page 21: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Intricacy of Pathway Matching

Page 22: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

22

Possible Ways to Match Pathways

• Match based on name– Pathways w/ similar name should be the same

pathway– But annotations are very noisyÞ Likely to mismatch pathways?Þ Likely to match too many pathways?

• Are the followings good alternative approaches?– Match based on overlap of genes– Match based on overlap of gene pairs

Page 23: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

23

Matching Pathways by Name

• LCS procedure– Given pathway X in

db A– Sort pathways in db

B by “longest common substring” with X

– Manually scan the ranked list to choose closest nomen-clatural match

• Issue: Accuracy– When LCS says two

pathways are the same one, are they really the same?

• Issue: Completeness– When LCS says two

pathways are different, are they really different?

Page 24: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

24

LCS vs Gene-Agreement Matching

• Accuracy– 94% of LCS matches

are in top 3 gene agreement matches

– 6% of LCS matches not in top 3 of gene agreement matches; but their gene-pair agreement levels are higher

• Completeness– Let Pi be a pathway in

db A that LCS cannot find match in db B

– Let Qi be pathway in db B with highest gene agreement to Pi

– Gene-pair agreement of Pi-Qi is much lower than pathway pairs matched by LCS

LCS is better than gene-agreement based matching!

Page 25: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

25

LCS vs Gene-Agreement Matching

• LCS consistently has higher gene-pair agreementÞ LCS is better than gene-agreement based matching!

gene overlap percentage

Gene-pair overlap percentage

LCS match

Gene-agreement match

Page 26: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

26

LCS vs Gene-Pair Agreement Matching

8

2416

LCS

Gene-PairOverlap

The 8 pathway pairs singled out by LCS

The 24 pathway pairs singled out by maximal gene-pair overlap

Note: We consider only pathway pairs that have at least 20 reaction overlap.

Page 27: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

27

LCS vs Gene-Pair Agreement Matching

• Gene-pair agreement match will miss when– Pathway P in db A has few overlap with pathway P in

db B due to incompleteness of db, even if pathway name matches perfectly!

– Example: wnt signaling pathway, VEGF signaling pathway, MAPK signaling pathway, etc. in KEGG don’t have largest gene-pair overlap w/ corresponding pathways in Wikipathways & Ingenuity

Þ Bad for getting a more complete unified pathway P

Page 28: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

28

LCS vs Gene-Pair Agreement Matching

• Pathways having large gene-pair overlap are not necessarily the same pathways

• Examples– “Synaptic Long Term Potentiation” in Ingenuity vs

“calcium signalling” in KEGG – “PPAR-alpha/RXR-alpha Signaling” in Ingenuity vs

“TGF-beta signaling pathway” in KEGG

Þ Difficult to set correct gene-pair overlap threshold to balance against false positive matches

Page 29: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

29

% o

verla

p w

ith L

CS

Top n% matching pathways based on gene / gene pair overlap

Gene vs Gene Pair Agreement Matching

• Pathways w/ higher gene/gene pair overlap have higher overlap w/ LCS

• Gene pair matching is better than gene matching• But both are not as good as LCS

Page 30: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

30

PathwayAPI= KEGG

+ Wikipathways + Ingenuity

• Having found a good way to match up pathways in different datasources, we proceeded to build a big unified pathway db….

Donny Soh, Difeng Dong, Yike Guo, Limsoon Wong. Consistency, Comprehensiveness, and Compatibility of Pathway Databases. BMC Bioinformatics, 11:449, September 2010.

Page 31: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

More Consistent Disease Subnetworks

Page 32: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

32

But these methods still

don’t return the precise parts of a pathway that

are significant…

• ORA – Khatri et al– Genomics, 2002

• FCS– Pavlidis & Noble– PSB 2002

• GSEA– Subramanian et al– PNAS, 2005

• Pathway Express– Draghici et al – Genome Res, 2007

Test whole gene group at a time

Test a node and its immediate neighbours

Page 33: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

33

The SNet Method

• Group samples into type D and D• Extract & score subnetworks for type D

– Get list of genes highly expressed in most D samples• These genes need not be differentially expressed!

– Put these genes into pathways– Locate connected components (ie., candidate

subnetworks) from these pathway graphs– Score subnetworks on D samples and on D samples

• For each subnetwork, compute t-statistics on the two sets of scores

• Determine significant subnetworks by permutations

Page 34: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

34

SNet: Extract Subnetworks

Genes highly expressed in many type-D samples

Page 35: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

35

SNet: Score Subnetworks

Page 36: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

36

SNet: Significant Subnetworks

• Randomize patient samples many times

• Get t-score for subnetworks from the randomizations

• Use these t-scores to establish null distribution

• Filter for significant subnetworks from real samples

Page 37: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

37

Let’s see whether SNet gives us subnetworks that are

(i) more consistent between datasets of the same types of disease samples

(ii) larger and more meaningful

Page 38: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

38

Recall Examples from “Bad News”

• Low % of overlapping genes from diff expt in general

– Prostate cancer• Lapointe et al, 2004• Singh et al, 2002

– Lung cancer• Garber et al, 2001• Bhattacharjee et al,

2001– DMD

• Haslett et al, 2002• Pescatori et al, 2007

Datasets DEG POG

ProstateCancer

Top 10 0.30

Top 50 0.14

Top100 0.15

LungCancer

Top 10 0.00

Top 50 0.20

Top100 0.31

DMDTop 10 0.20

Top 50 0.42

Top100 0.54

Zhang et al, Bioinformatics, 2009

Page 39: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

39

Better Subnetwork Overlap

• For each disease, take significant subnetworks from one dataset and see if it is also significant in the other dataset

Page 40: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

40

Better Gene Overlaps

• For each disease, take significant subnetworks extracted independently from both datasets and see how much their genes overlap

Page 41: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

41

Larger Subnetworks

Page 42: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

42

Genes A, B, C are high in phenotype D

A is high in phenotype ~D but B and C are not

A

B

C

Conventional techniques: Gene B and Gene C are selected. Possible incorrect postulation of mutations in gene B and C

Key Insight # 1

• SNet does not require all the genes in subnet to be diff expressed

• It only requires the subnet as a whole to be diff expressed

• Able to capture entire relationship, postulating a mutation in gene A

Page 43: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

43

A branch within pathway consisting of genes A, B, C, D and E are high in phenotype D

Genes C, D and E not high in phenotype ~D

30 other genes not diff expressed

A

B

C

Conventional techniques: Entire subnetwork is likely to be missed

D

E

30 other genes

Key Insight # 2

• SNet: Able to capture the entire subnetwork branch within the pathway

Page 44: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

44

Genes A, B and C are present in two separate pathways

A, B and C are high in phenotype D, but not high in phenotype ~D

Conventional techniques:

Both pathways are scored equally. So both got selected, resulting in pathway 2 being a false positive

A

B

C

A

B

C

Pathway 1 Pathway 2

Key Insight # 3

• SNet: Able to select only pathway 1, which has the relevant relationship

Page 45: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Remarks

Page 46: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

46

What have we learned?

• Significant lack of concordance betw db’s– Level of consistency for genes is 0% to 88%– Level of consistency for genes pairs is 0%-61%– Most db contains less than half of the pathways in

other db’s

• Matching pathways by name is better than matching by gene overlap or gene-pair overlap

• SNet method yields more consistent and larger disease subnetworks

Page 47: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

47

Acknowledgements

• A*STAR AIP scholarship• A*STAR SERC PSF grant

Difeng DongDonny Soh Yike Guo

Page 48: Enabling Reproducible Gene Expression Analysis Using Biological Pathways Limsoon Wong 7 April 2011 (Joint work with Donny Soh, Difeng Dong, Yike Guo)

Talk at IPM-NUS Workshop on Bioinformatics and Computer Science, 7 April 2011 Copyright 2011 © Limsoon Wong

48

References

• Eng-Juh Yeoh, Mary E. Ross, Sheila A. Shurtleff, W. Kent William, Divyen Patel, Rami Mahfouz, Fred G. Behm, Susana C. Raimondi, Mary V. Reilling, Anami Patel, Cheng Cheng, Dario Campana, Dawn Wilkins, Xiaodong Zhou, Jinyan Li, Huiqing Liu, Chin-Hon Pui, William E. Evans, Clayton Naeve, Limsoon Wong, James R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1:133--143, March 2002.

• Donny Soh, Difeng Dong, Yike Guo, Limsoon Wong. Enabling More Sophisticated Gene Expression Analysis for Understanding Diseases and Optimizing Treatments. ACM SIGKDD Explorations, 9(1):3--14, June 2007.

• Donny Soh, Difeng Dong, Yike Guo, Limsoon Wong. Consistency, Comprehensiveness, and Compatibility of Pathway Databases. BMC Bioinformatics, 11:449, September 2010.

• Donny Soh, Understanding Pathways, PhD thesis, December 2010, Imperial College London