pathway ranking tool dimitri kosturos linda tsai socalbsi, 8/21/2003
Post on 19-Dec-2015
Pathway Ranking Tool
Dimitri Kosturos Linda Tsai
SoCalBSI, 8/21/2003
Project Overview
BioDiscovery, Inc. at Marina del Rey
Analyzing microarray data on pathway level instead of individual gene level
Methods:
-Enrichment Analysis
-Permutational Statistics
-S. Metric
-Multivariate test
Project Overview
Validation of statistical methods
2 data sets: Brain Tumor, Interferon-gamma.
Sources of annotation: BioCarta, KEGG, Gene Ontology.
Project Overview, cont.
Diagram: phenotype, microarray, algorithm, pathway
Dimitri (computer scientist), Linda (biologist)
Project Flowchart
GeneSight is a data analysis software package.
Features:
-Statistical significance testing
-Multiple Data Visualizations
-Automated gene annotation
-Complete result reports
-Pathway analysis (?)
Research and Development in GeneSight
Glioblastoma multiforme (GBM) is the most malignant of the glial tumors, classified as grade IV.
Many brain tumors are currently incurable. Average survival time: 1 year
Biology of Brain Tumor
Oncogenes: in their normal form (proto-oncogenes), promote normal cell growth
Tumor suppressor genes: retard cell growth
http://www.med.harvard.edu/publications/On_The_Brain/Volume4/Number2/SP95Awry.html
Bad Genes Foment Trouble
Interferons are a class of cytokines that mediate antiviral, antiproliferative, and antitumor activities, among others.
IFN gamma is produced by T lymphocytes in response to mitogens or to antigens.
IFNs bind to their receptors and initiate the JAK-STAT signaling cascade.
Biology of Interferon
http://www.grt.kyushu-u.ac.jp/eny-doc/pathway/ifn_gamma.html
Biology of Interferon, cont.
Grouping related genes together into pathways
(A) BioCarta. Ex: p53 Signaling Pathway
(B) KEGG. Ex: Citrate cycle (TCA cycle)
Grouping genes into structured, controlled vocabularies (ontologies)
Gene Ontology
-Biological Process. Ex: angiogenesis, apoptosis
-Molecular Function. Ex: DNA binding activity
-Cellular Component. Ex: nucleus, mitochondria
Gene Annotations
Traditional method of ranking gene pathways
Steps:
1. Mann-Whitney Test: obtain list of probe sets that satisfy a certain p-value.
2. Cluster analysis: see how many of the listed probe sets occur in a cluster (pathway).
Example:
1. Original data: 12,625 genes. Select genes with p-value < 0.001.
=> narrows the list to 927 genes.
2. Cluster those 927 genes into clusters.
4 of the genes in SODD/TNFR1 Signaling Pathway satisfy p-value<0.001
Annotations \ Lists: DG-Less_than_0.001
BioCarta Pathway SODD/TNFR1 Signaling Pathway p=0.012: CASP8,FADD,LTA,TNF (4 of 9)
BioCarta Pathway D4-GDI Signaling Pathway p=0.017: CASP1,CASP10,CASP8,JUN (4 of 10)
BioCarta Pathway TNFR1 Signaling Pathway p=0.021: CASP8,FADD,JUN,LMNB1,LTA,MADD,TNF (7 of 28)
BioCarta Pathway Cadmium induces DNA synthesis and proliferation in macrophages p=0.021: JUN,LTA,MAPK3,PRKCB1,TNF (5 of 16)
BioCarta Pathway Visceral Fat Deposits and the Metabolic Syndrome p=0.022: LPL,LTA,TNF (3 of 6)
BioCarta Pathway Fibrinolysis Pathway p=0.032: F13A1,F2R,SERPINE1 (3 of 7)
BioCarta Pathway EPO Signaling Pathway p=0.033: EPO,EPOR,GRB2,JUN,MAPK3 (5 of 18)
Mann-Whitney Test, Denovo Glioblastoma p<0.001
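The counting step of this traditional method is commonly scored with a hypergeometric (one-sided Fisher) test. The sketch below is an assumption about how such an enrichment p-value could be computed; the slides do not say which test GeneSight uses. It also treats genes and probe sets as interchangeable, so it is not expected to reproduce the p=0.012 listed above for the SODD/TNFR1 pathway. The function and parameter names are illustrative.

```python
from math import comb

def enrichment_p_value(total_genes, selected_genes, pathway_size, pathway_hits):
    """Upper-tail hypergeometric probability P(X >= pathway_hits).

    total_genes    : genes on the chip (12,625 in the example)
    selected_genes : genes passing the p-value cutoff (927 in the example)
    pathway_size   : genes annotated to the pathway (e.g. 9)
    pathway_hits   : pathway genes that passed the cutoff (e.g. 4)
    """
    upper = min(pathway_size, selected_genes)
    tail = sum(
        comb(pathway_size, k) * comb(total_genes - pathway_size, selected_genes - k)
        for k in range(pathway_hits, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)

# Numbers taken from the example: 4 of 9 pathway genes among 927 of 12,625.
p = enrichment_p_value(12625, 927, 9, 4)
print(f"hypergeometric p-value: {p:.4g}")
```

A small p-value here means that more pathway members passed the cutoff than random selection of 927 genes would be likely to produce.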
How Affy. Microarray Chips Work
http://www.ucl.ac.uk/oncology/MicroCore/HTML_resource/Norm_Affy1.htm
Best results: Genes hybridize perfectly with Perfect Match, and not at all with Mismatch.
PM: Perfect Match
MM: Mismatch
Example of GeneSight Plot
Data
Theoretical Tumor Expression Levels (Log Transformed)
Columns: Conditions. Rows: Genes.
              Normal  Normal  Tumor  Tumor
Probe Set A   4.5     3.8     10.2   11.1
Probe Set B   2.3     2.7     13.5   13.6
Probe Set C   7.8     8.2     1.4    1.8
Probe Set A   3.5     4.2     8.9    9.6
Notice column replicates, Probe Set replicates.
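To make the table concrete, the sketch below computes a per-row mean difference and t-statistic between the Tumor and Normal replicates. The values are copied from the table; the choice of a Welch (unequal-variance) t-statistic is an assumption, since the slides do not say which variant is used.

```python
from math import sqrt
from statistics import mean, variance

# Log-transformed values from the example table:
# (row label, Normal replicates, Tumor replicates)
rows = [
    ("Probe Set A", [4.5, 3.8], [10.2, 11.1]),
    ("Probe Set B", [2.3, 2.7], [13.5, 13.6]),
    ("Probe Set C", [7.8, 8.2], [1.4, 1.8]),
    ("Probe Set A", [3.5, 4.2], [8.9, 9.6]),  # replicate probe set
]

def mean_difference(normal, tumor):
    """Difference of group means (Tumor minus Normal) on the log scale."""
    return mean(tumor) - mean(normal)

def welch_t(normal, tumor):
    """Welch t-statistic; assumes at least two replicates per group."""
    se = sqrt(variance(normal) / len(normal) + variance(tumor) / len(tumor))
    return mean_difference(normal, tumor) / se

for name, normal, tumor in rows:
    print(name, round(mean_difference(normal, tumor), 2), round(welch_t(normal, tumor), 2))
```

With only two replicates per condition the variance estimates are very noisy, which is one reason replicate columns matter.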
Given Data Sets
Given two data sets: Brain Tumor, IFN-γ
Brain Tumor Data Set has 5+ tumor types; however, only 2 tumor types were used (Denovo Glioblastoma, Progressive Glioblastoma).
IFN-γ Data Set: the entire data set was used.
What and why?
Goal: write a prototype extension to GeneSight that uses permutational statistics to develop a custom distribution for a given Microarray data set.
Overall significance: the software provides a list of (potentially) significant pathways that enables researchers to focus their work.
What is permutational statistics?
Samples:           1 2 3 4
Original grouping: E E C C
Choose different Control and Experiment groupings (permute).
Permuted grouping: E C E C
By iterating through an adequate number of permutations, we can determine if a pathway is likely to be significant (p-value).
(In this context.)
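The regrouping idea can be sketched as an enumeration of every way to label the samples E (Experiment) or C (Control) while keeping the group sizes fixed. This is a minimal illustration for the four-sample example; the function name is hypothetical and is not the Permute class mentioned in the slides.

```python
from itertools import combinations

def label_permutations(n_samples, n_experiment):
    """Yield every distinct E/C labeling with n_experiment samples labeled E."""
    for experiment_idx in combinations(range(n_samples), n_experiment):
        chosen = set(experiment_idx)
        yield "".join("E" if i in chosen else "C" for i in range(n_samples))

# Four samples, two per group, as in the example.
groupings = list(label_permutations(4, 2))
for g in groupings:
    print(g)
```

Four samples give only six distinct groupings, including the original EECC, which is far too few to build a useful null distribution.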
Permutational Stats.
There are two versions of the S. Metric currently implemented.
S. Metric I =
S. Metric II =
M = Number of Genes flagged as significant
Total = Total number of Genes in the Pathway
(Layman's) How Statistics Works
Data -> Statistic -> P-Value
Data: permute here
Statistic: S. Metric I, II
P-Value: after all permutations are done, calculate the p-value
Algorithm
Take at least 10,000 unique permutations. A unique permutation is determined by a Permute class.
For each condition
    For each permutation
        For each gene
            Calc. Mean diff.
            Calc. T-stat
        End For
        For each pathway
            store the statistic
        End For
    End For
    calcPvalue(stored statistic)
End For
S. Metric
Initial Significance Flagging
pValue
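The loop structure above can be sketched in Python as follows. This is a toy under stated assumptions, not the GeneSight extension: the pathway score is a stand-in (mean absolute Welch t over member genes) because the S. Metric formulas are not given in this transcript, the data and pathway names are hypothetical, and the example enumerates all 20 labelings of six samples rather than 10,000 or more permutations.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def t_stat(values, exp_idx):
    """Welch t-statistic between samples in exp_idx and the remaining samples."""
    exp = [values[i] for i in exp_idx]
    ctl = [v for i, v in enumerate(values) if i not in exp_idx]
    se = sqrt(variance(exp) / len(exp) + variance(ctl) / len(ctl))
    return (mean(exp) - mean(ctl)) / se if se > 0 else 0.0

def pathway_statistic(data, genes, exp_idx):
    """Stand-in pathway score (NOT the S. Metric): mean |t| over member genes."""
    return mean(abs(t_stat(data[g], exp_idx)) for g in genes)

def permutation_p_values(data, pathways, exp_idx):
    """Fraction of labelings whose pathway score is at least the observed one."""
    n_samples = len(next(iter(data.values())))
    observed = {p: pathway_statistic(data, g, exp_idx) for p, g in pathways.items()}
    exceed = {p: 0 for p in pathways}
    n_perm = 0
    for perm in combinations(range(n_samples), len(exp_idx)):
        perm_set = set(perm)
        n_perm += 1
        for p, genes in pathways.items():
            # Small tolerance so floating-point ties count as "at least".
            if pathway_statistic(data, genes, perm_set) >= observed[p] - 1e-12:
                exceed[p] += 1
    return {p: exceed[p] / n_perm for p in pathways}

# Hypothetical log-expression data: samples 0-2 are Experiment, 3-5 are Control.
data = {
    "g1": [9.0, 9.5, 10.1, 2.0, 2.4, 1.9],
    "g2": [8.2, 7.9, 8.8, 3.1, 2.7, 3.3],
    "g3": [5.0, 4.1, 5.2, 4.8, 5.1, 4.3],
    "g4": [6.1, 5.4, 5.9, 5.6, 6.2, 5.5],
}
pathways = {"changed": ["g1", "g2"], "flat": ["g3", "g4"]}
p_values = permutation_p_values(data, pathways, {0, 1, 2})
print(p_values)
```

Because the score uses absolute values, the true labeling and its mirror image tie, so the smallest p-value this toy can produce is 2/20. That floor is why the number of available permutations matters.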
Limitations
Computational Power (Memory, CPU)
Required number of replicates (8,8)
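One plausible reading of the (8,8) requirement, which is an assumption and not stated on the slide, is that it follows from the target of at least 10,000 unique permutations: the number of distinct ways to split the samples into two fixed-size groups is a binomial coefficient, and it first exceeds 10,000 at eight replicates per group.

```python
from math import comb

def unique_labelings(n_experiment, n_control):
    """Number of distinct ways to choose which samples are labeled Experiment."""
    return comb(n_experiment + n_control, n_experiment)

for reps in (6, 7, 8):
    print(f"({reps},{reps}) replicates -> {unique_labelings(reps, reps)} labelings")
```

This gives 924 labelings for (6,6), 3,432 for (7,7), and 12,870 for (8,8).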
Output of result
Validation of pathway analysis: Method 1
                                             Algorithm: significant   Algorithm: insignificant
Linda's selection of significant pathways    True Positive            False Negative
Linda's selection of insignificant pathways  False Positive           True Negative
Problem: lack of insignificant pathways
????
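The 2x2 table of Method 1 can be tallied as in the sketch below. The pathway names and selections are hypothetical placeholders, and sensitivity and specificity are the usual summaries of such a table rather than measures named in the slides. The sketch also shows the stated problem: without expert-selected insignificant pathways, specificity has no denominator.

```python
def confusion_counts(algorithm_significant, expert_significant, all_pathways):
    """Return (TP, FN, FP, TN), treating the expert selection as the reference."""
    tp = fn = fp = tn = 0
    for pathway in all_pathways:
        predicted = pathway in algorithm_significant
        actual = pathway in expert_significant
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def sensitivity(tp, fn):
    """TP / (TP + FN), or None when there are no expert-significant pathways."""
    return tp / (tp + fn) if (tp + fn) else None

def specificity(fp, tn):
    """TN / (TN + FP), or None when there are no expert-insignificant pathways."""
    return tn / (tn + fp) if (tn + fp) else None

# Hypothetical example with five placeholder pathways.
all_pathways = ["P1", "P2", "P3", "P4", "P5"]
algorithm = {"P1", "P2", "P4"}
expert = {"P1", "P2", "P3"}
tp, fn, fp, tn = confusion_counts(algorithm, expert, all_pathways)
print(tp, fn, fp, tn, sensitivity(tp, fn), specificity(fp, tn))
```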
Validation of pathway analysis: Method 2
[Figure: Comparison of Prediction Methods. x-axis: # of pathways in BioCarta sorted by p-value (1 to 96). y-axis: # of identified significant pathways (0 to 16). Curves: Best algorithm, Random, Worst.]
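A curve of this kind can be built by sorting pathways by p-value and counting, at each rank, how many of the pathways expected to be significant have appeared so far. The sketch below assumes that reading of the figure; the p-values and the expected set are hypothetical, and the best and worst bounds are the curves an ideal and a maximally wrong ranking would give.

```python
def cumulative_hits(p_values, expected_significant):
    """Hits among the top-k pathways, for k = 1..n, ranking by ascending p-value."""
    ranked = sorted(p_values, key=p_values.get)
    hits, curve = 0, []
    for name in ranked:
        if name in expected_significant:
            hits += 1
        curve.append(hits)
    return curve

def best_curve(n_pathways, n_expected):
    """Ideal ranking: every expected pathway is ranked first."""
    return [min(k, n_expected) for k in range(1, n_pathways + 1)]

def worst_curve(n_pathways, n_expected):
    """Worst ranking: every expected pathway is ranked last."""
    return [max(0, k - (n_pathways - n_expected)) for k in range(1, n_pathways + 1)]

# Hypothetical p-values for four placeholder pathways.
p_values = {"A": 0.001, "B": 0.2, "C": 0.03, "D": 0.5}
expected = {"A", "D"}
curve = cumulative_hits(p_values, expected)
print(curve, best_curve(4, 2), worst_curve(4, 2))
```

A method whose curve rises faster, staying closer to the best curve, places more of the expected pathways near the top of its ranking.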
Result: Brain Tumor-BioCarta
[Figure: D. glioblastoma. x-axis: # of pathways in BioCarta sorted by p-value (1 to 251). y-axis: # of sig. identified pathways (0 to 40). Series: SMI, SMII, DG0.001, DG0.01.]
Result: IFNG-Molecular Function (GO)
[Figure: IFNG-Molecular Function. x-axis: number of terms in MF (GO) sorted by p-value (1 to 1737). y-axis: number of sig. identified terms (0 to 80). Series: SMI, SMII, enrich0.01, enrich0.001.]
Biological Limitations
Prediction of pathways to be significant in the conditions of interest is subjective.
Assumption of similar biological states between Denovo Glioblastoma and Progressive Glioblastoma.
Future Direction
Finish modifying the Multivariate Statistic for use in the permutational method. This method uses PCA and Multivariate statistics.
Finish Validating the data produced using the Multivariate Statistic.
Initial Results of Multivariate Stat.
[Figure: IFNG-Biological Process (GO). x-axis: number of terms in BP (GO), sorted by p-value (1 to 1165). y-axis: number of terms identified (0 to 50). Series: SMI, SMII, enrich0.01, enrich0.001, M. Perm.]
Conclusion
It is not clear which is better: the S. Metric or traditional Enrichment Analysis.
Improvements can be made to the S. metric.
Acknowledgements
Dr. Bruce Hoff
Dr. Anton Petrov
SoCalBSI: Dr. Jamil Momand, Dr. Sandra Sharp, Dr. Nancy Warter-Perez, Dr. Wendie Johnston
National Science Foundation
National Institutes of Health