TRANSCRIPT
Protein Mutations and Pathways in Cancer
Toward Modular & Combinatorial Therapy
Less war !! more science !!
Chris Sander
Computational & Systems Biology
Memorial Sloan-Kettering Cancer Center, New York
International Conference on Bioinformatics
Asia-Pacific Bioinformatics Network
Cancer
Simplicity of phenotype
Diversity of implementation
Modular therapy!
Combinatorial therapy!
Cancer Genomics
Functional consequences of somatic mutations
Molecular alterations in pathway context
Toward Combinatorial Therapy
Combinatorial Perturbations & Network Models
Information Infrastructure
Pathway Commons & Author Fact Deposition
Function of Protein Mutations
Boris Reva, Jenya Antipin, Alyosha Stukalov
Cancer Genomics
Nikolaus Schultz, Barry Taylor, Ethan Cerami
Nick Socci, John Major, Sam Singer
Marc Ladanyi, Cameron Brennan, Matt Meyerson, Jordi Barretina
& TCGA community
PathwayCommons.org
Emek Demir, Gary Bader (Toronto), Ethan Cerami
Ben Gross, Robert Hoffmann
Ken Fukuda, bioPAX community
The Cancer Genome Atlas (TCGA)
DNA copy number
Gene expression: mRNA (exon-level), miRNA
DNA sequencing: currently 1300 genes, soon 6000 or all genes
DNA methylation
Genomic rearrangements
Proteomics
Sample processing
Clinical annotation
Data storage and distribution
Integrative analysis
Next: lung squamous, kidney, breast, colon
[Figure: data types per sample: DNA copy number, DNA methylation, mRNA expression, miRNA expression, mutations, clinical data]
DNA copy number alterations in GBM and ovarian cancer
[Figure: copy-number panels labeled OVA and GBM]
More than half of the genes copy-number altered in ovarian cancer correlated with expression
Cancer Genomics
Functional Consequences of Protein Mutations
Q9SD07  RLHIGGL
Q44861  RLLIGRV
Q61EI4  RLLIGRV
Q7PY36  RLFIGKI
Q4I4E0  RLWMDQI
Q58SJ9  RLLIGRV
O01811  RLFIGKI
P55159  RLLIGRV
Input: mutation in coding region
allele 1 … GCC ATC CCG …   ALA ILE/MET PRO
allele 2 … GCC AAC CCG …
Public databases: Superfamily, PDB, PFAM, SCOP, NCBI, ENSEMBL, Reactome
Context: protein family, 3D structure, 3D complex, pathway
Psc: specificity & conservation
Pcr: correlation between interacting residues
Pps: protein stability
Pppi: protein-protein interactions
Output: Probability(Disruptive/Non-disruptive) = f ( Psc, Pcr, Pps, Pppi )
Somatic mutations in cancer: What are the functional consequences?
Assessing the functional consequences of mutations
EGFR_human
Variant  Annotation                         Top Spec/Cons  Probability (%)
G719S    in lung cancer; somatic mutation   Yes            99
G724S    in lung cancer                     Yes            100
E734K    in lung cancer                                    74
L747F    in lung cancer                                    98
R748P    in lung cancer                                    99
Q787R    in lung cancer                     Yes            73
T790M    in lung cancer                     Yes            98
L833V    in lung cancer                     Yes            96
V834L    in lung cancer                                    98
L858R    in lung cancer; somatic mutation                  100
L861Q    in lung cancer                                    99
G873E    in lung cancer                     Yes            78
R962G    in dbSNP:17337451                                 100
D761Y    in lung cancer, MSKCC                             96
Defining subfamilies and specificity residues
Input (unordered alignment):
G E K Q E S S S S Y E P K E E F A Q C V L L
G E S L E E A S V N G P F Q Y F Y T V E C L
G E S S E V A A Q N V P M L W F Y Q R H V M
G E Q V E S S E S Q E P H E E F Y Q I R T L
W E S K E E N A V N V P H Q K F F T V L T M
K E T N E V P W F K K P M R E F Y S A W G L
E E Q S E S A E S Q Q P E E P F Y Q I L E L
G E K N E V E A F K L P F R E F Y S V Q R V
H E R V E S A A S N V P M E T F Y Q I A E L
W E E K E E F A V Y I P L Q P F L T F G R L
R E C H E V K A Q Y V P M L E F Y Q V K P W
G E T N E E E A F N V P R R V F F S V S N L
G E S P E E N F V N V P H Q Y F Y T V E P M
T E N P E V E L F K V P F R V F F S L S H Y
S G W K E E L A V N Q P V Q E F E T F E I E
G E A S E V E H Q N V P H L K F Y Q E G P P
R E A Q E S Q A S N V P M E T F Y Q V R T L
Clustering
Output (sequences grouped into subfamilies 1, 2, 3, 4):
S G W K E E L A V N Q P V Q E F E T F E I E
W E E K E E F A V Y I P L Q P F L T F G R L
G E S P E E N F V N V P H Q Y F Y T V E P M
G E S L E E A S V N G P F Q Y F Y T V E C L
W E S K E E N A V N V P H Q K F F T V L T M
T E N P E E E L F K V P F R V F F S L S H Y
K E T N E E P W F K K P M R E F Y S A W G L
G E T N E E E A F N V P R R V F F S V S N L
G E K N E E E A F K L P F R E F Y S V Q R V
E E Q S E S A E S Q Q P E E P F Y Q I L E L
G E Q V E S S E S Q E P H E E F Y Q I R T L
G E K Q E S S S S Y E P K E E F A Q C V L L
R E A Q E S Q A S N V P M E T F Y Q V R T L
H E R V E S A A S N V P M E T F Y Q I A E L
R E C H E V K A Q Y V P M L E F Y Q V K P W
G E S S E V A A Q N V P M L W F Y Q R H V M
G E A S E V E H Q N V P H L K F Y Q E G P P
Column labels in the output: Specificity Residues, Conserved Residues
Minimize contrast function = difference between entropies of ordered and disordered clusters of sequences of the same size
S: entropy of the ordered clusters
S’: entropy of the disordered clusters
[Figure: four example clusterings with S-S’ = 0, S-S’ = -9, S-S’ = -3.5, S-S’ = -7.5]
Q: How can one achieve the most distinctive (= most informative) separation of sequences into clusters?
Goal: S-S’ -> min
Combinatorial entropy measure of specificity patterns
Optimization problem: form clusters (subfamilies) of sequences so as to minimize the combinatorial entropy difference
$$\Delta S_0 = \sum_i \left( S_i - \tilde{S}_i \right).$$
For each column i of the alignment one computes the combinatorial entropy
$$S_i = \sum_k \ln \frac{N_k!}{\prod_{\alpha=1,\dots,20} N_{\alpha,i,k}!}$$
and the reference entropy
$$\tilde{S}_i = \sum_k \ln \frac{N_k!}{\prod_{\alpha=1,\dots,20} \tilde{N}_{\alpha,i,k}!}, \qquad \tilde{N}_{\alpha,i,k} = N_k P_{\alpha,i}, \qquad P_{\alpha,i} = \sum_k N_{\alpha,i,k} \Big/ \sum_k N_k .$$
$N_k$ is the number of sequences in cluster (subfamily) k;
$N_{\alpha,i,k}$ is the number of residues of type α in column i of cluster k.
The entropy difference $S_i - \tilde{S}_i$, summed over all columns i, measures the deviation of a given sequence clustering from random. This difference is minimal when each cluster has its own distinct type of residue.
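The combinatorial entropy and its reference can be evaluated directly from residue counts. The following is a minimal sketch, not the authors' implementation: the function names are illustrative, and evaluating the factorial of the non-integer expected counts through the log-gamma function is an assumption made here.

```python
from collections import Counter
from math import lgamma


def log_factorial(n):
    """ln(n!), extended to non-integer n via the gamma function (assumption)."""
    return lgamma(n + 1.0)


def column_entropy(column_by_cluster):
    """S_i = sum_k ln( N_k! / prod_alpha N_{alpha,i,k}! ) for one column."""
    s = 0.0
    for residues in column_by_cluster:
        s += log_factorial(len(residues))
        s -= sum(log_factorial(c) for c in Counter(residues).values())
    return s


def reference_entropy(column_by_cluster):
    """Same formula with expected counts N_k * P_{alpha,i} in place of observed ones."""
    total = sum(len(residues) for residues in column_by_cluster)
    pooled = Counter(r for residues in column_by_cluster for r in residues)
    s = 0.0
    for residues in column_by_cluster:
        n_k = len(residues)
        s += log_factorial(n_k)
        s -= sum(log_factorial(n_k * c / total) for c in pooled.values())
    return s


def contrast(clusters):
    """Delta S_0 = sum over columns of (S_i - reference S_i)."""
    n_columns = len(clusters[0][0])
    total = 0.0
    for i in range(n_columns):
        column = [[seq[i] for seq in cluster] for cluster in clusters]
        total += column_entropy(column) - reference_entropy(column)
    return total


# Toy alignment (hypothetical): column 0 separates the clusters, column 1 is conserved.
ordered = [["AE", "AE"], ["GE", "GE"]]
mixed = [["AE", "GE"], ["AE", "GE"]]
print(contrast(ordered), contrast(mixed))
```

In this toy case the clustering that puts each residue type of column 0 in its own cluster gets the lower (more negative) contrast, while the conserved column contributes nothing to either clustering.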
Specificity residues - high contrast
Globally conserved residues - low contrast
[Figure: contrast (entropy difference) versus rank of residue position for a family of 390 protein kinases; vertical axis from -400 to 0, horizontal axis from 0 to 270; the plot marks a "Specificity region" and a "Conserved region"]
Functional implications of cancer mutations at the protein level
ERBB2 mutations
L49H no alignment data available
C311R strong functional impact, conserved residue
N319D likely functional impact, conserved and specificity residue
E321G likely functional impact, specificity residue
D326G likely functional impact, specificity residue - binding site?
C334S strong functional impact, conserved residue in S-S bridge
V750E strong functional impact, strongly conserved residue
V777A unlikely functional
NF1 mutations
V1308E strong functional impact, buried residue
R1412S strong functional impact
D1849N no alignment data available
A2336T likely functional impact, specificity residue
D326G in ERBB2 - Tyrosine kinase-type cell surface receptor HER2
Examples of mutations predicted as functional by OMA
likely functional impact
specificity residue with conserved neighbors
may be a part of binding site
[Figure label: D->G]
C334S in ERBB2 - Tyrosine kinase-type cell surface receptor HER2
Examples of mutations predicted as functional by OMA
strong functional impact
conserved residue
mutation eliminates SS bridge C334-C338
[Figure labels: C334, C338]
Cancer Genomics
Molecular alterations in pathway context
Glioblastoma copy number alterations
Which events are functional, which are passengers?
RAE: Barry Taylor, Nick Socci, Chris Sander, PLoS ONE 2008
RAE: recurrence, amplitude, extent
Mapping molecular alterations in 200 glioblastoma samples
onto biological pathways
Goal: determine oncogenic programs
www.cbio.mskcc.org/cancergenomics
[Figure: glioblastoma pathway diagram with per-gene alteration frequencies]
RTK/RAS/PI-3K signaling, altered in 85%
Genes: EGFR, ERBB2, PDGFRA, MET, RAS, NF-1, PI-3K Class I, PTEN, AKT, FOXO
P53 signaling, altered in 86%
Genes: CDKN2A (ARF), MDM2, MDM4, TP53
RB signaling, altered in 77%
Genes: CDKN2A (INK4A), CDKN2B, CDKN2C, CDK4, RB1
Other labels: Proliferation, Activated oncogenes, Senescence, Apoptosis, G1/S progression
Alteration labels in the figure, in transcript order:
mutation, amplification in 46%
mutation in 7%
amplification in 14%
amplification in 3%
homozygous deletion in 51%
homozygous deletion in 48%
homozygous deletion in 2%
homozygous deletion in 49%
amplification in 13%
amplification in 5%
mutation, deletion in 35%
amplification in 17%
deletion, mutation in 11%
mutation in 2%
amplification in 2%
mutation in 2%
mutation, amplification in 24%
mutation, deletion in 17%
mutation, deletion in 33%
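A pathway-level figure such as "altered in 85%" can be read as the fraction of samples carrying at least one alteration in any gene of that pathway. The sketch below illustrates that counting under this assumption; the sample identifiers, the per-sample alterations, and the shortened gene lists are hypothetical toy data, not TCGA results.

```python
def pathway_alteration_fraction(alterations, pathway_genes):
    """Fraction of samples with at least one altered gene in the pathway.

    alterations: dict mapping sample id -> set of altered gene symbols.
    pathway_genes: iterable of gene symbols that make up the pathway.
    """
    if not alterations:
        raise ValueError("no samples given")
    genes = set(pathway_genes)
    hits = sum(1 for altered in alterations.values() if genes & set(altered))
    return hits / len(alterations)


# Hypothetical toy data: four samples with made-up alterations.
samples = {
    "sample1": {"EGFR", "CDKN2A"},
    "sample2": {"PTEN"},
    "sample3": {"TP53"},
    "sample4": set(),
}
rtk_genes = ["EGFR", "ERBB2", "PDGFRA", "MET", "PTEN"]
p53_genes = ["TP53", "MDM2", "MDM4", "CDKN2A"]

rtk_fraction = pathway_alteration_fraction(samples, rtk_genes)
p53_fraction = pathway_alteration_fraction(samples, p53_genes)
print(rtk_fraction, p53_fraction)
```

Counting per sample rather than per gene is what lets a pathway reach a high alteration frequency even when each individual gene is altered in only a minority of tumors.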
Cancer program by sub-networks
The Cancer Genome Atlas Pilot Project (2006-2008)
~200 cases of glioblastoma multiforme brain tumors
www.cbio.mskcc.org/cancergenomics
Key: capture biological knowledge in computable form
Facilitate creation and communication of pathway data
Aggregate pathway data in the public domain
Provide easy access for pathway analysis
http://www.pathwaycommons.org
Community Process !
bioPAX
Network pharmacology
Toward Combinatorial Therapy
Simple Models from Complex Data
CoPIA – Nelander et al. - 2008
Perturbation Cell Biology – CoPIA
Sven Nelander
Peter Gennemark & Wei Qing Wang
Bjoern Nilsson, Christine Pratilas, QingBai She
Neal Rosen
Sven Nelander
http://cbio.mskcc.org/copia/
Nelander, Sander et al., Molecular Systems Biology, 2008
Experiment: Dual drug perturbation of MCF7 cancer cell line @ MSKCC
Wei Qing Wang, Sven Nelander & Rosen Lab 2007-2008
Mathematical Model
System Simulation by Bounded ODEs (like Hopfield Network)
Mean Field Model for Combinatorial Perturbation
linear:
$$\frac{dx_i}{dt} = \Big( \sum_j W_{ij} x_j \Big) - \alpha_i x_i + P_i$$
non-linear:
$$\frac{dx_i}{dt} = \beta_i \, f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
$$\frac{dx_i}{dt} = \beta_i \, f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
transfer function f(…)
A simple but effective non-linear device to capture cooperative effects (epistasis, synergy, antagonism)
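To make the non-linear model concrete, the sketch below integrates dx_i/dt = β_i f(Σ_j W_ij x_j + P_i) − α_i x_i with a plain forward-Euler scheme. The slides do not specify the transfer function or the integrator, so the choice of tanh, the step size, and the two-node example network are all assumptions for illustration.

```python
import math


def simulate(W, P, alpha, beta, f=math.tanh, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = beta_i * f(sum_j W_ij x_j + P_i) - alpha_i x_i.

    W: interaction matrix (list of rows), P: perturbation vector,
    alpha: decay rates, beta: response scales, f: bounded transfer function (assumed tanh).
    Starts from x = 0 and returns the state after `steps` steps.
    """
    n = len(P)
    x = [0.0] * n
    for _ in range(steps):
        drive = [sum(W[i][j] * x[j] for j in range(n)) + P[i] for i in range(n)]
        x = [x[i] + dt * (beta[i] * f(drive[i]) - alpha[i] * x[i]) for i in range(n)]
    return x


# Hypothetical two-node network: node 0 is perturbed and activates node 1.
W = [[0.0, 0.0],
     [1.0, 0.0]]
P = [1.0, 0.0]
alpha = [1.0, 1.0]
beta = [1.0, 1.0]

x_final = simulate(W, P, alpha, beta)
print(x_final)
```

Because f is bounded, each x_i stays within β_i/α_i in magnitude, which is the sense in which the ODEs are bounded; at steady state x_i = (β_i/α_i) f(Σ_j W_ij x_j + P_i).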
Optimize the network model
Minimize the discrepancy between prediction and experiment while keeping the model simple!
$$E = E_{SSQ} + E_{STRUCT}$$
E_SSQ: sum of squares pred-expt error
E_STRUCT: structural complexity
$$\frac{dx_i}{dt} = \beta_i \, f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
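The two-term error can be sketched as follows. The slide only names the structural term "structural complexity"; counting the non-zero interactions in W and scaling by a weight `lam` is an assumption made here for illustration, as are the function name and the example numbers.

```python
def model_error(predicted, observed, W, lam=1.0):
    """E = E_SSQ + E_STRUCT.

    E_SSQ: sum of squared differences between predicted and observed readouts.
    E_STRUCT: lam times the number of non-zero interactions in W (assumed form).
    """
    e_ssq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    e_struct = lam * sum(1 for row in W for w in row if w != 0)
    return e_ssq + e_struct


# Hypothetical numbers: one prediction is off by 0.5, and the model has one interaction.
predicted = [1.0, 2.0]
observed = [1.5, 2.0]
W = [[0.0, 1.0],
     [0.0, 0.0]]

error = model_error(predicted, observed, W, lam=0.1)
print(error)
```

The weight on the structural term sets the trade-off: an extra interaction is only worth adding if it lowers the squared error by more than `lam`.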
Dual drug perturbation in MCF7 cancer cell line @ MSKCC
Does the model work?
Leave out one drug combo at a time, compute best model, predict & compare with experiment
$$\frac{dx_i}{dt} = \beta_i \, f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
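The leave-one-out check described above follows a generic pattern that can be sketched independently of the network model. In the sketch below, `fit` and `predict` are placeholders: a trivial mean predictor stands in for the network model fit, and the drug names and readout values are hypothetical.

```python
def leave_one_out(experiments, fit, predict):
    """Hold out each experiment in turn, fit on the rest, and predict the held-out one.

    experiments: list of (condition, observed_readout) pairs.
    Returns a list of (condition, predicted, observed) triples.
    """
    results = []
    for i, (condition, observed) in enumerate(experiments):
        training = experiments[:i] + experiments[i + 1:]
        model = fit(training)
        results.append((condition, predict(model, condition), observed))
    return results


def fit_mean(training):
    """Placeholder model: the mean readout of the training experiments."""
    values = [observed for _, observed in training]
    return sum(values) / len(values)


def predict_mean(model, condition):
    """Placeholder prediction: ignores the condition and returns the fitted mean."""
    return model


# Hypothetical drug pairs and readouts.
experiments = [
    (("drugA", "drugB"), 1.0),
    (("drugA", "drugC"), 2.0),
    (("drugB", "drugC"), 3.0),
]
results = leave_one_out(experiments, fit_mean, predict_mean)
for condition, predicted, observed in results:
    print(condition, predicted, observed)
```

Swapping in a real model only requires replacing `fit_mean` and `predict_mean`; the held-out combination never contributes to the fit that predicts it.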
Power of CoPIA network models
CoPIA = Combinatorial Perturbation Analysis
Capture …
multiple perturbation
epistasis (synergy/antagonism)
feedback loops
time-dependent processes
modification of prior knowledge
CoPIA Network Models - Applications
design combination therapy
refine pathway models
identify drug targets
predict outcomes
Cancer Genomics
Functional consequences of somatic mutations
Molecular alterations in pathway context
Toward Combinatorial Therapy
Combinatorial Perturbations & Network Models
Information Infrastructure
Pathway Commons & Author Fact Deposition
Integrate Pathway Information
Facilitate creation and communication of pathway data
Aggregate pathway data in the public domain
Provide easy access for pathway analysis
http://www.pathwaycommons.org
Community Process !
bioPAX
http://iHOP-net.org
Genes & compounds & interactions from millions of abstracts - instantly
Robert Hoffmann, Benjamin Gross, Chris Sander
iHop-net.org version 2 released 6 Dec 2006
Factoids
digital abstracts to databases
As authors submit a paper they deposit structured facts
to a public database
How to get rich biological knowledge into a computable form
Postdocs wanted
Sander Group – Computational & Systems Biology @ MSKCC in NYC
Upper East Side Tri-I Campus: Sloan Kettering, Cornell Weill, Rockefeller
Cancer genomics (dry)
Network pharmacology (wet)
We pause for station identification…
Summary
Toward Combinatorial Therapy
Use multiple perturbation experiments to build predictive network models
Cancer Genomics
The active sub-pathway model of cancer biology
Pathway Commons
One-stop-shop access to pathway information using the bioPAX common language
Cytoscape, bioPAX & Pathway Commons
Emek Demir, Ethan Cerami
Ben Gross, Robert Hoffmann
Ken Fukuda, bioPAX community
Gary Bader
Perturbational cell biology
Sven Nelander
Wei Qing Wang, Peter Gennemark
Neal Rosen, Christine Pratilas
Small RNAs
Doron Betel
Rob Sheridan, Christina Leslie, Debora Marks
Tom Tuschl, Eric Kandel
Protein Families & Combinatorial Entropy
Boris Reva, Jenya Antipin
Cancer Genomics
Nikolaus Schultz
Barry Taylor, Boris Reva
J Antipin, A Stukalov, John Major
Nick Socci, Sam Singer, Marc Ladanyi
Matt Meyerson, Jordi Barretina
TGFα • 6000 gene RNAi screen
Nikolaus Schultz, Dina Marenstein
Joan Massague, Hakim Djaballah
Support: Bioinformatics Core in the Computational Biology Center at MSKCC
Optimization algorithm 1
outer loop - explore alternative network structures
$$\frac{dx_i}{dt} = \beta_i \, f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
occasionally climb uphill in Monte Carlo fashion
[Figure axes: model, error]
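An outer loop that explores network structures and occasionally accepts a worse one can be sketched with a Metropolis-style acceptance rule. This is an illustration under stated assumptions, not the CoPIA algorithm itself: the single-edge flip proposal, the fixed temperature, and the toy error function (mismatches against a hypothetical target structure) are all chosen here for the example.

```python
import math
import random


def metropolis_structure_search(error, n, steps=2000, temperature=0.1, seed=0):
    """Search over n x n binary network structures.

    Each step flips one randomly chosen edge. Downhill moves are always accepted;
    uphill moves are accepted with probability exp(-delta / temperature).
    Returns the best structure seen and its error.
    """
    rng = random.Random(seed)
    structure = [[0] * n for _ in range(n)]
    current = error(structure)
    best, best_error = [row[:] for row in structure], current
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        structure[i][j] = 1 - structure[i][j]        # propose: toggle edge j -> i
        delta = error(structure) - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current += delta                          # accept the move
            if current < best_error:
                best, best_error = [row[:] for row in structure], current
        else:
            structure[i][j] = 1 - structure[i][j]    # reject: undo the flip
    return best, best_error


# Hypothetical target structure; the toy error counts mismatched edges.
target = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]


def toy_error(structure):
    return sum(structure[i][j] != target[i][j] for i in range(3) for j in range(3))


best, best_error = metropolis_structure_search(toy_error, 3)
print(best, best_error)
```

In a full model search the error function would itself involve fitting the model parameters for each proposed structure; here it is treated as a black box so the acceptance rule stands out.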