Machine Reading for Cancer Panomics Hoifung Poon 1

Page 1: Machine Reading for Cancer Panomics - AKBC · 2018. 11. 4. · Genomics 12 Discovery … ATTCGGATATTTAAGGC ... UCSC Genome Browser, MSR Interactions Track ... Toutanova, David Heckerman,

Machine Reading for

Cancer Panomics

Hoifung Poon

Page 2:

Overview

[Figure: overview diagram. Labels: High-Throughput Data (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), KB, Cancer Systems Modeling, Disease Genes, Drug Targets]

Page 3:

Overview

[Figure: overview diagram. Labels: High-Throughput Data (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets, Extract Pathways from PubMed, Grounded Semantic Parsing]

Page 4:

Precision Medicine

Page 5:

Before Treatment 15 Weeks

Vemurafenib on BRAF-V600 Melanoma

Page 6:

Vemurafenib on BRAF-V600 Melanoma

Before Treatment 15 Weeks 23 Weeks

Page 8:

Traditional Biology

[Figure labels: One hypothesis, Targeted Experiments, Discovery]

Page 9:

Genomics

[Figure labels: High-Throughput Experiments (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), Many hypotheses, ?, Discovery]

Page 10:

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC … Healthy

Disease (e.g., Alzheimer's, cancer)

Genome-Wide Association Studies (GWAS)

2000

2010

“Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that.”

“A Decade Later, Genetic Map Yields Few New Cures”, New York Times, June 2010.

Page 11:

Key Challenges

Human genome: 3 billion base pairs

Potential variations: > 10 million variants

Combination: > 10^1000000 (a 1 followed by 1 million zeros)

Machine learning problem

Atomic features: > 10 million

Feature combination: Too many to enumerate
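The combination count above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch under one simplifying assumption that is not stated on the slide: each of the 10 million variants is treated as simply present or absent, so the number of combinations is 2^(10 million). The function name is illustrative only.

```python
import math


def log10_subset_count(num_variants: int) -> float:
    """Return log10 of 2**num_variants, i.e. the number of present/absent combinations."""
    return num_variants * math.log10(2)


# "> 10 million variants" from the slide; the binary present/absent model is an assumption.
NUM_VARIANTS = 10_000_000
exponent = log10_subset_count(NUM_VARIANTS)

# Print the size of the combination space as a power of ten.
print(f"2^{NUM_VARIANTS:,} is about 10^{exponent:,.0f}")
```

Under this assumption the exponent comes out above three million, which is consistent with the "> 10^1000000" lower bound and shows why enumerating feature combinations is not an option.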

Page 12:

Genomics

How to Scale Discovery?

[Figure labels: High-Throughput Experiments (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), Discovery]

Page 13:

Cancer

Hundreds of mutations

Most are “passenger”, not driver

Can we identify likely drivers?

… ATTCGGATATTTAAGGC …

… ATTCGGGTATTTAAGCC … Normal cells

Tumor cells
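As a small illustration of what "mutations" means at the sequence level, the sketch below compares the two example fragments shown on the slides and lists the positions where they differ. Which fragment is the normal one and which is the tumor one is an assumption made here for illustration, and the function name is hypothetical. Real somatic variant calling works on aligned sequencing reads and is far more involved.

```python
def find_substitutions(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, normal base, tumor base) for every mismatch."""
    if len(normal) != len(tumor):
        raise ValueError("this sketch only handles equal-length fragments")
    return [
        (position, normal_base, tumor_base)
        for position, (normal_base, tumor_base) in enumerate(zip(normal, tumor))
        if normal_base != tumor_base
    ]


# The two fragments from the slides; the normal/tumor assignment is assumed.
NORMAL = "ATTCGGATATTTAAGGC"
TUMOR = "ATTCGGGTATTTAAGCC"

substitutions = find_substitutions(NORMAL, TUMOR)
for position, normal_base, tumor_base in substitutions:
    print(f"position {position}: {normal_base} -> {tumor_base}")
```

Finding the differences is the easy part. The question on the slide is which of the hundreds of differences in a real tumor are drivers, and a position-by-position comparison cannot answer that.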

Page 14:

Panomics

… ATTCGGATATTTAAGGC …

Genome Transcriptome Epigenome

……

Page 15:

Pathway Knowledge

Genes work synergistically in pathways

Page 16:

Why Hard to Identify Drivers?

Complex diseases → Perturb multiple pathways

Hanahan & Weinberg [Cell 2011]

Page 18:

Why Cancer Comes Back?

Subtypes with alternative pathway profile

Compensatory pathways can be activated

EphA2 EphB2

Ovarian Cancer

X

Page 19:

Cancer Systems Modeling

Gene A DNA mRNA Protein Protein Active

Transcription Translation Activation

… ATTCGGATATTTAAGGC …

Functional activity

Mutation effect

Drug Target

……

Page 20:

Gene A DNA mRNA Protein Protein Active

Gene B DNA mRNA Protein Protein Active

Gene C DNA mRNA Protein Protein Active

Transcription Factor

Protein Kinase

Knowledge Model

Page 21:

Gene A DNA mRNA Protein Protein Active

Gene B DNA mRNA Protein Protein Active

Gene C DNA mRNA Protein Protein Active

Transcription Factor

Protein Kinase

?Knowledge Model

Page 23:

Gene A DNA mRNA Protein Protein Active

Gene B DNA mRNA Protein Protein Active

Gene C DNA mRNA Protein Protein Active

Transcription Factor

Protein Kinase

!Knowledge Model

Page 24:

Approach: Graph HMM

Gene A DNA mRNA Protein Protein Active

Transcription Factor

Protein Kinase

Gene B DNA mRNA Protein Protein Active

Gene C DNA mRNA Protein Protein Active

Page 25:

Extract Pathways from PubMed

[Figure: overview diagram. Labels: High-Throughput Data (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets]

Page 26:

PubMed

24 million abstracts

Two new abstracts every minute

Adds over one million every year
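The two growth figures above agree with each other, as a quick back-of-the-envelope check shows. The sketch assumes a constant rate over a 365-day year, which is a simplification.

```python
ABSTRACTS_PER_MINUTE = 2  # "Two new abstracts every minute" from the slide
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a constant rate and a 365-day year

abstracts_per_year = ABSTRACTS_PER_MINUTE * MINUTES_PER_YEAR
print(f"{abstracts_per_year:,} new abstracts per year")
```

Two abstracts per minute works out to just over one million per year, in line with the last bullet.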

Page 27:

Machine Reading

VDR+ binds to SMAD3 to form

JUN expression is induced by SMAD3/4

PMID: 123

PMID: 456

……

Page 31:

Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

Entities: p70(S6)-kinase (PROTEIN), IL-10 (PROTEIN), gp41 (PROTEIN), human monocyte (CELL)

Events:

Involvement (REGULATION): Theme = up-regulation, Cause = activation

up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte

activation (REGULATION): Theme = p70(S6)-kinase

Semantic Parsing
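The annotation on this slide is a nested event structure: events take entities or other events as arguments. A minimal sketch of one way to represent it follows. The class names and the `nesting_depth` helper are hypothetical, and the role assignments (which argument is Theme, Cause, or Site of which trigger) are a reading of the slide's labels rather than something the transcript states explicitly.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Entity:
    text: str
    type: str


@dataclass
class Event:
    trigger: str
    type: str
    # Maps a role name (Theme, Cause, Site) to an entity or a nested event.
    args: dict[str, Union["Entity", "Event"]] = field(default_factory=dict)


def nesting_depth(node: Union[Entity, Event]) -> int:
    """Entities have depth 0; an event is one level deeper than its deepest argument."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((nesting_depth(arg) for arg in node.args.values()), default=0)


# Entities and their types as labeled on the slide.
kinase = Entity("p70(S6)-kinase", "PROTEIN")
il10 = Entity("IL-10", "PROTEIN")
gp41 = Entity("gp41", "PROTEIN")
monocyte = Entity("human monocyte", "CELL")

# Role assignments below are an assumed reading of the slide.
activation = Event("activation", "REGULATION", {"Theme": kinase})
up_regulation = Event(
    "up-regulation", "REGULATION", {"Theme": il10, "Cause": gp41, "Site": monocyte}
)
involvement = Event(
    "Involvement", "REGULATION", {"Theme": up_regulation, "Cause": activation}
)

print(nesting_depth(involvement))
```

The point of the recursive type is that a flat list of binary relations cannot express "activation of X is involved in up-regulation of Y by Z"; the top-level event needs other events as its arguments.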

Page 32:

Long Tail of Variations

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Page 33:

Bottleneck: Annotated Examples

GENIA (BioNLP Shared Task 2009-2013)

1999 abstracts

MeSH: human, blood cell, transcription factor

Challenge for “supervised” machine learning

Can we breach this bottleneck?

Page 34:

Free Lunch #1: Distributional Similarity

Similar context → Probably similar meaning

Annotation as latent variables

Textual expression → Recursive clusters

Unsupervised semantic parsing

Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP 2009. Best Paper Award.
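The intuition "similar context, probably similar meaning" can be illustrated with a toy computation. This is not the unsupervised semantic parsing algorithm from the cited paper; it is a much simpler stand-in that compares predicates by the Jaccard overlap of the argument pairs they occur with. The observation list is a made-up toy corpus and every name in the code is hypothetical.

```python
from collections import defaultdict

# Toy (cause, predicate, theme) observations, invented for illustration.
OBSERVATIONS = [
    ("TP53", "inhibits", "BCL2"),
    ("TP53", "down-regulates", "BCL2"),
    ("MDM2", "inhibits", "TP53"),
    ("MDM2", "down-regulates", "TP53"),
    ("FOXO1", "induces", "A2M"),
    ("TP53", "induces", "ABCB1"),
]


def contexts_by_predicate(observations):
    """Collect, for each predicate, the set of (cause, theme) pairs it appears with."""
    contexts = defaultdict(set)
    for cause, predicate, theme in observations:
        contexts[predicate].add((cause, theme))
    return contexts


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


contexts = contexts_by_predicate(OBSERVATIONS)
same_meaning = jaccard(contexts["inhibits"], contexts["down-regulates"])
different_meaning = jaccard(contexts["inhibits"], contexts["induces"])
print(same_meaning, different_meaning)
```

Predicates that share argument contexts score high and become candidates for the same cluster, without any annotated examples.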

Page 38:

Recursive Clustering

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Theme: BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, ……

Cause: TP53, Tumor suppressor P53, ……

inhibits, down-regulates, suppresses, inhibition, …

Page 39:

Free Lunch #2: Existing KBs

Many KBs available

Gene/Protein: GenBank, UniProt, …

Pathways: NCI, Reactome, KEGG, BioCarta, …

Annotation as latent variables

Textual expression → Table, column, join, …

Grounded semantic parsing

Page 41:

Entity Extraction

HGNC

ID      Symbol   Alias
990     BCL2     B-cell CLL/Lymphoma 2, …
11998   TP53     Tumor suppressor P53, …
…       …        …

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……
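A dictionary lookup against the HGNC table is one simple way to ground the mentions in these sentences. The sketch below is an assumption about how such a lookup could work, not the system described in the talk. The IDs, symbols, and the two long aliases come from the slide; the short aliases "BCL-2" and "P53" are added here as assumptions so that all four example sentences resolve, and the function name is hypothetical.

```python
import re

# symbol -> (HGNC ID, aliases). "BCL-2" and "P53" are assumed aliases, not from the slide.
HGNC = {
    "BCL2": ("990", ["BCL2", "B-cell CLL/Lymphoma 2", "BCL-2"]),
    "TP53": ("11998", ["TP53", "Tumor suppressor P53", "P53"]),
}


def find_genes(sentence: str) -> dict[str, str]:
    """Return {symbol: HGNC ID} for every gene with an alias present in the sentence."""
    found = {}
    for symbol, (hgnc_id, aliases) in HGNC.items():
        for alias in aliases:
            # Require non-word characters on both sides so "P53" does not match inside "TP53".
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                found[symbol] = hgnc_id
                break
    return found


SENTENCES = [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "BCL2 transcription is suppressed by P53 expression.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …",
]

for sentence in SENTENCES:
    print(find_genes(sentence))
```

All four surface forms map to the same two identifiers, which is what lets a later step treat the sentences as expressions of a single KB fact.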

Page 43:

Relation Extraction

NCI-PID Pathway KB

Regulation   Theme   Cause
Positive     A2M     FOXO1
Positive     ABCB1   TP53
Negative     BCL2    TP53
…            …       …

TP53 inhibits BCL2.

Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.

BCL2 transcription is suppressed by P53 expression.

The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …

……

Grounded Learning
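Grounding text in the pathway KB can be sketched as classic distant supervision: any sentence that mentions both the theme and the cause of a KB row is taken as a noisy training example for that row's regulation type. This is a simplified illustration, not the grounded semantic parsing method of the talk. The KB rows come from the slide; the function name, the pre-tagged input format, and the third sentence are invented for the example.

```python
# Toy pathway KB keyed by (theme, cause), copied from the NCI-PID rows on the slide.
PATHWAY_KB = {
    ("A2M", "FOXO1"): "Positive",
    ("ABCB1", "TP53"): "Positive",
    ("BCL2", "TP53"): "Negative",
}


def distant_labels(tagged_sentences):
    """Label a sentence with every KB regulation whose theme and cause both occur in it."""
    labeled = []
    for text, genes in tagged_sentences:
        for (theme, cause), regulation in PATHWAY_KB.items():
            if theme in genes and cause in genes:
                labeled.append((text, regulation, theme, cause))
    return labeled


# Hypothetical input: sentences already tagged with normalized gene symbols.
TAGGED = [
    ("TP53 inhibits BCL2.", {"TP53", "BCL2"}),
    ("BCL2 transcription is suppressed by P53 expression.", {"TP53", "BCL2"}),
    ("TP53 is frequently mutated.", {"TP53"}),  # invented sentence with one gene only
]

training_examples = distant_labels(TAGGED)
for example in training_examples:
    print(example)
```

Co-occurrence does not guarantee that a sentence actually expresses the KB relation, so labels produced this way are noisy; treating the annotation as a latent variable is one way to absorb that noise.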

Page 44:

Question Answering w.r.t. KB

Poon, “Grounded Unsupervised Semantic Parsing”. ACL 2013.

System   Accuracy   Training
ZC07     84.6       Supervised
FUBL     82.8       Supervised
GUSP     83.5       Unsupervised

Page 45:

Pathway Extraction

Generalize distant supervision:

Nested events in KB likely occur in semantic parse of some sentence

Prior: Favor semantic parse grounded in KB

Outperformed the majority of participants in original GENIA Event Shared Task

Parikh, Poon, Toutanova. In Progress.

Page 46:

Literome

http://literome.azurewebsites.net

Poon et al., “Literome: PubMed-Scale Genomic Knowledge Base in the Cloud”, Bioinformatics 2014.

Page 47:

PubMed-Scale Extraction

Preliminary pass:

2 million instances

13,000 genes, 870,000 unique regulations

Applications:

UCSC Genome Browser, MSR Interactions Track

Expression profile modeling

Validate de novo pathway prediction

Etc.

Poon, Toutanova, Quirk, “Distant Supervision for Cancer Pathway Extraction from Text”. PSB 2015. To appear.

Page 48:

Machine Science

Evans & Rzhetsky, “Machine Science”. Science, Vol. 329, 2010.

Page 54:

Machine Science

[Figure labels: Big Data, Rich Knowledge (KB), Deep Model, Hypotheses, Experiments]

Page 55:

Roadmap

Extract richer knowledge:

Cell type, experimental condition, …

Hedging, negation, …

Formulate coherent models:

Supporting evidence, contradiction, …

Intellectual gaps, hypotheses, …

Integrate w. data & experiments:

Cancer panomics → Driver genes / pathways

Single-drug response → Drug combo prioritization

Page 56:

Big Mechanism

$42-million program

Reading, Assembly, Explanation

Domain: Cancer signaling pathways

We are in

PI: Andrey Rzhetsky

Co-PI w. James Evans, Ross King

Page 57:

Berkeley AMP Lab

OHSU

Microsoft Research

Page 58:

We Have Digitized Life

Page 59:

Next: Digitize Medicine

Knock down genes A, B, C → Cure

Page 60:

Summary

Precision medicine is the future

Cancer systems modeling

Graphical model: Pathways + Panomics data

Extract pathways from PubMed

Machine reading by grounded semantic parsing

Literome: KB for genomic medicine

Page 61:

Acknowledgments

U. Chicago: Andrey Rzhetsky, Kevin White

OHSU: Brian Druker, Jeff Tyner

Berkeley AMP Lab: David Patterson

U. Wisconsin: Anthony Gitter

Microsoft Research: Chris Quirk, Kristina Toutanova, David Heckerman, Ankur Parikh, Lucy Vanderwende, Bill Bolosky, Ravi Pandya

Page 62:

Summary

[Figure: overview diagram. Labels: High-Throughput Data (… ATTCGGATATTTAAGGC …, … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets]