
Finding biologically relevant information using ADIOS

ThaiBinh’s final project for CBB545:

April 19, 2007


The current state of affairs in natural language processing

• NLP: Converting human language into representations that are easier for computers to understand

• Most natural language processing requires a tagged training set

• Tagging = time consuming/costly

http://en.wikipedia.org/wiki/Natural_language_processing


http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99T42



ADIOS

• “Unsupervised learning of natural languages”

• ADIOS: Automatic distillation of structure

• Input: A corpus of characters (most likely, untagged sentences)

• Output: A grammar

“Unsupervised learning of natural languages”, Solan, et al., PNAS vol. 102, August 2005.


A very quick primer on grammars

• A set of “rules” for making a “sentence”

• Ex.

The grammar:
S → S + S
S → 1
S → a

A possible derivation:
S ⇒ S + S ⇒ S + S + S ⇒ 1 + S + S ⇒ 1 + 1 + S ⇒ 1 + 1 + a
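The derivation can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the names `RULES` and `leftmost_derivation` are illustrative, and the sketch assumes that each step rewrites the leftmost S, which is one way to obtain the sequence shown.

```python
# Toy grammar from the slide: S -> S + S | 1 | a
RULES = {"S": [["S", "+", "S"], ["1"], ["a"]]}

def leftmost_derivation(choices, start="S"):
    """Apply the chosen rule (by index) to the leftmost nonterminal at each step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# Rule indices: 0 = "S + S", 1 = "1", 2 = "a"
steps = leftmost_derivation([0, 0, 1, 1, 2])
print(" => ".join(steps))
```

Choosing different rule indices gives other sentences of the same grammar, for example `[0, 2, 1]` for `a + 1`.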

• We can visualize the expansion as a tree, and read the leaves

[Figure: derivation tree for 1 + 1 + a. The root S expands to S + S, one of those S nodes expands again to S + S, and the leaves read 1, 1, a.]


ADIOS

• The system builds a graph using the first sentence

• With each successive sentence, it tries to find overlapping “subpaths” (patterns)

• It also tries to generalize the paths by looking for equivalence classes

• Search for patterns and equivalence classes until no new ones are found
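To make the two ideas concrete, here is a deliberately simplified Python sketch. It is not the ADIOS algorithm: the published method works on a graph and uses a statistical criterion to decide which subpaths are significant, none of which is reproduced here. The sketch only assumes that a "subpath" can be approximated by a word n-gram shared by several sentences, and that an equivalence-class candidate is a set of words seen in the same slot between identical neighbours. The function names are hypothetical.

```python
from collections import defaultdict

def shared_subpaths(sentences, n=2, min_count=2):
    """Word n-grams that occur in at least `min_count` different sentences."""
    seen = defaultdict(set)
    for idx, sent in enumerate(sentences):
        words = sent.split()
        for i in range(len(words) - n + 1):
            seen[tuple(words[i:i + n])].add(idx)
    return {gram for gram, where in seen.items() if len(where) >= min_count}

def equivalence_candidates(sentences):
    """Group words that fill the same slot between identical neighbours."""
    slots = defaultdict(set)
    for sent in sentences:
        words = ["*"] + sent.split() + ["#"]  # begin/end markers
        for left, mid, right in zip(words, words[1:], words[2:]):
            slots[(left, right)].add(mid)
    return {ctx: ws for ctx, ws in slots.items() if len(ws) > 1}

corpus = [
    "Hugo has a presentation today",
    "Laura has a presentation next Thursday",
    "ThaiBinh has a presentation today",
]
patterns = shared_subpaths(corpus, n=3)
classes = equivalence_candidates(corpus)
print(sorted(patterns))
print(classes[("*", "has")])
```

On this tiny corpus the words that start a sentence and precede "has" end up grouped together, which is the kind of generalization an equivalence class captures.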

ADIOS: A quick example

• Input a corpus of sentences

* Chong had a presentation in CBB545 #
* on Tuesday Chong had a presentation #
* next Thursday Laura has a presentation #
* ThaiBinh has a presentation in CBB545 #
* ThaiBinh has a presentation today #
* today ThaiBinh has a presentation #
* Chong had a presentation #
* Hugo has a presentation in CBB545 today #
* ThaiBinh has a presentation in CBB545 today #
* Laura has a presentation in CBB545 next Thursday #
* Hugo has a presentation today #
* Chong had a presentation on Tuesday #
* Chong had a presentation in CBB545 on Tuesday #
* Laura has a presentation next Thursday #
* in CBB545 ThaiBinh has a presentation #
* today ThaiBinh has a presentation in CBB545 #

• Output is a grammar

P18 → (a,presentation)
P19 → (E20,has,P18)
E20 → {Hugo,Laura,ThaiBinh}
P21 → (Chong,had)
P22 → (in,CBB545)
P23 → (P19,P22)
P24 → (P21,P18)


Expanding P23:

(P19,P22)
    (E20,has,P18)
        {Hugo,Laura,ThaiBinh}
        (a,presentation)
    (in,CBB545)

Expanding P24:

(P21,P18)
    (Chong,had)
    (a,presentation)

A second grammar for the same corpus:

P18 → (Chong,had,a)
P19 → (has,a)
P20 → (E21,P19,presentation)
E21 → {Hugo,Laura,ThaiBinh}
P22 → (in,CBB545)
P23 → (P20,P22)
P24 → (P18,presentation)


Two different grammars: Same end result
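The "same end result" claim can be checked by expanding both rule sets. The sketch below is an illustration, not ADIOS output handling: the dictionary encoding (`"seq"` for patterns, `"alt"` for equivalence classes), the function names, and the choice of P23 and P24 as the top-level patterns are all assumptions made for this example.

```python
from itertools import product

GRAMMAR_A = {
    "P18": ("seq", ["a", "presentation"]),
    "P19": ("seq", ["E20", "has", "P18"]),
    "E20": ("alt", ["Hugo", "Laura", "ThaiBinh"]),
    "P21": ("seq", ["Chong", "had"]),
    "P22": ("seq", ["in", "CBB545"]),
    "P23": ("seq", ["P19", "P22"]),
    "P24": ("seq", ["P21", "P18"]),
}
GRAMMAR_B = {
    "P18": ("seq", ["Chong", "had", "a"]),
    "P19": ("seq", ["has", "a"]),
    "P20": ("seq", ["E21", "P19", "presentation"]),
    "E21": ("alt", ["Hugo", "Laura", "ThaiBinh"]),
    "P22": ("seq", ["in", "CBB545"]),
    "P23": ("seq", ["P20", "P22"]),
    "P24": ("seq", ["P18", "presentation"]),
}

def expand(symbol, grammar):
    """Return every word sequence (as a string) that `symbol` can produce."""
    if symbol not in grammar:
        return {symbol}  # a plain word
    kind, parts = grammar[symbol]
    if kind == "alt":  # equivalence class: any one member
        return set().union(*(expand(p, grammar) for p in parts))
    # pattern: concatenate one expansion of each part, in order
    combos = product(*(expand(p, grammar) for p in parts))
    return {" ".join(c) for c in combos}

def language(grammar, tops=("P23", "P24")):
    """Sentences produced by the assumed top-level patterns."""
    return set().union(*(expand(t, grammar) for t in tops))

sentences_a = language(GRAMMAR_A)
sentences_b = language(GRAMMAR_B)
print(sorted(sentences_a))
print(sentences_a == sentences_b)
```

Both grammars group the words differently but, under these assumptions, produce the same four sentences.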


ADIOS

• Able to generate sentences using the grammar it created

• Can test whether a new sentence fits one of the grammar rules

• Can be applied to a wide variety of domains
  – Bible in various languages
  – Classify protein function based on amino acid sequence


The Project

• Use ADIOS to create grammar rules from biomedical sentences

• Look for gene-gene associations

• Look for gene-disease associations

• Infer information about a pair of genes in an unseen sentence based on its sentence structure (pattern)

ABNER: Find mentions of genes

MetaMap: Find mentions of diseases

“The clinical effects of cortisone and ACTH (adrenocorticotropic hormone) in the collagen diseases: acute disseminated lupus erythematosus, periarteritis nodosa, dermatomyositis and scleroderma; interim report.”

Phrase: "in the collagen diseases"
Meta Candidates (6)
  1000 C0009326:Collagen Diseases [Disease or Syndrome]

Phrase: "periarteritis nodosa,"
Meta Candidates (4)
  1000 C0031036:Periarteritis Nodosa (Polyarteritis Nodosa) [Disease or Syndrome]

Phrase: "dermatomyositis"
Meta Candidates (2)
  1000 C0011633:Dermatomyositis [Disease or Syndrome]
  1000 C0221056:Dermatomyositis (Dermatomyositis, Adult Type) [Disease or Syndrome]

Phrase: "scleroderma"
Meta Candidates (4)
  1000 C0011644:Scleroderma [Disease or Syndrome]
  1000 C0036421:Scleroderma (Systemic Scleroderma) [Disease or Syndrome]
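Candidate lines like these are regular enough to pull apart with a regular expression. The sketch below is only an illustration based on the excerpt shown here: the pattern `CANDIDATE` and the function `parse_candidates` are hypothetical, and the line layout (score, concept ID, name, semantic type in brackets) is assumed from this sample rather than taken from any MetaMap specification.

```python
import re

# One candidate line: score, concept ID, name, [semantic type]
CANDIDATE = re.compile(r"^\s*(\d+)\s+(C\d+):(.+?)\s+\[([^\]]+)\]\s*$")

def parse_candidates(lines):
    """Return (score, concept_id, name, semantic_type) for each candidate line."""
    rows = []
    for line in lines:
        m = CANDIDATE.match(line)
        if m:  # phrase and header lines do not match and are skipped
            score, cui, name, semtype = m.groups()
            rows.append((int(score), cui, name, semtype))
    return rows

sample = [
    'Phrase: "dermatomyositis"',
    "Meta Candidates (2)",
    "  1000 C0011633:Dermatomyositis [Disease or Syndrome]",
    "  1000 C0221056:Dermatomyositis (Dermatomyositis, Adult Type) [Disease or Syndrome]",
]
rows = parse_candidates(sample)
diseases = [name for _, _, name, semtype in rows if semtype == "Disease or Syndrome"]
print(diseases)
```

Filtering on the semantic type is what turns the raw candidate list into a list of disease mentions.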


The Project: Input

• Replace any mention of a gene with a generic term

• Ex.

Smad7 antagonizes TGF-{beta} signaling in the nucleus
→ GeneOne antagonizes GeneTwo signaling in the nucleus

PTEN negatively regulates expression of cyclin D1
→ GeneOne negatively regulates expression of GeneTwo

• Replace any mention of a gene/disease with a generic term

• Ex.

p16 is consistently expressed in endometrial tubal metaplasia
→ GeneOne is consistently expressed in DiseaseOne

The expression of cyclin D1 is more often correlated with prognosis in cancers of ampulla of vater
→ The expression of GeneOne is more often correlated with prognosis in DiseaseOne
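The replacement step can be sketched in a few lines of Python. This is an assumption-laden illustration: `generalize` and `ORDINALS` are hypothetical names, and the gene and disease mentions are passed in by hand here, standing in for whatever ABNER and MetaMap would report.

```python
ORDINALS = ["One", "Two", "Three", "Four"]  # enough for short sentences

def generalize(sentence, genes=(), diseases=()):
    """Replace each listed mention with GeneOne, GeneTwo, DiseaseOne, ... in reading order."""
    spans = []
    for kind, mentions in (("Gene", genes), ("Disease", diseases)):
        for mention in mentions:
            start = sentence.find(mention)
            if start >= 0:
                spans.append((start, start + len(mention), kind))
    spans.sort()  # number the mentions left to right

    counts = {"Gene": 0, "Disease": 0}
    out, cursor = [], 0
    for start, end, kind in spans:
        if start < cursor:
            continue  # skip a mention that overlaps one already replaced
        out.append(sentence[cursor:start])
        out.append(kind + ORDINALS[counts[kind]])
        counts[kind] += 1
        cursor = end
    out.append(sentence[cursor:])
    return "".join(out)

print(generalize("PTEN negatively regulates expression of cyclin D1",
                 genes=["PTEN", "cyclin D1"]))
print(generalize("p16 is consistently expressed in endometrial tubal metaplasia",
                 genes=["p16"], diseases=["endometrial tubal metaplasia"]))
```

Numbering by position in the sentence, rather than by the order of the input lists, keeps GeneOne as the first gene a reader encounters.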

Let ADIOS work its “magic”…

Out pop patterns that describe the sentences (the grammar)

“Tagging” the patterns

Pattern: GeneOne {antagonizes | negatively regulates} GeneTwo
Tag: inhibits

Pattern: GeneOne {increases transcription | positively regulates} GeneTwo
Tag: activates

Seeing a new sentence

Ras/Erk pathway positively regulates Jak1/STAT6 activity

Matching pattern: GeneOne {increases transcription | positively regulates} GeneTwo (tagged “activates”)

Result: Ras/Erk activates Jak1/STAT6
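The lookup from a new sentence to a tag can be sketched as follows. This is a simplification made for illustration: plain substring matching stands in for matching against ADIOS patterns, and `TAGGED_PATTERNS` and `extract_relation` are hypothetical names. The phrase lists are the ones shown on the tagging slides.

```python
# Hand-assigned tags for the phrases that appear in each pattern.
TAGGED_PATTERNS = {
    "inhibits": ["antagonizes", "negatively regulates"],
    "activates": ["increases transcription", "positively regulates"],
}

def extract_relation(sentence, gene_one, gene_two):
    """Return (gene_one, tag, gene_two) if a tagged phrase links the two genes."""
    start = sentence.find(gene_one)
    end = sentence.find(gene_two)
    if start < 0 or end < 0 or end < start:
        return None
    # Only the text between the two mentions is inspected.
    between = sentence[start + len(gene_one):end]
    for tag, phrases in TAGGED_PATTERNS.items():
        if any(phrase in between for phrase in phrases):
            return (gene_one, tag, gene_two)
    return None

triple = extract_relation(
    "Ras/Erk pathway positively regulates Jak1/STAT6 activity",
    "Ras/Erk", "Jak1/STAT6",
)
print(triple)
```

A sentence whose linking phrase is not in any tagged pattern returns `None`, which corresponds to a sentence the grammar has not learned to interpret.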

The big picture… Automatic extraction of regulation

Smad7 antagonizes TGF-{beta} signaling in the nucleus
PTEN negatively regulates expression of cyclin D1
Ras/Erk pathway positively regulates Jak1/STAT6 activity

GeneOne   action     GeneTwo
Smad7     inhibit    TGF-Beta
PTEN      inhibit    Cyclin D1
Ras/ERK   activate   Jak1/Stat6

Loss of p53 Expression Correlates with… Neck Cancer
p16 is consistently expressed in endometrial tubal metaplasia

GeneOne   expression       DiseaseOne
p53       down-regulated   Neck cancer
p16       upregulated      Endo. tubal metaplasia

Potential (inevitable) problems

• The data/sentences
  – Amount
    • ADIOS’s data usually had 1000’s of sentences
  – Quality
    • ABNER/MetaMap (used for finding gene/disease mentions) are not always accurate

• Is it even feasible?
  – Biologists/scientists are very creative in coming up with various ways of saying the same thing

  – Stay tuned…