2015 6 bd2k_biobranch_knowbio
TRANSCRIPT
![Page 1: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/1.jpg)
Three TSRI Tools for capturing, sharing, and applying community knowledge
Benjamin Good
The Scripps Research Institute
@bgood
![Page 2: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/2.jpg)
Outline
• Gene wiki, quick recap, update
• Introducing:
  – http://knowledge.bio
  – http://biobranch.org
![Page 3: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/3.jpg)
Gene Wiki (on Wikipedia)
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
![Page 4: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/4.jpg)
Wikidata
is a
regulates
Interacts with
Protein
Glycoprotein
Neural development
VLDL receptor
Amyloid precursor protein
Property:P31
Property:P128
Property:P129
Q8054
Q187126
Q1345738
Q1979313
Q423510
Q414043
Reelin
http://www.wikidata.org/wiki/Q414043
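The statement graph on this slide can be written down as subject-property-object triples. A minimal Python sketch, assuming the identifiers on the slide pair up with the labels in the order they are listed (only Q414043 = Reelin is confirmed by the URL on the slide) and that the edges are as the figure appears to draw them. `LABELS`, `STATEMENTS`, `render` and `objects_of` are illustrative names, not part of any Wikidata API:

```python
# Labels for the identifiers shown on the slide. The pairing of numbers to
# labels is an ASSUMPTION based on listing order; only Q414043 (Reelin) is
# confirmed by the URL on the slide.
LABELS = {
    "Q414043": "Reelin",
    "Q8054": "Protein",
    "Q187126": "Glycoprotein",
    "Q1345738": "Neural development",
    "Q1979313": "VLDL receptor",
    "Q423510": "Amyloid precursor protein",
    "P31": "is a",
    "P128": "regulates",
    "P129": "interacts with",
}

# (subject, property, object) statements about Reelin, as read from the figure.
STATEMENTS = [
    ("Q414043", "P31", "Q8054"),
    ("Q414043", "P31", "Q187126"),
    ("Q414043", "P128", "Q1345738"),
    ("Q414043", "P129", "Q1979313"),
    ("Q414043", "P129", "Q423510"),
]


def render(statement):
    """Turn an identifier triple into a human-readable sentence."""
    return " ".join(LABELS.get(part, part) for part in statement)


def objects_of(subject, prop):
    """Return every object linked to `subject` through property `prop`."""
    return [o for s, p, o in STATEMENTS if s == subject and p == prop]


for statement in STATEMENTS:
    print(render(statement))
```

Because every item and property is an identifier rather than free text, a question such as "what does Reelin interact with?" becomes a simple filter: `objects_of("Q414043", "P129")`.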
![Page 5: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/5.jpg)
A computable Gene (& Disease & Drug) Wiki
Structured data
Here now
Soon
Downstream (but exciting potential...)
?? ?
Wikipedia(s)
![Page 6: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/6.jpg)
Status Update
• Genes, diseases (and, any minute now, drugs) are in Wikidata
• Demonstrations of incorporating this content in Wikipedia are functional
• We’ve been slowed a little by Wikidata governance policies (they temporarily blocked our bot)
![Page 7: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/7.jpg)
Wikidata activities
• YOU can help!
• https://www.wikidata.org/wiki/User:ProteinBoxBot
Join in one of these discussions and voice your support
![Page 8: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/8.jpg)
Outline
• Gene wiki
• knowledge.bio
• biobranch.org
![Page 9: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/9.jpg)
Knowledge.bio
• Provides a concept-centric view of the scientific literature.
  – You search and interact with concepts rather than documents.
• Main purpose is hypothesis generation
• 2 data sources mined from PubMed
  – 70 million explicit semantic relations (‘triples’)
  – 200 million implicit gene-disease associations
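One way to picture the concept-centric view is as an index keyed by concept rather than by document. A minimal sketch, assuming triples arrive as (subject, predicate, object, source) tuples; the triples below use placeholder concept names and are invented for illustration, not taken from the knowledge.bio database, and the function names are hypothetical:

```python
from collections import defaultdict

# Placeholder triples: (subject, predicate, object, source sentence).
# These are INVENTED for illustration only.
TRIPLES = [
    ("gene A", "ASSOCIATED_WITH", "disease X", "sentence 1"),
    ("gene A", "INTERACTS_WITH", "gene B", "sentence 2"),
    ("gene B", "CAUSES", "disease Y", "sentence 3"),
]


def build_concept_index(triples):
    """Map each concept to every triple in which it appears."""
    index = defaultdict(list)
    for triple in triples:
        subject, _predicate, obj, _source = triple
        index[subject].append(triple)
        if obj != subject:
            index[obj].append(triple)
    return index


def related_concepts(index, concept):
    """Return the concepts that share at least one triple with `concept`."""
    related = set()
    for subject, _predicate, obj, _source in index.get(concept, []):
        related.add(obj if subject == concept else subject)
    return sorted(related)


index = build_concept_index(TRIPLES)
print(related_concepts(index, "gene A"))
```

Keeping the source sentence in each triple is what lets a viewer jump from a relation back to the text it was extracted from.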
![Page 10: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/10.jpg)
http://knowledge.bio
![Page 11: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/11.jpg)
Explicit relations view
Search for concept
View related concepts
(67 results)
Filter results
![Page 12: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/12.jpg)
View text where triple was extracted
![Page 13: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/13.jpg)
Diseases implicitly related to queried concept: CYP2R1
![Page 14: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/14.jpg)
Concepts linking CYP2R1 to Smith-Lemli Opitz Syndrome
![Page 15: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/15.jpg)
Table views complemented by a Network view for taking notes..
![Page 16: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/16.jpg)
Network (“Map”) view
Cytoscape.js canvas
Auto and manual layout
Save Map as local text file
Load saved map
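Saving and reloading a map amounts to a round trip through a text file. The sketch below assumes the map is stored as JSON in the nodes/edges element layout that Cytoscape.js accepts; the actual file format used by knowledge.bio is not stated on the slide, and `save_map` / `load_map` are hypothetical helper names:

```python
import json
import os
import tempfile


def save_map(elements, path):
    """Write the map elements to a local text file as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(elements, handle, indent=2)


def load_map(path):
    """Read a previously saved map back into memory."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# A two-node map in Cytoscape.js-style element JSON (ASSUMED layout).
concept_map = {
    "nodes": [{"data": {"id": "CYP2R1"}}, {"data": {"id": "DHCR7"}}],
    "edges": [{"data": {"id": "e1", "source": "CYP2R1", "target": "DHCR7"}}],
}

path = os.path.join(tempfile.gettempdir(), "concept_map.json")
save_map(concept_map, path)
restored = load_map(path)
print(restored == concept_map)
```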
![Page 17: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/17.jpg)
Step 1: find candidate relation
What new diseases might be related to CYP2R1?
Implicit prediction
![Page 18: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/18.jpg)
Step 2: find linking concepts
How is CYP2R1 related to SLO syndrome?
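Finding linking concepts can be sketched as a set intersection: the concepts related to both the gene and the disease. The neighbor sets below are a toy example written for illustration, not output from knowledge.bio, and `linking_concepts` is a hypothetical name:

```python
# Toy neighbor sets: concept -> concepts it is related to.
# ILLUSTRATIVE data only, not extracted from knowledge.bio.
NEIGHBORS = {
    "CYP2R1": {"DHCR7", "vitamin D"},
    "SLO syndrome": {"DHCR7", "cholesterol"},
}


def linking_concepts(neighbors, concept_a, concept_c):
    """Return concepts related to both endpoints (the 'B' in A-B-C)."""
    shared = neighbors.get(concept_a, set()) & neighbors.get(concept_c, set())
    return sorted(shared)


print(linking_concepts(NEIGHBORS, "CYP2R1", "SLO syndrome"))
```

Each shared concept is a candidate starting point for an explanation of the predicted gene-disease relation.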
![Page 19: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/19.jpg)
Step 3: Start building a hypothesis to explain the predicted relation
Do CYP2R1 and DHCR7 participate in a process related to SLO syndrome?
Explicit relations view
![Page 20: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/20.jpg)
Warning, may prove addictive..
![Page 21: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/21.jpg)
Next steps for knowledge.bio
• Enhanced community sharing
• Integration with http://ndexbio.org from the Cytoscape consortium
• Allow user actions to feed back into underlying NLP systems
• Include access to other structured knowledge sources, e.g. Gene Ontology
![Page 22: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/22.jpg)
Outline
• Gene wiki
• knowledge.bio
• biobranch.org
![Page 23: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/23.jpg)
Breast cancer prognosis: 10 year survival?
find patterns
Inferring class predictors
No
van't Veer, Laura J., et al. "Gene expression profiling predicts clinical outcome of breast cancer." Nature 415.6871 (2002): 530-536.
Yes
make predictions on new samples
No
Yes
10 year survival?
![Page 24: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/24.jpg)
find patterns make predictions
inferring survival predictors
1) select genes
2) infer predictor from data (e.g. decision tree, SVM, etc.)
Out of the 25,000+ genes, which small set works together the best?
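The two steps can be sketched end to end in plain Python. This is a deliberately simple stand-in, assuming a difference-of-class-means ranking for step 1 and a nearest-centroid classifier for step 2 (the slide names decision trees and SVMs; any learner could be swapped in). Gene names and expression values are made up:

```python
from statistics import mean


def rank_genes(samples, labels):
    """Step 1: rank genes by absolute difference between class means."""
    def score(gene):
        pos = [s[gene] for s, y in zip(samples, labels) if y]
        neg = [s[gene] for s, y in zip(samples, labels) if not y]
        return abs(mean(pos) - mean(neg))
    return sorted(samples[0].keys(), key=score, reverse=True)


def train_centroids(samples, labels, genes):
    """Step 2: learn one mean expression profile per class."""
    centroids = {}
    for cls in (True, False):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = {g: mean(r[g] for r in rows) for g in genes}
    return centroids


def predict(centroids, genes, sample):
    """Assign the class whose centroid is closest (squared distance)."""
    def distance(cls):
        return sum((sample[g] - centroids[cls][g]) ** 2 for g in genes)
    return min(centroids, key=distance)


# Toy data: True means the patient survived 10 years. Values are INVENTED.
samples = [
    {"g1": 5.0, "g2": 1.0, "g3": 2.0},
    {"g1": 4.8, "g2": 3.0, "g3": 2.1},
    {"g1": 1.0, "g2": 2.0, "g3": 2.0},
    {"g1": 1.2, "g2": 2.2, "g3": 1.9},
]
labels = [True, True, False, False]

selected = rank_genes(samples, labels)[:1]  # keep the single best gene
model = train_centroids(samples, labels, selected)
print(selected, predict(model, selected, {"g1": 4.5, "g2": 2.0, "g3": 2.0}))
```

With 25,000+ genes the ranking step scores each gene on its own, which is exactly why it cannot tell which small set works together best.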
No
Yes
10 year survival?
![Page 25: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/25.jpg)
Problem: gene selection instability
instability: different methods, different datasets produce different gene sets for the same phenotype [1]
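One simple way to put a number on instability is the overlap between gene sets selected by different methods or on different datasets. A minimal sketch using the Jaccard index; this metric is my choice for illustration and is not necessarily what the cited paper uses, and the gene sets are placeholders:

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Overlap of two gene sets: |A and B| / |A or B| (1.0 if both empty)."""
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(gene_sets):
    """Average overlap across every pair of selected gene sets."""
    pairs = list(combinations(gene_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Placeholder signatures, e.g. from three different methods or datasets.
signatures = [
    {"g1", "g2", "g3"},
    {"g2", "g3", "g4"},
    {"g5", "g6", "g7"},
]
print(mean_pairwise_jaccard(signatures))
```

A value near 0 means the signatures barely agree; a value near 1 means selection is stable.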
[1] Griffith, Obi L., et al. "A robust prognostic signature for hormone-positive node-negative breast cancer." Genome Medicine 5.10 (2013).
![Page 26: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/26.jpg)
Problem: the validation gap
training data, test data
validation
validation: predictive signatures often perform worse on independent data created for validation.
Photograph by Richard Hallman, National Geographic Adventure Blog
![Page 27: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/27.jpg)
find patterns
make predictions
Adding prior knowledge to the discovery algorithm
<10 yr survival
>10 yr survival
![Page 28: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/28.jpg)
Ex.) Network guided forests
Use protein interaction network to find good gene combinations
Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
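The core idea, using a network to narrow down which gene combinations are worth trying, can be sketched as follows. This is not the published Network-Guided Forests algorithm, only an illustration of restricting candidate pairs to genes that interact; the network and gene names are placeholders:

```python
from itertools import combinations

# Toy interaction network and list of measured genes (PLACEHOLDER names).
INTERACTIONS = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
MEASURED_GENES = ["g1", "g2", "g3", "g4"]


def network_guided_pairs(interactions, genes):
    """Keep only gene pairs that are measured and linked in the network."""
    measured = set(genes)
    return sorted(
        {
            tuple(sorted(edge))
            for edge in interactions
            if edge[0] in measured and edge[1] in measured and edge[0] != edge[1]
        }
    )


all_pairs = list(combinations(sorted(MEASURED_GENES), 2))
guided_pairs = network_guided_pairs(INTERACTIONS, MEASURED_GENES)
print(len(all_pairs), "unrestricted pairs;", len(guided_pairs), "network-supported pairs")
```

The prior knowledge shrinks the search space before any expression data is looked at, which is one way to reduce the chance of picking combinations that only work by accident.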
![Page 29: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/29.jpg)
But most knowledge is not structured
[Chart: number of articles added to PubMed per year, 2000 to 2012; y-axis from 500,000 to 1,000,000]
>100 publications/hour
>194715 publications linked to “breast cancer” since 2000 http://tinyurl.com/brsince2000
![Page 30: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/30.jpg)
How can we use unstructured knowledge to improve predictors?
Need a distributed network of intelligent systems that are good at reading and hypothesizing
Like you and your friends
![Page 31: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/31.jpg)
A game with a purpose: The Cure
• http://genegames.org/cure
• http://games.jmir.org/2014/2/e7/
• The Cure: Design and Evaluation of a Crowdsourcing Game for Gene Selection for Breast Cancer Survival Prediction. JMIR Serious Games. PMID: 25654473
![Page 32: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/32.jpg)
People wanted to control the trees
![Page 33: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/33.jpg)
http://biobranch.org
![Page 34: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/34.jpg)
Branch Goals
• Provide easy, visual way for non-programmers to use large datasets to answer questions
• Construct libraries of manually crafted predictive models
• Use the collected models to generate ensemble predictors that incorporate the knowledge of the users
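The third goal, turning a library of hand-built models into one predictor, can be sketched as a majority vote. This assumes each collected tree can be called as a function from a sample to a label; the three trees below are hypothetical examples (only the age threshold of 34.5 appears elsewhere in this deck), and Branch's actual ensemble method is not described on the slide:

```python
from collections import Counter


def ensemble_predict(models, sample):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]


# Three HYPOTHETICAL user-built trees, each reduced to a single rule.
def tree_a(sample):
    return "relapse" if sample["age"] < 34.5 else "no relapse"


def tree_b(sample):
    return "relapse" if sample["tumor_size"] > 2.0 else "no relapse"


def tree_c(sample):
    return "relapse" if sample["grade"] >= 3 else "no relapse"


patient = {"age": 50, "tumor_size": 3.1, "grade": 3}
print(ensemble_predict([tree_a, tree_b, tree_c], patient))
```

Using an odd number of models avoids ties in a two-class vote.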
![Page 35: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/35.jpg)
Branch walkthrough: Choose a dataset
![Page 36: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/36.jpg)
Select evaluation option
![Page 37: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/37.jpg)
Tree Builder
![Page 38: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/38.jpg)
Split node builder
Each button is a different way to compose a split node in your decision tree
![Page 39: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/39.jpg)
Split node
Predictions at leaf nodes
100% correct
56% accurate
View data, adjust split point
If age less than 34.5, predict relapse
If greater, predict no relapse
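The split node on this slide is a single threshold rule, and the percentages at the leaves are the fraction of samples in each leaf that the rule gets right. A minimal sketch with made-up patients (so the numbers differ from the 100% and 56% on the slide); the function names are hypothetical:

```python
def split_predict(sample, feature="age", threshold=34.5):
    """Apply the rule from the slide: below the threshold means relapse."""
    return "relapse" if sample[feature] < threshold else "no relapse"


def leaf_accuracy(samples, labels, feature="age", threshold=34.5):
    """Fraction of correctly predicted samples within each leaf."""
    counts = {"relapse": [0, 0], "no relapse": [0, 0]}  # [correct, total]
    for sample, truth in zip(samples, labels):
        leaf = split_predict(sample, feature, threshold)
        counts[leaf][1] += 1
        if leaf == truth:
            counts[leaf][0] += 1
    return {
        leaf: (correct / total if total else None)
        for leaf, (correct, total) in counts.items()
    }


# INVENTED toy cohort.
samples = [{"age": a} for a in (30, 33, 40, 50, 60, 45)]
labels = ["relapse", "relapse", "no relapse", "relapse", "no relapse", "no relapse"]
print(leaf_accuracy(samples, labels))
```

Moving `threshold` and recomputing the leaf accuracies mirrors the "adjust split point" control.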
![Page 40: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/40.jpg)
Single feature splits
Pick from genes or clinical features
Type-ahead search
Statistical ranker
![Page 41: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/41.jpg)
Custom feature combination
BRCA2
TOP2B
BRCA2 + TOP2B
Allows user to use a manually composed linear combination of other features
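A custom feature of this kind is a weighted sum of existing features that is then treated like any other column. A small sketch, assuming samples are dictionaries of expression values; the values and the `add_combined_feature` helper are invented for illustration:

```python
def add_combined_feature(sample, name, weights):
    """Return a copy of `sample` with a new linear-combination feature."""
    combined = dict(sample)
    combined[name] = sum(weight * sample[gene] for gene, weight in weights.items())
    return combined


# INVENTED expression values.
sample = {"BRCA2": 1.5, "TOP2B": 2.0}
with_sum = add_combined_feature(sample, "BRCA2 + TOP2B", {"BRCA2": 1.0, "TOP2B": 1.0})
print(with_sum["BRCA2 + TOP2B"])
```

The new feature can then be used in a split node exactly like a single gene.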
![Page 42: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/42.jpg)
E.g.: 21-gene signature from Oncotype DX
Proliferation: Ki67, STK15, Survivin, CCNB1 (cyclin B1), MYBL2
Invasion: MMP11, CTSL2
HER2: GRB7, HER2
Estrogen: ER, PGR, BCL2, SCUBE2
GSTM1, CD68, BAG1
Reference: ACTB (β-actin), GAPDH, RPLPO, GUS, TFRC
Recurrence Score Algorithm
1. HER2 group score = 0.9 × GRB7 + 0.1 × HER2 (if the result is less than 8, then the GRB7 group score is considered 8)
2. ER group score = (0.8 × ER + 1.2 × PGR + BCL2 + SCUBE2) ÷ 4
3. Proliferation group score = (Survivin + KI67 + MYBL2 + CCNB1 [the gene encoding cyclin B1] + STK15) ÷ 5 (if the result is less than 6.5, then the proliferation group score is considered 6.5)
4. Invasion group score = (CTSL2 [the gene encoding cathepsin L2] + MMP11 [the gene encoding stromelysin 3]) ÷ 2
RSU = 0.47 × HER2 − 0.34 × ER + 1.04 × PROLIFERATION + 0.10 × INVASION + 0.05 × CD68 − 0.08 × GSTM1 − 0.07 × BAG1
*A Multigene Assay to Predict Recurrence of Tamoxifen-Treated, Node-Negative Breast Cancer
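The recurrence score formula on this slide translates directly into code. The sketch below computes only the unscaled score (RSU) as written above, assuming the input values are already normalized against the reference genes; the expression values are invented to make the arithmetic easy to follow:

```python
def recurrence_score_unscaled(expr):
    """Compute RSU from reference-normalized expression values."""
    # Group scores, with the floors stated on the slide (8 and 6.5).
    her2 = max(0.9 * expr["GRB7"] + 0.1 * expr["HER2"], 8.0)
    er = (0.8 * expr["ER"] + 1.2 * expr["PGR"] + expr["BCL2"] + expr["SCUBE2"]) / 4
    proliferation = max(
        (expr["Survivin"] + expr["KI67"] + expr["MYBL2"]
         + expr["CCNB1"] + expr["STK15"]) / 5,
        6.5,
    )
    invasion = (expr["CTSL2"] + expr["MMP11"]) / 2
    return (
        0.47 * her2
        - 0.34 * er
        + 1.04 * proliferation
        + 0.10 * invasion
        + 0.05 * expr["CD68"]
        - 0.08 * expr["GSTM1"]
        - 0.07 * expr["BAG1"]
    )


# INVENTED expression values, chosen so both floors are triggered.
example = {
    "GRB7": 5.0, "HER2": 6.0,
    "ER": 10.0, "PGR": 5.0, "BCL2": 6.0, "SCUBE2": 8.0,
    "Survivin": 5.0, "KI67": 5.0, "MYBL2": 5.0, "CCNB1": 5.0, "STK15": 5.0,
    "CTSL2": 4.0, "MMP11": 6.0,
    "CD68": 6.0, "GSTM1": 5.0, "BAG1": 10.0,
}
print(round(recurrence_score_unscaled(example), 2))
```

For these inputs the group scores are 8 (floored), 7, 6.5 (floored) and 5, giving 3.76 − 2.38 + 6.76 + 0.50 + 0.30 − 0.40 − 0.70 = 7.84.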
![Page 43: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/43.jpg)
Classifier nodes
Classifier Node
Class B
Class A
Use a trained predictive model, such as a support vector machine, as a node in your tree
Use
Build
![Page 44: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/44.jpg)
biobranch tree nodes
Branch decision tree
Class B
Class A
Use a previously constructed tree as node
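Classifier nodes and tree nodes share one idea: a node can delegate its decision to any model that maps a sample to an outcome. A minimal sketch of that composition, assuming nothing about Branch's internal data structures; `Leaf` and `ModelNode` are hypothetical classes and the thresholds are invented:

```python
class Leaf:
    """Terminal node that always predicts one class."""

    def __init__(self, label):
        self.label = label

    def predict(self, sample):
        return self.label


class ModelNode:
    """Internal node that routes a sample using any callable model."""

    def __init__(self, model, children):
        self.model = model          # callable: sample -> outcome key
        self.children = children    # outcome key -> child node

    def predict(self, sample):
        return self.children[self.model(sample)].predict(sample)


# A previously constructed tree (a single age split).
saved_tree = ModelNode(
    lambda s: "young" if s["age"] < 34.5 else "older",
    {"young": Leaf("Class A"), "older": Leaf("Class B")},
)

# A new tree that uses the saved tree's prediction as its first split.
new_tree = ModelNode(
    saved_tree.predict,
    {
        "Class A": Leaf("Class A"),
        "Class B": ModelNode(
            lambda s: "high" if s["BRCA2"] > 1.0 else "low",
            {"high": Leaf("Class A"), "low": Leaf("Class B")},
        ),
    },
)

print(new_tree.predict({"age": 50, "BRCA2": 2.0}))
```

A trained classifier such as an SVM would slot in the same way, as long as its prediction function returns one of the child keys.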
![Page 45: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/45.jpg)
Visually set decision boundary nodes
Visual split
Class B
Class A
![Page 46: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/46.jpg)
Creating a visual split
Draw polygon
Add to tree
Select feature
Select feature
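A visual split boils down to a point-in-polygon test on the two selected features. The sketch below uses the standard ray-casting test; how Branch implements it is not stated on the slide, and the feature names and polygon are placeholders:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def visual_split(sample, feature_x, feature_y, polygon):
    """Samples inside the drawn polygon go to Class A, the rest to Class B."""
    inside = point_in_polygon(sample[feature_x], sample[feature_y], polygon)
    return "Class A" if inside else "Class B"


# A square drawn by the user in (BRCA2, TOP2B) space; PLACEHOLDER values.
drawn = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(visual_split({"BRCA2": 2.0, "TOP2B": 2.0}, "BRCA2", "TOP2B", drawn))
```

Because the polygon can be any shape, a visual split can hug the training points very tightly, which is what makes it a handy way to demonstrate overfitting.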
![Page 47: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/47.jpg)
Teach students about overfitting..
![Page 48: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/48.jpg)
Tree Builder
![Page 49: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/49.jpg)
Evaluation panel
View training and testing sets
Performance metrics
Confusion matrix
ROC curve
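The metrics in this panel can all be derived from true labels plus predictions or scores. A minimal sketch of a confusion matrix and an ROC curve with its area, on invented labels and scores; it assumes both classes are present, since the rates are undefined otherwise:

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t and p),
        "fp": sum(1 for t, p in pairs if not t and p),
        "fn": sum(1 for t, p in pairs if t and not p),
        "tn": sum(1 for t, p in pairs if not t and not p),
    }


def roc_points(y_true, scores):
    """(FPR, TPR) at each distinct score threshold, from strict to lenient."""
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        cm = confusion_matrix(y_true, [s >= threshold for s in scores])
        points.append((cm["fp"] / negatives, cm["tp"] / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# INVENTED test-set labels (True = relapse) and model scores.
y_true = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

print(confusion_matrix(y_true, [s >= 0.5 for s in scores]))
print(auc(roc_points(y_true, scores)))
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the area under the curve is 8/9.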
![Page 50: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/50.jpg)
Navigation
Save your tree
New tree
![Page 51: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/51.jpg)
Tree Collection
Open and edit shared tree
Search trees you create and trees shared with the community
![Page 52: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/52.jpg)
Editing shared tree
Tracks which user created each node
![Page 53: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/53.jpg)
Next steps
• More user testing
• More datasets
• Lots of users?
• Better models?
training data, test data
validation
![Page 54: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/54.jpg)
Even more information!
• Screencasts
• http://tinyurl.com/branch-cast
• Open source code
  – https://bitbucket.org/sulab/biobranch
  – https://bitbucket.org/starinformatics/gbk
  – https://bitbucket.org/sulab/wikidatabots
![Page 55: 2015 6 bd2k_biobranch_knowbio](https://reader031.vdocuments.us/reader031/viewer/2022032700/55d11ea4bb61ebe2398b4779/html5/thumbnails/55.jpg)
Thanks
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K COE: GM114833
Andra Waagmeester
Sebastian Burgstaller
Elvira Mitraka
Lynn Schriml
Gang Fu
Evan Bolton
Paul Pavlidis
Peter Robinson
Many WikiDatans
Richard Bruskiewich
http://starinformatics.com
Karthik Gangavarapu
Vyshakh Babji
Andrew Su
The Prince of Crowdsourcing
Implicitome
Kristina Hettne, Leiden University
Contact: [email protected], @bgood