KnetMiner - Knowledge Network Miner
![Page 1: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/1.jpg)
Mining biological knowledge networks for gene-phenotype discovery
Keywan Hassani-Pak
http://knetminer.rothamsted.ac.uk/
Plant and Animal Genomes Conference 2017
@KnetMiner
![Page 2: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/2.jpg)
The Genotype to Phenotype Challenge
Genotype: SNPs and Indels
Omics: Includes any ‘omics
Phenotype: Flowering, Defence, Development, Stress tolerance
Biological Knowledge Network
1. Methods to assemble and visualise an integrated knowledge network of the cell
2. Methods to use the knowledge network to translate genotype to phenotype
![Page 3: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/3.jpg)
• Free and open source
• Data warehousing using a graph-database
• Platform to integrate public and private datasets in various formats
• Provides a GUI, CLI and APIs for reproducible data integration workflows
Ondex – Data Integration Platform
Ondex: www.ondex.org
![Page 4: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/4.jpg)
The approach is generic and works similarly for other species
![Page 5: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/5.jpg)
Let’s get a GWAS dataset…
http://plants.ensembl.org/biomart
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
![Page 6: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/6.jpg)
… transform into a network
[Diagram: network edges labelled "close to" (SNP) and "associated" (Phenotype)]
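The step from a GWAS table to a network can be sketched roughly as follows. This is a minimal illustration, not the Ondex workflow: the row layout, the identifiers (`snp_1`, `GENE_A`), the p-values and the direction of the `close_to` edge (gene to SNP) are all assumptions made for the example; only the edge labels "close to" and "associated" come from the slide.

```python
# Hypothetical GWAS rows: (snp_id, nearby_gene, phenotype, p_value).
# All identifiers and values are invented for illustration.
gwas_rows = [
    ("snp_1", "GENE_A", "FLC", 1e-8),
    ("snp_2", "GENE_A", "LN22", 3e-6),
    ("snp_3", "GENE_B", "FLC", 5e-7),
]


def build_network(rows):
    """Turn table rows into edges keyed by (source, relation, target)."""
    edges = {}
    for snp, gene, phenotype, p_value in rows:
        # Assumed direction: the gene lies close to the SNP.
        edges[(gene, "close_to", snp)] = {}
        # The SNP is associated with the phenotype; keep the p-value as evidence.
        edges[(snp, "associated", phenotype)] = {"p_value": p_value}
    return edges


def phenotypes_for_gene(edges, gene):
    """Follow gene -> SNP -> phenotype paths and return the phenotypes reached."""
    snps = {t for (s, r, t) in edges if s == gene and r == "close_to"}
    return sorted({t for (s, r, t) in edges if s in snps and r == "associated"})


network = build_network(gwas_rows)
print(phenotypes_for_gene(network, "GENE_A"))  # ['FLC', 'LN22']
```

Once the table is expressed as typed edges, other relation types can be added with the same `(source, relation, target)` pattern.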
![Page 8: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/8.jpg)
[Diagram: the network extended with "interacts" edges alongside the "close to" (SNP) and "associated" (Phenotype) links]
… add biological interactions
![Page 9: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/9.jpg)
• Gene-GO
• Gene-Phenotype
Gene knock-out or overexpression
Text mining publications
• Gene-Publication
• Gene-Pathway
• Homology to yeast
• Homology to crops
Wheat
… finally add other open linked data
>500,000 nodes
>1,500,000 links
Genome-scale knowledge network
![Page 10: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/10.jpg)
Relationships in Crop Knowledge Networks
[Diagram: example relationships in a crop knowledge network]
Panel labels: Genes, Homology, Annotations, Genetics, Interactions, Phenotype
Node labels: GO, TO, QTL, GWAS, Marker
Edge labels: encodes, ortholog, domain, text-mining, involved_in, published, phenotype
Example evidence shown on edges:
• GWAS P-value 10^-8
• 41% identity (Ensembl Compara)
• Inferred from Mutant Phenotype, PMID: 15598800
• "Mutations in TTG2 cause phenotypic defects seed color pigmentation." PMID: 17766401
![Page 11: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/11.jpg)
• Methods are needed to evaluate millions of relationships in the knowledge network, prioritize genes and extract relevant subnetworks
• Interactive and exploratory tools are needed to enable knowledge discovery and decision making
• Interpretation should be the task of domain experts i.e. biologists!
How to search and interpret too much information?
![Page 12: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/12.jpg)
KnetMiner – Systematic and evidence-based gene discovery
http://knetminer.rothamsted.ac.uk
![Page 13: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/13.jpg)
[Architecture diagram, components as labelled]
Web Browser: KnetMiner Client (DHTML, JavaScript) with Views (Java Applet, Flash)
HTML, JSON, XML and images over HTTP via Ajax
Apache Tomcat: Servlets and JSP Page
Java Socket
KnetMiner Server: Multithreaded Java Server, Ondex API, Knowledge Graph DB
KnetMiner Software Architecture
Major improvements to the user-interface.
Re-implemented Java Applet and Flash components in JavaScript.
Now compatible with most OS and touch devices.
![Page 14: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/14.jpg)
Which associations (genes) are worth following up?
Often a highly subjective decision
How is genotype translated to phenotype?
Often involves multi-omics interactions
![Page 15: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/15.jpg)
KnetMiner search interface
![Page 16: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/16.jpg)
KnetMiner Outputs
![Page 17: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/17.jpg)
Use Case 1 – Mining GWAS and QTL data
![Page 18: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/18.jpg)
• 96 or 192 Arabidopsis inbred lines
• Genotyped: 250,000 SNPs
• 107 phenotypes were measured
https://arapheno.1001genomes.org/study/1/
o Flowering
o Defence
o Ionomics
o Developmental
• Wilcoxon and EMMA (control population structure) statistical tests
GWAS of 107 Phenotypes in Arabidopsis
Atwell et al., Nature 2010
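As a rough illustration of the non-parametric test named on this slide, the sketch below compares phenotype values between lines carrying one SNP allele and lines carrying the other. It assumes the rank-sum (Mann-Whitney) form of the Wilcoxon test with a plain normal approximation and no tie or continuity correction; the phenotype values are invented, and EMMA (the mixed-model test that controls for population structure) is not shown.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via a normal approximation.

    Returns (U statistic for group_a, approximate p-value).
    Simplified: no tie correction and no continuity correction.
    """
    labelled = sorted(
        [(v, "a") for v in group_a] + [(v, "b") for v in group_b]
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(labelled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(labelled) and labelled[j][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(
            1 for k in range(i, j) if labelled[k][1] == "a"
        )
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical phenotype values grouped by the allele each line carries.
allele_ref = [10, 12, 11, 14, 13]
allele_alt = [20, 22, 19, 25, 21]
u_stat, p = rank_sum_test(allele_ref, allele_alt)
print(u_stat, p < 0.05)
```

In a GWAS this test would be repeated for every SNP and phenotype pair, which is why the resulting p-values need careful interpretation.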
![Page 19: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/19.jpg)
Examples where GWAS results are simple to interpret
Sodium concentration (Na)
Lesioning (LES)
AvrRpm1
Single, sharp peak of association centred on causal polymorphism
LD decays within 10 kb on average in Arabidopsis
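The LD figure above suggests a simple way to shortlist candidates around a sharp peak: collect the genes within about 10 kb of the top SNP. The sketch below illustrates that idea only; the gene coordinates and identifiers are hypothetical, and treating 10 kb as a hard cut-off is a simplifying assumption.

```python
WINDOW_BP = 10_000  # from the slide: LD decays within about 10 kb on average

# Hypothetical gene models: (gene_id, chromosome, start, end).
genes = [
    ("GENE_A", 4, 100_000, 103_000),
    ("GENE_B", 4, 112_500, 115_000),
    ("GENE_C", 4, 130_000, 134_000),
    ("GENE_D", 5, 105_000, 108_000),
]


def genes_near_snp(genes, chrom, pos, window=WINDOW_BP):
    """Return (gene_id, distance_bp) for genes within `window` of the SNP."""
    hits = []
    for gene_id, g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        # Distance is 0 when the SNP falls inside the gene.
        distance = max(start - pos, pos - end, 0)
        if distance <= window:
            hits.append((gene_id, distance))
    return sorted(hits, key=lambda hit: hit[1])


candidates = genes_near_snp(genes, chrom=4, pos=104_000)
print(candidates)  # [('GENE_A', 1000), ('GENE_B', 8500)]
```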
![Page 20: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/20.jpg)
Examples where GWAS results are complex to interpret
FLC gene expression (FLC)
Leaf Number (LN22)
Days to flowering (FT Field)
Peaks are diffuse, covering several hundred kb without a clear centre
Causal polymorphisms do not always have the strongest association
![Page 21: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/21.jpg)
Using KnetMiner to interpret GWAS results
Wilcoxon results
EMMA results
Atwell et al., Nature 2010
Flowering Locus C (FLC) gene expression
![Page 22: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/22.jpg)
Demo: Exploring genes and networks controlling FLC expression
![Page 23: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/23.jpg)
• Petal size QTL in Arabidopsis (in collaboration with John Doonan)
Using KnetMiner to prioritise genes in QTL
![Page 24: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/24.jpg)
Use Case 2 – Mining differentially expressed genes
![Page 25: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/25.jpg)
White grained wheat is more prone to pre-harvest sprouting (PHS)
• PHS is the result of premature germination of grain in the ear and results in loss of bread-making quality
• Red grain colour is associated with increased dormancy and resistance to PHS
• Grain colour is due to proanthocyanidins (condensed tannins) in the testa
[Plot labels: Sprouting; Grain colour (+ = white, o = red). Source: Groos et al. (2002) TAG 104, 39-47]
[Photo: red grain, 20 dpa]
Andy Phillips
![Page 26: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/26.jpg)
67 down-regulated genes
37 up-regulated genes
Over a hundred statistically significant genes. How are these linked to grain colour and PHS?
Differential Gene Expression Analysis
![Page 27: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/27.jpg)
Google-like search interface
• Search knowledge graph using trait-based keywords
• Real-time user feedback and query suggestions
Trait related keywords
Query term suggestions
![Page 28: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/28.jpg)
Genes linked to grain colour and/or PHS
![Page 29: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/29.jpg)
Genes with direct or indirect links to grain colour and PHS
![Page 30: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/30.jpg)
KnetMiner methodology
![Page 31: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/31.jpg)
Ondex Text-Mining Plugin
Input data
• 27,416 Arabidopsis gene names from Phytozome
• 52,561 abstracts from PubMed that contain Arabidopsis
• 22,201 curated citations from TAIR
• 1,349 Trait Ontology terms from Planteome
Hassani-Pak et al., 2010
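[Diagram: text-mining links concepts A and B (Gene, TO) to publications x and y through occurrs_in and published_in relations, producing a weighted association network]
Edge weights shown: 0.7, 0.6, 2.0, 0.5, 1.0
Example association scores: IP=1.7; M=1.2; N=2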
![Page 32: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/32.jpg)
Text-mining output
These steps connect 5,553 Arabidopsis genes to 409 TO terms based on 18,341 co-citations
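The idea of linking genes to trait terms by co-citation can be sketched as below. This is a deliberately naive illustration and not the Ondex text-mining plugin: it uses case-insensitive substring matching on three invented abstracts, counts one co-citation per publication that mentions both a gene and a trait term, and does not reproduce the weighted scores shown on the previous slide.

```python
from collections import Counter
from itertools import product

# Hypothetical abstracts, gene names and trait terms for illustration.
abstracts = {
    "pub1": "TTG2 mutants show altered seed color.",
    "pub2": "FLC controls flowering time; TTG2 affects seed color.",
    "pub3": "FLC expression and flowering time in the field.",
}
gene_names = ["TTG2", "FLC"]
trait_terms = ["seed color", "flowering time"]


def co_citations(abstracts, gene_names, trait_terms):
    """Count publications in which a gene name and a trait term co-occur."""
    counts = Counter()
    for text in abstracts.values():
        lowered = text.lower()
        genes = [g for g in gene_names if g.lower() in lowered]
        traits = [t for t in trait_terms if t.lower() in lowered]
        # Every gene/trait pair found in the same abstract is one co-citation.
        counts.update(product(genes, traits))
    return counts


pair_counts = co_citations(abstracts, gene_names, trait_terms)
for (gene, trait), n in pair_counts.most_common():
    print(gene, trait, n)
```

Document-level co-occurrence like this produces weak links as well as strong ones (for example a gene and a trait that merely share an abstract), which is one reason to weight the resulting associations.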
![Page 33: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/33.jpg)
• Uses TF*IDF to rank documents by their relevance to a search term
• Additionally, considers the properties of gene-evidence networks, such as the specificity of documents to a gene and the frequency of evidence concepts
• Smart pre-indexing of the knowledge network makes the computation of the score very fast
Gene Ranking
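For readers unfamiliar with TF*IDF, the sketch below ranks a few documents by their relevance to a single search term. It shows the textbook weighting only (term frequency times log inverse document frequency, whitespace tokenisation); the documents are invented, and the additional network properties and pre-indexing that KnetMiner uses are not modelled here.

```python
import math
from collections import Counter

# Hypothetical evidence documents keyed by an invented identifier.
documents = {
    "doc1": "seed color pigmentation in the seed coat",
    "doc2": "flowering time and vernalization",
    "doc3": "grain color and dormancy",
}


def tf_idf_scores(documents, term):
    """Score each document containing `term`, best match first."""
    tokenized = {d: text.lower().split() for d, text in documents.items()}
    doc_freq = sum(1 for tokens in tokenized.values() if term in tokens)
    if doc_freq == 0:
        return {}
    # Terms that occur in fewer documents get a higher weight.
    idf = math.log(len(tokenized) / doc_freq)
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)[term] / len(tokens)
        if tf > 0:
            scores[doc_id] = tf * idf
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


ranking = tf_idf_scores(documents, "color")
print(list(ranking))  # ['doc3', 'doc1']
```

Here `doc3` outranks `doc1` because "color" makes up a larger share of its tokens; a gene score could then be built by combining the scores of the documents linked to that gene.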
![Page 34: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/34.jpg)
• Web application for very fast search of large genome-scale knowledge graphs
• Ranking of candidate genes based on knowledge mining
• Interactive visualisation of genome and knowledge maps
• Facilitates hypothesis validation and generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
![Page 35: KnetMiner - Knowledge Network Miner](https://reader034.vdocuments.us/reader034/viewer/2022042722/58a1a33c1a28abe6468b4f6d/html5/thumbnails/35.jpg)
Acknowledgements
John Doonan
Sergio Feingold, Martin Castellote
Uwe Scholz, Matthias Lange
Andy Law
Keywan Hassani-Pak, Ajit Singh, Marco Brandizi, Monika Mistry, Lisa Lill, Chris Rawlings
Dave Edwards, Philipp Bayer
Misha Kapushesky, Kevin Dialdestoro
@KnetMiner