![Page 1: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/1.jpg)
Open data, compound repurposing, and rare diseases
Andrew Su, Ph.D.
[email protected]
http://sulab.org
January 30, 2017
Slides: slideshare.net/andrewsu
![Page 2: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/2.jpg)
Programmer/Comp sci
Statistician/Mathematician
Biologist
Data scientist
Bioinformatician
Biostatistician
Adapted from http://blog.fejes.ca/?p=2418
…teach STEM students the importance of connecting computational, mathematical, and natural sciences.
![Page 3: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/3.jpg)
Credit: http://www.slideshare.net/PhRMA/rare-disease-infographics
![Page 4: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/4.jpg)
Credit: http://www.slideshare.net/PhRMA/rare-disease-infographics
![Page 5: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/5.jpg)
Rare disease case study #1
Photo: Retta Beery
![Page 6: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/6.jpg)
Bainbridge et al., STM, 2011
![Page 7: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/7.jpg)
Photo: Retta Beery
![Page 8: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/8.jpg)
Rare disease case study #2
![Page 9: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/9.jpg)
… but no obvious treatments
![Page 10: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/10.jpg)
Bainbridge et al., STM, 2011
SPR
![Page 11: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/11.jpg)
What differentiates SPR and NGLY1?
SPR
![Page 12: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/12.jpg)
Sarah Olmstead, https://flic.kr/p/364dZW
NGLY1
![Page 13: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/13.jpg)
NGLY1 (11 PubMed articles)
Congenital disorders of glycosylation (822)
PNGase (686)
ERAD (1,330)
glycosylation (48,862)
alacrima (164)
Genetic interactors (3,016)
symptoms (109,928)
25 million articles in PubMed
![Page 14: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/14.jpg)
The biomedical literature is massive…
[Bar chart: number of new PubMed-indexed articles per year, 1985 to 2015; vertical axis from 0 to 1,400,000]
![Page 15: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/15.jpg)
… but it is very hard to query and compute
![Page 16: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/16.jpg)
… but it is very hard to query and compute
Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …
AND
Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …
Gleevec, Glivec, STI-571, STI 571, STI571, ST1571, ST 1571, CGP-57148, CGP 57148, CGP57148, CGP57148B, …
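The slide suggests that even a simple literature question requires expanding every drug and disease into all of its names. A minimal sketch of that expansion follows, assuming a PubMed-style boolean syntax (`OR`, `AND`, quoted phrases). The helpers `or_group` and `build_query` are hypothetical, and the name lists are truncated just as they are on the slide.

```python
def or_group(terms):
    """Join terms into one parenthesized OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(drug_names, disease_names):
    """Combine a drug OR-group and a disease OR-group with AND."""
    return or_group(drug_names) + " AND " + or_group(disease_names)


# Names copied from the slide; both lists are incomplete there as well.
imatinib_names = ["Imatinib", "Gleevec", "Glivec", "STI-571", "STI571", "CGP57148B"]
leukemias = ["Acute myeloid leukemia", "Chronic myelogenous leukemia"]

query = build_query(imatinib_names, leukemias)
print(query)
```

Every additional drug or disease multiplies the number of surface forms to cover, which is why a free-text search of this kind is hard to keep complete.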
![Page 17: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/17.jpg)
… but it is very hard to query and compute
Human genes referred to as “PAP”

| Entrez Gene ID | HGNC symbol | Description |
| --- | --- | --- |
| 10884 | MRPS30 | mitochondrial ribosomal protein S30 |
| 10914 | PAPOLA | poly(A) polymerase alpha |
| 11333 | PDAP1 | PDGFA associated protein 1 |
| 11334 | TUSC2 | tumor suppressor candidate 2 |
| 130120 | REG3G | regenerating islet-derived 3 gamma |
| 5068 | REG3A | regenerating islet-derived 3 alpha |
| 50807 | ASAP1 | ArfGAP with SH3 domain, ankyrin repeat and PH domain 1 |
| 55 | ACPP | acid phosphatase, prostate |
| 8853 | ASAP2 | ArfGAP with SH3 domain, ankyrin repeat and PH domain 2 |
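To make the ambiguity concrete, here is a small sketch that indexes the nine genes from the slide under the shared alias "PAP" and shows that a lookup cannot return a single answer. The functions `build_alias_index` and `resolve` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Rows from the slide: (Entrez Gene ID, HGNC symbol, description).
PAP_GENES = [
    (10884, "MRPS30", "mitochondrial ribosomal protein S30"),
    (10914, "PAPOLA", "poly(A) polymerase alpha"),
    (11333, "PDAP1", "PDGFA associated protein 1"),
    (11334, "TUSC2", "tumor suppressor candidate 2"),
    (130120, "REG3G", "regenerating islet-derived 3 gamma"),
    (5068, "REG3A", "regenerating islet-derived 3 alpha"),
    (50807, "ASAP1", "ArfGAP with SH3 domain, ankyrin repeat and PH domain 1"),
    (55, "ACPP", "acid phosphatase, prostate"),
    (8853, "ASAP2", "ArfGAP with SH3 domain, ankyrin repeat and PH domain 2"),
]


def build_alias_index(alias, genes):
    """Map one alias (case-insensitive) to every gene that uses it."""
    index = defaultdict(list)
    for entrez_id, symbol, _description in genes:
        index[alias.upper()].append((entrez_id, symbol))
    return index


def resolve(index, name):
    """Return all candidate (Entrez ID, symbol) pairs for a name, sorted by ID."""
    return sorted(index.get(name.upper(), []))


index = build_alias_index("PAP", PAP_GENES)
candidates = resolve(index, "pap")
print(len(candidates), "candidate genes for 'PAP':", candidates)
```

A text-mining system that sees "PAP" in an abstract has to pick among these candidates from context, which is why stable identifiers matter more than names.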
![Page 18: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/18.jpg)
Biomedical research relies on effective
KNOWLEDGE MANAGEMENT
Pietro Bellini, https://flic.kr/p/k5jmja
![Page 19: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/19.jpg)
Information extraction from biomedical text
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
GENES
DISEASES
DRUGS
VARIANTS
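Step 1 can be approximated, very roughly, with a dictionary lookup. The sketch below is not the method used in the talk; it assumes a tiny hand-built lexicon (`LEXICON`, hypothetical) covering only the concepts visible in the example abstract, and uses Python's standard `re` module to find mentions.

```python
import re

# Hypothetical mini-lexicon: surface form -> concept type from the slide.
LEXICON = {
    "KIT": "GENE",
    "K509I": "VARIANT",
    "familial systemic mastocytosis": "DISEASE",
    "imatinib": "DRUG",
    "dasatinib": "DRUG",
    "PKC412": "DRUG",
}


def tag_concepts(text, lexicon=LEXICON):
    """Return (start, end, surface, label) for each lexicon hit, in text order."""
    mentions = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((match.start(), match.end(), match.group(0), label))
    return sorted(mentions)


sentence = (
    "In vitro treatment with imatinib, dasatinib and PKC412 reduced cell "
    "viability of primary mast cells harboring KIT K509I mutation."
)
for start, end, surface, label in tag_concepts(sentence):
    print(f"{start:3d}-{end:3d}  {label:8s} {surface}")
```

Exact matching like this misses synonyms and cannot disambiguate short symbols (case-insensitive "KIT" also matches the ordinary word "kit"), which is part of why concept recognition is treated as a hard problem here.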
![Page 20: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/20.jpg)
Information extraction from biomedical text
imatinib
dasatinib
PKC412
Familial systemic mastocytosis
KIT
K509I
1. Identify biomedical concepts in text
2. Identify relationships between concepts
Mutation of
Mutation causes
causes
treats
inhibits
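One way to hold the output of step 2 is as subject-predicate-object statements that each carry their source, so a claim stays traceable to the abstract it came from. This is a sketch under assumptions: the slide lists the concepts and the relationship labels but not which label connects which pair, so the pairings below are one plausible reading of the abstract, and `Statement` and `find` are hypothetical names.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # where the statement was extracted from


SOURCE = "Leuk Res. 2014 Oct;38(10):1245-51"

# Assumed pairings of the slide's concepts and relationship labels.
STATEMENTS = [
    Statement("K509I", "mutation of", "KIT", SOURCE),
    Statement("K509I", "causes", "familial systemic mastocytosis", SOURCE),
    Statement("imatinib", "treats", "familial systemic mastocytosis", SOURCE),
    Statement("imatinib", "inhibits", "KIT", SOURCE),
]


def find(statements, subject=None, predicate=None, obj=None):
    """Return statements matching every field that is not None."""
    return [
        s for s in statements
        if (subject is None or s.subject == subject)
        and (predicate is None or s.predicate == predicate)
        and (obj is None or s.obj == obj)
    ]


for s in find(STATEMENTS, obj="familial systemic mastocytosis"):
    print(f"{s.subject} --{s.predicate}--> {s.obj}  [{s.source}]")
```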
![Page 21: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/21.jpg)
Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
![Page 22: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/22.jpg)
http://www.navy.mil/management/photodb/photos/101104-N-6383T-508.jpg
![Page 23: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/23.jpg)
The Gene Wiki project, circa 2008
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
![Page 24: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/24.jpg)
![Page 25: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/25.jpg)
Gene-disease annotation databases
Lissencephaly
Query: Reelin (RELN)
![Page 26: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/26.jpg)
Gene-disease annotation databases
Lissencephaly; Familial Temporal Lobe Epilepsy
Query: Reelin (RELN)
![Page 27: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/27.jpg)
Gene-disease annotation databases
Lissencephaly; Familial Temporal Lobe Epilepsy; Otosclerosis; Schizophrenia
Query: Reelin (RELN)
![Page 28: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/28.jpg)
Gene-disease annotation databases
27 “diseases”
Lissencephaly; Familial Temporal Lobe Epilepsy; Otosclerosis; Schizophrenia; Bipolar Disorder; Autistic Disorder; Alzheimer Disease; Schizophrenic Psychology; Breast Neoplasms; …
Child Development Disorders, Pervasive; Cognition; Cognition Disorders; Dominance, Cerebral; Executive Function; Field Dependence-Independence; Functional Laterality; Choice Behavior; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma
Psychotic Disorders; Attention; Attention Deficit Disorder with Hyperactivity; Memory; Memory, Short-Term; Mental Disorders; Task Performance and Analysis; Tobacco Use Disorder; Weight Gain; Schizophrenia, Paranoid
Query: Reelin (RELN)
![Page 29: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/29.jpg)
[Wikidata] is to data as [Wikipedia] is to text
Provide a database of the world’s [biomedical] knowledge that anyone can edit
- Denny Vrandečić
![Page 30: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/30.jpg)
Reelin (Q13561329), http://www.wikidata.org/wiki/Q13561329
• Subclass of (Property:P279): Protein (Q8054)
• Regulates (Property:P128): Neural development (Q1345738)
• Physically interacts with (Property:P129): VLDL receptor (Q1979313); Amyloid beta A4 (Q423510)
• Decreased expression in (Property:P1910): Schizophrenia (Q41112); Bipolar disorder (Q131755)
![Page 31: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/31.jpg)
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q13561329&format=json
Q13561329
• Property:P279: Q8054
• Property:P128: Q1345738
• Property:P129: Q1979313; Q423510
• Property:P1910: Q41112; Q131755
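The same item can be read by a program through the `wbgetentities` API call shown on the slide. The sketch below builds that URL and pulls (property, target item) pairs out of a response. To stay runnable offline it parses `SAMPLE`, a hand-written fragment shaped like a `wbgetentities` response rather than real API output. `entity_url` and `item_claims` are hypothetical helper names.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL for one Wikidata item."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API + "?" + urlencode(params)


def item_claims(payload, qid):
    """Return sorted (property, target Q-id) pairs for item-valued claims."""
    pairs = []
    claims = payload["entities"][qid].get("claims", {})
    for prop, statements in claims.items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "no value" / "unknown value" snaks
            value = snak["datavalue"]["value"]
            if isinstance(value, dict) and "id" in value:
                pairs.append((prop, value["id"]))
    return sorted(pairs)


def _claim(prop, target):
    """Helper to write one item-valued claim in the response layout."""
    return {"mainsnak": {"snaktype": "value", "property": prop,
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": target}}}}


# Hand-written illustration using IDs from the slide; not fetched from Wikidata.
SAMPLE = {"entities": {"Q13561329": {"claims": {
    "P279": [_claim("P279", "Q8054")],
    "P129": [_claim("P129", "Q1979313"), _claim("P129", "Q423510")],
}}}}

print(entity_url("Q13561329"))
print(item_claims(SAMPLE, "Q13561329"))
```

In practice the payload would come from fetching `entity_url(...)` and decoding the JSON; that step is left out here so the example does not depend on network access.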
![Page 32: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/32.jpg)
![Page 33: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/33.jpg)
Seeding Wikidata with biomedical data
• All human, mouse genes and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
• 120 reference microbial genomes
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Burgstaller-Muehlbacher et al (2016) Database
Putman et al (2016) Database
![Page 34: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/34.jpg)
Centralizing key data storage
287 language editions of Wikipedia
Bioinformatics community
Toxicology community
Epidemiology community
…
![Page 35: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/35.jpg)
“Show all tyrosine kinase inhibitors that are used to treat hematologic cancers.”
![Page 36: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/36.jpg)
“Show all human membrane proteins associated with colorectal cancer.”
![Page 37: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/37.jpg)
“Show all monoclonal antibodies used to treat melanoma.”
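Questions like the ones quoted on these slides become simple graph lookups once the knowledge is stored as linked statements. The sketch below answers the tyrosine kinase inhibitor question over a small in-memory list of triples. It is an illustration only: `TRIPLES` is a hand-picked toy dataset rather than an export from Wikidata, the predicate names are plain-English stand-ins, and `superclasses` and `drugs_treating` are hypothetical helpers.

```python
# Toy knowledge graph: (subject, predicate, object). Hand-picked for illustration.
TRIPLES = [
    ("imatinib", "instance of", "tyrosine kinase inhibitor"),
    ("dasatinib", "instance of", "tyrosine kinase inhibitor"),
    ("erlotinib", "instance of", "tyrosine kinase inhibitor"),
    ("imatinib", "treats", "chronic myelogenous leukemia"),
    ("dasatinib", "treats", "chronic myelogenous leukemia"),
    ("erlotinib", "treats", "non-small cell lung cancer"),
    ("chronic myelogenous leukemia", "subclass of", "leukemia"),
    ("leukemia", "subclass of", "hematologic cancer"),
    ("non-small cell lung cancer", "subclass of", "lung cancer"),
]


def superclasses(term, triples):
    """Collect every class reachable from term via 'subclass of' edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subclass of" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def drugs_treating(drug_class, disease_class, triples):
    """Drugs in drug_class that treat disease_class or any of its subclasses."""
    members = {s for s, p, o in triples if p == "instance of" and o == drug_class}
    hits = set()
    for s, p, o in triples:
        if p == "treats" and s in members:
            if o == disease_class or disease_class in superclasses(o, triples):
                hits.add(s)
    return sorted(hits)


print(drugs_treating("tyrosine kinase inhibitor", "hematologic cancer", TRIPLES))
```

The class hierarchy does the work that the long synonym and disease lists did in the free-text search: the query names one class and the graph supplies its members.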
![Page 38: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/38.jpg)
Crowdsourcing via Citizen Science
Biomedical Linked Open Data
![Page 39: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/39.jpg)
Source: https://wilsoncommonslab.org/2014/03/06/calling-all-supporters
![Page 40: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/40.jpg)
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
![Page 41: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/41.jpg)
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.78
$$$
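The F values on this slide are not defined in the deck. The sketch below assumes the common definition, the balanced F-measure (harmonic mean of precision and recall), computed over exact matches between annotated spans. The span sets are made-up illustration data, not results from the study.

```python
def f_measure(predicted, gold):
    """Balanced F-measure over exact span matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (start, end) character spans for disease mentions in one abstract.
gold_spans = {(0, 12), (30, 41), (55, 70), (90, 99)}
crowd_spans = {(0, 12), (30, 41), (55, 70), (80, 85)}

score = f_measure(crowd_spans, gold_spans)
print(f"F = {score:.2f}")
```

Here three of four crowd spans match and three of four gold spans are found, so precision and recall are both 0.75 and so is F.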
![Page 42: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/42.jpg)
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.87
$$$
• 9 days
• 145 workers
• Total: $630.96
![Page 43: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/43.jpg)
http://mark2cure.org
![Page 44: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/44.jpg)
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
![Page 45: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/45.jpg)
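Does Citizen Science scale?
$$
\text{volunteers needed}
= \frac{\text{number of annotation events per year}}{\text{number of annotation events per year per volunteer}}
= \frac{1{,}000{,}000 \text{ articles} \times 10 \text{ AE/article}}{\dfrac{10{,}275 \text{ AE} \times 365 \text{ days}}{212 \text{ annotators} \times 28 \text{ days}}}
\approx 15{,}828
$$
AE = Annotation events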
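The estimate can be checked with a few lines of arithmetic. This sketch reads the slide as follows: 212 annotators produced 10,275 annotation events in 28 days, and that per-volunteer rate is extrapolated to a full year. The variable names are illustrative only.

```python
import math

# Demand: annotation events (AE) needed per year.
articles_per_year = 1_000_000
ae_per_article = 10
ae_needed_per_year = articles_per_year * ae_per_article

# Supply: AE per volunteer per year, extrapolated from the 28-day experiment.
observed_ae = 10_275
observed_annotators = 212
observed_days = 28
ae_per_volunteer_per_year = observed_ae * 365 / (observed_annotators * observed_days)

# Round up, since a fraction of a volunteer cannot be recruited.
volunteers_needed = math.ceil(ae_needed_per_year / ae_per_volunteer_per_year)
print(volunteers_needed)
```

The extrapolation assumes volunteers keep contributing at the experiment's average rate all year, which is the main thing the estimate depends on.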
![Page 46: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/46.jpg)
Does Citizen Science scale?
15,828 volunteers needed
200,000 volunteers
460,000 volunteers
37,000 volunteers
1,000,000+ volunteers
![Page 47: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/47.jpg)
Mapping the biomedical network around NGLY1
NGLY1
![Page 48: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/48.jpg)
http://mark2cure.org
![Page 49: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/49.jpg)
A preliminary view of the NGLY1-focused biological network
1,200 contributors
3,200 documents
787,400 annotations
![Page 50: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/50.jpg)
Finding new indications for existing drugs or therapies
Raynaud’s Syndrome
Fish oil
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
![Page 51: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/51.jpg)
Finding new indications for existing drugs or therapies
![Page 52: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/52.jpg)
Finding new indications for existing drugs or therapies
Raynaud’s Syndrome
Fish oil
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
[Diagram labels: A, C, and seven B nodes]
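The A, B, C labels suggest a path-finding pattern: two concepts that are not linked directly may still be connected through shared intermediates. The sketch below assumes A is Fish oil, C is Raynaud's Syndrome, and each of the seven other concepts on the slide is a B linked to both ends; the slide shows only the letters, so these assignments are assumptions. `neighbors` and `bridging_terms` are hypothetical helpers.

```python
from collections import defaultdict

A_TERM = "Fish oil"
C_TERM = "Raynaud's Syndrome"
B_TERMS = [
    "Abnormal platelet activity",
    "Abnormal blood viscosity",
    "High blood viscosity",
    "Elevated RBC rigidity",
    "Vasodilation",
    "Low blood triglycerides",
    "Increased prostacyclins",
]

# Assumption: every B concept is linked to both A and C.
LINKS = [(A_TERM, b) for b in B_TERMS] + [(b, C_TERM) for b in B_TERMS]


def neighbors(links):
    """Build an undirected adjacency map from concept pairs."""
    graph = defaultdict(set)
    for left, right in links:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def bridging_terms(a, c, links):
    """Return the B concepts shared by a and c, if a and c are not linked directly."""
    graph = neighbors(links)
    if c in graph[a]:
        return []  # a direct link already exists, so there is nothing to infer
    return sorted(graph[a] & graph[c])


bridges = bridging_terms(A_TERM, C_TERM, LINKS)
print(f"{A_TERM} may relate to {C_TERM} via {len(bridges)} intermediates:")
for term in bridges:
    print("  ", term)
```

The more independent intermediates a pair shares, the stronger the case for testing the A to C link experimentally.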
![Page 53: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/53.jpg)
A preliminary view of the NGLY1-focused biological network
[Diagram labels: one A-B-C pattern and two A-B patterns, each with seven B nodes]
![Page 54: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/54.jpg)
Biomedical research relies on effective
KNOWLEDGE MANAGEMENT
Pietro Bellini, https://flic.kr/p/k5jmja
![Page 55: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/55.jpg)
Paul Pavlidis,
UBC
Lynn Schriml,
U Maryland
Matt and Cristina Might,
Crowd volunteers and partners
(Salomon) (Lotz)
(Yang, Maximov) (Topol)
![Page 56: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/56.jpg)
Louis Gioia
Julee Adesara
Toby Li
Karthik G
Erick Scott
Adam Mark
Kevin Xin
Jake Bruggemann
Mike Mayers
Andra Waagmeester
Max Nanis
Cyrus Afrasiabi
Ian MacLeod
Julia Turner
Ginger Tsueng
Sebastien Lelong
Erik Clarke
Jennifer Fouquier
Ben Good
Chunlei Wu
Shirley Willis
Tobias Meissner
Katie Fisch
Sandip Chatterjee
Ramya Gamini
Greg Stupp
Sebastian Burgstaller
Tim Putman
Nuria Queralt Rosinach
Sal Loguercio
[Badges next to some names: M2C, GW]
![Page 57: Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University](https://reader035.vdocuments.us/reader035/viewer/2022070600/58ce66401a28ab2f268b6a2b/html5/thumbnails/57.jpg)