Building a massive biomedical knowledge graph with citizen science
TRANSCRIPT
![Page 1: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/1.jpg)
Building a massive biomedical knowledge graph with citizen science
Benjamin Good
The Scripps Research Institute @bgood
Not paying attention? be a citizen scientist at http://mark2cure.org
![Page 2: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/2.jpg)
High level goal: improve access to published knowledge
22
articles added to PubMed per year
1 every 30 seconds, more than a million a year
knowledge graph
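As a quick check of the rate quoted on this slide, the arithmetic can be sketched as follows. The only input is the slide's "1 every 30 seconds" figure; the constant names are illustrative and leap years are ignored.

```python
# Convert the slide's "1 article every 30 seconds" into articles per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (leap years ignored)
SECONDS_PER_ARTICLE = 30               # rate quoted on the slide

articles_per_year = SECONDS_PER_YEAR // SECONDS_PER_ARTICLE
print(f"{articles_per_year:,} articles per year")  # 1,051,200 articles per year
```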
![Page 3: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/3.jpg)
Chemicals & drugs
Genes
Organisms
Area of study
Biological Process
Auto! Knowledge Graph
~10,000 articles
Ngly1 gene
?
New drug candidate?
![Page 4: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/4.jpg)
Knowledge graph problems
• Assigning meaning to relations
• Incorrect relations
• Missing relations
• …
![Page 5: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/5.jpg)
Facts of life in computer processing of human language
• False positives and false negatives are always present
• Human annotators remain the gold standard
• There are not nearly enough professional human annotators to process every document published
![Page 6: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/6.jpg)
Observations
• There are about 2.92 billion Internet users
• Lots of them can read English
http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/
![Page 7: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/7.jpg)
Hypothesis
• We can generate the equivalent of massive numbers of professional annotators by aggregating the labor of large numbers of non-professional CITIZEN SCIENTISTS!!!
![Page 8: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/8.jpg)
Building a Knowledge Graph
1. Find mentions of concepts in text
2. Identify relationships between concepts
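The two steps above can be sketched in a few lines. This is only an illustration under loud assumptions: the lexicon, the `Mention` class, and both function names are hypothetical, and dictionary matching plus same-sentence co-occurrence are simple stand-ins, not the method used in the talk (where people, not code, do the annotation).

```python
import re
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Mention:
    text: str
    concept_type: str
    start: int
    end: int

# Hypothetical toy lexicon; a real system would use curated vocabularies.
LEXICON = {"CDK9": "gene", "leukemia": "disease"}

def find_mentions(sentence):
    """Step 1: find mentions of known concepts by case-insensitive lookup."""
    mentions = []
    for term, concept_type in LEXICON.items():
        for m in re.finditer(re.escape(term), sentence, flags=re.IGNORECASE):
            mentions.append(Mention(m.group(0), concept_type, m.start(), m.end()))
    return sorted(mentions, key=lambda m: m.start)

def candidate_relations(mentions):
    """Step 2 (stand-in): pair co-occurring mentions of different types.

    The relation itself is left unlabeled; assigning its meaning is the hard part.
    """
    return [(a, b) for a, b in combinations(mentions, 2)
            if a.concept_type != b.concept_type]

sentence = "Inhibition of CDK9 may be a therapeutic approach in leukemia."
mentions = find_mentions(sentence)
edges = candidate_relations(mentions)
for a, b in edges:
    print(f"{a.text} ({a.concept_type}) -- ? -- {b.text} ({b.concept_type})")
```

Each printed pair is a candidate edge of the graph; the "?" marks the relation type that still has to be identified.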
![Page 9: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/9.jpg)
Before we try this with citizen scientists...
• Can non-scientists collectively identify concepts in biomedical texts with high quality?
• We used the Amazon Mechanical Turk crowdsourcing platform to answer the question
![Page 10: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/10.jpg)
Highlight the “disease”.
![Page 11: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/11.jpg)
Answer was yes
• By combining the responses of multiple non-professional members of ‘the crowd’, we achieved equivalent quality to professional annotators
Good et al. “Microtask crowdsourcing for disease mention annotation in PubMed abstracts.” Pacific Symposium on Biocomputing 2015
http://psb.stanford.edu/psb-online/proceedings/psb15/good.pdf
![Page 13: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/13.jpg)
Same task, different context
![Page 14: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/14.jpg)
Experiment 1 in progress
Evaluating quality and quantity of volunteer annotators
Goal is to complete about 600 abstracts, with 15 volunteers per abstract
Almost there!
![Page 15: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/15.jpg)
mark2cure experiment 1
[Chart: tasks (divided by 10) and new users over time, with annotations for the launch tweet, a blog post, and a San Diego Union Tribune article]
11:00am Feb. 9: 5,423 tasks complete
230 signups, 130 have completed a task
![Page 16: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/16.jpg)
Next steps
• Implement and test a relation extraction workflow
• Start disease-focused knowledge capture missions
• First disease: NGLY1 deficiency
• http://ngly1.org
![Page 17: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/17.jpg)
Thanks to the mark2cure team!
Max Nanis
Andrew Su
@bgood
Ginger Tsueng
Chunlei Wu
Thank you to the citizen scientists
making this possible!
![Page 18: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/18.jpg)
Why do I Mark2Cure?
In memory of my daughter who had Cystic Fibrosis
Studied biology in college and I really miss it!
My 4 year old daughter Phoebe is living with and battling rare disease.
I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
To give back
I Mark2Cure in memory of my son Mike who had type 1 diabetes.
Take part in something that helps humanity.
![Page 19: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/19.jpg)
![Page 20: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/20.jpg)
Increase precision with voting
[Figure: the passage below shown four times, with only the text selected by at least K annotators highlighted: 1 or more votes (K=1), K=2, K=3, K=4. Highlighting is not preserved in this transcript.]
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
Aggregation function
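A minimal sketch of a vote-threshold aggregation function, assuming each annotator's response is a set of highlighted spans and a span is kept when at least K annotators marked it. The worker data is invented for illustration, and spans are plain strings for readability; a real system would more likely compare character offsets.

```python
from collections import Counter

def aggregate_by_votes(annotations, k):
    """Keep every span that at least k annotators highlighted."""
    votes = Counter(span for spans in annotations for span in set(spans))
    return {span for span, count in votes.items() if count >= k}

# Hypothetical responses from four annotators on the example passage.
worker_annotations = [
    {"leukemia", "acute myeloid leukemia", "cancer"},
    {"acute myeloid leukemia", "chronic lymphocytic leukemia", "cancer"},
    {"acute myeloid leukemia", "hematologic malignancies", "tumor"},
    {"acute myeloid leukemia", "chronic lymphocytic leukemia", "cancer",
     "hematologic malignancies"},
]

for k in range(1, 5):
    kept = sorted(aggregate_by_votes(worker_annotations, k))
    print(f"K={k}: {kept}")
```

Raising K discards spans that only a few annotators chose, which trades recall for precision.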
![Page 21: Building a massive biomedical knowledge graph with citizen science](https://reader030.vdocuments.us/reader030/viewer/2022032616/55a520a81a28aba8348b466d/html5/thumbnails/21.jpg)
AMT results: 589 abstracts compared to gold standard
F = 0.87, k = 6
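For readers unfamiliar with the F score, here is a sketch of how aggregated annotations could be compared to a gold standard, assuming F is the usual harmonic mean of precision and recall and that a span counts as correct only on an exact match. The slides do not state the matching criterion, and the spans below are made-up character offsets.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted spans against gold spans using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical (start, end) character offsets for one abstract.
gold = {(10, 18), (40, 62), (90, 99)}
predicted = {(10, 18), (40, 62), (70, 75)}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```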