Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research



Page 1: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

InfoCodex Semantic Technologies Turning Information into Knowledge

Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research?

Dr. Carlo A. Trugenberger Co-Founder and Chief Scientific Officer

InfoCodex Semantic Technologies AG, CH-9470 Buchs

September 2, 2015   www.InfoCodex.com

Semantics 2015

Page 2: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Big changes in pharmaceutical research
The end of the blockbuster era?

Challenges
- Costs are exploding
- Regulatory pressure
- Personalized medicine
- Outsourcing of critical processes
Critical for survival:
- Shorten time-to-market
- Early recognition of dead ends

Opportunities
- Genomics / Proteomics
- Big data / data mining ➪ structure-based design
- Drugs are “computed” rather than discovered
Critical to beat competition:
- Data + data analysis power
- Machine intelligence

Page 3: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

The data deluge as an opportunity for eDiscovery

Traditional bioinformatics: structured data
sequence alignment, gene finding, genome assembly, protein structure prediction, gene expression…

New idea: exploit unstructured data
PubMed: 22 million citations, growing at a rate of 1.7 papers/minute

Experiment: Merck + Thomson Reuters + InfoCodex
Is it possible to drive drug research by text mining large pools of biomedical documents?
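As a rough sanity check on the growth figure above, the quoted rate of 1.7 papers per minute can be converted into a yearly count. This is back-of-the-envelope arithmetic only; the constant and function names are illustrative, and a 365-day year is assumed.

```python
# Back-of-the-envelope conversion of the PubMed growth rate quoted on the slide.
PAPERS_PER_MINUTE = 1.7            # figure quoted on the slide
MINUTES_PER_YEAR = 60 * 24 * 365   # assumption: 365-day year


def papers_per_year(rate_per_minute: float) -> float:
    """Convert a per-minute publication rate into a per-year count."""
    return rate_per_minute * MINUTES_PER_YEAR


print(f"{papers_per_year(PAPERS_PER_MINUTE):,.0f} new citations per year")
```

At this rate the collection grows by somewhat under one million citations per year, far more than any research team could read manually.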

Page 4: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

The Experiment of Merck & Co with InfoCodex

The tasks:
- Discover novel biomarkers for diabetes and obesity (D&O) by analyzing 120’000 medical publications (PubMed + ClinicalTrials.org + internal)
- Blind experiment, no human feedback

The aim:
- Test pure machine intelligence for “semantic drug research”

Biomarkers: a $13.6 billion market in 2011, growing to $25 billion by 2016.

Page 5: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Semantic technologies in the pharma industry

- Most existing projects use NLP to extract triples “entity 1 - relation - entity 2” sentence by sentence ➪ help to curate ontologies / libraries
- However: this is not a discovery approach
- Relations found this way have been explicitly written by human authors and are thus known in one way or another
- Going beyond triples: analyze text collections globally to identify small, seemingly unrelated and unnoticed facts dispersed over isolated texts, assembling the scattered pieces of a puzzle
- Critical: machine intelligence
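The contrast between per-sentence triples and global analysis can be illustrated with a toy sketch. It is not the InfoCodex method, which the slides do not specify; it uses a simple "shared intermediate term" heuristic as a stand-in. The corpus, the term names, and both function names are made-up placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each "document" is reduced to the set of terms it mentions.
# All terms are invented placeholders, not real biomedical findings.
DOCS = [
    {"compound_A", "pathway_B"},
    {"pathway_B", "disease_C"},
    {"compound_A", "side_effect_D"},
]


def explicit_pairs(docs):
    """Term pairs stated together in at least one document.

    This is all that a purely local, text-by-text extraction can see.
    """
    pairs = set()
    for terms in docs:
        pairs.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return pairs


def implicit_pairs(docs):
    """Term pairs never stated together but linked by a shared intermediate term.

    Returns a dict mapping each such pair to the set of bridging terms.
    """
    known = explicit_pairs(docs)
    neighbours = defaultdict(set)
    for pair in known:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = {}
    for bridge, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            pair = frozenset((a, c))
            if pair not in known:
                found.setdefault(pair, set()).add(bridge)
    return found


for pair, bridges in implicit_pairs(DOCS).items():
    print(sorted(pair), "linked via", sorted(bridges))
```

No single document mentions `compound_A` together with `disease_C`, so a local extraction never reports that pair; only the pass over the whole collection surfaces it as a candidate for a human or a further scoring step to assess.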

Page 6: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

The Technology: eDiscovery by InfoCodex
Linguistics + Information Theory + Self-Organization

- Completely automatic semantic analysis of content
- Designed for uncovering unnoticed correlations among information distributed over document groups and collections (contrary to NLP)
- “Assemble the pieces of a puzzle”
- Knowledge discovery as opposed to information extraction

Page 7: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Page 8: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Step 1: Establish reference models for biomarkers / phenotypes
- Cluster documents describing known biomarkers (224 references found)
- Reference model for each cluster → meanings for “biomarkers diabetes” …

Step 2: Determine the meaning of unknown words by machine inference.

Step 3: Analyze documents and generate a list of potential D&O biomarkers/phenotypes by comparison with the reference models.

Step 4: Establish confidence levels.

Encoded meanings
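Steps 1, 3 and 4 can be sketched in a few lines, under strong simplifying assumptions. The slides do not say how reference models or confidence levels are computed, so this sketch substitutes a bag-of-words centroid for the reference model and cosine similarity for the confidence level. The texts, candidate names, and function names are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a crude stand-in for encoded meanings)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Assumed reference model for one cluster: the averaged term vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(reference_docs, candidate_docs):
    """Score each candidate against the reference model, best first."""
    model = centroid([vectorize(d) for d in reference_docs])
    scored = [(name, cosine(vectorize(text), model))
              for name, text in candidate_docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy texts: documents about known markers, and two candidate terms.
REFERENCE = [
    "marker level elevated in diabetes patients",
    "marker level predicts obesity risk",
]
CANDIDATES = {
    "term_x": "term_x level elevated in obesity patients",
    "term_y": "term_y is a city in europe",
}

for name, score in rank_candidates(REFERENCE, CANDIDATES):
    print(f"{name}: {score:.2f}")
```

A candidate whose context resembles the known-biomarker documents scores high, while an unrelated term scores low. A real system would need far richer representations than word counts, which is where Step 2 (inferring the meaning of unknown words) comes in.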

Page 9: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Determination of the meaning of unknown words: machine inference

Co-occurrences with words in internal knowledge base → most probable hypernym → “is a”, “has to do with”

Example: “Hctz” is a “diuretic drug” and is a synonym of “hydrochlorothiazide”.
Such relations are established solely on the basis of machine intelligence combined with the internal knowledge base.
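The co-occurrence idea can be sketched as a simple voting scheme. This is an illustrative guess at the principle, not the InfoCodex algorithm: each known word that appears in the same sentence as the unknown word casts one vote for its own hypernym. The mini knowledge base, the sentences, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical mini knowledge base: known word -> hypernym.
KNOWLEDGE_BASE = {
    "furosemide": "diuretic drug",
    "chlorthalidone": "diuretic drug",
    "metformin": "antidiabetic drug",
}

# Invented toy sentences mentioning the unknown word "hctz".
SENTENCES = [
    "hctz and furosemide reduced fluid retention",
    "patients received hctz or chlorthalidone",
    "hctz was added to metformin",
]


def most_probable_hypernym(unknown, sentences, kb):
    """Return (hypernym, vote share) for an unknown word.

    Every known word sharing a sentence with the unknown word votes for
    its own hypernym; the hypernym with the most votes wins.
    """
    votes = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if unknown in tokens:
            votes.update(kb[t] for t in tokens if t in kb)
    if not votes:
        return None, 0.0
    hypernym, count = votes.most_common(1)[0]
    return hypernym, count / sum(votes.values())


print(most_probable_hypernym("hctz", SENTENCES, KNOWLEDGE_BASE))
```

The vote share doubles as a crude confidence value. Establishing that “Hctz” is a synonym of “hydrochlorothiazide”, rather than merely another diuretic, would require additional evidence that this sketch does not model.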

Page 10: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

The output

Page 11: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

The Results

Many uninteresting candidates
Too much noise (the problem has been identified and corrected)

Lots of “needles in the haystack”
Tens of extremely interesting and valuable candidates with very high potential

Page 12: Carlo Trugenberger: Scientific Discovery by Machine Intelligence: A New Avenue for Drug Research

Conclusion

✓ Approach has high potential for discovery
✓ Approach has potential to impact pharma research
   - Speed up time-to-market
   - Early recognition of dead ends
✗ Improvements in the process are needed: problems have been identified and corrected.
- Most promising is a hybrid approach
   - Human expertise in formulation of reference models
   - Human curation of candidates prior to passing to the laboratory
✓ Possibly inevitable development