Report on Linked Structured Product Labels (LinkedSPLs) and a study evaluating three different approaches to mapping active ingredients coded in Structured Product Labels to DrugBank.
Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels
Oktie Hassanzadeh, IBM Research
Qian Zhu, Mayo Clinic
Robert Freimuth, Mayo Clinic
Richard Boyce*, University of Pittsburgh
Department of Biomedical Informatics
Take home message
• Drug product labeling is a vital, unique, and under-utilized source of claims and evidence about drugs
– genes, diseases, drugs, drug interactions, special populations, and adverse reactions
• All American product labeling content is available in an accessible format
– Structured Product Labeling (SPL)
• LinkedSPLs is a Linked Data version of SPLs
– simplifies access to SPL content
– interoperable with other important drug terminologies
Why is drug product labeling special?
• It complements existing knowledge sources
– 40% of 44 pharmacokinetic drug-drug interactions affecting 25 drugs were located exclusively in product labeling [1]
– 24% of clinical efficacy trials for 90 drugs were discussed in the product label but not the scientific literature [2]
– One-fifth of the evidence for metabolic pathways for 16 drugs and 19 metabolites was found in product labeling but not the scientific literature [3]
1. Boyce RD, Collins C, Clayton M, Kloke J, Horn JR. Inhibitory metabolic drug interactions with newer psychotropic drugs: inclusion in package inserts and influences of concurrence in drug interaction screening software. Ann Pharmacother. 2012;46(10):1287–1298.
2. Lee K, Bacchetti P, Sim I. Publication of Clinical Trials Supporting Successful New Drug Applications: A Literature Analysis. PLoS Med. 2008;5(9):e191.
3. Boyce R, Collins C, Horn J, Kalet I. Computing with evidence: Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics. 2009;42(6):979–989.
Why product labeling has information that is not in the scientific literature
1. Product labels contain a summary of information reported in detail in a drug’s New Drug Application
– Often difficult/impossible for a researcher to access
2. Until recently, there was no requirement to publish pre-market drug study results
– This has changed since ~2010
Product labeling is under-utilized by translational researchers
• Only two out of more than 2,300 MEDLINE abstracts discuss product label NLP [1]
• Several recent informatics projects did not explicitly include product label information [2-6]
1. Query done on 11/26: (Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) AND ((Drug Labeling [MeSH Terms] OR drug labeling[Text Word]) OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))
2. Segura-Bedmar I, Martinez P, Sanchez-Cisneros D eds. Proceedings of the First Challenge Task: Drug-Drug Interaction Extraction 2011. Huelva, Spain; 2011. Available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-761/. Accessed December 9, 2011.
3. SEMEVAL. Task Description - Extraction of Drug-Drug Interactions from BioMedical Texts. 2012. Available at: http://www.cs.york.ac.uk/semeval-2013/task9/. Accessed November 20, 2012.
4. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput. 2012:410–421.
5. Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Bioinformatics. 2010;26(18):i547–553.
6. Duke JD, Han X, Wang Z, et al. Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated drug interactions. PLoS computational biology. 2012;8(8):e1002614.
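The literature query in footnote 1 could be reproduced programmatically. The sketch below only builds a request URL for that query text; it assumes the NCBI E-utilities `esearch` endpoint and PubMed as the database, neither of which is named on the slide, and the helper name `build_esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Query text copied from footnote 1 of the slide.
QUERY = (
    "(Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) "
    "AND ((Drug Labeling [MeSH Terms] OR drug labeling[Text Word]) "
    'OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))'
)

# Assumption: the search is run against PubMed through NCBI E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 0) -> str:
    """Return an esearch URL; retmax=0 asks for the hit count only."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY)
print(url)
```

Fetching the URL and reading the reported count would show how many abstracts match today; that number will differ from the count reported on the slide because the index keeps growing.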
Doesn’t DrugBank handle this?
• Not really!
– DrugBank includes product label content from the Physicians’ Desk Reference (PDR) [1]
– However, the PDR covers only a subset of available product label content
• Claims and evidence unique to drug product labels not included in the PDR will be missing from DrugBank
• This may negatively affect informatics experiments that require complete drug information
• E.g., possibly missed drug interactions (DrugBank 3.0) include cimetidine-sertraline, cimetidine-venlafaxine, cimetidine-citalopram, and venlafaxine-haloperidol [2-5]
1. Physicians’ Desk Reference, 66th Edition. 2012 Edition. PDR Network; 2011.
2. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=b1de3ed9-1cb8-e419-3f25-5b0aeed5779a. Accessed November 27, 2012.
3. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=cf2d9bee-f8e3-477a-e4b4-f0e82657b7d2. Accessed November 27, 2012.
4. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=4259d9b1-de34-43a4-85a8-41dd214e9177. Accessed November 27, 2012.
5. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=53c3e7ac-1852-4d70-d2b6-4fca819acf26. Accessed November 27, 2012.
Second take home point:
• All American product labeling content is available in an accessible format
– Structured Product Labeling (SPL)
Structured Product Labels (SPLs)
• What you would see if you downloaded an SPL from DailyMed
1. http://www.fda.gov/OHRMS/DOCKETS/98fr/FDA-2005-N-0464-gdl.pdf
2. http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/default.htm
3. http://dailymed.nlm.nih.gov/dailymed/downloadLabels.cfm
More about SPLs
Third take home point
• LinkedSPLs is a Linked Data version of SPLs
– simplifies access to SPL content
– interoperable with other important drug terminologies
LinkedSPLs – hypothesis
Hypothesis: A Linked Data knowledge base of drug product labels with accurate links to other relevant sources of drug information will provide a dynamic platform for drug information NLP that provides real value to translational researchers
LinkedSPLs – A research program
Your annotations would go here!
LinkedSPLs – Method
• Currently we are focusing on linking active ingredients in the structured portion of SPLs
• Unstructured text is left for future work
Linkage to external sources
• There are many sources of drug information that are complementary to each other.
– DrugBank: contains drug targets, pathways, interactions
– RxNorm: provides UMLS mappings
– ChEBI: provides rigorous classification of drugs
Example
prodName: Nefazodone Hydrochloride
rxNormProduct: rxcui:1098666
epcClass: SEROTONIN REUPTAKE INHIBITOR
contraindications: CONTRAINDICATIONS Coadministration of terfenadine, astemizole, cisapride, pimozide, or carbamazepine with nefazodone hydrochloride is contraindicated….
What we tested
• Three different approaches to linking active ingredients to DrugBank
1. Structure string (InChI)
2. Ontology label matching (ChEBI)
3. Unsupervised linkage point discovery (Automated) [1]
1. O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To Appear in PVLDB, Vol 6. Issue 6, August 2013
Linkage to DrugBank – Results
                   InChI identifier   ChEBI identifier   InChI + ChEBI   Automatic
InChI identifier   424                261                424             395
ChEBI identifier   --                 707                707             650
InChI + ChEBI      --                 --                 831             791
Automatic          --                 --                 --              1,162
• 1,246 active ingredients could be mapped to DrugBank by at least one method
• 1,096 unmapped ingredients
• The three approaches complement each other
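The results matrix appears to report, on the diagonal, how many ingredients each method mapped and, off the diagonal, how many were mapped by both methods in the pair. A minimal sketch of how such a matrix could be computed from per-method sets of mapped ingredients follows; the identifiers are toy values invented for illustration, not the study's data, and `overlap_matrix` is a hypothetical helper.

```python
from itertools import combinations_with_replacement

# Toy data: hypothetical ingredient identifiers, NOT the study's results.
mapped = {
    "InChI": {"A", "B", "C"},
    "ChEBI": {"B", "C", "D", "E"},
}
# The combined method maps whatever either identifier-based method maps.
mapped["InChI + ChEBI"] = mapped["InChI"] | mapped["ChEBI"]
mapped["Automatic"] = {"A", "C", "D", "E", "F"}


def overlap_matrix(sets):
    """Upper-triangular matrix of pairwise intersection sizes."""
    names = list(sets)
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations_with_replacement(names, 2)
    }


matrix = overlap_matrix(mapped)
mapped_by_any = set().union(*mapped.values())

for (row, col), count in matrix.items():
    print(f"{row:>14} x {col:<14} {count}")
print("mapped by at least one method:", len(mapped_by_any))
```

Reading the diagonal against the size of `mapped_by_any` shows how much each method contributes that the others miss, which is the sense in which the approaches complement each other.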
Conclusions
• The automatic approach performs very well
– A greater number of accurate links discovered with less effort
• A significant number remain unmapped:
– Some salt or racemic forms of mapped ingredients (e.g., alpha tocopherol acetate D)
– Elements (e.g., gold, iodine), and a variety of natural organic compounds including pollens (N~200)
• Not all ingredients are included in DrugBank
– other resources may be required to obtain complete mappings for active ingredients.
Want more information?
• LinkedSPLs
– http://purl.org/LinkedSPLs
• Google code project
– code.google.com/p/swat-4-med-safety/
• Publications
– Hassanzadeh O, Zhu Q, Freimuth RR, Boyce R. Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.
– Boyce RD, Freimuth RR, Romagnoli KM, Pummer T, Hochheiser H, Empey PE. Toward semantic modeling of pharmacogenomic knowledge for clinical and translational decision support. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.
– Boyce RD, Horn JR, Hassanzadeh O, de Waard A, Schneider J, Luciano JS, Rastegar-Mojarad M, Liakata M. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.
Acknowledgements
• NIH/NIGMS (U19 GM61388; the Pharmacogenomic Research Network)
• Agency for Healthcare Research and Quality (K12HS019461).
Backup Slides
Linkage in LinkedSPLs
An active ingredient from an SPL (diagram labels):
SPL resource
Active ingredient resource in LinkedSPLs
dailymed:activeMoiety “OLANZAPINE”
dailymed:activeMoietyUNII “N7U69T4SZR”
Linkage to DrugBank – Approach 1
Idea: Using NCI Resolver & InChIKey
Starting with UNII: “N7U69T4SZR”
1. FDA UNII table provides structure string: 2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE
2. NCI Resolver provides InChIKey: KVWDHTXUZHCGIO-UHFFFAOYSA-N
3. DrugBank record with the above InChIKey provides identifier: DB00334
Results: 429 out of 2,264 ingredients are linked, out of which 424 are valid
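The three lookups of Approach 1 chain together as UNII, then structure string, then InChIKey, then DrugBank identifier. The sketch below illustrates that chain with in-memory dictionaries standing in for the FDA UNII table, the NCI Resolver, and DrugBank; the dictionaries hold only the olanzapine values shown on the slide, and the function name `link_unii_via_inchikey` is hypothetical.

```python
from typing import Optional

# Stand-in for the FDA UNII table (UNII -> structure string).
UNII_TO_STRUCTURE = {
    "N7U69T4SZR": "2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE",
}
# Stand-in for the NCI Resolver (structure string -> InChIKey).
STRUCTURE_TO_INCHIKEY = {
    "2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE":
        "KVWDHTXUZHCGIO-UHFFFAOYSA-N",
}
# Stand-in for DrugBank records indexed by InChIKey.
INCHIKEY_TO_DRUGBANK = {"KVWDHTXUZHCGIO-UHFFFAOYSA-N": "DB00334"}


def link_unii_via_inchikey(unii: str) -> Optional[str]:
    """Follow UNII -> structure -> InChIKey -> DrugBank ID; None if any step fails."""
    structure = UNII_TO_STRUCTURE.get(unii)
    if structure is None:
        return None
    inchikey = STRUCTURE_TO_INCHIKEY.get(structure)
    if inchikey is None:
        return None
    return INCHIKEY_TO_DRUGBANK.get(inchikey)


print(link_unii_via_inchikey("N7U69T4SZR"))
```

Because every step must succeed, an ingredient without a structure string or without a resolvable InChIKey falls out of the chain, which is one plausible reason this approach links fewer ingredients than the others.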
Linkage to DrugBank – Approach 2
Idea: Using ChEBI identifier & NCBO BioPortal
Starting with name: “OLANZAPINE”
1. ChEBI preferred name from NCBO BioPortal: “OLANZAPINE”
2. ChEBI identifier from NCBO BioPortal: 7735
3. DrugBank record with the above ChEBI identifier provides identifier: DB00334
Results: 718 out of 2,264 ingredients are linked, out of which 707 are valid
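Approach 2 matches the ingredient name against ChEBI labels and then follows the ChEBI identifier into DrugBank. A minimal sketch follows, assuming case-insensitive exact matching after whitespace cleanup; the slides do not state the matching rule, the dictionaries are stand-ins for NCBO BioPortal and DrugBank that hold only the olanzapine example, and the function names are hypothetical.

```python
from typing import Optional


def normalize_label(label: str) -> str:
    """Lower-case and collapse whitespace so 'OLANZAPINE' matches 'olanzapine'."""
    return " ".join(label.casefold().split())


# Stand-in for ChEBI preferred names served by NCBO BioPortal.
CHEBI_BY_LABEL = {normalize_label("olanzapine"): "7735"}
# Stand-in for DrugBank records indexed by ChEBI identifier.
DRUGBANK_BY_CHEBI = {"7735": "DB00334"}


def link_name_via_chebi(name: str) -> Optional[str]:
    """Follow name -> ChEBI identifier -> DrugBank ID; None if either lookup misses."""
    chebi_id = CHEBI_BY_LABEL.get(normalize_label(name))
    if chebi_id is None:
        return None
    return DRUGBANK_BY_CHEBI.get(chebi_id)


print(link_name_via_chebi("OLANZAPINE"))
```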
Linkage to DrugBank – Approach 3
Idea: Automatic discovery of linkage points
Starting with all data in the FDA UNII table and DrugBank….
1. Index all FDA UNII table and DrugBank XML attributes
2. Search for linkage points and score similarity:
UNII -> Substance Name and DrugBank -> brands -> brand: 0.94
UNII -> Preferred Substance Name and DrugBank -> name: 0.91
UNII -> Substance Name and DrugBank -> synonyms -> synonym: 0.83
…
3. Prune list of linkage points based on cardinality, coverage, and average score
4. Establish links between FDA UNII table and DrugBank using the linkage points:
UNII “OLANZAPINE” and DrugBank “Zyprexa”: 1.0
…
Results: 1,179 out of 2,264 ingredients are linked, out of which 1,169 are valid
Example attribute values (diagram labels):
“N7U69T4SZR”: UNII
“OLANZAPINE”: Preferred Substance Name
“2-METHYL-4….”: Molecular Formula
“ZYPREXA”: synonym
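The four steps of Approach 3 can be illustrated with a toy version: collect the values of every attribute on both sides, score each attribute pair, keep the pairs that score well, and link records through the best pair. The sketch below uses Jaccard overlap of normalized values as the score, which is a stand-in chosen for brevity and not the scoring used in the cited framework; the second record on each side and all function names are hypothetical.

```python
from itertools import product


def normalize(value: str) -> str:
    return " ".join(value.casefold().split())


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy records; only the olanzapine values come from the slides.
unii_table = [
    {"unii": "N7U69T4SZR", "preferred_name": "OLANZAPINE"},
    {"unii": "TOYUNII002", "preferred_name": "EXAMPLEDRUG"},
]
drugbank = [
    {"id": "DB00334", "name": "Olanzapine", "brand": "Zyprexa"},
    {"id": "DBTOY002", "name": "Exampledrug", "brand": "Examplex"},
]


def values(records, attr):
    """Step 1: the set of normalized values seen for one attribute."""
    return {normalize(r[attr]) for r in records if attr in r}


def discover_linkage_points(source, target, min_score=0.5):
    """Steps 2 and 3: score every attribute pair, keep those above a threshold."""
    source_attrs = sorted({k for r in source for k in r})
    target_attrs = sorted({k for r in target for k in r})
    scored = []
    for s, t in product(source_attrs, target_attrs):
        score = jaccard(values(source, s), values(target, t))
        if score >= min_score:
            scored.append((score, s, t))
    return sorted(scored, reverse=True)


def link(source, target, s_attr, t_attr, s_key, t_key):
    """Step 4: join records whose values agree on the chosen attribute pair."""
    index = {normalize(r[t_attr]): r[t_key] for r in target if t_attr in r}
    return [
        (r[s_key], index[normalize(r[s_attr])])
        for r in source
        if s_attr in r and normalize(r[s_attr]) in index
    ]


points = discover_linkage_points(unii_table, drugbank)
_, best_s, best_t = points[0]
links = link(unii_table, drugbank, best_s, best_t, "unii", "id")
print(points, links)
```

No attribute pair is named in advance: the pairing of the preferred name with the DrugBank name emerges from the data, which is what makes the approach unsupervised.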
Linkage Point Discovery Framework
• A generic framework for unsupervised discovery of linkage points
Details can be found at: O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To Appear in PVLDB, Vol 6. Issue 6, August 2013