Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels

Oktie Hassanzadeh, IBM Research
Qian Zhu, Mayo Clinic
Robert Freimuth, Mayo Clinic
Richard Boyce*, University of Pittsburgh

Department of Biomedical Informatics

Upload: richard-boyce-phd

Post on 07-May-2015

DESCRIPTION

Report on Linked Structured Product Labels (LinkedSPLs) and a study evaluating three different approaches to mapping active ingredients coded in Structured Product Labels to DrugBank.

TRANSCRIPT

Page 1: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels

Oktie Hassanzadeh, IBM Research
Qian Zhu, Mayo Clinic
Robert Freimuth, Mayo Clinic
Richard Boyce*, University of Pittsburgh

Department of Biomedical Informatics

Page 2: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Take home message

• Drug product labeling is a vital, unique, and under-utilized source of claims and evidence about drugs

– genes, diseases, drugs, drug interactions, special populations, and adverse reactions

• All American product labeling content is available in an accessible format

– Structured Product Labeling (SPL)

• LinkedSPLs is a Linked Data version of SPLs

– simplifies access to SPL content

– interoperable with other important drug terminologies

Page 3: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Is drug product labeling special?

• It complements existing knowledge sources

– 40% of 44 pharmacokinetic drug-drug interactions affecting 25 drugs were located exclusively in product labeling [1]

– 24% of clinical efficacy trials for 90 drugs were discussed in the product label but not the scientific literature [2]

– 1/5th of the evidence for metabolic pathways for 16 drugs and 19 metabolites was found in product labeling but not the scientific literature [3]

1. Boyce RD, Collins C, Clayton M, Kloke J, Horn JR. Inhibitory metabolic drug interactions with newer psychotropic drugs: inclusion in package inserts and influences of concurrence in drug interaction screening software. Ann Pharmacother. 2012;46(10):1287–1298.

2. Lee K, Bacchetti P, Sim I. Publication of Clinical Trials Supporting Successful New Drug Applications: A Literature Analysis. PLoS Med. 2008;5(9):e191.

3. Boyce R, Collins C, Horn J, Kalet I. Computing with evidence: Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics. 2009;42(6):979–989.

Page 4: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Why product labeling has information that is not in the scientific literature

1. Product labels contain a summary of information reported in detail in a drug’s New Drug Application

– Often difficult/impossible for a researcher to access

2. Until recently, there was no requirement to publish pre-market drug study results

– This has changed since ~2010

Page 5: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Product labeling is under-utilized by translational researchers

• Only two out of more than 2,300 MEDLINE abstracts discuss product label NLP [1]

• Several recent informatics projects did not explicitly include product label information [2-6]

1. Query done on 11/26: (Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) AND ((Drug Labeling [MeSH Terms] OR drug labeling[Text Word]) OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))

2. Segura-Bedmar I, Martinez P, Sanchez-Cisneros D eds. Proceedings of the First Challenge Task: Drug-Drug Interaction Extraction 2011. Huelva, Spain; 2011. Available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-761/. Accessed December 9, 2011.

3. SEMEVAL. Task Description - Extraction of Drug-Drug Interactions from BioMedical Texts. 2012. Available at: http://www.cs.york.ac.uk/semeval-2013/task9/. Accessed November 20, 2012.

4. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput. 2012:410–421.

5. Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Bioinformatics. 2010;26(18):i547–553.

6. Duke JD, Han X, Wang Z, et al. Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated drug interactions. PLoS computational biology. 2012;8(8):e1002614.
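
The MEDLINE query in footnote 1 can be re-run programmatically. The sketch below only builds a request URL for the NCBI E-utilities `esearch` endpoint and makes no network call; the helper name `build_esearch_url` and the choice of `retmax=0` (ask for the hit count only) are illustrative assumptions, and a count obtained today would likely differ from the one reported on the slide.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed/MEDLINE.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query text copied from footnote 1 of the slide.
QUERY = (
    '(Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) '
    'AND ((Drug Labeling [MeSH Terms] OR drug labeling[Text Word]) '
    'OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))'
)


def build_esearch_url(term: str) -> str:
    """Return an esearch URL that requests JSON and no record IDs (count only)."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

Fetching this URL (for example with `urllib.request.urlopen`) should return a JSON document whose search result includes the hit count.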

Page 6: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Doesn’t DrugBank handle this?

• Not really!

– DrugBank includes product label content from the Physicians’ Desk Reference (PDR) [1]

– However, the PDR is actually a subset of available product label content

• claims and evidence unique to those drug product labels not included in the PDR will be missing from DrugBank

• potential negative effects on informatics experiments that require complete drug information.

• E.g., possibly missed drug-interactions (DrugBank 3.0) include cimetidine-sertraline, cimetidine-venlafaxine, cimetidine-citalopram, and venlafaxine-haloperidol. [2-5]

1. Physicians’ Desk Reference, 66th Edition. 2012 Edition. PDR Network; 2011.

2. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=b1de3ed9-1cb8-e419-3f25-5b0aeed5779a. Accessed November 27, 2012.

3. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=cf2d9bee-f8e3-477a-e4b4-f0e82657b7d2. Accessed November 27, 2012.

4. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=4259d9b1-de34-43a4-85a8-41dd214e9177. Accessed November 27, 2012.

5. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=53c3e7ac-1852-4d70-d2b6-4fca819acf26. Accessed November 27, 2012.

Page 7: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Second take home point:

• All American product labeling content is available in an accessible format

– Structured Product Labeling (SPL)

Page 8: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Structured Product Labels (SPLs)

• What you would see if you downloaded an SPL from DailyMed

1. http://www.fda.gov/OHRMS/DOCKETS/98fr/FDA-2005-N-0464-gdl.pdf

2. http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/default.htm

3. http://dailymed.nlm.nih.gov/dailymed/downloadLabels.cfm

Page 9: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

More about SPLs

Page 10: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

More about SPLs

Page 11: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Third take home point

• LinkedSPLs is a Linked Data version of SPLs

– simplifies access to SPL content

– interoperable with other important drug terminologies

Page 12: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

LinkedSPLs – hypothesis

Hypothesis: A Linked Data knowledge base of drug product labels with accurate links to other relevant sources of drug information will provide a dynamic platform for drug information NLP that provides real value to translational researchers

Page 13: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

LinkedSPLs – A research program

Page 14: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

LinkedSPLs – A research program

Your annotations would go here!

Page 15: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

LinkedSPLs – Method

• Currently we are focusing on linking active ingredients in the structured portion of SPLs

• Unstructured text is left for future work

Page 16: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage to external sources

• There are many sources of drug information that are complementary to each other.

– DrugBank: contains drug targets, pathways, interactions

– RxNorm: provides UMLS mappings

– ChEBI: provides rigorous classification of drugs

Page 17: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Example

prodName: Nefazodone Hydrochloride

rxNormProduct: rxcui:1098666

epcClass: SEROTONIN REUPTAKE INHIBITOR

contraindications: CONTRAINDICATIONS Coadministration of terfenadine, astemizole, cisapride, pimozide, or carbamazepine with nefazodone hydrochloride is contraindicated….

Page 18: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

What we tested

• Three different linking approaches to link to DrugBank

1. Structure string (InChI)

2. Ontology label matching (ChEBI)

3. Unsupervised linkage point discovery (Automated) [1]

1. O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013

Page 19: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage to DrugBank – Results

                   InChI identifier   ChEBI identifier   InChI + ChEBI   Automatic
InChI identifier   424                261                424             395
ChEBI identifier   --                 707                707             650
InChI + ChEBI      --                 --                 831             791
Automatic          --                 --                 --              1162

• 1,246 active ingredients could be mapped to DrugBank by at least one method

• 1,096 unmapped ingredients

• The three approaches complement each other
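
A pairwise comparison of this kind can be computed with plain set operations. The sketch below assumes that each off-diagonal cell counts the ingredients mapped by both methods and that each diagonal cell is the method's own total; that reading is an interpretation, not something stated on the slide. The identifiers are toy values, not the study's mappings, and `overlap_matrix` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement

# Toy data: hypothetical ingredient identifiers mapped by each method.
mapped = {
    "InChI": {"A", "B", "C"},
    "ChEBI": {"B", "C", "D", "E"},
    "Automatic": {"A", "C", "D", "E", "F"},
}


def overlap_matrix(sets):
    """Count ingredients shared by each pair of methods (diagonal = method total)."""
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations_with_replacement(sets, 2)
    }


matrix = overlap_matrix(mapped)
# Ingredients mapped by at least one method: the union of all sets.
mapped_by_any = set().union(*mapped.values())

for (a, b), n in matrix.items():
    print(f"{a:>9} x {b:<9} {n}")
print("mapped by at least one method:", len(mapped_by_any))
```

In the toy data, each method contributes at least one ingredient that another method misses, which is the sense in which approaches can complement each other.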

Page 20: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Conclusions

• The automatic approach performs very well

– A greater number of accurate links discovered with less effort

• A significant number remain unmapped:

– Some salt or racemic forms of mapped ingredients (e.g., alpha tocopherol acetate D)

– Elements (e.g., gold, iodine), and a variety of natural organic compounds including pollens (N~200)

• Not all ingredients are included in DrugBank

– other resources may be required to obtain complete mappings for active ingredients.

Page 21: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Want more information?

• LinkedSPLs

– http://purl.org/LinkedSPLs

• Google code project

– code.google.com/p/swat-4-med-safety/

• Publications

– Hassanzadeh, O., Zhu, Q., Freimuth, RR., Boyce, R. Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.

– Boyce, RD., Freimuth, RR., Romagnoli, KM., Pummer, T., Hochheiser, H., Empey, PE. Toward semantic modeling of pharmacogenomic knowledge for clinical and translational decision support. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.

– Boyce RD, Horn JR, Hassanzadeh O, de Waard A, Schneider J, Luciano JS, Rastegar-Mojarad M, Liakata M. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.

Page 22: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Acknowledgements

• NIH/NIGMS (U19 GM61388; the Pharmacogenomic Research Network)

• Agency for Healthcare Research and Quality (K12HS019461).

Page 23: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Backup Slides

Page 24: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage in LinkedSPLs

An active ingredient from an SPL

[Diagram: an SPL resource connected to an active ingredient resource in LinkedSPLs]

dailymed:activeMoiety “OLANZAPINE”

dailymed:activeMoietyUNII “N7U69T4SZR”

Page 25: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage to DrugBank – Approach 1

Idea: Using NCI Resolver & InChIKey

Starting with UNII: “N7U69T4SZR”

1. FDA UNII table provides structure string: 2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE

2. NCI Resolver provides InChIKey: KVWDHTXUZHCGIO-UHFFFAOYSA-N

3. DrugBank record with the above InChIKey provides identifier: DB00334

Results: 429 out of 2,264 ingredients are linked, out of which 424 are valid
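
The three steps of Approach 1 can be sketched as a chain of lookups. This is a minimal offline illustration, not the authors' code: the dictionaries hold only the olanzapine example from the slide, `offline_resolver` is a stub standing in for an HTTP call, and the URL pattern in `resolver_url` is an assumption about the NCI/CADD Chemical Identifier Resolver that should be checked against its documentation before use.

```python
from typing import Callable, Optional
from urllib.parse import quote

# One-row stand-ins for the FDA UNII table and a DrugBank InChIKey index.
# Values are the olanzapine example from the slide.
UNII_TO_STRUCTURE = {
    "N7U69T4SZR": "2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE",
}
INCHIKEY_TO_DRUGBANK = {"KVWDHTXUZHCGIO-UHFFFAOYSA-N": "DB00334"}


def resolver_url(structure: str) -> str:
    """Assumed URL pattern for asking the NCI resolver for a standard InChIKey."""
    return (
        "https://cactus.nci.nih.gov/chemical/structure/"
        f"{quote(structure, safe='')}/stdinchikey"
    )


def offline_resolver(structure: str) -> Optional[str]:
    """Stub: returns a canned InChIKey instead of fetching resolver_url(structure)."""
    canned = {UNII_TO_STRUCTURE["N7U69T4SZR"]: "KVWDHTXUZHCGIO-UHFFFAOYSA-N"}
    return canned.get(structure)


def link_by_inchikey(
    unii: str, resolve: Callable[[str], Optional[str]] = offline_resolver
) -> Optional[str]:
    structure = UNII_TO_STRUCTURE.get(unii)      # step 1: UNII -> structure string
    if structure is None:
        return None
    inchikey = resolve(structure)                # step 2: structure -> InChIKey
    if inchikey is None:
        return None
    return INCHIKEY_TO_DRUGBANK.get(inchikey)    # step 3: InChIKey -> DrugBank ID


print(link_by_inchikey("N7U69T4SZR"))
```

Passing the resolver in as a parameter keeps the chain testable without network access; a real run would swap in a function that fetches `resolver_url(structure)` and parses the response.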

Page 26: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage to DrugBank – Approach 2

Idea: Using ChEBI identifier & NCBO Portal

Starting with name: “OLANZAPINE”

1. ChEBI preferred name from NCBO Bioportal: “OLANZAPINE”

2. ChEBI identifier from NCBO Bioportal: 7735

3. DrugBank record with the above ChEBI identifier provides identifier: DB00334

Results: 718 out of 2,264 ingredients are linked, out of which 707 are valid
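
Approach 2 reduces to matching a name against ChEBI labels and then following the ChEBI identifier into DrugBank. The sketch below uses in-memory tables that hold only the olanzapine example from the slide; a real run would query NCBO BioPortal and a DrugBank export instead. The case-folding in `normalize` is an assumption about how labels might be compared, not a description of the matching the authors used.

```python
from typing import Optional

# Toy lookup tables; values are the olanzapine example from the slide.
CHEBI_LABEL_TO_ID = {"olanzapine": "7735"}
CHEBI_ID_TO_DRUGBANK = {"7735": "DB00334"}


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so 'OLANZAPINE' matches 'olanzapine'."""
    return " ".join(name.split()).casefold()


def link_by_chebi_label(active_moiety: str) -> Optional[str]:
    # Steps 1-2: match the name to a ChEBI label and take its identifier.
    chebi_id = CHEBI_LABEL_TO_ID.get(normalize(active_moiety))
    if chebi_id is None:
        return None
    # Step 3: find the DrugBank record that carries this ChEBI identifier.
    return CHEBI_ID_TO_DRUGBANK.get(chebi_id)


print(link_by_chebi_label("OLANZAPINE"))
```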

Page 27: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage to DrugBank – Approach 3

Idea: Automatic discovery of linkage points

Starting with all data in the FDA UNII table and DrugBank….

1. Index all FDA UNII table and DrugBank XML attributes

2. Search for linkage points and score similarity:
   UNII -> Substance Name | DrugBank -> brands -> brand : 0.94
   UNII -> Preferred Substance Name | DrugBank -> name : 0.91
   UNII -> Substance Name | DrugBank -> synonyms -> synonym : 0.83
   …

3. Prune list of linkage points based on cardinality, coverage, and average score

4. Establish links between FDA UNII table and DrugBank using the linkage points
   UNII “OLANZAPINE” | DrugBank “Zyprexa” : 1.0
   …

Results: 1,179 out of 2,264 ingredients are linked, out of which 1,169 are valid

[Diagram: example attribute values with their labels as shown on the slide: “N7U69T4SZR” (UNII), “OLANZAPINE” (Preferred Substance Name), “2-METHYL-4….” (Molecular Formula), “ZYPREXA” (synonym)]
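
The general shape of linkage point discovery can be illustrated with a much simpler stand-in. The sketch below scores every attribute pair by the Jaccard similarity of their normalized value sets, keeps pairs above a threshold, and links records that share a value on a retained pair. It is not the framework from the cited PVLDB paper, which also prunes on cardinality, coverage, and average score. Apart from the olanzapine values, every record, identifier, and function name here is made up for illustration.

```python
from itertools import product

# Toy records. The second row of each table is entirely hypothetical.
unii_rows = [
    {"UNII": "N7U69T4SZR", "Preferred Substance Name": "OLANZAPINE"},
    {"UNII": "HYPOTHETICAL01", "Preferred Substance Name": "EXAMPLAMINE"},
]
drugbank_rows = [
    {"id": "DB00334", "name": "Olanzapine", "brand": "Zyprexa"},
    {"id": "DB99999", "name": "Examplamine", "brand": "Examplex"},
]


def norm(value):
    return value.strip().casefold()


def values(rows, attr):
    """Set of normalized values that an attribute takes across all rows."""
    return {norm(r[attr]) for r in rows if attr in r}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def discover_linkage_points(rows_a, rows_b, threshold=0.5):
    """Score every attribute pair; keep those at or above the threshold."""
    attrs_a = sorted({k for r in rows_a for k in r})
    attrs_b = sorted({k for r in rows_b for k in r})
    scored = [
        (jaccard(values(rows_a, a), values(rows_b, b)), a, b)
        for a, b in product(attrs_a, attrs_b)
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


def link(rows_a, rows_b, points, key_a, key_b):
    """Link records that share a normalized value on any retained attribute pair."""
    links = set()
    for _, a, b in points:
        index = {norm(r[b]): r[key_b] for r in rows_b if b in r}
        for r in rows_a:
            target = index.get(norm(r.get(a, "")))
            if target:
                links.add((r[key_a], target))
    return links


points = discover_linkage_points(unii_rows, drugbank_rows)
links = link(unii_rows, drugbank_rows, points, key_a="UNII", key_b="id")
print(points)
print(sorted(links))
```

On the toy data, only the pair (Preferred Substance Name, name) survives pruning, and it links the olanzapine UNII to DB00334. Exact-match Jaccard would miss a brand-name link such as “OLANZAPINE” to “Zyprexa”, which is one reason the real method needs richer evidence than this sketch uses.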

Page 28: Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Linkage Point Discovery Framework

• A generic framework for unsupervised discovery of linkage points

Details can be found at: O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013