InfoChem GmbH © 2014 Dr. Valentina Eigner Pitto ACS National Meeting, Dallas, March 19, 2014
1 / 27
The chemical most common denominator:
Use of chemical structures for semantic enrichment
and interlinking of scientific information
V. Eigner-Pitto, J. Eiblmaier, H. Kraut, L. Isenko, H. Saller, P. Loew
InfoChem GmbH, Landsberger Strasse 408, Munich, 81241, Germany
2 / 27
Outline
• Introduction
o Role of structure searching
o Where do I perform structure searches?
o Cost implications
• Setting the scene: chemical structures as common denominator?
o Publishers' efforts
Creation of chemical content
Semantic enrichment of journal articles
• Case Studies:
o Wiley: The Smart Article
o Springer: Chemistry Data Warehouse
3 / 27
Why Structure Searching?
• CICAG (RSC) Survey by Neil Stutchbury, May 20, 2009
Chemical Information Mining: Possibilities and Pitfalls
(http://www.rsc.org/images/ChemInfoMining_tcm18-153536.pdf)
65 responses from Pharma, Academia, Vendors, and Publishers
“Search documents by chemical structure or substructure”?
4 / 27
Diazepam OR Valium OR Ansiolisina OR Diazemuls OR Relanium OR Stesolid OR
Apaurin OR Faustan OR Seduxen OR Sibazon OR Methyldiazepinone OR Calmocitene
OR Neurolytril OR Bialzepam OR Ceregulart OR Condition OR Diazetard OR Liberetas
OR Relaminal OR Serenamin OR Tranquirit OR Ansiolin OR Apozepam OR Atensine
OR Bensedin OR Calmpose OR Diacepan OR Diazepan OR Dipezona OR Domalium
OR Kiatrium OR Paranten OR Quetinil OR Quiatril OR Quievita OR Renborin OR
Ruhsitus OR Seduksen OR Serenack OR Serenzin OR Stesolin OR Tensopam OR
Horizon OR Lembrol OR Morosan OR Saromet OR Sedipam OR Setonil OR Anxionil OR
Benzopin OR Calmaven OR Chuansuan OR Desconet OR Desloneg OR Diaceplex OR
Diazepin OR Gewacalm OR Jinpanfan OR Mentalium OR Metamidol OR Nixtensyn OR
Novodipam OR Pacitran OR Paralium OR Prozepam OR Psychopax OR Radizepam OR
Simasedan OR Trankinon OR Trazepam OR Valaxona OR Valiquid OR Valuzepam OR
Vanconin OR Antenex OR Arzepam OR Betapam OR Diapine OR Diaquel OR 7-Chloro-
1,3-dihydro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR NCGC00178168-01 OR
WLN: T67 GNV JN IHJ CG G1 KR OR 2H-1,4-Benzodiazepin-2-one, 7-chloro-1,3-
dihydro-1-methyl-5-phenyl- OR CPD000058398 OR SAM001246536 OR
SMR000058398 OR 439-14-5 OR 7-Chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-
2(1H)-one OR 7-Chloro-1-methyl-2-oxo-5-phenyl-3H-1,4-benzodiazepine OR 7-Chloro-
1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR C06948 OR D00293 OR 5-24-04-
00300 OR D003975 OR A3662/0155188 OR I06-0194 OR 1-Methyl-5-phenyl-7-chloro-
1,3-dihydro-2H-1,4-benzodiazepin-2-one OR 7-Chloro-1-methyl-5-3H-1,4-
benzodiazepin-2(1H)-one OR 7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one
OR DZP OR Dap OR Pax OR 11100-37-1 OR 53320-84-6 OR
InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-
11/h2-9H,10H2,1H
... (343 Synonyms!)
“Full Text Searching is Sufficient!”
WLN
SMILES
SMARTS
ROSDAL
Connection Table
Molfile
SDfile
CML
InChI
InChI Key
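The contrast between the long synonym query and a single structure identifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four document snippets, the `SYNONYMS` table, and the key `STRUCT:diazepam` are hypothetical placeholders (a production system would store a canonical identifier such as an InChIKey computed from the structure). Only the names and the CAS number are taken from the list above.

```python
import re

# Hypothetical structure key; a real system would store a canonical
# identifier (for example an InChIKey) computed from the structure.
DIAZEPAM = "STRUCT:diazepam"

# A small excerpt of the synonym list, all mapped to the same structure key.
SYNONYMS = {
    "diazepam": DIAZEPAM,
    "valium": DIAZEPAM,
    "stesolid": DIAZEPAM,
    "439-14-5": DIAZEPAM,  # CAS Registry Number from the list above
}

# Made-up documents, each using a different name for the same compound.
DOCUMENTS = {
    "doc1": "Patients received Valium before the procedure.",
    "doc2": "Diazepam was dissolved in ethanol.",
    "doc3": "The sample (CAS 439-14-5) was stored at 4 C.",
    "doc4": "Caffeine was used as the internal standard.",
}


def tokens(text):
    """Lower-case word tokens; hyphens are kept so registry numbers survive."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())


def full_text_search(term):
    """Return the documents that literally contain the search term."""
    return sorted(d for d, text in DOCUMENTS.items() if term.lower() in tokens(text))


def build_structure_index():
    """Map each structure key to the documents that mention any of its names."""
    index = {}
    for doc_id, text in DOCUMENTS.items():
        for tok in tokens(text):
            key = SYNONYMS.get(tok)
            if key is not None:
                index.setdefault(key, set()).add(doc_id)
    return index


STRUCTURE_INDEX = build_structure_index()


def structure_search(key):
    """Return every document indexed under the given structure key."""
    return sorted(STRUCTURE_INDEX.get(key, set()))
```

Here `full_text_search("Diazepam")` finds only `doc2`, while `structure_search(DIAZEPAM)` returns `doc1`, `doc2`, and `doc3`, because all three names resolve to the same key at indexing time.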
5 / 27
Where am I Able to Perform Structure Searches?
6 / 27
Cost Implications
[Diagram labels: Manuscript submission; Publishing; Manual Indexing; Database production]
7 / 27
Publishers' Efforts
• Production of chemical content:
o chemical named entity recognition
o automatic CDX work-up
• Semantic enrichment of journal articles
o RSC: RSC Semantic Publishing (Project Prospect)
o NPG: semantically enriched PDF
o Elsevier: Article of the Future
o Wiley: The Smart Article
• Structure search on web-pages (partly)
8 / 27
Manual Indexing
Publishing
Production of Chemical Content
9 / 27
Publishers' Efforts
• Production of chemical content:
o chemical named entity recognition
o automatic CDX work-up
• Semantic enrichment of journal articles
o RSC: RSC Semantic Publishing (Project Prospect)
o NPG: semantically enriched PDF
o Elsevier: Article of the Future
o Wiley: The Smart Article
• Structure search on web-pages (partly)
10 / 27
RSC: Semantic Publishing
• Pioneer work: Project Prospect (2007)
• Online since 2011
• Extraction of chemical names from over 30,000 journal articles
• Integration of compounds into ChemSpider
• Approach integrated within routine publication processes
• Features:
o Highlighting of:
Compounds
Chemical terms
Biomedical terms
o Link to compounds in ChemSpider
o Structure search only in ChemSpider
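The highlighting and linking features listed for this slide can be illustrated with a minimal dictionary-based annotator. This sketch is not the RSC pipeline: the `COMPOUND_LINKS` table, the `example.org` URLs, and the HTML markup are assumptions chosen for illustration, and real chemical named entity recognition has to handle systematic names, abbreviations, and ambiguity that a lookup table cannot.

```python
import re

# Hypothetical dictionary: compound name -> placeholder record URL.
COMPOUND_LINKS = {
    "diazepam": "https://example.org/compound/1",
    "caffeine": "https://example.org/compound/2",
}

# Longest names first, so a longer entry wins over a shorter one that
# shares the same prefix.
_NAMES = sorted(COMPOUND_LINKS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)


def annotate(text):
    """Wrap every recognized compound name in a link to its record."""

    def link(match):
        url = COMPOUND_LINKS[match.group(1).lower()]
        return f'<a class="compound" href="{url}">{match.group(1)}</a>'

    return _PATTERN.sub(link, text)
```

Calling `annotate("Diazepam and caffeine were compared.")` wraps both names in links while leaving the rest of the sentence untouched; text without any dictionary hit is returned unchanged.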
11 / 27
Nature Publishing Group: Semantically Enriched PDF
• XMP-embedded PDFs available online since 2008
• Entity-specific annotation service:
o SureChem for chemical compounds
o LuXiD for genes/proteins
o …
• Mix of automated services and editorial QA
• Features:
o Figures and compound browser
o Links to:
Web of Science
PubMed
CAS Reference Linking
o No structure search
12 / 27
Elsevier: Article of the Future
• Launched in 2012
• Guiding principles:
o readability
o discoverability
o extensibility
• Supplementary content, features, and external database information presented in the right sidebar
• Features:
o 3-pane presentation layout:
navigation bar
main content area
right sidebar
o Links to:
NCBI
Reaxys
… (depending on subject)
o No structure search
13 / 27
Wiley: The Smart Article
• Launched in 2012
• Goal: providing quick information on chemical compounds
featured in an article, chemical terms in the text, and other
key parts of the chemistry within the article
• Live for the following journals and major reference works:
o Chemistry: An Asian Journal
o Chirality
o Applied Organometallic Chemistry
o Journal of Physical Organic Chemistry
o Journal of Heterocyclic Chemistry
o e-EROS
o Organic Syntheses
o Organic Reactions
• Features:
o Compound browser
o Chemistry term highlighter
o Compound index
o Enhanced abstract page
o Compound record
o Chemistry structure search
14 / 27
Structure as Common Denominator: 2 Use Cases
Data Warehouse Concept
15 / 27
The Challenge*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, October 14–17, Berlin
16 / 27
Text Annotation and Scheme Enumeration:
Chemistry Enrichment Workflow*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, October 14–17, Berlin
17 / 27
Examples
18 / 27
Data Warehouse Concept
The Challenge
• Interlink different data repositories via chemical structure
• Create one search interface
• Data aggregation / results consolidation
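The three goals above (interlinking via structure, one search interface, consolidated results) can be sketched as a single merge step keyed on a structure identifier. The repositories, hit lists, keys such as `KEY-A`, and link strings below are all hypothetical placeholders; the sketch only shows the idea of grouping hits from separate sources under one shared key.

```python
from collections import defaultdict

# Hypothetical hit lists from three separate repositories. Each hit carries
# a structure key (placeholder strings here) and a link to the source record.
JOURNAL_HITS = [
    {"structure_key": "KEY-A", "source": "journal", "link": "article/101"},
    {"structure_key": "KEY-B", "source": "journal", "link": "article/102"},
]
BOOK_HITS = [
    {"structure_key": "KEY-A", "source": "book", "link": "chapter/7"},
]
REACTION_DB_HITS = [
    {"structure_key": "KEY-A", "source": "reactions", "link": "rxn/55"},
    {"structure_key": "KEY-C", "source": "reactions", "link": "rxn/56"},
]


def consolidate(*hit_lists):
    """Group hits from all repositories under their shared structure key."""
    merged = defaultdict(list)
    for hits in hit_lists:
        for hit in hits:
            merged[hit["structure_key"]].append((hit["source"], hit["link"]))
    return dict(merged)


results = consolidate(JOURNAL_HITS, BOOK_HITS, REACTION_DB_HITS)
```

After the merge, `results["KEY-A"]` lists the journal article, the book chapter, and the reaction record side by side, so a single structure query surfaces content from all three sources.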
19 / 27
Data: Selected SpringerLink Subject Collections (1846 – 2011):
• Biomedical and Life Sciences
• Chemistry and Materials Science
• Earth and Environmental Science
• Engineering
• Medicine
• Physics and Astronomy
20 / 27
Concept: Springer Chemistry Data Warehouse
[Diagram: source collections labeled Structures, Structures (annotated), Reactions, and Full-text, linked to a central structure database, the Springer Chemistry Data Warehouse]
21 / 27
The Demonstrator
[Architecture diagram labels: Client computers; Internet/Intranet, HTTP(S); Display servers; Document display; Structure search; Springer Structure Data Warehouse]
Contains:
• master index of all structures
• basic molecule attributes
• links to the source page/document
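As an illustration of what a master-index record with "basic molecule attributes" might hold, the sketch below derives a molecular weight from a molecular formula and stores it next to the source links. The field names, the key `STRUCT:diazepam`, and the `example.org` link are hypothetical; the formula C16H13ClN2O is the one that appears in the diazepam InChI shown earlier in the talk. The parser only handles flat formulas, without parentheses, charges, or isotopes.

```python
import re

# Standard atomic weights, rounded; only the elements needed for the example.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def parse_formula(formula):
    """Split a flat formula such as 'C16H13ClN2O' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


def index_entry(structure_key, formula, links):
    """Build one master-index record (field names are illustrative)."""
    return {
        "structure_key": structure_key,
        "formula": formula,
        "molecular_weight": round(molecular_weight(formula), 2),
        "links": list(links),
    }


entry = index_entry(
    "STRUCT:diazepam",
    "C16H13ClN2O",
    ["https://example.org/article/101"],
)
```

Storing derived attributes alongside the links lets the display layer show a compound summary without fetching every source document.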
22 / 27
Example: Entry Point Document
23 / 27
Example: Entry Point Structure Search
24 / 27
Example: Entry Point Substructure Search
25 / 27
Summary
• Importance of structure searching and cost implications
• Publisher efforts
o Automatic generation of chemical content
o Semantic enrichment
• Case studies where structure is common denominator
o Generation of chemical content for Wiley
o Springer study: “Data Warehouse”
Proof of concept
Demonstrator
26 / 27
Conclusions
• Since the mid-2000s, chemical structures have gained significance for publishers
• Publishers recognize the importance of structure searching
• Chemical content is increasingly generated by automated processes
The chemical structure is an extremely efficient entity for effective retrieval as well as for linking different sources
27 / 27
Acknowledgments
• Reinhard Neudert (Wiley)
• Wendy Warr
• InfoChem Team
o Josef Eiblmaier