Text Mining for Chemistry and Building a Public Platform for Document Markup
DESCRIPTION
The identification of chemical names in documents has provided platforms that enable structure-based searching of patents and the markup of chemistry publications. A natural extension is the ability to make chemistry articles, blog pages, wiki pages and other documents searchable by the extracted chemical structures. The ChemSpider database is built on a database of over 21 million unique chemical entities from close to 200 data sources and provides a rich resource of information for chemists. We will report on our efforts to integrate chemical name extraction with the ChemSpider platform to enable structure searching of Open Access chemistry articles and online chemistry materials. We will unveil our online document markup platform for chemists to make both their open- and closed-access publications searchable by the language of chemistry – the structure.
TRANSCRIPT
Text mining for chemistry and building a public platform for document markup
Antony Williams
Building the Primary Web Portal for Chemistry
Searching and Reading Articles…
- Online search tools for chemistry articles are generally text-based
- Searching articles based on chemical structure and substructure is very expensive... but is changing
- Text-mining is a “hot area” of research... but what is public? What depends on public curation?
Text-Based Search Tools
- Google
- PubMed
- Google Scholar
- Publishers' websites
- And tens of other resources…
Vancomycin Through PubChem
Vancomycin Text Searches
- PubMed
- Google Scholar
Online Structure Searching of Articles
- Some capabilities from publishers starting to show up
Publishers should adopt/add InChIs
- RSC and Nature Publishing Group have!
ChemMantis - Single Click Mark-up
Name-Structure Pairs
Converting Detected Names…
- Names are searched against a validated dictionary (this expands as ChemSpider is curated)
- If not found, they are passed through a Name to Structure algorithm
- If they cannot be converted, ChemSpider is searched for non-validated names
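The three-step fallback described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two lookup tables, the `toy_name_to_structure` function, the entry "vinegar acid" and the SMILES strings are hypothetical stand-ins, not ChemSpider data or its actual implementation.

```python
from typing import Callable, Optional, Tuple

# Hypothetical toy data standing in for the validated dictionary and the
# index of non-validated names; these are not real ChemSpider records.
VALIDATED = {"benzene": "c1ccccc1", "aspirin": "CC(=O)Oc1ccccc1C(=O)O"}
NON_VALIDATED = {"vinegar acid": "CC(=O)O"}


def toy_name_to_structure(name: str) -> Optional[str]:
    """Placeholder for a name-to-structure algorithm (knows two names only)."""
    systematic = {"ethanol": "CCO", "propan-2-one": "CC(C)=O"}
    return systematic.get(name)


def resolve_name(
    name: str,
    nts: Callable[[str], Optional[str]] = toy_name_to_structure,
) -> Tuple[Optional[str], str]:
    """Return (structure, source) by trying each step of the cascade in order."""
    key = name.strip().lower()
    # Step 1: validated dictionary
    if key in VALIDATED:
        return VALIDATED[key], "validated"
    # Step 2: name-to-structure conversion
    structure = nts(key)
    if structure is not None:
        return structure, "name-to-structure"
    # Step 3: search among non-validated names
    if key in NON_VALIDATED:
        return NON_VALIDATED[key], "non-validated"
    return None, "unresolved"
```

The returned source label is one possible way to drive how a detected name is displayed in the marked-up document.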
RED Underline
- Non-validated, cannot convert through NTS (Name to Structure)
- “Names” can be added to the Suppress List
BLUE Underline
- Name to Structure converted
Deposit Structures
- Entity extraction built around modified algorithms from SureChem
- Optimized for “publications”
- Dictionaries for chemical entities, groups, reactions, elements, families, species…
- Dictionaries can be expanded – presently adding PDB
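As a rough illustration of dictionary-driven entity tagging, the sketch below matches terms from several category dictionaries against a text, preferring the longest term. It is not the SureChem-derived algorithm mentioned above; the dictionary contents and the names `DICTIONARIES`, `build_matcher` and `tag_entities` are hypothetical.

```python
import re

# Hypothetical miniature dictionaries, one per entity category.
DICTIONARIES = {
    "chemical": ["acetic acid", "benzene"],
    "group": ["methyl", "hydroxyl"],
    "reaction": ["Diels-Alder reaction"],
    "element": ["palladium", "carbon"],
    "family": ["alkaloids"],
}


def build_matcher(dictionaries):
    """Compile one case-insensitive pattern covering every dictionary term."""
    term_to_category = {}
    for category, terms in dictionaries.items():
        for term in terms:
            term_to_category[term.lower()] = category
    # Longest terms first, so multi-word entries win over shorter overlaps.
    ordered = sorted(term_to_category, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, term_to_category


def tag_entities(text, dictionaries=DICTIONARIES):
    """Return (start, end, matched_text, category) for each dictionary hit."""
    pattern, term_to_category = build_matcher(dictionaries)
    return [
        (m.start(), m.end(), m.group(0), term_to_category[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Expanding coverage in this sketch means adding another key to `DICTIONARIES`; the word-boundary checks keep "methyl" from matching inside "methylation".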
Species…
What do you do with a markup system?
- Test it, show it off and make it available…
- Tested on chemistry articles, so why not HOST articles?
- …and create an online journal…
The ChemSpider Journal
Open Access Community Journal
Deposit Article
- Import URL or Document
- Copy-Paste
- Markup
Copy-Paste Version
Martin Walker Monthly Article
Chemical names
Names, Elements, Groups, Families
Outlinks
Mark Up Open Access Article
Online Journals and Live Data
A Community Resource of Spectra
- Spectra deposited on ChemSpider as “Open Data” are available to anybody to “Embed” in their articles, blogs, wikis etc.
Present Dictionaries
- Chemical names - ChemSpider Validated Names
- Reactions - Wikipedia Named Reactions and RSC Reaction Ontology reactions
- Species – Wikipedia “species”
To add – New Dictionaries
- PDB codes
- IUPAC Gold Book
Conclusions
- The internet enables chemistry – and at a reduced cost
- Web 2.0 is here and improving quality – to benefit 3.0
- Question Quality!
- Crowdsourcing for expansion, curation and integration
- Classical models may die quite quickly – business models must change soon or fail
- Publishers – heed the proliferation of InChIs for Chemistry