Biotea poster, Biolinks at ISMB 2013
Biotea: RDFizing PubMed Central in Support for the Paper as an Interface to the Web of Data
Leyla Garcia Castro, Departamento de Lenguajes y Sistemas Informáticos,
Universitat Jaume I
Alexander Garcia, Casey Mclaughlin, Institute for Digital Information,
Florida State University, Tallahassee
Corresponding author: [email protected]
In a nutshell, Biotea at http://biotea.idiginfo.org
• Is a semantic dataset for the full-text, open-access subset of PubMed Central.
• Makes extensive use of existing ontologies and semantic enrichment services.
• Supports the generation of self-describing, machine-readable scholarly documents.
• Comprises a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents.
• Provides a semantically rich and highly interconnected dataset with self-describing content.
AGC and CM have been funded by US DoD Grant MOMRP w81xwh-10-2-0181.
“Scholarly data and documents are of most value when they are interconnected rather than independent.” (Christine L. Borgman)
Consuming the dataset, a first prototype
Graph-based retrieval for the term “catalase”; only shared terms with more than 30 associated biological terms are included in the results.
Search and retrieval based on human gene names: the term is resolved with GeneWiki, and the associated UniProt accession is used in the query.
RDF4PMC and Bio2RDF
Prototype screenshot labels:
1. Retrieval: metadata + cloud of annotations
2. Enriched content, facts-based reading
3. Navigating the neighborhood
Interactive zone: contextual reading and graphical tools. Enriched content based on annotations is displayed in the interactive zone.
RDF4PMC, our workflow
Workflow diagram labels: NXML; RDFization; RDFized article (metadata: BIBO; content: CNT; provenance: PROV-O, VOID); annotation; enriched content.
1. Metadata & content
2. Semantic content enrichment
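As a rough illustration of the metadata part of the RDFization workflow, the sketch below emits a few N-Triples statements for one article. It is not the RDF4PMC implementation: the article base IRI (`http://example.org/article/`), the function names, and the use of `dcterms:title` are assumptions made for this example. Only `bibo:AcademicArticle` and `bibo:pmid` appear on the poster itself.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base IRI: the poster does not state how article IRIs are minted.
ARTICLE_BASE = "http://example.org/article/"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def article_metadata_triples(pmid: str, title: str) -> List[str]:
    """Return N-Triples lines typing the article and recording its PMID and title."""
    subject = f"<{ARTICLE_BASE}{pmid}>"
    return [
        f"{subject} <{RDF_TYPE}> <{BIBO}AcademicArticle> .",
        f'{subject} <{BIBO}pmid> "{escape_literal(pmid)}" .',
        # dcterms:title is an assumption; the poster only names BIBO for metadata.
        f'{subject} <{DCTERMS}title> "{escape_literal(title)}" .',
    ]


if __name__ == "__main__":
    for line in article_metadata_triples("12345", "An example title"):
        print(line)
```

Content (CNT), provenance (PROV-O, VOID) and annotation triples would be added in the same way, each with its own vocabulary.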
Consuming the dataset, SPARQL and API Retrieval Service
A list of terms and their related topics http://biotea.idiginfo.org/api/terms
A list of topics and their related vocabularies http://biotea.idiginfo.org/api/topics
All topics related to a term e.g., http://biotea.idiginfo.org/api/topics?term=cancer
All vocabularies related to a term e.g., http://biotea.idiginfo.org/api/vocabularies?term=cancer
All terms that start with a specific string (for autocompletion) e.g., http://biotea.idiginfo.org/api/terms?prefix=canc
All topics related to a vocabulary e.g., http://biotea.idiginfo.org/api/topics?vocabulary=po
RDF of articles that include a term e.g., http://biotea.idiginfo.org/api/articles?term=cancer
Count of RDF of articles that include a term e.g., http://biotea.idiginfo.org/api/articles?term=cancer&count=true
A list of vocabularies and their prefixes http://biotea.idiginfo.org/vocabularies
RDF of articles that include a vocabulary e.g., http://biotea.idiginfo.org/api/articles?vocabulary=po
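The `/api/...` URL patterns listed above can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper names `api_url` and `fetch` are hypothetical, and the poster does not describe the response format, so `fetch` simply returns the body as text. Whether the service is still reachable is not something this sketch assumes.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://biotea.idiginfo.org/api"


def api_url(resource: str, **params: str) -> str:
    """Build a retrieval-service URL such as .../api/topics?term=cancer."""
    url = f"{API_BASE}/{resource}"
    query = urlencode(params)  # keeps keyword order, escapes values
    return f"{url}?{query}" if query else url


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the raw response body (format not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Print the URLs only; call fetch(...) on one of them to query the service.
    print(api_url("terms", prefix="canc"))
    print(api_url("articles", term="cancer", count="true"))
```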
SPARQL query:
SELECT DISTINCT ?pmid
WHERE {
  ?article a bibo:AcademicArticle ;
           bibo:pmid ?pmid .
  ?annotation a aot:ExactQualifier ;
              ao:annotatesResource ?article ;
              ao:hasTopic <http://purl.obolibrary.org/obo/CHEBI_60004> .
}
Query expressed in natural language:
Retrieving PubMed identifiers for those articles that have been semantically annotated with the biological entity CHEBI:60004. The semantic annotation comes from the occurrence of the term “mixture” in any paragraph of the retrieved articles.
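The query above can be parameterized by topic and sent to a SPARQL endpoint over HTTP GET. The sketch below only builds the query text and the request URL. The endpoint address (`http://example.org/sparql`) is a placeholder, since the poster does not give one, and the `ao` and `aot` namespace IRIs are assumptions that should be checked against the dataset's own prefix declarations. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the poster does not state the SPARQL endpoint address.
SPARQL_ENDPOINT = "http://example.org/sparql"

PREFIXES = {
    "bibo": "http://purl.org/ontology/bibo/",
    # Assumed Annotation Ontology namespaces; verify before use.
    "ao": "http://purl.org/ao/",
    "aot": "http://purl.org/ao/types/",
}


def pmid_query(topic_iri: str) -> str:
    """Return the poster's query for articles annotated with the given topic."""
    prefix_lines = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()
    )
    return f"""{prefix_lines}
SELECT DISTINCT ?pmid
WHERE {{
  ?article a bibo:AcademicArticle ;
           bibo:pmid ?pmid .
  ?annotation a aot:ExactQualifier ;
              ao:annotatesResource ?article ;
              ao:hasTopic <{topic_iri}> .
}}"""


def query_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as the standard 'query' parameter of a GET request."""
    params = urlencode({"query": query})
    return f"{endpoint}?{params}"


if __name__ == "__main__":
    query = pmid_query("http://purl.obolibrary.org/obo/CHEBI_60004")
    print(query)
    print(query_url(query))
```

Changing the argument of `pmid_query` to another ontology IRI retrieves articles annotated with that entity instead of CHEBI:60004.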