Product: BioXM Knowledge Management Environment
Applications: knowledge management and semantic data integration,
research collaboration, information publishing and project management
Product contact: info@biomax.com
Biomax Informatics AG, Lochhamer Str. 9, 82152 Martinsried, Germany, +49 89 895574-0
Sophic Systems Alliance, Inc., 200 Main Street, Suite 201, Falmouth, MA 02536, USA, (508) 495-3801
Internet: www.biomax.com, www.sophicalliance.com
Efficient biomedical literature mining
Scientists involved in disease-specific research, target-gene identification, target validation, chemical-compound development, diagnostics and treatment spend valuable time screening the scientific literature. This screening takes time away from productive research, and the results are often unstructured and notably incomplete.
The BioLT™ Literature Mining Tool provides an alternative. The intuitive software performs structured text mining using a number of highly curated biological and medical term dictionaries. The tool extracts relations between search terms (including their synonyms) and terms in the selected dictionaries. More than 166 million pre-calculated relations and free-text search capabilities ensure comprehensive coverage of a research area. The resulting structured information can easily be shared, extended and updated, and it provides a starting point for generating and refining knowledge and hypotheses. The BioLT tool saves researchers time and produces significantly better output than common PubMed searches. Integrating the BioLT tool into research infrastructures, for example the BioXM™ Knowledge Management Environment, can considerably improve the efficiency and outcomes of R&D projects.
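The brochure does not say how relations are scored or ranked. As a hedged illustration only, the Python sketch below shows one simple way to relate a search term and its synonyms to dictionary terms: count the abstracts in which the query co-occurs with each dictionary concept, then rank concepts by that count. The function names, the synonym-list dictionary format, the toy abstracts and the plain co-occurrence score are all assumptions for illustration, not the BioLT method.

```python
from collections import Counter


def mentions(text: str, surface_forms: list[str]) -> bool:
    """Naive case-insensitive substring check (a stand-in for real term recognition)."""
    lowered = text.lower()
    return any(form.lower() in lowered for form in surface_forms)


def rank_related_terms(
    abstracts: list[str],
    query_synonyms: list[str],
    dictionary: dict[str, list[str]],
) -> list[tuple[str, int]]:
    """Rank dictionary concepts by how many abstracts mention them together
    with the query (or any of its synonyms).

    `dictionary` maps a concept name to its synonyms (hypothetical format).
    """
    counts = Counter()
    for abstract in abstracts:
        if not mentions(abstract, query_synonyms):
            continue  # query concept absent: no relation evidence in this abstract
        for concept, synonyms in dictionary.items():
            if mentions(abstract, synonyms):
                counts[concept] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data, invented for illustration.
toy_abstracts = [
    "APOE variants are associated with Alzheimer's disease.",
    "Apolipoprotein E levels and cardiovascular disease risk.",
    "ApoE4 carriers show earlier onset of Alzheimer's disease.",
    "BRCA1 mutations in hereditary breast cancer.",
]
toy_diseases = {
    "Alzheimer's disease": ["alzheimer"],
    "Cardiovascular disease": ["cardiovascular"],
    "Breast cancer": ["breast cancer"],
}
print(rank_related_terms(toy_abstracts, ["apoe", "apolipoprotein e"], toy_diseases))
# -> [("Alzheimer's disease", 2), ('Cardiovascular disease', 1)]
```

Substring matching over-matches in real text (for example, "apoe" inside unrelated tokens), so a practical system would more likely use proper term recognition and a statistical association score instead of raw counts.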
Building a knowledge base for oncology with BioLT linguistics
The BioLT tool is the central data mining component used to create a manually curated,
up-to-date index covering all cancer genes, including their compound and disease
relationships. Preliminary results were published at ISMB 2005.
Biomax also carries out customized text-mining projects for other disease areas and biological contexts.
The BioLT tool provides comprehensive, structured and ranked answers to the following types of questions:
• Which genes and proteins are known to be related to breast cancer? (For example, the BioLT tool presents a sorted list of about 3,000 gene/protein terms, compared to over 130,000 abstracts in PubMed.)
• For obesity, which genes show genetic variation and which variants are described (e.g., nutrigenomics and pharmacogenomics use cases)?
• Which diseases and drug compounds are potentially related to Alzheimer’s disease?
Figure: The BioLT tool with query results for diseases related to the protein apoE
Automatically generated expert knowledge
The BioLT tool delivers clearly structured results with high recall and precision, as shown in the following benchmark. The BioLT results were compared to a manually curated list of "all major pathways and hereditary cancer predisposition types", each related to one of 57 representative predisposition genes (Vogelstein and Kinzler, 2004*). With 100% recall, all 57 genes and 57 cancer types were represented in the BioLT dictionaries. 95% of the relationships were ranked among the top three results, out of up to several thousand hits. For the remaining three genes, the corresponding diseases were found at positions four and five.
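The benchmark figures can be checked with simple arithmetic: if the diseases for three of the 57 genes appeared at positions four or five, the other 54 were in the top three, and 54/57 ≈ 0.947, which rounds to the stated 95%. The sketch below computes this kind of top-k hit rate from a list of 1-based ranks. The function name and the rank-list input are assumptions for illustration; the individual ranks used here are placeholders, since only whether each rank is within k matters for the rate.

```python
def top_k_hit_rate(ranks: list[int], k: int) -> float:
    """Fraction of benchmark relationships whose expected answer appears at
    1-based position <= k in the ranked result list."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for rank in ranks if rank <= k)
    return hits / len(ranks)


# Placeholder ranks built from the counts in the text: 54 relationships in the
# top three (rank 1 stands in for any of positions 1-3) and 3 at positions
# four or five (rank 4 stands in for either). Only the counts matter here.
benchmark_ranks = [1] * 54 + [4] * 3

print(f"top-3: {top_k_hit_rate(benchmark_ranks, 3):.1%}")  # top-3: 94.7%
print(f"top-5: {top_k_hit_rate(benchmark_ranks, 5):.1%}")  # top-5: 100.0%
```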
The BioLT tool automatically generates comprehensive results comparable to the knowledge of expert scientists. The BioLT text-mining approach also works for other disease areas (such as cardiovascular, neurological and infectious diseases) and for additional biological research areas.
Integration into biological and medical project management
The BioLT tool uses high-quality thematic dictionaries to identify relationships between research objects. The dictionaries can be extended and customized. The following dictionaries are currently available:
• Disease: 260,000 entries
• Gene name: 130,000 human gene names, including name variants
• Compound: 82,000 entries
• Pathway: 61,000 entries
• Organism: 275,000 entries
• Other subdomains (e.g., polymorphism, therapy, tissues, cells)
These relationship data sets can be imported into the BioXM Knowledge Management Environment for further curation. On upload, they are automatically integrated into a user-defined biological or medical context. BioLT results thus become part of an efficient infrastructure, even for large distributed R&D projects.
* Vogelstein B and Kinzler KW (2004) Cancer genes and the pathways they control. Nat Med 10(8):789–99
Text-mining technology
In contrast to classical information retrieval systems, the BioLT software preprocesses the underlying text databases (such as scientific or patent information) using specific background information. The system first recognizes all chunks of text (phrases), special patterns for scientific notation and words that belong to terminology dictionaries. After this syntactic analysis, the system attempts to determine the meaning of ambiguous terms. To keep results as complete as possible, potentially false meanings are flagged but not deleted from the knowledge database. The resulting text databases are manually curated by experts to create the thematic dictionaries used by the BioLT system.
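The dictionary-lookup and "flag, don't delete" steps described above can be sketched in a few lines, assuming a tiny in-memory lexicon in which one surface form may map to several (dictionary, concept) readings. Every reading is kept, and matches with more than one reading are flagged as ambiguous rather than discarded. Phrase chunking, notation patterns and the actual disambiguation logic are omitted; the lexicon entries, the `TermMatch` type and the `tag_terms` function are hypothetical and do not describe BioLT internals.

```python
import re
from dataclasses import dataclass


@dataclass
class TermMatch:
    surface: str                    # text as it appears in the document
    start: int                      # character offset of the match
    senses: list[tuple[str, str]]   # every (dictionary, concept) reading, kept
    ambiguous: bool                 # True when more than one reading exists


def tag_terms(text: str, lexicon: dict[str, list[tuple[str, str]]]) -> list[TermMatch]:
    """Tag dictionary terms in `text`, preferring longer surface forms, and flag
    (but keep) every match that has more than one possible meaning.

    Word boundaries assume surface forms start and end with word characters.
    """
    matches: list[TermMatch] = []
    taken = [False] * len(text)  # characters already covered by a longer term
    for surface in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer term that was already tagged
            for i in range(m.start(), m.end()):
                taken[i] = True
            senses = lexicon[surface]
            matches.append(TermMatch(m.group(0), m.start(), list(senses), len(senses) > 1))
    return sorted(matches, key=lambda match: match.start)


# Hypothetical lexicon: "MET" can be read as a gene or as a compound abbreviation.
example_lexicon = {
    "breast cancer": [("Disease", "Breast neoplasms")],
    "cancer": [("Disease", "Neoplasms")],
    "MET": [("Gene name", "MET proto-oncogene"), ("Compound", "Methionine")],
}
for match in tag_terms("MET amplification was observed in breast cancer.", example_lexicon):
    print(match.surface, match.senses, "AMBIGUOUS" if match.ambiguous else "")
```

Keeping all readings at this stage means a later, more informed disambiguation step (or a human curator) can still recover a meaning that an early filter would otherwise have thrown away.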
The BioLT tool uses the BioRS™ Integration and Retrieval System to add Boolean free-text search capabilities. Users can select various analysis parameters, including the scope of the search, the level of precision, how terms with multiple meanings are resolved and how the results are represented statistically.
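The BioRS query syntax is not described here, so as a generic illustration of Boolean free-text search, the sketch below builds an inverted index over a few documents and evaluates AND, OR and NOT as set operations on document IDs. The function names, the tokenization rule and the mini-corpus are assumptions for illustration; this is not the BioRS API.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index: lowercase word -> IDs of documents containing it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


def term(index: dict[str, set[int]], word: str) -> set[int]:
    """Copy of the posting set for one word (empty if the word never occurs)."""
    return set(index.get(word.lower(), set()))


# Hypothetical mini-corpus, invented for illustration.
docs = {
    1: "APOE genotype and Alzheimer disease risk",
    2: "APOE polymorphism in cardiovascular disease and Alzheimer disease",
    3: "Amyloid pathology in Alzheimer disease",
}
index = build_index(docs)

# Boolean operators map onto set operations on document IDs:
#   AND -> &    OR -> |    NOT -> - (set difference)
query = (term(index, "apoe") & term(index, "alzheimer")) - term(index, "cardiovascular")
print(sorted(query))  # -> [1]
```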
Figure: Dictionary terms in all abstracts
Figure: BioLT results in the context of a clinical study, displayed in the BioXM software
Biomax, BioLT, BioRS and BioXM are registered trademarks of Biomax Informatics AG in Germany and other countries. Registered names, trademarks, etc., used in this document, even when not specifically marked as such, are not to be considered unprotected by law. BIOLTPPR0602
FREE TRIAL
The BioLT tool for efficient text mining of the MEDLINE database is available through a standard Web browser at www.biomax.com/products/biolt/biolt.htm. Contact us for a free demo account and see how the BioLT tool can speed up your research.