text mining for organism and environment names
Lars Juhl Jensen
Text mining for organism and environment names
who am I?
sequence analysis
protein networks
string-db.org
chemical networks
stitch-db.org
group leader
proteomics
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
medical informatics
Jensen et al., Nature Reviews Genetics, 2012
cofounder
me
why text mining?
data mining
unstructured text
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
organisms
environments
expansion rules
plural and adjective forms
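A minimal sketch of what expansion rules could look like, assuming a few naive English and Latin suffix rules. The function name `expand_name` and the rules themselves are illustrative assumptions, not the rules used by the actual tagger.

```python
def expand_name(name):
    """Return a lexicon name plus naive plural and adjective variants.

    Hypothetical rules for illustration only; a real lexicon would need
    many more cases and manual curation of the results.
    """
    variants = {name}
    if name.endswith("um"):
        # Latin-style nouns: bacterium -> bacteria (plural), bacterial (adjective)
        variants.add(name[:-2] + "a")
        variants.add(name[:-2] + "al")
    elif name.endswith("y") and len(name) > 1 and name[-2] not in "aeiou":
        # consonant + y: estuary -> estuaries
        variants.add(name[:-1] + "ies")
    elif name.endswith(("s", "x", "ch", "sh")):
        # sibilant endings: marsh -> marshes
        variants.add(name + "es")
    else:
        # default plural: forest -> forests
        variants.add(name + "s")
    return variants


if __name__ == "__main__":
    for example in ("bacterium", "estuary", "marsh", "forest"):
        print(example, "->", sorted(expand_name(example)))
```

Rules like these over-generate, which is one reason a curated "black list" is still needed.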
flexible matching
hyphens and spaces
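One way to sketch flexible matching of hyphens and spaces, assuming a regular-expression approach: every hyphen or space in a lexicon name is allowed to appear in the text as a hyphen, a space, or nothing. The helper `flexible_pattern` is a hypothetical name, and the real tagger may implement this differently.

```python
import re


def flexible_pattern(name):
    """Compile a case-insensitive pattern that tolerates hyphen/space variation.

    Illustrative assumption: each separator in the name may be written as
    a hyphen, a whitespace character, or omitted.
    """
    # Split the name on runs of hyphens/whitespace and escape each piece.
    parts = [re.escape(part) for part in re.split(r"[-\s]+", name) if part]
    # Rejoin the pieces with an optional single hyphen or whitespace character.
    return re.compile(r"\b" + r"[-\s]?".join(parts) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    pattern = flexible_pattern("salt marsh")
    for text in ("a salt marsh site", "Salt-marsh sediments", "saltmarsh soils"):
        print(text, "->", bool(pattern.search(text)))
```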
“black list”
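A minimal sketch of how a lexicon and a black list can work together in dictionary-based tagging. The identifiers (`ORG:1`, `ENV:1`, ...), the tiny lexicon, and the function `tag` are made up for illustration; they are not the real lexicon or the tagger's API.

```python
import re

# Hypothetical lexicon: lowercase name -> made-up identifier.
LEXICON = {
    "escherichia coli": "ORG:1",
    "cancer": "ORG:2",  # a crab genus, but usually the disease in text
    "salt marsh": "ENV:1",
}

# Names that are too ambiguous to tag, curated by hand.
BLACKLIST = {"cancer"}


def tag(text, lexicon=LEXICON, blacklist=BLACKLIST):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    matches = []
    for name, identifier in lexicon.items():
        if name in blacklist:
            continue  # skip names known to produce false positives
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), m.group(0), identifier))
    return sorted(matches)


if __name__ == "__main__":
    sentence = "Escherichia coli isolated from a salt marsh; unrelated to cancer."
    for match in tag(sentence):
        print(match)
```

Looping over every name is fine for a toy lexicon; with millions of names a hash- or trie-based lookup would be needed instead.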
a
execution modes
C++ batch tagger
Python API
web service
questions?