8/12/2019 Data Mining: Ontologies
Faculty of Computer Science
2006 CMPUT 605 March 31, 2013
Towards Applying Text Mining and Natural
Language Processing for Biomedical
Ontology Acquisition
Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A.
TMBIO (2006)
John G
-
2006
Department of Computing Science
CMPUT 605
Focus
Ontology for describing age-related macular degeneration (AMD)
Comparison of the accuracy of three methods for ontology acquisition:
Natural Language Processing (NLP)
Text Mining (SAS Text Miner)
Human Expert
Manual and ad hoc knowledge acquisition
IDOCS (Intelligent Distributed Ontology Consensus System)
-
Introduction
No common, standardized vocabulary exists for classifying disease types for certain eye diseases
Clinicians, dispersed geographically, may use different terms to describe the same condition
Research aimed at extracting the feature and
attribute descriptions for the vocabulary of AMD
and building an ontology from them
-
Related Work
A lot of research has been done since the 1990s on applying
NLP techniques in medicine, biomedicine, etc.
NLP and text data mining have been recognized as
playing an important role in this endeavor
Research has focused on online repositories such as
Medline and PubMed
NLP systems and resources developed: MedLEE, UMLS, GENIES,
etc.
-
IDOCS
-
Methodology
Four clinical experts in retinal diseases were enlisted to
view 100 sample eye images of AMD
The experts were in different geographic locations
They described their observations using digital voice
recorders, with no artificially imposed vocabulary
constraints
Another retinal expert manually parsed the
transcribed text: extracting key words,
organizing the key words into categories, etc.
-
Methodology: NLP
NLP: used for information extraction and automatic
summarization
Identify collocations: short sequences of words having a meaning
over and above the meaning composed directly from their parts, e.g. "extreme programming"
Ngram Statistics Package (NSP) used for
collocation discovery on bigrams
Word-pair associations measured by pointwise mutual information (PMI)
-
Methodology: NLP
A larger PMI indicates a stronger degree of association between
the words
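As an illustration of the PMI score for bigrams, here is a minimal Python sketch. It is not the NSP implementation (NSP is a separate package with its own counting scheme); the function name `bigram_pmi` and the simple probability estimates are assumptions made for this example. It uses the standard definition PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ).

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    Hypothetical helper for illustration only. Probabilities are plain
    relative frequencies: unigrams over all tokens, bigrams over all
    adjacent pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # Positive when the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Made-up token sequence, not data from the study.
    text = "soft drusen near macula soft drusen with pigment".split()
    for pair, score in sorted(bigram_pmi(text).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

PMI estimated this way tends to overrate pairs of very rare words, so a minimum frequency cutoff is commonly applied before ranking.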
-
Methodology: Text Mining (SAS Text Miner)
A collection of documents (corpus) is the input to
any text mining algorithm
The corpus is broken into tokens or terms (tokens in a
particular language)
Term weighting measures: Entropy, Inverse
Document Frequency (IDF), Global Frequency times IDF (GF-IDF),
None (global weight of 1), and Normal
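The term weights listed above can be sketched in Python as follows. This is a hedged illustration, not SAS Text Miner code: the function name `term_weights` is hypothetical, and the formulas are the commonly cited forms (entropy = 1 + sum_j p_ij log2 p_ij / log2 n, IDF = log2(n / d_i) + 1, GF-IDF = g_i / d_i), which are assumed here and may differ in detail from the product. The Normal weight is omitted.

```python
import math
from collections import Counter


def term_weights(docs):
    """Compute global term weights for a corpus.

    `docs` is a list of token lists (one list per document).
    Returns {term: {"entropy": ..., "idf": ..., "gfidf": ..., "none": 1.0}}.
    Formulas are assumed textbook forms, not taken from the slides.
    """
    n = len(docs)
    per_doc = [Counter(doc) for doc in docs]
    global_freq = Counter()  # g_i: total occurrences of term i
    doc_freq = Counter()     # d_i: number of documents containing term i
    for counts in per_doc:
        global_freq.update(counts)
        doc_freq.update(counts.keys())

    weights = {}
    for term, g in global_freq.items():
        d = doc_freq[term]
        # Sum of p * log2(p) over documents where the term occurs.
        ent_sum = 0.0
        for counts in per_doc:
            f = counts.get(term, 0)
            if f:
                p = f / g
                ent_sum += p * math.log2(p)
        entropy = 1.0 + ent_sum / math.log2(n) if n > 1 else 1.0
        weights[term] = {
            "entropy": entropy,             # high when concentrated in few documents
            "idf": math.log2(n / d) + 1.0,  # high when the term is rare across documents
            "gfidf": g / d,                 # average count per containing document
            "none": 1.0,                    # every term weighted equally
        }
    return weights


if __name__ == "__main__":
    # Made-up corpus, not data from the study.
    corpus = [["drusen", "macula"], ["drusen", "hemorrhage"],
              ["drusen", "macula"], ["drusen"]]
    for term, w in sorted(term_weights(corpus).items()):
        print(term, w)
```

A term spread evenly over every document (here "drusen") gets an entropy weight of 0, while a term confined to one document gets 1, which is why such weights favor discriminating terms.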
-
Results: Human Experts
-
Results: NLP
-
Results: Text Miner
Frequency weight: None
Term weight: Normal
-
Comparison
-
Comparison
Thus text mining is a viable and effective method for
determining the vocabulary to describe a particular disease
Text mining found many of the terms that NLP found
The human expert is the best ground truth
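One way to make such a comparison concrete is to treat the expert's term list as ground truth and measure overlap with an automated method's list. The sketch below is an assumption about how this could be done, not the procedure reported in the paper; `compare_terms` and the example terms are hypothetical.

```python
def compare_terms(candidate, reference):
    """Compare a candidate term list against a reference (ground-truth) list.

    Hypothetical helper: terms are matched after lowercasing and trimming
    whitespace. Returns shared, missed, and extra terms plus precision and
    recall of the candidate with respect to the reference.
    """
    cand = {t.lower().strip() for t in candidate}
    ref = {t.lower().strip() for t in reference}
    shared = cand & ref
    return {
        "shared": shared,
        "missed": ref - cand,  # in the reference but not found by the candidate
        "extra": cand - ref,   # found by the candidate but absent from the reference
        "precision": len(shared) / len(cand) if cand else 0.0,
        "recall": len(shared) / len(ref) if ref else 0.0,
    }


if __name__ == "__main__":
    # Made-up term lists for illustration, not results from the study.
    expert = ["drusen", "hemorrhage", "pigment", "scar"]
    mined = ["Drusen", "pigment", "fluid"]
    print(compare_terms(mined, expert))
```

The "extra" set is worth reviewing by hand, since it can include valid descriptors that the reference list itself lacks.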
-
Ontology Generation
-
Conclusion and Future Work
Human experts are the best, but they did miss
some key descriptors
Text mining and NLP can enhance the generation of
feature descriptions by catching descriptors that experts miss
As a consequence, a more robust vocabulary can be
generated
Extension: evaluate the effectiveness of the
automated tools (text mining and NLP)
Different weighting schemes will be tried in the
future
-
Thank You For Your Attention!