intelligent information directory system for clinical documents
DESCRIPTION
Intelligent Information Directory System for Clinical Documents. Qinghua Zou 6/3/2005. Dr. Wesley W. Chu (Advisor). Keyword Search Problems Hard to compose good keywords Lack an outlook of the content Interchangeable words. When searching clinical reports. Intelligent Directory System. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/1.jpg)
Intelligent Information Directory System for Clinical Documents
Qinghua Zou
6/3/2005
Dr. Wesley W. Chu (Advisor)
![Page 2: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/2.jpg)
When searching clinical reports
Keyword Search Problems
Hard to compose good keywords
Lack an outlook of the content
Interchangeable words
![Page 3: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/3.jpg)
Intelligent Directory System
1. Overview 2. Extracting Key Concepts 3. Mining Topics 4. Building Directories 5. Searching 6. Conclusion
![Page 4: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/4.jpg)
1. System Overview
![Page 5: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/5.jpg)
2. Concept Extraction
2.1 Introduction 2.2 Our approach: IndexFinder
Index Phase (Offline) Search Phase (Real Time)
2.3 Experiments 2.4 Summary
![Page 6: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/6.jpg)
2.1 Motivation Clinical texts are
valuable in medical practice Search relevant reports Search similar patients
What is key information? UMLS provides
key medical concepts Our Goal
Extract UMLS concepts from clinical texts
Clinical Texts
•Extract key info.•Standard terms
![Page 7: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/7.jpg)
2.1 Previous Approaches
Free text
ip
dp i1
i0 vplambs
will v0
eat
dp
oats
NLP Parser
UMLS
Mapping
UMLS Concepts
Noun phrases
•lambs•oats
![Page 8: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/8.jpg)
2.1 Problems of Previous Approaches
Concepts cannot be discovered if they are not in a single noun phrase. E.g. In “second, third, and fourth ribs”,
“Second rib” can not be discovered.
Difficult to scale to large text computing. Natural language processing requires
significant computing resources
![Page 9: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/9.jpg)
2.2 Our Approach: IndexFinder
Free text
NLP Parser
Noun phrases
UMLS
Mapping Concepts
We would discard all words in the text except “lung” and “cancer”.
Our approach: UMLSfree text Previous: free textUMLS
Suppose UMLS contains only“Lung cancer”
Indexing
Index Data ~80MB
UMLS 2GB Index phase(offline)
conceptsFilteringExtracting
Free text Search phase(real time)
![Page 10: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/10.jpg)
2.2 Our Approach: What’s New?
Knowledge-based approach Using the compact index data
without using any database system
Permuting words in a sentence to generate UMLS concept candidates.
Using filters to eliminate irrelevant concepts.
![Page 11: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/11.jpg)
2.2 Concept Candidates GenerationAssumptions Knowledge base provides a
phrase table. Each phrase (concept) is a
set of words. An input text T is
represented as a set of words.
Goal Combining words in T to
generate concept candidates
Example T={D,E,F}
Answer: 5
![Page 12: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/12.jpg)
2.2 Search Phase: FilteringUse filters to eliminate irrelevant
concepts Syntactic filter:
Word combination is limited within a sentence.
Semantic filter: Filter out irrelevant concepts using
semantic types (e.g. body part, disease, treatment, diagnose).
Filter out general concepts using the ISA relationship and keep the more specific ones.
![Page 13: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/13.jpg)
2.3 Experiment Comparison with MetaMap [3]
Input: A small mass was found in the left hilum of the lung.
MetaMap
IndexFinder
![Page 14: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/14.jpg)
2.4 Summary An efficient method that maps from UMLS
to free text for extracting concepts without using any database system.
Syntactic and semantic filters are used to eliminate irrelevant candidates.
IndexFinder is able to find more specific concepts than NLP approaches.
IndexFinder is scalable and can be operated in real time.
![Page 15: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/15.jpg)
3. Mining Topics: SmartMiner
3.1 Introduction 3.2 Search Space 3.3 SmartMiner 3.4 Experiment 3.5 Summary
![Page 16: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/16.jpg)
3.1 Introduction
A Topic (assumption) a set of concepts a frequent pattern
Finding topics by data mining Frequent patterns, or Maximal frequent patterns
Require efficient data mining
![Page 17: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/17.jpg)
3.1 Data Mining Problem
1: a b c d e2: a b c d3: b c d4: b e5: c d e
id: item setDataset
MinSup=2
MFI abcd, be, cde
What itemsets are frequent itemsets (FI)?
a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde,
abcd
Maximal frequent itemset(MFI): No superset is frequent.
![Page 18: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/18.jpg)
3.1 Why MFI not FI? Mining FI is infeasible when there exists long FI. E.g, Suppose we have a 20-item frequent set a1 a2 … a20. All of its subset are frequent, i.e., 220=1,048,576
Mining MFI is fast and we can generate all the FI.
![Page 19: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/19.jpg)
3.1 Previous work
Superset checking. A study shows that CPU spends 40% time for superset checking.
Search tree is too large A large number of support counting
Need more efficient method
![Page 20: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/20.jpg)
3.2 Search spaceGiven 5 items: a, b, c, d, e. What is the search space?
Ø, a, b, c, d, e, ab, ac, ad, ae, bc, …, abcde
We use “head:tail” to denote the space as:
:abcdesimplify
Ø:abcde
What is the space of ? ab:cd
ab, abc, abd, abcd
![Page 21: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/21.jpg)
3.2 Space decomposition
For a space :abcde, if abcg is frequent,
Then, the known space any subset of abc is frequent known space is :abc
The unknown space are: Any itemsets contain d or e. d:abce and e:abc
:abcde = d:abce + e:abc + :abc
![Page 22: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/22.jpg)
3.3 The basic idea
(b) SmartMiner Strategy
SmartMiner takes advantages of the information from previous steps.
(a) Previous approach
B2
…
A1
B1 …
Creating B2 before exploring B1
Bn B’
…
A1
B1 …
Creating B’ after exploring B1
Using information from B to prune the space at B’
![Page 23: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/23.jpg)
3.3 The tail information
For the space :abcde, if we know abcf, abcg and abfg are frequent, then we project them to the space. abcf abc. abcg abc. abfg ab.
Thus Tinf(abcf,abcg, abfg|:abcde)={abc}
![Page 24: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/24.jpg)
3.4 Running time on Mushroom
0
1
10
100
1000
10 1 0.1 0.01 Minimum Support (%)
Total Time(sec)SmartMinerGenMaxMafia
![Page 25: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/25.jpg)
3.5 Summary
SmartMiner uses tail information to guide the mining, efficient since A smaller search tree. No superset checking. Reduces the number of support counting.
![Page 26: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/26.jpg)
4. Building Directories
4.1 Introduction 4.2 Knowledge Hierarchies 4.3 User Specification 4.4 Directory Generation 4.5 Integration various
directories 4.6 Summary
![Page 27: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/27.jpg)
4.1 Introduction
Three Inputs Topics
Key Content Knowledge
trees Meaningful
User specs Customized
![Page 28: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/28.jpg)
4.2 Knowledge Hierarchies UMLS concept hierarchies
PA: parent-child relationship RA: rather-than relationship
Problems A concept: several parents, different granularity
[lung cancer] [Neoplasms, Respiratory Tract] [lung cancer] [Neoplasms, Respiratory
System] A concept: hundreds of paths to roots
[lung cancer]: 233 different paths in UMLS by PA
![Page 29: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/29.jpg)
4.2 Select Proper Hierarchies Set source preference order, e.g
[disease]: ICD9>SNOMED>MeSH [body part]: SNOMED>ICD9
Select proper granularity C: a set of concepts; n: a path node Score function for selecting the
node n S(n)=|{ci| cin, ci in C}|
Expert review
![Page 30: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/30.jpg)
4.3 User Specifications A good directory ~ usage pattern User spec usage pattern User may have different specs A spec: a series of knowledge
names [disease] + [body part], or [body part] + [disease]
Build a directory for a spec by the ordering
![Page 31: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/31.jpg)
4.4 Directory GenerationAn example
User spec 1: d + p [disease] + [body part]
User spec 2: p + d [body part] + [disease]
![Page 32: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/32.jpg)
4.4 ~ An example d + p
p + d
1
1 11
1 1 11
![Page 33: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/33.jpg)
4.4 ~ Algorithm
![Page 34: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/34.jpg)
4.5 Integration various directories
For each Di, get all dir paths to Di
A Di is tree: XML Key words can
associate with tree nodes
Query: xpath Exist redundant
information
![Page 35: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/35.jpg)
4.5 simplified model Keep only the
first level knowledge trees
For //d6//p6, we use XPath query
//doc[//d6 and //p6]
Size smaller, require some computation
![Page 36: Intelligent Information Directory System for Clinical Documents](https://reader036.vdocuments.us/reader036/viewer/2022062315/568150bb550346895dbed898/html5/thumbnails/36.jpg)
4.6 Summary
Build directory by Topics Knowledge hierarchies User specifications
Mapping directories to XML By collecting directory paths for
each document Leverage on existing XML
technologies