
Posted on 21-Dec-2015


Page 1: Medical Digital Library to Support Scenario Specific Information Retrieval Wesley W. Chu wwc@cs.ucla.edu Computer Science Department University of California

Medical Digital Library to Support Scenario Specific Information Retrieval

Wesley W. Chu

wwc@cs.ucla.edu

Computer Science Department

University of California

Los Angeles, California

Page 2:

A Project of the NIH Grant at UCLA

A Digital File Room for Patient Care, Education, and Research

Wesley W. Chu, PhD

Hooshang Kangarloo, MD

Usha Sinha, PhD

David B. Johnson, PhD

Bernard Churchill, MD

John D. N. Dionisio, PhD

Richard Johnson, MD

Osman Ratib, MD, PhD

Page 3:

Background • Hypothesis • Specific Aims • Significance • Approach and Innovations • Research Progress

Background

• Current file rooms managing patient records have limited functionality

– Main goal of mapping patient ID to patient records

• PACS implementations are an electronic version of the traditional file room

Page 4:

Background

• Finding relevant information for a particular user is time consuming and labor intensive

• Results are poorly structured and incomplete, which may affect patient management

• Current search tools are limited to general use and are not tailored to specific users or tasks

Lack of structure makes...

Page 5:

Digital File Room Requirements

A navigable information space providing:

– Relevant and reputable information

– Access to similar patient records

– Content-based cross referencing

– Dynamically updated data repository

– Tailored access for specific users and devices

Page 6:

Hypotheses

• A digital file room (digital library) that delivers relevant and structured answers to specific queries can be developed from existing medical databases

• Such a digital file room will increase user satisfaction and improve patient management

Page 7:

Specific Aims

SA1 Develop a system that identifies and provides access to reputable information sources

SA2 Provide users with greater query capability (e.g. similar-to, approximate)

SA3 Extract knowledge from patient data, medical literature and radiology teaching files to support content-based cross-referencing

SA4 Provide access to dynamically updated collections based on patient data

SA5 Adapt information retrieval to user and device characteristics

Page 8:

Significance

Extend patient record to provide tailored and timely access to a broader array of reputable medical information

Page 9:

Approach and Innovations

• Intelligent information registration

– Provide access to multiple, related data sources through a single access point

• Content-based navigation and matching

– Develop similarity matching based on medical concepts & patterns

– Content correlation

• User and device modeling

– Adaptive information retrieval based on user and device models

• Scenario-based information web (proxies)

– Develop information web linking clustered data sources for a given set of related tasks (i.e., scenario)

Page 10:

Intelligent Information Registration

Registers multiple information sources to provide transparent access through a single point (proxy object).

– Information requests are routed to appropriate data sources based on query characteristics

– Data sources are hierarchically clustered according to a four-layer data model

[Figure: four-layer data model. Proxy object (access point): Patient. Meta-data: Procedures, Labs. Summarization: Ortho, Incontinence, Neurological. Data: procedure database (billing, CPT), laboratory databases.]
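The routing idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `DataSource` and `ProxyObject`, the `topics` field standing in for meta-data, and the overlap-based routing rule are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """A registered source; `topics` is a hypothetical stand-in for its meta-data."""
    name: str
    topics: frozenset


@dataclass
class ProxyObject:
    """Single access point that forwards a request to the matching sources."""
    name: str
    sources: list = field(default_factory=list)

    def register(self, source: DataSource) -> None:
        self.sources.append(source)

    def route(self, query_topics: set) -> list:
        # Route by query characteristics: keep sources sharing a topic with the query.
        return [s.name for s in self.sources if s.topics & query_topics]


patient = ProxyObject("Patient")
patient.register(DataSource("procedure database", frozenset({"billing", "cpt"})))
patient.register(DataSource("laboratory databases", frozenset({"labs"})))

print(patient.route({"cpt"}))
```

A caller only ever talks to `patient`; which underlying database answers is decided inside `route`, which is the transparency the slide describes.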

Page 11:

Content-Based Navigation & Matching

Two types of navigation

– Navigation of the information space using proxies and content correlation

– Pattern/similarity navigation using type abstraction hierarchies (TAHs)

Page 12:

Pattern-Based Type Abstraction Hierarchies

• Scalable, hierarchical knowledge structures that facilitate similarity matching

[Figure: type abstraction hierarchy for incontinence. Root: Incontinence. Branches: adequate holding, poor holding. Types: Type V (adequate holding, poor storage, poor emptying); Type II (adequate holding, adequate storage, poor emptying); Type III (poor holding, adequate storage, poor emptying); Type IV (poor holding, poor storage, poor emptying). Example patients: 6 day M, 7 mo F, 12 yr M, 25 yr F, 28 day M, 24 mo F, 15 yr M, 20 yr F.]
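As a rough sketch of how a type abstraction hierarchy supports similarity matching, the Python below relaxes a query by climbing from a leaf type to its parent and returning every leaf under that parent. The `PARENT` table is a hypothetical encoding of the incontinence hierarchy named on this slide, and the function names are illustrative only.

```python
# Hypothetical encoding of the hierarchy: each node maps to its parent.
PARENT = {
    "Type V": "adequate holding",
    "Type II": "adequate holding",
    "Type III": "poor holding",
    "Type IV": "poor holding",
    "adequate holding": "incontinence",
    "poor holding": "incontinence",
}


def leaves_under(node):
    """Return all leaf types below `node` (or the node itself if it is a leaf)."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]
    result = []
    for child in children:
        result.extend(leaves_under(child))
    return result


def similar_types(leaf, levels=1):
    """Relax a query on `leaf` by climbing `levels` steps, then list the leaves there."""
    node = leaf
    for _ in range(levels):
        node = PARENT.get(node, node)
    return leaves_under(node)


print(similar_types("Type III"))
print(similar_types("Type III", levels=2))
```

Climbing one level treats Type III and Type IV as similar because both share poor holding; climbing two levels widens the match to every incontinence type.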

Page 13:

Adaptive Information Retrieval

• Tailors query processing and query results according to:

– Particular user

– Characteristics of their device

• Examples:

– Doctors prefer JAMA or Lancet while patients prefer Time or CNN.

– High resolution workstations support large, detailed imaging studies while portable devices need lower-bandwidth data.

• Allows the system to retrieve appropriate data for a particular query, user, and device
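A minimal sketch of this kind of adaptation, assuming a simple user model (preferred sources per role) and a simple device model (a size limit per device class). The role names, size limits, and the `adapt` function are hypothetical; only the JAMA/Lancet and Time/CNN preferences come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    source: str
    size_mb: float


# Hypothetical user model: preferred sources per role, following the slide's example.
PREFERRED_SOURCES = {
    "doctor": {"JAMA", "Lancet"},
    "patient": {"Time", "CNN"},
}

# Hypothetical device model: largest item (in MB) each device class is assumed to handle.
MAX_SIZE_MB = {"workstation": 500.0, "portable": 5.0}


def adapt(results, user_role, device):
    """Keep results that fit the device, then list preferred sources first."""
    fits = [r for r in results if r.size_mb <= MAX_SIZE_MB[device]]
    preferred = PREFERRED_SOURCES[user_role]
    # sorted() is stable, so results keep their original order within each group.
    return sorted(fits, key=lambda r: r.source not in preferred)


results = [
    Result("news item", "CNN", 1.0),
    Result("journal article", "JAMA", 2.0),
    Result("imaging study", "Lancet", 200.0),
]

print([r.title for r in adapt(results, "doctor", "portable")])
```

The same result list yields a different answer for each user and device pair, which is the behavior the last bullet describes.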

Page 14:

Scenario-Based Proxy

A framework that defines, for a particular domain and set of tasks, the access methods to and the relationships between information sources.

[Figure: a Patient proxy combining four components. Intelligent information registration: sources labeled UCLA, HFC, Procedures, Labs, HFC Blood, MD Office, UCLA Blood. Pattern-based similarity matching: hierarchy with Type V, adequate holding, inadequate holding, Type II, Type III, Type IV. Adaptive information retrieval. Information web.]

Page 15:

Scenario-Based Information Web

A directed graph that defines access paths for navigation among proxy objects

[Figure: Teaching File, Patient, and Literature proxy objects connected by similar-to and correlated-to links]

Page 16:

Scenario-Based Information Web

[Figure: Patient, Literature, and Teaching File proxy objects connected by similar-to and correlated-to links]

• Similar-to links relate objects based on their similarity

– patients similar by age, sex, and disease

• Correlated-to links relate objects based on related content

– disease can be correlated to relevant literature

Extends the scope of the digital file room into a digital medical library
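The information web can be pictured as a small labeled directed graph. The sketch below is an assumption-laden illustration: the `InformationWeb` class and its methods are hypothetical, and the specific edges are guesses at what the figure shows, since the transcript keeps only the node and link names.

```python
from collections import defaultdict


class InformationWeb:
    """Directed graph whose edges carry a link type."""

    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(link_type, target), ...]

    def link(self, source, link_type, target):
        self._edges[source].append((link_type, target))

    def follow(self, source, link_type):
        """Return the nodes reachable from `source` over one link of `link_type`."""
        return [t for kind, t in self._edges[source] if kind == link_type]


# Hypothetical edges between the three proxy objects named on the slide.
web = InformationWeb()
web.link("Patient", "similar-to", "Patient")
web.link("Patient", "correlated-to", "Literature")
web.link("Patient", "correlated-to", "Teaching File")
web.link("Teaching File", "similar-to", "Teaching File")
web.link("Teaching File", "correlated-to", "Literature")

print(web.follow("Patient", "correlated-to"))
```

Navigation then amounts to repeated calls to `follow`: a similar-to step stays within one kind of object, while a correlated-to step crosses to another collection.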

Page 17:

Research Progress

• Phrase Indexing

A phrase is generated from an n-word combination in a sentence.

– Domain Specific Retrieval

– Document Summarization

• Content Correlation

– Linking of relevant documents via patterns

Page 18:

Domain Specific Retrieval

• Documents are grouped into domain-specific collections

– Medical patient reports

– Web sites are often tailored to specific subject areas

• Phrases can capture content better than single words and thus improve retrieval performance

Page 19:

Problem With Longer Phrases

[Figure: number of candidate phrases (log scale, 1.00E+00 to 1.00E+12) versus phrase length n = 1 to 6. Series: 100 word document, 125 word document, 150 word document, 100^n, 14-word sentence.]

Large combinatorial problem

To process longer phrases it is necessary to partition documents into smaller segments
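The size of the problem is easy to check numerically. The sketch below assumes that a candidate phrase is an unordered combination of n distinct words, so the count is the binomial coefficient; the slide does not state the exact counting rule, and the helper name `candidate_phrases` is illustrative.

```python
from math import comb


def candidate_phrases(num_words, n):
    """Number of unordered n-word combinations drawn from `num_words` distinct words."""
    return comb(num_words, n)


for n in range(1, 7):
    whole_doc = candidate_phrases(100, n)  # a 100-word document treated as one unit
    sentence = candidate_phrases(14, n)    # a single 14-word sentence
    print(f"n={n}: document={whole_doc:.2e}, sentence={sentence}")
```

Under this assumption a 100-word document yields over a billion 6-word combinations, while a 14-word sentence yields a few thousand, which is why restricting phrases to sentence-sized segments keeps the computation tractable.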

Page 20:

Phrase Analysis

• A phrase is defined as any 2, 3 or 4 words co-occurring in a sentence (word combination)

• Very large number of possible phrases

– Use a stoplist to remove “useless” words

– Normalize words to a common stem

Example:

sentence: The right upper lobe mass is seen again.

case normalization: the right upper lobe mass is seen again

stop word removal: right upper lobe mass seen again

stemming: right upp lob mass seen again

sorting: again lob mass right seen upp

candidate 2-word combinations: again lob, again mass, again right, again seen, again upp, lob mass, lob right, lob seen, lob upp, mass right, mass seen, mass upp, right seen, right upp, seen upp
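The steps on this slide can be sketched as a short Python pipeline. The stoplist and the `toy_stem` function are assumptions made for illustration: the slide does not name the stoplist or the stemmer, and `toy_stem` is only written to reproduce the stems in the slide's example ("upp", "lob"), not to be a general stemmer.

```python
from itertools import combinations

STOPLIST = {"the", "is"}  # hypothetical, minimal stoplist


def toy_stem(word):
    """Toy suffix stripper (an assumption); it only mimics the slide's example stems."""
    if len(word) > 4 and word.endswith("er"):
        return word[:-2]
    if len(word) > 3 and word.endswith("e"):
        return word[:-1]
    return word


def candidate_phrases(sentence, n=2):
    """Return sorted n-word candidate phrases for one sentence."""
    words = sentence.lower().rstrip(".").split()     # case normalization
    words = [w for w in words if w not in STOPLIST]  # stop word removal
    stems = sorted({toy_stem(w) for w in words})     # stemming, deduplication, sorting
    return [" ".join(combo) for combo in combinations(stems, n)]


phrases = candidate_phrases("The right upper lobe mass is seen again.")
print(len(phrases), phrases[:3])
```

Sorting the stems before combining them gives each word set one canonical form, so "mass right" and "right mass" are indexed as the same phrase.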

Page 21:

Document Retrieval Evaluation

• Preliminary evaluation

– A domain specific collection of documents

– Can phrase analysis limited to sentences improve retrieval effectiveness?

– SMART system (single word terms) used as baseline

• Data

– Thoracic radiology patient reports

– Dictated reports

– Describe anatomy and abnormal findings such as enlarged lymph nodes and cancer masses

Page 22:

Domain Specific Document Retrieval

• Query: “right upper lobe mass”

Page 23:

Automatic Text Summarization

Salton Method

• Given a text file with n paragraphs

• A paragraph can be represented by $D_i = (d_{i1}, d_{i2}, \ldots, d_{im})$

– $d_{ik}$ is the weight representing the importance of term $T_k$ (word or phrase)

• The pair-wise similarity of two paragraphs: $\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{m} d_{ik} \, d_{jk}$

Text relationship map:

• Nodes = paragraphs

• Links = pair-wise similarity of the connected nodes

• Links are created if $\mathrm{Sim}(D_i, D_j) >$ threshold

Bushiness of a node = # of links of a node

Text summarization is derived from the bushy nodes.

[Figure: text relationship map with paragraph nodes P1, P2, P3, P4, P5, ..., Pn]
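The text relationship map and bushiness can be sketched directly from the definitions on this slide. The function names, the made-up weight vectors, and the choice to return the selected paragraphs in text order are assumptions for illustration; the slide only says that the summary is derived from the bushy nodes.

```python
def similarity(d_i, d_j):
    """Inner product of two paragraph weight vectors of equal length."""
    return sum(a * b for a, b in zip(d_i, d_j))


def bushiness(paragraphs, threshold):
    """Count, for each paragraph, the links whose similarity exceeds `threshold`."""
    counts = [0] * len(paragraphs)
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            if similarity(paragraphs[i], paragraphs[j]) > threshold:
                counts[i] += 1
                counts[j] += 1
    return counts


def summarize(paragraphs, threshold, size):
    """Pick the `size` bushiest paragraphs and return their indices in text order."""
    counts = bushiness(paragraphs, threshold)
    ranked = sorted(range(len(paragraphs)), key=lambda i: -counts[i])
    return sorted(ranked[:size])


# Made-up term weights for four paragraphs over three terms.
vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]

print(bushiness(vectors, threshold=0.5))
print(summarize(vectors, threshold=0.5, size=2))
```

With these vectors only neighboring paragraphs share a term, so the two middle paragraphs have the most links and form the summary.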

Page 24:

Performance Comparison of Salton's Summarization Method Based on Phrase and Single Word

Aspirin.txt: paragraph ranking based on bushiness

            Words            2W phrases       3W phrases
Threshold   0.1  0.2  0.3    0.1  0.2  0.3    0.1  0.2  0.3
No.1        4    6    8      2    2    2      2    2    2
No.2        6    8    2      3    3    3      3    3    3
No.3        8    3    3      6    6    6      8    8    8
No.4        1    4    4      1    4    4      4    4    4
No.5        5    5    5      8    5    5      6    6    6
No.6        2    1    6      4    1    1      5    5    5
No.7        3    2    1      5    8    8      7    7    7
No.8        9    9    9      7    7    7      1    1    1
No.9        7    7    7      9    9    9      9    9    9

Summarization based on phrases is less sensitive to the threshold setting than summarization based on single words.

Page 25:

N-words Distribution

[Figure: N-words distribution. Y axis: Number (0 to 500). X axis: N-Word (1 to 9). Series: Aspirin1, Aspirin2, Elian04, LAPD06, CNN-Bush, CNN-Florida.]

Page 26:

Number Distinct Freq Words

[Figure: Y axis: Number of Distinct Frequent Words (0 to 180). X axis: N-Word (1 to 9). Series: Aspirin1, Aspirin2, Elian04, LAPD06, CNN-Bush, CNN-Florida.]

Page 27:

Number of Valid Sentences

[Figure: Y axis: Number of Valid Sentences (0 to 90). X axis: N-Word (1 to 9). Series: Aspirin1, Aspirin2, Elian04, LAPD06, CNN-Bush, CNN-Florida.]

Page 28:

Performance Comparison

   Salton 0.1       Salton 0.2       Salton 0.3       David            df

Aspirin01, 13 sentences
1  2,12,9,3,7       3,9,1,4,7        1,2,3,7,9        9,2,3,12,7       0
2  9,2,3,4,7        2,3,4,9,7        2,1,3,7,9        9,12,2,3,4       1
3  2,9,3,12,7       2,9,12,7,1       12,2,7,9,3       12,9,2,1,7       0
4  2,9,12,4,7       9,2,12,4,7       9,12,3,4,7       12,9,4,2,7       0
5  12,4,9           12,4,9           12,4,9           12,4,9           0

Aspirin02, 68 sentences
1  14,12,22,66,20   12,22,36,66,1    1,12,14,15,20    14,12,66,22,20   0
2  12,14,66,15,20   36,15,20,22,66   66,1,12,14,20    14,12,66,22,15   0
3  12,14,66,22,21   14,22,66,12,36   66,12,14,21,22   14,12,66,22,18   0
4  14,66,12,21,22   14,66,22,12,21   12,14,22,66,68   14,22,12,66,68   0
5  14,12,15,18,22   14,12,15,18,22   14,12,18,22,36   14,12,18,15,22   0

Elian04, 92 sentences
1  26,76,33,59,2    26,33,76,2,59    26,76,2,44,33    26,76,2,33,7     1
2  26,76,7,33,29    26,76,82,7,29    6,76,26,27,29    26,76,7,2,59     1
3  6,26,7,76,2      6,2,27,26,44     6,27,26,2,29     26,7,76,2,59     1
4  26,2,76,6,7      26,6,7,24,28     7,24,26,84,85    26,7,84,2,85     0
5  26,7,76,84,6     26,7,76,84,85    7,24,26,28,85    7,26,85,84,2     1

Page 29:

Comparison (cont)

   Salton 0.1       Salton 0.2       Salton 0.3       David            df

LAPD06, 27 sentences
1  6,7,20,25,5      6,19,20,25,7     6,7,14,19,25     6,7,20,25,19     0
2  18,20,6,19,25    6,18,19,24,20    19,18,24,9,20    5,7,1,20,6       3
3  19,6,7,18,25     7,25,5,6,1       1,5,7,14,19      1,5,7,20,12      2
4  7,5,6,1,8        7,5,1,6,8        1,5,6,7,9        1,5,7,12,17      2
5  1,5,7,12,17      1,5,7,12,17      1,5,7,12,17      1,5,7,12,17      0

CNNbush, 14 sentences
1  12,5,6,8,11      12,5,6,11,8      5,6,11,12,1      5,12,8,11,6      0
2  12,5,8,3,11      5,8,12,3,7       5,12,8,7,3       5,12,8,11,9      1
3  5,12,3,8,10      5,8,3,9,10       5,8,10,12,6      5,12,11,9,8      1
4  5,12,8,7,9       5,12,6,8,9       5,12,6,8,9       5,11,12,9,8      1
5  5,8,12,6,7       5,12,6,8,9       5,12,6,8,9       5,11,12,9,8

Florida, 49 sentences
1  29,11,2,41,40    29,41,11,26,2    29,41,26,11,14   29,11,17,40,48   2
2  11,29,40,17,48   17,29,40,22,11   17,20,35,40,22   17,11,20,40,22   0
3  17,29,20,40,48   17,6,22,29,11    28,22,26,4,6     17,20,11,22,40   0
4  17,11,22,6,2     2,11,20,6,17     2,11,20,6,17     17,20,11,22,2    0
5  2,11,20,17,22    2,11,20,17,22    2,11,17,20,22    17,11,22,2,20    0

Page 30:

Content Correlation

• Given a document in one collection, content correlation links relevant documents in another document collection

[Figure: collections shown are Patient Records, New England Journal of Medicine, CNN, and Time]

Page 31:

Document Cluster By Pattern

• A pattern is a set of unique terms that characterize some features in the data set

• Patterns can be found in a collection of documents by data mining

• Documents are grouped into clusters based on patterns via clustering technique

Page 32:

Cluster Signature

• Every cluster can be classified according to the occurrence frequency of the patterns

• Looking to answer:

– Which set of patterns summarizes a given cluster?

– How are the patterns related among the clusters?

[Figure: Patient Records and Literature collections]

Page 33:

Deriving Cluster Signature

• Metrics

– Local Cluster Certainty (LCC) measures the coverage of a pattern in a given cluster (Popularity)

– The Global Cluster Certainty (GCC) measures the coverage of a pattern among clusters (Exclusiveness)

• The Cluster Signature is the set of those patterns that have both high LCC and GCC

• Documents from one collection (source) can be linked to relevant clusters in another collection (target)

[Figure: Patient Records and Literature collections]
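The slides do not give formulas for LCC or GCC, so the sketch below substitutes simple ratio definitions chosen only to illustrate the idea of popularity versus exclusiveness. Both definitions, the threshold values, and all function names are assumptions and should not be read as the project's actual metrics.

```python
def lcc(pattern, cluster):
    """Assumed LCC: fraction of documents in `cluster` that contain `pattern`."""
    return sum(pattern in doc for doc in cluster) / len(cluster)


def gcc(pattern, cluster, all_clusters):
    """Assumed GCC: share of the pattern's occurrences that fall inside `cluster`."""
    total = sum(pattern in doc for c in all_clusters for doc in c)
    inside = sum(pattern in doc for doc in cluster)
    return inside / total if total else 0.0


def cluster_signature(cluster, all_clusters, min_lcc=0.5, min_gcc=0.5):
    """Patterns that are both popular in the cluster and exclusive to it."""
    patterns = set().union(*cluster)
    return {
        p for p in patterns
        if lcc(p, cluster) >= min_lcc and gcc(p, cluster, all_clusters) >= min_gcc
    }


# Each document is a set of stemmed patterns; the data is made up.
laparoscopy = [{"laparoscop", "pediatr"}, {"laparoscop", "compl"}]
other = [{"pediatr", "urolog"}, {"pediatr"}, {"urolog"}]
clusters = [laparoscopy, other]

print(cluster_signature(laparoscopy, clusters))
```

In this toy data "pediatr" is reasonably popular in the first cluster but appears mostly elsewhere, so it is left out of the signature, while "laparoscop" and "compl" are kept.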

Page 34:

Preliminary Results

• A collection of 69 pediatric urology literature abstracts taken from Medline was clustered using the complete link clustering algorithm

– 3 large clusters, each with 2 or more sub-clusters

• GCC and LCC were calculated for patterns found in several sub-clusters

• Data from one sub-cluster is reported here

Document #  Title
1           Complications in pediatric urological laparoscopy: results of a survey
2           Laparoscopic surgery in pediatric urology
3           [Laparoscopic interventions in pediatric urology]
4           Role of laparoscopic surgery in pediatric urology
5           [Laparoscopic interventions in urology]
6           Laparoscopic heminephroureterectomy in pediatric patients

Page 35:

GCC

Term/Phrase                   Cl
Pediatr                       1.0
Result                        1.0
Patient                       1.0
Perform                       1.0
Compl                         1.0
Laparoscop                    1.0
Urolog                        0.34
Laparoscop pediatr            1.0
Laparoscop perform            1.0
Diagnost laparoscop           0.35
Laparoscop operat             0.35
Compl rate                    0.35
Laparoscop patient            0.35
Laparoscop operat perform     0.0817
Laparoscop patient perform    0.0817

LCC

Term/Phrase                   Cg
Laparoscop                    0.1887
Compl                         0.0817
Child Laparoscop              1.0
Laparoscop patient            1.0
Compl Laparoscop              1.0
Comple techn                  1.0
<MEAS> compl                  1.0
Laparoscop perform            0.6088
Compl rate                    0.4564
Laparoscop patient perform    1.0
Laparoscop perform procedur   1.0
<MEAS> compl rate             1.0
Laparoscop pediatr perform    1.0
Compl laparoscop techn        1.0

Page 36:

Project Summary

A system that provides:

– relevant and reputable information,

– access to similar patient records,

– content-based cross referencing,

– a dynamically updated data repository, and

– tailored access for specific users and devices

will:

– augment the patient record to provide tailored and timely access to a broader array of reputable information and

– extend the digital file room into a digital medical library.

Page 37:

Research Results

• Phrase Indexing

– Developed an efficient algorithm for extracting n-word features from textual documents

– Phrase indexes provide better results than single-word indexes in document retrieval and summarization

• Content Correlation via Cluster Signature (LCC & GCC)

– Preliminary results indicate the feasibility of using cluster signatures for linking relevant documents

• Work begun on proxy for information navigation

Page 38:

Future Work

• Develop Ontology for Intelligent Information Registration

• User Model for Information Retrieval