The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Post on 01-Apr-2015

Page 1: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

The Neighborhood Auditing Tool

James Geller, Michael Halper, Yehoshua Perl, C. Paul Morrey

Page 2: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Research Paper

C.P. Morrey, J. Geller, M. Halper, Y. Perl. The Neighborhood Auditing Tool: A hybrid interface for auditing the UMLS. J Biomed Inform, 42(3):468-89, 2009.

Page 3: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Overview

Goals of an Auditor’s Tool for the UMLS

Principles of Auditing with Neighborhoods

The Idea of a Hybrid Display

Current State of the NAT: Serving the Auditor

Presentation of NAT Features

Live Audit Session

Planned State of the NAT: Guiding the Auditor

Conclusions

Future Work

Page 4: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Auditing the UMLS

About 150 source vocabularies

It is natural that inconsistencies will appear

Over 2.1 million concepts and nearly 9.7 million terms*

Two-level structure consisting of the Semantic Network and the Metathesaurus

*UMLS Metathesaurus version 2009AA

Page 5: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Previous Work on Auditing

H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing the UMLS as an Object-oriented Database: Modeling Issues and Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.

J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error correction in large terminological knowledge bases. Data & Knowledge Engineering, 45(1):1-32, 2003.

J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-461, 2003.

H. Gu, Y. Perl, G. Elhanan, H. Min, L. Zhang, Y. Peng. Auditing concept categorizations in the UMLS. Artif Intell Med, 31(1):29-44, 2004.

Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-231, 2007.

Page 6: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Previous Work on Auditing (cont’d)

H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J. Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.

Y. Chen, H. Gu, Y. Perl, J. Geller, M. Halper. Structural group auditing of a UMLS semantic type's extent. J Biomed Inform. 2009 Feb;42(1):41-52.

L. Chen, C.P. Morrey, H. Gu, M. Halper, Y. Perl. Modeling multi-typed structurally viewed chemicals with the UMLS Refined Semantic Network. J Am Med Inform Assoc, 16(1):116-31, 2009.

Y. Chen, H. Gu, Y. Perl, J. Geller. Structural group-based auditing of missing hierarchical relationships in UMLS. J Biomed Inform. 2009 Jun;42(3):452-67.

Y. Chen, H. Gu, Y. Perl, M. Halper, and J. Xu. Expanding the extent of a UMLS Semantic Type via Group Neighborhood Auditing. J Am Med Inform Assoc, accepted for publication.

Page 7: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

How we did it before the NAT: Provide Info as Paper Form

CPT: C1081844 Antonospora locustae

SRC: NCBI

STY: T004 T009 Fungus + Invertebrate

DEF:

SYN: Antonospora locustae | Nosema locustae

PAR: Antonospora {STY: Invertebrate}

CHD:

Data shown for this concept is from the UMLS Metathesaurus version 2006AC

Page 8: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Auditing Results also Paper Form

(C1081844) Antonospora locustae

STY: Fungus + Invertebrate

No errors

Semantic Type Error: Fungus

Semantic Type Error: Invertebrate

Add Semantic Type ______________________

Ambiguity

Other error _____________________________

Comments _____________________________

______________________________________

Page 9: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Goals of an Auditor’s Tool for the UMLS

Display relevant information to the auditor.

Do not overwhelm the auditor with too much information.

Help the auditor focus on areas most likely to contain errors.

Algorithms suggest likely erroneous concepts.

Concepts are reviewed in a neighborhood display.

Page 10: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Principles of Auditing with Neighborhoods

Several years of experience: Auditing is to a large degree a “local” activity.

Concepts have two kinds of knowledge elements:

Textual Knowledge Elements: Preferred term, CUI, synonyms, LUI, definition, sources, semantic types

Contextual Knowledge Elements: Neighbors
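
The two kinds of knowledge elements can be pictured as one small record type. The sketch below is only illustrative: the class name, field names, and the `neighbors` helper are assumptions, not the NAT's data model. The sample values are the ones on the paper form for Antonospora locustae.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """Illustrative record for one concept; all field names are assumptions."""

    # Textual knowledge elements
    cui: str
    preferred_term: str
    synonyms: List[str] = field(default_factory=list)
    luis: List[str] = field(default_factory=list)
    definition: str = ""
    sources: List[str] = field(default_factory=list)
    semantic_types: List[str] = field(default_factory=list)

    # Contextual knowledge elements: names of neighboring concepts
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)

    def neighbors(self) -> List[str]:
        """Return every neighbor recorded for this concept."""
        return self.parents + self.children


# Values copied from the paper form (UMLS 2006AC); empty fields stay empty.
focus = Concept(
    cui="C1081844",
    preferred_term="Antonospora locustae",
    synonyms=["Antonospora locustae", "Nosema locustae"],
    sources=["NCBI"],
    semantic_types=["Fungus", "Invertebrate"],
    parents=["Antonospora"],
)
print(focus.preferred_term, focus.semantic_types, focus.neighbors())
```

Keeping the textual and contextual fields in one record mirrors what an auditor reads together when reviewing a concept.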

Page 11: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Neighborhoods

Focus concept: The concept presently under review

Immediate Neighborhood: The set of concepts reachable from the focus concept by stepping one relationship (up, down, lateral, etc.)

Extended neighborhood: Includes parents of parents (grandparents), children of children (grandchildren) and siblings. No lateral chains.
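
These definitions can be sketched as set operations over parent/child pairs. This is a minimal illustration, not the NAT's implementation: the function names and the toy identifiers (`F`, `P1`, `R1`, and so on) are placeholders. It also assumes that the extended neighborhood contains the immediate one.

```python
from collections import defaultdict


def build_index(child_parent_pairs):
    """Index (child, parent) pairs into parent and child lookup tables."""
    parents, children = defaultdict(set), defaultdict(set)
    for child, parent in child_parent_pairs:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def immediate_neighborhood(focus, parents, children, lateral):
    """Concepts one relationship away: parents, children, lateral neighbors."""
    return (parents.get(focus, set())
            | children.get(focus, set())
            | lateral.get(focus, set()))


def extended_neighborhood(focus, parents, children, lateral):
    """Immediate neighborhood plus grandparents, grandchildren and siblings."""
    ps = parents.get(focus, set())
    cs = children.get(focus, set())
    grandparents = set().union(*(parents.get(p, set()) for p in ps))
    grandchildren = set().union(*(children.get(c, set()) for c in cs))
    siblings = set().union(*(children.get(p, set()) for p in ps)) - {focus}
    # Lateral relationships are followed one step only (no lateral chains).
    return (immediate_neighborhood(focus, parents, children, lateral)
            | grandparents | grandchildren | siblings)


# Toy data: every identifier is a placeholder, not UMLS content.
pairs = [("F", "P1"), ("F", "P2"), ("P1", "G1"),
         ("C1", "F"), ("C2", "F"), ("GC1", "C1"), ("S1", "P1")]
lateral = {"F": {"R1"}, "R1": {"R2"}}
parents, children = build_index(pairs)

immediate = immediate_neighborhood("F", parents, children, lateral)
extended = extended_neighborhood("F", parents, children, lateral)
print(sorted(immediate))
print(sorted(extended))
```

In the toy data, `R2` is reachable only through a chain of two lateral relationships, so it stays outside both neighborhoods.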

Page 12: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

References about Neighborhood

M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel, and L.F. Fuller, et al. Using META-1, the first version of the UMLS Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care, pages 131-135, Washington, D.C., 1990.

S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W.D. Sperzel, M.S. Erlbaum, L.L. Fuller, N.E. Olson. From meaning to term: semantic locality in the UMLS Metathesaurus. In Proc Annu Symp Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.

Page 13: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Immediate Neighborhood

[Diagram: immediate neighborhood of a focus concept. Concepts shown: Microsporidia, Unclassified; Microsporidia <protozoa>; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Kabatana; Oligosporidium. Other entries shown: Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.]

Page 14: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Extended Neighborhood

[Diagram: extended neighborhood of a focus concept. Display areas: RELATIONSHIPS, SIBLINGS, GRANDCHILDREN, CHILDREN, FOCUS CONCEPT, PARENTS, GRANDPARENTS. A "SIB" label also appears.]

Entries shown in the diagram: Microsporidia, Unclassified; Microsporidia <protozoa>; Erroneous concept; fungus; PHYLUM MICROSPORA; Protozoa; Sporozeoa; Dictyocoela; Edhazardia; Fibrillanosema; Microsporidium; Dictyocoela berillonum; Dictyocoela cavimanum; Edhazardia aedis; Fibrillanosema crangonycis; Microsporidium 57864; Dictyocoela dehayesum; Dictyocoela duebenum; Dictyocoela grammarellum; Dictyocoela muelleri; Dictyocoela sp.L11; Kabatana; Kabatana takedai; Microsporidium africanum; Microsporidium ceylonensis; Microsporidium cypselurus; Microsporidium prosopium; Microsporidium seriolae; Oligosporidium; Oligosporidium occidentalis; Microsporea; Cellular aspects of; Microbiological; Pathogenicity Aspects; virologic.

Page 15: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Up-Extended and Down-Extended Neighborhood

An up-extended neighborhood includes grandparents and the immediate neighborhood.

A down-extended neighborhood includes grandchildren and the immediate neighborhood.

Give auditor all s/he needs but not more.
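
The two variants can be sketched as one function with switches for the upward and downward extension. This is an illustration only: the function name, its `up`/`down` parameters, and the toy identifiers are assumptions, and lateral relationships are left out to keep it short.

```python
def neighborhood(focus, parents, children, up=False, down=False):
    """Hierarchical immediate neighborhood, optionally extended up and/or down."""
    ps = parents.get(focus, set())
    cs = children.get(focus, set())
    result = set(ps) | set(cs)
    if up:  # up-extended: add grandparents
        for p in ps:
            result |= parents.get(p, set())
    if down:  # down-extended: add grandchildren
        for c in cs:
            result |= children.get(c, set())
    result.discard(focus)  # guard against cyclic data
    return result


# Toy lookup tables; identifiers are placeholders, not UMLS content.
toy_parents = {"F": {"P"}, "P": {"G"}, "C": {"F"}, "GC": {"C"}}
toy_children = {"G": {"P"}, "P": {"F"}, "F": {"C"}, "C": {"GC"}}

up_extended = neighborhood("F", toy_parents, toy_children, up=True)
down_extended = neighborhood("F", toy_parents, toy_children, down=True)
print(sorted(up_extended), sorted(down_extended))
```

Separate switches let the display add only the direction the auditor asks for, in the spirit of giving the auditor what is needed and no more.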

Page 16: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Semantic Type Neighborhood

If we provide the semantic types for every concept, those also form a neighborhood.

It is important to keep the information of which semantic types are assigned to which concepts.
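
A semantic type neighborhood can be derived while keeping track of which concept carries which type. The sketch below assumes a hypothetical helper, `semantic_type_neighborhood`, and uses the focus concept and parent from the paper form as sample data.

```python
def semantic_type_neighborhood(concepts, sty_of):
    """Map each semantic type to the neighborhood concepts assigned to it."""
    by_type = {}
    for concept in concepts:
        for sty in sty_of.get(concept, ()):
            by_type.setdefault(sty, []).append(concept)
    return by_type


# Assignments as printed on the paper form (UMLS 2006AC).
sty_of = {
    "Antonospora locustae": ["Fungus", "Invertebrate"],  # focus concept
    "Antonospora": ["Invertebrate"],                      # its parent
}
sty_neighborhood = semantic_type_neighborhood(
    ["Antonospora locustae", "Antonospora"], sty_of)
print(sty_neighborhood)
```

Because the result maps types back to concepts, the type-to-concept assignment is not lost when the types are shown as a group.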

Page 17: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

The Idea of a Hybrid Display

Diagrams are wonderful – as long as they fit on one screen.

Indented text is wonderful – as long as there are no or very few multiple parents.

But the UMLS does not fit onto one screen and there are many cases of multiple parents.

Page 18: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

What makes a diagram wonderful?

You can follow parent/child paths with your eyes.

You can get a feeling for everything a concept is connected to with one look.

You can see multiple parents and multiple paths with one look.

You can see global features (short and bushy versus tall and sparse, or (gasp!) tall and bushy).

Page 19: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

What makes indented text wonderful?

Indentation expresses parenthood compactly and elegantly.

There are no lines crossing.

You don’t need a layout algorithm.

There is a linear order in which to study text.
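
A short sketch shows both the appeal of indentation and its weakness with multiple parents. The function name and toy hierarchy are made up for illustration. No layout algorithm is needed, but a concept with two parents is printed once under each of them.

```python
def render_indented(node, children, depth=0, path=()):
    """Return the hierarchy below `node` as a list of indented lines."""
    line = "  " * depth + node
    if node in path:  # stop if the data contains a hierarchical cycle
        return [line + " (cycle)"]
    lines = [line]
    for child in sorted(children.get(node, ())):
        lines.extend(render_indented(child, children, depth + 1, path + (node,)))
    return lines


# Toy hierarchy with placeholder names: D has two parents, B and C.
toy_hierarchy = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}}
indented_lines = render_indented("A", toy_hierarchy)
print("\n".join(indented_lines))
```

With many multi-parent concepts the repeated entries multiply, which is the limitation a hybrid display tries to avoid.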

Page 20: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

The Idea of a Hybrid Display (cont.)

Keep the best features of text and the best features of diagrams.

Maintain relative positions between the focus concept and its children, parents, etc.

Eliminate clutter of arrows.

Page 21: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

A Hybrid Diagram/Form Display of a Neighborhood

[Layout diagram with display areas: Children; Focus Concept; Synonyms; Relationships; Parents]

Page 22: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Desirable Information Beyond Neighborhoods

Concept definition for Focus Concept

Sources for concepts and relationships

Assigned Semantic Types of concepts

Definitions of relevant Semantic Types

Global view of the Semantic Network: indented (better for wide branches) or graphical (better for almost everything else)

Page 23: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Current State of the NAT: Serving the Auditor

The Neighborhood Auditing Tool has been implemented to fully support display of neighborhoods.

Navigation to adjacent neighboring concepts is an easy click.

Additional features listed before have been implemented.

Page 24: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Demonstration of NAT Features

Neighborhood

Grandparents and grandchildren

Synonyms

Relationships: Concept, Sibling, Term

Focus concept definition

Sources: Concepts, Relationships

Display CUIs

Semantic Type display

Semantic Type definition

Semantic Network (indented)

Semantic Network (diagram)

Navigation

Search (full, partial)

Viewing History

Choice of release

Choice of sources

offline version

Page 25: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Audit Example: A Cycle of Three Concepts

An SQL query found three concepts that participate in a PAR/CHD cycle.

We follow an auditor’s review of this cycle.

O. Bodenreider. Circular hierarchical relationships in the UMLS: etiology, diagnosis, treatment, complications and prevention. Proc AMIA Symp. 2001:57-61.

offline version

Page 26: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

The Cycle of Three Concepts

Mood Disorders

Affective Disorders, Psychotic

Bipolar Disorder

Relationship Sources: Medical Subject Headings; National Drug File Reference Terminology; SNOMED-2; Alcohol and Other Drug Thesaurus

Relationship Sources: Medical Subject Headings; National Drug File Reference Terminology

Relationship Source: DSM-IV

Relationship Sources: DSM-IV and many others

Page 27: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Recommended Modeling

Mood Disorders

Affective Disorders, Psychotic

Bipolar Disorder

Page 28: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Audit Example: Semantic Types

An algorithm determined that the concept Antonospora locustae was likely assigned incorrect semantic types.

We follow an auditor’s review of this concept.

offline version

Page 29: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Preliminary Evaluation Study with NAT

Compare paper-based auditing and NAT-based auditing.

Counterbalanced groups.

Recall improves with NAT use. Auditors seem willing to investigate more concepts.

Precision stays the same. Auditors’ mental process does not improve.

Page 30: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Conclusions

Preliminary study showed that people are more successful finding errors with NAT than with paper sources.

Recall improved with the NAT; precision did not.

NAT seems to nicely complement use of the UMLSKS.

Page 31: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Future Work

Integration of algorithms for developing “audit sets” with NAT.

Recording and reporting auditor recommendations.

Facilitate team auditing where several auditors review the same sample.

Managing and reporting work flow of auditor teams.

Page 32: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

The Neighborhood Auditing Tool is available online at:

http://nat.njit.edu

Page 34: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Preliminary Evaluation Study

Auditor  Errors              Recall              Precision           F
         with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT   with NAT  w/o NAT
1        57        45        0.97      0.82      0.53      0.51      0.86      0.63
2        22        20        0.43      0.35      0.55      0.55      0.48      0.43
3        39        34        0.64      0.58      0.46      0.53      0.54      0.55
4        56        44        0.55      0.54      0.30      0.34      0.39      0.42
Avg.     44        36        0.65      0.57      0.46      0.48      0.57      0.51
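
The slides do not say how the F column of the evaluation table was computed. The sketch below assumes the usual balanced F-measure, the harmonic mean of recall and precision, and applies it to the rounded per-auditor values from the table. Under that assumption it reproduces every per-auditor F entry except auditor 1 with NAT, where it gives about 0.69 rather than 0.86, so that entry may have been derived differently.

```python
def f_measure(recall, precision):
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) per auditor, copied from the evaluation table.
with_nat = {1: (0.97, 0.53), 2: (0.43, 0.55), 3: (0.64, 0.46), 4: (0.55, 0.30)}
without_nat = {1: (0.82, 0.51), 2: (0.35, 0.55), 3: (0.58, 0.53), 4: (0.54, 0.34)}

for auditor in sorted(with_nat):
    f_with = f_measure(*with_nat[auditor])
    f_without = f_measure(*without_nat[auditor])
    print(f"auditor {auditor}: F with NAT {f_with:.2f}, w/o NAT {f_without:.2f}")
```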

Page 35: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Improved Recall

The auditor finds it easy to search for more errors in the neighborhood of the suspicious concept.

With better recall and the same precision you still find more errors.

Page 36: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Semantic Types Example

The concept Antonospora locustae was selected for audit by an algorithm that found it was the only concept assigned to the intersection Fungus + Invertebrate in the UMLS 2007AA.
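
The selection step can be sketched as grouping concepts by their combination of semantic types and flagging combinations with very few members. This is not the authors' algorithm, only a minimal illustration of the idea. The function name, the `max_size` threshold, and every concept other than Antonospora locustae are placeholders.

```python
from collections import defaultdict


def small_intersections(sty_of, max_size=1):
    """Return semantic type combinations assigned to at most `max_size` concepts."""
    extent = defaultdict(list)
    for concept, stys in sty_of.items():
        if len(stys) > 1:  # only multi-typed concepts form an intersection
            extent[frozenset(stys)].append(concept)
    return {combo: members for combo, members in extent.items()
            if len(members) <= max_size}


# Only the first entry comes from the slides; the rest is placeholder data.
assignments = {
    "Antonospora locustae": {"Fungus", "Invertebrate"},
    "placeholder fungus": {"Fungus"},
    "placeholder chemical 1": {"Organic Chemical", "Pharmacologic Substance"},
    "placeholder chemical 2": {"Organic Chemical", "Pharmacologic Substance"},
}
suspicious = small_intersections(assignments)
print(suspicious)
```

A combination held by a single concept is not necessarily wrong, but it is a cheap signal for choosing which concepts an auditor reviews first.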

Page 51: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

NAT Features Demonstration

Page 52: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Neighborhood

Page 74: The Neighborhood Auditing Tool James Geller Michael Halper Yehoshua Perl C. Paul Morrey

Cycle Example

An SQL query provided us with a list of concepts in the Metathesaurus that participate in cycles of length three.

One of these cycles exists among the concepts Bipolar Disorder, Mood Disorders, and Affective Disorders, Psychotic.
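
The authors used an SQL query, which the slides do not show. As a stand-in, the sketch below finds cycles of length three in Python over (child, parent) pairs. The function name is hypothetical, and the edge directions in the sample are made up, since the slides do not state which concept is the parent in each pair.

```python
from collections import defaultdict


def cycles_of_length_three(child_parent_pairs):
    """Find triples a -> b -> c -> a, where x -> y means y is a parent of x."""
    parents = defaultdict(set)
    for child, parent in child_parent_pairs:
        parents[child].add(parent)
    found = set()
    for a in list(parents):
        for b in parents[a]:
            for c in parents.get(b, ()):
                if a in parents.get(c, ()) and len({a, b, c}) == 3:
                    cycle = (a, b, c)
                    start = cycle.index(min(cycle))
                    # Rotate so each cycle is reported once, smallest name first.
                    found.add(cycle[start:] + cycle[:start])
    return found


# Edge directions below are assumptions made for this sketch.
sample_pairs = [
    ("Bipolar Disorder", "Mood Disorders"),
    ("Mood Disorders", "Affective Disorders, Psychotic"),
    ("Affective Disorders, Psychotic", "Bipolar Disorder"),
    ("placeholder concept", "Mood Disorders"),
]
three_cycles = cycles_of_length_three(sample_pairs)
print(three_cycles)
```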
