
Determining the Syntactic Structure of Medical Terms in Clinical Notes

Bridget T. McInnes, Ted Pedersen, Serguei V. Pakhomov

[email protected]


Goal

The goal of this presentation is to present a simple but effective approach for identifying the syntactic structure of three-word terms.


Importance

Potentially improves the analysis of unrestricted medical text, for example:

Mapping of medical text to standardized terminologies

Unsupervised syntactic parsing


Syntactic Structure of Terms

Figure: the four possible structures of a three-word term w1 w2 w3 (blue = independence, green = dependence)

Monolithic

Non-branching: w1, w2 and w3 are independent

Left-branching: [[w1 w2] w3]

Right-branching: [w1 [w2 w3]]


Example

small bowel obstruction


Syntactic Structure of Example

small bowel obstruction

Figure: the same four structures applied to small bowel obstruction

Monolithic

Non-branching: small, bowel and obstruction are independent

Left-branching: [[small bowel] obstruction]

Right-branching: [small [bowel obstruction]]


Method used to determine the structure of a term

The Log Likelihood Ratio is based on the ratio between the observed probability of a term and the probability with which the term would be expected to occur:

$$\frac{\text{observed probability of the term}}{\text{expected probability of the term}}$$


Log Likelihood Ratio

The expected probability of a term is often based on the Non-branching (Independence) Model

$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$

The numerator is the observed probability; the denominator is the expected probability under the Non-branching (independence) model.


Extended Log Likelihood Ratio

The expected probability can also be calculated under two other hypotheses (models), giving three in total:

Non-branching: P(small) P(bowel) P(obstruction)

Left-branching: P(small bowel) P(obstruction)

Right-branching: P(small) P(bowel obstruction)


Three Log Likelihood Ratio Equations

Non-branching:

$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$

Left-branching:

$$\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$$

Right-branching:

$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$$
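The three ratios can be computed side by side. The sketch below is a minimal Python illustration, assuming relative-frequency probability estimates; the probability values are hypothetical and are not the Mayo Clinic figures behind the LL values reported later.

```python
# Hypothetical relative-frequency estimates, for illustration only
# (NOT the probabilities behind the slide's LL values).
P = {
    "small": 2e-3,
    "bowel": 1.5e-3,
    "obstruction": 1.2e-3,
    "small bowel": 1e-3,
    "bowel obstruction": 3.5e-4,
    "small bowel obstruction": 3e-4,
}

def expected_probability(model: str) -> float:
    """Expected P(small bowel obstruction) under one of the three hypotheses."""
    if model == "Non-branching":      # all three words independent
        return P["small"] * P["bowel"] * P["obstruction"]
    if model == "Left-branching":     # [[small bowel] obstruction]
        return P["small bowel"] * P["obstruction"]
    if model == "Right-branching":    # [small [bowel obstruction]]
        return P["small"] * P["bowel obstruction"]
    raise ValueError(f"unknown model: {model}")

observed = P["small bowel obstruction"]
ratios = {
    model: observed / expected_probability(model)
    for model in ("Non-branching", "Left-branching", "Right-branching")
}
for model, ratio in ratios.items():
    print(f"{model:16s} observed / expected = {ratio:,.1f}")
```

With these made-up values the Left-branching hypothesis predicts the observed probability most closely (smallest ratio). The slides make the final choice with the Log Likelihood Ratio over counts rather than with this single ratio.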


Expected Probability

The expected probability of the term differs under each model, and so does the Log Likelihood Ratio:

Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45

Left-branching: P(small bowel) P(obstruction), LL = 5,169.81

Right-branching: P(small) P(bowel obstruction), LL = 8,532.90


Model Fitting

The model with the lowest Log Likelihood Ratio best describes the underlying structure of the term; for small bowel obstruction this is the Left-branching model, [[small bowel] obstruction]:

Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45

Left-branching: P(small bowel) P(obstruction), LL = 5,169.81

Right-branching: P(small) P(bowel obstruction), LL = 8,532.90
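Model fitting then reduces to picking the minimum. A minimal Python sketch using the LL values reported on the slide; the pairing of values with models follows the order of the expected-probability formulas (Non-branching, then P(small bowel) P(obstruction), then P(small) P(bowel obstruction)).

```python
# LL values reported on the slide for "small bowel obstruction".
ll_values = {
    "Non-branching": 11_635.45,   # expected: P(small) P(bowel) P(obstruction)
    "Left-branching": 5_169.81,   # expected: P(small bowel) P(obstruction)
    "Right-branching": 8_532.90,  # expected: P(small) P(bowel obstruction)
}

# The best-fitting model is the one with the lowest Log Likelihood Ratio.
best_model = min(ll_values, key=ll_values.get)
print(best_model)  # Left-branching
```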


Recap

The Log Likelihood Ratio is calculated for each possible model: Non-branching, Right-branching and Left-branching

The probabilities for each model are obtained from a corpus

The term is assigned the structure whose model has the lowest Log Likelihood Ratio
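The probabilities each model needs can be estimated from n-gram counts. A minimal Python sketch, assuming relative-frequency estimates over a tiny hypothetical token list; the slides use over 10,000 Mayo Clinic notes, and the authors' own tooling may count differently. The helper names `ngram_counts` and `prob` are illustrative.

```python
from collections import Counter

def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Tiny hypothetical corpus; punctuation is kept as ordinary tokens for simplicity.
text = ("patient with small bowel obstruction . small bowel obstruction resolved . "
        "no bowel obstruction seen . small lesion noted")
tokens = text.split()

counts = {n: ngram_counts(tokens, n) for n in (1, 2, 3)}
totals = {n: sum(c.values()) for n, c in counts.items()}

def prob(phrase: str) -> float:
    """Relative frequency of a phrase among all n-grams of the same length."""
    words = tuple(phrase.split())
    return counts[len(words)][words] / totals[len(words)]

# Probabilities needed by the Non-, Left- and Right-branching models.
for phrase in ("small", "bowel", "obstruction", "small bowel",
               "bowel obstruction", "small bowel obstruction"):
    print(f"P({phrase}) = {prob(phrase):.3f}")
```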


Test Set

Contains 708 three-word terms from SNOMED-CT:

Monolithic: 73 terms

Non-branching: 6 terms

Right-branching: 378 terms

Left-branching: 251 terms


Test Set (cont)

The syntactic structure of each term was determined by the consensus of two medical text indexing experts (kappa = 0.704)

The probabilities were obtained from over 10,000 Mayo Clinic clinical notes


Monolithic Results

Bar chart: percentage agreement with human experts, by technique

Left branching: 35.5%

Right branching: 53.4%

Our Method: 74.8%


Results without Monolithic Terms

Bar chart: percentage agreement with human experts, by technique

Left branching: 39.5%

Right branching: 59.5%

Our Method: 83.5%
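The two baseline bars appear to correspond to assigning every term the same structure. Assuming that reading, and the pairing of counts with structures on the Test Set slide, the baseline percentages can be reproduced from the term counts alone (the Our Method figures cannot, since they depend on per-term predictions). A minimal Python check; `baseline` is an illustrative helper name:

```python
# Term counts from the Test Set slide.
counts = {
    "Monolithic": 73,
    "Non-branching": 6,
    "Right-branching": 378,
    "Left-branching": 251,
}
total = sum(counts.values())                 # all 708 terms
without_mono = total - counts["Monolithic"]  # 635 terms

def baseline(label: str, n_terms: int) -> float:
    """Percent agreement if every term is labelled `label` (assumed baseline)."""
    return round(100 * counts[label] / n_terms, 1)

for label in ("Left-branching", "Right-branching"):
    print(f"always {label}: {baseline(label, total)}% with monolithic terms, "
          f"{baseline(label, without_mono)}% without")
```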


Limitations

Monolithic structures could possibly be identified through collocation extraction or dictionary lookup

As the number of words in a term grows, so does the number of hypotheses (models) to be evaluated; possible remedies:

only consider adjacent models

limit the length of the terms to 5 or 6 words
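One way to see how quickly the hypotheses multiply: suppose an "adjacent model" is read as a split of the term into two or more groups of neighbouring words, each group treated as a unit and the groups treated as independent. This is an assumption, since the slide does not define the term. Under that reading a term of n words has 2^(n-1) - 1 such models, and for three words these are exactly the Non-branching, Left-branching and Right-branching hypotheses. A minimal Python sketch; `adjacent_models` is a hypothetical helper:

```python
from itertools import combinations

def adjacent_models(words: list[str]) -> list[list[str]]:
    """All splits of `words` into >= 2 groups of neighbouring words
    (hypothetical reading of "adjacent models"; nesting inside a group is ignored)."""
    n = len(words)
    models = []
    for k in range(1, n):                          # number of cut points
        for cuts in combinations(range(1, n), k):  # positions between words
            bounds = (0, *cuts, n)
            models.append([" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])])
    return models

for model in adjacent_models("small bowel obstruction".split()):
    print(" | ".join(model))

# Growth with term length: 2**(n - 1) - 1 models for n words.
for n in range(3, 7):
    print(n, "words:", len(adjacent_models([f"w{i}" for i in range(1, n + 1)])))
```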


Conclusions

We present a simple but effective method to identify the structure of three-word terms

The method uses the Log Likelihood Ratio

It could be extended to identify the structure of four-, five- and six-word terms


Future Work

Improve the accuracy of the method:

explore other measures of association (Chi-squared, Phi, Dice coefficient ...)

incorporate multiple measures together

Extend our method to four- and five-word terms (difficulty: finding a test set)
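As an illustration of swapping in another measure of association, the sketch below computes Pearson's chi-squared statistic alongside the Log Likelihood Ratio for the same observed and expected counts. This is a hypothetical substitution for exploration, not something the slides report; the counts and function names are made up, and the natural logarithm is assumed.

```python
import math

def log_likelihood(observed: list[float], expected: list[float]) -> float:
    """LL = 2 * sum(n * ln(n / m)); cells with n == 0 contribute 0."""
    return 2 * sum(n * math.log(n / m) for n, m in zip(observed, expected) if n > 0)

def pearson_chi_squared(observed: list[float], expected: list[float]) -> float:
    """X^2 = sum((n - m)^2 / m) over all cells."""
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))

# Hypothetical observed counts and model-expected counts for four cells.
observed = [10, 20, 30, 40]
expected = [25, 25, 25, 25]
print(f"LL  = {log_likelihood(observed, expected):.2f}")
print(f"X^2 = {pearson_chi_squared(observed, expected):.2f}")
```

For model fitting, either statistic would be computed once per model and the lowest value chosen; whether the two statistics rank the models the same way is the kind of question the proposed comparison would answer.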


Thank you

Software:

Ngram Statistics Package (NSP): www.d.umn.edu/~tpederse/nsp.html

Log Likelihood Ratio Models: www.cs.umn.edu/~bthomson/mti.html


Log Likelihood Equation

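$$LL = 2 \sum_{x,y,z} n_{xyz} \log\frac{n_{xyz}}{m_{xyz}}$$

where $n_{xyz}$ is the observed count and $m_{xyz}$ the expected count for cell $xyz$ of the three-way contingency table of w1, w2 and w3.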
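The equation translates directly into code. A minimal Python sketch, assuming the natural logarithm and the usual convention that cells with a zero observed count contribute nothing; the function name `log_likelihood` is illustrative.

```python
import math
from collections.abc import Mapping

def log_likelihood(observed: Mapping, expected: Mapping) -> float:
    """LL = 2 * sum over cells xyz of n_xyz * ln(n_xyz / m_xyz).

    `observed` and `expected` map the same cell keys (e.g. (x, y, z) tuples)
    to counts; cells with n_xyz == 0 are skipped, since 0 * log(0) is taken as 0.
    """
    return 2 * sum(n * math.log(n / expected[cell])
                   for cell, n in observed.items() if n > 0)

# Toy checks with hypothetical two-cell tables.
print(log_likelihood({"a": 5, "b": 5}, {"a": 5, "b": 5}))  # 0.0
print(log_likelihood({"a": 8, "b": 2}, {"a": 5, "b": 5}))
```

When the model predicts the observed counts exactly, every term is zero and LL is 0; the further the expected counts drift from the observed ones, the larger LL becomes.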


Expected Values

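The expected values $m_{xyz}$ in the Log Likelihood equation depend on the model; $n_{+++}$ is the total count, and a + marks a position that is summed over:

Non-branching: $$m_{xyz} = \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$$

Left-branching: $$m_{xyz} = \frac{n_{xy+}\, n_{++z}}{n_{+++}}$$

Right-branching: $$m_{xyz} = \frac{n_{x++}\, n_{+yz}}{n_{+++}}$$

Dividing the Non-branching product by $n_{+++}^{2}$ makes the expected counts sum to $n_{+++}$, as they do for the other two models.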
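Putting the pieces together: the sketch below builds the 2x2x2 table for a three-word term from n-gram counts, computes the expected values for each model (with the Non-branching denominator taken as $n_{+++}^2$ so that expected counts sum to $n_{+++}$), and picks the model with the lowest LL. It is a minimal Python illustration, not the authors' released software; the counts are hypothetical, and the helper names (`contingency_table`, `marginal`, `expected_counts`) are made up for this example. Index 1 means the word appears in that position, 0 means some other word does.

```python
import math
from itertools import product

Cell = tuple[int, int, int]

def contingency_table(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp) -> dict[Cell, float]:
    """Fill the 2x2x2 table from trigram counts ('p' stands for '+', i.e. any word)."""
    n = {
        (1, 1, 1): n111,
        (1, 1, 0): n11p - n111,
        (1, 0, 1): n1p1 - n111,
        (0, 1, 1): np11 - n111,
        (1, 0, 0): n1pp - n11p - n1p1 + n111,
        (0, 1, 0): np1p - n11p - np11 + n111,
        (0, 0, 1): npp1 - n1p1 - np11 + n111,
    }
    n[(0, 0, 0)] = nppp - sum(n.values())
    return n

def marginal(n, x=None, y=None, z=None) -> float:
    """Sum the cells matching the fixed indices; None plays the role of '+'."""
    return sum(v for (a, b, c), v in n.items()
               if x in (None, a) and y in (None, b) and z in (None, c))

def expected_counts(n, model: str) -> dict[Cell, float]:
    """m_xyz under each hypothesis."""
    total = marginal(n)
    m = {}
    for x, y, z in product((0, 1), repeat=3):
        if model == "Non-branching":
            m[x, y, z] = marginal(n, x=x) * marginal(n, y=y) * marginal(n, z=z) / total**2
        elif model == "Left-branching":
            m[x, y, z] = marginal(n, x=x, y=y) * marginal(n, z=z) / total
        elif model == "Right-branching":
            m[x, y, z] = marginal(n, x=x) * marginal(n, y=y, z=z) / total
        else:
            raise ValueError(f"unknown model: {model}")
    return m

def log_likelihood(n, m) -> float:
    """LL = 2 * sum n_xyz * ln(n_xyz / m_xyz), skipping empty cells."""
    return 2 * sum(v * math.log(v / m[cell]) for cell, v in n.items() if v > 0)

# Hypothetical counts for "small bowel obstruction" in a 100,000-trigram corpus.
table = contingency_table(n111=300, n11p=400, n1p1=310, np11=500,
                          n1pp=2000, np1p=1500, npp1=1200, nppp=100_000)
models = ("Non-branching", "Left-branching", "Right-branching")
ll = {model: log_likelihood(table, expected_counts(table, model)) for model in models}
for model, value in ll.items():
    print(f"{model:16s} LL = {value:,.2f}")
print("best fit:", min(ll, key=ll.get))
```

Because the Non-branching model is nested inside both branching models, its LL can never be lower than theirs, which matches the ordering of the values reported for small bowel obstruction.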