
CHRISTOPHER RHODES, UHD-REU

17 JULY 2009

Clinical Free Processing

Clinical Free Text

Primary data about patients, as opposed to journal articles

Problems posed to Natural Language Processing:

Documents need to be edited for confidentiality reasons, which takes time and money

These texts do not follow a strictly edited format, which means they can contain various sub-language characteristics (fragmented sentences, abbreviations, doctor-dependent notes, etc.)

ICD-9-CM

"The International Classification of Diseases, 9th Revision, Clinical Modification" (ICD-9-CM)

ICD-9-CM Examples

Asthma 493.xx

Diabetes 250.xx

Hyperlipidemia 272.0 – 272.4

Arthritis 714.0 – 715.9

Hypertension 401.1 – 401.9

Ischemic heart disease 410.0 – 414.9x

Depression/dysthymia 296.2, 296.3, 296.82, 296.9, 300.4, 309.0, 309.1, 311

“x” refers to any possible number in that subset
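The code patterns above can be checked mechanically. The following is a minimal sketch, assuming that each "x" stands for exactly one digit and that a range such as 272.0 – 272.4 is inclusive at both ends; the function names `matches_pattern` and `in_range` are hypothetical and are not taken from the project.

```python
import re
from decimal import Decimal


def matches_pattern(code: str, pattern: str) -> bool:
    """Return True if `code` fits a pattern such as '493.xx'.

    Assumption: every 'x' in the pattern stands for exactly one digit.
    """
    regex = "".join(r"\d" if ch == "x" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, code) is not None


def in_range(code: str, low: str, high: str) -> bool:
    """Return True if `code` lies in the inclusive numeric range low..high."""
    return Decimal(low) <= Decimal(code) <= Decimal(high)


if __name__ == "__main__":
    print(matches_pattern("493.90", "493.xx"))   # asthma pattern
    print(in_range("272.2", "272.0", "272.4"))   # hyperlipidemia range
```

Comparing codes as decimals rather than as strings avoids ordering surprises when codes have different numbers of digits.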

The Research

Automated System for assigning ICD-9-CM codes using Natural Language Processing

The types of files: radiology reports

Contain the majority ICD-9-CM code

Contain Clinical History and Impressions

Set of Training and Testing Data

The Research

Testing File -> Parsed Testing File (miniPar)

Where We Are

Beginning state:

Complete program ran with 50.9% accuracy

Merged the training files by specific code

Possible advances that didn’t work:

Changing from the total summed score over all training sentences in a training document to the highest individual sentence score among them (i.e., ideally the best sentence match)

Manipulating the way the score/weight is calculated for the above method

Current state:

Reverted to the original program’s algorithm for finding the score

Normalized the merged training files (score / total sentences)

Complete program now runs with 60.5% accuracy
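The three scoring variants mentioned on this slide (summed score, best single sentence, and summed score normalized by sentence count) can be compared in a small sketch. The slides do not say how two sentences are scored against each other, so the shared-word count in `sentence_score` is purely an assumption, as are all function names and the toy training data.

```python
def sentence_score(test_sentence, training_sentence):
    """Hypothetical similarity: number of distinct words the two sentences share."""
    test_words = set(test_sentence.lower().split())
    train_words = set(training_sentence.lower().split())
    return len(test_words & train_words)


def summed_score(test_sentence, training_sentences):
    """Total score over every sentence in a merged training file."""
    return sum(sentence_score(test_sentence, s) for s in training_sentences)


def best_sentence_score(test_sentence, training_sentences):
    """Highest individual sentence score (the variant that did not help)."""
    return max((sentence_score(test_sentence, s) for s in training_sentences), default=0)


def normalized_score(test_sentence, training_sentences):
    """Summed score divided by the number of training sentences."""
    if not training_sentences:
        return 0.0
    return summed_score(test_sentence, training_sentences) / len(training_sentences)


def predict_code(test_sentence, merged_training, scorer=normalized_score):
    """Pick the code whose merged training file scores highest."""
    return max(merged_training, key=lambda code: scorer(test_sentence, merged_training[code]))


# Toy data, invented for illustration only.
merged_training = {
    "486": ["pneumonia in the left lower lobe", "cough and fever"],
    "786.2": ["cough", "persistent cough", "chronic cough for two weeks", "cough"],
}

if __name__ == "__main__":
    query = "cough and fever"
    print(predict_code(query, merged_training, scorer=summed_score))
    print(predict_code(query, merged_training, scorer=normalized_score))
```

In this toy case the larger merged file wins under the plain sum simply because it has more sentences, while the normalized score prefers the smaller file with the closer match. That is one plausible reason normalization might help, not a confirmed explanation of the accuracy change reported here.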

Future Hopes

Word importance:

Medical words that are matched between sentences should receive higher priority (score) than non-medical words

We will use the UMLS medical database for active word searching and comparing
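A minimal sketch of the word-importance idea follows. The small `MEDICAL_TERMS` set is a hypothetical stand-in for a UMLS lookup, and the weights 2.0 and 1.0 are arbitrary assumptions; the slides do not specify how UMLS will be queried or how much extra weight medical words should get.

```python
# Hypothetical stand-in for a UMLS lookup; a real system would query the database.
MEDICAL_TERMS = frozenset({"pneumonia", "cough", "fever", "asthma"})


def weighted_score(test_sentence, training_sentence,
                   medical_weight=2.0, default_weight=1.0):
    """Score shared words, counting medical words more heavily than other words."""
    shared = set(test_sentence.lower().split()) & set(training_sentence.lower().split())
    return sum(medical_weight if word in MEDICAL_TERMS else default_weight
               for word in shared)


if __name__ == "__main__":
    # Shared words are "of" (weight 1.0) and "cough" (weight 2.0).
    print(weighted_score("history of cough", "cough of two weeks"))
```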

Negation:

Search for words like “no”, “hardly”, “none”, “doesn’t”, and “not”, and handle them correctly when matching against training documents

For instance, “No pneumonia” should not match a training document with the code for pneumonia, but it will if negation is not taken into consideration
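The negation idea can be sketched as follows, using the cue words listed on this slide. The rule that a cue negates the next few tokens (`scope=3`) is an assumption made for illustration, and the function names are hypothetical; the slides do not describe how negation will actually be handled.

```python
NEGATION_WORDS = frozenset({"no", "hardly", "none", "doesn't", "not"})


def tokenize(sentence):
    """Lowercase, normalize curly apostrophes, and strip simple punctuation."""
    raw = sentence.lower().replace("\u2019", "'").split()
    tokens = [tok.strip(".,;:") for tok in raw]
    return [tok for tok in tokens if tok]


def negated_terms(sentence, scope=3):
    """Return the words that fall within `scope` tokens after a negation word."""
    negated = set()
    remaining = 0
    for tok in tokenize(sentence):
        if tok in NEGATION_WORDS:
            remaining = scope      # open a new negation window
        elif remaining > 0:
            negated.add(tok)
            remaining -= 1
    return negated


def matches_term(sentence, term):
    """True only if `term` occurs in the sentence and is not negated."""
    return term in tokenize(sentence) and term not in negated_terms(sentence)


if __name__ == "__main__":
    print(matches_term("No pneumonia.", "pneumonia"))
    print(matches_term("Pneumonia in the left lower lobe", "pneumonia"))
```

With a check like this, "No pneumonia" no longer counts as a match for a pneumonia training document, while an affirmative mention still does.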

Goal

Our goal in adding the previous two features to our program is to reach about 80% accuracy or more.

The top accuracy reported for this type of program, 89.08%, was submitted by Szeged.

References

John P. Pestian, Christopher Brew, Paweł Matykiewicz, DJ Hovermale, Neil Johnson, K. Bretonnel Cohen, and Włodzisław Duch. A Shared Task Involving Multi-label Classification of Clinical Free Text.