Clinical Free Processing
Christopher Rhodes, UHD-REU
17 July 2009
Clinical Free Text
Primary data about patients, as opposed to journal articles
Problems posed to Natural Language Processing:
Documents need to be edited for confidentiality reasons, which takes time and money.
These texts do not follow a strictly edited format, so they can contain various sub-language characteristics (fragmented sentences, abbreviations, doctor-dependent notes, etc.).
ICD-9-CM
"The International Classification of Diseases, 9th Revision, Clinical Modification" (ICD-9-CM)
ICD-9-CM Examples
Asthma 493.xx
Diabetes 250.xx
Hyperlipidemia 272.0 - 272.4
Arthritis 714.0 – 715.9
Hypertension 401.1 – 401.9
Ischemic heart disease 410.0 – 414.9x
Depression/dysthymia 296.2, 296.3, 296.82, 296.9, 300.4, 309.0, 309.1, 311
“x” stands for any digit in that position
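The ranges above can be checked mechanically. The following is a minimal Python sketch, not part of the original system: the names `CONDITION_RANGES` and `conditions_for` are hypothetical, and it assumes that codes can be compared as decimal numbers and that "xx" (or a trailing "x") means any sub-code under the listed prefix. Real ICD-9-CM codes are hierarchical labels rather than numbers, so this is only an approximation.

```python
from decimal import Decimal

# Ranges copied from the slide as inclusive (low, high) pairs.
# Assumption: "493.xx" is read as 493 through 493.99, and "414.9x" as 414.99.
# Assumption: the depression/dysthymia codes are matched exactly as listed.
CONDITION_RANGES = {
    "Asthma": [("493", "493.99")],
    "Diabetes": [("250", "250.99")],
    "Hyperlipidemia": [("272.0", "272.4")],
    "Arthritis": [("714.0", "715.9")],
    "Hypertension": [("401.1", "401.9")],
    "Ischemic heart disease": [("410.0", "414.99")],
    "Depression/dysthymia": [
        (code, code)
        for code in ("296.2", "296.3", "296.82", "296.9",
                     "300.4", "309.0", "309.1", "311")
    ],
}


def conditions_for(code):
    """Return the names of all conditions whose range contains the code."""
    value = Decimal(code)
    return [
        name
        for name, ranges in CONDITION_RANGES.items()
        if any(Decimal(low) <= value <= Decimal(high) for low, high in ranges)
    ]
```

For example, `conditions_for("493.20")` falls inside the asthma range, while a code outside every listed range returns an empty list.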
The Research
Automated system for assigning ICD-9-CM codes using Natural Language Processing
The types of files: radiology reports
Contain the majority ICD-9-CM code
Contain Clinical History and Impressions
Set of Training and Testing Data
The Research
Testing File -> Parsed Testing File (miniPar)
Where We Are
Beginning state:
The complete program ran with 50.9% accuracy.
Training files were merged by specific code.
Possible advances that didn’t work:
Changing from the total summed score over all training sentences in a training document to the highest individual sentence score among those sentences (i.e., ideally the best sentence match).
Manipulating the way the score/weight is calculated for the above method.
Current state:
Reverted to the original program’s algorithm for finding the score.
Normalized the merged training files (score / total sentences).
The complete program now runs with 60.5% accuracy.
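The three scoring variants mentioned here (summed score, best single sentence, and score normalized by sentence count) can be sketched as follows. This is an illustration under assumptions, not the program itself: the slides do not say how two sentences are scored against each other, so a simple shared-word count stands in for it, and every function name and the toy training data are hypothetical.

```python
def tokenize(sentence):
    """Lowercase a sentence and split it into a set of words."""
    return set(sentence.lower().split())


def sentence_score(test_sentence, train_sentence):
    # Assumed similarity measure: number of words the two sentences share.
    return len(tokenize(test_sentence) & tokenize(train_sentence))


def summed_score(test_sentence, train_sentences):
    """Total score over all training sentences for one code."""
    return sum(sentence_score(test_sentence, s) for s in train_sentences)


def best_sentence_score(test_sentence, train_sentences):
    """Highest individual sentence score (the variant that did not help)."""
    return max(
        (sentence_score(test_sentence, s) for s in train_sentences),
        default=0,
    )


def normalized_score(test_sentence, train_sentences):
    """Summed score divided by the number of training sentences."""
    if not train_sentences:
        return 0.0
    return summed_score(test_sentence, train_sentences) / len(train_sentences)


def predict(test_sentence, merged_training, scorer=normalized_score):
    """Pick the code whose merged training sentences score highest."""
    return max(
        merged_training,
        key=lambda code: scorer(test_sentence, merged_training[code]),
    )


# Toy data, invented for illustration only.
merged_training = {
    "786.2": ["cough", "cough and fever", "cough with fever", "fever after cough"],
    "780.6": ["fever", "high fever"],
}
```

With this toy data, `predict("fever", merged_training, scorer=summed_score)` favours the code with more training sentences, while the default normalized scorer does not. That illustrates one plausible reason normalization might help when merged training files differ in size; the slides do not state why accuracy improved.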
Future Hopes
Word importance:
Medical words should receive higher priority (score) than non-medical words that are matched between sentences.
We will use the UMLS medical database for active word searching and comparing.
Negation:
Search for words like “no”, “hardly”, “none”, “doesn’t”, and “not”, and deal accurately with the affected training documents.
For instance, “No pneumonia” should not match a training document with the code for pneumonia, but it will if negation is not taken into consideration.
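Both planned features can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: `MEDICAL_TERMS` is a tiny hard-coded stand-in for a real UMLS lookup, `MEDICAL_WEIGHT` is an arbitrary value, and negation is handled with a naive rule that only the word immediately after a negation word is negated.

```python
NEGATION_WORDS = {"no", "hardly", "none", "doesn't", "not"}

# Hypothetical stand-in for a UMLS lookup; a real system would query UMLS.
MEDICAL_TERMS = {"pneumonia", "cough", "fever"}

# Assumed weight: a matched medical word counts twice as much as other words.
MEDICAL_WEIGHT = 2.0


def tokenize(sentence):
    """Lowercase, normalize curly apostrophes, and strip edge punctuation."""
    cleaned = sentence.lower().replace("\u2019", "'")
    words = (w.strip(".,;:!?") for w in cleaned.split())
    return [w for w in words if w]


def negated_terms(tokens):
    """Naive rule: the word right after a negation word is negated."""
    return {
        tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i] in NEGATION_WORDS
    }


def weighted_score(test_sentence, train_sentence):
    """Score shared words, weighting medical terms and respecting negation."""
    test_tokens = tokenize(test_sentence)
    train_tokens = tokenize(train_sentence)
    negated_in_test = negated_terms(test_tokens)
    negated_in_train = negated_terms(train_tokens)

    score = 0.0
    for word in set(test_tokens) & set(train_tokens):
        if word in NEGATION_WORDS:
            continue  # negation words themselves carry no score
        if (word in negated_in_test) != (word in negated_in_train):
            continue  # negated in only one sentence: not a real match
        score += MEDICAL_WEIGHT if word in MEDICAL_TERMS else 1.0
    return score
```

Under these assumptions, `weighted_score("No pneumonia", "pneumonia in left lobe")` is zero because the shared word is negated on one side only, whereas two sentences that both assert pneumonia score the match at the medical weight. A one-word negation window is a deliberate simplification; phrases such as "no evidence of pneumonia" would need a wider scope.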
Goal
Our goal in adding the previous two features to our program is to reach about 80% accuracy or more.
The top result for this type of program was submitted by Szeged, at 89.08% accuracy.
References
John P. Pestian, Christopher Brew, Paweł Matykiewicz, DJ Hovermale, Neil Johnson, K. Bretonnel Cohen, Włodzisław Duch. A Shared Task Involving Multi-label Classification of Clinical Free Text.