My 9-year-old Asked me What I Could Possibly Teach Researchers at IBM about NLP: Requirements for Useful Natural Language Understanding of Clinical Text

Upload: eleanor-mathews

Post on 18-Jan-2018

DESCRIPTION

Biomedical Language Understanding

TRANSCRIPT

My 9-year-old Asked me What I Could Possibly Teach Researchers at IBM about NLP: Requirements for Useful Natural Language Understanding of Clinical Text
  Wendy W. Chapman, PhD
  Division of Biomedical Informatics
  Biomedical Language Understanding Lab

Background
  BA Linguistics
  Chinese Literature
  PhD Medical Informatics
  Post-doc BMI
  Faculty DBMI
  U of Utah, Wisconsin, U of Utah, U of Pittsburgh, UCSD
  Faculty DBMI, 2010
  Biomedical Language Understanding

IBM's Watson moves on to medicine
  "Fresh off its butt-kicking performance on Jeopardy!, IBM's supercomputer 'Watson' has enrolled in medical school at Columbia University," New York Daily News, February 18th 2011
  "IBM's computer could very well herald a whole new era in medicine." ComputerWorld, February 17, 2011
  Dr. Watson??

Clinical NLP Since 1960s
  Why has clinical NLP had little impact on clinical care?

Overview
  Challenges and trends in clinical NLP
  Requirements for successful and useful clinical NLP
  What issues are important in clinical NLP?
  Two clinical NLP applications

Clinical Environments Challenging
  Organizational factors
    Complex regulations/rules to follow
    Many departments/groups to involve
    Conservative culture
  Users
    Busy and demanding clinicians
    Diverse backgrounds and needs
    Most do not like frequent or significant changes
  Workflows
    Complex
    Vary by users/departments/specialties
  Courtesy Yang Huang, Kaiser Permanente

Sharing Difficult
  Sharing clinical data is difficult politically
  Have not had shared datasets for development and evaluation
  NLP applications are silos and black boxes
    No open source clinical NLP applications
    No open source modules trained on clinical reports
    Methods trained on general English lose accuracy on clinical text

Recent Trends
  Shared tasks
    i2b2
    Cincinnati Challenge
    TREC medical records
  Some available data
    University of Pittsburgh NLP Repository
    i2b2 reports
    MIMIC II dataset
  Open source tools
    HITEx
    cTAKES
    POS tagger
    Chunker
    Deep parser
  Generating layered grammatical and semantic annotations for 1 million words
    AMIA NLP Working Group annotation grant
    SHARP proposal

Common Models for Interoperability
  Map of NLP annotation types from many clinical systems (Guy Divita)
  Common information models
    UIMA common type system
    Common annotation model for shared input/output
  Mapping NLP schemas to clinical models
    Clinical event models (IHC)

Overview
  Challenges in clinical NLP
  Requirements for successful and useful clinical NLP
  What issues are important in clinical NLP?
  Two clinical NLP applications

Jeremiah's Question
  Posed this question to the NLP WG discussion list
  90 responses
  What could you possibly teach IBM researchers about NLP?

Successful Medical AI System
  Can explain reasoning
  Has the ability to deal with uncertainty
  Can model with missing data
  Integrates linguistic and domain knowledge
  Can answer questions about facts, degree, quantity, location, size, color, similarity
  Performs temporal reasoning
  Can explain disease process by relating higher-level inductions with first principles
    Physiology, anatomy, pathophysiology
  Models plans integrating general guidelines with individual facts

Useful Clinical Application
  Performing as well as a physician and improving patient care are not good enough motivators for use
  Fits within workflow
    Does not cost any time
    Saves time/money
  Very few false positives or repeated mistakes
    Alert fatigue
  Accounts for the fact that mistakes could cause harm
    Assists human rather than replaces human

Overview
  Challenges in clinical NLP
  Requirements for successful and useful clinical NLP
  What issues are important in clinical NLP?
  Two clinical NLP applications

Why is Clinical NLP Difficult?
  Named entity recognition
    Linguistic variation
    Polysemy
    Finding validation
    Implication
  Contextual attribute assignment
    Negation
    Uncertainty
    Temporality
  Discourse processing
    Report structure
    Coreference

Linguistic Variation: Different Words with the Same Meaning
  Derivation
    mediastinal = mediastinum
  Inflection
    opacity = opacities; cough = coughed
  Synonymy
    Addison's disease: Addison melanoderma, adrenal insufficiency, adrenocortical insufficiency, asthenia pigmentosa, bronzed disease, melasma addisonii
    Chest wall tenderness: "chest wall did demonstrate some slight tenderness when the patient had pressure applied to the right side of the thoracic cage"

Polysemy: One Word With Multiple Meanings
  General polysemy
    "Patient was prescribed codeine upon discharge"
    "The discharge was yellow and purulent"
  Acronyms and abbreviations
    APC: activated protein c, adenomatosis polyposis coli, adenomatous polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, alfalfa protein concentrated, allophycocyanin, anaphase promoting complex, anoxic preconditioning, anterior piriform cortex, antibody producing cells, atrial premature complex

Negation
  Approximately half of all clinical concepts in dictated reports are negated*
  Explicit negation
    "The mediastinum is not widened" -> Mediastinal widening: absent
  Implied absence without negation
    "Lungs are clear upon auscultation" -> Rales/crackles: absent; Rhonchi: absent; Wheezing: absent
  *Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp. 2001:105-9.

Uncertainty
  Unsure
    "treated for a presumptive sinusitis"
  Reasoning
    "It was felt that the patient probably had a cerebrovascular accident involving the left side of the brain. Other differentials entertained were perhaps seizure and the patient being post-ictal when he was found, although this consideration is less likely"
  Reason for exam
    "R/O out pneumonia."
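
The explicit-negation case from the Negation slide can be illustrated with a trigger-and-window check. This is only a minimal sketch in the spirit of trigger-based approaches such as NegEx, not the NegEx or ConText implementation: the trigger list, the termination terms, the five-token window, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately tiny lexicons; real trigger lists are far larger.
PRE_NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
TERMINATION_TERMS = {"but", "however", "although"}
WINDOW = 5  # assumed scope: number of tokens after a trigger


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, concept):
    """Return True if `concept` appears within WINDOW tokens after a trigger.

    The scope of a trigger ends early at a termination term.
    """
    tokens = tokenize(sentence)
    concept_tokens = tokenize(concept)
    if not concept_tokens:
        return False
    n = len(concept_tokens)
    for i, token in enumerate(tokens):
        if token not in PRE_NEGATION_TRIGGERS:
            continue
        # Collect the tokens that fall inside this trigger's scope.
        scope = []
        for following in tokens[i + 1 : i + 1 + WINDOW]:
            if following in TERMINATION_TERMS:
                break
            scope.append(following)
        # Look for the concept as a contiguous token sequence in the scope.
        for j in range(len(scope) - n + 1):
            if scope[j : j + n] == concept_tokens:
                return True
    return False
```

With these assumed lexicons, `is_negated("The mediastinum is not widened", "widened")` returns `True`, while `is_negated("Lungs are clear upon auscultation", "wheezing")` returns `False`. The second case is the slide's point about implied absence: a trigger-based check cannot see it, so it needs separate inference.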

Temporality
  Clinical reports tell a story
  Past medical history
    "History of CHF presenting with shortness of left-sided chest pain."
  Conditional or generic mentions
    "He should return for fever or increased shortness of breath."
  Temporal course of disease
    "Patient presents with chest pain"
    "After administration of nitroglycerin, the chest pain resolved."

Finding Validation
  Mention of a finding in the text does not guarantee the patient has the finding
    "She received her influenza vaccine"
    "His temperature was taken in the ED"
  Some findings require values
    Fever
      Temperature 38.5C
    Oxygen desaturation
      Oxygen saturation low
      Oxygen saturation 85% on room air

Implication
  Audience for patient reports is physicians
  Lay people are less accurate at determining whether a chest x-ray report shows evidence of pneumonia
  Pneumonia not mentioned in 2/3 of positive reports
  Sentence level inference
    "There were hazy opacities in the lower lobes" -> Localized infiltrate
  Report level inference
    Localized infiltrates -> Probable pneumonia

Report Structure
  Anatomic location sometimes in section header
    "NECK: no adenopathy."
  Some sections carry more weight
    "IMPRESSION: atelectasis"
  Some reports contain pasted text that is difficult to process
    Cardiovascular: [ ] Angina [ ] MI [x ] HTN [ ] CHF [ ] PVD [ ] DVT [ ] Arrhythmias [ ] Previous PTCA [ ] Previous Cardiac Surgery [ ] Negative - Denies CV problems

Coreference
  "Chest x-ray again shows a well-circumscribed nodule located in the left upper lobe. The tumor has increased in size since the last exam with a diameter of approximately 2 cm."
  How big is the nodule?
  Has the nodule increased in size?
  Where is the tumor?

Overview
  Challenges in clinical NLP
  Requirements for successful and useful clinical NLP
  What issues are important in clinical NLP?
  Two clinical NLP applications

Two Applications from the BLU Lab
  Topaz
  Onyx

Topaz
  ED Reports
  NLP Modules
  Locate instances of 55 conditions
  Assign values to contextual features
  Syndrome Classifier
  Determine values for 55 conditions

Topaz
  ED Reports
  NLP Modules
  Locate instances of 55 core concepts
  Assign values to contextual features
  Determine values for 55 concepts
  Core Concept: acute, chronic, absent

Step 1: Locate core concepts in text
  "The patient is a **AGE[in 40s]-year-old black female with a past history of pneumonia. She presents with a chief complaint of right-sided chest tightness that is pleuritic in nature. She has had a productive cough for the last 4 days. Her husband has been recently treated for bronchitis. She denies shortness of breath and fever."
  IndexFinder
  Map to Core Concepts
  ED Reports (Emergency Department Reports)
  Tokenizers: words, sentences, sections, trigger terms
  UMLS Concepts
  four rules

Map CUIs to Concepts
  Direct: cough (CUI) -> Cough
  Compound: short (CUI) + breath (CUI) -> Dyspnea
  Measurement: Temp + 40C -> Fever
  Section: Abdomen section + distended -> Abdominal Distention

  Negation: Is the condition negated? Negated / Affirmed
  Patient Experience: Did the patient experience the condition? Yes / No
  Temporality: When did the condition occur? Historical / Recent
  Generic/Conditional: Was the condition mentioned in a generic or conditional sense? Generic / Not

Step 2 - Assign Contextual Features
  ConText (NegEx extension)
  Dyspnea Acute Pneumonia Fever Pleuritic Pain Asthma CHF...
  Dyspnea: Negation: Affirmed; Patient Exp: Yes; Temporality: Recent; Certainty: High
  Integrator
  "Short of breath for 4 days." "Does not appear dyspneic."
  Dyspnea: Negation: Negated; Patient Exp: Yes; Temporality: Recent; Certainty: High

Step 3 - Integrate Information

Evaluation
  Test set
    30 ED reports
  Reference standard
    Physician annotation of 55 ALRS conditions in reports
    651 ALRS conditions
  Locate instances of 55 conditions
  Assign values to contextual features
  Determine values for 55 conditions

Results: Assign Values to 55 Conditions
  Weighted kappa, Topaz vs. physician: 0.85
  Topaz shows high agreement with a physician

Onyx
  At (translucency, numberEight) & surfaceOf (numberEight, mesial) & stateOf (translucency, possible)
  Semantic Models
  Syntactic Analyzer
  Context-free Grammar
  Training Corpus
  Semantic Analyzer
  "Eight mesial might have a slight translucency"

Semantic Models
  Concept Models - Model single concepts
  Semantic Network - Model relationships

Concept Models
  "eight mesio might have a slight translucency"
  Use words to infer concepts
  Use syntax to place words in semantically-typed leaf nodes
  Bayesian classifiers
  Caries (Dental Condition)
  Tooth 8 (Tooth #)
  Mesial (Surface)
  Possible (State)
  SuperficialCaries (SeverityCondition)

Concept Models as Templates
  Dental Condition*: Caries
  Condition Concept*: caries
  Condition Term: translucency
  Severity Concept*: Superficial
  Severity Term: slight
  Dental Condition
  SuperficialCaries (SeverityCondition)
  slight translucency
  Caries (Dental Condition)

Model Relationships among Concepts/Templates
  Semantic Network
    at (Cond: translucency, AL: numberEight)
    surfaceOf (AL: numberEight, S: mesial)
    stateOf (Cond: translucency, St: possible)
    at (Condition, Anatomic location)
    surfaceOf (AnatomicLocation, Surface)
    stateOf (Dental Condition, State)

Creating Training Cases for Concepts and Relationships
  Templates
  Semantic Model

Onyx's annotations allow detailed question answering
  Show me all of the reports indicating pneumonia without atelectasis
  Are there any patients with consolidation in both lungs?
  Show me all cases of pneumonia > 2 weeks old
  Show me reports that describe findings consistent with pneumonia in the right lung

Onyx: Dental Exams
  Speech -> NLP -> Chart
  "Number one is missing. Two is fine. Caries on Tooth 3."
  Titus Schleyer, Lee Christensen, Peter Haug, Jeannie Irwin, Henk

Conclusion
  Many challenges in clinical NLP
    Inherent in developing clinical AI systems
    Caused by peculiarities of clinical language
  Growing community
    More sharing, more progress
    Foundation
    Applications
    Translation
  Need to develop useful tools

Collaborators
  Lee Christensen
  Peter Haug
  Sascha Dublin
  Eric Baldwin
  Henk Harkema
  Danielle Mowery
  John Dowling
  Brian Chapman
  Mike Conway

Thank you!

Review of Clinical Records
  Contextual Feature Assignment
  1,620 annotations

  Feature             Recall   Precision
  Negation (773)      97%      97%
  Historical (98)     67%      74%
  Hypothetical (40)   83%      94%
  Experiencer (8)     50%      100%
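
For readers less familiar with the two measures in the table above, here is a minimal sketch of how recall and precision are computed for one contextual feature. The function name and the annotation identifiers are hypothetical. The example also assumes that the number in parentheses is the count of reference-standard annotations for that feature, which the slide does not state.

```python
def precision_recall(reference, system):
    """Compare two sets of annotation identifiers for one feature value.

    reference: annotations the reference standard marks with the feature
    system:    annotations the NLP system marks with the feature
    Returns (precision, recall); each is 0.0 when its denominator is empty.
    """
    true_positives = len(reference & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical illustration: 8 reference annotations, of which the system
# finds 4 and proposes nothing else.
reference_ids = set(range(8))
system_ids = set(range(4))
precision, recall = precision_recall(reference_ids, system_ids)
```

Under those assumed counts, the call gives a precision of 1.0 and a recall of 0.5, which matches the shape of the Experiencer row. With so few annotations, each one shifts recall by more than ten percentage points, so small rows like this should be read cautiously.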