B I O M E D I C A L I N F O R M A T I C S
DOWNLOAD EASYCIE AND DATASET
https://goo.gl/evdf1c
https://goo.gl/xjr7xe
EASYCIE: A DEVELOPMENT PLATFORM TO SUPPORT
QUICK AND EASY, RULE-BASED
CLINICAL INFORMATION EXTRACTION
JIANLIN SHI, MS MD
DANIELLE MOWERY MS PHD
FIFTH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS
2017 TUTORIAL
WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?
Clinical texts → Natural language processing → Structured data (machine interpretable)
✓ Classify
✓ Extract
✓ Summarize
WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?
✓ Classify – patients for stroke
✓ Extract
✓ Summarize
Mowery D, Hill B, Chapman W, Cannon-Albright Lisa, Majersik J. Development of a knowledge base to
support the automatic classification of a computable ischemic stroke phenotype from electronic
medical records. Neurology: Genetics; 2017. PubMed PMID: 28428978; PubMed Central PMCID:
PMC5390740
WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?
Garvin JH, DuVall SL, South BR, et al. Automated extraction of ejection fraction for quality
measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. Journal of the American Medical Informatics Association : JAMIA. 2012;19(5):859-866.
✓ Classify
✓ Extract – identify ejection fractions
✓ Summarize
Example report text:
“The left ventricular cavity size and wall thickness appear normal. The wall motion and left ventricular systolic function appears hyperdynamic with estimated ejection fraction of 70%. There is near-cavity obliteration seen.”
Extracted: ejection fraction = 70 percent
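A minimal sketch of this kind of extraction, assuming a single hand-written regular expression. The pattern and the function name are illustrative only; they are not the patterns used by Garvin et al. and they will miss ranges such as "55-60%".

```python
import re

# Hypothetical pattern: "ejection fraction" or "EF", a short gap, then a percentage.
EF_PATTERN = re.compile(
    r"\b(?:ejection\s+fraction|EF)\b[^.%\d]{0,30}?(\d{1,2}(?:\.\d+)?)\s*(?:%|percent)",
    re.IGNORECASE,
)

def extract_ejection_fractions(text):
    """Return every ejection fraction value (percent, as float) found in text."""
    return [float(m.group(1)) for m in EF_PATTERN.finditer(text)]

report = (
    "The wall motion and left ventricular systolic function appears "
    "hyperdynamic with estimated ejection fraction of 70%. "
    "There is near-cavity obliteration seen."
)
for value in extract_ejection_fractions(report):
    print(f"ejection fraction = {value:g} percent")
```

The gap between the term and the number is capped at 30 characters and may not cross a period, so a percentage in a later sentence is not picked up by mistake.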
WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?
Mowery DL, Jordan P, Wiebe J, Harkema H, Dowling J, Chapman WW. Semantic Annotation of
Clinical Events for Generating a Problem List. AMIA Annual Symposium Proceedings. 2013;2013:1032-1041.
✓ Classify
✓ Extract
✓ Summarize – create an active problem list
“Diagnosis: myocardial infarction (MI)”
WHY IS NLP SO DIFFICULT?
• Synonyms
– coughs → cough
– burning up → fever
– short of breath → dyspnea
• Abbreviations/Acronyms
– feb → febrile or February?
– n/v → nausea/vomiting
– sob → shortness of breath
• Truncations
– poss → possible
• Concatenations
– blurredvision → blurred vision
– flus sxs → flu symptoms
• Misspellings & typographic errors
– nausa → nausea
– diahrea → diarrhea
• Quantifications
– BP 140/90 → hypertension
• Contextual descriptors
– no cough → cough_NEGATED
– childhood cough → cough_HISTORICAL
– brother has cough → cough_NOT_PATIENT
– return if cough → cough_HYPOTHETICAL
• Discourse
– sentences → sections → notes → visits
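Several of these variations can be handled with a lookup table. The sketch below assumes a small hand-built dictionary taken from the examples on this slide; the table and the function name normalize_terms are hypothetical, and the ambiguous "feb" is deliberately left out because a lookup cannot resolve it.

```python
import re

# Hypothetical lookup table built from the examples on this slide.
NORMALIZATION = {
    "coughs": "cough",
    "burning up": "fever",
    "short of breath": "dyspnea",
    "n/v": "nausea/vomiting",
    "sob": "shortness of breath",
    "poss": "possible",
    "nausa": "nausea",
    "diahrea": "diarrhea",
}

# Longest variants first so multi-word phrases win over shorter ones.
_VARIANTS = sorted(NORMALIZATION, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(v) for v in _VARIANTS) + r")(?!\w)",
    re.IGNORECASE,
)

def normalize_terms(text):
    """Replace each known variant with its normalized form."""
    return _PATTERN.sub(lambda m: NORMALIZATION[m.group(1).lower()], text)

print(normalize_terms("Pt c/o n/v and SOB, poss flu."))
```

The word-boundary checks keep "poss" from matching inside "possible". Contextual descriptors such as negation need more than a lookup and are covered later in the tutorial.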
HOW IS NLP USED IN THE CLINICAL DOMAIN?
• Clinical Decision Support
– Identifying Medline articles to support clinician information needs (Zhang et al. 2013)
• Quality Improvement
– Measuring quality of colonoscopy procedures (Harkema et al. 2011)
• Hospital Operations
– Automating the assignment of medical billing codes (Stanfill et al. 2010)
• Genetic Studies
– Supporting high throughput phenotyping (Pathak et al. 2013)
• Biosurveillance
– Detecting influenza from emergency department visits (Ye et al. 2014)
USE CASES: CONTRACEPTIVE METHODS
Shi J, Mowery D, Chapman WW, Zhang M, Sanders J, Gawron L. Extracting Intrauterine Device Usage
from Clinical Texts using Natural Language Processing. ICHI (in press).
Motivation
Automatically identifying high-risk women not using contraceptive methods (e.g., intrauterine device) for counseling and reproductive planning could mitigate adverse outcomes.
Needs Assessment
Extract mentions of contraceptive methods and their contexts from clinical texts
Lori Gawron MD MPH
USE CASES: CONTRACEPTIVE METHODS
Learn more about EASYCIE
applied to this use case:
August 25, 2017. 5:05pm-6:30pm
Poster Session 2: #21
Jianlin Shi MD MS
Shi J, Mowery D, Chapman WW, Zhang M, Sanders J, Gawron L. Extracting Intrauterine Device Usage
from Clinical Texts using Natural Language Processing. ICHI (in press).
USE CASES: PNEUMONIA DIAGNOSIS
South BR, Mowery DL, Kramer H, Jones B, Castine M, Hillert D, Sibitsky M, Chapman WW. Assessing Visualization and Semantic
Priming on Classifying Supporting, Refuting, or Uncertain Evidence for Suspected Pneumonia Case Review. (under review)
Motivation
Diagnosing patients with pneumonia can be difficult due to presentation of non-specific signs and symptoms and elusive pathogen discovery.
Needs Assessment
Extract variables (e.g., fever, rales, worsening cough) associated with pneumonia to improve diagnostic accuracy and reduce cognitive biases.
Barbara Jones MD, MSCI
C O N F I D E N T I A L
ROADMAP
• Getting back to the basics
• Collecting ingredients and reading the recipes
• Cooking with the “easy button”; multi-tasking in the kitchen
• Hands-on exercise
• Borrowing a cup of sugar
Getting back to the basics
NLP PIPELINE
• Sentence Segmentation
– FHx: Sister had childhood fevers.
• Tokenization
– FHx : Sister had childhood fevers .
• Term Normalization
– Family History : Sister had childhood fever .
• Part-of-Speech Tagging
– Family History : Sister had childhood fever .
– JJ NN NN VBD JJ NN .
• Shallow Parsing
– Family History : Sister had childhood fever .
– [ NP ] : [ NP ] [ VP ] .
• Named Entity Recognition
– Family History : Sister had childhood fever .
– UMLS code: Fever - C0015967
• Assertion Classification
– Family History : Sister had childhood fever .
– temporality = historical
– experiencer = nonpatient
– negation = affirmed
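The first three pipeline steps can be sketched with the standard library alone. This is a toy illustration, not how EasyCIE implements them: the splitting rules are naive, and the NORMALIZE table and function names are assumptions made for this one example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after . ! or ? when followed by a capital letter."""
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]

def tokenize(sentence):
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical normalization table covering only this example.
NORMALIZE = {"FHx": ["Family", "History"], "fevers": ["fever"]}

def normalize_tokens(tokens):
    """Expand abbreviations and reduce plurals using the lookup table."""
    out = []
    for tok in tokens:
        out.extend(NORMALIZE.get(tok, [tok]))
    return out

text = "FHx: Sister had childhood fevers. Pt denies cough."
for sent in split_sentences(text):
    print(" ".join(normalize_tokens(tokenize(sent))))
```

Part-of-speech tagging, shallow parsing and named entity recognition need trained models or larger dictionaries, so they are left out of this sketch.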
EasyCIE processes the pipeline steps above on the backend.
USE CASE: PNEUMONIA INDICATORS
http://www.cdc.gov/nhsn/pdfs/pscmanual/17pscnosinfdef_current.pdf
IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT
Lay terms
IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT
Morphology
• afebrile = a- (without) + febrile (fever)
IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT
Quantifications
• Units
– Celsius
– Fahrenheit
• Numbers
– Whole
– Decimal
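A minimal sketch of how both unit systems and both number formats might be captured with one pattern. The pattern and the helper find_temperatures_f are assumptions for illustration; real notes contain many more variants (for example, values with no unit at all).

```python
import re

# Hypothetical pattern: a whole or decimal number followed by a C or F unit.
TEMP_PATTERN = re.compile(
    r"(\d{2,3}(?:\.\d+)?)\s*(?:°|degrees?\s+)?(celsius|fahrenheit|c|f)\b",
    re.IGNORECASE,
)

def find_temperatures_f(text):
    """Return all temperatures found in text, converted to Fahrenheit."""
    results = []
    for number, unit in TEMP_PATTERN.findall(text):
        value = float(number)
        if unit.lower().startswith("c"):
            value = value * 9 / 5 + 32  # Celsius to Fahrenheit
        results.append(round(value, 1))
    return results

print(find_temperatures_f("temperature of 38C"))
print(find_temperatures_f("Tmax 102.5 F overnight, now 37.2 degrees Celsius"))
```

Converting everything to one unit first means a single threshold can be applied afterwards, whichever unit the note used.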
IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT
Other contexts
• Course
– “fever abated”
• Hypothetical
– “return if fever”
Collecting ingredients
and reading the recipe
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files
Document labels: POS_DOC, NEG_DOC
ROADMAP
• POS_DOC: one or more positive mentions of pneumonia indicators
• NEG_DOC: no positive mentions of pneumonia indicators
ROADMAP
Input corpus → Define rule files
DEFINE MENTION ANNOTATION LOGIC
Inclusionary mention (+)
• Assertions
– Negation: Affirmed/Certain
  • “complains of shortness of breath”
  • “cannot rule out pneumonia”
  • “likely pna”
– Temporality: Present
  • “dx of bacter. pneumonia”
  • “return if worsening fever”
– Experiencer: Patient
  • “Pt has worsening cough”
  • “Patient is febrile”
• Value in abnormal range
– High fever: Temp > 101.3 F
– Low oxygen saturation: O2 saturation less than 90%
Exclusionary mention (-)
• Assertions
– Negation: Negated/Uncertain
  • “denies shortness of breath”
  • “rule out pneumonia”
  • “unlikely pna”
– Temporality: Historical/Hypothetical
  • “history of bacter. pneumonia”
  • “return if fever”
– Experiencer: Non-patient
  • “brother has worsening cough”
  • “roommate is febrile”
• Value in normal range
– Normal or low fever: Temp ≤ 101.3 F
– Normal oxygen saturation: O2 saturation of 90% or higher
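The inclusionary/exclusionary decision can be sketched as a check that every modifier holds its default value. The Mention class, its field names and the abnormal_value flag are assumptions made for this illustration; they are not EasyCIE's data model.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    negation: str = "affirmed"     # affirmed | negated
    certainty: str = "certain"     # certain | uncertain
    temporality: str = "present"   # present | historical | hypothetical
    experiencer: str = "patient"   # patient | nonpatient
    abnormal_value: bool = True    # False when a measured value is in the normal range

def is_inclusionary(m):
    """A mention counts (+) only if no modifier moves it away from the defaults."""
    return (
        m.negation == "affirmed"
        and m.certainty == "certain"
        and m.temporality == "present"
        and m.experiencer == "patient"
        and m.abnormal_value
    )

mentions = [
    Mention("Pt has worsening cough"),
    Mention("denies shortness of breath", negation="negated"),
    Mention("brother has worsening cough", experiencer="nonpatient"),
    Mention("return if fever", temporality="hypothetical"),
]
for m in mentions:
    print("+" if is_inclusionary(m) else "-", m.text)
```

In this sketch the modifier values are supplied by hand; in practice they would come from the context rules described later.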
IDENTIFY TARGETS AND MODIFIERS
Patient denies cough but complains of headache.
• trigger term: “denies”
• termination term: “but”
• scope: “cough” (between the trigger term and the termination term)
Finding (target): cough
Negation (modifier): negated
Finding (target): headache
Negation (modifier): (default) affirmed
NegEx algorithm
Courtesy: Wendy Chapman
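The trigger, scope and termination idea can be sketched in a few lines. This is a simplified illustration, not the NegEx implementation: the three word lists are tiny stand-ins assumed for this example, only forward scope is handled, and no window size is applied.

```python
import re

# Tiny illustrative lexicons assumed for this sketch; real lists are much longer.
NEGATION_TRIGGERS = {"denies", "no", "without"}
TERMINATION_TERMS = {"but", "however", "except"}
TARGETS = {"cough", "headache", "fever"}

def assign_negation(sentence):
    """Return {target: 'negated' | 'affirmed'} using forward-scope negation."""
    findings = {}
    in_scope = False
    for token in re.findall(r"\w+", sentence.lower()):
        if token in NEGATION_TRIGGERS:
            in_scope = True       # scope opens at the trigger term
        elif token in TERMINATION_TERMS:
            in_scope = False      # scope closes at the termination term
        elif token in TARGETS:
            findings[token] = "negated" if in_scope else "affirmed"
    return findings

print(assign_negation("Patient denies cough but complains of headache."))
```

Without the termination term, "headache" would fall inside the negation scope and be wrongly negated.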
IDENTIFY FEVER AND MODIFIERS
She stated she was burning up.
Finding (target): fever
Negation (modifier): (default) affirmed
Uncertain (modifier): (default) certain
Temporality (modifier): (default) present
Experiencer (modifier): (default) patient
IDENTIFY FEVER AND MODIFIERS
fever had abated.
Finding (target): fever
Negation (modifier): negated
Uncertain (modifier): (default) certain
Temporality (modifier): (default) present
Experiencer (modifier): (default) patient
IDENTIFY FEVER AND MODIFIERS
temperature of 38C.
Finding (target): fever
Negation (modifier): (default) affirmed
Uncertain (modifier): (default) certain
Temporality (modifier): (default) present
Experiencer (modifier): (default) patient
IDENTIFY FEVER AND MODIFIERS
She improved and became afebrile.
Finding (target): fever
Negation (modifier): negated
Uncertain (modifier): (default) certain
Temporality (modifier): (default) present
Experiencer (modifier): (default) patient
ROADMAP
Input corpus → Define rule files → Apply the rules
DEFINE TASK LOGIC AKA “THE RECIPE”
Step 1: Classify indicators of pneumonia in document
• Is this mention an indicator of pneumonia?
– Is the mention inclusionary (+)? Mark mention
– Is the mention exclusionary (-)? Ignore mention
Step 2: Classify whether document contains 1+ indicators
• Is there at least one marked mention of a pneumonia indicator?
– Yes: classify document as POS_DOC
– No: classify document as NEG_DOC
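The two steps of the recipe can be sketched as follows, assuming each mention has already been judged inclusionary or exclusionary. The function name and the (text, flag) pair representation are hypothetical.

```python
def classify_document(mentions):
    """mentions: list of (text, is_inclusionary) pairs for one document."""
    # Step 1: keep (mark) inclusionary mentions, ignore exclusionary ones.
    marked = [text for text, inclusionary in mentions if inclusionary]
    # Step 2: one or more marked mentions makes the document positive.
    return "POS_DOC" if marked else "NEG_DOC"

doc_a = [("Patient has fever", True), ("Patient should return if febrile", False)]
doc_b = [("Pneumonia unlikely", False)]
print(classify_document(doc_a), classify_document(doc_b))
```

A document with no mentions at all also falls through to NEG_DOC under this logic.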
DEFINE MENTION ANNOTATION LOGIC
Is this mention an indicator of pneumonia?
• Is the mention inclusionary (+)? Mark mention: “Patient has fever”
• Is the mention exclusionary (-)? Ignore mention: “Patient should return if febrile”
DEFINE DOCUMENT CLASSIFICATION LOGIC
Is there at least one mention of a pneumonia indicator?
• Yes: classify document as POS_DOC: “Patient has PNA”
• No: classify document as NEG_DOC: “Pneumonia unlikely”
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs
HOW WELL DOES NLP DETECT INDICATORS?
                           Expert reviewed = POS_DOC   Expert reviewed = NEG_DOC
NLP classified = POS_DOC   True positive               False positive
NLP classified = NEG_DOC   False negative              True negative
HOW WELL DOES NLP DETECT INDICATORS?
TP: 4   FP: 2   FN: 3   TN: 1
Sensitivity (recall) = TP / (TP + FN) = 4 / (4 + 3) = 57%
HOW WELL DOES NLP DETECT INDICATORS?
TP: 4   FP: 2   FN: 3   TN: 1
Positive predictive value (precision) = TP / (TP + FP) = 4 / (4 + 2) = 67%
HOW WELL DOES NLP DETECT INDICATORS?
TP: 4   FP: 2   FN: 3   TN: 1
F1-score (harmonic mean of precision and recall) = 2 * (p * r) / (p + r) = 2 x (67% x 57%) / (67% + 57%) = 62%
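The three measures can be recomputed from the counts on these slides. The function below is a plain restatement of the formulas; its name is arbitrary.

```python
def evaluate(tp, fp, fn):
    """Return (recall, precision, f1) from confusion-matrix counts."""
    recall = tp / (tp + fn)       # sensitivity
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

recall, precision, f1 = evaluate(tp=4, fp=2, fn=3)
print(f"sensitivity (recall) = {recall:.0%}")
print(f"PPV (precision)      = {precision:.0%}")
print(f"F1-score             = {f1:.0%}")
```

True negatives do not appear in any of the three formulas, which is why TN is not an argument.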
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files
Cooking with the “easy button” &
Multi-tasking in the kitchen
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files
Easy Clinical Information Extractor
PREPARE THE PRACTICE DATASET (ALREADY DONE)
MIMIC II demo dataset
https://physionet.org/mimic2/demo/
Consists of 4,000 deceased ICU patients
Selected 50 encounters with an ICD-9 code starting with:
• 480 Viral pneumonia
• 481 Pneumococcal pneumonia
• 482 Other bacterial pneumonia
• 483 Pneumonia due to other specified organism
• 484 Pneumonia in infectious diseases classified elsewhere
• 485 Bronchopneumonia, organism unspecified
• 486 Pneumonia, organism unspecified
And 50 encounters without any of the ICD-9 codes above
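The code-based selection can be sketched as a prefix match. The encounter ids and code lists below are made-up examples, and the dictionary layout is an assumption; the MIMIC II tables are organized differently.

```python
PNEUMONIA_PREFIXES = ("480", "481", "482", "483", "484", "485", "486")

def has_pneumonia_code(icd9_codes):
    """True if any ICD-9 code starts with one of the pneumonia prefixes."""
    return any(code.startswith(PNEUMONIA_PREFIXES) for code in icd9_codes)

# Hypothetical encounters: encounter id -> list of ICD-9 codes.
encounters = {
    "enc1": ["482.41", "428.0"],
    "enc2": ["410.71"],
    "enc3": ["486"],
}
positive = [e for e, codes in encounters.items() if has_pneumonia_code(codes)]
negative = [e for e, codes in encounters.items() if not has_pneumonia_code(codes)]
print(positive, negative)
```

Sampling 50 encounters from each list would then give the two halves of the practice dataset.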
PREPARE THE PRACTICE DATASET (ALREADY DONE)
• For demonstration purposes, we sampled:
• 70 radiology reports,
• 20 discharge summaries
• 10 nursing notes
GOAL:
• Identify any indication of pneumonia
– Including signs and symptoms
– Diagnoses
– Lab tests
– CT findings
– Treatments are not included (to narrow the scope)
• Conclude whether or not a document has any indication of pneumonia
PREPARE THE GOLD STANDARD (ALREADY DONE)
• Two clinical annotators
• A third annotator resolves disagreements
• Split the dataset into 60 documents for training and 40 for testing
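A 60/40 split can be made reproducible with a seeded shuffle. The slides do not say how the documents were assigned, so the random assignment, the seed and the document ids below are all assumptions.

```python
import random

def split_dataset(doc_ids, n_train=60, seed=0):
    """Shuffle deterministically, then cut into training and testing sets."""
    shuffled = sorted(doc_ids)              # fixed starting order
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:]

doc_ids = [f"doc_{i:03d}" for i in range(100)]  # hypothetical document ids
train, test = split_dataset(doc_ids)
print(len(train), len(test))
```

Rules are developed against the training documents only; the testing documents are held back for the final comparison.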
ANNOTATION SCHEMA
• IND_PNEUMONIA: for any mention that indicates pneumonia
• Pos_Doc: for documents that have any indication of pneumonia
• Neg_Doc: for documents that do not have any indication of pneumonia
ROADMAP
Input corpus
IMPORT TEXT DOCUMENTS
Click on ImportDocuments
IMPORT GOLD ANNOTATIONS
Click on ImportAnnotations
ROADMAP
Input corpus → Define rule files
DEFINING RULES
• Target rules
• Context rules
• Feature inference rules
• Document inference rules
Courtesy: Wendy Chapman
DEFINING TARGET RULES
Most concepts have already been included, but the rules need improvement
DEFINE MODIFIER AND TARGET RULE FILES
Click on Value to update the rules for ruleFile and cRuleFile
• modifiers = ruleFile
• targets = cRuleFile
DEFINE RULEFILE (MODIFIER RULE FILE)
• Actuals are extracted and classified: “high fever”
• Pseudos are ignored: “Yellow Fever vaccination clinic”
Modifiers and their value sets; (d) = default value
• Negation = {affirmed (d), negated}
• Certainty = {certain (d), uncertain}
• Temporality = {present (d), historical, hypothetical}
• Experiencer = {patient (d), nonpatient}
Rule file (modifier dictionary given)
rule string|direction|trigger type|modifier|window size
resolved|backward|actual|negation|30
possible|forward|actual|uncertain|8
in the past|bidirectional|actual|temporality|8
mom|forward|actual|experiencer|8
vaccination clinic|backward|pseudo|8
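A pipe-delimited rule line like these can be read with a small parser. This sketch is not EasyCIE's loader: the ContextRule type is hypothetical, and treating a four-field line as a pseudo rule with no modifier is an assumption based on the last example.

```python
from typing import NamedTuple, Optional

class ContextRule(NamedTuple):
    rule_string: str
    direction: str
    trigger_type: str
    modifier: Optional[str]
    window_size: int

def parse_rule(line):
    """Parse 'rule string|direction|trigger type|modifier|window size'."""
    fields = [f.strip() for f in line.split("|")]  # tolerate stray spaces
    if len(fields) == 5:
        rule_string, direction, trigger_type, modifier, window = fields
    elif len(fields) == 4:  # assumption: pseudo rules omit the modifier field
        rule_string, direction, trigger_type, window = fields
        modifier = None
    else:
        raise ValueError(f"unexpected rule format: {line!r}")
    return ContextRule(rule_string, direction, trigger_type, modifier, int(window))

RULES = """\
resolved|backward|actual|negation |30
possible |forward| actual| uncertain|8
vaccination clinic|backward|pseudo|8
"""
for line in RULES.splitlines():
    print(parse_rule(line))
```

Stripping each field keeps a rule usable even when the file has uneven spacing around the delimiters.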
DEFINING CONTEXT RULES
Most concepts have already been included, but the rules need improvement
DEFINING FEATURE INFERENCE RULES
To exclude the mentions that you don’t want (Done)
DEFINING DOCUMENT INFERENCE RULES
When to conclude "Pos_Doc" (Done)
ROADMAP
Input corpus → Define rule files → Apply the rules
Easy Clinical Information Extractor
RUN EASYCIE – ONE CLICK!
Select RunEasyCIE
REVIEW AND COMPARE RESULTS – ONE CLICK!
Select ViewOutputinDB
RESULT VIEW
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs
COMPARE OUTPUTS
REVIEW & ANALYZE ERRORS
DEBUG ERRORS (1)
Use a snippet to test the pipeline
DEBUG ERRORS (2)
A step-by-step output display for each component
DEBUG ERRORS (3)
Detailed view of all clues for the final output
ROADMAP
Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files
Easy Clinical Information Extractor
Hands-on exercise
Borrowing a cup of sugar
REUSE OTHERS’ WORK
AUTOMATE THE REST OF THE CONFIGURATION
ACKNOWLEDGEMENTS
Wendy Chapman
Barbara Jones
Kelly Peterson
danielle.mowery@utah.edu
Thank you
Arches National Park
Capitol Reef National Park
Cedar Breaks National Park
PLEASE FILL IN THE SURVEY
• We greatly appreciate your feedback:
• https://goo.gl/forms/1EcprdPCkzqhTWy52
• Thank you!