Download EasyCIE and Dataset - BYU Data Mining Lab: dml.cs.byu.edu/chs/ichi2017/content/Tut2.pdf


BIOMEDICAL INFORMATICS

DOWNLOAD EASYCIE AND DATASET

https://goo.gl/evdf1c

https://goo.gl/xjr7xe

EASYCIE: A DEVELOPMENT PLATFORM TO SUPPORT QUICK AND EASY, RULE-BASED CLINICAL INFORMATION EXTRACTION

JIANLIN SHI, MS MD

DANIELLE MOWERY, MS PHD

FIFTH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS

2017 TUTORIAL

WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?

Clinical texts → natural language processing → structured data (machine interpretable)

✓ Classify

✓ Extract

✓ Summarize

WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?

✓ Classify – patients for stroke

✓ Extract

✓ Summarize

Mowery D, Hill B, Chapman W, Cannon-Albright L, Majersik J. Development of a knowledge base to support the automatic classification of a computable ischemic stroke phenotype from electronic medical records. Neurology: Genetics; 2017. PubMed PMID: 28428978; PubMed Central PMCID: PMC5390740

WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?

Garvin JH, DuVall SL, South BR, et al. Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. Journal of the American Medical Informatics Association: JAMIA. 2012;19(5):859-866.

✓ Classify

✓ Extract – identify ejection fractions

✓ Summarize

Example report text: “The left ventricular cavity size and wall thickness appear normal. The wall motion and left ventricular systolic function appears hyperdynamic with estimated ejection fraction of 70%. There is near-cavity obliteration seen.”

Extracted: ejection fraction = 70 percent
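The extraction step on this slide can be sketched with a regular expression. This is only a minimal illustration, not the system described by Garvin et al.; the pattern and the function name `extract_ejection_fractions` are assumptions made for this example.

```python
import re

# Hypothetical pattern: "ejection fraction" or "EF", a short run of filler
# characters (no digits), then a one- or two-digit number and a percent marker.
EF_PATTERN = re.compile(
    r"\b(?:ejection\s+fraction|EF)\b[^\d%.]{0,20}?(\d{1,2}(?:\.\d+)?)\s*(?:%|percent)",
    re.IGNORECASE,
)


def extract_ejection_fractions(text):
    """Return every ejection fraction value (as a float) mentioned in the text."""
    return [float(m.group(1)) for m in EF_PATTERN.finditer(text)]


report = (
    "The wall motion and left ventricular systolic function appears "
    "hyperdynamic with estimated ejection fraction of 70%."
)
print(extract_ejection_fractions(report))
```

A single pattern like this misses ranges (for example "55-60%") and qualitative statements, which is one reason rule sets are refined iteratively against reviewed documents.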

WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?

Mowery DL, Jordan P, Wiebe J, Harkema H, Dowling J, Chapman WW. Semantic Annotation of Clinical Events for Generating a Problem List. AMIA Annual Symposium Proceedings. 2013;2013:1032-1041.

✓ Classify

✓ Extract

✓ Summarize – create an active problem list

“Diagnosis: myocardial infarction (MI)”

WHY IS NLP SO DIFFICULT?

Synonyms

• coughs → cough

• burning up → fever

• short of breath → dyspnea

Abbreviations/Acronyms

• feb → febrile or February?

• n/v → nausea/vomiting

• sob → shortness of breath

Truncations

• poss → possible

Concatenations

• blurredvision → blurred vision

• flus sxs → flu symptoms

Misspellings & typographic errors

• nausa → nausea

• diahrea → diarrhea

Quantifications

• BP 140/90 → hypertension

Contextual descriptors

• no cough → cough_NEGATED

• childhood cough → cough_HISTORICAL

• brother has cough → cough_NOT_PATIENT

• return if cough → cough_HYPOTHETICAL

Discourse

• sentences, sections, notes, visits
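Several of these variations (synonyms, unambiguous abbreviations, truncations, misspellings) can be handled with a lookup table. The sketch below is a hedged illustration only: the table `SURFACE_TO_CONCEPT` and the function `normalize` are hypothetical names, and the entries are limited to examples from this slide. The ambiguous "feb" is deliberately left out because a lookup cannot decide between "febrile" and "February".

```python
import re

# Hypothetical lookup table built only from the examples on this slide.
SURFACE_TO_CONCEPT = {
    "coughs": "cough",
    "burning up": "fever",
    "short of breath": "dyspnea",
    "n/v": "nausea/vomiting",
    "sob": "shortness of breath",
    "poss": "possible",
    "blurredvision": "blurred vision",
    "nausa": "nausea",
    "diahrea": "diarrhea",
}

# Longest variants first so multi-word phrases win over shorter ones.
# The lookarounds stop matches inside longer words or slash-joined tokens.
_PATTERN = re.compile(
    r"(?<![\w/])(?:"
    + "|".join(re.escape(k) for k in sorted(SURFACE_TO_CONCEPT, key=len, reverse=True))
    + r")(?![\w/])",
    re.IGNORECASE,
)


def normalize(text):
    """Replace known surface variants with their normalized form."""
    return _PATTERN.sub(lambda m: SURFACE_TO_CONCEPT[m.group(0).lower()], text)


print(normalize("Pt c/o SOB, n/v and diahrea; poss flu."))
```

Contextual descriptors such as negation or experiencer need more than a lookup, because they depend on the words around the target.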


HOW IS NLP USED IN THE CLINICAL DOMAIN?

• Clinical Decision Support

– Identifying Medline articles to support clinician information needs (Zhang et al. 2013)

• Quality Improvement

– Measuring quality of colonoscopy procedures (Harkema et al. 2011)

• Hospital Operations

– Automating the assignment of medical billing codes (Stanfill et al. 2010)

• Genetic Studies

– Supporting high throughput phenotyping (Pathak et al. 2013)

• Biosurveillance

– Detecting influenza from emergency department visits (Ye et al. 2014)

USE CASES: CONTRACEPTIVE METHODS

Shi J, Mowery D, Chapman WW, Zhang M, Sanders J, Gawron L. Extracting Intrauterine Device Usage from Clinical Texts using Natural Language Processing. ICHI (in press).

Motivation: Automatically identifying high-risk women not using contraceptive methods (e.g., intrauterine device) for counseling and reproductive planning could mitigate adverse outcomes.

Needs Assessment: Extract mentions of contraceptive methods and their contexts from clinical texts.

Lori Gawron MD MPH

USE CASES: CONTRACEPTIVE METHODS

Learn more about EasyCIE applied to this use case:

August 25, 2017, 5:05pm-6:30pm

Poster Session 2: #21

Jianlin Shi MD MS

Shi J, Mowery D, Chapman WW, Zhang M, Sanders J, Gawron L. Extracting Intrauterine Device Usage from Clinical Texts using Natural Language Processing. ICHI (in press).

USE CASES: PNEUMONIA DIAGNOSIS

South BR, Mowery DL, Kramer H, Jones B, Castine M, Hillert D, Sibitsky M, Chapman WW. Assessing Visualization and Semantic Priming on Classifying Supporting, Refuting, or Uncertain Evidence for Suspected Pneumonia Case Review. (under review)

Motivation: Diagnosing patients with pneumonia can be difficult due to presentation of non-specific signs and symptoms and elusive pathogen discovery.

Needs Assessment: Extract variables (e.g., fever, rales, worsening cough) associated with pneumonia to improve diagnostic accuracy and reduce cognitive biases.

Barbara Jones MD, MSCI

ROADMAP

• Getting back to the basics

• Collecting ingredients and reading the recipes

• Cooking with the “easy button”; Multi-tasking in the kitchen

• Hands-on exercise

• Borrowing a cup of sugar


Getting back to the basics

NLP PIPELINE

• Sentence Segmentation: FHx: Sister had childhood fevers.

• Tokenization: FHx : Sister had childhood fevers .

• Term Normalization: Family History : Sister had childhood fever .

• Part-of-Speech Tagging: Family History : Sister had childhood fever . → JJ NN NN VBD JJ NN .

• Shallow Parsing: Family History : Sister had childhood fever . → [ NP ] : [ NP ] [ VP ] .

• Named Entity Recognition: Family History : Sister had childhood fever . → UMLS code: Fever - C0015967

• Assertion Classification: Family History : Sister had childhood fever . → temporality = historical, experiencer = nonpatient, negation = affirmed
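The first three pipeline steps can be sketched in a few lines. This is a naive illustration under stated assumptions, not how EasyCIE implements them: the function names and the two small lookup tables (`ABBREVIATIONS`, `SINGULAR`) are hypothetical and cover only the slide's example sentence.

```python
import re

# Hypothetical lookup tables taken from the slide's single example.
ABBREVIATIONS = {"FHx": ["Family", "History"]}
SINGULAR = {"fevers": "fever"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize_terms(tokens):
    """Expand known abbreviations and map known plurals to their singular form."""
    out = []
    for tok in tokens:
        if tok in ABBREVIATIONS:
            out.extend(ABBREVIATIONS[tok])
        else:
            out.append(SINGULAR.get(tok, tok))
    return out


for sentence in split_sentences("FHx: Sister had childhood fevers. Pt is afebrile."):
    print(normalize_terms(tokenize(sentence)))
```

The later steps (part-of-speech tagging, shallow parsing, named entity recognition, assertion classification) need trained models or curated rule sets and are not sketched here.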

NLP PIPELINE

These pipeline steps are processed by EasyCIE on the backend.


USE CASE: PNEUMONIA INDICATORS

http://www.cdc.gov/nhsn/pdfs/pscmanual/17pscnosinfdef_current.pdf


IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT

Lay terms


IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT

Morphology

afebrile: a = without

febrile = fever

IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT

Quantifications

• Units

– Celsius

– Fahrenheit

• Numbers

– Whole

– Decimal
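Quantified fever expressions combine a whole or decimal number with a unit. The sketch below shows one way to capture both and convert Celsius readings to Fahrenheit so they can be compared on one scale. The pattern and the function name `extract_temperatures_f` are assumptions for illustration, not EasyCIE rules.

```python
import re

# Hypothetical pattern: a whole or decimal number, an optional degree marker,
# then a Celsius or Fahrenheit unit written as a letter or a full word.
TEMP_PATTERN = re.compile(
    r"(\d{2,3}(?:\.\d+)?)\s*(?:°|degrees?\s*)?\s*(C|F|Celsius|Fahrenheit)\b",
    re.IGNORECASE,
)


def extract_temperatures_f(text):
    """Return all temperature readings in the text, converted to Fahrenheit."""
    readings = []
    for value, unit in TEMP_PATTERN.findall(text):
        number = float(value)
        if unit[0].lower() == "c":
            number = number * 9 / 5 + 32  # Celsius to Fahrenheit
        readings.append(round(number, 1))
    return readings


print(extract_temperatures_f("temperature of 38C"))
print(extract_temperatures_f("Temp 101.5 F, later 99 degrees Fahrenheit"))
```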

IDENTIFYING FEVER EXPRESSIONS FROM CLINICAL TEXT

Other contexts

• Course

– “fever abated”

• Hypothetical

– “return if fever”

Collecting ingredients and reading the recipe

ROADMAP: Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files

Document labels: POS_DOC, NEG_DOC

ROADMAP

• POS_DOC: One or more positive mentions of pneumonia indicators

• NEG_DOC: No positive mentions of pneumonia indicators

ROADMAP: Input corpus → Define rule files

DEFINE MENTION ANNOTATION LOGIC

Inclusionary mention (+)

• Assertions

– Negation: Affirmed/Certain, e.g., “complains of shortness of breath”, “cannot rule out pneumonia”, “likely pna”

– Temporality: Present, e.g., “dx of bacter. pneumonia”, “return if worsening fever”

– Experiencer: Patient, e.g., “Pt has worsening cough”, “Patient is febrile”

• Value in abnormal range

– High fever: Temp > 101.3 F

– Low oxygen saturation: O2 saturation less than 90%

Exclusionary mention (-)

• Assertions

– Negation: Negated/Uncertain, e.g., “denies shortness of breath”, “rule out pneumonia”, “unlikely pna”

– Temporality: Historical/Hypothetical, e.g., “history of bacter. pneumonia”, “return if fever”

– Experiencer: Non-patient, e.g., “brother has worsening cough”, “roommate is febrile”

• Value in normal range

– Normal or low fever: Temp < 101.3 F

– Normal oxygen saturation: O2 saturation of 90% or higher
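The inclusionary/exclusionary logic can be written down as two small checks. This is a sketch under stated assumptions: the `Mention` class, its field names, and the `kind` labels are hypothetical, and the modifier values are assumed to have been assigned by an earlier context step. Only the two thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One annotated mention; the defaults are the inclusionary values."""
    text: str
    negation: str = "affirmed"      # affirmed | negated
    certainty: str = "certain"      # certain | uncertain
    temporality: str = "present"    # present | historical | hypothetical
    experiencer: str = "patient"    # patient | nonpatient


def is_inclusionary(mention):
    """True only when every modifier takes its inclusionary value."""
    return (
        mention.negation == "affirmed"
        and mention.certainty == "certain"
        and mention.temporality == "present"
        and mention.experiencer == "patient"
    )


HIGH_FEVER_F = 101.3  # threshold quoted on the slide
LOW_O2_SAT = 90.0     # percent, threshold quoted on the slide


def is_abnormal_value(kind, value):
    """Apply the slide's value ranges; `kind` is a hypothetical label."""
    if kind == "temperature_f":
        return value > HIGH_FEVER_F
    if kind == "o2_saturation":
        return value < LOW_O2_SAT
    raise ValueError(f"unknown measurement kind: {kind}")


print(is_inclusionary(Mention("Pt has worsening cough")))
print(is_inclusionary(Mention("brother has worsening cough", experiencer="nonpatient")))
```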

IDENTIFY TARGETS AND MODIFIERS

Patient denies cough but complains of headache.

• Trigger term: “denies”

• Termination term: “but”

• Scope: the text between the trigger term and the termination term

Finding (target): cough

Negation (modifier): negated

Finding (target): headache

Negation (modifier): (default) affirmed

NegEx algorithm. Courtesy: Wendy Chapman
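The trigger/termination/scope idea can be illustrated with a heavily simplified, forward-only version of NegEx. This sketch is not the published algorithm: the three word lists are tiny hypothetical lexicons, and the real NegEx handles multi-word triggers, backward scope, and pseudo-triggers.

```python
import re

# Tiny hypothetical lexicons; the real NegEx lexicon is much larger.
NEGATION_TRIGGERS = {"denies", "no", "without"}
TERMINATION_TERMS = {"but", "however", "although"}
TARGETS = {"cough", "headache", "fever"}


def find_negated_targets(sentence):
    """Map each target found in the sentence to 'negated' or 'affirmed'."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    in_scope = False  # True while inside a negation trigger's scope
    results = {}
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            in_scope = True   # the scope opens after the trigger term
        elif tok in TERMINATION_TERMS:
            in_scope = False  # a termination term closes the scope
        elif tok in TARGETS:
            results[tok] = "negated" if in_scope else "affirmed"
    return results


print(find_negated_targets("Patient denies cough but complains of headache."))
```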

IDENTIFY FEVER AND MODIFIERS

She stated she was burning up.

Finding (target): fever

Negation (modifier): (default) affirmed

Uncertain (modifier): (default) certain

Temporality (modifier): (default) present

Experiencer (modifier): (default) patient

IDENTIFY FEVER AND MODIFIERS

fever had abated

Finding (target): fever

Negation (modifier): negated

Uncertain (modifier): (default) certain

Temporality (modifier): (default) present

Experiencer (modifier): (default) patient

IDENTIFY FEVER AND MODIFIERS

temperature of 38C

Finding (target): fever

Negation (modifier): (default) affirmed

Uncertain (modifier): (default) certain

Temporality (modifier): (default) present

Experiencer (modifier): (default) patient

IDENTIFY FEVER AND MODIFIERS

She improved and became afebrile.

Finding (target): fever

Negation (modifier): negated

Uncertain (modifier): (default) certain

Temporality (modifier): (default) present

Experiencer (modifier): (default) patient

ROADMAP: Input corpus → Define rule files → Apply the rules

DEFINE TASK LOGIC AKA “THE RECIPE”

Step 1: Classify indicators of pneumonia in document

• Is this mention an indicator of pneumonia?

– Is the mention inclusionary (+)? Mark mention

– Is the mention exclusionary (-)? Ignore mention

Step 2: Classify whether document contains 1+ indicators

• Is there at least one mention of a pneumonia indicator?

– Yes: classify document as POS_DOC

– No: classify document as NEG_DOC
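The two steps of the recipe can be sketched as one small function. The dictionary keys `is_indicator` and `inclusionary` are hypothetical names for decisions assumed to have been made earlier by the target and context rules; this is an illustration of the logic, not EasyCIE's rule engine.

```python
def classify_document(mentions):
    """Return POS_DOC if at least one marked pneumonia indicator remains."""
    # Step 1: keep (mark) inclusionary indicators, ignore everything else.
    marked = [m for m in mentions if m["is_indicator"] and m["inclusionary"]]
    # Step 2: one or more marked mentions makes the document positive.
    return "POS_DOC" if marked else "NEG_DOC"


doc_a = [
    {"text": "Patient has fever", "is_indicator": True, "inclusionary": True},
    {"text": "return if febrile", "is_indicator": True, "inclusionary": False},
]
doc_b = [
    {"text": "Pneumonia unlikely", "is_indicator": True, "inclusionary": False},
]
print(classify_document(doc_a), classify_document(doc_b))
```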

DEFINE MENTION ANNOTATION LOGIC

Is this mention an indicator of pneumonia?

• Is the mention inclusionary (+)? Mark mention. Example: “Patient has fever”

• Is the mention exclusionary (-)? Ignore mention. Example: “Patient should return if febrile”

DEFINE DOCUMENT CLASSIFICATION LOGIC

Is there at least one mention of a pneumonia indicator?

• Yes: classify document as POS_DOC. Example: “Patient has PNA”

• No: classify document as NEG_DOC. Example: “Pneumonia unlikely”

ROADMAP: Input corpus → Define rule files → Apply the rules → Compare outputs

HOW WELL DOES NLP DETECT INDICATORS?

Compare the NLP classification of each document against the expert-reviewed label:

• True positive: expert reviewed = POS_DOC, NLP classified = POS_DOC

• False positive: expert reviewed = NEG_DOC, NLP classified = POS_DOC

• False negative: expert reviewed = POS_DOC, NLP classified = NEG_DOC

• True negative: expert reviewed = NEG_DOC, NLP classified = NEG_DOC

HOW WELL DOES NLP DETECT INDICATORS?

NLP classified vs. expert reviewed: TP: 4, FP: 2, FN: 3, TN: 1

$$\text{Sensitivity (recall)} = \frac{TP}{TP + FN} = \frac{4}{4 + 3} \approx 57\%$$

HOW WELL DOES NLP DETECT INDICATORS?

NLP classified vs. expert reviewed: TP: 4, FP: 2, FN: 3, TN: 1

$$\text{Positive predictive value (precision)} = \frac{TP}{TP + FP} = \frac{4}{4 + 2} \approx 67\%$$

HOW WELL DOES NLP DETECT INDICATORS?

NLP classified vs. expert reviewed: TP: 4, FP: 2, FN: 3, TN: 1

F1-score (harmonic mean of precision p and recall r):

$$F_1 = \frac{2 \cdot p \cdot r}{p + r} = \frac{2 \times (57\% \times 67\%)}{57\% + 67\%} \approx 62\%$$
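The three metrics can be recomputed from the counts on these slides with a few lines of code. The function name `evaluate` is a hypothetical helper for this example; it assumes the denominators are non-zero.

```python
def evaluate(tp, fp, fn):
    """Return (recall, precision, F1) from true/false positive and false negative counts."""
    recall = tp / (tp + fn)        # sensitivity
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


recall, precision, f1 = evaluate(tp=4, fp=2, fn=3)
print(f"recall={recall:.0%} precision={precision:.0%} F1={f1:.0%}")
```

The true negative count (TN: 1) is not used by any of these three metrics.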

ROADMAP: Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files

Cooking with the “easy button” & Multi-tasking in the kitchen

ROADMAP (Easy Clinical Information Extractor): Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files

PREPARE THE PRACTICE DATASET (ALREADY DONE)

MIMIC II demo dataset: https://physionet.org/mimic2/demo/

• Consists of 4000 deceased ICU patients

• Select 50 encounters that have an ICD9 code starting with:

– 480 Viral pneumonia

– 481 Pneumococcal pneumonia

– 482 Other bacterial pneumonia

– 483 Pneumonia due to other specified organism

– 484 Pneumonia in infectious diseases classified elsewhere

– 485 Bronchopneumonia, organism unspecified

– 486 Pneumonia, organism unspecified

• And 50 encounters that do not have any of the ICD9 codes above
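The selection criterion is a prefix test on each encounter's ICD9 codes. The sketch below assumes each encounter's codes are available as a list of strings, which is a hypothetical layout and not the MIMIC II schema; `has_pneumonia_code` is likewise an illustrative name.

```python
# ICD9 three-digit categories listed on the slide.
PNEUMONIA_PREFIXES = ("480", "481", "482", "483", "484", "485", "486")


def has_pneumonia_code(icd9_codes):
    """True if any code of the encounter starts with a pneumonia category."""
    # str.startswith accepts a tuple and checks every prefix in it.
    return any(code.strip().startswith(PNEUMONIA_PREFIXES) for code in icd9_codes)


print(has_pneumonia_code(["428.0", "486"]))
print(has_pneumonia_code(["428.0", "401.9"]))
```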

PREPARE THE PRACTICE DATASET (ALREADY DONE)

• For demonstration purposes, sampled:

– 70 radiology reports

– 20 discharge summaries

– 10 nursing notes

GOAL:

• Identify any indication of pneumonia

– Including signs and symptoms

– Diagnoses

– Lab tests

– CT findings

– Excluding treatments (to narrow the scope)

• Conclude whether a document has any indication or not

PREPARE THE GOLD STANDARD (ALREADY DONE)

• Two clinical annotators

• A third annotator resolves disagreements

• Split the dataset into 60 documents for training and 40 for testing

ANNOTATION SCHEMA

• IND_PNEUMONIA: for any mention that indicates pneumonia

• Pos_Doc: for documents that have any indication of pneumonia

• Neg_Doc: for documents that do not have any indication of pneumonia

ROADMAP: Input corpus

IMPORT TEXT DOCUMENTS

Click on ImportDocuments

IMPORT GOLD ANNOTATIONS

Click on ImportAnnotations

ROADMAP: Input corpus → Define rule files


DEFINING RULES

• Target rules

• Context rules

• Feature inference rules

• Document inference rules

Courtesy: Wendy Chapman

DEFINING TARGET RULES

Most concepts have already been included but need improvement

DEFINE MODIFIER AND TARGET RULES FILES

Click on Value to update rules for ruleFile and cRuleFile

• modifiers = ruleFile

• targets = cRuleFile

DEFINE RULEFILE (MODIFIER RULE FILE)

• Actuals are extracted and classified: “high fever”

• Pseudos are ignored: “Yellow Fever vaccination clinic”

Modifiers and their value sets; (d) = default value

• Negation = {affirmed (d), negated}

• Certainty = {certain (d), uncertain}

• Temporality = {present (d), historical, hypothetical}

• Experiencer = {patient (d), nonpatient}

Rule file (modifier dictionary given)

rule string|direction|trigger type|modifier|window size

resolved|backward|actual|negation|30

possible|forward|actual|uncertain|8

in the past|bidirectional|actual|temporality|8

mom|forward|actual|experiencer|8

vaccination clinic|backward|pseudo|8
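The pipe-delimited layout shown on this slide is straightforward to parse. The sketch below is an illustration based only on the lines above; the actual EasyCIE rule-file format may differ. `ContextRule` and `parse_rule` are hypothetical names, and treating a four-field line (like the pseudo rule) as having no modifier is an assumption.

```python
from typing import NamedTuple, Optional


class ContextRule(NamedTuple):
    rule_string: str
    direction: str
    trigger_type: str
    modifier: Optional[str]
    window_size: int


def parse_rule(line):
    """Parse one pipe-delimited rule line; whitespace around fields is ignored."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) == 5:
        rule_string, direction, trigger_type, modifier, window = fields
    elif len(fields) == 4:
        # Assumption: a 4-field line (as in the slide's pseudo rule) has no modifier.
        rule_string, direction, trigger_type, window = fields
        modifier = None
    else:
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {line!r}")
    return ContextRule(rule_string, direction, trigger_type, modifier, int(window))


print(parse_rule("resolved|backward|actual|negation |30"))
print(parse_rule("vaccination clinic|backward|pseudo|8"))
```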

DEFINING CONTEXT RULES

Most concepts have already been included but need improvement


DEFINING FEATURE INFERENCE RULES

To exclude the mentions that you don’t want (Done)

DEFINING DOCUMENT INFERENCE RULES

When to conclude "Pos_Doc" (Done)

ROADMAP (Easy Clinical Information Extractor): Input corpus → Define rule files → Apply the rules


RUN EASYCIE – ONE CLICK!

Select RunEasyCIE


REVIEW AND COMPARE RESULTS – ONE CLICK!

Select ViewOutputinDB


RESULT VIEW

ROADMAP: Input corpus → Define rule files → Apply the rules → Compare outputs


COMPARE OUTPUTS


REVIEW & ANALYZE ERRORS

DEBUG ERRORS (1)

Use a snippet to test the pipeline

DEBUG ERRORS (2)

A step-by-step output display for each component

DEBUG ERRORS (3)

Detailed view of all clues for the final output

ROADMAP (Easy Clinical Information Extractor): Input corpus → Define rule files → Apply the rules → Compare outputs → Update rule files


Hands-on exercise


Borrowing a cup of sugar


REUSE OTHERS’ WORK


AUTOMATE THE REST CONFIGURATION


ACKNOWLEDGEMENTS

Wendy Chapman

Barbara Jones

Kelly Peterson

Thank you

danielle.mowery@utah.edu

Photos: Arches National Park, Capitol Reef National Park, Cedar Breaks National Park


PLEASE FILL IN THE SURVEY

• We greatly appreciate your feedback:

• https://goo.gl/forms/1EcprdPCkzqhTWy52

• Thank you!