

Big Mechanism for Processing EEG Clinical Information on Big Data

Aim 1: Automatically Recognize and Time-Align Events in EEG Signals

Aim 2: Automatically Recognize Critical Clinical Concepts in EEG Reports

Aim 3: Automatic Patient Cohort Retrieval

www.nedcdata.org

AUTOMATIC DISCOVERY AND PROCESSING OF EEG COHORTS FROM CLINICAL RECORDS

Iyad Obeid and Joseph Picone
The Neural Engineering Data Consortium

Temple University

Sanda Harabagiu
The Human Language Technology Research Institute

University of Texas at Dallas


Anticipated Outcomes

• World’s largest publicly available annotated EEG signal corpus and a set of high-performance BigData tools that allow rapid development of new biomedical applications using dense data.

• High-performance automatic identification of clinical events as well as medical concepts, spatial and temporal information.

• A patient cohort retrieval system operating on a very large corpus of EEG signals and reports.

• Clinical evaluation of the patient cohort system through clinical expert judgments.

Acknowledgements

• Research reported in this poster was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG008468.

• The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

• The TUH EEG Corpus development was sponsored by the Defense Advanced Research Projects Agency (DARPA), Temple University’s College of Engineering and Office of Research.


Abstract

• Electronic medical records (EMRs) contain unstructured text, temporally constrained measurements (e.g., vital signs), multichannel signals (e.g., EEGs), and image data (e.g., MRIs).

• We are developing a patient cohort retrieval system that allows clinicians to retrieve relevant EEG signals and EEG reports using standard queries (e.g., “Young patients with focal cerebral dysfunction who were treated with Topamax”).

Aim 1: Automatically recognize and time-align EEG events that contribute to a diagnosis.

Aim 2: Automatically recognize critical concepts in the EEG reports.

Aim 3: Automatic patient cohort retrieval.

Aim 4: Evaluation and analysis of the results of the patient cohort retrieval.

• Our focus is the automatic interpretation of a clinical BigData resource – the TUH EEG Corpus.

• An important outcome will be the existence of an annotated BigData archive of EEGs.
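The example query in the abstract can be read as a set of structured constraints: an age limit, a clinical problem, and a treatment. The following is a minimal sketch of that idea, not the project's retrieval system. The `Concept` and `EEGRecord` classes, the `retrieve` function, the age cutoff of 30 for "young", and all records are hypothetical; the concept types (problem, treatment) and polarity values (positive, negative) follow the labels used in the poster's diagrams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str      # normalized concept string (hypothetical normalization)
    kind: str      # "problem", "treatment", or "test"
    polarity: str  # "positive" (asserted) or "negative" (negated)


@dataclass(frozen=True)
class EEGRecord:
    record_id: str
    age: int
    concepts: tuple  # tuple of Concept objects extracted from the report


def score(record, required):
    """Count how many required (kind, text) pairs the record asserts."""
    asserted = {(c.kind, c.text) for c in record.concepts
                if c.polarity == "positive"}
    return sum(1 for item in required if item in asserted)


def retrieve(records, max_age, required):
    """Return record IDs within the age limit, best match first."""
    hits = [(score(r, required), r.record_id)
            for r in records if r.age <= max_age]
    hits = [(s, rid) for s, rid in hits if s > 0]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [rid for _, rid in hits]


# Assumption: "young" is mapped to age <= 30; the poster gives no cutoff.
QUERY_MAX_AGE = 30
QUERY_CONCEPTS = [("problem", "focal cerebral dysfunction"),
                  ("treatment", "topamax")]

# Made-up records for illustration only.
RECORDS = [
    EEGRecord("rec-1", 22, (
        Concept("focal cerebral dysfunction", "problem", "positive"),
        Concept("topamax", "treatment", "positive"))),
    EEGRecord("rec-2", 25, (
        Concept("focal cerebral dysfunction", "problem", "positive"),
        Concept("topamax", "treatment", "negative"))),
    EEGRecord("rec-3", 67, (
        Concept("focal cerebral dysfunction", "problem", "positive"),
        Concept("topamax", "treatment", "positive"))),
    EEGRecord("rec-4", 19, (
        Concept("headache", "problem", "positive"),)),
]

ranked = retrieve(RECORDS, QUERY_MAX_AGE, QUERY_CONCEPTS)
print(ranked)
```

Treating polarity as part of the match is the point of the sketch: a report stating that the patient was not on Topamax should rank below one that asserts the treatment.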

The TUH EEG Corpus

• An electroencephalogram (EEG) measures electrical activity in the brain.

• A typical EEG exam following the 10-20 system uses 21 electrodes.

• The TUH EEG Corpus contains over 28,000 sessions collected from 15,000+ patients over a period of 14 years at Temple University Hospital.

• EEG data is stored in EDF files (sample header fields are listed in the Field/Description/Example table below).

• Clinical data has many challenges including variations in channel configurations, report formats, and noise due to patient movements.

• Deidentified medical reports are available that include a brief patient history and a neurologist’s findings.

• Data is unstructured and the report formats vary.

• Hidden Markov models are used for sequential decoding.

• Deep learning is used to model spatial/temporal context.

• Active learning is used for unsupervised training on the signal data.
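Sequential decoding with a hidden Markov model is commonly done with the Viterbi algorithm, which finds the most likely state sequence for a series of observations. The sketch below illustrates that algorithm on a toy problem; it is not the project's model. The two states (`background`, `spike`), the quantized observations (`low`, `high`), and every probability are made-up values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path for the observations (log space)."""
    # Scores for the first observation.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states,
                       key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model over quantized signal amplitude.
STATES = ["background", "spike"]
START_P = {"background": 0.9, "spike": 0.1}
TRANS_P = {"background": {"background": 0.9, "spike": 0.1},
           "spike": {"background": 0.3, "spike": 0.7}}
EMIT_P = {"background": {"low": 0.9, "high": 0.1},
          "spike": {"low": 0.2, "high": 0.8}}

decoded = viterbi(["low", "low", "high", "high", "low"],
                  STATES, START_P, TRANS_P, EMIT_P)
print(decoded)
```

Working in log space avoids numerical underflow on long recordings, where products of many small probabilities would otherwise round to zero.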

Aim 4: Evaluation and Analysis

• Experts will be recruited to generate query topics. Several sources (e.g., ClinicalTrials.gov, PubMed) will be used to develop topics.

• Resident physicians and medical students will be recruited as judges to assess the relevance of retrieved results.

• We will conduct user acceptance testing using three focus groups: expert annotators, clinicians and medical students.

• Several formal user acceptance studies will be conducted, measuring user satisfaction on a standard 5-point Likert scale and collecting open-ended feedback on the system's perceived value.

• We will quantitatively assess the impact on productivity by measuring the amount of time required to review an EEG and generate a report.

• We will also assess the value of the patient cohort retrieval system for medical student training by working with medical students in training at Temple’s School of Medicine.
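The relevance judgments and Likert responses described above could be summarized along the following lines. This is a sketch under stated assumptions: the poster does not name a retrieval metric, so precision at k is used here only as a common example, and the function names and all data values are hypothetical.

```python
from statistics import mean


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved records that judges marked relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    return sum(1 for rid in top if rid in relevant_ids) / k


def likert_summary(responses):
    """Mean and per-score counts for responses on a 5-point Likert scale."""
    if not responses or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be non-empty integers from 1 to 5")
    counts = {score: responses.count(score) for score in range(1, 6)}
    return {"mean": mean(responses), "counts": counts}


# Made-up example data for illustration only.
ranked = ["rec-1", "rec-7", "rec-3", "rec-9"]
judged_relevant = {"rec-1", "rec-3"}
print(precision_at_k(ranked, judged_relevant, 2))

survey = likert_summary([5, 4, 4, 3, 5])
print(survey["mean"], survey["counts"])
```

Reporting the full distribution of scores alongside the mean is usually worthwhile for Likert data, since the mean alone hides polarized responses.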

[Diagram labels, query processing and retrieval: Query; Query Analysis; Medical Concept Recognition; Event Attribute Identification; Clinical Event Recognition: Signal/Type/Modality/Polarity; Patient Age Recognition; Patient Gender Recognition; KeyPhrase Extraction; KeyPhrase Expansion; EEG Reports; EEG Signals; EEG Signal and Report Analysis; Spatial/Temporal Expression & Relation Identification; EEG Index; Patient EEG Retrieval; Re-ranking; Ranked Patient EEGs; QMKG (Qualified Medical Knowledge Graph); EEG Signal Events & Features; EEG Record; Similar Query]

[Diagram labels, system overview: TUH EEG Signals; EEG Reports; Patient Cohort Retrieval; Ranked Patient EEGs; EEG Record; EEG Signal; EEG Events; EEG Record Events; Spatial Information; Temporal Information; Medical Concepts (Clinical Picture)]

[Diagram labels, concept recognition in EEG reports: CRF-Based Clinical Event Boundary Detector; SVM-Based Event Type Detector; SVM-Based Event Modality Detector; SVM-Based Event Polarity Detector; Medical Concept Recognizer; CRF-Based EEG Event Attribute Detector; SVM-Based Attribute Type Detector; EEG Section Recognition; EEG Records; EEG Events & Features]

• Resources: Clinical Records (i2b2); TREC Medical Records; Gigaword (NYT, AFP, APW, CAN, LTW, XIN); Wikipedia; PubMed Central; Brown Clustering; UMLS

• Event types: Test; Problem; Treatment; Clinical Department; Evidential; Occurrence

• Modality values: Factual; Conditional; Possible; Proposed

• Polarity values: Positive; Negative

• EEG concepts: EEG event; EEG pattern; EEG activity

• EEG attribute types: EEG technique; EEG impression; EEG clinical interpretation; Other attribute

Field   Description                     Example
1       Version Number                  0
2       Patient ID                      TUH123456789
3       Gender                          M
4       Date of Birth                   57
8       Firstname_Lastname              TUH123456789
11      Startdate                       01-MAY-2010
13      Study Number/ Tech. ID          TUH123456789/TAS X
14      Start Date                      01.05.10
15      Start Time                      11.39.35
16      Number of Bytes in Header       6400
17      Type of Signal                  EDF+C
19      Number of Data Records          207
20      Dur. of a Data Record (Secs)    1
21      No. of Signals in a Record      24
27      Signal[1] Prefiltering          HP:1.000 Hz LP:70.0 Hz N:60.0
28      Signal[1] No. Samples/Rec.      250
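The header fields in the table can be read directly from the first 256 bytes of an EDF file, which the EDF specification defines as fixed-width ASCII fields. The sketch below parses such a header and checks the example values against each other. The field names in `MAIN_HEADER_FIELDS`, the helper functions, and the way the patient and recording strings are assembled in `make_example_header` are assumptions for illustration; the header is synthetic, not read from the corpus.

```python
# Fixed-width ASCII fields of the EDF main header: (name, width in bytes).
# The widths sum to 256 bytes, as defined by the EDF specification.
MAIN_HEADER_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),
    ("start_time", 8),
    ("header_bytes", 8),
    ("reserved", 44),          # holds "EDF+C" or "EDF+D" in EDF+ files
    ("num_records", 8),
    ("record_duration_s", 8),
    ("num_signals", 4),
]


def parse_main_header(raw: bytes) -> dict:
    """Split the first 256 bytes of an EDF file into named string fields."""
    if len(raw) < 256:
        raise ValueError("EDF main header must be at least 256 bytes")
    fields, offset = {}, 0
    for name, width in MAIN_HEADER_FIELDS:
        fields[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    return fields


def make_example_header() -> bytes:
    """Build a synthetic header from the example values in the table."""
    values = [
        "0",
        "TUH123456789 M 57 TUH123456789",            # assumed subfield layout
        "Startdate 01-MAY-2010 TUH123456789/TAS X",  # assumed subfield layout
        "01.05.10",
        "11.39.35",
        "6400",
        "EDF+C",
        "207",
        "1",
        "24",
    ]
    padded = (v.ljust(w) for v, (_, w) in zip(values, MAIN_HEADER_FIELDS))
    return "".join(padded).encode("ascii")


header = parse_main_header(make_example_header())

# Each signal adds a 256-byte signal header after the 256-byte main header.
expected_header_bytes = 256 * (1 + int(header["num_signals"]))

# Total recording length and Signal[1] sample rate from the table values.
total_seconds = int(header["num_records"]) * float(header["record_duration_s"])
samples_per_record = 250  # table field 28, stored in the per-signal header
sample_rate_hz = samples_per_record / float(header["record_duration_s"])

print(header["header_bytes"], expected_header_bytes)
print(total_seconds, sample_rate_hz)
```

The example values are internally consistent: 24 signals imply a header of 256 × 25 = 6400 bytes, matching field 16, and 250 samples per 1-second record correspond to a 250 Hz sample rate.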

www.hlt.utdallas.edu