
Open Health Natural Language Processing Consortium

(OHNLP)

Mayo Clinic: Guergana Savova, Ph.D.

James Masanz

clinicalnlp@mayo.edu

IBM Watson Research: Anni Coden, Ph.D.

Michael Tanenblatt

mednlp@us.ibm.com

Overview

• OHNLP? Oh, NLP?

• Demo of a clinical OHNLP system (cTAKES)

• Demo of a medical OHNLP system (MedKAT) with extensions to pathology (/P)

• How can I adapt the system to my data?

• Lively discussion: how can I get involved, OHNLP future steps…

Open Health Natural Language Processing Consortium

• www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence)

• Goal:

• Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.

• Two open-source releases as part of OHNLP:

• Mayo’s pipeline for processing clinical notes (cTAKES)

• IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)

Other non-OHNLP clinical NLP Systems

• Proprietary:

• medLEE (Columbia University)

• Topaz (University of Pittsburgh)

• Vanderbilt University

• caTIES (University of Pittsburgh)

• MPLUS/Onyx (University of Utah)

• VA Hospital system

• Open Source:

• i2b2 HITEx (Health Information Text Extraction)

Clinical example: clinical Text Analysis and Knowledge Extraction System (cTAKES)

Presenters: Guergana Savova

James Masanz

Overview

• cTAKES

• Developed at Mayo Clinic

• Goals:

• Phenotype extraction

• Generic – to be used for a variety of retrievals and use cases

• Expandable – at the information model level and methods

• Modular

• Cutting edge technologies – best methods combining existing practices and novel research with rapid technology transfer

• Best software practices (80M+ notes)

• Commitment to both R and D in R&D

cTAKES: Components

• Clinical narrative as a sublanguage

• Core components:

• Sentence boundary detection (OpenNLP technology)

• Tokenization (rule-based)

• Morphologic normalization (NLM’s LVG)

• POS tagging (OpenNLP technology)

• Shallow parsing (OpenNLP technology)

• Named Entity Recognition:

• Dictionary mapping (lookup algorithm)

• Machine learning (MAWUI)

• Negation and context identification (NegEx)

Output Example: Disorder Object

• “No evidence of unstable angina.”

• Disorder:

• Text: unstable angina

• Associated code: SNOMED 4557003

• Named entity type: disease/disorder

• Status: current

• Negation: true
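The disorder object above can be approximated with a small data structure plus a NegEx-style check. The Python sketch below is illustrative only and is not the cTAKES implementation: the `Disorder` class, the short `NEGATION_TRIGGERS` list, and the `window` parameter are assumptions made for this example. The SNOMED code is the one shown on the slide.

```python
from dataclasses import dataclass

# Hypothetical trigger phrases; a real NegEx trigger list is much longer.
NEGATION_TRIGGERS = ("no evidence of", "no", "denies", "without")


@dataclass
class Disorder:
    text: str
    code: str
    entity_type: str = "disease/disorder"
    status: str = "current"
    negated: bool = False


def is_negated(sentence: str, mention: str, window: int = 5) -> bool:
    """True if a trigger phrase occurs within `window` tokens before the mention."""
    tokens = sentence.lower().rstrip(".").split()
    mention_tokens = mention.lower().split()
    n = len(mention_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == mention_tokens:
            # Only the tokens inside the window are searched for triggers.
            preceding = " ".join(tokens[max(0, start - window):start])
            padded = f" {preceding} "
            return any(f" {t} " in padded for t in NEGATION_TRIGGERS)
    return False


sentence = "No evidence of unstable angina."
disorder = Disorder(text="unstable angina", code="SNOMED 4557003")
disorder.negated = is_negated(sentence, disorder.text)
print(disorder)
```

The `window` argument plays the role of a negation window: a smaller value makes the check ignore triggers that occur far from the mention.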

Methods

• Preliminary results:

• Savova, Guergana; Kipper-Schuler, Karin; Buntrock, James and Chute, Christopher. 2008. UIMA-based clinical information extraction system. LREC 2008: Towards enhanced interoperability for large HLT systems: UIMA for NLP.

• Manuscript with detailed system description and evaluation under review (JAMIA)

cTAKES demo

Medical example: Medical Knowledge Analysis System

MedKAT and MedKAT/P

Presenters: Anni Coden

Michael Tanenblatt

Overview

• MedKAT and MedKAT/P

• Developed at IBM

• Goal:

• Identification of concepts and their attributes based on a standard or proprietary terminology/ontology

• /P adaptation to pathology reports – relation extraction

• Modular, Generic, Expandable

• Terminology, Conceptual Model

• Easy adaptation to specific corpus and conventions

• Integration into institutional system

• Ongoing commitment to Research and Development

Core Components

• Document structure

• Syntactic tools (tokenization ... shallow parsing)

• Concept identification

• Negation

• Relationship extraction

Extracted data      F-score
Anatomic site       0.95
Histology           0.98
Size                1.00
Date                1.00
Grade               0.98
Gross Desc          0.80
Lymph Nodes         0.81
Primary Tumor       0.82
Metastatic Tumor    0.65
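The slide does not say which F-score variant was used. Assuming the common balanced F-measure (the harmonic mean of precision and recall), a score can be computed from counts of true positives, false positives, and false negatives as in this sketch. The counts in the usage line are made up for illustration and do not come from the MedKAT/P evaluation.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
print(round(f_score(tp=8, fp=2, fn=2), 2))  # 0.8
```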

Document Structure

Output

Cancer Disease Knowledge Representation Model

• Query by Model / Cancer

• Detailed view of annotations in Document Analyzer

• http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.index.html

Adaptation

Presenters: Anni Coden

Michael Tanenblatt

Adaptation

• Sentence breaks

• Text case

• Part of speech tags

• Shallow parser

• Dictionary lookup

• Document structure

Sentence Breaks

• Some solutions:

• Use annotator to re-break sentences

• Retrain tagger

Case/Part of Speech Tags

• Some solutions:

• Retrain tagger

• Use UIMA annotator to create a “true case” view

Part of Speech Tags

• Some solutions:

• Retrain tagger

• Use dictionary lookup to modify incorrect tags

• Create rule-based annotator to modify incorrect tags
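The dictionary-lookup fix can be sketched as a post-processing step over tagger output. This is an illustration, not MedKAT or cTAKES code: the `TAG_OVERRIDES` table, its entries, and the `correct_tags` function are hypothetical, and the tags are assumed to follow the Penn Treebank convention.

```python
# Hypothetical domain dictionary: lower-cased token -> tag it should carry
# in this corpus (Penn Treebank tag names).
TAG_OVERRIDES = {
    "left": "JJ",    # "left breast": adjective, not a past-tense verb
    "biopsy": "NN",
}


def correct_tags(tagged, overrides=TAG_OVERRIDES):
    """Return a copy of (token, tag) pairs with dictionary overrides applied."""
    return [(tok, overrides.get(tok.lower(), tag)) for tok, tag in tagged]


tagged = [("Left", "VBD"), ("breast", "NN"), ("biopsy", "VB")]
print(correct_tags(tagged))
```

An unconditional override like this is crude, since a word such as "left" can legitimately be a verb. A rule-based annotator would add context conditions before changing a tag.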

Shallow Parser

Dictionary Lookup

• Dictionary entries can be added, changed, deleted

• Dictionary entry attributes can be added, changed, deleted

• Search parameters can be modified

• Post processing filters

• Tokenization of text and dictionary should be the same
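The last point, using the same tokenization for text and dictionary, can be illustrated with a small longest-match lookup. The sketch below is a simplified stand-in and not the lookup algorithm of either system: `tokenize`, `ENTRIES`, `lookup`, and the placeholder codes are all assumptions made for this example.

```python
import re


def tokenize(text: str) -> tuple:
    """Single tokenizer used for both dictionary entries and document text."""
    return tuple(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical dictionary; the codes are placeholders, not real terminology codes.
ENTRIES = {"Non-small cell carcinoma": "C001", "Lymph node": "C002"}
INDEX = {tokenize(term): code for term, code in ENTRIES.items()}
MAX_LEN = max(len(key) for key in INDEX)


def lookup(text: str):
    """Greedy longest-match lookup of dictionary terms in `text`."""
    tokens = tokenize(text)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            code = INDEX.get(tokens[i:i + n])
            if code is not None:
                matches.append((" ".join(tokens[i:i + n]), code))
                i += n
                break
        else:
            i += 1
    return matches


print(lookup("Non small cell carcinoma; one lymph-node examined."))
```

Because both sides pass through `tokenize`, the hyphenation differences between "Non-small" and "Non small", or "Lymph node" and "lymph-node", do not prevent a match. With different tokenizers on each side, such variants would be missed.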

Document Structure

• Plain text or XML (e.g., CDA)

• Processes specific document section types (e.g., diagnosis)

• Detection of formatting (e.g., bullets)

• Detection of relations between sections

• Making implicit conventions explicit (e.g., meaning of title)
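For plain-text reports, section detection often starts from heading conventions. The sketch below assumes that headings are upper-case lines ending in a colon, which is an assumption about the corpus rather than something MedKAT prescribes. The `split_sections` function and the sample report are hypothetical.

```python
import re

# Assumed convention: a heading is an upper-case line ending with a colon.
HEADING = re.compile(r"^([A-Z][A-Z ]+):\s*$")


def split_sections(report: str) -> dict:
    """Map each heading (lower-cased) to the text lines that follow it."""
    sections, current = {}, None
    for line in report.splitlines():
        match = HEADING.match(line.strip())
        if match:
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}


report = """FINAL DIAGNOSIS:
Invasive ductal carcinoma.
GROSS DESCRIPTION:
- Specimen received in formalin.
"""
sections = split_sections(report)
print(sections["final diagnosis"])
```

Once sections are separated, downstream annotators can be restricted to specific section types such as the diagnosis, and formatting cues such as the leading bullet in the gross description can be handled per section.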

Discussion: Future of OHNLP.ORG

• Provided seed annotators and tools

• Goal: growing community

• Annotators, tools

• Methodologies

• Gold standards

• Common type system for plug-and-play

• What are the hurdles?

Hands-on Customization

MedKAT

• Dictionary adaptation

• Concept identification parameters

• Document structure detection

cTAKES

• Negation window

• Lookup window

• Dictionary modifications

Questions?
