Open Health Natural Language Processing Consortium


Upload: nickolas-knight

Post on 18-Jan-2018


TRANSCRIPT

Page 1: Open Health Natural Language Processing Consortium


Open Health Natural Language Processing Consortium

• www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence)

• Goal
  • Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.

• Two open source releases as part of OHNLP
  • Mayo’s pipeline for processing clinical notes (cTAKES)
  • IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)

Page 2: Open Health Natural Language Processing Consortium


Page 3: Open Health Natural Language Processing Consortium


Page 4: Open Health Natural Language Processing Consortium


cTAKES Technical Details

• Open source release March 15, 2009
  • www.ohnlp.org
  • Downloads: Documentation and Downloads
  • Technical details: Publications

• Framework
  • IBM’s Unstructured Information Management Architecture (UIMA) open source framework

• Methods
  • Natural Language Processing methods (NLP)

• Application
  • High-throughput phenotype extraction system (80M+ notes; 80B+ tokens)
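The framework bullet can be made concrete with a minimal Python sketch of the idea behind a UIMA-style pipeline: annotators run in sequence over one shared document object and add typed, offset-based annotations to it. This is not the UIMA API (UIMA itself is a Java framework); `Document`, `Annotation`, `run_pipeline`, and the two toy annotators are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """A typed character span over the document text (hypothetical class)."""
    type: str
    begin: int
    end: int


@dataclass
class Document:
    """Shared container that every annotator reads from and writes to."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def select(self, type_: str) -> List[Annotation]:
        # Returns a new list, so callers may add annotations while iterating.
        return [a for a in self.annotations if a.type == type_]


Annotator = Callable[[Document], None]


def sentence_annotator(doc: Document) -> None:
    # Toy rule: a sentence runs up to the next '.', '!' or '?' (or end of text).
    for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def token_annotator(doc: Document) -> None:
    # Relies on Sentence annotations produced by an earlier annotator.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation("Token", sent.begin + m.start(), sent.begin + m.end())
            )


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    "Patient denies chest pain. No fever.",
    [sentence_annotator, token_annotator],
)
sentences = [doc.covered_text(a) for a in doc.select("Sentence")]
tokens = [doc.covered_text(a) for a in doc.select("Token")]
```

Because each annotator only reads and appends annotations on the shared object, components can be swapped or reordered as long as their input annotation types are available.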

Page 5: Open Health Natural Language Processing Consortium


cTAKES Components

• Core components
  • Sentence boundary detection (OpenNLP)
  • Tokenization (rule-based)
  • Morphologic normalization (NLM’s “norm”)
  • POS tagging (OpenNLP)
  • Shallow parsing (OpenNLP)
  • Named Entity Recognition
    • Diseases/disorders, signs/symptoms, procedures, anatomical sites, medications
    • Dictionary mapping (lookup algorithm)
    • Machine learning (MAWUI)
  • Negation and status identification (NegEx)
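To illustrate how the last two stages fit together, here is a minimal Python sketch of dictionary-lookup entity recognition followed by a NegEx-style negation check. It is a simplified stand-in, not cTAKES code: the tiny `DICTIONARY`, the `NEGATION_TRIGGERS` and `TERMINATION_TERMS` sets, the greedy longest-match strategy, and the five-token look-back window are all assumptions chosen for the example. The published NegEx algorithm uses a considerably larger set of trigger phrases.

```python
import re

# Hypothetical mini-dictionary: token tuple -> semantic type.
DICTIONARY = {
    ("chest", "pain"): "sign/symptom",
    ("fever",): "sign/symptom",
    ("pneumonia",): "disease/disorder",
    ("aspirin",): "medication",
}
MAX_PHRASE_LEN = max(len(key) for key in DICTIONARY)

# Illustrative trigger lists only; real lists are much longer.
NEGATION_TRIGGERS = {"no", "denies", "without", "not"}
TERMINATION_TERMS = {"but", "however", "except"}
WINDOW = 5  # number of preceding tokens to inspect (assumed value)


def tokenize(sentence):
    """Lowercase, rule-based word tokenization."""
    return re.findall(r"\w+", sentence.lower())


def lookup_entities(tokens):
    """Greedy longest-match dictionary lookup; returns (start, end, type) token spans."""
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in DICTIONARY:
                entities.append((i, i + n, DICTIONARY[key]))
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts at this token
    return entities


def is_negated(tokens, start):
    """Scan backwards from the entity; a termination term blocks earlier triggers."""
    for tok in reversed(tokens[max(0, start - WINDOW):start]):
        if tok in TERMINATION_TERMS:
            return False
        if tok in NEGATION_TRIGGERS:
            return True
    return False


def annotate(sentence):
    tokens = tokenize(sentence)
    return [
        {"text": " ".join(tokens[s:e]), "type": t, "negated": is_negated(tokens, s)}
        for s, e, t in lookup_entities(tokens)
    ]


results = annotate("Patient denies chest pain but has fever")
```

In this example "chest pain" is marked as negated because "denies" precedes it, while "fever" is not, since the scan back from "fever" stops at "but".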

Page 6: Open Health Natural Language Processing Consortium


cTAKES Type System

Page 7: Open Health Natural Language Processing Consortium


cTAKES example

Page 8: Open Health Natural Language Processing Consortium


Current Efforts - I

• Anaphoric relations and coreference (as part of the Ontology Development and Information Extraction project, University of Pittsburgh) (2008 - 2011)
  • In collaboration with Chapman and Crowley

• Semantic processing of the clinical text (in collaboration with Palmer, Martin and Ward, University of Colorado) (2009 - 2011)
  • Treebanking (deep parses)
  • Predicate-argument structure and semantic labeling (PropBanking)
  • UMLS relations (except temporal relations)

Page 9: Open Health Natural Language Processing Consortium


Current Efforts - II

• Temporal relation discovery (2010-2014)
  • In collaboration with Palmer, Martin and Ward, University of Colorado

• Lexical resources for the clinical domain (2010-2015)
  • In collaboration with Chapman, University of Colorado and Elhadad, Columbia University
  • A la Treebank and clinical named entities with attributes and modifiers