Jianwei Leng, MS1,2, Brett R. South, MS1,2,3, Brad Adams, MS1,2, Tyler B. Forbush, Shuying Shen, MStat1,2,3, Scott L. DuVall, PhD, Wendy Chapman, PhD5
1VA Salt Lake City Health Care System, IDEAS Center, University of Utah, 2Department of Internal Medicine, 3Biomedical Informatics, and 4Radiology, University of Utah, Salt Lake City, UT, 5University of California, San Diego, Division of Biomedical Informatics, La Jolla, California
• Client application that runs on most operating systems that support Java, including Microsoft Windows x86/x64 platforms, Apple Mac OS X, Sun Solaris, and Linux.
• Supports standardized formats, including a file folder system and structured XML inputs and outputs, allowing integration with other open source tools for annotation and knowledge management, including Knowtator [3] and Protégé [4].
Objectives
Systems Architecture
Server Integration and Future Work
References: 1. South BR, Shen S, Leng J, Forbush T, DuVall SL, Chapman WW. A Prototype Tool Set to Support Machine-Assisted Annotation. In: BioNLP 2012. Montreal, Canada; 2012.
Contact information: [email protected], VA Consortium for Healthcare Informatics Research, 500 Foothill Drive, Salt Lake City, UT 84148, (801) 499-1175
Acknowledgements: VA Consortium for Healthcare Informatics Research (CHIR), VA HSR HIR 08-374, the VA Informatics and Computing Infrastructure (VINCI), VA HIR 08-204, and NIH Grant U54 HL 108460 for integrating Data for Analysis, Anonymization and Sharing (iDASH), NIGMS 7R01GM090187.
eHOST System Features
Availability and Documentation
Abstract
• Introduce an open source annotation tool called the Extensible Human Oracle Suite of Tools (eHOST) and a server-side administration component called the Chart Review Administration Server for Patient Review (CASPR).
• Describe basic and advanced system functionalities, including an annotation interface, error analysis and reporting, integration of machine-assisted approaches, and semi-automated curation of information.
Manually annotating documents is costly, time-consuming and labor-intensive. A clear opportunity exists to develop new tools and assess functionalities that introduce efficiencies into the process of generating reference standards for a variety of development tasks. In the biomedical domain, an infrastructure is needed that will support large-scale secure annotation of sensitive clinical data as well as distributed annotation approaches.
Figure 1. eHOST (Extensible Human Oracle Suite of Tools)
• Oracle Mode: Find and annotate identical strings of text using the same annotation class (Figure 5).
• Semi-Automated curation: reduce candidate entries in pre-annotation dictionaries and improve processing speed of machine-assisted pre-annotation.
• Integrated regular expressions builder: build and apply custom regular expression libraries to identify specific terms, or other information that commonly occurs in clinical reports.
• Integrated UMLS Search function: to support data normalization tasks often associated with annotation of clinical texts (Figure 4).
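The Oracle Mode behavior listed above (propagating one annotation to every identical string) can be illustrated with a minimal Python sketch. This is not eHOST's implementation, which is written in Java and not described in detail here. The `Annotation` record, the `oracle_annotate` function, the case-sensitive whole-word matching rule, and the sample note are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    text: str    # covered span text
    label: str   # annotation class from the schema (hypothetical name)


def oracle_annotate(document, seed):
    """Return one annotation per exact, whole-word occurrence of seed.text.

    Assumption: matching is case-sensitive and respects word boundaries.
    """
    # re.escape keeps punctuation in the seed from being read as regex syntax.
    pattern = re.compile(r"(?<!\w)" + re.escape(seed.text) + r"(?!\w)")
    return [
        Annotation(m.start(), m.end(), m.group(0), seed.label)
        for m in pattern.finditer(document)
    ]


# Hypothetical clinical snippet; the annotator marks the first "chest pain".
note = "Denies chest pain. Chest pain resolved; no chest pain on exertion."
seed = Annotation(7, 17, "chest pain", "Symptom")
matches = oracle_annotate(note, seed)

for m in matches:
    print(m.start, m.end, m.text, m.label)
```

Under these assumptions the capitalized "Chest pain" is skipped. A case-insensitive variant would pass `re.IGNORECASE` to `re.compile`.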
Basic System Features
• Schema builder: using eHOST and/or CASPR, users can design annotation schema representing information classes, assign attributes, and build relations between classes (Figures 1 and 2).
• Corpus management: eHOST provides workspace and active project editors. CASPR supports a MySQL database backend (Figures 3 and 5).
• Annotation mode: identify and mark candidate spans of text using annotation schema (Figure 1). eHOST also supports difference matching, error checking and calculation of standard reporting metrics.
• Coupling eHOST with CASPR provides a means for distributed annotation, allowing a study coordinator to quickly set up new annotation projects, plan and re-plan annotation assignments, and manage submitted data.
• Data are written and stored in a queryable database.
• CASPR manages which annotations belong to which projects, datasets, tasks, batches, and annotator assignments, allowing appropriate presentation of annotations to any assigned task in a project workflow.
• Future directions will include a more formal usability assessment that will integrate distributed annotation using the eHOST/CASPR interfaces.
• API documentation, a demo project, and source code for eHOST are available at http://code.google.com/p/ehost/.
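The difference matching and standard reporting metrics mentioned under Annotation mode can be sketched as follows. The poster does not state which matching criteria or metrics eHOST computes, so this example assumes exact span-and-class matching and reports precision, recall, and F1. The function name `span_agreement` and the sample annotation tuples are hypothetical.

```python
def span_agreement(reference, candidate):
    """Compare two annotation sets of (start, end, label) tuples.

    Assumption: two annotations match only if offsets and class are identical.
    """
    ref, cand = set(reference), set(candidate)
    matched = ref & cand
    precision = len(matched) / len(cand) if cand else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Differences an adjudicator would need to review.
        "only_in_reference": sorted(ref - cand),
        "only_in_candidate": sorted(cand - ref),
    }


# Hypothetical annotations from two annotators on the same document.
a1 = [(7, 17, "Symptom"), (43, 53, "Symptom"), (60, 68, "Finding")]
a2 = [(7, 17, "Symptom"), (43, 53, "Finding"), (60, 68, "Finding"), (70, 75, "Symptom")]

report = span_agreement(a1, a2)
print(report)
```

Treating one annotator as the reference is a common convention for pairwise agreement. Relaxed (overlapping) matching would need a different comparison than set intersection.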
Figure 2. Chart Review Administration Server for Patient Review (CASPR)
[Diagram labels: Study Coordinator; CASPR Annotation; Admin Server; Corpus & Schemas; Chart Review Administration Server for Patient Review (CASPR); eHOST (five clients); Internet]
Figure 4. Embedded UMLS Searching Function in eHOST
Figure 3. Complete Solution for Document-Level Review Using eHOST and CASPR
[Diagram labels: MySQL Database; Annotations; Coordinator; Web Interface; Generate Schema; Load files; Assign tasks & Define Workflow; Annotators A1, A2, A3; Adjudicators Adj1, Adj2; Sync to eHOST for Annotator A1; Task for A1; eHOST A1; Human Annotation Using eHOST; Adjudication; eHOST; CASPR]
Figure 5. eHOST/CASPR Workflow
Realizing Efficient Annotation with eHOST: extensible Human Oracle Suite of Tools