Automated Abstracting - NCRA San Antonio 2015
TRANSCRIPT
Why this topic?
Cancer in the 1960s – hushed tones
Cancer in the 1990s – we are finding cures!
Cancer in 2015 – we can be very hopeful, but
realize it may be an ongoing malady
For example, we are more aware of late
after effects with pediatric patients,
and now with adults
A Snapshot in Time
Casefinding
Follow-up (80% from reference year, 90% for 5 years)
Abstracting (90% in 6 months)
Reporting
“The abstract is the basis of registry functions” (NCRA Informational Abstracts, March 19, 2015)
To what extent can we automate the process of
completing an abstract?
Will “auto-abstracting” enhance the quality and
timeliness of your cancer reporting?
We will examine:
Success with automation to date
• Cancer casefinding from pathology reports
• Cancer casefinding from head and neck diagnostic imaging
reports
• Cancer record linkage
• Extraction of synoptic data from text reports
• Our understanding of the pathophysiology of neoplasia
• Investigations used to diagnose cancer
• Classification of malignancies
• New cancer therapies
• Evolving reporting standards
• Where diagnosis and treatment occur
• All of these are assisted by advances in information technology
Reviewing Our Changing Environment
Harrison’s Principles of Internal Medicine, 1970
Classification of Leukemias, 1970
Harrison’s Principles of Internal Medicine, 2005
Classification of Leukemias and Lymphoid Malignancies, 2005
2018?
Diagnosing Lymphoma
1995
• Radiograph
• CT Scan
• Biopsy
• Blood Work
2015
• Radiograph
• CT Scan
• Biopsy
• Blood Work
• CD 30 Expression
• Anaplastic Lymphoma Kinase (ALK)
• PET Scan (FDG Avidity)
Source(s): NEJM Vol 333, No 12, p 784, Sep 21, 1995 | NEJM Vol 372, No 7, p 650, Feb 12, 2015
2009-2010
Increasing Complexity & Volume of Cancer Data
Genetic Abnormalities in Chronic Lymphocytic Leukemia
Sequence the DNA of 91 patients to examine the spectrum of mutations in this disease
All patients are clinically the same but genetically different
And may respond to different treatments!
NEJM Vol 365, No 26
Cancer is a progressive disease, so very accurate information is necessary for:
• Personalized / Precision Medicine
• Disease detection / screening
• Determining Extent of Disease
• Detection of recurrence
• Outcome and its time course
So an automated design must incorporate:
• Integration of registry data with screening programs
• Extraction of staging information
• How to continue to gather data
• How to identify and track recurrence over the time course of cancer
• Again, the current abstract format does not allow for some of these data elements
Plotting the time course of events:
• Each child’s continuum of care is a series of events
• Each event is annotated with socio-economic details
• Quality of Life considerations
2009: 1st Diagnosis, Treatment
2010: Active Follow-up
2014: Recurrence, Treatment
2015: Active Follow-up
Active Cancer Registry (Pediatric)
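The idea of a continuum of care as a series of annotated events could be sketched as a small data structure. This is only an illustration, not the registry's actual design: the class names `CareEvent` and `PatientTimeline`, the patient identifier, and the pairing of years with events (one reading of the timeline slide) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CareEvent:
    """One dated event in a patient's continuum of care (hypothetical model)."""
    year: int
    kind: str  # e.g. "1st Diagnosis", "Treatment", "Recurrence"
    # Free-form annotations such as socio-economic or quality-of-life notes.
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class PatientTimeline:
    patient_id: str
    events: List[CareEvent] = field(default_factory=list)

    def add(self, year: int, kind: str, **annotations: str) -> None:
        self.events.append(CareEvent(year, kind, dict(annotations)))
        # Stable sort: events in the same year keep their insertion order.
        self.events.sort(key=lambda e: e.year)

    def years_to_first_recurrence(self) -> Optional[int]:
        diagnosis = next((e for e in self.events if e.kind == "1st Diagnosis"), None)
        recurrence = next((e for e in self.events if e.kind == "Recurrence"), None)
        if diagnosis is None or recurrence is None:
            return None
        return recurrence.year - diagnosis.year


# Events as read from the timeline slide; the identifier is made up.
timeline = PatientTimeline("example-001")
for year, kind in [
    (2009, "1st Diagnosis"), (2009, "Treatment"), (2010, "Active Follow-up"),
    (2014, "Recurrence"), (2014, "Treatment"), (2015, "Active Follow-up"),
]:
    timeline.add(year, kind)

print(timeline.years_to_first_recurrence())
```

Storing events rather than a single flat abstract is what lets recurrence and follow-up be tracked over time, which is the point the slide makes about an active registry.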
• Who has access to the information – full audit trail required
• HIPAA / HITECH security and privacy requirements
• Financial penalties for data breaches
• Who is custodian of stored records over time, perhaps past
the legal retention period
Regulatory and Security Concerns
Source(s): The State of Cancer Care in America, 2014: A Report by the American Society of Clinical Oncology. Journal of Oncology Practice, JOP.2014.001386. Published online March 10, 2014.
Most Oncology Data are now Electronic
2008
2012-2013
EMR Adoption in 4,300 Oncology Practices
• Lab Systems
• EMR
• Dx Imaging
• Genetic Biomarkers
• Pharmaceutical
• Surgical
• Radiotherapy
• Death Registries
Increasingly all in Electronic Format
Cancer Care Information Sources
1. Getting the documents out of current Electronic Medical Record systems
2. Getting the data out of the documents
3. Keeping updated with the different vocabularies and standards
Challenges with Automating Sources
1. Eliminating redundancy
2. Assessing data reliability and quality
3. Determining how the data can be processed automatically
Challenges with Automating Data
To what extent can we automate the
process of completing an abstract?
The Initial Question
Use Artificial Intelligence (A.I.)
Our Approach
Relegate the repetitive to
Artificial Intelligence systems,
Focus registry staff on more complex tasks
Our Practical and Economical Approach
The capability of a computer to emulate
intelligent human behavior
Artificial Intelligence
Most successful in clearly demarcated domains
For example, it works well in protein
electrophoresis interpretation, but not in
general internal medicine
General Purpose: Limited Accuracy
Specific Purpose: Very High Accuracy
Where to Apply A.I.
[Figure: A.I. tasks arranged on a scale from Easy to Hard]
Tic-Tac-Toe
Checkers
Chess
GO
Image Processing
Voice Recognition
Locomotion
Composition
Natural Language Processing
Artificial Intelligence
• Your smart phone assistant can sort and present the
restaurants within one mile…
• THEN YOU DECIDE
• Artificial Intelligence can perform the basic data gathering
tasks for an abstract…
• THEN YOU APPLY YOUR EXPERTISE
A.I. on a Practical Level
Artificial Intelligence
Search Optimization
Neural Networks
Semantic Maps
Bayesian Networks
Quantum Computing
Speech Recognition
Computer Vision
Sub-Disciplines
Inference Systems
Expert Systems
Natural Language Processing
Markov Random Fields
Pattern Matching
State Machines
Inference Systems
Expert Systems
Natural Language Processing
Search Optimization
Neural Networks
Semantic Maps
Bayesian Networks
Quantum Computing
Speech Recognition
Computer Vision
Markov Random Fields
Pattern Matching
State Machines
AIM’s Focus
“Time flies like an arrow”
“Fruit flies like an apple”
“All the lymph nodes are negative”
“Diagnosis: ALL”
Natural Language Processing can distinguish
“Time flies like an arrow” from “Fruit flies like an apple”
and
“ALL” from “all”
• Words in Context
• Modifiers, especially negation
• Regular Expressions
• Grammar / Formatting / Punctuation
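The points above (words in context, negation, regular expressions) can be illustrated with a very small sketch. It is not the presenters' engine: the cue lists and the function names `mentions_all_leukemia` and `is_negated` are assumptions made for illustration, and a real system would draw its cues from a curated knowledge base.

```python
import re

# Hypothetical cue lists, kept deliberately tiny.
NEGATION_CUES = re.compile(r"\b(no|not|negative|without|free of)\b", re.IGNORECASE)
# Case-sensitive: only the upper-case acronym "ALL" is a candidate diagnosis.
ALL_ACRONYM = re.compile(r"\bALL\b")
# Context words that make a diagnostic reading of "ALL" plausible.
DIAGNOSIS_CONTEXT = re.compile(r"\b(diagnosis|dx|leukemia|impression)\b", re.IGNORECASE)


def mentions_all_leukemia(sentence: str) -> bool:
    """True when upper-case 'ALL' appears together with a diagnostic context word."""
    return bool(ALL_ACRONYM.search(sentence) and DIAGNOSIS_CONTEXT.search(sentence))


def is_negated(sentence: str) -> bool:
    """True when the sentence contains one of the negation cues."""
    return bool(NEGATION_CUES.search(sentence))


for text in ["All the lymph nodes are negative", "Diagnosis: ALL"]:
    print(text, "|", mentions_all_leukemia(text), "|", is_negated(text))
```

Case alone is not enough (a report typed entirely in capitals would defeat it), which is why the sketch also requires a context word before treating "ALL" as a diagnosis.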
Functional Architecture
Knowledge Base
Rule Base
Primary Search
Report Decomposition
Concept Isolation
Negation Detection
Concept-Value Synthesis
Pathology Report
Final Variable-Value Pairs
Exclusion Process
Refinement
A.I. Engine
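As a rough illustration of how the stages named on this slide might fit together, the sketch below runs a toy report through decomposition, concept isolation, negation detection, exclusion, and concept-value synthesis. The ordering of stages, the contents of `KNOWLEDGE_BASE`, the sample report, and all function names are assumptions; the slide lists the components but does not specify how they are wired.

```python
import re
from typing import Dict, List

# Hypothetical knowledge base: concept name -> pattern capturing its value.
KNOWLEDGE_BASE = {
    "histology": re.compile(r"(adenocarcinoma|squamous cell carcinoma)", re.IGNORECASE),
    "grade": re.compile(r"grade\s+(\d)", re.IGNORECASE),
    "lymph_nodes_positive": re.compile(r"(\d+)\s+of\s+\d+\s+lymph nodes", re.IGNORECASE),
}
NEGATION = re.compile(r"\b(no|not|negative for|without)\b", re.IGNORECASE)


def decompose(report: str) -> List[str]:
    """Report decomposition: naive split into sentences on periods and newlines."""
    return [s.strip() for s in re.split(r"[.\n]+", report) if s.strip()]


def extract_pairs(report: str) -> Dict[str, str]:
    """Return variable-value pairs found in non-negated sentences."""
    pairs: Dict[str, str] = {}
    for sentence in decompose(report):
        negated = bool(NEGATION.search(sentence))       # negation detection
        for concept, pattern in KNOWLEDGE_BASE.items():
            match = pattern.search(sentence)            # concept isolation
            if match and not negated:                   # exclusion process
                # Concept-value synthesis: keep the first value seen.
                pairs.setdefault(concept, match.group(1).lower())
    return pairs


# Made-up sample text, not taken from any real report.
sample = ("Invasive adenocarcinoma, grade 2. "
          "No squamous cell carcinoma identified. "
          "3 of 12 lymph nodes involved.")
print(extract_pairs(sample))
```

The sentence splitter here would break on decimal values such as "2.5 cm"; a production rule base would need a more careful decomposition step.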
1. Data processed automatically, confidence level 95%+
2. Data processed automatically, then reviewed by a CTR due to a lower confidence level
3. Incomplete or contradictory data that must be adjudicated by a CTR
Confidence Level of the Data
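The three tiers could be expressed as a simple routing rule. This is a minimal sketch, assuming the engine reports a numeric confidence per field: the 0.95 threshold comes from the "95%+" tier on the slide, while the `ExtractedField` structure, the `conflicting` flag, and the route names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_ACCEPT_THRESHOLD = 0.95  # the "95%+" tier from the slide


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]       # None when the engine found nothing
    confidence: float          # 0.0 to 1.0, assumed to come from the engine
    conflicting: bool = False  # True when sources disagree on the value


def triage(item: ExtractedField) -> str:
    """Route a field to one of the three tiers."""
    if item.value is None or item.conflicting:
        return "adjudicate"    # tier 3: incomplete or contradictory
    if item.confidence >= AUTO_ACCEPT_THRESHOLD:
        return "auto"          # tier 1: accepted automatically
    return "review"            # tier 2: processed, then reviewed by a CTR


print(triage(ExtractedField("histology", "adenocarcinoma", 0.99)))
print(triage(ExtractedField("grade", "2", 0.80)))
print(triage(ExtractedField("laterality", None, 0.0)))
```

Checking for missing or conflicting values first means a high confidence score can never push contradictory data past a CTR.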
We have already successfully automated:
• Cancer casefinding from pathology reports
• Cancer casefinding from head and neck diagnostic imaging
reports
• Cancer record linkage
• Extraction of synoptic data from text reports
• Pathology
• Diagnostic Imaging
• Genome Sequencing
• Biomarkers
• Surgery
• Chemotherapy
• Radiotherapy
• Follow-up
• Recurrence
• And ?
Next logical step: Our current project is to put it all together to
auto-populate the abstract!
With CDA (Clinical Document Architecture) we can consolidate documents of differing formats
It is maintained by HL7 and is a key component of government health initiatives
A document standard is now available
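To show what a common document standard makes possible, the sketch below reads section titles and text from a CDA-style XML document using Python's standard library. The sample document is made up and heavily simplified (a real CDA document carries many more required header elements), and `cda_sections` is a hypothetical helper; only the `urn:hl7-org:v3` namespace and the section/title/text layout are taken from the HL7 standard.

```python
import xml.etree.ElementTree as ET

HL7_NS = {"hl7": "urn:hl7-org:v3"}  # namespace used by HL7 CDA documents

# Made-up, heavily simplified sample; not a complete or valid CDA document.
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Pathology Report</title>
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Final Diagnosis</title>
          <text>Invasive ductal carcinoma, left breast.</text>
        </section>
      </component>
      <component>
        <section>
          <title>Specimen</title>
          <text>Left breast, core biopsy.</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def cda_sections(xml_text: str) -> dict:
    """Map each section title to its narrative text."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root.iterfind(".//hl7:section", HL7_NS):
        title = section.findtext("hl7:title", default="", namespaces=HL7_NS)
        text_el = section.find("hl7:text", HL7_NS)
        # itertext() also collects text nested inside formatting elements.
        text = "".join(text_el.itertext()).strip() if text_el is not None else ""
        sections[title.strip()] = text
    return sections


print(cda_sections(SAMPLE_CDA))
```

Because every source document exposes its sections the same way, the downstream extraction step does not need to know which system produced the report.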
Cancer Abstract Data Inputs
radiology
radiotherapy
chemotherapy
[Chart: sensitivity and specificity (%) from 1999 to 2015, vertical axis 50 to 100; values shown: 99.7 and 98.9]
Performance Improvement Over Time
The Experience Curve (E-Path)
SENSITIVITY: receiving the data that you want [few false negatives]
SPECIFICITY: excluding the data that you don’t want [few false positives]
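The two measures follow the standard definitions: sensitivity is true positives over all actual positives, and specificity is true negatives over all actual negatives. A short sketch, with illustrative counts that are made up and are not the presenters' results:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reportable cases that were caught: TP / (TP + FN)."""
    if true_pos + false_neg == 0:
        raise ValueError("no actual positives to evaluate")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-reportable cases correctly excluded: TN / (TN + FP)."""
    if true_neg + false_pos == 0:
        raise ValueError("no actual negatives to evaluate")
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only.
print(sensitivity(true_pos=90, false_neg=10))
print(specificity(true_neg=80, false_pos=20))
```

For casefinding, a missed cancer (false negative) is usually costlier than an extra report for a CTR to dismiss (false positive), so sensitivity is typically the figure watched most closely.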
Standardization
Consistency
Auto-Populated Fields
Completeness
Accuracy
Low Cost
Benefits of Auto-Abstracting
How far have we automated the process of completing an abstract?
To a very significant extent!
Will “auto-abstracting” enhance the quality and timeliness of your cancer reporting?
Yes!
Auto-abstracting will help to ensure that:
Your registry remains the basis of a comprehensive cancer record based on a national standard within an existing legal framework!
(not that expensive new EMR system in the hospital)
For your cancer registry, and the cancer registry community, this will raise your profile and the level of respect that you receive in your hospital!