integrating Data for Analysis, Anonymization, and SHaring
Natural Language Processing
Wendy Chapman, Danielle Mowery
Tools & Services
Collaborative Knowledge Authoring
Visualization of NLP Annotations
Evaluation Workbench
De-Identification
Classifier Development
Annotation Environment
Increase access to text through NLP
Decrease Burden of Developing NLP
NLP Tools & Services for iDASH
Surveillance from Twitter
NLP App Customization
Overview
• How can we encourage sharing of clinical data?
  » Creating an iDASH de-identification application
• How can we decrease the burden in creating training cases and annotating?
  » Developing an iDASH annotation environment
  » Demo of the iDASH annotation environment
• De-identification use case
7/19/12 Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
Enabling Data Sharing
• Kawasaki Disease DBP has patient data
  » images
  » structured data
  » clinical reports
• Sharing this clinical data with other researchers
  » Offers opportunities for research advances
  » Presents many challenges
• How can we enable sharing of Kawasaki Disease and other clinical data?
  » Informed consent
  » Customizable DUA for data providers
  » HIPAA-compliant storage
De-identification of Clinical Data
• Missing link
  » Tool for removing 18 HIPAA Identifiers
• Headers – fairly straightforward
• Text – more difficult
Headers:
NAME: Yongsan Wong
MRN: 5238492
DOB: 06.06.2006
------------------------------------------------------
Text:
This is a 14-month-old baby boy (Yongsan) who was transferred from Children’s Community with presumptive diagnosis of Kawasaki with fever for more than 5 days and conjunctivitis, mild arthritis with edema, rash, resolving and with elevated neutrophils and thrombocytosis, elevated CRP and ESR. When he was sent to the hospital, he had a fever of 102.

Patient names
Hospital names
Medical record numbers
…
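Why headers are the easier case can be sketched in a few lines. The example below is a minimal illustration, not part of any iDASH tool: the field names NAME, MRN, and DOB come from the sample report above, while the function name `scrub_header` and the bracketed placeholder format are assumptions made for the example.

```python
import re

# Header fields seen in the sample report; other report formats may use different ones.
HEADER_FIELDS = ("NAME", "MRN", "DOB")
_FIELDS = "|".join(HEADER_FIELDS)

# Matches "FIELD: value", where the value runs until the next known field or end of line.
_FIELD_RE = re.compile(
    r"\b(?P<field>" + _FIELDS + r"):\s*(?P<value>.*?)(?=\s+(?:" + _FIELDS + r"):|$)",
    re.MULTILINE,
)

def scrub_header(header: str) -> str:
    """Replace each header value with a placeholder such as [NAME] (hypothetical format)."""
    return _FIELD_RE.sub(lambda m: f"{m.group('field')}: [{m.group('field')}]", header)

header = "NAME: Yongsan Wong MRN: 5238492 DOB: 06.06.2006"
scrubbed = scrub_header(header)
print(scrubbed)
```

Because the fields are labeled, a fixed pattern is enough here. The same approach does not carry over to the narrative text, where the name "Yongsan" appears in parentheses with no field label to anchor on.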
Customizable De-identification Service
BoB
Run de-id tool locally
Retrain on local data
Evaluate de-id on local data
Produce de-id texts
Enable sharing of clinical data
1. Pre-trained de-id application
2. Interface for corrections & retraining
3. Support for evaluation of output
Danielle Mowery, Brett South, Anurag Nara, Liqin Wang, Mingyuan Zhang, Shazia Ashfaq, Melissa Tharp
1. Build a Shareable De-identified Corpus
• MT Samples
  » Website with thousands of medical transcriptions
  » Minimally de-identified
  » Freely available
• Pilot annotation phase
  » 6 annotators
  » 350 reports
• Distributed annotation phase
  » Recruit community annotators
  » 2,000 reports
Research Questions:
- What is the best way to train many annotators?
- How does pre-annotation help?
- Does clustering data improve speed?
Danielle Mowery, Brett South, Liqin Wang, Mingyuan Zhang, Anurag Narra, Shazia Ashfaq
BoB: Best of Breed
Initial De-id Tool - BoB
• Developed at the Salt Lake City VA
• Incorporates techniques used in all other de-identification applications
  • Statistical
  • Regular expressions
  • Dictionaries
Eventually add other open source tools for user to select from
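How regular expressions and dictionaries can be combined is sketched below. This is an illustration of the general idea only and is not BoB's implementation; the statistical component is omitted. The dictionary contents, the date pattern, the labels, and all function names are assumptions made for the example.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical dictionary of known names; real systems rely on much larger lists.
NAME_DICTIONARY = {"Yongsan", "Wong"}

# Hypothetical pattern for dates written as DD.MM.YYYY, as in the sample header.
DATE_RE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

def regex_spans(text: str) -> List[Span]:
    """Find identifiers that follow a fixed surface pattern."""
    return [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]

def dictionary_spans(text: str) -> List[Span]:
    """Find tokens that appear in a list of known identifiers."""
    return [
        (m.start(), m.end(), "NAME")
        for m in re.finditer(r"\b\w+\b", text)
        if m.group() in NAME_DICTIONARY
    ]

def redact(text: str) -> str:
    """Merge spans from both detectors and replace each with its label."""
    spans = sorted(set(regex_spans(text) + dictionary_spans(text)))
    pieces, cursor = [], 0
    for start, end, label in spans:
        if start < cursor:  # skip a span that overlaps one already redacted
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)

redacted = redact("DOB: 06.06.2006. This is a 14-month-old baby boy (Yongsan).")
print(redacted)
```

Each detector returns spans in the same shape, so a further detector (for example a trained statistical tagger) could be added to the union without changing `redact`.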
2. Interface for Correction & Retraining
eHOST
University of Utah – Brett South, Chris Leng
3. Evaluation Workbench
Document & annotations
Outcome Measures for Selected Annotations
Select Classifications to View
Report List
Attributes for Selected Annotation
Relationships for Selected Annotation
Christensen, Murphy, Frabetti, Rodriguez, Savova
Creating a Training Set
• Time consuming
  » Recruiting & training annotators for high agreement
• Expensive
  » Domain experts especially expensive
  » Need annotation by multiple people
• Logistically challenging
  » Managing files and batches of reports
  » Setting up annotation tool
• Redundant
  » Hasn’t someone created a schema for this before?
iDASH Annotation Environment
Annotation Admin eHOST
Client apps on local computer
S Duvall, B South, G Savova, N Elhadad, H Hochheiser
Goal: provide an environment to decrease the burden of annotation
Annotator Registry
iDASH Web Services
Evaluation Workbench
Annotator Registry
• Enlist for annotation
• Certify for annotation tasks
  » Personal health information
  » Part-of-speech tagging
  » UMLS mapping
• Set pay rate
• Searchable
• Available for inclusion in new annotation task
http://idash.ucsd.edu/nlp-annotator-registry
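The registry features listed above (certification per task, pay rate, searchability, availability) suggest a simple record structure. The sketch below is hypothetical: the class, field names, pay-rate unit, and sample entries are assumptions made for illustration and do not describe the actual iDASH registry or its interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Annotator:
    """One registry entry; all field names here are hypothetical."""
    name: str
    certifications: Set[str] = field(default_factory=set)
    pay_rate: float = 0.0   # assumed unit: currency per hour
    available: bool = True  # available for inclusion in a new annotation task

def find_annotators(registry: List[Annotator], task: str,
                    max_pay_rate: Optional[float] = None) -> List[Annotator]:
    """Return available annotators certified for `task`, cheapest first."""
    matches = []
    for person in registry:
        if not person.available or task not in person.certifications:
            continue
        if max_pay_rate is not None and person.pay_rate > max_pay_rate:
            continue
        matches.append(person)
    return sorted(matches, key=lambda a: a.pay_rate)

# Made-up entries for illustration only.
registry = [
    Annotator("annotator_a", {"personal health information"}, 20.0),
    Annotator("annotator_b", {"personal health information", "UMLS mapping"}, 35.0),
    Annotator("annotator_c", {"part-of-speech tagging"}, 15.0, available=False),
]

names = [a.name for a in find_annotators(registry, "personal health information",
                                         max_pay_rate=30.0)]
print(names)
```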
1. Assign annotators to a task
Annotation Admin
2. Create a Schema
3. Assign users and set time expectations
4. Keep track of progress
eHOST
Syncs with Annotation Admin
  » Download schema to annotate with
  » Download batch of reports to annotate
  » Upload annotated reports
Evaluation Workbench
• Compare annotations from two sources
• Drill down to understand differences
• Calculate outcome measures
• Perform error analysis
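Comparing two annotation sources and calculating outcome measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the Evaluation Workbench's code: it assumes annotations are (start, end, label) spans, counts only exact matches, and uses precision, recall, and F1 as the outcome measures. The function name and the sample spans are hypothetical.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

def compare_annotations(reference: Set[Span], system: Set[Span]) -> Dict[str, object]:
    """Score `system` against `reference` using exact span-and-label matches."""
    matched = reference & system
    precision = len(matched) / len(system) if system else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # The two difference lists are the starting point for error analysis.
        "false_positives": sorted(system - reference),
        "false_negatives": sorted(reference - system),
    }

# Illustrative spans over the header "NAME: Yongsan Wong MRN: 5238492 DOB: 06.06.2006".
reference = {(6, 18, "NAME"), (24, 31, "MRN"), (37, 47, "DATE")}
system = {(6, 18, "NAME"), (37, 47, "DATE"), (0, 4, "NAME")}

result = compare_annotations(reference, system)
print(result["false_negatives"], result["false_positives"])
```

Exact matching is the strictest choice. A partial-overlap criterion would credit a system that finds "Yongsan" but misses "Wong", at the cost of a more involved matching step.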
Demo of iDASH Annotation Environment
Annotation Admin eHOST
Client apps on local computer
Danielle Mowery
Annotator Registry
Evaluation Workbench
iDASH Web Services
Conclusion
• iDASH NLP Ecosystem goals
  » Decrease barriers to sharing of clinical data
  » Enhance clinical data use for research
• Leveraging the iDASH secure cloud
• Future work
  » Evaluate and extend the Annotation environment for crowdsourcing
  » Create a customizable de-id application for iDASH users
Thank you!
Questions?