![Page 1: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/1.jpg)
School of Computing
FACULTY OF ENGINEERING
NLP for Health Informatics: text-mining patient records
SNOMED CT based semantic tagging of medical narratives
Verbal Autopsy corpus for Machine Learning of Cause of Death
E-Health GATEway to the Clouds
Saman Hina, Sammy Danso, Eric Atwell, Owen Johnson
Natural Language Processing Group
![Page 2: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/2.jpg)
SNOMED CT based semantic tagging of medical narratives
---------------------------------------------------------------------
• Research Objective
• Key Challenges
• Resources
• Methods
1. Baseline Application
2. SNOMED CT Rule-based semantic tagger
• Results
• Conclusion and Future Work
![Page 3: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/3.jpg)
Sample Text Output
![Page 4: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/4.jpg)
Research Objective
---------------------------------------------------------------------
To design a novel approach for extracting semantic information from unstructured medical narratives.
The underlying research hypothesis is that natural language medical narratives can be annotated with high accuracy using the SNOMED CT healthcare data standard.
Healthcare data standards: secure, consistent, authentic sharing among healthcare users with codes.
![Page 5: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/5.jpg)
Key Challenges
---------------------------------------------------------------------
• Clinicians express a single medical term in many different ways and do not follow the language of healthcare data standards, which makes extracting domain knowledge difficult.
• No domain expert is available.
• Synonyms, abbreviations, paraphrased concepts and different preferred names for a concept add to the complexity of the research challenge.
• Section headers, capitalization of words and content follow different patterns.
![Page 6: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/6.jpg)
![Page 7: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/7.jpg)
Corpus
---------------------------------------------------------------------
• Corpus from the fourth i2b2/VA 2010 challenge.
• Contains discharge summaries and progress notes from four healthcare partners.

| Annotations | Frequency |
|---|---|
| Clinical documents (discharge summaries, progress notes) | 1,176 |
| Words | 965,244 |
| Sentences | 51,756 |
| SNOMED CT concepts in the corpus | 67,575 |
![Page 8: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/8.jpg)
Data Standard
---------------------------------------------------------------------
SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a comprehensive international data standard for clinical terminology.
• Number of concepts from SNOMED CT: 356,432
• 16 out of 31 semantic types from SNOMED CT have been used to develop the SNOMED CT semantic tagger:
1. Attribute
2. Body Structure
3. Disorder
4. Environment
5. Findings
6. Observable Entity
7. Occupation
8. Organism
9. Person
10. Physical Object
11. Procedure
12. Product Or Substance
13. Qualifier Value
14. Record Artifact
15. Regime/Therapy
16. Situation
![Page 9: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/9.jpg)
Annotation scheme for Gold Standard Corpus
---------------------------------------------------------------------
• Pre-annotate the corpus using the SNOMED CT dictionary application (baseline system).
• Review the corpus manually and mark the remaining concepts, taking into account the following language issues: synonyms, abbreviations, incomplete concepts, paraphrased concepts and concepts under section headings.
• Remove concepts that were not identified correctly.
• When the annotator is a non-domain user, the NCBO BioPortal Annotator will be used to annotate the gold standard corpus by searching for the keywords and bigrams of the possible concepts.
![Page 10: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/10.jpg)
• Concepts should be marked at up to three levels of granularity.
• Agreement on the gold standard is more than 90%.
![Page 11: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/11.jpg)
Baseline System - SNOMED CT Dictionary Application
---------------------------------------------------------------------
• Basic language processing (tokenization, sentence splitting, POS tagging).
• Concepts have been tagged automatically (dictionary lookup).
• The SNOMED CT knowledge base was developed by constructing separate dictionaries for the 16 semantic types.
• 6 out of 16 tags performed well with the dictionary application:
1. Disorder
2. Observable Entity
3. Person
4. Record Artifact
5. Regime/Therapy
6. Situation
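A dictionary-lookup baseline of this kind can be sketched in a few lines of Python. This is only an illustration, not the authors' application: the miniature dictionaries, the regular-expression tokenizer and the greedy longest-match strategy are all assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical miniature dictionaries, one per semantic type.
# The real knowledge base is built from SNOMED CT; these entries are illustrative.
DICTIONARIES = {
    "Disorder": {"pneumonia", "diabetes mellitus"},
    "Person": {"patient", "mother"},
    "Body Structure": {"lung", "chest"},
}

def build_lookup(dictionaries):
    """Map each lower-cased term (as a tuple of tokens) to its semantic type."""
    lookup = {}
    for semantic_type, terms in dictionaries.items():
        for term in terms:
            lookup[tuple(term.lower().split())] = semantic_type
    return lookup

def tag(text, lookup):
    """Greedy longest-match dictionary lookup over word tokens."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    max_len = max(len(key) for key in lookup)
    tags, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so multiword terms win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lookup:
                tags.append((" ".join(candidate), lookup[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return tags

LOOKUP = build_lookup(DICTIONARIES)
```

Calling `tag("The patient has diabetes mellitus and pneumonia.", LOOKUP)` tags "patient" as Person and the two disorders, while a plural such as "lungs" is missed because exact lookup has no morphological normalization.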
![Page 12: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/12.jpg)
Optimization of SNOMED CT Knowledgebase
---------------------------------------------------------------------
• Optimizing the concepts in the SNOMED CT semantic types makes it possible to write general rules for the semantic tagger.
• The optimization process reduces the size of the knowledge base by removing unnecessary and ambiguous information.
Entire lung -> Lung
Ear NOS -> Ear
• Long multiword concepts have been transformed into individual concepts to solve the paraphrasing problem.
Radiography of chest and lung -> 1. Radiography 2. Chest 3. Lung
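The two optimization steps can be illustrated with a small Python sketch. The modifier and connective lists below are assumptions inferred only from the three examples on the slide; the actual rules used to optimize the knowledge base are not given, and the function names are hypothetical.

```python
# Assumed lists, inferred from the slide's examples only.
LEADING_MODIFIERS = {"entire"}   # "Entire lung" -> "Lung"
TRAILING_MODIFIERS = {"nos"}     # "Ear NOS" -> "Ear"
CONNECTIVES = {"of", "and"}      # used to split long multiword concepts

def strip_modifiers(term):
    """Drop uninformative leading and trailing modifiers from a concept name."""
    words = term.split()
    while words and words[0].lower() in LEADING_MODIFIERS:
        words = words[1:]
    while words and words[-1].lower() in TRAILING_MODIFIERS:
        words = words[:-1]
    text = " ".join(words)
    return text[:1].upper() + text[1:]

def split_concept(term):
    """Break a long multiword concept into its individual content words."""
    return [word for word in term.split() if word.lower() not in CONNECTIVES]
```

With these assumed lists, `strip_modifiers("Entire lung")` gives `"Lung"` and `split_concept("Radiography of chest and lung")` gives the three individual concepts, so a paraphrase such as "chest radiography" can still be matched piece by piece.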
![Page 13: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/13.jpg)
Rule-based SNOMED CT Semantic tagger
---------------------------------------------------------------------
This application uses the optimized SNOMED CT dictionary as its knowledge base.
Processing pipeline:
1. Documents containing narratives
2. Tokenizer
3. Sentence Splitter
4. Part Of Speech Tagger
5. Morphological Analyzer
6. SNOMED CT knowledge base and Rules: extracting concepts and plural concepts
7. Colour coded SNOMED CT Semantic types
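The role of the morphological analysis step, letting plural forms match singular knowledge-base entries, can be sketched as follows. This is a deliberately naive stand-in: the suffix rules, the three knowledge-base entries and the function names are assumptions for illustration, and a real morphological analyzer covers far more cases.

```python
# Hypothetical knowledge-base entries keyed by singular root form.
KNOWLEDGE_BASE = {
    "lung": "Body Structure",
    "biopsy": "Procedure",
    "nurse": "Occupation",
}

def singularize(token):
    """Very rough plural stripping based on common English suffixes."""
    word = token.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # biopsies -> biopsy
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]                # patches -> patch
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # lungs -> lung
    return word                         # diagnosis, chest: unchanged

def tag_tokens(tokens):
    """Look up each token's root form and keep (surface form, root, semantic type)."""
    tagged = []
    for token in tokens:
        root = singularize(token)
        if root in KNOWLEDGE_BASE:
            tagged.append((token, root, KNOWLEDGE_BASE[root]))
    return tagged
```

Looking up the root rather than the surface token is what allows "Lungs" and "biopsies" to be tagged without adding every inflected form to the knowledge base.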
![Page 14: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/14.jpg)
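| SNOMED CT Semantic Types | Baseline Semantic Tagger: Recall (%) | Baseline Semantic Tagger: Precision (%) | Baseline Semantic Tagger: F-Score (%) | SNOMED CT Semantic Tagger: Recall (%) | SNOMED CT Semantic Tagger: Precision (%) | SNOMED CT Semantic Tagger: F-Score (%) |
|---|---|---|---|---|---|---|
| Body Structure | 47 | 89 | 61 | 74 | 74 | 74 |
| Disorder | 82 | 97 | 89 | 78 | 94 | 85 |
| Environment | 51 | 96 | 67 | 80 | 63 | 70 |
| Observable Entity | 85 | 89 | 87 | 82 | 87 | 85 |
| Occupation | 40 | 100 | 57 | 80 | 73 | 77 |
| Person | 92 | 100 | 96 | 93 | 100 | 96 |
| Procedure | 44 | 96 | 60 | 87 | 73 | 80 |
| Record Artifact | 100 | 77 | 87 | 100 | 77 | 87 |
| Regime/Therapy | 96 | 96 | 96 | 96 | 93 | 95 |
| Situation | 80 | 95 | 87 | 72 | 95 | 82 |
| Average | 61 | 96 | 75 | 82 | 79 | 81 |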
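The F-score columns appear consistent with the balanced F-measure, the harmonic mean of precision and recall. The slides do not state which F-measure was used, so that is an assumption here; recomputing from the rounded percentages in the table can also differ from the printed value by about a point.

```python
def f_score(recall, precision):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Baseline tagger rows taken from the results table (recall, precision).
disorder_f = f_score(82, 97)
person_f = f_score(92, 100)
```

For the baseline tagger, Disorder (recall 82, precision 97) and Person (recall 92, precision 100) round to the 89 and 96 shown in the table.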
![Page 15: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/15.jpg)
Conclusion and Future Work
---------------------------------------------------------------------
• A corpus containing long multiword concepts has been automatically extracted and tagged with 10 out of 16 SNOMED CT semantic types.
• Annotation of an unseen test corpus will be completed by domain users to test the SNOMED CT semantic tagger.
• The remaining SNOMED CT semantic types will be optimized to construct general rules.
• Corpus annotations will be made available to users through the i2b2 organizers.
![Page 16: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/16.jpg)
Samuel Danso1,3, Eric Atwell1, Owen Johnson1, Guus ten Asbroek2, Seyi Soromekun2, Karen Edmond2, Chris Hurt4, Lisa Hurt2, Charles Zandoh3, Charlotte Tawiah3, Zelee Hill2, Justin Fenty2, Seeba Amenga Etego3, Seth Owusu Agyei2,3, and Betty R Kirkwood2
1 University of Leeds
2 London School of Hygiene and Tropical Medicine
3 Kintampo Health Research Centre, Ghana
4 University of Cardiff
Presented by Samuel Danso
PhD Student - NLP Research Group, University of Leeds
[email protected]
21st July 2011
![Page 17: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/17.jpg)
![Page 18: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/18.jpg)
Causes of Death Information – The global picture
• About 57 million people die each year
• Cause of Death Information is vitally important to health planners and policy makers at all levels.
• How do we find out the 67%?
– Verbal Autopsy
![Page 19: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/19.jpg)
Use of CoD Information

| Who | What |
|---|---|
| WHO and national/international bodies | global and national cause-specific mortality estimates; ICD coding |
| local public health managers | top-ranking causes of death and public health priorities |
| epidemiologists and health services researchers | relating to specific populations and sub-groups |
| institutional managers and clinical auditors | patterns for deaths within institutions and health care systems |
| medical and legal practitioners | individual causes for particular cases |

Source: Byass et al, 2007
![Page 20: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/20.jpg)
• Basically, a narrative account of the incident that led to the death of a person.
• An idea from the 17th century, used in the UK and other developed countries. Now recommended by WHO as the standard approach in developing countries.
![Page 21: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/21.jpg)
![Page 22: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/22.jpg)
![Page 23: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/23.jpg)
Sample of VA Data for Infant Death I
Coded part

| Item | Question | Responses | Variable |
|---|---|---|---|
| 5.2.6 | Health worker measured the blood pressure and told you it was high | 1. Yes 2. No 8. NK | DHIGHBP |
| 5.2.7 | Convulsions like in children | 1. Yes 2. No 8. NK | DCONVULSE |
| 5.2.8 | Fever during labour | 1. Yes 2. No 8. NK | DFEVER |
| 5.2.9 | Umbilical cord delivered before the baby | 1. Yes 2. No 8. NK | DPROLAPSE |
| 5.2.10 | Umbilical cord around the baby’s neck | 1. Yes 2. No 8. NK | DCORDNECK |
| 5.2.11 | Heavy bleeding during labour or after delivery | 1. Yes 2. No 8. NK | DBLEED |
| 5.2.12 | Did somebody put their hand inside the womb to remove the placenta? | 1. Yes 2. No 8. NK | DRETPLAC |
| 5.2.13 | Other | 1. Yes 2. No 8. NK | DOTHER |
![Page 24: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/24.jpg)
Sample of VA Data for Infant Death II
Free text part
Can you tell me something about your pregnancy?
I was never ill though out my pregnancy. I started ANC in the 5th month of my pregnancy at Dwenewoho and then continued every month. I did not start ANC earlier because I was not ill and not also attended to ANC because I was well.
Can you tell me something about your labour
labour started me on Monday early dawn when I experienced waist, stomach pains and the break of water. I visited Kintampo hospital on Tuesday and was given one drip of water. I was also given blood.
Can you tell me something about the baby?
the baby was nice.
Can you tell me what happened after delivery?
something was done because it could not cry immediately after delivery. So enema pump 'Bentua' was used to it something on the nose before it was able to breath. The nurses said the baby's time was not due
Any signs and symptoms before the death of the child ?
I did not know what kill the baby. According to the nurses its time was not due so it was kept in an incubator and it died the next day.
![Page 25: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/25.jpg)
![Page 26: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/26.jpg)
![Page 27: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/27.jpg)
![Page 28: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/28.jpg)
![Page 29: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/29.jpg)
![Page 30: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/30.jpg)
Characteristics of corpus
![Page 31: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/31.jpg)
• Data sparseness and imbalance - 46 categories
![Page 32: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/32.jpg)
Characteristics of corpus: free text
Some Statistics

| Statistic | Value |
|---|---|
| Average number of word tokens per document | 150 |
| Estimated number of word tokens | ≈ 1.5 million |
| Number of documents (infant) | ≈ 8000 |
| Number of documents (adult women) | ≈ 2500 |
| Diagnosis information | 10500 |
![Page 33: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/33.jpg)
Characteristics of corpus: free text
“WHEN THE CHILD WAS SIXTEEN (16) DAYS OLD SHE FELL SICK WHICH LAUTED FOR THREE (3) DAYS BEFORE SHE DIED. THE CHILD HAVING DIFFICULT BREATING. ANY TIME, SHE BREATHS, YOU SEE A HOLE IN THE CHEST, AND ALSO MAKING NOISE IN THE CHEST. SHE HAD CONVULSION WHEN SHE WAS SEVERTEEN (17) DAYS OLD BEFORE SHE DIED THE FOLLOWING DAY. SHE ALSO HAD A BULGING FONTENED AND SEVERE HOT BODY WHICH LASTED FOR TWO (2) DAYS BEFORE SHE DIED. THE CHILD ALSO HAD A FIT WHICH SHE COULD NOT OPEN HER MOUTH.”
Misspellings
Inappropriate use of punctuation marks
Grammatical error
• Spelling and grammatical mistakes posing parsing problems
![Page 34: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/34.jpg)
• Different ways of expressing the same concept (Delivery):
– Baby came out
– Baby landed
– Gave birth
• Local words
– ‘afam’
– ‘bentoa’
• Abbreviations
– ANC = Antenatal Clinic
– TBA = Traditional Birth Attendant
• Fuzzy expression of clinical concepts. Sometimes no clinical concept is expressed at all (“.. I visited Kintampo hospital on Tuesday and was given one drip of water. ..”).
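One common way to cope with this kind of variation is to normalize the text before feature extraction. The sketch below is an illustration rather than the project's method: it uses only the abbreviations and Delivery variants listed on the slide, and the function names and the simple substring matching are assumptions.

```python
import re

# Mappings taken from the examples on the slide.
ABBREVIATIONS = {
    "ANC": "Antenatal Clinic",
    "TBA": "Traditional Birth Attendant",
}
CONCEPT_VARIANTS = {
    "baby came out": "Delivery",
    "baby landed": "Delivery",
    "gave birth": "Delivery",
}

# Match abbreviations only as whole, upper-case words.
_ABBREVIATION_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
)

def expand_abbreviations(text):
    """Replace each known abbreviation with its expansion."""
    return _ABBREVIATION_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)

def find_concepts(text):
    """Return the sorted set of concepts whose variant phrases occur in the text."""
    lowered = text.lower()
    return sorted(
        {concept for phrase, concept in CONCEPT_VARIANTS.items() if phrase in lowered}
    )
```

For example, `find_concepts("The baby landed at dawn")` returns `["Delivery"]`, whereas a fuzzy statement such as "the baby was nice" yields no concept at all.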
![Page 35: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/35.jpg)
• Missing values (-)
• 215 variables
• Entries are coded
– sex = 1, 2, 8 or 9
– weight = 1.45, 9.99 or 8.88
• Continuous revision of the questionnaire, resulting in blank values for some variables
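Before such coded entries can be used for machine learning, sentinel codes usually have to be separated from real values. The sketch below assumes that 8 and 9 for sex, and 8.88 and 9.99 for weight, stand for "not known" or missing. Only "8. NK" is shown on the questionnaire sample; the other sentinel meanings, the variable names and the function name are assumptions for illustration.

```python
# Assumed sentinel codes per variable (only "8 = NK" is documented on the
# questionnaire sample; the rest are guesses for illustration).
MISSING_CODES = {
    "sex": {8, 9},
    "weight": {8.88, 9.99},
}
BLANK_MARKERS = (None, "", "-")  # blank entries and the "-" missing marker

def clean_record(record):
    """Replace blanks and assumed sentinel codes with None; keep real values."""
    cleaned = {}
    for name, value in record.items():
        if value in BLANK_MARKERS or value in MISSING_CODES.get(name, set()):
            cleaned[name] = None
        else:
            cleaned[name] = value
    return cleaned
```

Keeping the sentinel table per variable matters because the same number can be a valid value in one column and a missing-value code in another.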
![Page 36: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/36.jpg)
![Page 37: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/37.jpg)
Results: 46 categories - combined dataset
![Page 38: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/38.jpg)
Results: 6 categories – combined dataset
![Page 39: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/39.jpg)
Results: 46 categories – time of death
![Page 40: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/40.jpg)
Results: 6 categories – time of death
![Page 41: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/41.jpg)
Discussion and Conclusion
• Key lessons
– CRISP-DM is the appropriate methodology for this project
– It is feasible to use machine learning techniques to classify CoD in Verbal Autopsies
– Splitting the dataset by clinical definitions into homogeneous sets improves classifier performance
– Classifying at the top level of the CoD hierarchy could increase performance across classifiers, because there are fewer classes (6 instead of 46) and more instances per class.
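The effect behind the last lesson can be shown with a toy example: mapping fine-grained labels to their top-level parent reduces the number of classes and raises the average number of instances per class. The label names and hierarchy below are entirely hypothetical placeholders, since the actual cause-of-death categories are not listed on the slides.

```python
from collections import Counter

# Hypothetical fine-to-coarse hierarchy; not the project's real CoD categories.
PARENT = {
    "cause_a1": "group_a",
    "cause_a2": "group_a",
    "cause_b1": "group_b",
}

def collapse(labels, parent=PARENT):
    """Map each fine-grained label to its top-level category."""
    return [parent[label] for label in labels]

def instances_per_class(labels):
    """Average number of training instances available per class."""
    counts = Counter(labels)
    return sum(counts.values()) / len(counts)

fine = ["cause_a1", "cause_a2", "cause_a2", "cause_b1"]
coarse = collapse(fine)
```

Here the four toy documents spread over three fine classes become two coarse classes, so each class has more examples to learn from.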
![Page 42: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/42.jpg)
Discussion and Conclusion
Other uses of corpus?
![Page 43: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/43.jpg)
e-Health GATEway to the Clouds
http://www.comp.leeds.ac.uk/nlp/e-health
• WP1: Clouds on the White Rose Grid VRE
– Deliverables: A secure cloud-based VRE on the White Rose Grid (Month 2), e-health records from TPP stored (Month 3), access and research support tools (Month 3). Iterative refinement (Month 3-5).
• WP2: GATEway component
– Deliverables: A GATE plug-in module capable of securely pseudonymising the free text elements of the example e-health records (Month 3). Iterative refinement (Month 3-5).
• WP3: Evaluation and Sustainability
– Deliverables: Evaluation of WP1 and WP2 combined into a cohesive e-health VRE (Month 5), sustainability plan (Month 4), dissemination as a case study (mid Month 5), hand-over to ongoing support by YCHI (Month 6).
![Page 44: NLP for Health Informatics: text-mining patient records](https://reader035.vdocuments.us/reader035/viewer/2022081512/56812b38550346895d8f49f8/html5/thumbnails/44.jpg)
We welcome e-Health MSc / PhD Projects
SNOMED CT based semantic tagging of medical narratives
Verbal Autopsy corpus for Machine Learning of Cause of Death
E-Health GATEway to the Clouds
Saman Hina, Sammy Danso, Eric Atwell, Owen Johnson
Natural Language Processing Group