Enhancing Text Classifiers to Identify
Disease Aspect Information
Rey-Long Liu
Dept. of Medical Informatics
Tzu Chi University
Taiwan
Outline
• Research background
• Problem definition
• The proposed approach: IDAI
• Empirical evaluation
• Conclusion
Disease Aspect Classification 2
Research Background
Disease Aspect Information (DAI)
An example from MedlinePlus: the text contains several passages about three aspects of kidney cancer (treatment, symptoms and signs, and etiology). It also contains several passages not related to any aspect.
You have two kidneys ... Kidney cancer forms in the … Risk factors include smoking, having certain genetic conditions and …. Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice
– Blood in your urine
– A lump in your abdomen
– … Pain in your side …
Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …
Disease Knowledge Map: An Application of DAI
Identification of DAI
[Figure: identification of DAI. Medical texts for specific diseases are fed to a disease aspects classifier, which outputs disease aspect information (symptoms, diagnosis, treatment, etiology, prevention). Other labels in the diagram: healthcare professionals & consumers (query & aspect, disease info.); healthcare decision support system (cross-disease query, disease info.); medical information provider (verified info., aspect info.).]
Problem Definition
Goals
• Modeling the identification of DAI as a text classification problem
– Disease aspects are predefined categories of interest, not brief descriptions of information needs
• Developing a technique to enhance various kinds of text classifiers
– Given a medical text, the enhanced classifier is more capable of identifying texts that talk about aspects of diseases
Related Work
• Text classification (TC)
– Weakness: multi-aspect information in a text introduces noise to text classifiers
• Segment extraction for topic detection
– Weakness: designed for specific descriptions (not for categories)
• Passage extraction for TC
– Weakness: finding the location and length of the passages that are relevant to a specific category becomes another TC problem
The Proposed Approach: IDAI
IDAI: Revising Term Frequency (TF) to Improve Classifiers
[Figure: IDAI workflow. Labels in the diagram: categories (aspects); classifier development; training; testing; training texts; a text (d); underlying text classifier; IDAI; assessing term frequencies (TF); TF of terms w.r.t. each category; identifying term-category correlation type; classification.]
Two Strategies for TF Revision
Underlying classifier: G. Enhanced classifier: G+IDAI.
• Feature sets
– P: set of positively correlated features (accepting relevant texts)
– Q: set of negatively correlated features (rejecting irrelevant texts)
• TF revision by IDAI
– Strategy I: TF of a feature f is amplified (reduced) if neighbors of f have the same (different) correlation type to the category
– Strategy II: TF of a feature f in Q is reduced if f appears in a text segment that mainly mentions features in P
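The slides do not state how a term's correlation type to a category is decided. As a minimal sketch only, the Python below assumes a simple document-frequency comparison: a term goes into P if it occurs in a larger share of the category's training texts than of the other texts, and into Q if the share is smaller. The function name `correlation_types` and this criterion are illustrative assumptions, not the test IDAI actually uses.

```python
from collections import Counter

def correlation_types(train_texts, category):
    """Split terms into P (positive) and Q (negative) for one category.

    train_texts: list of (tokens, label) pairs.
    Assumption: correlation type is decided by comparing document-frequency
    rates inside and outside the category; the slides do not specify the test.
    """
    in_df, out_df = Counter(), Counter()
    n_in = n_out = 0
    for tokens, label in train_texts:
        if label == category:
            n_in += 1
            in_df.update(set(tokens))   # count each term once per text
        else:
            n_out += 1
            out_df.update(set(tokens))
    P, Q = set(), set()
    for term in set(in_df) | set(out_df):
        rate_in = in_df[term] / n_in if n_in else 0.0
        rate_out = out_df[term] / n_out if n_out else 0.0
        if rate_in > rate_out:
            P.add(term)
        elif rate_in < rate_out:
            Q.add(term)
    return P, Q

# Toy training texts (made up for illustration).
train = [
    (["surgery", "radiation", "kidney"], "treatment"),
    (["chemotherapy", "surgery"], "treatment"),
    (["smoking", "risk", "kidney"], "etiology"),
]
P, Q = correlation_types(train, "treatment")
```

In this toy data, "kidney" falls into Q for the treatment category because it appears in every non-treatment text but only half of the treatment texts; a real system would more likely use a statistical test with a significance threshold.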
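• Revised TF of term t in text d with respect to category c:

$$
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (for Strategy I)} \\
\max_{c' \neq c}\{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (for Strategy II)}
\end{cases}
$$

• $\mathrm{WindowTF}(t,d,c) = \sum_{k} (0.5 + P_{\mathrm{window},k})$, summed over each occurrence of t at position k, where $P_{\mathrm{window},k}$ is the distance-based sum of weights of other positively correlated terms in a window at k

• $\mathrm{InconsistencyTF}(t,d,c) = \sum_{k} P_{\mathrm{inconsistency},k}$, summed over each occurrence of t at position k, where $P_{\mathrm{inconsistency},k} = 0.5 \times$ (how much the text segment before k is dominated by the terms positively correlated to c)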
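The definitions above can be sketched in Python as follows. The slides leave several details open, so this sketch fills them with labeled assumptions: a window radius of 3 tokens, a weight of 1/distance for each neighboring positive term, a "segment before k" of the 5 preceding tokens, dominance measured as the share of those tokens that are positive, and raw TF for terms with no correlation type. The function names and the toy sets are hypothetical.

```python
def window_tf(term, tokens, positive, radius=3):
    """WindowTF: sum over occurrences k of (0.5 + P_window,k)."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, k - radius), min(len(tokens), k + radius + 1)
        # Assumed weighting: each *other* positive term counts 1/distance.
        p_window = sum(
            1.0 / abs(j - k)
            for j in range(lo, hi)
            if j != k and tokens[j] != term and tokens[j] in positive
        )
        total += 0.5 + p_window
    return total

def inconsistency_tf(term, tokens, positive, segment=5):
    """InconsistencyTF: sum over occurrences k of P_inconsistency,k."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        before = tokens[max(0, k - segment):k]
        # Assumed dominance: share of preceding tokens that are positive.
        dominance = sum(t in positive for t in before) / len(before) if before else 0.0
        total += 0.5 * dominance
    return total

def revised_tf(term, tokens, category, positives, negatives):
    """positives / negatives map each category to its P / Q term set."""
    if term in positives[category]:
        return window_tf(term, tokens, positives[category])
    if term in negatives[category]:
        best_other = max(
            (window_tf(term, tokens, positives[c]) for c in positives if c != category),
            default=0.0,
        )
        return best_other - inconsistency_tf(term, tokens, positives[category])
    return float(tokens.count(term))  # assumption: uncorrelated terms keep raw TF

# Toy text and term sets (made up for illustration).
tokens = ["treatment", "includes", "surgery", "and", "radiation", "smoking"]
positives = {"treatment": {"treatment", "surgery", "radiation"},
             "etiology": {"smoking", "risk"}}
negatives = {"treatment": {"smoking", "risk"},
             "etiology": {"surgery", "radiation"}}
tf_surgery = revised_tf("surgery", tokens, "treatment", positives, negatives)
tf_smoking = revised_tf("smoking", tokens, "treatment", positives, negatives)
```

Under these assumptions, "surgery" is amplified above a raw count of 1 because two positive neighbors sit within its window, while "smoking" is reduced below 1 because it follows a segment dominated by treatment terms.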
Empirical Evaluation
Experimental Data
• Top-10 fatal diseases and top-20 cancers in Taiwan
– Total # of diseases: 28
– Source: Web sites of hospitals, healthcare associations, and the department of health in Taiwan
– Disease aspects (categories): 5 aspects: etiology, diagnosis, treatment, prevention, and symptom
– Splitting the texts into aspects: 4669 texts about individual aspects
– Test data: Randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
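The slides do not describe how the sampled texts were merged. One possible reading is sketched below in Python, assuming the held-out single-aspect texts are shuffled and then greedily grouped so that each test text combines between 1 and 5 texts with distinct aspects. The function name `build_test_texts`, the greedy grouping, and the synthetic corpus are assumptions for illustration.

```python
import random

def build_test_texts(texts, sample_rate=0.1, max_aspects=5, seed=0):
    """texts: list of (aspect, text). Returns (test_texts, training_texts).

    Each test text is (set_of_aspects, merged_text).
    Assumption: held-out texts are merged greedily, in random order, into
    groups of 1..max_aspects texts with distinct aspects.
    """
    rng = random.Random(seed)
    n_test = max(1, round(len(texts) * sample_rate))
    held_out_idx = set(rng.sample(range(len(texts)), n_test))
    held_out = [texts[i] for i in sorted(held_out_idx)]
    training = [t for i, t in enumerate(texts) if i not in held_out_idx]
    rng.shuffle(held_out)

    test_texts = []
    while held_out:
        target = rng.randint(1, max_aspects)  # desired number of aspects
        aspects, parts, rest = set(), [], []
        for aspect, text in held_out:
            if len(aspects) < target and aspect not in aspects:
                aspects.add(aspect)
                parts.append(text)
            else:
                rest.append((aspect, text))
        test_texts.append((aspects, " ".join(parts)))
        held_out = rest  # shrinks every round: the first item is always taken
    return test_texts, training

# Synthetic corpus standing in for the 4669 single-aspect texts.
ASPECTS = ["etiology", "diagnosis", "treatment", "prevention", "symptom"]
corpus = [(ASPECTS[i % 5], f"text {i}") for i in range(100)]
test_texts, training = build_test_texts(corpus)
```

Each merged test text carries the set of aspects it contains, so a classifier can be scored per aspect as a multi-label problem.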
Underlying Classifiers & Experimental Baselines
• Underlying classifier
– The Support Vector Machine (SVM) classifier
• Baseline enhancer
– CTFA (Liu, 2010), which employs Strategy I for better TC
– CTFA does not consider Strategy II
Results
Conclusion
• Disease knowledge map (Dmap)
– Supporting evidence-based medicine, health education, and healthcare decision support
• A key step to build a Dmap: Automatic identification of disease aspect information (DAI)
• Identification of DAI as a text classification problem
• Term proximity as key information to enhance existing classifiers to classify DAI