Intelligent Database Systems Lab
Presenter: Chang, Chun-Chih
Authors: Youngjoong Ko, Jungyun Seo
2009, Information Processing and Management (IPM)
Text classification from unlabeled documents with bootstrapping
and feature projection techniques
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• A text classifier is usually built automatically by a general inductive process that learns from labeled examples; this is generally known as supervised learning.
• The most notable problem with supervised methods is that they require a large number of labeled training documents to learn accurately.
Objectives
• The authors propose a new text classification method based on unsupervised or semi-supervised learning.
• The proposed method can carry out a text classification task using only unlabeled documents.
Methodology - Framework
Methodology - Creating keyword lists
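Example with two title words, "Student" and "traffic":
• the word "is" has similarity 1.0 with both title words;
• the word "book" has similarity 0.6 with one title word and 0.05 with the other.

Each candidate word w is scored by its highest similarity to a title word, sim_1(w), plus the margin over its second-highest similarity, sim_2(w):

$$\mathrm{score}(w) = \mathrm{sim}_1(w) + \bigl(\mathrm{sim}_1(w) - \mathrm{sim}_2(w)\bigr)$$

$$\mathrm{score}(\text{is}) = 1.0 + (1.0 - 1.0) = 1.0 \qquad \mathrm{score}(\text{book}) = 0.6 + (0.6 - 0.05) = 1.15$$

So "book", which is strongly associated with only one title word, ranks above "is", which is equally similar to every title word.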
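The ranking rule in the example can be checked with a short Python sketch. It assumes the similarity of every candidate word to every title word is already available as a dictionary (how those similarities are computed is not shown on this slide). The names `keyword_score` and `rank_keywords` are hypothetical, and putting each word on the list of its most similar title word is an assumption made for illustration.

```python
def keyword_score(similarities: list[float]) -> float:
    """Highest similarity plus its margin over the second-highest similarity."""
    ranked = sorted(similarities, reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0  # assumption when only one title word exists
    return best + (best - second)


def rank_keywords(sims: dict[str, dict[str, float]]) -> dict[str, list[tuple[str, float]]]:
    """Put each word on the list of its most similar title word (assumed), sorted by score."""
    lists: dict[str, list[tuple[str, float]]] = {}
    for word, per_title in sims.items():
        best_title = max(per_title, key=per_title.get)
        lists.setdefault(best_title, []).append((word, keyword_score(list(per_title.values()))))
    for entries in lists.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return lists


# Numbers from the slide; which title word "book" scores 0.6 against is assumed to be "Student".
sims = {
    "is": {"Student": 1.0, "traffic": 1.0},
    "book": {"Student": 0.6, "traffic": 0.05},
}
print(round(keyword_score([1.0, 1.0]), 2))   # 1.0
print(round(keyword_score([0.6, 0.05]), 2))  # 1.15
print(rank_keywords(sims)["Student"])
```

The margin term is what pushes down words such as "is" that are similar to every title word; the raw maximum alone would rank "is" first.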
Methodology - Extracting & verifying centroid-context
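A minimal sketch of this step, under loudly stated assumptions: each sentence is treated as one context, a context containing keywords from exactly one category's keyword list is kept as a centroid-context for that category, and contexts matching several categories are dropped. That single-category rule only stands in for the verification step and is hypothetical, not the paper's actual test; `extract_centroid_contexts` and the toy data are illustrative.

```python
import re


def split_contexts(document: str) -> list[str]:
    """Treat each sentence as one context (naive split on '.', '!' and '?')."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def extract_centroid_contexts(
    documents: list[str], keyword_lists: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Keep contexts that contain keywords of exactly one category (hypothetical check).

    Keywords are expected in lower case.
    """
    centroids: dict[str, list[str]] = {category: [] for category in keyword_lists}
    for document in documents:
        for context in split_contexts(document):
            words = set(re.findall(r"[a-z]+", context.lower()))
            matched = [c for c, keywords in keyword_lists.items() if words & keywords]
            if len(matched) == 1:  # ambiguous or keyword-free contexts are skipped
                centroids[matched[0]].append(context)
    return centroids


docs = ["The student borrowed a book. Traffic was heavy downtown. The student was stuck in traffic."]
keyword_lists = {"Student": {"student", "book"}, "traffic": {"traffic", "road"}}
print(extract_centroid_contexts(docs, keyword_lists))
# {'Student': ['The student borrowed a book'], 'traffic': ['Traffic was heavy downtown']}
```

The third sentence mentions keywords of both categories, so this sketch discards it rather than guessing a label.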
Methodology - Creating the context-cluster of each category
EX:
1. eat Banana
2. taste Banana
3. eat Apple
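One reading of the example is that word similarity comes from shared contexts: "eat" and "taste" both occur with "Banana", and "Banana" and "Apple" both occur with "eat", so contexts 2 and 3 can be judged similar even though they share no word. The sketch below assumes Jaccard overlap of co-occurring words as the word similarity and an average of best word matches as the context similarity; both measures are illustrative choices, not necessarily the ones used in the paper.

```python
from collections import defaultdict
from itertools import combinations


def neighbours(contexts: list[list[str]]) -> dict[str, set[str]]:
    """Map each word to the set of other words it shares a context with."""
    co: dict[str, set[str]] = defaultdict(set)
    for ctx in contexts:
        for a, b in combinations(set(ctx), 2):
            co[a].add(b)
            co[b].add(a)
    return co


def word_sim(a: str, b: str, co: dict[str, set[str]]) -> float:
    """Jaccard overlap of the words co-occurring with a and b (assumed measure)."""
    if a == b:
        return 1.0
    na, nb = co.get(a, set()), co.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def context_sim(c1: list[str], c2: list[str], co: dict[str, set[str]]) -> float:
    """Average over words in c1 of their best-matching word in c2 (assumed measure)."""
    return sum(max(word_sim(w, v, co) for v in c2) for w in c1) / len(c1)


contexts = [["eat", "Banana"], ["taste", "Banana"], ["eat", "Apple"]]
co = neighbours(contexts)
print(word_sim("eat", "taste", co))               # 0.5: both co-occur with "Banana"
print(word_sim("Banana", "Apple", co))            # 0.5: both co-occur with "eat"
print(context_sim(contexts[1], contexts[2], co))  # 0.5, despite no shared word
```

Note that `context_sim` as written is not symmetric; averaging both directions would be one simple way to make it so.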
Methodology - The TCFP classifier: robustness to noisy data
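The slide gives no details of TCFP beyond its name, so the following is only a simplified sketch of the general feature-projection idea. It assumes that each word projects the training contexts containing it onto their categories and votes with the resulting category proportions. The class name `FeatureProjectionVoter` and the toy data are hypothetical, and the paper's actual voting and weighting functions are not reproduced. The training data would be the context-clusters from the previous step.

```python
from collections import Counter, defaultdict
from typing import Optional


class FeatureProjectionVoter:
    """Simplified per-word voting classifier (illustrative, not the exact TCFP)."""

    def fit(self, labeled_contexts: list[tuple[list[str], str]]) -> "FeatureProjectionVoter":
        # counts[word][category]: how many training contexts of `category` contain `word`
        self.counts: dict[str, Counter] = defaultdict(Counter)
        for words, category in labeled_contexts:
            for w in set(words):
                self.counts[w][category] += 1
        return self

    def scores(self, words: list[str]) -> Counter:
        totals: Counter = Counter()
        for w in words:
            projection = self.counts.get(w)
            if not projection:
                continue  # words never seen in training cast no vote
            n = sum(projection.values())
            for category, c in projection.items():
                # vote = share of this word's training contexts that belong to `category`
                totals[category] += c / n
        return totals

    def predict(self, words: list[str]) -> Optional[str]:
        totals = self.scores(words)
        return totals.most_common(1)[0][0] if totals else None


# Toy context-clusters; the last context is deliberately "noisy".
train = [
    (["student", "book", "exam"], "Student"),
    (["student", "class"], "Student"),
    (["traffic", "road", "car"], "traffic"),
    (["traffic", "jam", "road"], "traffic"),
    (["book", "road"], "traffic"),
]
clf = FeatureProjectionVoter().fit(train)
print(clf.predict(["student", "book"]))  # Student
print(clf.predict(["road", "car"]))      # traffic
```

Because each word votes with proportions computed over many contexts, a few mislabeled contexts only shift those proportions slightly; this is one intuition for why per-feature voting can tolerate automatically built, noisy training data.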
Experiments
Conclusions
• The proposed method is useful for low-cost text classification.
• When a task requires high accuracy, the method can serve as an assistant tool for easily creating training data.
Comments
• Advantages
  – faster
  – less expensive
• Applications
  – text classification