Text Learning
Tom M. Mitchell, Aladdin Workshop
Carnegie Mellon University, January 2003
1. CoTraining: learning from labeled and unlabeled data
Redundantly Sufficient Features
[Slides 3–6: example of two redundant views of the same fact — Professor Faloutsos' web page, and hyperlinks pointing to it that read "my advisor"; either view alone suffices to classify the page.]
CoTraining Setting

Learn f : X → Y, where X = X1 × X2,
where x is drawn from an unknown distribution,
and ∃ g1, g2 such that (∀x) g1(x1) = g2(x2) = f(x)

• If
– x1, x2 are conditionally independent given y
– f is PAC learnable from noisy labeled data
• Then
– f is PAC learnable from a weak initial classifier plus unlabeled data
Co-Training Rote Learner
[Slides 8–12: a bipartite graph linking page features to hyperlink features; starting from the labeled example "my advisor" (+), the +/− labels propagate step by step between the two views.]
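The rote learner in the figure can be written as a tiny label-propagation loop: whenever either view of an example has a known label, that label is copied to the example's other view, and from there to new examples. A minimal sketch (all names and data below are illustrative, not from the talk):

```python
# Minimal rote co-training sketch (illustrative): each example is a
# (page_feature, hyperlink_feature) pair; a label known for either view
# is propagated to the other view, and then on to new examples.

def rote_cotrain(examples, seed_labels, rounds=10):
    """examples: list of (x1, x2) pairs; seed_labels: view value -> '+'/'-'."""
    labels = dict(seed_labels)
    for _ in range(rounds):
        changed = False
        for x1, x2 in examples:
            for a, b in ((x1, x2), (x2, x1)):
                if a in labels and b not in labels:
                    labels[b] = labels[a]  # propagate label across views
                    changed = True
        if not changed:  # fixed point reached
            break
    return labels

# Toy data: the "my advisor" link text identifies the faculty page, and
# the page in turn identifies other hyperlinks pointing to it.
examples = [("faloutsos-page", "my advisor"),
            ("faloutsos-page", "faculty link"),
            ("other-page", "faculty link")]
labels = rote_cotrain(examples, {"my advisor": "+"})
print(labels["other-page"])  # → '+' (the label reaches pages two hops away)
```

The propagation terminates because each pass can only add labels, never change them.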
What if CoTraining Assumption Not Perfectly Satisfied?
• Idea: want classifiers that produce a maximally consistent labeling of the data
• If learning is an optimization problem, what function should we optimize?
[Slides 13–15: +/− labeled examples whose two views do not agree perfectly.]
What Objective Function?

$$E = E_1 + E_2$$

$$E_1 = \sum_{\langle x,y\rangle \in L} \left(y - \hat{g}_1(x_1)\right)^2 \qquad E_2 = \sum_{\langle x,y\rangle \in L} \left(y - \hat{g}_2(x_2)\right)^2$$

E1, E2: error on labeled examples
What Objective Function?

$$E = E_1 + E_2 + c_3 E_3$$

$$E_3 = \sum_{x \in U} \left(\hat{g}_1(x_1) - \hat{g}_2(x_2)\right)^2$$

E1, E2: error on labeled examples
E3: disagreement over unlabeled examples
What Objective Function?

$$E = E_1 + E_2 + c_3 E_3 + c_4 E_4$$

$$E_4 = \left( \frac{1}{|L|} \sum_{\langle x,y\rangle \in L} y \;-\; \frac{1}{|L|+|U|} \sum_{x \in L \cup U} \frac{\hat{g}_1(x_1) + \hat{g}_2(x_2)}{2} \right)^2$$

E1, E2: error on labeled examples
E3: disagreement over unlabeled examples
E4: misfit to estimated class priors
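The four terms are straightforward to compute. The sketch below (my notation, with NumPy standing in for the slide math; `cotrain_objective` and its argument names are made up) evaluates E for two logistic classifiers ĝ1, ĝ2:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cotrain_objective(w1, w2, X1_L, X2_L, y_L, X1_U, X2_U, c3=1.0, c4=1.0):
    """E = E1 + E2 + c3*E3 + c4*E4 for logistic g1, g2 (illustrative sketch)."""
    g1_L, g2_L = sigmoid(X1_L @ w1), sigmoid(X2_L @ w2)   # labeled predictions
    g1_U, g2_U = sigmoid(X1_U @ w1), sigmoid(X2_U @ w2)   # unlabeled predictions
    E1 = np.sum((y_L - g1_L) ** 2)            # error of g1 on labeled examples
    E2 = np.sum((y_L - g2_L) ** 2)            # error of g2 on labeled examples
    E3 = np.sum((g1_U - g2_U) ** 2)           # disagreement over unlabeled data
    g_all = np.concatenate([(g1_L + g2_L) / 2, (g1_U + g2_U) / 2])
    E4 = (y_L.mean() - g_all.mean()) ** 2     # misfit to estimated class prior
    return E1 + E2 + c3 * E3 + c4 * E4
```

With zero weights both classifiers output 0.5 everywhere, so E3 and E4 vanish and only the labeled-error terms remain.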
What Function Approximators?

$$\hat{g}_1(x) = \frac{1}{1 + e^{-\sum_j w_{1,j} x_j}} \qquad \hat{g}_2(x) = \frac{1}{1 + e^{-\sum_j w_{2,j} x_j}}$$

• Same functional form as Naïve Bayes and Max Entropy
• Use gradient descent to simultaneously learn g1 and g2, directly minimizing E = E1 + E2 + E3 + E4
• No word independence assumption; uses both labeled and unlabeled data
Gradient CoTraining

$$\hat{g}_1(x) = \frac{1}{1 + e^{-\sum_j w_{1,j} x_j}} \qquad \hat{g}_2(x) = \frac{1}{1 + e^{-\sum_j w_{2,j} x_j}}$$
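Gradient co-training then amounts to running gradient descent on E over both weight vectors at once. A toy sketch (numeric finite-difference gradients to keep it short; the data, step size, and function names are all made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(w, X1_L, X2_L, y, X1_U, X2_U, c3=1.0, c4=1.0):
    d = X1_L.shape[1]
    w1, w2 = w[:d], w[d:]                     # one weight vector per view
    g1L, g2L = sigmoid(X1_L @ w1), sigmoid(X2_L @ w2)
    g1U, g2U = sigmoid(X1_U @ w1), sigmoid(X2_U @ w2)
    E1 = np.sum((y - g1L) ** 2)
    E2 = np.sum((y - g2L) ** 2)
    E3 = np.sum((g1U - g2U) ** 2)
    g_all = np.concatenate([(g1L + g2L) / 2, (g1U + g2U) / 2])
    E4 = (y.mean() - g_all.mean()) ** 2
    return E1 + E2 + c3 * E3 + c4 * E4

def grad_descent(w, args, lr=0.2, steps=300, eps=1e-5):
    for _ in range(steps):
        g = np.zeros_like(w)
        for j in range(len(w)):               # numeric gradient; fine for a demo
            e = np.zeros_like(w); e[j] = eps
            g[j] = (objective(w + e, *args) - objective(w - e, *args)) / (2 * eps)
        w = w - lr * g
    return w

# Toy data: one informative feature per view, two labeled + two unlabeled examples.
X1_L = np.array([[1., 0.], [0., 1.]]); X2_L = X1_L.copy()
y = np.array([1., 0.])
X1_U = X1_L.copy(); X2_U = X2_L.copy()
args = (X1_L, X2_L, y, X1_U, X2_U)
w = np.zeros(4)
E0 = objective(w, *args)
w = grad_descent(w, args)
print(objective(w, *args) < E0)  # descent reduces the objective
```

Descending on the summed objective is what couples the two classifiers: E3 penalizes any weight update that makes them disagree on the unlabeled pool.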
Classifying Jobs for FlipDog
X1: job title; X2: job description
Gradient CoTraining: Classifying FlipDog Job Descriptions as SysAdmin vs. WebProgrammer
Final Accuracy
Labeled data alone: 86%
CoTraining: 96%
Gradient CoTraining: Classifying Upper-Case Sequences as Person Names

|  | 25 labeled, 5000 unlabeled | 2300 labeled, 5000 unlabeled |
| --- | --- | --- |
| Using labeled data only | .73 |  |
| Cotraining | .87 | .89 * |
| Cotraining without fitting class priors (E4) | .76 | .85 * |

\* sensitive to weights of error terms E3 and E4
CoTraining Summary
• Key is getting the right objective function
– Class priors are an important term
– Can min-cut algorithms accommodate this?
• And minimizing it…
– Gradient descent has local-minima problems
– Is graph partitioning possible?
The Problem/Opportunity
• Must train the classifier to be website-independent, but many sites exhibit website-specific regularities
Question
• How can a program learn website-specific regularities for millions of sites, without human labeling of data?
Learn Local Regularities for Page Classification
1. Label the site using the global classifier (e.g., a continuing-education page)
2. Learn local classifiers, e.g.:
   CECourse(x) :-
     under(x, http://….CEd.html),
     linkto(x, http://…music.html),
     1 < inDegree(x) < 4,
     globalConfidence(x) > 0.3
[Slides 27–31: a site map with the pages CEd.html and Music.html illustrating the rule.]
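A learned site-local rule like CECourse(x) is just a conjunction of cheap structural tests. Hypothetically, it might be evaluated like this (the page record, field names, and URLs are made-up stand-ins, not from the talk):

```python
# Hypothetical evaluation of a learned site-local rule of the form
#   CECourse(x) :- under(x, hub), linkto(x, other page),
#                  1 < inDegree(x) < 4, globalConfidence(x) > 0.3

def ce_course(page, hub_url, music_url):
    return (page["url"].startswith(hub_url)        # under(x, hub page)
            and music_url in page["outlinks"]      # linkto(x, music page)
            and 1 < page["in_degree"] < 4          # link-structure test
            and page["global_confidence"] > 0.3)   # weak global signal

# Toy page record for an example site (example.edu is a placeholder).
page = {"url": "http://example.edu/CEd/course7.html",
        "outlinks": ["http://example.edu/music.html"],
        "in_degree": 2,
        "global_confidence": 0.6}
print(ce_course(page, "http://example.edu/CEd/",
                "http://example.edu/music.html"))  # → True
```

Because each conjunct is so restricted, a rule like this can be learned from the handful of globally-labeled pages on a single site.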
Learn Local Regularities for Page Classification (cont.)
3. Apply the local classifier to modify the global labels
Results of Local Learning: Cont. Education Course Page
• Learning global classifier only: precision .81, recall .80
• Learning global classifier plus site-specific classifiers for 20 local sites: precision .82, recall .90
Learning Site-Specific Regularities: Example 2
• Extracting “Course-Title” from web pages
Local/Global Learning Algorithm
• Train a global course-title extractor (word based)
• For each new university site:
– Apply the global title extractor
– For each page containing extracted titles:
• Learn page-specific rules for extracting titles, based on page layout structure
• Apply the learned rules to refine the initial labeling
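The per-site loop above can be sketched as follows; every function here is a hypothetical stub (the real extractors are word- and layout-based learners), included only to show the control flow:

```python
# Control-flow sketch of the local/global algorithm. The stub functions
# are hypothetical stand-ins for the real word-based and layout-based
# learners; only the shape of the per-site loop is faithful to the talk.

def global_extract(page):
    # stand-in global extractor: treat title-cased lines as course titles
    return [t for t in page["lines"] if t.istitle()]

def learn_layout_rules(page, titles):
    # stand-in for learning page-specific layout rules: remember positions
    return {page["lines"].index(t) for t in titles}

def apply_rules(page, positions):
    # re-label using layout: everything at a learned position is a title
    return [page["lines"][i] for i in sorted(positions)]

def process_site(pages):
    results = {}
    for page in pages:                      # 1. apply global extractor
        titles = global_extract(page)
        if titles:                          # 2. learn page-specific rules
            rules = learn_layout_rules(page, titles)
            results[page["id"]] = apply_rules(page, rules)  # 3. refine labels
        else:
            results[page["id"]] = titles
    return results

pages = [{"id": 1, "lines": ["Machine Learning", "tu 3pm, wean hall"]}]
print(process_site(pages))  # {1: ['Machine Learning']}
```

The key design point survives even in this toy: the local rules are trained only on labels the global extractor produced, so no human labels the new site.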
Local/Global Learning Summary
• Approach:
– Learn a global extractor/classifier using content features
– Learn a local extractor/classifier using layout features
– Design a restricted hypothesis language for the local learner, to accommodate sparse training data
• Algorithm to process a new site:
– Apply the global extractor/classifier to label the site
– Train the local extractor/classifier on this data
– Apply the local extractor/classifier to refine the labels
Other Local Learning Approaches
• Rule covering algorithms: each rule is a local model
– But these require supervised labeled data for each locality
• Shrinkage-based techniques, e.g., for learning hospital-independent and hospital-specific models of medical outcomes
– Again, these require labeled data for each hospital
• This setting is different: no labeled data for new sites
When/Why Does This Work?
• Local and global models use independent, redundantly sufficient features
• Local models are learned within a low-dimension hypothesis language
• Related to co-training!
Other Uses?
+ Global and website-specific information extractors
+ Global and program-specific TV segment classifiers?
+ Global and environment-specific robot perception?
– Global and speaker-specific speech recognition?
– Global and hospital-specific medical diagnosis?
Summary
• Cotraining:
– Classifier learning as a minimization problem
– Is a graph partitioning algorithm possible?
• Learning site-specific structure:
– Important structure involves long-distance relationships
– Strong local graph-structure regularities are highly useful