slides ppt
TRANSCRIPT
Learning from labelled and unlabeled data
Semi-Supervised Learning
Filipe Tiago Alves de Magalhães
Machine Learning – PDEEC 2008/2009
12-04-23
Semi-Supervised Learning
Supervised Learning: discover patterns in the data that relate data attributes with a target (class) attribute. These patterns are then utilized to predict the values of the target attribute in future data instances.

Unsupervised Learning: the data have no target attribute (unlabeled). We want to explore the data to find some intrinsic structure in them.

Semi-Supervised Learning: labelled + unlabeled data. Typically, plenty of unlabeled data is available. The aim is to improve predictive power by using both labelled and unlabeled data (expected to be better than using either alone).
Unlabeled data is easy to obtain
Labelled data can be difficult to obtain:
- human annotation is boring
- may require experts
- may require special equipment
- very time-consuming

Examples:
- Web page classification (billions of pages)
- Email classification (SPAM or no-SPAM)
- Speech annotation (400 h for each hour of conversation)
- …
Although we (or specialists) no longer need to spend as much effort labelling data, considerable care must go into designing good models, extracting features, and defining kernels.

Semi-supervised learning can be seen as an excellent way to improve on the results we would get using exclusively supervised or unsupervised methods in the same scenario.
Sometimes, it may not be so hard to label data…
www.espgame.org
Takes advantage of players' intervention in order to enrich the training of automatic learning algorithms. It tries to guess the user's gender based on his/her choices; afterwards, we tell it whether it was right or wrong.
Semi-Supervised Self-Training of Object Detection Models
Chuck Rosenberg, Google, Inc.
Martial Hebert, Carnegie Mellon University
Henry Schneiderman, Carnegie Mellon University
7th IEEE Workshops on Application of Computer Vision (WACV/MOTION'05), 2005
Self-Training
Algorithm. Repeat:
• Train a classifier C with training data L
• Classify data in U with C
• Find a subset U′ of U with the most confident scores
• L ← L ∪ U′
• U ← U − U′
L = {(xi, yi)} – set of labelled data
U = {(xi, ?)} – set of unlabeled data
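As a minimal sketch of this loop (using a toy nearest-centroid classifier and synthetic 1-D data as stand-ins, not the detector from the paper):

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class.
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    conf = -d.min(axis=1)  # smaller distance = higher confidence
    return classes[d.argmin(axis=1)], conf

def self_train(X_l, y_l, X_u, n_iter=10, k=5):
    """Repeat: train C on L, classify U, move the k most confident
    predictions U' into L, and remove them from U."""
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        model = nearest_centroid_fit(X_l, y_l)
        y_hat, conf = nearest_centroid_predict(model, X_u)
        idx = np.argsort(conf)[-k:]             # U': most confident subset
        X_l = np.vstack([X_l, X_u[idx]])        # L <- L U U'
        y_l = np.concatenate([y_l, y_hat[idx]])
        X_u = np.delete(X_u, idx, axis=0)       # U <- U - U'
    return nearest_centroid_fit(X_l, y_l)

# Two well-separated 1-D clusters; one labelled point each, rest unlabeled.
X_l = np.array([[0.0], [10.0]]); y_l = np.array([0, 1])
X_u = np.array([[0.5], [1.0], [9.0], [9.5], [10.5]])
model = self_train(X_l, y_l, X_u)
pred, _ = nearest_centroid_predict(model, np.array([[1.2], [9.8]]))
```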
Object detection based on shape:
- time-consuming
- exhaustive labelling (background, foreground, object, non-object)

Try to simplify the collection and preparation of training data:
- combining data labelled in different ways
- the labelling of each image region can take the form of a probability distribution over labels ("weakly" labelled)
- e.g., it is more likely that the object is present in the centre of the image
- e.g., a certain image has a high likelihood of containing the object, but its position is unknown
Training Approaches
A generic detection algorithm classifies a subwindow in an image as belonging to the "object" class or the "clutter/everything else" class.

X – image feature vectors
xi – data at a specific location in the image (i = 1, …, n indexes image locations)
Y – class
f – foreground
b – background
θf – parameters of the foreground model
θb – parameters of the background model
[Equation on slide: the decision rule comparing the foreground model's likelihood against the background model's.]
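The decision rule can be sketched as a log-likelihood-ratio test (a hedged illustration: the per-feature log-likelihood functions below are hypothetical Gaussians, not the paper's trained models):

```python
def classify_subwindow(x, log_p_f, log_p_b, threshold=0.0):
    """Threshold the log-likelihood ratio sum_i [log P(x_i|θf) - log P(x_i|θb)]
    to decide between the 'object' (foreground) and 'clutter' (background) class."""
    llr = sum(log_p_f(xi) - log_p_b(xi) for xi in x)
    return "foreground" if llr > threshold else "background"

# Hypothetical per-feature log-likelihoods: unit-variance Gaussians centred
# at +1 (foreground) and -1 (background); normalizing constants cancel.
log_p_f = lambda v: -0.5 * (v - 1.0) ** 2
log_p_b = lambda v: -0.5 * (v + 1.0) ** 2

label = classify_subwindow([1.0, 0.9], log_p_f, log_p_b)
```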
EM approach
There are many reasons why EM may not perform well in a particular semi-supervised training context:
- EM solely finds a set of model parameters that maximize the likelihood of the data.
- The fully labeled data may not sufficiently constrain the solution, which means there may be solutions that maximize the data likelihood but do not optimize classification performance.
Alternative
Detector Overview (Experimental Setup)
1. The subwindow is processed for lighting correction
2. A two-level wavelet transform is applied
3. Features are computed by vector-quantizing groups of wavelet coefficients
4. The subwindow is classified by thresholding a linear combination of the log-likelihood ratios of the features

Cascade architecture → only image patches which are accepted by the first detector are passed on to the next
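The cascade can be sketched as follows (the two stage predicates are hypothetical placeholders for real detector stages):

```python
def cascade_detect(patch, stages):
    """Run a patch through a cascade of detectors: only patches accepted
    by the current stage are passed on to the next; any rejection ends
    processing immediately, which keeps the cascade cheap on average."""
    for accepts in stages:
        if not accepts(patch):
            return False
    return True

# Hypothetical two-stage cascade: a cheap contrast check first,
# then a stricter (more expensive) score threshold.
stages = [lambda p: p["contrast"] > 0.1,
          lambda p: p["score"] > 0.9]
```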
Data (Experimental Setup)
Set with positive examples – 231 images, 480 training examples
Independent test set – 44 images, 102 test examples
15 000 negative examples
Training examples – 24 × 16 pixels (rotated, scaled and cropped)

[Figure: sample training images with their associated training examples; landmark shown on a typical training image, 200-300 pixels high and 300-400 pixels wide]
Training (Experimental Setup)
Training the model with fully labeled data consists of the following steps:
1. Given the training-data landmark locations:
   • geometrically normalize the training-example subimages;
   • apply lighting normalization to the subimages;
   • generate synthetic training examples (scaling, shifting and rotating)
2. Compute the wavelet transform of the subimages
3. Quantize each group of wavelet coefficients and build a naïve Bayes model with respect to each group to discriminate between positive and negative examples
4. Adjust the naïve Bayes model using boosting, but maintaining a linear decision function, effectively performing gradient descent on the margin
5. Compute a ROC curve for the detector using a cross-validation set
6. Choose a threshold for the linear function, based on the final performance desired
Selection Metrics (Experimental Setup)
The selection metric is crucial to the performance of the training.
1. Confidence selection
   • Computed at every iteration by applying the detector trained from the current set of labelled data to the weakly labelled data set.
   • The detection with the highest confidence is selected and added to the training set.

2. MSE selection
   • Calculated for each weakly labelled example by evaluating the distance between the corresponding image window and all of the other templates in the training data (including the original labelled examples and the weakly labelled examples added in prior iterations).
The candidate image and the labelled images are first normalized with a specific set of processing steps before the MSE-based score metric is computed.
The score is based on the Mahalanobis distance.
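A sketch of such a score, assuming the normalized windows are compared by their smallest Mahalanobis distance to the template set (the covariance matrix and feature vectors here are placeholders, not the paper's actual pipeline):

```python
import numpy as np

def mahalanobis_score(candidate, templates, cov):
    """MSE-style selection score: the smallest Mahalanobis distance from a
    (normalized) candidate window to the training templates. Lower scores
    mark the candidates to add to the training set first."""
    inv_cov = np.linalg.inv(cov)
    diffs = templates - candidate
    # Per-template squared Mahalanobis distance: d_i = diff_i' * inv_cov * diff_i
    d2 = np.einsum('ij,jk,ik->i', diffs, inv_cov, diffs)
    return float(np.sqrt(d2.min()))
```

With an identity covariance this reduces to the nearest-template Euclidean distance.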
The detector must be accurate in localization but need not be accurate in detection, since false detections will be discarded due to their large MSE distances to all of the training examples.

This is crucial to ensure the performance of the training algorithm with small initial training sets.

It is also part of the reason why the MSE metric outperforms the confidence metric, which requires the detector to be accurate in both localization and detection.
[Diagram: the detector proposes candidate position and scale; the MSE selection metric scores the candidates]
Experiment Scenarios (Experiments and Analysis)
Each experiment was repeated with a different initial random subset, to account for the variance observed in detector performance and in the behaviour of the semi-supervised training process.

Experiment = specific set of experimental conditions
Run = each repetition of that experiment
Mostly, 5 runs were performed for each experiment.

Typically, 20 weakly labelled images were added to the training set at each iteration, because of the substantial training time of the detector. Ideally, only a single image would be added at each iteration.
Evaluation Metrics (Experiments and Analysis)
Each run was evaluated using the area under the ROC curve (AUC).
Because different experimental conditions affect performance, AUC values were normalized relative to the full-data performance of that run.

performance level == 1.0 → the model being evaluated performs the same as it would if all of the labelled data were utilised
performance level < 1.0 → the model performs worse than with the full data set
To compute the full data performance, each specific run is trained with the full data set and its performance is recorded.
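The normalization can be sketched as follows (the rank-statistic AUC below is a generic formulation, not necessarily the paper's exact computation):

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the probability that a random positive outscores a random negative,
    counting ties as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def performance_level(auc_run, auc_full_data):
    """Normalize a run's AUC by the AUC the same run reaches when trained
    on the full labelled data set: 1.0 means parity with the full-data
    model, < 1.0 means lower performance."""
    return auc_run / auc_full_data
```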
Baseline Training Configurations (Experiments and Analysis)
A smooth regime was chosen so that the experiments were performed under conditions where the addition of weakly labelled data would make a difference.
Selection Metrics (Experiments and Analysis)
Does the choice of the selection metric make a substantial difference in the performance of the semi-supervised training?
[Plot comparing the MSE metric and the confidence metric]
Relative Size of Fully Labelled Data (Experiments and Analysis)
How many weakly labelled examples do we need to add to the training set in order to reach the best detector performance?
Conclusions/Discussion
1. The results showed that it was possible to achieve detection performance that was close to the base performance obtained with the fully labelled data, even when a small fraction of the training data was used in the initial training set.
2. The experiments showed that the self-training approach to semi-supervised training can be applied to an existing detector that was originally designed for supervised training.
3. The MSE selection metric consistently outperformed the confidence metric. More generally, the self-training approach using an independently defined selection metric outperforms both the confidence metric and the batch EM approach.
During the training process, the distribution of the labeled data at any particular iteration may not match the actual underlying distribution of the data.
(a) Original unlabeled and labelled data. (b) True labels for the unlabeled data. (c), (d) The points labelled by the incremental self-training algorithm after 5 iterations using the confidence metric and the Euclidean metric, respectively.
Future Work
Study the relation between the semi-supervised training approach evaluated here and the co-training approaches.

Develop more precise guidelines for selecting the initial training set.

The approach could be extended to training examples that are labelled in different ways. For example, some images may be provided with scale information and nothing else. Additional information may be provided, such as the rough shape of the object or a prior distribution over its location in the image.
Still Awake???
ZZZZZZZZZZZZZZ…..
Seeing stars when there aren’t many stars:Graph-based semi-supervised learning for sentiment categorization
Andrew B. Goldberg, Computer Sciences Department, University of Wisconsin-Madison
Xiaojin Zhu, Computer Sciences Department, University of Wisconsin-Madison
TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing
2006
Sentiment Categorization
What we saw is rating inference (Pang and Lee, 2005: Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL).

In this work…
• Graph-based semi-supervised learning
• Main assumption encoded in the graph: similar documents should have similar ratings
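One common way to encode this assumption is a similarity graph over the documents; the sketch below assumes a symmetric k-nearest-neighbour graph built from a precomputed similarity matrix (a simplified illustration, not the paper's exact graph construction):

```python
import numpy as np

def knn_graph(S, k):
    """Connect each document to its k most similar neighbours, keeping
    the similarity as a symmetric edge weight, so that similar documents
    are pushed towards similar ratings."""
    n = len(S)
    W = np.zeros_like(S)
    for i in range(n):
        sims = S[i].astype(float).copy()
        sims[i] = -np.inf                  # no self-edges
        for j in np.argsort(sims)[-k:]:    # indices of the k largest similarities
            W[i, j] = W[j, i] = S[i, j]
    return W

# Hypothetical pairwise similarities between three reviews.
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
W = knn_graph(S, k=1)
```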
[Figure: worked example, 50% accuracy]
[Figure: worked example, 100% accuracy]
Goal
Approach
Measuring Loss over the Graph
Minimization now is non-trivial.
Finding a Closed-Form Solution
Fortunately, we can find a closed-form solution.
f – vector of f values for all reviews
ŷ – vector of the given labels yi for labelled reviews and of predicted labels for unlabeled reviews
C – a weighting matrix with separate entries for labelled and unlabeled reviews
[Equation on slide: the closed-form solution, annotated with a constant parameter and the graph Laplacian matrix.]
Graph Laplacian Matrix
Assume n labelled and unlabeled documents.
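As an illustration of how the pieces fit together, the sketch below assumes the generic graph-regularized least-squares objective (f − ŷ)ᵀC(f − ŷ) + α·fᵀLf, whose minimizer is f = (C + αL)⁻¹Cŷ; the paper's exact weighting may differ:

```python
import numpy as np

# Chain graph over 4 reviews: the ends are labelled (1 and 4 stars),
# the middle two are unlabeled.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian

y_hat = np.array([1., 0., 0., 4.])   # given labels (0 as placeholder for unlabeled)
C = np.diag([1., 0.1, 0.1, 1.])      # heavier fit weight on labelled reviews
alpha = 1.0

# Minimizer of (f - y_hat)' C (f - y_hat) + alpha * f' L f
f = np.linalg.solve(C + alpha * L, C @ y_hat)
```

The inferred ratings interpolate smoothly between the two labelled endpoints along the chain.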
Experiments
Predict 1-to-4-star ratings for reviews
• 4-author data (Pang and Lee, 2005): 1770, 902, 1307 and 1027 documents, respectively
• Each document represented as a {0,1} word-presence vector, normalized to sum to 1
• Positive-Sentence Percentage (PSP) similarity (Pang and Lee, 2005)
• Parameters tuned with cross-validation

* Joachims, T., Transductive Inference for Text Classification using Support Vector Machines, in Proceedings of the Sixteenth International Conference on Machine Learning. 1999, Morgan Kaufmann Publishers Inc.
PSPi is defined as the percentage of positive sentences in review xi.

The similarity between reviews xi, xj is the cosine of the angle between the vectors (PSPi, 1−PSPi) and (PSPj, 1−PSPj).

Positive sentences are identified using a binary classifier trained on a "snippet data set" (10 662 documents).
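The PSP similarity can be sketched directly from this definition:

```python
import numpy as np

def psp_similarity(psp_i, psp_j):
    """Cosine of the angle between (PSP_i, 1 - PSP_i) and (PSP_j, 1 - PSP_j),
    where PSP is a review's positive-sentence percentage in [0, 1]."""
    a = np.array([psp_i, 1.0 - psp_i])
    b = np.array([psp_j, 1.0 - psp_j])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Reviews with similar positive-sentence percentages get similarity near 1; an all-positive and an all-negative review get similarity 0.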
Low ratings tend to get low PSP scores; high ratings tend to get high PSP scores.

The trend was qualitatively the same as in Pang and Lee (2005) (naïve Bayes).
Parameters:
c = k/L, where k is the number of labelled neighbours and L is the size of the labelled set
α = a·k + b·k′, where k′ is the number of unlabeled neighbours

Optimal values (through cross-validation): c = 0.2, α = 1.5
Results
Graph-based SSL outperforms the other methods for small labelled set sizes.
Alternative Similarity Measure
• The cosine between word vectors containing all words, each weighted by its mutual information
• Mutual-information values are scaled so that the maximum is 1
• The resulting values are used as weights for the corresponding words in the word vectors
• Words in the movie review data that did not appear in the "snippet data set" were excluded

Optimal values (through cross-validation): c = 0.1, α = 1.5
Results
20-trial average accuracy on the unlabeled set for each author, across different labelled set sizes and methods.

In each row, the best result is shown in green, together with any results that could not be distinguished from it by a paired t-test at the 0.05 level.
Conclusions and Future Work
Graph-based semi-supervised learning based on PSP similarity achieved better performance than all other methods on all four author corpora.

However, for larger labelled sets its performance was not as good:
a) Maybe because an SVM regressor trained on a large labelled set can achieve fairly high accuracy without considering relationships between examples.
b) PSP similarity may not be accurate enough, thus biasing the overall performance when labelled data is abundant.
Investigate better document representations and similarity measures.

Extend the method to the inductive learning setting.

Experiment with cross-reviewer and cross-domain analysis, such as using a model learned on movie reviews to help classify product reviews.
Human Semi-Supervised Learning
Q: Do humans also use semi-supervised learning?
A: Apparently, yes!
Some evidence…
Face recognition is a very challenging computational task.
However, it is an easy task for humans.
Differences between two views of the same face are much larger than those between two different faces viewed at the same angle. +
Hint: Temporal association
+ Sinha, P., et al., Face recognition by humans: 20 results all computer vision researchers should know about. 2006, MIT.
Observers were shown sequences of novel faces in which the identity of the face changed as the head rotated.

As a result, observers showed a tendency to treat the views as if they were of the same person.

This suggests that we are continuously associating views of objects to support later recognition, and that we do so not only on the basis of physical similarity, but also of the correlated appearance of the objects in time.

The image sequence plays the role of unlabeled data.

Wallis, G. and H. Bülthoff, Effects of temporal association on recognition memory. Proceedings of the National Academy of Sciences, 2001, p. 4800-4804.
17-month-old infants listened to a word and saw an object.

The aim was to measure their ability to associate the word with the object.

If the word had been heard many times before (without seeing the object; unlabeled data), the association was stronger.

If the word had not been heard before, the association was weaker.
Graf, E., et al., Can Infants Map Meaning to Newly Segmented Words?: Statistical Segmentation and Word Learning. Psychological Science, 2007. 18(3): p. 254-260.
A better understanding of the human cognitive model can guide the development of better machine learning algorithms, or make existing ones even better and more robust…
References
• Rosenberg, C., M. Hebert, and H. Schneiderman. Semi-Supervised Self-Training of Object Detection Models. In Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION'05), Volume 1. 2005, IEEE Computer Society.
• Goldberg, A.B. and X. Zhu. Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. In TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing. 2006.
• Pang, B. and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL. 2005.
• Joachims, T. Transductive Inference for Text Classification using Support Vector Machines. In Proceedings of the Sixteenth International Conference on Machine Learning. 1999, Morgan Kaufmann Publishers Inc.
• Sinha, P., et al. Face recognition by humans: 20 results all computer vision researchers should know about. 2006, MIT.
• Wallis, G. and H. Bülthoff. Effects of temporal association on recognition memory. Proceedings of the National Academy of Sciences, 2001, p. 4800-4804.
• Graf, E., et al. Can Infants Map Meaning to Newly Segmented Words?: Statistical Segmentation and Word Learning. Psychological Science, 2007, 18(3), p. 254-260.