![Page 1: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/1.jpg)
Introduction to Supervised Machine Learning Concepts
Presented by B. Barla Cambazoglu, February 21, 2014
![Page 2: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/2.jpg)
Guest Lecturer’s Background
![Page 3: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/3.jpg)
Lecture Outline
Basic concepts in supervised machine learning
Use case: Sentiment-focused web crawling
![Page 4: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/4.jpg)
Basic Concepts
![Page 5: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/5.jpg)
What is Machine Learning?
Wikipedia: “Machine learning is a branch of artificial intelligence, concerning the construction and study of systems that can learn from data.”
Arthur Samuel: “Field of study that gives computers the ability to learn without being explicitly programmed.”
Tom M. Mitchell: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
![Page 6: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/6.jpg)
Unsupervised versus Supervised Machine Learning
Unsupervised learning
› Assumes unlabeled data (the desired output is not known)
› Objective is to discover the structure in the data
Supervised learning
› Trained on labeled data (the desired output is known)
› Objective is to generate an output for previously unseen input data
![Page 7: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/7.jpg)
Supervised Machine Learning Applications
Common
› Spam filtering
› Recommendation and ranking
› Fraud detection
› Stock price prediction
Not so common
› Recognize the user of a mobile device based on how he holds and moves the phone
› Predict whether someone is a psychopath based on his Twitter usage
› Identify whales in the ocean based on audio recordings
› Predict in advance whether a product launch will be successful or not
![Page 8: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/8.jpg)
Terminology
› Instance
› Label
› Feature
› Training set
› Test set
› Learning model
› Accuracy
Toy problem: To predict the income level of a person based on his/her facial attributes.
![Page 9: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/9.jpg)
Instances
![Page 10: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/10.jpg)
Categorical Labels
![Page 11: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/11.jpg)
Numeric Labels
[Figure: instances labeled with numeric income values: $1K, $2K, $3K, $4K, $5K, $7K, $8K, $9K, $11K, $12K]
![Page 12: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/12.jpg)
Features
Blonde No White No Male 5cm
Bald No White Yes Male 0cm
White No Black Yes Male 3cm
Dark Yes White No Female 12cm
![Page 13: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/13.jpg)
Training Set
Blonde No White No Male 5cm
Bald No White Yes Male 0cm
White No Black Yes Male 3cm
Dark Yes White No Female 12cm
![Page 14: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/14.jpg)
Test Set
Dark No White No Female 14cm
Dark No White Yes Male 6cm
Dark No Black No Male 6cm
Dark Yes White No Female 15cm
![Page 15: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/15.jpg)
Training and Testing
[Diagram: Set of training instances → Training → Model; Test instance → Testing (using the Model) → Prediction]
![Page 16: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/16.jpg)
Accuracy
Actual labels
Predicted labels
Accuracy = (number of correct predictions) / (total number of predictions) = 2 / 4 = 50%
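The accuracy computation above can be sketched in a few lines of Python. The label values below are hypothetical, chosen only so that 2 of the 4 predictions are correct, as in the slide's example.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels: positions 0 and 2 match, positions 1 and 3 do not.
actual = ["high", "low", "low", "high"]
predicted = ["high", "high", "low", "low"]
print(accuracy(actual, predicted))  # 0.5
```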
![Page 17: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/17.jpg)
Precision and Recall
In certain cases, there are two class labels and predicting a particular class correctly is more important than predicting the other.
A good example is top-k ranking in web search.
Performance measures:
› Recall
› Precision
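The slide names the two measures without defining them. A minimal sketch, assuming the standard definitions: precision is the fraction of instances predicted as the important class that really belong to it, and recall is the fraction of instances of the important class that were predicted as such. The label names (`rel`, `non`) and the data are hypothetical.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall with respect to the class named `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)  # true positives
    fp = sum(1 for a, p in pairs if p == positive and a != positive)  # false positives
    fn = sum(1 for a, p in pairs if p != positive and a == positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical relevance labels for six search results.
actual = ["rel", "rel", "non", "non", "rel", "rel"]
predicted = ["rel", "non", "rel", "non", "rel", "non"]
precision, recall = precision_recall(actual, predicted, positive="rel")
print(precision, recall)
```

Here 2 of the 3 results predicted as relevant are truly relevant (precision 2/3), while only 2 of the 4 truly relevant results were found (recall 1/2).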
![Page 18: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/18.jpg)
Some Practical Issues
Problem: Missing feature values
Solution:
› Training: Use the most frequently observed (or average) feature value in the instance’s class.
› Testing: Use the most frequently observed (or average) feature value in the entire training set.
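The two imputation rules for missing feature values can be sketched as follows for a categorical feature, using the most frequent value. The representation (instances as dictionaries, `None` for a missing value) and the feature and class names are assumptions made for illustration.

```python
from collections import Counter


def most_frequent(values):
    """Most common non-missing value, or None if every value is missing."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def impute_training(instances, labels, feature):
    """Fill missing training values with the mode of the instance's own class."""
    by_class = {}
    for inst, label in zip(instances, labels):
        by_class.setdefault(label, []).append(inst[feature])
    class_mode = {label: most_frequent(vals) for label, vals in by_class.items()}
    return [
        {**inst, feature: class_mode[label]} if inst[feature] is None else dict(inst)
        for inst, label in zip(instances, labels)
    ]


def impute_test(test_instances, training_instances, feature):
    """Fill missing test values with the mode over the entire training set."""
    overall_mode = most_frequent(inst[feature] for inst in training_instances)
    return [
        {**inst, feature: overall_mode} if inst[feature] is None else dict(inst)
        for inst in test_instances
    ]


# Hypothetical data: one feature, two classes, two missing training values.
train = [{"hair": "Dark"}, {"hair": "Dark"}, {"hair": None},
         {"hair": "Blonde"}, {"hair": None}]
train_labels = ["rich", "rich", "rich", "poor", "poor"]
filled_train = impute_training(train, train_labels, "hair")
filled_test = impute_test([{"hair": None}], train, "hair")
```

The test label is unknown at prediction time, which is why the testing rule falls back to the whole training set rather than a single class.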
Problem: Class imbalance
Solution:
› Oversampling: Duplicate the training instances in the small class
› Undersampling: Use fewer instances from the bigger class
![Page 19: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/19.jpg)
Majority Classifier
Training: Find the class with the largest number of instances.
Testing: For every test instance, predict that class as the label, independent of the features of the test instance.
[Diagram: table of class sizes (13, 8, 4); Model → Testing → Prediction]
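A minimal sketch of the majority classifier. The class sizes (13, 8, 4) are taken from the slide; the class names `A`, `B`, `C` and the `fit`/`predict` method names are hypothetical.

```python
from collections import Counter


class MajorityClassifier:
    """Always predicts the most frequent class seen during training."""

    def fit(self, labels):
        # Training: find the class with the largest number of instances.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # Testing: the features of the test instances are ignored.
        return [self.majority_ for _ in instances]


train_labels = ["A"] * 13 + ["B"] * 8 + ["C"] * 4
model = MajorityClassifier().fit(train_labels)
print(model.predict([{"hair": "Dark"}, {"hair": "Blonde"}]))  # ['A', 'A']
```

Such a classifier is mainly useful as a baseline: on this training set it would be right 13 times out of 25, and any learned model should do better.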
![Page 20: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/20.jpg)
k-Nearest Neighbor Classifier
Training: None (k-NN is known as a lazy classifier).
Testing: Find the k instances that are most similar to the test instance and use majority voting to decide on the label.
k = 3
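The testing step can be sketched as below. The slide only says "most similar"; using Euclidean distance over numeric feature vectors is an assumption, as are the data points and label names.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data forming two clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(points, labels, (2, 2), k=3))  # low
print(knn_predict(points, labels, (6, 5), k=3))  # high
```

With two classes, an odd k such as 3 avoids tied votes. All of the work happens at prediction time, which is why no training step is needed.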
![Page 21: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/21.jpg)
Decision Tree Classifier
Training: Build a tree where leaves represent labels and branches represent features that lead to those labels.
Testing: Traverse the tree using the feature values of the test instance.
[Diagram: example decision tree with branches labeled Black / White, Black / Not black, and Yes / No]
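The testing step (tree traversal) can be sketched with a hand-built tree. Training, i.e., choosing which feature to split on at each node, is not shown. The tree below is hypothetical and is not the one from the slide; the feature names and labels are invented for illustration.

```python
# An internal node is a dict naming the feature to test and mapping each
# feature value to a subtree; a leaf is simply a label string.
tree = {
    "feature": "hair_color",
    "branches": {
        "Black": "low income",
        "Not black": {
            "feature": "wears_glasses",
            "branches": {"Yes": "high income", "No": "low income"},
        },
    },
}


def classify(node, instance):
    """Follow the branches matching the instance's feature values down to a leaf."""
    while isinstance(node, dict):
        value = instance[node["feature"]]
        node = node["branches"][value]
    return node


print(classify(tree, {"hair_color": "Not black", "wears_glasses": "Yes"}))  # high income
print(classify(tree, {"hair_color": "Black", "wears_glasses": "No"}))       # low income
```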
![Page 22: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/22.jpg)
Naïve Bayes Classifier
Training: For every feature value v and class c pair, we compute and store in a lookup table the conditional probability P(v | c).
Testing: For each class c, we compute:
[Figure: three example conditional probabilities P(v | c): 0.40, 0.65, 0.78]
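A minimal sketch of both steps, assuming the standard Naive Bayes score for a class c, namely the prior P(c) multiplied by P(v | c) for every feature value v of the test instance. No smoothing is applied, so a feature value never seen with a class gets probability 0. The feature names, class names, and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(instances, labels):
    """Build the prior table P(c) and the lookup table P(v | c)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of values
    for inst, label in zip(instances, labels):
        for feature, value in inst.items():
            value_counts[(feature, label)][value] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {
        (feature, value, c): count / class_counts[c]
        for (feature, c), counter in value_counts.items()
        for value, count in counter.items()
    }
    return priors, cond


def predict_naive_bayes(priors, cond, instance):
    """Score each class as P(c) times the product of P(v | c); return the best."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, value in instance.items():
            score *= cond.get((feature, value, c), 0.0)  # unseen pair: 0, no smoothing
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data with two categorical features.
instances = [
    {"hair": "Dark", "glasses": "Yes"},
    {"hair": "Dark", "glasses": "No"},
    {"hair": "Blonde", "glasses": "Yes"},
    {"hair": "Blonde", "glasses": "No"},
    {"hair": "Dark", "glasses": "No"},
]
labels = ["high", "high", "low", "low", "low"]
priors, cond = train_naive_bayes(instances, labels)
label, scores = predict_naive_bayes(priors, cond, {"hair": "Dark", "glasses": "Yes"})
print(label)  # high
```

For the query above, the score for `high` is 0.4 × 1.0 × 0.5 = 0.2 and the score for `low` is 0.6 × (1/3) × (1/3) ≈ 0.067, so `high` is predicted.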
![Page 23: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/23.jpg)
Other Commonly Used Classifiers
› Support vector machines
› Boosted decision trees
› Neural networks
![Page 24: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/24.jpg)
Use Case: Sentiment-Focused Web Crawling
G. Vural, B. B. Cambazoglu, and P. Senkul, “Sentiment-focused web crawling”, CIKM’12, pp. 2020-2024.
![Page 25: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/25.jpg)
Problem
Early discovery of opinionated content on the Web is important.
Use cases
› Measuring brand loyalty or product adoption
› Politics
› Finance
We would like to design a sentiment-focused web crawler that aims to maximize the amount of sentimental/opinionated content fetched from the Web within a given amount of time.
![Page 26: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/26.jpg)
Web Crawling
Subspaces
› Downloaded pages
› Discovered pages
› Undiscovered pages
![Page 27: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/27.jpg)
Sentiment-Focused Web Crawling
Challenge: to predict the sentimentality of an “unseen” web page, i.e., without having access to the page content.
![Page 28: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/28.jpg)
Features
Assumption: Sentimental pages are more likely to be linked to by other sentimental pages.
Idea: Build a learning model using features extracted from
› Textual content of referring pages
› Anchor text on the hyperlinks
› URL of the target page
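As a rough illustration of how the three feature sources could be turned into a single feature vector, the sketch below builds a bag of words with one prefix per source. This is an assumed, simplified representation and not necessarily the feature set used in the cited paper; the function names, prefixes, and example inputs are hypothetical, and only one referring page is considered.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_features(referring_text, anchor_text, target_url):
    """Bag-of-words counts, with each token prefixed by the source it came from."""
    features = Counter()
    sources = (("ref", referring_text), ("anchor", anchor_text), ("url", target_url))
    for source, text in sources:
        for tok in tokens(text):
            features[f"{source}:{tok}"] += 1
    return features


features = link_features(
    referring_text="I love this phone",
    anchor_text="great review",
    target_url="http://example.com/reviews/phone",
)
print(features["ref:love"], features["anchor:great"], features["url:reviews"])  # 1 1 1
```

None of these features require downloading the target page, which is the constraint the crawler operates under.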
![Page 29: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/29.jpg)
Labels
Our data (ClueWeb09-B) lacks ground-truth sentiment scores.
We created a ground truth using the SentiStrength tool.
› Assigns a sentiment score (between 0 and 8) to each web page as its label.
A small-scale user study was conducted with three judges to verify the suitability of this ground truth.
› 500 random pages sampled from the collection.
› Pages are labeled as sentimental or not sentimental.
Observations
› 22% of the pages are labeled as sentimental.
› High agreement between judges: the overlap is above 85%.
![Page 30: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/30.jpg)
Learner and Performance Metric
As the learner, we use the LibSVM software in regression mode.
We rebuild the prediction model at regular intervals throughout the crawling process.
As the main performance metric, we compute the total sentimentality score accumulated after fetching a certain number of pages.
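The performance metric can be sketched as a running sum over the pages in the order they were fetched. The scores below are hypothetical ground-truth sentiment scores on the 0 to 8 scale.

```python
from itertools import accumulate


def cumulative_sentiment(scores_in_fetch_order):
    """Total sentimentality accumulated after fetching 1, 2, ..., n pages."""
    return list(accumulate(scores_in_fetch_order))


# Hypothetical scores of the first four pages fetched by a crawler.
fetched_scores = [3, 0, 5, 1]
print(cumulative_sentiment(fetched_scores))  # [3, 3, 8, 9]
```

A crawler that fetches sentimental pages early produces a curve that rises faster, so two crawlers can be compared at any fixed number of fetched pages.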
![Page 31: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/31.jpg)
Evaluated Crawlers
Proposed crawlers
› based on the average sentiment score of referring page content
› based on machine learning
Oracle crawlers
› highest sentiment score
› highest spam score
› highest PageRank
Baseline crawlers
› random
› indegree-based
› breadth first
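The crawlers listed above differ mainly in how they choose the next page to fetch among the discovered pages. A minimal sketch, assuming a priority queue over discovered URLs and a pluggable `priority` function; the link graph, the scores, and all names are hypothetical. As a simplification, a page's priority is computed once, when it is first discovered.

```python
import heapq


def crawl(seeds, out_links, priority, budget):
    """Fetch up to `budget` pages, always taking the discovered page
    with the highest priority next."""
    frontier = [(-priority(url), url) for url in seeds]  # negate: heapq is a min-heap
    heapq.heapify(frontier)
    discovered = set(seeds)
    fetched = []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        fetched.append(url)
        for target in out_links.get(url, []):
            if target not in discovered:
                discovered.add(target)
                heapq.heappush(frontier, (-priority(target), target))
    return fetched


# Hypothetical link graph and predicted sentiment scores.
out_links = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
predicted = {"a": 1, "b": 2, "c": 7, "d": 9, "e": 3}
print(crawl(["a"], out_links, predicted.get, budget=4))  # ['a', 'c', 'e', 'b']
```

Swapping the `priority` function changes the crawler: a learned model's prediction gives a proposed crawler, the true sentiment score gives an oracle, and the indegree observed so far gives one of the baselines.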
![Page 32: Introduction to Supervised Machine Learning Concepts](https://reader035.vdocuments.us/reader035/viewer/2022062400/5681688f550346895ddf1585/html5/thumbnails/32.jpg)
Performance