Feature Engineering in Machine Learning
Chun-Liang Li (李俊良)
2016/07/17
About Me
Academic
• NTU CSIE BS/MS (2012/2013), Advisor: Prof. Hsuan-Tien Lin
• CMU MLD PhD (2014-), Advisors: Prof. Jeff Schneider and Prof. Barnabás Póczos
Competition
• KDD Cup 2011 Champions, KDD Cup 2013 Champions
• With Prof. Chih-Jen Lin, Prof. Hsuan-Tien Lin, Prof. Shou-De Lin, and many students
Working
• (2012 intern) (2015 intern)
What is Machine Learning?
• Learning: Existing Data → Machine (Algorithm) → Model
• Prediction: New Data → Model → Prediction
• Data: Several length-d vectors
Data? Algorithm?
• In academia
  • Assume we are given good enough data (d-dimensional, of course)
  • Focus on designing better algorithms (sometimes complicated algorithms imply publications)
• In practice
  • Where is your good enough data?
  • Or, how to transform your data into d-dimensional vectors?
From Zero to One: Create your features by your observations
An Apple
How to describe this picture?
More Fruits
• Method I: Use size of picture, e.g., (640, 580) and (640, 580)
• Method II: Use RGB average, e.g., (219, 156, 140), (243, 194, 113), and (216, 156, 155)
• Many more powerful features developed in computer vision
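The two hand-made features above can be sketched in a few lines. This is only an illustration: the image is assumed to be a nested list of (R, G, B) tuples, and the function names and the tiny 2x2 picture are made up for the example.

```python
def size_feature(image):
    """Return (width, height) of an image stored as rows of (R, G, B) pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    return (width, height)


def rgb_average_feature(image):
    """Return the per-channel mean over all pixels, rounded to integers."""
    pixels = [pixel for row in image for pixel in row]
    n = len(pixels)
    return tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))


# A made-up 2x2 "picture" (hypothetical values, not taken from the slides).
toy_image = [
    [(220, 150, 140), (218, 162, 140)],
    [(244, 190, 110), (242, 198, 118)],
]

# Concatenating both descriptions gives one length-5 feature vector.
features = size_feature(toy_image) + rgb_average_feature(toy_image)
print(features)
```

Note that the size feature cannot tell apart two pictures of the same resolution, which is why the slide moves on to color.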
Case Study (KDD Cup 2013)
• Determine whether a paper is written by a given author
• We are given raw text of these
Data: https://www.kaggle.com/c/kdd-cup-2013-author-paper-identification-challenge
NTU Approaches
• Pipeline: Feature Engineering → Several Algorithms → Combining Different Models
• Feature Engineering: Observation → Encode into Feature → Result
First Observation: Author Information
• Are these my (Chun-Liang Li) papers? (Easy! Check author names)
  1. Chun-Liang Li and Hsuan-Tien Lin. Condensed filter tree for cost-sensitive multi-label classification.
  2. Yao-Nan Chen and Hsuan-Tien Lin. Feature-aware label space dimension reduction for multi-label classification.
• Encode by name similarities (e.g., how many characters are the same)
• Are Li, Chun-Liang and Chun-Liang Li the same?
• Yes! Eastern and Western order
• How about Li Chun-Liang? (Calculate the similarity of the reverse order)
• Also take co-authors into account
• 29 features in total
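A minimal sketch of the name-similarity idea, including the reversed-order check. It is not the team's actual feature code: the helper names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever character-level similarity was really used.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, turn "Last, First" commas into spaces, collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


def char_similarity(a, b):
    """Ratio in [0, 1] of matching characters between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def name_features(query, candidate):
    """Two similarity features: same token order and reversed token order."""
    q = normalize(query)
    c = normalize(candidate)
    c_reversed = " ".join(reversed(c.split()))
    return {
        "same_order": char_similarity(q, c),
        "reversed_order": char_similarity(q, c_reversed),
    }


# Western order against Eastern order of the same name.
print(name_features("Chun-Liang Li", "Li, Chun-Liang"))
# A different author.
print(name_features("Chun-Liang Li", "Yao-Nan Chen"))
```

Keeping both scores as separate features lets the learning algorithm decide how much to trust each one.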
Second Observation: Affiliations
• Are Dr. Chi-Jen Lu and Prof. Chih-Jen Lin the same?
• Similar name: Chi-Jen Lu vs. Chih-Jen Lin
• Shared co-author (me!)
• Take affiliations into account!
• Academia Sinica vs. National Taiwan University
• 13 features in total
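The affiliation and co-author observations could be encoded roughly as follows. This is a sketch under assumptions: the record layout (a dict with `affiliation` and `coauthors` keys) and the choice of Jaccard word overlap are illustrative, not the features described in the paper.

```python
def token_set(text):
    """Lowercased word set of a free-text affiliation string."""
    return set(text.lower().replace(",", " ").split())


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affiliation_features(author_a, author_b):
    """Features for deciding whether two author records are the same person."""
    return {
        "affiliation_overlap": jaccard(
            token_set(author_a["affiliation"]),
            token_set(author_b["affiliation"]),
        ),
        "shared_coauthors": len(
            set(author_a["coauthors"]) & set(author_b["coauthors"])
        ),
    }


# The example from the slide: similar names, one shared co-author,
# but different affiliations.
lu = {"affiliation": "Academia Sinica", "coauthors": ["Chun-Liang Li"]}
lin = {"affiliation": "National Taiwan University", "coauthors": ["Chun-Liang Li"]}
print(affiliation_features(lu, lin))
```

Here the shared co-author suggests a match while the zero affiliation overlap argues against it, which is the reason for adding affiliation features.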
Last of KDD Cup 2013
• Many other features, including
  • Can you live for more than 100 years? At least I think I can't do research after 100 years
  • More advanced: social network features
Summary: The 97 features designed by students won the competition
Furthermore
• If I can access the content, can I do better?
Author: Robert Galbraith
Who is Robert Galbraith?
“I thought it was by a very mature writer, and not a first-timer.” — Peter James
Definitely
Writing Style?
• “I was testing things like word length, sentence length, paragraph length, frequency of particular words and the pattern of punctuation” (Peter Millican, University of Oxford)
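The measurements listed in the quote translate directly into features. The sketch below is an assumption-laden illustration rather than the method used in the actual study: sentences are split on `.`, `!`, and `?`, paragraphs on blank lines, and the tracked words are arbitrary. It expects non-empty text.

```python
import re
from collections import Counter


def style_features(text, tracked_words=("the", "and")):
    """Simple writing-style features for a non-empty piece of text."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    word_counts = Counter(words)
    punctuation = Counter(ch for ch in text if ch in ",;:!?.")

    features = {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),    # words per sentence
        "avg_paragraph_length": len(words) / len(paragraphs),  # words per paragraph
    }
    # Frequency of particular words, relative to all words.
    for w in tracked_words:
        features["freq_" + w] = word_counts[w] / len(words)
    # Pattern of punctuation, relative to all characters.
    for ch, n in punctuation.items():
        features["punct_" + ch] = n / len(text)
    return features


sample = "The cat sat. The dog ran!\n\nAnd then it rained, hard."
print(style_features(sample))
```

Two texts by the same writer should give similar feature vectors, which can then be compared or fed to a classifier.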
Game Changing Point: Deep Learning
Common Type of Data
• Image
• Text
Representation Learning
• Deep Learning as learning hidden representations
• An active research topic in academia and industry
• Feed in raw data and use the last layer to extract features (Krizhevsky et al., 2012)
(Check Prof. Lee’s talk and go to the deep learning session later)
Use Pre-trained Network
• You don’t need to train a network by yourself
• Use an existing pre-trained network to extract features
  • AlexNet
  • VGG
  • word2vec
Result: Simply using deep learning features achieves state-of-the-art performance in many applications
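To make "use the last layer to extract features" concrete, here is a toy sketch. Everything in it is an assumption for illustration: the weights are typed in by hand as a stand-in for a real pre-trained model such as AlexNet, and the network has a single hidden layer. The point is only that the network stays frozen and the output layer is dropped.

```python
# Stand-in "pre-trained" weights (hypothetical). In practice these would be
# loaded from a published model rather than written by hand.
PRETRAINED = {
    "hidden": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],  # 2 hidden units, 3 inputs
    "output": [[1.0, -1.0]],                         # original task head, unused here
}


def relu(x):
    """Rectified linear activation."""
    return max(0.0, x)


def extract_features(raw, weights=PRETRAINED):
    """Run the frozen network up to its last hidden layer.

    The activations of that layer are returned as the feature vector;
    the output layer of the original task is never applied.
    """
    return [
        relu(sum(w * x for w, x in zip(row, raw)))
        for row in weights["hidden"]
    ]


raw_input = [1.0, 2.0, 3.0]
features = extract_features(raw_input)
print(features)  # feed these to any simple classifier
```

No training happens in this code: the weights are only read, which is why reusing a pre-trained network is cheap.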
Successful Example
• The PASCAL Visual Object Classes Challenge
[Figure: mean average precision (axis 0 to 0.6) by year, 2005 to 2014, with the HoG feature and the deep learning result (Girshick et al., 2014) marked]
Slow progress on feature engineering and algorithms before deep learning
Curse of Dimensionality: Feature Selection and Dimension Reduction
The more, the better?
Practice
If we have 1,000,000 data points with 100,000 dimensions, how much memory do we need? Ans: $10^6 \times 10^5 \times 8 = 8 \times 10^{11}$ (B) $= 800$ (GB)
Theory
Without any assumption, you need $O(1/\epsilon^d)$ data to achieve error $\epsilon$ for d-dimensional data
Noisy Feature
Is every feature useful? Redundancy?
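The memory estimate in the Practice box can be checked with a few lines of arithmetic. The sketch assumes a dense matrix with 8 bytes per value (a double-precision float) and decimal gigabytes. The second function illustrates the growth in the Theory box by reading $\epsilon$ as a grid resolution, which is an interpretation added here for illustration.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed to store a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


def grid_points(cells_per_axis, d):
    """Number of cells in a d-dimensional grid with 1/epsilon cells per axis."""
    return cells_per_axis ** d


total = dense_matrix_bytes(10**6, 10**5)
print(total)            # bytes
print(total / 10**9)    # gigabytes

# With epsilon = 0.1 (10 cells per axis) the count explodes with d.
for d in (1, 2, 3, 10):
    print(d, grid_points(10, d))
```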
Feature Selection
• Select important features
• Reduce dimensions
• Explainable results
Commonly Used Tools
• LASSO (Sparse Constraint)
• Random Forests
• Many others
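As a rough illustration of how a sparse constraint selects features, the following sketch implements LASSO by cyclic coordinate descent in plain Python. It is a teaching sketch under assumptions (tiny hand-made data, fixed number of sweeps, no intercept), not a replacement for a library implementation. Features whose weight ends at exactly zero are dropped.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 one coordinate at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale if scale > 0 else 0.0
    return w


# Made-up data: the target depends only on feature 0; feature 1 is irrelevant.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

The L1 penalty slightly shrinks the useful weight and sets the irrelevant one to exactly zero, which is what makes the result easy to explain.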
KDD Cup Again
• In KDD Cup 2013, we actually generated more than 200 features (some secrets you won’t see in the paper)
• We used random forests to select only 97 of them, since many features are unimportant or even harmful. But why?
Non-useful Features
• Duplicated features
  • Example I: Country (Taiwan) vs. Coordinates (121, 23.5)
  • Example II: Date of birth (1990) vs. Age (26)
• Noisy features
  • Noisy information (something wrong in your data)
  • Missing values (something missing in your data)
• What if we still have too many features?
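A quick way to spot the duplicated and incomplete features listed above is to look at pairwise correlations and missing-value rates. The sketch below is one simple heuristic, not the random-forest selection used in the competition. The column names, values, and the 0.99 threshold are made up, and the correlation helper expects columns without missing entries.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def redundant_pairs(columns, threshold=0.99):
    """Pairs of columns that are (almost) linear copies of each other."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


def missing_rate(column):
    """Fraction of entries recorded as None."""
    return sum(v is None for v in column) / len(column)


columns = {
    "birth_year": [1990, 1985, 1970, 2000],
    "age": [26, 31, 46, 16],        # 2016 minus birth_year: the same information
    "n_papers": [3, 10, 4, 1],
}
print(redundant_pairs(columns))
print(missing_rate([1, None, 3, None]))
```

Only one column of each redundant pair needs to be kept, and columns with a high missing rate are candidates for removal or imputation.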
Dimension Reduction
• Let’s visualize the data (a perfect example): one dimension is enough
• Non-perfect example in practice: trade-off between information and space
[Figure: 2-D scatter plots for the perfect and non-perfect examples]
Commonly Used Tools
• Principal Component Analysis (PCA)
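The "perfect example" can be reproduced with a small PCA sketch: points lying on a line in 2-D are described fully by one coordinate. The code below is a plain-Python illustration under assumptions (made-up points, power iteration for the leading eigenvector, a starting vector that is not orthogonal to it, and data with non-zero variance). It is not an efficient implementation.

```python
import math


def center(points):
    """Subtract the mean of every dimension."""
    d = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(d)]
    return [[p[k] - mean[k] for k in range(d)] for p in points]


def covariance(centered):
    """d x d covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[a] * row[b] for row in centered) / n for b in range(d)]
        for a in range(d)
    ]


def top_component(cov, n_iter=200):
    """Leading eigenvector of cov by power iteration (unit length)."""
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(n_iter):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def project(centered, component):
    """1-D coordinate of every point along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Made-up "perfect" data: every point lies on the line y = x.
points = [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5], [0.75, 0.75], [1.0, 1.0]]
centered = center(points)
component = top_component(covariance(centered))
coords = project(centered, component)

total_var = sum(x * x for row in centered for x in row)
kept_var = sum(c * c for c in coords)
print(component, kept_var / total_var)
```

With noisy data the ratio `kept_var / total_var` drops below one, which is the trade-off between information and space.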
PCA: Intuition
• Let’s apply PCA on these faces (raw pixels) and visualize the coordinates
[Figure: faces plotted by their PCA coordinates]
http://comp435p.tk/
PCA: Intuition (cont.)
• We can use very few base faces to approximate (describe) the original faces
[Figure: base faces 1 to 9]
http://comp435p.tk/ (Sirovich and Kirby, Low-dimensional procedure for the characterization of human faces)
PCA: Case Study
• CIFAR-10 image classification with raw pixels as features and using approximated kernel SVM

Dimensions | Accuracy | Time
3072 (all) | 63.1% | ~2 hrs
100 (PCA) | 59.8% | 250 s

(Li and Póczos, Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods, UAI 2016)
Trade-off between information, space and time
PCA in Practice
• Practical concern:
  • Time complexity: $O(Nd^2)$
  • Space complexity: $O(d^2)$
• Remark: Use fast approximation for large-scale problems (e.g., >100k dimensions)
  1. PCA with random projection (implemented in scikit-learn) (Halko et al., Finding Structure with Randomness, 2011)
  2. Stochastic algorithms (easy to implement from scratch) (Li et al., Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA, AISTATS 2016)
Small Problem: PCA takes <10 seconds for the CIFAR-10 dataset (d=3072) using 12 cores (E5-2620)
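To show why a stochastic algorithm is easy to write from scratch, here is a sketch of Oja's rule, a classic streaming update for the top principal component. It is offered as one well-known example and is not claimed to be the algorithm studied in the cited paper. It assumes zero-mean samples, a fixed learning rate, and made-up data. It keeps only one length-d vector in memory instead of a d x d covariance matrix.

```python
import math


def oja_top_component(samples, dim, learning_rate=0.1):
    """Estimate the top principal component in one pass over the samples.

    Uses O(d) memory and O(d) time per sample.
    """
    w = [1.0] + [0.0] * (dim - 1)
    for x in samples:
        # Move w toward the sample, weighted by how well they already align.
        dot = sum(wk * xk for wk, xk in zip(w, x))
        w = [wk + learning_rate * dot * xk for wk, xk in zip(w, x)]
        # Renormalize so that w stays a unit vector.
        norm = math.sqrt(sum(wk * wk for wk in w))
        w = [wk / norm for wk in w]
    return w


# Made-up zero-mean stream: all samples lie on the line y = x.
samples = [[t, t] for t in (-1.0, 0.5, 1.0, -0.5)] * 50
w = oja_top_component(samples, dim=2)
print(w)
```

On real data the learning rate usually has to decay over time, and more than one pass may be needed.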
Conclusion
• Observe the data and encode them into meaningful features
• Deep learning is a powerful tool to use
• Reduce number of features if necessary
  • Reduce non-useful features
  • Computational concern
Beginning: Existing Data → Machine (Algorithm)
Now: Existing Data → Features → (Simple) Algorithm
Thanks! Any questions?
References
1. Richard Szeliski. Computer Vision: Algorithms and Applications, 2010.
2. Senjuti Basu Roy, Martine De Cock, Vani Mandava, Swapna Savanna, Brian Dalessandro, Claudia Perlich, William Cukierski, and Ben Hamner. The Microsoft academic search dataset and KDD Cup 2013. In KDD Cup 2013 Workshop, 2013.
3. Chun-Liang Li, Yu-Chuan Su, Ting-Wei Lin, Cheng-Hao Tsai, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Chun-Pai Yang, Cheng-Xia Chang, Wei-Sheng Chin, Yu-Chin Juan, Hsiao-Yu Tung, Jui-Pin Wang, Cheng-Kuang Wei, Felix Wu, Tu-Chun Yin, Tong Yu, Yong Zhuang, Shou-De Lin, Hsuan-Tien Lin, and Chih-Jen Lin. Combination of feature engineering and ranking models for paper-author identification in KDD Cup 2013. In JMLR, 2015.
4. How JK Rowling was unmasked. http://www.bbc.com/news/entertainment-arts-23313074
5. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. In IEEE PAMI, 2013.
6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
7. Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015.
8. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. Technical Report, 2013.
9. Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
10. Matthew A. Turk and Alex Pentland. Face Recognition Using Eigenfaces. In CVPR, 1991.
11. Chun-Liang Li and Barnabás Póczos. Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods. In UAI, 2016.
12. Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. In SIAM Rev., 2011.
13. Chun-Liang Li, Hsuan-Tien Lin, and Chi-Jen Lu. Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA. In AISTATS, 2016.