understanding feature space in machine learning - data science pop-up seattle
![Page 1: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/1.jpg)
#datapopupseattle
Understanding Feature Space in Machine Learning
Alice Zheng, Director of Data Science, Dato
@RainyData, @DatoInc
![Page 2: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/2.jpg)
UNSTRUCTURED
Data Science POP-UP in Seattle
www.dominodatalab.com
Produced by Domino Data Lab
Domino’s enterprise data science platform is used by leading analytical organizations to increase productivity, enable collaboration, and publish models into production faster.
![Page 3: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/3.jpg)
Understanding Feature Space in Machine Learning
Alice Zheng, Dato
October 2015
![Page 4: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/4.jpg)
My journey so far
Applied machine learning (data science)
Build ML tools
Shortage of experts and good tools.
![Page 5: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/5.jpg)
Why machine learning?
Model data.
Make predictions.
Build intelligent applications.
![Page 6: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/6.jpg)
The machine learning pipeline
Raw data → Features → Models → Predictions → Deploy in production
Example raw data: “I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …”
![Page 7: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/7.jpg)
Three things to know about ML
• Feature = numeric representation of raw data
• Model = mathematical “summary” of features
• Making something that works = choose the right model and features, given data and task
![Page 8: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/8.jpg)
Feature = Numeric representation of raw data
![Page 9: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/9.jpg)
Representing natural text
Raw text: It is a puppy and it is extremely cute.
Classify: puppy or not?
What’s important? Phrases? Specific words? Ordering? Subject, object, verb?
Bag of Words:
{“it”: 2, “is”: 2, “a”: 1, “puppy”: 1, “and”: 1, “extremely”: 1, “cute”: 1}
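The bag-of-words counts on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the speaker's code: the function name `bag_of_words` and the tokenizer (lowercase, keep runs of letters and apostrophes) are assumptions, since the slide does not say how the text is split into words.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercased word occurs in the text."""
    # Assumption: a "word" is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


bow = bag_of_words("It is a puppy and it is extremely cute.")
print(dict(bow))
```

Word order is discarded: only the counts survive, which is exactly the trade-off the "What's important?" questions point at.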
![Page 10: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/10.jpg)
Representing natural text
Raw text: It is a puppy and it is extremely cute.
Classify: puppy or not?
Bag of Words (sparse vector representation):
it 2
they 0
I 0
am 0
how 0
puppy 1
and 1
cat 0
aardvark 0
cute 1
extremely 1
… …
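One way to read the table: fix a vocabulary, give every word its own dimension, and store the count in that dimension. The sketch below assumes a small hypothetical vocabulary (the words shown on the slide, lowercased); a real vocabulary would cover every word in the corpus. The helper names `to_vector` and `to_sparse` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical vocabulary: the words listed on the slide, lowercased.
VOCABULARY = ["it", "they", "i", "am", "how", "puppy", "and", "cat",
              "aardvark", "cute", "extremely"]


def to_vector(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    # Words outside the vocabulary (here "is" and "a") are simply dropped.
    return [counts[word] for word in vocabulary]


def to_sparse(vector):
    """Keep only the non-zero entries as {dimension index: count}."""
    return {i: value for i, value in enumerate(vector) if value != 0}


vector = to_vector("It is a puppy and it is extremely cute.", VOCABULARY)
sparse = to_sparse(vector)
print(vector)
print(sparse)
```

Most entries are zero for any single document, which is why storing only the non-zero dimensions is usually the practical choice.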
![Page 11: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/11.jpg)
Representing images
Raw image: millions of RGB triplets, one for each pixel
Classify: person or animal?
Raw Image → Bag of Visual Words
Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005-2009.
![Page 12: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/12.jpg)
Representing images
Classify: person or animal?
Raw Image → Deep learning features
Dense vector representation:
[3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8]
![Page 13: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/13.jpg)
Feature space in machine learning
• Raw data → high dimensional vectors
• Collection of data points → point cloud in feature space
• Feature engineering = creating features of the appropriate granularity for the task
![Page 14: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/14.jpg)
Visualizing Feature Space
![Page 15: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/15.jpg)
Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”
![Page 16: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/16.jpg)
Visualizing bag-of-words
Raw text: I have a puppy and it is extremely cute
[Figure: the sentence plotted as the point (puppy = 1, cute = 1) on axes “puppy” and “cute”]
Bag of words:
it 1
they 0
I 1
am 0
how 0
puppy 1
and 1
cat 0
aardvark 0
zebra 0
cute 1
extremely 1
… …
![Page 17: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/17.jpg)
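Visualizing bag-of-words
[Figure: three sentences plotted as points in a 3-D feature space with axes “puppy”, “cute”, and “extremely”, each axis marked at 1]
• I have a puppy and it is extremely cute
• I have an extremely cute cat
• I have a cute puppy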
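The three points in this figure can be recovered by counting only the three axis words in each sentence. The following is a rough sketch under that assumption; `AXES` and `to_point` are names made up for the illustration, and Euclidean distance is just one convenient way to compare points.

```python
import math
import re

# The three dimensions shown in the figure.
AXES = ("puppy", "cute", "extremely")


def to_point(sentence, axes=AXES):
    """Map a sentence to its coordinates: one word count per axis."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return tuple(words.count(axis) for axis in axes)


sentences = [
    "I have a puppy and it is extremely cute",
    "I have an extremely cute cat",
    "I have a cute puppy",
]
points = [to_point(s) for s in sentences]
for sentence, point in zip(sentences, points):
    print(point, sentence)

# Sentences that share more axis words end up closer together.
print(math.dist(points[0], points[1]))  # differ only on "puppy"
print(math.dist(points[1], points[2]))  # differ on "puppy" and "extremely"
```

Note that "cat" has no axis here, so the second sentence is placed using only "cute" and "extremely"; which dimensions exist determines what the geometry can express.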
![Page 18: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/18.jpg)
Document point cloud
[Figure: documents scattered as a point cloud on axes “word 1” and “word 2”]
![Page 19: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/19.jpg)
Model = Mathematical “summary” of features
![Page 20: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/20.jpg)
What is a summary?
• Data → point cloud in feature space
• Model = a geometric shape that best “fits” the point cloud
![Page 21: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/21.jpg)
Classification model
[Figure: axes “Feature 1” and “Feature 2”]
Decide between two classes
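To make "decide between two classes" concrete, here is a small sketch of one possible classifier whose geometric shape is a straight line: a nearest-centroid rule. The slide does not name a specific model, so this choice, the function names, and the toy data are all assumptions for illustration.

```python
def centroid(points):
    """Average position of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_linear_boundary(class_a, class_b):
    """Return (w, bias) so that w.x + bias > 0 means x is nearer class A's centroid."""
    ca, cb = centroid(class_a), centroid(class_b)
    w = tuple(a - b for a, b in zip(ca, cb))
    # The boundary passes through the midpoint between the two centroids.
    midpoint = tuple((a + b) / 2 for a, b in zip(ca, cb))
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, bias


def classify(x, w, bias):
    score = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return "A" if score > 0 else "B"


# Made-up 2-D data: (feature 1, feature 2) pairs for each class.
class_a = [(1, 1), (2, 1), (1, 2)]
class_b = [(5, 5), (6, 5), (5, 6)]
w, bias = fit_linear_boundary(class_a, class_b)
print(classify((0, 0), w, bias))
print(classify((6, 6), w, bias))
```

The model is fully described by `w` and `bias`, that is, by a line in feature space; more flexible classifiers replace the line with a more complicated surface.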
![Page 22: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/22.jpg)
Clustering model
[Figure: axes “Feature 1” and “Feature 2”]
Group data points tightly
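As an illustration of grouping points tightly, the sketch below runs a few rounds of k-means (Lloyd's algorithm). The slide does not say which clustering method is meant, so k-means, the starting centers, and the toy data are assumptions.

```python
import math


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and moving centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Made-up 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, centers=[(1, 1), (9, 8)])
print(centers)
print(clusters)
```

Here the "shape" that summarizes the data is a set of center points, each standing in for the cloud of points around it.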
![Page 23: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/23.jpg)
Regression model
[Figure: axes “Feature” and “Target”]
Fit the target values
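For a single feature, fitting the target values can be sketched as an ordinary least-squares line. The slide only shows the picture, so the choice of least squares, the function name `fit_line`, and the data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature and one target."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line target = 2 * feature + 1.
features = [0, 1, 2, 3]
targets = [1, 3, 5, 7]
slope, intercept = fit_line(features, targets)
print(slope, intercept)
```

The fitted line is the geometric summary: two numbers replace the whole point cloud when making predictions.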
![Page 24: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/24.jpg)
Visualizing Feature Engineering
![Page 25: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/25.jpg)
When does bag-of-words fail?
[Figure: four documents plotted in a 3-D feature space with axes “puppy”, “cat”, and “have”]
• I have a puppy
• I have a cat
• I have a kitten
• I have a dog and I have a pen
Task: find a surface that separates documents about dogs vs. cats
Problem: the word “have” adds fluff instead of information
![Page 26: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/26.jpg)
Improving on bag-of-words
• Idea: “normalize” word counts so that popular words are discounted
• Term frequency (tf) = number of times a term appears in a document
• Inverse document frequency of a word (idf) = log(N / number of documents containing the word)
• N = total number of documents
• Tf-idf count = tf × idf
![Page 27: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/27.jpg)
From BOW to tf-idf
[Figure: the same four documents in the “puppy” / “cat” / “have” feature space]
• I have a puppy
• I have a cat
• I have a kitten
• I have a dog and I have a pen
idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0
![Page 28: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/28.jpg)
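From BOW to tf-idf
tfidf(puppy) = log 4
tfidf(cat) = log 4
tfidf(have) = 0
[Figure: the documents re-plotted with tf-idf weights on axes “puppy”, “cat”, and “have”, with a line labeled “Decision surface”]
• “I have a puppy” sits at log 4 on the “puppy” axis
• “I have a cat” sits at log 4 on the “cat” axis
• “I have a dog and I have a pen” and “I have a kitten” are plotted together
Tf-idf flattens uninformative dimensions in the BOW point cloud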
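The numbers on these two slides can be checked with a short script. It is a sketch under stated assumptions: whitespace tokenization, the natural logarithm (the slides do not give a base), and idf computed as log(N / number of documents containing the word), which is the form consistent with idf(puppy) = log 4 and idf(have) = log 1 for these four documents.

```python
import math
from collections import Counter

DOCS = [
    "I have a puppy",
    "I have a cat",
    "I have a kitten",
    "I have a dog and I have a pen",
]
AXES = ("puppy", "cat", "have")


def tokenize(doc):
    return doc.lower().split()


def idf(word, docs):
    """log(N / number of documents containing the word)."""
    # Assumes the word occurs in at least one document.
    n_containing = sum(1 for doc in docs if word in tokenize(doc))
    return math.log(len(docs) / n_containing)


def tfidf_vector(doc, docs, axes=AXES):
    """Term frequency times idf, one value per axis."""
    tf = Counter(tokenize(doc))
    return tuple(tf[word] * idf(word, docs) for word in axes)


vectors = [tfidf_vector(doc, DOCS) for doc in DOCS]
for doc, vec in zip(DOCS, vectors):
    print(vec, doc)
```

Because "have" appears in every document, its idf is zero and the whole "have" dimension collapses; the documents that mention neither "puppy" nor "cat" land on the origin.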
![Page 29: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/29.jpg)
That’s not all, folks!
• Geometry is the key to understanding feature space and machine learning
• Many other fun topics:
  - Feature normalization
  - Feature transformations
  - Model regularization
• Dato is hiring! [email protected] @RainyData, @DatoInc
![Page 30: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/30.jpg)
@datapopup #datapopupseattle
![Page 31: Understanding Feature Space in Machine Learning - Data Science Pop-up Seattle](https://reader035.vdocuments.us/reader035/viewer/2022062503/58f9a973760da3da068b6ec1/html5/thumbnails/31.jpg)
Thank You To Our Sponsors