berkeley dataproduct talk
TRANSCRIPT
![Page 1: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/1.jpg)
Data Products Deep Dive
Pete Skomoroch @peteskomoroch 3/31/14 Berkeley CS194-16: Intro to Data Science
![Page 2: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/2.jpg)
Some Background
• Physics/Math BS Undergrad
• Analyst / Software Engineer @ ProfitLogic - 3.5 years
• Biodefense Engineer / ML Student @ MIT - 3.5 years
• Sr. Research Engineer @ AOL Search - 1 year
• Director @ Juice Analytics - 1 year
• Consulting @ Cloudera, Amazon etc. - 1 year
• Principal Data Scientist @ LinkedIn - 4 years
![Page 3: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/3.jpg)
Four types of data scientist (at least)
source: "Analyzing the Analyzers" O'Reilly Media
![Page 4: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/4.jpg)
Data Scientists create data products
![Page 5: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/5.jpg)
The data product process
• Verify you are solving the right problem
• Theory + model design
• Measurement: data collection and cleaning
• Feature engineering & model development
• Error analysis and investigation
• Iterate and improve each step in the process
• Leverage derived data to build new products
![Page 6: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/6.jpg)
Data factories & flywheels
Source: http://www.linkedin.com/channels/disrupt2013 Steve Jennings/Getty Images Entertainment
![Page 7: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/7.jpg)
Data Product Example: LinkedIn Skills
• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email
• Skill Endorsements
![Page 8: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/8.jpg)
![Page 9: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/9.jpg)
![Page 10: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/10.jpg)
Skill Discovery: Unsupervised Topics from Profile Specialties Section
Extract
![Page 11: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/11.jpg)
Topic Clustering & Phrase Sense Disambiguation
![Page 12: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/12.jpg)
Deduplication Signals from Mechanical Turk
![Page 13: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/13.jpg)
Sample Task for Mechanical Turk Workers
![Page 14: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/14.jpg)
Mechanical Turk Standardization
![Page 15: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/15.jpg)
Skill Phrase Deduplication
![Page 16: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/16.jpg)
Tagging Skill Phrases
• Tagging: Extract potential skill phrases from text
• Standardize unambiguous phrase variants
Example text: "Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR."
Tagged phrases: JavaScript, RoR, SaaS, Python
Phrase variants: ror rubyonrails ruby on rails development ruby rails ruby on rail
Standardized skill: Ruby on Rails
Diagram labels: Document (ex: Profile), Tokenization, Skills Tagger, Phrases (up to 6 words), Skills Classifier, Skills (unordered), Skills (ranked by relevance)
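The tagging and standardization steps on this slide can be illustrated with a small sketch. This is not the LinkedIn implementation: the `SKILL_VARIANTS` dictionary, the `tokenize` and `tag_skills` helpers, and the longest-match-first lookup are all assumptions made for illustration. Only the example variants and the 6-word phrase limit come from the slide.

```python
import re

# Hypothetical variant dictionary: lowercase phrase variant -> standardized skill.
# Entries mirror the slide's examples; a production dictionary would be far larger.
SKILL_VARIANTS = {
    "javascript": "JavaScript",
    "python": "Python",
    "saas": "SaaS",
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
}

MAX_PHRASE_WORDS = 6  # the slide mentions phrases of up to 6 words


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text, variants=SKILL_VARIANTS, max_words=MAX_PHRASE_WORDS):
    """Return standardized skills found in text, in order of first appearance."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in variants:
                skill = variants[phrase]
                if skill not in found:
                    found.append(skill)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return found


if __name__ == "__main__":
    sample = ("Developed over 20 SaaS custom applications "
              "using Python, Javascript and RoR.")
    print(tag_skills(sample))
```

A dictionary lookup like this only covers unambiguous variants. Ambiguous phrases would need the disambiguation and classification stages mentioned elsewhere in the talk.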
![Page 17: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/17.jpg)
![Page 18: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/18.jpg)
![Page 19: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/19.jpg)
![Page 20: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/20.jpg)
![Page 21: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/21.jpg)
![Page 22: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/22.jpg)
![Page 23: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/23.jpg)
![Page 24: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/24.jpg)
![Page 25: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/25.jpg)
![Page 26: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/26.jpg)
![Page 27: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/27.jpg)
![Page 28: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/28.jpg)
![Page 29: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/29.jpg)
![Page 30: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/30.jpg)
![Page 31: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/31.jpg)
Skills Related to “Big Data”
![Page 32: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/32.jpg)
Skills Correlated with the Job Title “Data Scientist”
![Page 33: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/33.jpg)
SkillRank: Algorithm for Top People
![Page 34: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/34.jpg)
How do we get more people into the skill graphs?
![Page 35: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/35.jpg)
Suggested Skills Inference
• How suggested/inferred skills work:
– The skill likelihood is a conditional model
– Probabilities are combined using a Naïve Bayes Classifier
• If you are an engineer at Apple, you probably know about iPhone Development.
Diagram labels: Profile, Extract attributes (Company ID, Title ID, Groups ID, Industry ID, …), Skills Classifier, Skills (ranked by likelihood), Feature Vectors
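The Naïve Bayes combination described on this slide can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the production model: the `SkillNaiveBayes` class, the Laplace smoothing parameter `alpha`, the treatment of each (attribute, value) pair as a binary feature, and the `EXAMPLE_PROFILES` data are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: (profile attributes, skills listed on the profile).
EXAMPLE_PROFILES = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Objective-C"}),
    ({"company": "Apple", "title": "Designer"}, {"Graphic Design"}),
    ({"company": "Acme", "title": "Accountant"}, {"Accounting"}),
]


class SkillNaiveBayes:
    """Toy Naive Bayes ranker over categorical profile attributes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (an assumption)
        self.skill_counts = defaultdict(int)  # profiles listing each skill
        # skill -> (attribute, value) -> number of co-occurrences
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        self.total_profiles = 0

    def fit(self, profiles):
        """Count skills and skill/attribute co-occurrences."""
        for attributes, skills in profiles:
            self.total_profiles += 1
            for skill in skills:
                self.skill_counts[skill] += 1
                for feature in attributes.items():
                    self.feature_counts[skill][feature] += 1
        return self

    def rank(self, attributes):
        """Return (skill, log score) pairs, most likely skill first."""
        scores = []
        for skill, count in self.skill_counts.items():
            # log P(skill) + sum over features of log P(feature | skill),
            # assuming features are conditionally independent given the skill.
            log_score = math.log(count / self.total_profiles)
            denominator = count + 2 * self.alpha
            for feature in attributes.items():
                seen = self.feature_counts[skill].get(feature, 0)
                log_score += math.log((seen + self.alpha) / denominator)
            scores.append((skill, log_score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    model = SkillNaiveBayes().fit(EXAMPLE_PROFILES)
    for skill, score in model.rank({"company": "Apple", "title": "Engineer"}):
        print(skill, round(score, 3))
```

With this toy data, an Apple engineer's top suggestion is iPhone Development, matching the slide's example. Real attribute IDs (company, title, groups, industry) would replace the string values used here.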
![Page 36: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/36.jpg)
![Page 37: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/37.jpg)
![Page 38: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/38.jpg)
![Page 39: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/39.jpg)
![Page 40: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/40.jpg)
![Page 41: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/41.jpg)
Skill Recommendations for Your LinkedIn Profile
49% Conversion
4% Conversion
![Page 42: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/42.jpg)
Reputation: Build Endorsements Product to Collect More Graph Edges
![Page 43: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/43.jpg)
PYMK + Suggested Skills
![Page 44: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/44.jpg)
Viral Growth: 1 Billion Endorsements in 5 Months
![Page 45: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/45.jpg)
Social Viral Tagging = Lots of Data
Suggested endorsements
Skill recommendations
Skill marketing
Virality only
![Page 46: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/46.jpg)
How Did We Gather this Data?
1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms
![Page 47: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/47.jpg)
Recap: Data Product Evolution
• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email > 20M members
• Skill Endorsements > 60M members, 3B+ Edges
• Big product wins in engagement, recall, relevance
• SkillRank & Reputation integration…
• Sets stage for next generation of products
![Page 48: Berkeley Dataproduct Talk](https://reader030.vdocuments.us/reader030/viewer/2022020517/577c7d551a28abe0549e5e09/html5/thumbnails/48.jpg)
Questions?
@peteskomoroch
http://datawrangling.com
http://www.linkedin.com/in/peterskomoroch