berkeley dataproduct talk


Data Products Deep Dive

Pete Skomoroch @peteskomoroch 3/31/14 Berkeley CS194-16: Intro to Data Science

Some Background

• Physics/Math BS Undergrad
• Analyst / Software Engineer @ProfitLogic - 3.5 years
• Biodefense Engineer / ML Student @ MIT - 3.5 years
• Sr. Research Engineer @ AOL Search - 1 year
• Director @ Juice Analytics - 1 year
• Consulting @ Cloudera, Amazon etc - 1 year
• Principal Data Scientist @ LinkedIn - 4 years

Four types of data scientist (at least)

source: "Analyzing the Analyzers" O'Reilly Media

Data Scientists create data products

The data product process

• Verify you are solving the right problem
• Theory + model design
• Measurement: data collection and cleaning
• Feature engineering & model development
• Error analysis and investigation
• Iterate and improve each step in the process
• Leverage derived data to build new products

Data factories & flywheels

Source: http://www.linkedin.com/channels/disrupt2013 Steve Jennings/Getty Images Entertainment

Data Product Example: LinkedIn Skills

• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email
• Skill Endorsements

Skill Discovery: Unsupervised Topics from Profile Specialties Section


Extract

Topic Clustering & Phrase Sense Disambiguation


Deduplication Signals from Mechanical Turk


Sample Task for Mechanical Turk Workers


Mechanical Turk Standardization

Skill Phrase Deduplication


Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR.

Tagging Skill Phrases

• Tagging: Extract potential skill phrases from text

 

 

• Standardize unambiguous phrase variants


JavaScript RoR SaaS Python

ror rubyonrails ruby on rails development ruby rails ruby on rail

Ruby on Rails
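The standardization step above (many surface variants, one canonical skill) can be sketched as a normalized dictionary lookup. This is a minimal illustration, not the pipeline from the talk: the table `VARIANT_TO_CANONICAL` and the functions `normalize` and `standardize` are hypothetical names, and the way the variant string on the slide is split into individual phrases is an assumption.

```python
import re
from typing import Optional

# Hypothetical variant table. The entries are one plausible reading of the
# variants listed on the slide; the real table was derived from data.
VARIANT_TO_CANONICAL = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def standardize(phrase: str) -> Optional[str]:
    """Return the canonical skill for a known variant, or None if unknown."""
    return VARIANT_TO_CANONICAL.get(normalize(phrase))


if __name__ == "__main__":
    for raw in ["RoR", "  Ruby   on Rail ", "python"]:
        print(repr(raw), "->", standardize(raw))
```

A lookup table only covers unambiguous variants; phrases with several senses would need the disambiguation step mentioned earlier in the deck.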

Document (ex: Profile)

Tokenization

Skills Tagger

Phrases (up to 6 words)

Skills Classifier

Skills (unordered)

Skills (ranked by relevance)
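The flow in this diagram (tokenize a document, generate phrases of up to 6 words, tag the ones that are skills, rank them) might look roughly like the sketch below. Everything here is an assumption for illustration: `KNOWN_SKILLS` is a tiny hypothetical dictionary, and ranking by mention count is a stand-in for the relevance classifier, whose details the slides do not give.

```python
import re
from collections import Counter

MAX_PHRASE_WORDS = 6  # the slide says phrases of up to 6 words

# Hypothetical dictionary of lowercase skill phrases -> canonical skill names.
KNOWN_SKILLS = {
    "python": "Python",
    "javascript": "JavaScript",
    "ror": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "saas": "SaaS",
}


def tokenize(text):
    """Split text into lowercase word tokens (keeps '+' and '#' for C++/C#)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def candidate_phrases(tokens, max_words=MAX_PHRASE_WORDS):
    """Yield every contiguous run of 1..max_words tokens as a phrase."""
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            yield " ".join(tokens[start:start + length])


def tag_skills(text):
    """Return canonical skills found in text, most frequently mentioned first.

    Mention count is only a placeholder for a real relevance model.
    """
    counts = Counter()
    for phrase in candidate_phrases(tokenize(text)):
        skill = KNOWN_SKILLS.get(phrase)
        if skill is not None:
            counts[skill] += 1
    return [skill for skill, _ in counts.most_common()]


if __name__ == "__main__":
    profile = ("Developed over 20 SaaS custom applications "
               "using Python, Javascript and RoR.")
    print(tag_skills(profile))
```

Generating all n-grams is quadratic in the worst case only in the phrase cap, so with a fixed cap of 6 the cost stays linear in document length.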


Skills Related to “Big Data”


Skills Correlated with the Job Title “Data Scientist”


SkillRank: Algorithm for Top People


How do we get more people into the skill graphs?

Suggested Skills Inference

• How suggested/inferred skills work:
– The skill likelihood is a conditional model
– Probabilities are combined using a Naïve Bayes Classifier
• If you are an engineer at Apple, you probably know about iPhone Development.
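The Naïve Bayes idea above can be sketched as follows: score each skill by its prior times the per-attribute conditional probabilities, treating attributes as independent given the skill. This is a toy illustration under stated assumptions, not LinkedIn's model: the training profiles are made up, the attribute names (`company`, `title`) and the functions `train` and `rank_skills` are hypothetical, and add-one (Laplace) smoothing is a choice made here.

```python
import math
from collections import Counter, defaultdict

# Made-up toy profiles, for illustration only: (attributes, skills) pairs.
TRAINING_PROFILES = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Objective-C"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Designer"}, {"Photoshop"}),
    ({"company": "Acme", "title": "Engineer"}, {"Java"}),
    ({"company": "Acme", "title": "Analyst"}, {"Excel"}),
]


def train(profiles):
    """Count skill frequencies and (attribute, value) co-occurrences per skill."""
    skill_counts = Counter()
    pair_counts = defaultdict(Counter)  # skill -> Counter of (attribute, value)
    values = defaultdict(set)           # attribute -> distinct values seen
    for attrs, skills in profiles:
        for name, value in attrs.items():
            values[name].add(value)
        for skill in skills:
            skill_counts[skill] += 1
            for item in attrs.items():
                pair_counts[skill][item] += 1
    return skill_counts, pair_counts, values, len(profiles)


def rank_skills(attrs, model, alpha=1.0):
    """Rank skills by log P(skill) + sum of log P(attribute=value | skill)."""
    skill_counts, pair_counts, values, n_profiles = model
    scores = {}
    for skill, count in skill_counts.items():
        log_score = math.log(count / n_profiles)  # prior
        for name, value in attrs.items():
            n_values = len(values.get(name, ()))
            # Laplace-smoothed conditional probability of this attribute value.
            numerator = pair_counts[skill][(name, value)] + alpha
            denominator = count + alpha * n_values
            log_score += math.log(numerator / denominator)
        scores[skill] = log_score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    model = train(TRAINING_PROFILES)
    print(rank_skills({"company": "Apple", "title": "Engineer"}, model))
```

Working in log space avoids underflow when many attributes are multiplied together, and smoothing keeps an unseen attribute value from zeroing out a skill entirely.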

   


Profile  

Extract attributes

- Company ID - Title ID - Groups ID - Industry ID - …

Skills Classifier

Skills (ranked by likelihood)

Feature Vectors

Skill Recommendations for Your LinkedIn Profile


49% Conversion

4% Conversion

Reputation: Build Endorsements Product to Collect More Graph Edges


PYMK + Suggested Skills


Viral Growth: 1 Billion Endorsements in 5 Months

Social Viral Tagging = Lots of Data

Suggested endorsements

Skill recommendations

Skill marketing

Virality only

How Did We Gather this Data?


1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms

Recap: Data Product Evolution

• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email > 20M members
• Skill Endorsements > 60M members, 3B+ Edges
• Big product wins in engagement, recall, relevance
• SkillRank & Reputation integration…
• Sets stage for next generation of products

Questions?

@peteskomoroch
http://datawrangling.com
http://www.linkedin.com/in/peterskomoroch

 
