the other 99% of a data science project

THE OTHER 99% OF A DATA SCIENCE PROJECT Open Data Science Conference Santa Clara | November 4-6th 2016 Eugene Mandel @eugmandel

Upload: eugene-mandel

Post on 16-Apr-2017


TRANSCRIPT

Page 1: The Other 99% of a Data Science Project

THE OTHER 99% OF A DATA SCIENCE PROJECT

Open Data Science Conference

Santa Clara | November 4-6th 2016

Eugene Mandel

@eugmandel

Page 2: The Other 99% of a Data Science Project

ABOUT ME

∎ @eugmandel
∎ lead of data science at Directly
∎ formerly:
□ data science team at Jawbone
□ co-founder qualaroo, jaxtr

Page 3: The Other 99% of a Data Science Project

DATA SCIENCE NEEDS PRODUCT MANAGEMENT

The success of a data science project has as much to do with product management as with data science.

Page 4: The Other 99% of a Data Science Project

2 KINDS OF DATA SCIENCE

A: ANALYZE

B: BUILD

Page 5: The Other 99% of a Data Science Project

PAY FOR PARKING WITH YOUR PHONE

Page 6: The Other 99% of a Data Science Project

DON’T YOU KNOW ME?!

Page 7: The Other 99% of a Data Science Project

SMART PRODUCTS

∎ “don’t you know me?!” -> “you get me!”
∎ get smarter with every interaction
∎ reduce search space

Page 8: The Other 99% of a Data Science Project

SMART PRODUCTS

BUT NOT THAT SMART...

Page 9: The Other 99% of a Data Science Project

SMART PRODUCTS GO PROBABILISTIC

Page 10: The Other 99% of a Data Science Project

THE OTHER 99%

algorithms

Page 11: The Other 99% of a Data Science Project


PARKING APP

ON DEMAND CUSTOMER SUPPORT

Page 12: The Other 99% of a Data Science Project

LOOKING FOR OPPORTUNITIES

Page 13: The Other 99% of a Data Science Project

PROBLEM: choose support tickets that expert users can resolve

Page 14: The Other 99% of a Data Science Project

LOOKING FOR OPPORTUNITIES

Page 15: The Other 99% of a Data Science Project

CHOOSE RESOLVABLE TICKETS WITH MACHINE LEARNING

Page 16: The Other 99% of a Data Science Project

GETTING THE DATA

Page 17: The Other 99% of a Data Science Project

GETTING ALLIES

Page 18: The Other 99% of a Data Science Project

GETTING THE DATA

Page 19: The Other 99% of a Data Science Project

CLEAN YOUR DATA

∎ Automated bug reports
∎ Surveys
∎ Bounced emails
∎ Internal tickets
∎ Email metadata
∎ Email threads
∎ ...
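A minimal sketch of this kind of cleaning, assuming each ticket is a dict with a hypothetical `kind` field. The field name, the category strings, and `clean_tickets` are illustrative assumptions, not taken from the talk. The sketch only drops whole tickets in noisy categories; stripping email metadata and quoted threads out of the ticket text would be a separate step.

```python
# Hypothetical category names for tickets that are noise for a
# "is this ticket resolvable?" model (assumed, not from the talk).
NOISE_KINDS = {"automated_bug_report", "survey", "bounced_email", "internal"}


def clean_tickets(tickets):
    """Keep only tickets whose (assumed) `kind` field is not a noise category."""
    return [t for t in tickets if t.get("kind") not in NOISE_KINDS]


tickets = [
    {"id": 1, "kind": "customer_question"},
    {"id": 2, "kind": "bounced_email"},
    {"id": 3, "kind": "survey"},
    {"id": 4, "kind": "customer_question"},
]
cleaned = clean_tickets(tickets)
print([t["id"] for t in cleaned])
```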

Page 20: The Other 99% of a Data Science Project

GUYS CLEAN A DATASET, GET RICH

Page 21: The Other 99% of a Data Science Project

FEATURE ENGINEERING

Page 22: The Other 99% of a Data Science Project

TRAINING - COLD START PROBLEM

all tickets

tickets seen by expert

Page 23: The Other 99% of a Data Science Project

TRAINING - GET LABELS

“Is there a cat in this picture?”

“Is this support ticket resolvable?”

Page 24: The Other 99% of a Data Science Project

TRAINING - GET LABELS

∎ label manually
∎ derive labels from user behavior
∎ derive labels from external sources
∎ mix
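As one illustration of "derive labels from user behavior", the sketch below turns ticket history into training labels. The fields `routed_to_expert` and `resolved_by_expert` and the function `derive_label` are hypothetical names chosen for this example; the talk does not say which signals were actually used.

```python
def derive_label(ticket):
    """Return 1 (resolvable), 0 (not resolvable), or None (no signal yet).

    Assumed, hypothetical fields: `routed_to_expert`, `resolved_by_expert`.
    """
    if not ticket.get("routed_to_expert"):
        # Never shown to an expert, so behavior tells us nothing about it.
        return None
    return 1 if ticket.get("resolved_by_expert") else 0


history = [
    {"id": 1, "routed_to_expert": True, "resolved_by_expert": True},
    {"id": 2, "routed_to_expert": True, "resolved_by_expert": False},
    {"id": 3, "routed_to_expert": False},
]
labels = {t["id"]: derive_label(t) for t in history}
print(labels)
```

Tickets that come back as `None` are the ones that would still need manual labels or an external source.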

Page 25: The Other 99% of a Data Science Project

My favorite data science algorithm is division.

Monica Rogati

Former VP of Data, Jawbone & LinkedIn data scientist

Page 26: The Other 99% of a Data Science Project

MODEL

∎ Tokenization
∎ Bag of words (BOW)
∎ Tf–idf
∎ Random Forest Classifier
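The first three steps of this pipeline can be sketched in plain Python. This is an illustration under stated assumptions, not the talk's implementation: the function names are made up here, the tokenizer is a simple regex, and tf-idf uses the unsmoothed variant tf = count / document length, idf = log(N / df). The Random Forest step is left out because it needs a third-party library; the weights produced here are the kind of features it would be trained on.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens):
    """Map each token to its count in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = log(N / df). This is one of several
    common variants; libraries often add smoothing.
    """
    bows = [bag_of_words(tokenize(d)) for d in docs]
    n_docs = len(bows)
    # Iterating a Counter yields each distinct term once, so this counts
    # the number of documents containing each term.
    doc_freq = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        })
    return weights


docs = [
    "Cannot reset my password",
    "Password reset link bounced",
    "Refund my order",
]
weights = tf_idf(docs)
# "cannot" appears in one document, "password" in two, so "cannot"
# gets the higher weight in the first document.
print(weights[0])
```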

Page 27: The Other 99% of a Data Science Project

DEVELOPMENT

Page 28: The Other 99% of a Data Science Project

PLAYING WELL WITH ENGINEERING

∎ gaining trust
∎ development process

Page 29: The Other 99% of a Data Science Project

POINTS OF INTEGRATION

online or offline?

Page 30: The Other 99% of a Data Science Project

DEVELOPMENT

integration - broad APIs

Page 31: The Other 99% of a Data Science Project

“NAPKIN ARCHITECTURE”

Page 32: The Other 99% of a Data Science Project

IS IT WORKING? evaluating data products

Image source: https://themouseandthewindmill.wordpress.com

Page 33: The Other 99% of a Data Science Project

EVALUATION METRICS

∎ accuracy
∎ precision/recall
∎ driven by business
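For reference, the three metrics named here can be computed from binary labels as below. The function name `evaluate` and the toy labels are assumptions made for this example.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = resolvable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Of the tickets sent to experts, how many were really resolvable?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the resolvable tickets, how many did we send to experts?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Which number to optimize is the "driven by business" part: it depends on whether a wrongly routed ticket or a missed resolvable ticket costs more.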

Page 34: The Other 99% of a Data Science Project

IS IT WORKING? QA’ing data products

Image source: https://themouseandthewindmill.wordpress.com

Page 35: The Other 99% of a Data Science Project

PLAYING WELL WITH DEVOPS

Page 36: The Other 99% of a Data Science Project

BRIDGING TECH STACKS

Page 37: The Other 99% of a Data Science Project

IN PRODUCTION

Page 38: The Other 99% of a Data Science Project

THE KNOBS: HOW TO CONTROL THE PRODUCT

∎ on/off switch per customer
∎ prediction threshold
∎ exclusions
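One way the three knobs could fit together is a small per-customer config checked before any ticket is routed. Everything named here (`RoutingConfig`, `should_route_to_expert`, the customer names) is hypothetical, and treating exclusions as keyword matches is an assumption; the talk does not specify how exclusions were defined.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingConfig:
    enabled: bool = False      # on/off switch per customer
    threshold: float = 0.8     # prediction threshold
    # Exclusions, assumed here to be keywords that always block routing.
    excluded_terms: set = field(default_factory=set)


def should_route_to_expert(config, ticket_text, score):
    """Apply the knobs in order: switch, exclusions, then threshold."""
    if not config.enabled:
        return False
    text = ticket_text.lower()
    if any(term in text for term in config.excluded_terms):
        return False
    return score >= config.threshold


configs = {
    "acme": RoutingConfig(enabled=True, threshold=0.7, excluded_terms={"refund"}),
    "globex": RoutingConfig(),  # feature switched off by default
}
print(should_route_to_expert(configs["acme"], "How do I reset my password?", 0.9))
print(should_route_to_expert(configs["acme"], "I want a refund", 0.95))
print(should_route_to_expert(configs["globex"], "How do I reset my password?", 0.99))
```

Keeping the knobs outside the model means the product can be adjusted per customer without retraining.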

Page 39: The Other 99% of a Data Science Project

NAMING THINGS

“... SMART…”

“... AI …”

“...MACHINE LEARNING…”

“...INTELLIGENT…”

Page 40: The Other 99% of a Data Science Project

UPDATING THE MODEL

∎ input data changes
∎ user behaviour changes
∎ dataset grows

Page 41: The Other 99% of a Data Science Project

NEGATIVE SAMPLING

send a small % of predicted negatives as if they were positive

predicted positive
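A minimal sketch of this idea: tickets scored below the threshold are normally held back, but a small random fraction is sent to experts anyway so that the model keeps getting labels for tickets it would otherwise never see. The function `route`, its parameters, and the 5% rate are assumptions made for this example.

```python
import random


def route(score, threshold, explore_rate, rng):
    """Return (send_to_expert, is_exploration).

    Predicted positives are always sent. Predicted negatives are sent with
    probability `explore_rate` and flagged, so their outcomes can be used
    as labels later.
    """
    if score >= threshold:
        return True, False
    if rng.random() < explore_rate:
        return True, True
    return False, False


rng = random.Random(42)  # seeded only to make the example repeatable
decisions = [route(0.1, 0.7, 0.05, rng) for _ in range(10_000)]
explored = sum(1 for sent, exploring in decisions if exploring)
print(f"{explored} of {len(decisions)} predicted negatives were sent anyway")
```

Flagging the explored tickets keeps them separable from ordinary positives when measuring how the product performs.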

Page 42: The Other 99% of a Data Science Project

NEGATIVE LABELING

send a small % of predicted negatives for manual labeling

predicted positive

Page 43: The Other 99% of a Data Science Project

LABELING - HOW TO PHRASE THE QUESTION?

∎ “Would you be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully without getting a negative rating?”

Page 44: The Other 99% of a Data Science Project

MESSAGING

∎ customers
∎ sales
∎ account managers
∎ marketing
∎ execs

Page 45: The Other 99% of a Data Science Project

CUSTOMER ENGAGEMENT PLAYBOOK

Page 46: The Other 99% of a Data Science Project

DATA ETHICS

Page 47: The Other 99% of a Data Science Project

INTERPRETABILITY

Image source: https://en.wikipedia.org/wiki/File:Blue_Poles_(Jackson_Pollock_painting).jpg

Page 48: The Other 99% of a Data Science Project

THANKS!

Eugene Mandel

@eugmandel

Page 49: The Other 99% of a Data Science Project

CREDITS

∎ Presentation template by SlidesCarnival
∎ Images:
□ http://jedismedicine.blogspot.com/
□ Jawbone
□ Directly
□ Wikipedia
□ https://themouseandthewindmill.wordpress.com
□ http://www.imdb.com/