Data Science and its relevance in Product Management
- Amit Sharma
Director Architecture and Data Science
July 29, 2016
Copyright © 2015 ADP, LLC. Proprietary and Confidential.
About ADP
• Human Capital Management
• Small, Midsized, Large, Multinational markets • Revenues: $10.9 billion in fiscal 2015 • ADP pays 24 million (1 in 6) workers in U.S., and 12 million elsewhere • FORTUNE 500®: Ranked 251 (2015) • Forbes® Global 2000: Ranked 370 (2015)
Payroll Tax Time HR Talent Benefits
Agenda
• Data Science applications in ADP • Applying Data Science in Product Management • Applying Product Management in Data Science • Exercise
Data Science in ADP
• Classification • Predictive • Optimization • Text Mining • Recommenders • Social • User Behavior
Classification
• Used for standardization across clients for Benchmarking – Job Titles, Department Names, Business Functions – Pay codes, Termination reasons
• Legacy Text mining/clustering techniques limitations – Variety of data is the challenge due to customizations – Short String documents, highly abbreviated
• Lack of standards – Opportunity to define industry standards
Predictive analytics
• Employee Turnover Probability – What is the risk of an employee leaving in the next quarter?
• Workforce Capacity Planning – What is the risk of an employee taking unplanned leave?
Optimization
• Merit Increase Guidelines in compensation management
• Linear Programming problem
• Objective – Best utilization of budget amount to reward employees
• Constraints – Respect budget – Stick to Compensation philosophy – Handle Compa-ratios – Minimum and Maximum increment limits
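A minimal sketch of the merit-increase problem as a linear program, under simplifying assumptions that are not in the slides: each employee has a hypothetical `score` (a weight that could encode performance rating and compa-ratio), the objective is to maximize the score-weighted sum of increase percentages, and the only constraints are the budget and the minimum and maximum increment limits. With a single budget constraint the LP is a continuous knapsack, so a greedy pass solves it exactly; a general LP solver would be needed once further constraints are added.

```python
def allocate_merit_increases(employees, budget, min_pct, max_pct):
    """Maximize sum(score_i * pct_i) subject to
    sum(salary_i * pct_i) <= budget and min_pct <= pct_i <= max_pct.

    `employees` is a list of dicts with hypothetical keys
    "name", "salary" and "score" (illustrative, not from the slides).
    """
    # Everyone receives at least the minimum increase; that spend is fixed.
    pct = {e["name"]: min_pct for e in employees}
    remaining = budget - sum(e["salary"] * min_pct for e in employees)
    if remaining < 0:
        raise ValueError("budget cannot cover the minimum increase")

    # Spend the rest where each budget unit buys the most score.
    ranked = sorted(employees, key=lambda e: e["score"] / e["salary"], reverse=True)
    for e in ranked:
        if remaining <= 0:
            break
        extra = min(max_pct - min_pct, remaining / e["salary"])
        pct[e["name"]] += extra
        remaining -= extra * e["salary"]
    return pct


# Hypothetical staff list, for illustration only.
staff = [
    {"name": "A", "salary": 100000, "score": 3},
    {"name": "B", "salary": 50000, "score": 2},
    {"name": "C", "salary": 80000, "score": 1},
]
plan = allocate_merit_increases(staff, budget=9000, min_pct=0.02, max_pct=0.06)
for name, p in plan.items():
    print(f"{name}: {p:.1%}")
```

Here B reaches the maximum first because it has the best score per salary unit, A receives what is left of the budget, and C stays at the minimum.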
Text Mining and Analytics
• Natural Language Processing – Semantic Search
• Product Management decisions – Analyze customer feedback based on call logs (Voice-of-Customer) – Product Roadmap and MVP decisions: which features should be picked up first? – UX design: frequent pattern mining to identify which fields are used together – Sentiment Analysis
• Clickstreams – Google Analytics for usage monitoring – Success of a new rollout: is the feature being used correctly?
Social Network Analysis
• Network chart based on communication proximity • Leadership Radar • Performance Management
Data science in Product Management
• Defining the product Roadmap – Call Log Analytics
• Designing the UX – Navigation (Applicant dropout Analysis) – Unused Fields – Default values for fields – Grouping fields and conditional logic – Template definition
UX Design
• Identify fields and scenarios
• Identify common values and co-occurrence patterns
• Redefine UI to leverage the patterns – Create Templates – Remove/Group/Default fields

Scenario   | Field1       | Field2  | Field3       | Field1+Field2 | Field2+Field3 | Field1+Field3 | Field1+2+3
Scenario 1 | Empty        | Default | Empty        | 10%           | -             | 70%           | -
Scenario 2 | Filled (90%) | Default | Filled (90%) | 80%           | 10%           | 10%           | 5%
Scenario 3 | Empty        | Default | Filled (90%) | 10%           | -             | 10%           | -
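Co-occurrence percentages of this kind can be derived from raw form submissions. The sketch below assumes each submission is recorded as the set of fields the user filled in; the function name and the sample data are hypothetical and are not the figures from the table.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_rates(submissions):
    """Share of submissions in which each combination of two or more
    fields was filled in together."""
    if not submissions:
        return {}
    counts = Counter()
    for filled in submissions:
        fields = sorted(filled)
        for size in range(2, len(fields) + 1):
            for combo in combinations(fields, size):
                counts[combo] += 1
    total = len(submissions)
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical submissions: the set of fields filled in on each one.
sample = [
    {"Field1", "Field2"},
    {"Field1", "Field2"},
    {"Field2", "Field3"},
    {"Field1", "Field2", "Field3"},
]
rates = co_occurrence_rates(sample)
for combo, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(" + ".join(combo), f"{rate:.0%}")
```

Combinations with a high rate are candidates for grouping or templating; fields that rarely appear in any combination are candidates for removal or a default value.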
Conversion Funnels – User Drop Rate
• Total number of users: 114
• Users lost on Pages 1 to 5: 27, 9, 18, 2, 2
• Total users lost: 58
Possible Reasons
• Asking for registration
• Too many fields to fill in
• Possible usability issue
• Some options not enabled?
• Possibly too many steps
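The funnel arithmetic can be reproduced with a few lines of code. This sketch uses the slide's figures (114 users, 58 lost in total) and assumes that the per-page losses 27, 9, 18, 2 and 2 correspond to Pages 1 to 5 in that order; the function name is illustrative.

```python
def funnel(total_users, lost_per_page):
    """Return per-page drop statistics and the number of users who finish."""
    rows = []
    entering = total_users
    for page, lost in enumerate(lost_per_page, start=1):
        rows.append({
            "page": page,
            "entering": entering,
            "lost": lost,
            "drop_rate": lost / entering,  # share of users reaching this page who leave
        })
        entering -= lost
    return rows, entering


rows, completed = funnel(114, [27, 9, 18, 2, 2])
for r in rows:
    print(f"Page {r['page']}: {r['entering']} in, {r['lost']} lost ({r['drop_rate']:.1%})")
print("Completed:", completed)
```

Measuring the drop rate against the users who actually reached a page, rather than against the original total, keeps later pages comparable with earlier ones.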
Product Management in Data Science
• Correlation does not imply Causation • Linear vs. Non-linear Model • Explicability and Actionability • Predictive and Prescriptive
Typical challenges in Machine Learning
• Training Model – Train-test split – Building ground truth
• Not enough features – Data Exchange with Clients – Use external data such as BLS, Weather, Geo
• Crowd-sourcing for validation – Clients: inbuilt auto-correction for learning models – Outsourced: Mechanical Turk
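As an illustration of the train-test split mentioned on this slide, here is a minimal sketch using only the standard library. The function name, the 20% default and the fixed seed are assumptions made for the example, not choices taken from the talk.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records reproducibly and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10), test_fraction=0.2)
print(len(train), "training records,", len(test), "test records")
```

The model is fitted on `train` only, and `test` is kept aside so that the reported quality reflects records the model has not seen.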
Model Validation
• Explicability and Actionability – Regression – Rule-based – Deep Learning
• Model Accuracy – Coverage – Accuracy – Recall/Sensitivity
• Model Comparison – ROC Curves
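The accuracy measures listed above can be computed as in the sketch below. It assumes a binary classifier that may decline to predict (`None`), and it takes "coverage" to mean the share of cases that receive a prediction; that reading is an assumption, since the slide does not define the term. The area under the ROC curve is computed by its pairwise definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def validation_metrics(actual, predicted):
    """actual: list of 0/1 labels; predicted: list of 0/1, or None for no prediction."""
    covered = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    tp = sum(1 for a, p in covered if a == 1 and p == 1)
    fn = sum(1 for a, p in covered if a == 1 and p == 0)
    correct = sum(1 for a, p in covered if a == p)
    return {
        "coverage": len(covered) / len(actual),
        "accuracy": correct / len(covered) if covered else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, predictions and scores, for illustration only.
metrics = validation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, None, 0])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, auc)
```

When two models are compared, the one with the higher area under the ROC curve ranks positives above negatives more reliably across all thresholds.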
Curse of Dimensionality
• Too many features or dimensions
• Number of data points required grows exponentially with the number of features
• Solution – The more data points, the merrier – Regularization – Feature selection using entropy
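Feature selection using entropy is commonly done by ranking features on information gain: how much knowing the feature reduces the entropy of the target label. The sketch below assumes categorical features and labels; the sample data and names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: one feature separates the labels perfectly, the other not at all.
labels = ["leave", "leave", "stay", "stay"]
features = {
    "feature_a": ["x", "x", "y", "y"],
    "feature_b": ["x", "y", "x", "y"],
}
gains = {name: information_gain(values, labels) for name, values in features.items()}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(name, round(gain, 3))
```

Keeping only the highest-gain features reduces the number of dimensions, and with it the amount of data needed to train a reliable model.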
Getting started with Data Science
Tools
• Excel • RStudio • RWeka • Tableau • Qlik Sense
Learning online
• Coursera – Machine Learning • Udacity • Caltech Online • Kaggle
Problem #1
How do you measure the goodness of such a model?
http://pgfplots.net/tikz/examples/regression-line/
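One common way to approach this question for a fitted regression line is to report the root-mean-square error (RMSE) and the coefficient of determination (R squared). The sketch below fits a least-squares line and computes both; the data points are made up for illustration and are not taken from the linked figure. In practice these measures are more meaningful on held-out data than on the points used for fitting.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def goodness_of_fit(xs, ys, slope, intercept):
    """RMSE and R squared of the line on the given points."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {"rmse": math.sqrt(ss_res / len(ys)), "r_squared": 1 - ss_res / ss_tot}


# Made-up points, for illustration only.
xs, ys = [1, 2, 3], [1, 3, 2]
slope, intercept = fit_line(xs, ys)
fit = goodness_of_fit(xs, ys, slope, intercept)
print(slope, intercept, fit)
```

An R squared near 1 means the line explains most of the variation in y, while the RMSE expresses the typical error in the units of y.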
Problem #2
https://commons.wikimedia.org/wiki/File:Decision_tree_model.png
How do you measure the goodness of such a model?