Scaling the Data Scientist Dr. Ira Cohen, Chief Data Scientist, HP Software

Post on 25-Feb-2016

TRANSCRIPT

Page 1: Scaling the Data Scientist

Scaling the Data Scientist

Dr. Ira Cohen, Chief Data Scientist, HP Software

Page 2: Scaling the Data Scientist

2 Data Science Office @ HPSW

HP-Software and Data Science

HP-Software products collect huge amounts of IT data:

• System Monitoring
• Events
• Defects
• Incidents
• Logs
• Changes
• Configuration
• Test data
• Requirements
• Security events
• Network data
• App Monitoring

Customers want us to transform the data to actionable information

“Big Data & Predictive Analytics: The Future of IT Management”, Mike Gualtieri, Forrester

Page 3: Scaling the Data Scientist

Need

Expertise
• Expertise in machine learning
• Expertise in the products domain

Infrastructure
• Data platforms
• Development Tools

Page 4: Scaling the Data Scientist

A tale of two worlds

Data Scientists
• Few
• Limited domain knowledge
• Tools: R, Matlab, Mahout, Knime, Weka, Sas, …

Developers/SMEs
• Plentiful
• Limited data science knowledge
• Tools: IDEs, Excel

Page 5: Scaling the Data Scientist

Our solution

Developer → Data analytics specialist

Page 6: Scaling the Data Scientist

How?

• Training
• Mentoring
• Community
• Data infrastructure
• New Dev tool

Page 7: Scaling the Data Scientist

Training: Practical Machine Learning

• 4 day training
• Commitment to complete first project

Data
• Big data foundations
• Problem definition

Processing
• Attribute construction
• Transformations

Filtering
• Attribute selection
• Dimensionality reduction

Learning
• Supervised
• Unsupervised

Testing
• Validation methods
• Accuracy measures
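The Testing stage of the course outline lists validation methods and accuracy measures without naming specific ones. As a minimal sketch, assuming k-fold cross-validation as the validation method and accuracy, precision, and recall as the measures, the idea could look like the following. All function names here are hypothetical and the majority-class learner is only a placeholder baseline, not something taken from the training.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def accuracy_measures(y_true, y_pred, positive):
    """Compute accuracy, precision, and recall for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def fit_majority(train_rows, train_labels):
    """Placeholder learner: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


def cross_validate(rows, labels, k, fit, positive):
    """Pool the held-out predictions of all k folds, then score them."""
    y_true, y_pred = [], []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        predict = fit([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in test_idx:
            y_true.append(labels[i])
            y_pred.append(predict(rows[i]))
    return accuracy_measures(y_true, y_pred, positive)
```

Any learner that returns a prediction function can be passed as `fit`. Reporting recall next to accuracy matters on imbalanced data, where a baseline that never predicts the positive class can still score a high accuracy.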

Page 8: Scaling the Data Scientist

Practical Machine Learning
Ohad Assulin, Efrat Egozi Levi, Ira Cohen

• Automatic Event Prioritization: Anat Levinger & Roy Wallerstein
• Automatic Vulnerability Categorization: Barak Raz & Ben Feher
• Classifying Security Events: Yoni Roit & Omer Weissman
• Early detection of anomalous behavior in IT systems: Yonatan Ben Simhon & Yaneeve Shekel
• Cloud Delivery Optimization (CDO): Ran, Levi
• URL to Action Classification: Boaz Shor & Eyal Kenigsberg
• Predictive Analytics in Release Management: Sigalit Sade
• Sales Pipeline Early Warning: Gabriel, Alvarado

Page 9: Scaling the Data Scientist

Pushing My Buttons

Gil Zieder, Ofer Eliassaf, Boris Kozorovitzky

Page 10: Scaling the Data Scientist

The process @ work

Data
• Problem definition: as a Pusher or DevOps of a project, you would like to know if the given change set is safe to push into the production branch.
• 9 open source projects, 8806 individual commits
• Labels: each commit is labeled “good” or “bad” by running tests after it (“good”: tests pass; “bad”: tests fail)

Processing
• Attribute construction: 80 attributes per commit, based on source control, previous commits, and code complexity, e.g., average change frequency, previous commit state, cyclomatic complexity
• Normalization

Filtering
• Attribute selection: rank based attribute selection

Learning
• Supervised, classification
• Classification algorithms: K-NN, SVM, Decision Tree, Random Forest, …

Testing
• Minimize false negatives
• 87% Accuracy with K-NN
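The commit-classification steps above (normalization, rank based attribute selection, K-NN, and counting false negatives) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's actual code: the slides do not say which ranking criterion or distance was used, so the sketch ranks attributes by the gap between class means and uses Euclidean distance; it also assumes “bad” is the positive class, so a false negative is a bad commit predicted as good. The function names and the toy data are made up.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Scale every attribute column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0
             for v, lo, sp in zip(row, lows, spans)] for row in rows]


def rank_attributes(rows, labels, positive="bad"):
    """Rank attribute indices by |mean(positive) - mean(rest)|, best first.

    Assumed criterion; requires at least one row of each class.
    """
    pos = [r for r, l in zip(rows, labels) if l == positive]
    neg = [r for r, l in zip(rows, labels) if l != positive]
    scores = []
    for j in range(len(rows[0])):
        pos_mean = sum(r[j] for r in pos) / len(pos)
        neg_mean = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(pos_mean - neg_mean), j))
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


def select(rows, keep):
    """Keep only the chosen attribute columns."""
    return [[r[j] for j in keep] for r in rows]


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean)."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def false_negatives(y_true, y_pred, positive="bad"):
    """Count positive (here: bad) commits that were predicted otherwise."""
    return sum(1 for t, p in zip(y_true, y_pred)
               if t == positive and p != positive)


# Toy data (made up). Columns: average change frequency,
# previous commit failed (0/1), and an uninformative file id.
commits = [[1.0, 0, 7], [2.0, 0, 3], [1.5, 0, 9],
           [8.0, 1, 4], [9.0, 1, 8], [7.5, 1, 2]]
labels = ["good", "good", "good", "bad", "bad", "bad"]

scaled = min_max_normalize(commits)
keep = rank_attributes(scaled, labels)[:2]   # keep the two top-ranked attributes
train = select(scaled, keep)
prediction = knn_predict(train, labels, train[3], k=3)
```

In a real evaluation the normalization ranges and the attribute ranking would be computed on training commits only and then applied to held-out commits, so that the reported accuracy and false-negative count are not inflated.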

Page 11: Scaling the Data Scientist

Analytic specialist program: Results

• > 70 developers trained (Before: 4)
• > 30 new capabilities since April 2013 (Before: 1)
• 1 Data scientist per 10 new capabilities (Before: 1:1)
• Development time reduced by 70% (Before: 12 months)

Page 12: Scaling the Data Scientist

Can we do better?

• Yes. From months to days!
• How?
  – Create a simple tool for analytic specialists
  – Automate the data scientist as much as possible

Page 13: Scaling the Data Scientist

Project Titan

Page 14: Scaling the Data Scientist

Titan: Demo

Page 15: Scaling the Data Scientist

Scaling the data scientist

Analytic specialists
• Develop using standard machine learning
• Use a simplified tool

Data Scientist
• Provides expert advice
• Develops new types of machine learning solutions