Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

© Hortonworks Inc. 2011 – 2017. All Rights Reserved Enterprise Data Science at Scale: Introducing Data Science Experience (DSX) Future of Data – Princeton Meetup 14-November-2017

Upload: timothy-spann

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

© Hortonworks Inc. 2011 – 2017. All Rights Reserved

Enterprise Data Science at Scale: Introducing Data Science Experience (DSX)

Future of Data – Princeton Meetup, 14-November-2017

Page 2: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Presenter

Tim Spann

Page 3: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

IBM + Hortonworks = Unlocking Actionable Insights

• #1 Pure Open Source Hadoop Distribution

• 1000+ customers and 2100+ ecosystem partners

• Employs the original architects, developers and operators of Hadoop from Yahoo!

• Best-in-class 24x7 customer support

• Leading professional services and training

• #1 Data Science Platform (Source: Gartner)

• OpenPOWER performance leadership

• Flexible, software-defined storage

• #1 SQL Engine for complex, analytical workloads

• Leader in On-premise and Hybrid Cloud solutions

Page 4: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Data Science In Action

The Team

Data Scientists: Responsible for “The Math”

Data Engineers: Responsible for “The Data”

Business Analyst: Responsible for “The Business”

Corporate IT: Responsible for “Technology”

The Process

Page 5: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Data Science Challenges

The Team

Data Scientists: “I like my own tools” “How can I productionize my model?”

Data Engineers: “I need a central place for data” “How can I efficiently transform data?”

Business Analyst: “I need to visualize the shape of data” “How can we fail fast and prototype quickly?”

Corporate IT: “How do I govern and secure this?” “I can’t support all of these tools”

The Process

• Productionizing with data
• So many tools & limited compute resources
• Data discovery
• Model deterioration & data evolution

Page 6: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

The IBM + Hortonworks Data Science Experience

The Team

Data Scientists: Tools (RStudio, Jupyter, Zeppelin, H2O, etc.); model management

Data Engineers: Place all data assets in one place; productionize models with REST endpoints

Business Analyst: Rich data visualization; community and collaboration of knowledge

Corporate IT: Run secure & governed data science; one experience to support many tools

The Process

Collaboration

Community
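
The slide mentions productionizing models with REST endpoints but does not show what such an endpoint looks like. The following is only a minimal sketch using the Python standard library, not the DSX deployment API: the `/score` path, the `support_calls` feature, the `predict` stand-in and the port are all hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained churn model (assumption, not DSX code)."""
    calls = features.get("support_calls", 0)
    return min(1.0, 0.1 * calls)


def handle_scoring_request(body):
    """Turn raw JSON request bytes into (status_code, JSON response bytes)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
    except ValueError:
        return 400, json.dumps({"error": "invalid JSON"}).encode()
    result = {"churn_probability": predict(features)}
    return 200, json.dumps(result).encode()


class ScoringHandler(BaseHTTPRequestHandler):
    """Serves POST /score (hypothetical path) by delegating to handle_scoring_request."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_scoring_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the endpoint on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping the request handling in a plain function (`handle_scoring_request`) means the scoring logic can be exercised without starting a server.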

Page 7: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Data Science Solution

Data Science Experience

Community

• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects

Open Source

• Code in Scala/Python/R/SQL
• Zeppelin & Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries

Scale & Enterprise Security

• Data Science at Scale
• Run Spark Jobs on HDP Cluster
• Secure Hadoop Support
• Ranger Atlas Support for Data
• Support for ABAC

Model Management

• Data Shaping Pipeline UI
• Auto-data preparation & modeling
• Advanced Visualizations
• Model management & deployment
• Documented Model APIs

Page 8: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

DEMO

Page 9: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Use Case

• All industries are affected by churn.

• Being able to predict churn helps companies take action and keep customers longer.

• The more historical data, the better the model.

• Data is collected and labeled over time based on churn.

• Using a Random Forest, we will predict future churners.
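
The use case names a Random Forest but the transcript does not include the demo code. As a rough illustration of the technique only, here is a small pure-Python sketch: bootstrap samples, a random feature subset at each split, Gini impurity, and averaged leaf votes. The two features (support calls, tenure in months) and the eight labeled rows are invented toy data; the demo itself presumably used Spark on the HDP cluster rather than anything like this.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is the fraction of churners that reached it."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per split
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return sum(labels) / len(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth),
    )


def tree_predict(tree, row):
    """Walk from the root to a leaf and return its churn fraction."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def churn_probability(forest, row):
    """Average the per-tree churn fractions."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)


# Toy history (invented): (support_calls, tenure_months) and churn label.
rows = [(6, 2), (8, 1), (5, 3), (7, 4), (0, 24), (1, 36), (2, 30), (1, 48)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(rows, labels)

print(churn_probability(forest, (9, 1)))   # frequent caller, new customer
print(churn_probability(forest, (0, 60)))  # no calls, long tenure
```

On this toy data the first customer should score close to 1 and the second close to 0. With real churn data the classes overlap, which is where averaging many trees pays off.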

Customer Churn Architecture

Page 10: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Demo Scenario: Assessing Customer Churn Probability in Real Time

• Stored long-term data on customer churn behavior

• New real-time data coming in

• Predict a customer’s churn probability before they churn

• Alert the proper departments or managers

• Business monitors customer retention outlook & performance
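
The scenario above (score incoming events, alert when churn looks likely) might be sketched roughly as follows. This is an assumption-laden illustration, not the demo's pipeline: the `CustomerEvent` fields, the hand-written `score` formula standing in for the deployed model, the 0.7 threshold and the `retention-team` recipient are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerEvent:
    """One incoming real-time record (hypothetical fields)."""
    customer_id: str
    support_calls: int
    tenure_months: int


def score(event):
    """Stand-in for the deployed model; returns a churn probability in [0, 1]."""
    raw = 0.15 * event.support_calls - 0.02 * event.tenure_months + 0.2
    return min(1.0, max(0.0, raw))


ALERT_THRESHOLD = 0.7  # assumed cut-off; a real one would be tuned by the business


def process_stream(events, threshold=ALERT_THRESHOLD):
    """Yield one alert per event whose churn probability reaches the threshold."""
    for event in events:
        probability = score(event)
        if probability >= threshold:
            yield {
                "customer_id": event.customer_id,
                "churn_probability": round(probability, 2),
                "notify": "retention-team",  # hypothetical recipient
            }


incoming = [
    CustomerEvent("c-001", support_calls=6, tenure_months=3),
    CustomerEvent("c-002", support_calls=0, tenure_months=48),
]
alerts = list(process_stream(incoming))
for alert in alerts:
    print(alert)
```

Because `process_stream` is a generator, the same loop works whether `events` is a list, as here, or an iterator fed by a message queue.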

Page 11: Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017

Demo Scenario: Problems Solved

• Data scientists collaborate and learn new tools & frameworks

• Choice of tools, notebooks and languages

• Run favorite notebook on all data in the HDP Cluster

• Deploy the model to production

• Leverage the production model to deliver insights to business

• Monitor models and retrain models as new data comes in
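
The last bullet, monitoring models and retraining as new data arrives, could be approximated with a sliding-window accuracy check like the one below. It is a sketch under stated assumptions: the `ModelMonitor` class, the window size and the accuracy floor are hypothetical, and the slides do not say how DSX itself tracks model quality.

```python
from collections import deque


class ModelMonitor:
    """Tracks accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True where prediction matched reality
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy has dropped below the floor."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


monitor = ModelMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

# Customer behavior shifts and the model starts missing churners.
for predicted, actual in [(0, 1), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_retraining()
print(healthy, degraded)
```

Waiting for a full window before flagging avoids triggering a retrain on the first few unlucky predictions.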