Enterprise Data Science at Scale @ Princeton, NJ, 14-Nov-2017

Post on 22-Jan-2018

Category: Technology

TRANSCRIPT

© Hortonworks Inc. 2011 – 2017. All Rights Reserved

Enterprise Data Science at Scale: Introducing Data Science Experience (DSX)

Future of Data – Princeton Meetup, 14-November-2017

Presenter

Tim Spann

• #1 Pure Open Source Hadoop Distribution

• 1000+ customers and 2100+ ecosystem partners

• Employs the original architects, developers and operators of Hadoop from Yahoo!

• Best-in-class 24x7 customer support

• Leading professional services and training

• #1 Data Science Platform (Source: Gartner)

• OpenPOWER performance leadership

• Flexible, software defined storage

• #1 SQL Engine for complex, analytical workloads

• Leader in On-premise and Hybrid Cloud solutions

+

IBM + Hortonworks = Unlocking Actionable Insights

Data Science In Action

The Team

Data Scientists: Responsible for “The Math”

Data Engineers: Responsible for “The Data”

Business Analyst: Responsible for “The Business”

Corporate IT: Responsible for “Technology”

The Process

Data Science Challenges

The Team

Data Scientists
• “I like my own tools”
• “How can I productionize my model?”

Data Engineers
• “I need a central place for data”
• “How can I efficiently transform data?”

Business Analyst
• “I need to visualize the shape of data”
• “How can we fail fast and prototype quickly?”

Corporate IT
• “How do I govern and secure this?”
• “I can’t support all of these tools”

The Process

• Productionizing with data
• So many tools & limited compute resources
• Data discovery
• Model deterioration & data evolution

The IBM + HWK Data Science Experience

The Team

Data Scientists
• Tools: RStudio, Jupyter, Zeppelin, H2O, etc.
• Model management

Data Engineers
• Place all data assets in one place
• Productionize models with REST endpoints

Business Analyst
• Rich data visualization
• Community and collaboration of knowledge

Corporate IT
• Run secure & governed data science
• One experience to support many tools

The Process

Collaboration

Community
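
As a rough illustration of what "productionize models with REST endpoints" can look like, here is a minimal sketch using only the Python standard library. It is not the DSX deployment API. The /score path, the handle_score function, the JSON field names and the placeholder_model stand-in are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler


def placeholder_model(features):
    """Hypothetical stand-in for a trained model (NOT a real churn model)."""
    calls = features.get("support_calls", 0)
    return max(0.0, min(1.0, calls / 10.0))


def handle_score(body: bytes):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        features = json.loads(body)["features"]
        probability = placeholder_model(features)
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400, {"error": "expected a JSON object with a 'features' object"}
    return 200, {"churn_probability": probability}


class ScoreHandler(BaseHTTPRequestHandler):
    """Exposes handle_score at POST /score (assumed path)."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_score(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, one could serve the handler with http.server.HTTPServer(("127.0.0.1", 8080), ScoreHandler).serve_forever() and POST a JSON body such as {"features": {"support_calls": 5}} to /score. Keeping the scoring logic in a plain function separate from the HTTP handler makes it easy to test without a running server.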

Data Science Solution

Community

• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects

Open Source

• Code in Scala/Python/R/SQL
• Zeppelin & Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries

Scale & Enterprise Security

• Data Science at Scale
• Run Spark Jobs on HDP Cluster
• Secure Hadoop Support
• Ranger & Atlas Support for Data
• Support for ABAC

Model Management

• Data Shaping Pipeline UI
• Auto-data preparation & modeling
• Advanced Visualizations
• Model management & deployment
• Documented Model APIs

Data Science Experience

DEMO

Use Case

• All industries are affected by churn.

• Being able to predict churn helps companies take action and keep customers longer.

• The more historical data, the better the model.

• Data is collected and labeled over time based on churn.

• Using a Random Forest, we will predict future churners.
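
To make the Random Forest idea concrete, here is a small, dependency-free sketch: each tree is trained on a bootstrap sample and considers a random subset of features at every split, and the forest averages the per-tree churn fractions into a probability. This is not the notebook used in the demo, which ran on Spark. The function names, the three features (support calls, tenure in months, monthly charge) and the tiny dataset are all invented for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng, max_depth=3, n_features=2):
    """Grow one tree; every node looks at a random subset of features."""
    if max_depth == 0 or len(set(labels)) <= 1:
        return sum(labels) / len(labels)  # leaf: fraction of churners
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_features))
    if split is None:
        return sum(labels) / len(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       rng, max_depth - 1, n_features),
            build_tree([r for r, _ in right], [y for _, y in right],
                       rng, max_depth - 1, n_features))


def tree_predict(node, row):
    """Walk from the root to a leaf and return its churn fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest


def churn_probability(forest, row):
    """Average the per-tree churn fractions."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)


# Made-up toy data: [support_calls, tenure_months, monthly_charge]
rows = [[5, 2, 80], [6, 3, 90], [4, 1, 85], [7, 4, 95], [5, 3, 70], [6, 2, 88],
        [0, 36, 40], [1, 48, 50], [0, 24, 45], [1, 60, 55], [2, 30, 42], [0, 40, 38]]
labels = [1] * 6 + [0] * 6  # 1 = churned, 0 = stayed

forest = train_forest(rows, labels)
risky = churn_probability(forest, [7, 1, 95])
loyal = churn_probability(forest, [0, 60, 38])
```

On real data one would normally reach for a library implementation such as Spark MLlib's or scikit-learn's RandomForestClassifier rather than hand-rolled trees; the sketch only shows the mechanics.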

Customer Churn Architecture

Demo Scenario: Assessing Customer Churn Probability in Real Time

• Stored long term data on customer churn behavior

• New real time data coming in

• Predict a customer's churn probability before they churn

• Alert the proper departments or managers

• Business monitors customer retention outlook & performance
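
The scoring-and-alerting part of this scenario could be sketched roughly as follows. The event layout, the ChurnAlert record, the department names and both thresholds are assumptions for illustration; the actual demo pipeline is not described in these slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class ChurnAlert:
    customer_id: str
    probability: float
    department: str


def route_department(probability: float) -> str:
    """Hypothetical routing rule: highest-risk customers go to a manager."""
    return "retention-manager" if probability >= 0.8 else "customer-success"


def score_stream(events: Iterable[Dict],
                 score: Callable[[Dict], float],
                 threshold: float = 0.5) -> List[ChurnAlert]:
    """Score each incoming event and collect alerts above the threshold."""
    alerts = []
    for event in events:
        probability = score(event["features"])
        if probability >= threshold:
            alerts.append(ChurnAlert(event["customer_id"], probability,
                                     route_department(probability)))
    return alerts
```

Here score is any callable that maps a feature dictionary to a churn probability, for example a wrapper around a deployed model. In a streaming setup the events iterable would be fed by the incoming real-time data rather than an in-memory list.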

Demo Scenario: Problems Solved

• Data scientists collaborate and learn new tools & frameworks

• Choice of tools, notebooks and languages

• Run favorite notebook on all data in the HDP Cluster

• Deploy the model to production

• Leverage the production model to deliver insights to business

• Monitor models and retrain models as new data comes in
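
One simple way to monitor a deployed model and decide when to retrain is to track accuracy over a sliding window of recent predictions whose true outcome is now known. The sketch below assumes that approach; the ModelMonitor class, the window size and the accuracy floor are illustrative choices, not part of DSX.

```python
from collections import deque


class ModelMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True where prediction was right
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy is below the floor."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy
```

Waiting for a full window before signaling avoids retraining on a handful of early misses. In practice the check would trigger a retraining job on the accumulated historical and new data.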
