a practical guidance of the enterprise machine learning

A Practical Guidance to the Enterprise Machine Learning Platform Ecosystem

Upload: jesus-rodriguez

Post on 13-Apr-2017


TRANSCRIPT

Page 1: A practical guidance of the enterprise machine learning

A Practical Guidance to the Enterprise Machine Learning Platform Ecosystem

Page 2: A practical guidance of the enterprise machine learning

About Us

• Helping great companies become great software companies

• Building software solutions powered by disruptive enterprise software trends

-Machine learning and data science

-Cyber-security

-Enterprise IoT

-Powered by Cloud and Mobile

• Bringing innovation from startups and academic institutions to the enterprise

• Award winning agencies: Inc 500, American Business Awards, International Business Awards

Page 3: A practical guidance of the enterprise machine learning

About This Webinar

• Research that brings together big enterprise software trends, exciting startups and academic research

• Best practices based on real-world implementation experience

• No sales pitches

Page 4: A practical guidance of the enterprise machine learning

• Cloud vs. On-Premise machine learning

• Cloud machine learning platforms

• Azure machine learning

• AWS machine learning

• Databricks

• Watson developer cloud

• Others…

• On-premise machine learning platforms

• Revolution analytics

• Dato

• Spark MLlib

• TensorFlow

• Others…

Agenda

Page 5: A practical guidance of the enterprise machine learning

Enterprise Data Science

Page 6: A practical guidance of the enterprise machine learning

“data science”

Page 7: A practical guidance of the enterprise machine learning
Page 8: A practical guidance of the enterprise machine learning

Modern Machine Learning

• Advances in storage, compute and data science research are making machine learning part of mainstream technology platforms

• Big data movement

• Machine learning platforms are optimized with developer-friendly interfaces

• Platform as a service providers have drastically lowered the entry point for machine learning applications

• R and Python are leading the charge

Page 9: A practical guidance of the enterprise machine learning

Cloud vs. On-Premise Machine Learning Platforms

Page 10: A practical guidance of the enterprise machine learning

Cloud Machine Learning Platforms: Benefits

• Service abstraction layer over the machine learning infrastructure

• Rich visual modeling tools

• Rich monitoring and tracking interfaces

• Combine multiple platforms: R, Python, etc.

• Enable programmatic access to ML models

Page 11: A practical guidance of the enterprise machine learning

Cloud Machine Learning Platforms: Challenges

• Integration with on-premise data stores

• Extensibility

• Security and privacy

Page 12: A practical guidance of the enterprise machine learning

On-Premise Machine Learning Platforms: Benefits

• Control

• Security

• Integration with on-premise data stores

• Integrated with R and Python machine learning frameworks

Page 13: A practical guidance of the enterprise machine learning

On-Premise Machine Learning Platforms: Challenges

• Code-based modeling interfaces

• Scalability

• Tightly coupled with Hadoop distributions

• Monitoring and management

• Data quality and curation

Page 14: A practical guidance of the enterprise machine learning

Cloud Machine Learning Platforms

Page 15: A practical guidance of the enterprise machine learning

• Azure Machine Learning

• AWS machine learning

• Databricks

• Watson developer cloud

The Leaders

Page 16: A practical guidance of the enterprise machine learning

Azure Machine Learning

Page 17: A practical guidance of the enterprise machine learning

Azure Machine Learning

• Native machine learning capabilities as part of the Azure cloud

• Elastic infrastructure that scales based on the model requirements

• Supports over 30 supervised and unsupervised machine learning algorithms

• Integration with R and Python machine learning libraries

• Expose machine learning models via programmable interfaces

• Integrated with the Cortana Analytics suite

• Integrated with Power BI

Page 18: A practical guidance of the enterprise machine learning

• Supports both supervised and unsupervised models

• Integrated with Azure HDInsight

• Large library of models and sample gallery

• Support for R and Python code

Visual Model Creation

Page 19: A practical guidance of the enterprise machine learning

• Visual dashboard to track the execution of ML models

• Track execution of different steps within an ML model

• Integrated monitoring experience with other Azure services

Rich Monitoring and Management Interface

Page 20: A practical guidance of the enterprise machine learning

• Expose machine learning models as Web Services APIs

• Integrate ML Models with Azure API Gateway

• Retrain and extend models via ML APIs

Programmatic Access to ML Models
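The idea of exposing a model as a web service can be sketched with nothing but the Python standard library. This is a generic illustration, not the Azure Machine Learning API: the `score` function, its weights, the `ScoringHandler` class and the port are all hypothetical stand-ins for a trained model and its hosting endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Stand-in for a trained model: a fixed linear rule (hypothetical)."""
    weights = {"age": 0.02, "visits": 0.1}
    total = sum(weights.get(name, 0.0) * value
                for name, value in features.items())
    label = "likely" if total >= 1.0 else "unlikely"
    return {"score": round(total, 4), "label": label}


class ScoringHandler(BaseHTTPRequestHandler):
    """Accepts a JSON object of features via POST and returns a JSON score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(score(features)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocks forever; call this only when you want to run the service."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()


# The scoring logic can be exercised directly, without starting the server.
result = score({"age": 30, "visits": 5})
print(result["label"])
```

Keeping `score` separate from the HTTP handler means the same function can be tested offline and swapped for a retrained model without touching the service code.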

Page 21: A practical guidance of the enterprise machine learning

AWS Machine Learning

Page 22: A practical guidance of the enterprise machine learning

AWS Machine Learning

• Native machine learning service in AWS

• Provide data exploration and visualization tools

• Supports supervised and unsupervised algorithms

• Integrated data transformation models

• APIs for dynamically creating machine learning models

Page 23: A practical guidance of the enterprise machine learning

• Programmatic creation of machine learning models

• Large number of algorithms and recipes

• Data transformation models included in the language

Sophisticated ML Model Authoring

Page 24: A practical guidance of the enterprise machine learning

• Sophisticated monitoring for evaluating ML models

• Integrated with Amazon CloudWatch

• KPIs that evaluate the efficiency of ML models

Monitoring ML Model Execution

Page 25: A practical guidance of the enterprise machine learning

• Optimized DSL for data transformation

• Recipes that abstract common transformations

• Reuse transformation recipes across ML models

Embedded Data Transformation
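The notion of a reusable transformation recipe can be illustrated in plain Python. The sketch below is not the AWS recipe language; it only assumes that a recipe is an ordered list of transformation steps per column, and every name in it (`lowercase`, `no_punctuation`, `clip`, `apply_recipe`, the recipe itself) is hypothetical.

```python
def lowercase(text):
    return text.lower()


def no_punctuation(text):
    # Keep letters, digits and whitespace only.
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def clip(low, high):
    """Return a transformation that clamps a number into [low, high]."""
    return lambda value: max(low, min(high, value))


# A "recipe" here is just a mapping: column name -> ordered list of steps.
TEXT_AND_AGE_RECIPE = {
    "comment": [lowercase, no_punctuation],
    "age": [clip(0, 100)],
}


def apply_recipe(recipe, row):
    """Return a transformed copy of `row`; unlisted columns pass through."""
    out = dict(row)
    for column, steps in recipe.items():
        for step in steps:
            out[column] = step(out[column])
    return out


row = {"comment": "Great product, would buy AGAIN!", "age": 130, "id": 7}
# The same recipe object can prepare the inputs of several models.
clean = apply_recipe(TEXT_AND_AGE_RECIPE, row)
print(clean)
```

Because the recipe is data rather than code baked into one model, it can be shared across models that consume the same columns.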


Page 27: A practical guidance of the enterprise machine learning

Databricks

Page 28: A practical guidance of the enterprise machine learning

Databricks Machine Learning

• Scaling Spark machine learning pipelines

• Integrated data visualization tools

• Sophisticated ML monitoring tools

• Combine Python, Scala and R in a single platform

Page 29: A practical guidance of the enterprise machine learning

• Implementing machine learning models using Notebooks

• Publishing notebooks to a centralized catalog

• Leverage Python, Scala or R to implement machine learning models

Notebooks Based Authoring

Page 30: A practical guidance of the enterprise machine learning

• Integrate data visualization into machine learning pipelines

• Reuse data visualization notebooks across applications

• Evaluate the efficiency of machine learning pipelines using visualizations

Machine Learning Data Visualization

Page 31: A practical guidance of the enterprise machine learning

• Monitor the execution of machine learning pipelines

• Run machine learning pipelines manually

• Rapidly modify and deploy machine learning pipelines

Monitoring and Management

Page 32: A practical guidance of the enterprise machine learning

Watson Developer Cloud

Page 33: A practical guidance of the enterprise machine learning

• Personality Insights

• Tradeoff Analytics

• Relationship Extraction

• Concept Insights

• Speech to Text

• Text to Speech

• Visual Recognition

• Natural Language Classifier

• Language Identification

• Language Translation

• Question and Answer

• Concept Expansion

• Message Resonance

• AlchemyAPI Services

Large Variety of Cognitive Services

Page 34: A practical guidance of the enterprise machine learning

• Access services via REST APIs

• SDKs available for different languages

• Integration with different services in the Bluemix platform

Rich Developer Interfaces

Page 35: A practical guidance of the enterprise machine learning

Relationship Extraction Concept Expansion Message Resonance

User Modeling

Complex Algorithms – Simple Interfaces

Page 36: A practical guidance of the enterprise machine learning

Other Interesting Platforms

• Microsoft’s Project Oxford https://www.projectoxford.ai/

• BigML https://bigml.com/

Page 37: A practical guidance of the enterprise machine learning

On-Premise Machine Learning Platforms

Page 38: A practical guidance of the enterprise machine learning

The Leaders

• Revolution Analytics (Microsoft)

• Spark MLlib + SparkR

• Dato

• TensorFlow

• Others: PredictionIO, Scikit-learn…

Page 39: A practical guidance of the enterprise machine learning

Revolution Analytics

Page 40: A practical guidance of the enterprise machine learning

All of Open Source R plus:

• Big Data scalability

• High-performance analytics

• Development and deployment tools

• Data source connectivity

• Application integration framework

• Multi-platform architecture

• Support, Training and Services

Revolution Analytics (Microsoft)

Page 41: A practical guidance of the enterprise machine learning

[Diagram: platform stack]

Components: R + CRAN, RevoR, DistributedR, ScaleR, ConnectR, DeployR, DevelopR

Deployment targets:

• In the cloud: Amazon AWS

• Workstations & servers: Windows, Red Hat and SUSE Linux

• Clustered systems: IBM Platform LSF, Microsoft HPC

• EDW: IBM Netezza, Teradata

• Hadoop: Hortonworks, Cloudera

Write Once, Deploy Anywhere

Page 42: A practical guidance of the enterprise machine learning

DeployR does not provide any application UI.

Three integration modes embed real-time R results into existing interfaces (web app, mobile app, desktop app, BI tool, Excel, …):

• RBroker Framework: simple, high-performance API for Java, .NET and JavaScript apps. Supports transactional, on-demand analytics on a stateless R session.

• Client Libraries: flexible control of R services from Java, .NET and JavaScript apps. Also supports stateful R integrations (e.g. complex GUIs).

• DeployR Web Services API: integrate R using almost any client language.

Integrate R Scripts Into Third Party Applications

Page 43: A practical guidance of the enterprise machine learning

Spark MLlib + SparkR

Page 44: A practical guidance of the enterprise machine learning

• It is built on Apache Spark, a fast and general engine for large-scale data processing

• Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

• Write applications quickly in Java, Scala, or Python.

Spark MLlib

Page 45: A practical guidance of the enterprise machine learning

• Integrated with Spark SQL for data queries and transformations

• Integrated with Spark GraphX for graph processing

• Integrated with Spark Streaming for real time data processing

Beyond Machine Learning

Page 46: A practical guidance of the enterprise machine learning

• Run R and machine learning models using the same infrastructure

• Leverage R scripts from Spark MLlib models

• Scale R models as part of a Spark cluster

• Execute R models programmatically using Java APIs

Spark MLlib + SparkR

Page 47: A practical guidance of the enterprise machine learning

Dato

Page 48: A practical guidance of the enterprise machine learning

• Makes Python machine learning enterprise-ready

• GraphLab Create

• Dato Distributed

• Dato Predictive Services

Dato

Page 49: A practical guidance of the enterprise machine learning
Page 50: A practical guidance of the enterprise machine learning
Page 51: A practical guidance of the enterprise machine learning

Principles:

• Get started fast

• Rapidly iterate

• Combine for new apps

import graphlab as gl

# Load user/movie/rating rows into an SFrame
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
# Top 5 recommended items per user
recommendations = model.recommend(k=5)

Recommender Image search Sentiment Analysis

Data Matching Auto Tagging Churn Predictor

Click Prediction Product Sentiment Object Detector

Search Ranking Summarization …

Sophisticated ML made easy - Toolkits

Page 52: A practical guidance of the enterprise machine learning

TensorFlow

Page 53: A practical guidance of the enterprise machine learning

• Powers deep learning capabilities on dozens of Google’s products

• Interfaces for modeling machine and deep learning algorithms

• Platform for executing those algorithms

• Scales from mobile devices to a cluster with thousands of nodes

• Became one of the most popular projects on GitHub in less than a week

Google’s TensorFlow

Page 54: A practical guidance of the enterprise machine learning

• Based on the principle of a dataflow graph

• Nodes can perform data operations but also send or receive data

• Python and C++ libraries. NodeJS, Go and others in the pipeline

TensorFlow Programming Model

# Excerpt: assumes x, y_, y_conv, keep_prob, mnist and sess are defined earlier.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # Report accuracy on the current batch with dropout disabled.
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    # One optimization step with 50% dropout.
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
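The dataflow-graph principle on this slide can be illustrated without TensorFlow itself. The sketch below is a toy pure-Python model, not TensorFlow's implementation: `Node`, `constant`, `placeholder`, `add`, `mul` and `run` are hypothetical names. It shows the two phases the programming model relies on: first a graph of operation nodes is built, then it is executed by feeding data in and pulling a result out.

```python
class Node:
    """One vertex in a toy dataflow graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op                # callable applied to the input values
        self.inputs = list(inputs)  # upstream nodes whose outputs we consume
        self.name = name            # only placeholders need a name


def constant(value):
    return Node(lambda: value)


def placeholder(name):
    # The value is supplied at run time through feed_dict.
    return Node(None, name=name)


def add(a, b):
    return Node(lambda x, y: x + y, [a, b])


def mul(a, b):
    return Node(lambda x, y: x * y, [a, b])


def run(node, feed_dict=None, _cache=None):
    """Evaluate `node` by recursively evaluating its inputs first."""
    feed_dict = feed_dict or {}
    cache = {} if _cache is None else _cache
    if id(node) in cache:            # each node is computed once per run
        return cache[id(node)]
    if node.op is None:              # placeholder: read the fed value
        value = feed_dict[node.name]
    else:
        args = [run(n, feed_dict, cache) for n in node.inputs]
        value = node.op(*args)
    cache[id(node)] = value
    return value


# Build phase: nothing is computed yet.
x = placeholder("x")
y = add(mul(x, constant(3.0)), constant(1.0))   # y = 3x + 1

# Run phase: the same graph is reused with different inputs.
print(run(y, {"x": 2.0}))   # 7.0
print(run(y, {"x": 0.0}))   # 1.0
```

Separating graph construction from execution is what lets a runtime decide later where and in which order each node runs.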

Page 55: A practical guidance of the enterprise machine learning

• Scales from a single device to a large cluster of nodes

• TensorFlow uses a placement algorithm based on heuristics to place tasks on the different nodes in a graph

• The execution engine assigns tasks for fault tolerance

• Linear scalability model

TensorFlow Implementation
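To make the idea of heuristic placement concrete, here is a simplified greedy sketch. It is not TensorFlow's actual algorithm: the `place` function, the fixed `transfer_cost` and all cost numbers are assumptions made for illustration. Each operation is assigned to the device on which it would finish earliest, given where its inputs already live.

```python
def place(ops, deps, cost, devices, transfer_cost=1.0):
    """Greedily assign each op to the device where it finishes earliest.

    ops      -- op names in topological order (inputs before consumers)
    deps     -- dict: op -> list of ops it reads from
    cost     -- dict: (op, device) -> estimated compute time
    devices  -- list of device names
    """
    placement = {}
    finish = {}                              # op -> time its output is ready
    free_at = {d: 0.0 for d in devices}      # device -> time it becomes idle

    for op in ops:
        best = None
        for d in devices:
            # Inputs produced on another device pay a fixed transfer penalty.
            ready = max(
                [finish[p] + (0.0 if placement[p] == d else transfer_cost)
                 for p in deps.get(op, [])],
                default=0.0,
            )
            start = max(ready, free_at[d])
            end = start + cost[(op, d)]
            if best is None or end < best[0]:
                best = (end, d)
        end, d = best
        placement[op] = d
        finish[op] = end
        free_at[d] = end
    return placement, finish


ops = ["load", "matmul", "softmax"]
deps = {"matmul": ["load"], "softmax": ["matmul"]}
devices = ["cpu", "gpu"]
cost = {
    ("load", "cpu"): 1.0, ("load", "gpu"): 5.0,
    ("matmul", "cpu"): 10.0, ("matmul", "gpu"): 1.0,
    ("softmax", "cpu"): 2.0, ("softmax", "gpu"): 0.5,
}
placement, finish = place(ops, deps, cost, devices)
print(placement)   # {'load': 'cpu', 'matmul': 'gpu', 'softmax': 'gpu'}
```

A greedy rule like this is cheap to evaluate but not guaranteed to be optimal, which is why it is described as a heuristic.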

Page 56: A practical guidance of the enterprise machine learning

• TensorFlow includes an engine that enables the visual representation of the execution graph

• Visualizations include summary statistics of the different states of the model

• The visualization engine is included in the current open source release

TensorFlow Graph Visualization

Page 57: A practical guidance of the enterprise machine learning

Other Interesting Projects

• H2O.ai

• PredictionIO

• Scikit-Learn

• Microsoft’s DMTK

Page 58: A practical guidance of the enterprise machine learning

Machine Learning in the Enterprise

Page 59: A practical guidance of the enterprise machine learning

• Enable foundational building blocks

-Data quality

-Data discovery

-Functional and integration testing

• Predictions are tempting, but classification and clustering are easier

• Run multiple models at once

• Enable programmatic interfaces to interact with ML models

• Start small, deliver quickly, iterate…

Machine Learning in the Enterprise

Page 60: A practical guidance of the enterprise machine learning

•Machine learning is becoming one of the most important elements of modern enterprise solutions

•Innovation in machine learning is happening in both the on-premise and cloud space

•Cloud machine learning innovators include: Azure ML, AWS ML, Databricks and IBM Watson

•On-premise machine learning innovators include: Spark MLlib, Microsoft’s Revolution R, Dato, TensorFlow

•Enterprise machine learning solutions should include elements such as data quality, data governance, etc.

•Start small and use real use cases

Summary

Page 62: A practical guidance of the enterprise machine learning

Appendix A: Scikit-Learn

Page 63: A practical guidance of the enterprise machine learning

• Extensions to SciPy (Scientific Python) are called SciKits. SciKit-Learn provides machine learning algorithms.

• Algorithms for supervised & unsupervised learning

• Built on SciPy and Numpy

• Standard Python API interface

• Sits on top of C libraries, LAPACK, LibSVM, and Cython

• Open Source: BSD License (packaged in many Linux distributions)

• Probably the best general ML framework out there.

Scikit-Learn

Page 64: A practical guidance of the enterprise machine learning

Load & Transform Data

Raw Data Feature Extraction

Build Model

Feature Evaluation

Very Simple Prediction Model

Evaluate Model

Page 65: A practical guidance of the enterprise machine learning

Assess how model will generalize to independent data set (e.g. data not in the training set).

1. Divide data into training and test splits

2. Fit model on training, predict on test

3. Determine accuracy, precision and recall

4. Repeat k times with different splits, then average the scores (e.g. F1)

            Predicted Class A    Predicted Class B
Actual A    True A               False B              #A
Actual B    False A              True B               #B
            #P(A)                #P(B)                total

Simple Programming Model: Cross-Validation (Classification)
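The four steps above can be sketched in plain Python. This is a minimal illustration rather than scikit-learn's implementation: the 1-D nearest-centroid classifier, the round-robin fold assignment and the sample data are all assumptions chosen to keep the example self-contained.

```python
def k_fold_indices(n, k):
    """Round-robin folds: fold i holds indices i, i+k, i+2k, ...

    Real data should be shuffled (or stratified) first.
    """
    return [list(range(n))[i::k] for i in range(k)]


def fit_centroids(xs, ys):
    """Toy 1-D nearest-centroid classifier: store the mean of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def precision_recall_f1(actual, predicted, positive):
    """Read the metrics off the confusion-matrix cells for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def cross_validate(xs, ys, k=5, positive="A"):
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_centroids(train_x, train_y)                # fit on training split
        predicted = [predict(model, xs[i]) for i in test_idx]  # predict on test split
        actual = [ys[i] for i in test_idx]
        scores.append(precision_recall_f1(actual, predicted, positive))
    # Average each metric over the k folds.
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


# Two well-separated classes: A around 0, B around 10.
xs = [0.1, 0.4, -0.2, 0.3, 0.0, 9.8, 10.1, 10.4, 9.9, 10.2]
ys = ["A"] * 5 + ["B"] * 5
precision, recall, f1 = cross_validate(xs, ys, k=5)
print(precision, recall, f1)   # 1.0 1.0 1.0
```

With this data every fold holds one sample of each class and the classes do not overlap, so all three averaged metrics come out at 1.0; noisier data would pull them below that.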

Page 66: A practical guidance of the enterprise machine learning

How to evaluate clusters? Visualization (but only in 2D)

Data Visualization

Page 67: A practical guidance of the enterprise machine learning

Appendix B: PredictionIO

Page 68: A practical guidance of the enterprise machine learning

• Developer friendly machine learning platform

• Completely open source

• Based on Apache Spark

PredictionIO

Page 69: A practical guidance of the enterprise machine learning

• PredictionIO platform: a machine learning stack for building, evaluating and deploying engines with machine learning algorithms.

• Event Server: an open source machine learning analytics layer for unifying events from multiple platforms.

• Template Gallery: engine templates for different types of machine learning applications.

A Simple Architecture

Page 70: A practical guidance of the enterprise machine learning

• Execute models asynchronously via event interface

• Query data programmatically via REST interface

• Various SDKs provided as part of the platform

Model Execution
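The event and query interfaces can be sketched with the Python standard library alone. Treat the details as assumptions to check against an actual deployment: the host names, ports and `ACCESS_KEY` are placeholders, the `rate` event and the `user`/`num` query fields are modeled on a recommendation-style engine template, and the helper names are hypothetical. The functions only build the HTTP requests; nothing is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values: substitute those of an actual deployment.
EVENT_SERVER = "http://localhost:7070"
ENGINE_SERVER = "http://localhost:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"

JSON_HEADERS = {"Content-Type": "application/json"}


def build_event_request(user_id, item_id, rating):
    """Build (but do not send) a POST that records a 'rate' event."""
    event = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    url = EVENT_SERVER + "/events.json?" + urlencode({"accessKey": ACCESS_KEY})
    return Request(url, data=json.dumps(event).encode("utf-8"),
                   headers=JSON_HEADERS, method="POST")


def build_query_request(user_id, num=4):
    """Build (but do not send) a POST asking a deployed engine for predictions."""
    query = {"user": user_id, "num": num}
    return Request(ENGINE_SERVER + "/queries.json",
                   data=json.dumps(query).encode("utf-8"),
                   headers=JSON_HEADERS, method="POST")


event_req = build_event_request("u1", "i42", 4.0)
query_req = build_query_request("u1")
print(event_req.get_method(), event_req.full_url)
print(query_req.get_method(), query_req.full_url)
# To actually send a request: urllib.request.urlopen(event_req)
```

Events are written to one endpoint and predictions are read from another, which is what allows data collection and model serving to run asynchronously from each other.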

Page 71: A practical guidance of the enterprise machine learning

• Visual model for model creation

• Integrated with a template gallery

• Ability to test and validate engines

Rich Model Creation Interface