Democratizing access to Machine Learning with Pre-Trained Models and Cloud Services Jay Bartot – CTO, Madrona Venture Labs

Post on 05-Jul-2020

Page 1:

Democratizing access to Machine Learning with Pre-Trained Models and Cloud Services

Jay Bartot – CTO, Madrona Venture Labs

Page 2:

Who Am I?

• CTO of Madrona Venture Labs

• Serial Technology Entrepreneur

• Machine Learning Enthusiast

• Started learning about ML in 2001

• Lots of startups and acquisitions over last 20 years

• Classification, Forecasting, Text-mining, Computer vision

• eCommerce, Online Advertising, Travel, Medical informatics, Consumer video

Page 3:

What I work on

• We look for technology startup ideas that are aligned with Madrona Venture Group’s investment thesis

• Our critical work: Carefully vetting the ideas for market, customer, technology fit

• Most ideas ultimately don’t pass muster – so we kill them!

• Main focus: Vertical AI/ML companies

Page 4:

Data science (DS) and Machine learning (ML) Landscape

Many once extremely challenging problems now have robust solutions due to:

• Big data

• Compute power

• Reemergence of neural network learning algorithms

Page 5:

DS/ML Landscape

• Explosion of open source toolkits and platforms – phenomenal!

• Amazing how quickly new ML techniques and tools become commoditized

• Lots of new educational material and resources

• But DS/ML is still complex with significant learning curve

Page 6:

DS/ML Landscape

• Big learning curve

• Somewhat of a black art

• Critical work:

• Data cleaning, normalizing

• Feature engineering

• Hyperparameter tuning

• Testing/Validation

• Hosting models for inference

• Time consuming and expensive!
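To make the steps above concrete, here is a minimal sketch of data prep, validation, and hyperparameter tuning in plain Python. The synthetic data, the one-feature k-nearest-neighbour classifier, and the candidate k values are all assumptions chosen for illustration; a real project would use a proper ML library and far more data.

```python
import random
import statistics

def standardize(values):
    """Scale a feature to zero mean and unit variance (one common normalization)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against a constant feature
    return [(v - mean) / stdev for v in values]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (single feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, held_out, k):
    """Fraction of held-out rows that the k-NN vote labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

# Synthetic data, invented for illustration: label is 1 when the raw value exceeds 50.
rng = random.Random(0)
raw = [rng.uniform(0, 100) for _ in range(200)]
labels = [1 if v > 50 else 0 for v in raw]

# Data prep: normalize the feature, then hold out a validation set.
features = standardize(raw)
rows = list(zip(features, labels))
rng.shuffle(rows)
train, validation = rows[:150], rows[150:]

# Hyperparameter tuning: try several k values, keep the best on validation data.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Even this toy version has several moving parts, and it still leaves out feature engineering and hosting the model for inference.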

Page 7:

Machine learning pipeline – Phase 1: Data Prep

Page 8:

Machine learning pipeline – Phase 2: Model Training

Page 9:

Machine learning pipeline – Phase 3: Inference

Page 10:

Full featured DS/ML Cloud platforms

Amazon SageMaker

Google ML Engine

Azure ML Studio

IBM Watson Studio

Paperspace

H2O Driverless AI

Page 11:

But there is a problem

Page 12:

Democratizing DS/ML: Empowering the Rest of Us

• Proliferation of pretrained models and services

• From files/libraries to cloud services

• Can the paradigm of software reusability and encapsulated components apply to data science and ML?

• Just like we were taught with software, avoid reinventing the wheel!

Page 13:

Categories of Pretrained Models (PTMs)

• Computer vision

• Text analysis (NLP, NLU)

• Speech-to-text, Text-to-speech

• Language translation

• Anomaly detection

• Chatbot foundations (e.g. intent mapping)

Page 14:

Top Pretrained Model Vendors/Products

Page 15:

Pretrained Models - Computer Vision

• Game changer: Convolutional Neural Networks (CNNs)

• Image Labeling, scene detection, object detection

• Face detection, recognition

• Facial key-points, emotion, gender, age, etc.

• Text recognition, OCR

• Objectionable content detection

• Fashion recognition

• Image Style Transfer

• Video analysis
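Services in this category are typically consumed as a plain HTTPS call. As a hedged sketch, the snippet below only builds the JSON body for a label-detection request against Google Cloud Vision's `images:annotate` REST method, as I understand its public interface. It does not send anything, the helper name `build_label_request` is my own, and the image bytes are a placeholder. Check the vendor documentation for authentication and current field names before relying on it.

```python
import base64
import json

# Public REST endpoint of Google Cloud Vision's annotate method.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a label-detection request on a single image.

    Helper name and defaults are illustrative, not part of any SDK.
    """
    return {
        "requests": [
            {
                # Image content is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }

# Placeholder bytes stand in for a real image file read from disk.
body = build_label_request(b"not-a-real-image")
print("POST", VISION_URL)
print(json.dumps(body, indent=2))
```

The point is the shape of the integration: no training data, no model hosting, just a request and a JSON response to parse.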

Page 16:

Google Cloud Vision

Page 17:

Google Cloud Vision

Page 18:

Clarifai – Specialized models

Page 19:

AWS Rekognition - Face Metadata

Page 20:

Deep Style Transfer

Page 21:

YOLO

Page 22:

Pretrained Models - Text Analysis

• Named Entity Recognition (NER)

• Dependency parsers

• Sentiment/Tone Analysis

• Speech-to-text, Text-to-speech

• Language Translation

• Word embeddings

• Language detection

• Summarization

• Content Moderation

Word embedding space

Page 23:

Microsoft Azure Cognitive Services

Page 24:

Google Cloud Natural Language

Page 25:

Sentiment Analysis

Page 26:

Google Grammatical Parsing

Page 27:

IBM Watson Tone Analyzer

[
  {
    "utterance_text": "Well, nothing is working :(",
    "tones": [
      { "score": 0.997149, "tone_id": "sad", "tone_name": "sad" }
    ]
  },
  {
    "utterance_text": "Sorry to hear that.",
    "tones": [
      { "score": 0.689109, "tone_id": "polite", "tone_name": "Polite" },
      { "score": 0.663203, "tone_id": "sympathetic", "tone_name": "Sympathetic" }
    ]
  },
  {
    "utterance_text": "Hello, I'm having a problem with your product.",
    "tones": [
      { "score": 0.718352, "tone_id": "polite", "tone_name": "Polite" }
    ]
  }
]

Label text with classifications such as: anger, disgust, fear, joy, sadness, analytical, confident, tentative, sad, frustrated, satisfied, excited, polite, impolite, and sympathetic.
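Consuming output like this takes only a few lines. The sketch below parses a response assembled by hand from the utterances and scores on this slide and picks the highest-scoring tone for each utterance. The outer list wrapper and the `dominant_tones` helper are assumptions for illustration, not the service's exact response envelope.

```python
import json

# Hand-assembled from the utterances and scores shown on the slide; the outer
# list wrapper is an assumption, not the service's exact response envelope.
response_text = """
[
  {"utterance_text": "Hello, I'm having a problem with your product.",
   "tones": [{"score": 0.718352, "tone_id": "polite", "tone_name": "Polite"}]},
  {"utterance_text": "Well, nothing is working :(",
   "tones": [{"score": 0.997149, "tone_id": "sad", "tone_name": "sad"}]},
  {"utterance_text": "Sorry to hear that.",
   "tones": [{"score": 0.689109, "tone_id": "polite", "tone_name": "Polite"},
             {"score": 0.663203, "tone_id": "sympathetic", "tone_name": "Sympathetic"}]}
]
"""

def dominant_tones(utterances):
    """Map each utterance text to the tone_id with the highest score."""
    result = {}
    for item in utterances:
        top = max(item["tones"], key=lambda tone: tone["score"])
        result[item["utterance_text"]] = top["tone_id"]
    return result

summary = dominant_tones(json.loads(response_text))
for text, tone in summary.items():
    print(f"{tone:12s} {text}")
```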

Page 28:

How Do Competing Offerings Compare?

• Hard not to notice, many of the big guys (and little guys) are competing with very similar offerings.

• Obvious question: which one should I choose for my application?

• How about using multiple offerings at once?

• We built a simple empirical comparison tool to look at model performances side-by-side
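The core of such a comparison tool can be very small. The sketch below is not the tool from the talk; it is a minimal illustration of the idea, with two hypothetical stand-in functions (`vendor_a`, `vendor_b`) in place of real vendor clients. A real harness would wrap each vendor's SDK or REST call and map the outputs onto a shared scale.

```python
from typing import Callable, Dict

def compare_side_by_side(text: str, services: Dict[str, Callable[[str], float]]) -> Dict[str, float]:
    """Run the same input through every service and collect the scores."""
    return {name: score(text) for name, score in services.items()}

# Hypothetical stand-ins for vendor sentiment APIs, returning a score in -1..1.
def vendor_a(text: str) -> float:
    return -0.8 if "not" in text.lower() else 0.6

def vendor_b(text: str) -> float:
    negative_words = {"bad", "broken", "not"}
    hits = sum(word.strip(".,!") in negative_words for word in text.lower().split())
    return max(-1.0, 0.5 - 0.7 * hits)

services = {"vendor_a": vendor_a, "vendor_b": vendor_b}
for sample in ["The product is great.", "This is not working."]:
    results = compare_side_by_side(sample, services)
    row = "  ".join(f"{name}={value:+.2f}" for name, value in results.items())
    print(f"{sample!r:28s} {row}")
```

Keeping every vendor behind the same callable signature is what makes it cheap to add another offering, or to run several at once and combine their answers.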

Page 29:

Image OCR Comparisons

Page 30:

Image OCR Comparisons

Page 31:

Image OCR Comparisons

Page 32:

Content Moderation

Page 33:

Sentiment Analysis Comparisons

Page 34:

Speech-to-Text Comparisons

Page 35:

More Speech-to-Text Comparisons

Page 36:

Language Translation Comparisons

Page 37:

Face Detection Comparisons

Page 38:

Face Detection Comparisons

Page 39:

Case Studies

• User product reviews over voice

• Sentiment tied to keyword tagging

• Chat communications conversation analysis

• Sentiment, emotional analysis of conversations

• Home décor design

• Pretrained CNNs power furniture image similarity engine

• Text recognition, OCR

• Use paper (EOBs, transcripts, bills) as input to app

• Meeting analysis

• Speech-to-text
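For the image similarity case, one common approach (assumed here, since the slides do not spell out the implementation) is to take a feature vector for each image from a pretrained CNN and rank catalog items by cosine similarity to the query. The 4-dimensional vectors and item names below are made up for illustration; real CNN embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, top_n=2):
    """Return the names of the top_n catalog items closest to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )
    return ranked[:top_n]

# Toy embeddings, invented for illustration. In practice each vector would be
# the output of a pretrained CNN layer for that product image.
catalog = {
    "oak_armchair": [0.9, 0.1, 0.3, 0.0],
    "leather_armchair": [0.8, 0.2, 0.4, 0.1],
    "glass_coffee_table": [0.1, 0.9, 0.0, 0.4],
    "floor_lamp": [0.0, 0.2, 0.1, 0.9],
}
query = [0.85, 0.15, 0.35, 0.05]  # hypothetical embedding of a shopper's photo

print(most_similar(query, catalog))
```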

Page 40:

Challenges with PTMs

• Lack of interpretability of DNNs – Black boxes

• Measuring accuracy – How good are these models?

• Report N-fold cross validation metrics?

• Establish cross-vendor standardized test sets

• Measuring Cultural Bias

• E.g. Face detection

• CNNs can be vulnerable to so-called adversarial examples
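As a reminder of what reporting N-fold cross-validation metrics would involve, here is a minimal pure-Python sketch. The data is synthetic, and the one-feature threshold "model" is a stand-in chosen only to keep the example short. Both are assumptions, not anything a vendor actually publishes.

```python
import random

def n_fold_splits(n_items, n_folds, seed=0):
    """Yield (train_indices, test_indices) so every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]

def fit_threshold(rows):
    """'Train' a one-feature classifier: threshold at the midpoint of class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(rows, n_folds=5):
    """Fit on each training split and report accuracy on the held-out fold."""
    scores = []
    for train_idx, test_idx in n_fold_splits(len(rows), n_folds):
        threshold = fit_threshold([rows[i] for i in train_idx])
        hits = sum((rows[i][0] > threshold) == bool(rows[i][1]) for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores

# Synthetic two-class data, invented for illustration.
rng = random.Random(1)
rows = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
rows += [(rng.gauss(3.0, 1.0), 1) for _ in range(50)]

scores = cross_validate(rows)
print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

A shared, cross-vendor test set would replace the synthetic rows here, so the same per-fold numbers could be compared across offerings.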

Page 41:

Transfer learning

• Idea: Leverage the ‘layered’ architectures of DNNs

• Finetune an existing PTM for a specific classification task

• Typically only requires a small amount of training data (yay!)

• Right now works great for CNNs, but can it work for text?

• Google’s AutoML Vision and Microsoft’s Custom Vision service
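The freeze-and-retrain idea can be shown at toy scale without any deep learning library. In the sketch below, a fixed two-unit ReLU layer plays the role of the pretrained layers, and only a small logistic-regression "head" is trained on a handful of task-specific examples. The weights, data, and hyperparameters are invented for illustration; real transfer learning would fine-tune an actual pretrained CNN in a framework such as TensorFlow or PyTorch.

```python
import math

# Frozen "pretrained" layer: two fixed ReLU units. In a real CNN this would be
# the convolutional stack learned on a large dataset. These weights are
# hand-picked for illustration and are never updated below.
FROZEN_WEIGHTS = ((1.0, -1.0), (-1.0, 1.0))

def frozen_features(x):
    """Forward pass through the frozen layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def predict(head, x):
    """Probability of class 1 from the trainable head on top of frozen features."""
    feats = frozen_features(x) + [1.0]  # constant 1.0 acts as the bias input
    z = sum(w * f for w, f in zip(head, feats))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(head, data):
    """Average cross-entropy of the head's predictions over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def retrain_head(data, steps=200, lr=0.5):
    """Full-batch gradient descent on the head only; the base stays frozen."""
    head = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            feats = frozen_features(x) + [1.0]
            error = predict(head, x) - y
            for j, f in enumerate(feats):
                grad[j] += error * f / len(data)
        head = [w - lr * g for w, g in zip(head, grad)]
    return head

# Small task-specific dataset (invented): label 1 when the first input is larger.
data = [((0.9, 0.1), 1), ((0.7, 0.2), 1), ((0.8, 0.4), 1),
        ((0.1, 0.8), 0), ((0.3, 0.9), 0), ((0.2, 0.6), 0)]

before = log_loss([0.0, 0.0, 0.0], data)
head = retrain_head(data)
after = log_loss(head, data)
print(f"loss before retraining: {before:.3f}, after: {after:.3f}")
```

Only three numbers are learned here, which is why so little training data is needed: the frozen layer already turns the raw inputs into useful features.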

Page 42:

Simple CNN

classification

Page 43:

Simple CNN

Freeze Retrain

classification

Page 44:

Conclusions

• Data scientists fear not: Manual machine learning model building is not going away

• Question: What is the depth of product use cases PTMs can support/satisfy?

• As devs catch on, will there be demand for more specific/granular models?

• What other types of common machine learning problems can be generalized and benefit from customizable base models (e.g. churn)?

• How can we overcome the challenges with PTMs?

• Make your apps smarter - Try some of these PTMs out!

Page 45:

Thank you!

Jay Bartot – CTO, Madrona Venture Labs

Page 46:

Appendix

Page 47:

Google’s AutoML

• Machine learning models are often painstakingly designed by a team of engineers and scientists

• Manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large

• AutoML: Automate the design of machine learning models

• Transfer learning capabilities
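To get a feel for how quickly the search space grows, the sketch below counts the candidate models in a deliberately tiny, hypothetical set of design choices. The option names and values are assumptions for illustration and say nothing about how AutoML itself defines or explores its search space.

```python
import itertools
import math

# Hypothetical design choices for a small network. A real architecture search
# space has many more dimensions and far more options per dimension.
search_space = {
    "num_layers": [2, 4, 8, 16],
    "units_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "sigmoid"],
    "learning_rate": [0.1, 0.01, 0.001],
    "optimizer": ["sgd", "adam"],
}

# The number of candidates is the product of the option counts.
total = math.prod(len(options) for options in search_space.values())
print("candidate models:", total)

# Enumerate lazily: each candidate would need a full training run to evaluate.
candidates = (
    dict(zip(search_space, combo))
    for combo in itertools.product(*search_space.values())
)
print("first candidate:", next(candidates))
```

Every added design dimension multiplies the total, which is why exhaustive manual exploration stops being practical almost immediately.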

Page 48:

GoogLeNet Architecture

Page 49:

High-level machine learning steps

Page 50:

VGG Architecture

Page 51:

My First PTM - Viola-Jones Face Detector

• A seminal approach to real-time face (and object) detection

• Training is slow but detection is very fast

• Key ideas:

• Integral images for fast feature computation

• AdaBoost for feature selection

• Attentional cascade for fast rejection of non-face windows

• C/C++ version added to OpenCV in 2008(?)
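The integral image idea is easy to sketch: precompute a summed-area table once, and the sum over any rectangle then costs four lookups, however large the rectangle. The code below is a minimal pure-Python illustration with a made-up 3x4 "image". The function names are my own, and the two-rectangle feature is just one simple Haar-like example, not the full detector.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column for easy lookups."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table

def rect_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, in O(1)."""
    return table[bottom][right] - table[top][right] - table[bottom][left] + table[top][left]

def two_rect_feature(table, top, left, height, width):
    """Haar-like feature: sum of the window's left half minus its right half."""
    mid = left + width // 2
    left_half = rect_sum(table, top, left, top + height, mid)
    right_half = rect_sum(table, top, mid, top + height, left + width)
    return left_half - right_half

# Tiny made-up grayscale image for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
table = integral_image(image)
print("whole image sum:", rect_sum(table, 0, 0, 3, 4))
print("left-minus-right feature:", two_rect_feature(table, 0, 0, 3, 4))
```

Because every feature reduces to a handful of table lookups, the detector can evaluate very many candidate windows per frame, which is what makes detection fast even though training is slow.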

Page 52:

Deep Dream Image

Page 53:

Machine Learning Appliances

• https://techcrunch.com/2017/07/06/h2o-ais-driverless-ai-automates-machine-learning-for-businesses/

• https://ai.googleblog.com/2017/05/using-machine-learning-to-explore.html