
Posted on 27-Mar-2022


Page 1: Deploying and managing machine
Page 2

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Deploying and managing machine learning models at scale

AIM348

Sireesha Muppala

AI/ML Specialist SA

Amazon Web Services

Nitin Wagh

Sr. BDM, Amazon SageMaker

Amazon Web Services

Page 3

Our awesome support experts

Arun Nagarajan

Sr. SDE, AI Platforms

Kiran Bakshi

Consultant, ProServ

Piyush Bothra

Sr. Solutions Architect

Page 4

Workshop map

Page 5

Workshop map

Page 6

Related Breakouts

AIM307 - Amazon SageMaker deep dive: A modular solution for ML

AIM311 - Choose the right instance type in Amazon SageMaker

AIM318 - Amazon SageMaker: Automatically tune hyperparameters

AIM306 - How to build high-performance machine learning solutions at low cost

Page 7

Workshop map

Page 8

https://tinyurl.com/w5wu595

Page 9

Workshop map

Page 10

You are a data scientist at Media Company

• Build a music recommendation model for customers

• Dataset provides users' purchase/listening patterns

• Develop a model monitoring solution to ensure the model stays up to date

• Build a movie recommendation model

• Dataset provides users' purchase/viewing patterns

Page 11

You are an engineer responsible for deployment at Media Company

• Deploy models as real-time endpoints at scale

• Set up a model drift detection pipeline that triggers retraining if required

• Save cost and efficiently run a large number of models

Page 12

Workshop map

Page 13

Amazon SageMaker at re:Invent 2019

Build, Train, Deploy Machine Learning Models Quickly at Scale

• Ground Truth

• Algorithms & Frameworks

• Notebooks

• Training & Tuning

• Deployment & Hosting

• RL

• ML Marketplace

• Neo

• NEW! SageMaker Studio

• NEW! Quick-start Notebooks (Preview)

• NEW! Experiments

• NEW! Debugger

• NEW! Autopilot

• NEW! Model Monitor

• NEW! Processing

Page 14

Model deployment in SageMaker – Overview

Page 15

Model deployment in SageMaker – Key features

Page 16

Model deployment – Security and compliance

Page 17

Amazon Confidential – Do not share or distribute

Deploying a model is not the end. You need to continuously monitor models in production and iterate.

Concept drift due to divergence of data

+ Model performance can change due to unknown factors

+ Continuous monitoring involves a lot of tooling and expense

= Model monitoring is cumbersome but critical

Page 18

Introducing Amazon SageMaker Model Monitor

Continuous monitoring of models in production

• Automatic data collection

• Continuous monitoring

• CloudWatch integration

• Visual data analysis

• Flexibility with rules

Page 19

Workshop map

Page 20

Deploy trained model (XGBoost movie recommendation model)

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Page 21

Enable data capture for Amazon SageMaker Endpoint

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Labels: Requests, predictions

Page 22

Page 23

Create Endpoint with data capture enabled
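The slide's code is not reproduced in this transcript. As a sketch, assuming the endpoint is created through the low-level CreateEndpointConfig API, data capture is switched on with its DataCaptureConfig parameter. The helper below only builds that dictionary and makes no AWS call; with boto3 it would be passed as `DataCaptureConfig=` to `create_endpoint_config` alongside the production variants. The SageMaker Python SDK exposes the same settings through `sagemaker.model_monitor.DataCaptureConfig` and the `data_capture_config` argument of `deploy()`. The function name and bucket are hypothetical.

```python
def build_data_capture_config(destination_s3_uri, sampling_percentage=100,
                              capture_input=True, capture_output=True):
    """Return a DataCaptureConfig dict for CreateEndpointConfig (no AWS call)."""
    if not destination_s3_uri.startswith("s3://"):
        raise ValueError("destination_s3_uri must be an s3:// URI")
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    capture_options = []
    if capture_input:
        capture_options.append({"CaptureMode": "Input"})   # request payloads
    if capture_output:
        capture_options.append({"CaptureMode": "Output"})  # prediction payloads
    if not capture_options:
        raise ValueError("capture at least one of input or output")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        "CaptureOptions": capture_options,
    }


# Usage with a hypothetical bucket; the result would be passed as
# DataCaptureConfig=... to boto3's create_endpoint_config (not called here).
example_config = build_data_capture_config("s3://my-bucket/datacapture", 50)
```

Sampling less than 100% of traffic lowers storage and monitoring cost at the expense of a smaller sample per monitoring window.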

Page 24

Data captured from SageMaker Endpoint

s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl
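Because captured records are partitioned by hour, a monitoring job or an analyst typically reads one hourly prefix at a time. The sketch below builds those prefixes from the S3 layout on this slide. It assumes the yyyy/mm/dd/hh components are in UTC, and the bucket, endpoint, and variant names in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def capture_prefix(destination_prefix, endpoint_name, variant_name, when):
    """Return the hourly S3 prefix for records captured at `when`."""
    # Accept either "bucket/prefix" or "s3://bucket/prefix".
    prefix = destination_prefix
    if prefix.startswith("s3://"):
        prefix = prefix[len("s3://"):]
    prefix = prefix.rstrip("/")
    # Assumption: the yyyy/mm/dd/hh path components are UTC.
    hour = when.astimezone(timezone.utc)
    return "s3://{}/{}/{}/{:%Y/%m/%d/%H}/".format(
        prefix, endpoint_name, variant_name, hour)


def hourly_prefixes(destination_prefix, endpoint_name, variant_name, start, end):
    """Yield one capture prefix per hour in the window [start, end)."""
    current = start.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    while current < end:
        yield capture_prefix(destination_prefix, endpoint_name,
                             variant_name, current)
        current += timedelta(hours=1)


# Usage with hypothetical names: list the prefixes for a two-hour window.
start = datetime(2019, 12, 3, 14, 5, tzinfo=timezone.utc)
end = datetime(2019, 12, 3, 16, 0, tzinfo=timezone.utc)
for p in hourly_prefixes("s3://my-bucket/datacapture", "movie-rec-endpoint",
                         "AllTraffic", start, end):
    print(p)
```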

Page 25

Example of collected prediction request and response
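Each line of a capture file is one JSON record holding a request and its prediction. The parser below is a sketch that assumes the record layout seen in SageMaker data capture examples (`captureData.endpointInput` and `captureData.endpointOutput`, each with `data` and `encoding`, plus `eventMetadata`); check the field names against your own captured files. Payloads flagged with BASE64 encoding are decoded, and anything else is returned as text. The function names are hypothetical.

```python
import base64
import json


def _decode(payload):
    """Return a captured payload body as text, decoding base64 if flagged."""
    data = payload.get("data", "")
    if payload.get("encoding") == "BASE64":
        return base64.b64decode(data).decode("utf-8")
    return data


def parse_capture_line(line):
    """Flatten one JSON Lines capture record into request/prediction fields."""
    record = json.loads(line)
    capture = record["captureData"]
    metadata = record.get("eventMetadata", {})
    return {
        "event_id": metadata.get("eventId"),
        "inference_time": metadata.get("inferenceTime"),
        "request": _decode(capture.get("endpointInput", {})),
        "prediction": _decode(capture.get("endpointOutput", {})),
    }


def parse_capture_file(text):
    """Parse every non-empty line of a .jsonl capture file."""
    return [parse_capture_line(line) for line in text.splitlines() if line.strip()]
```

Usage: read a downloaded `filename.jsonl` into a string and call `parse_capture_file` on it to get one dictionary per captured request.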

Page 26

Workshop map

2. Run predictions and view captured data

Page 27

Run predictions and view captured data

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Labels: Requests, predictions; View captured data

Page 28

Page 29

Workshop map

Page 30

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Labels: Baseline statistics and constraints; Requests, predictions; Analyze baseline results

Page 31

Generate baseline: Create a ProcessingJob

Page 32

Page 33

Generate baseline: Under the hood (ProcessingJob)

Page 34

Baseline results – Statistics

Page 35

Baseline results – Statistics

Page 36

Baseline results – Constraints (suggested)
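The baseline job's actual output is not reproduced in this transcript. To make the idea concrete, here is a simplified stand-in, not the Model Monitor container's implementation: it reads CSV training data with a header row, computes per-column count, completeness, and min/max/mean for numeric columns, and suggests one constraint per column. The type names Integral, Fractional, and String are an assumption, modeled loosely on the suggested constraints file; the function names are hypothetical.

```python
import csv
import io
import statistics


def _infer_type(values):
    """Classify non-empty string values as Integral, Fractional, or String."""
    if not values:
        return "Unknown"
    for caster, type_name in ((int, "Integral"), (float, "Fractional")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue  # try the next, more permissive type
    return "String"


def suggest_baseline(csv_text):
    """Return (statistics, suggested constraints) keyed by column name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    stats_by_column, constraints = {}, {}
    for name in columns:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in ("", None)]
        completeness = len(present) / len(values)
        inferred = _infer_type(present)
        col_stats = {"count": len(values), "completeness": completeness}
        if inferred in ("Integral", "Fractional"):
            numbers = [float(v) for v in present]
            col_stats.update(min=min(numbers), max=max(numbers),
                             mean=statistics.mean(numbers))
        stats_by_column[name] = col_stats
        # Suggest that production data be at least as complete as the baseline.
        constraints[name] = {"inferred_type": inferred,
                             "completeness": completeness}
    return stats_by_column, constraints
```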

Page 37

Workshop map

Page 38

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Labels: Results: statistics and violations; Baseline statistics and constraints; Requests, predictions

Page 39

MonitoringSchedule Job

Page 40

Page 41

Model monitoring: Under the hood (ProcessingJob)
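Under the hood, each scheduled execution compares the captured data with the baseline constraints and reports any violations. The sketch below illustrates that comparison for four check types (missing column, extra column, completeness, data type). It is a hypothetical simplification, not the service's algorithm, and the constraint dictionary format is an assumption. Baseline drift and categorical value checks are omitted for brevity.

```python
def _matches(value, inferred_type):
    """True if a string value can be read as the baseline's inferred type."""
    try:
        if inferred_type == "Integral":
            int(value)
        elif inferred_type == "Fractional":
            float(value)
        return True  # String and Unknown accept any value
    except ValueError:
        return False


def _violation(feature, check, description):
    return {"feature_name": feature, "constraint_check_type": check,
            "description": description}


def evaluate(rows, constraints):
    """Compare captured rows (list of dicts) with per-column constraints."""
    violations = []
    observed = set(rows[0].keys()) if rows else set()
    expected = set(constraints)
    for name in sorted(expected - observed):
        violations.append(_violation(name, "missing_column_check",
                                     "Column in baseline but not in data"))
    for name in sorted(observed - expected):
        violations.append(_violation(name, "extra_column_check",
                                     "Column in data but not in baseline"))
    for name in sorted(expected & observed):
        rule = constraints[name]
        present = [r[name] for r in rows if r[name] not in ("", None)]
        completeness = len(present) / len(rows)
        if completeness < rule["completeness"]:
            violations.append(_violation(
                name, "completeness_check",
                f"Completeness {completeness:.2f} below baseline "
                f"{rule['completeness']:.2f}"))
        bad = [v for v in present if not _matches(v, rule["inferred_type"])]
        if bad:
            violations.append(_violation(
                name, "data_type_check",
                f"{len(bad)} value(s) not of type {rule['inferred_type']}"))
    return {"violations": violations}
```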

Page 42

Monitoring Schedule Execution Summary

Page 43

MonitoringSchedule execution: constraint_violations.json

Page 44

MonitoringSchedule execution: Violation sample

Page 45

MonitoringSchedule execution: Violation types

{ "violations": [{
    "feature_name" : "string",
    "constraint_check_type" :
          "data_type_check"
        | "completeness_check"
        | "baseline_drift_check"
        | "missing_column_check"
        | "extra_column_check"
        | "categorical_values_check",
    "description" : "string"
  }]
}
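A downstream step, such as an alerting or retraining decision, usually needs only a summary of the report. The sketch below parses a constraint_violations.json document with the shape on this slide and counts violations per check type and per feature, rejecting any check type outside the six listed. The function name is hypothetical.

```python
import json
from collections import Counter

CHECK_TYPES = {
    "data_type_check", "completeness_check", "baseline_drift_check",
    "missing_column_check", "extra_column_check", "categorical_values_check",
}


def summarize_violations(report_text):
    """Return (count per check type, list of check types per feature)."""
    report = json.loads(report_text)
    counts = Counter()
    by_feature = {}
    for violation in report.get("violations", []):
        check = violation["constraint_check_type"]
        if check not in CHECK_TYPES:
            raise ValueError(f"unexpected constraint_check_type: {check}")
        counts[check] += 1
        by_feature.setdefault(violation["feature_name"], []).append(check)
    return counts, by_feature
```

For example, a pipeline could retrain only when `counts["baseline_drift_check"]` is non-zero and merely alert on schema problems such as missing or extra columns.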

Page 46

CloudWatch metrics

Namespace /aws/sagemaker/Endpoints/data-metric, with EndpointName and ScheduleName dimensions

For numerical fields:

Metric: Max → query for MetricName: feature_data_{feature_name}, Stat: Max

Metric: Min → query for MetricName: feature_data_{feature_name}, Stat: Min

Metric: Sum → query for MetricName: feature_data_{feature_name}, Stat: Sum

Metric: SampleCount → query for MetricName: feature_data_{feature_name}, Stat: SampleCount

Metric: Average → query for MetricName: feature_data_{feature_name}, Stat: Average

For both numerical and string fields:

Metric: Completeness → query for MetricName: feature_non_null_{feature_name}, Stat: Sum

Metric: BaselineDrift → query for MetricName: feature_baseline_drift_{feature_name}, Stat: Sum
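The list on this slide maps each feature to a metric name and statistic. The helper below is a hypothetical sketch that expands one feature into query parameters shaped like the `Namespace`, `MetricName`, `Dimensions`, and `Statistics` arguments of CloudWatch's GetMetricStatistics API (boto3 `get_metric_statistics`). The caller would still supply `StartTime`, `EndTime`, and `Period`; no AWS call is made here.

```python
NAMESPACE = "/aws/sagemaker/Endpoints/data-metric"


def metric_queries(feature_name, numeric, endpoint_name, schedule_name):
    """Build one query dict per metric/statistic pair listed for a feature."""
    dimensions = [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "ScheduleName", "Value": schedule_name},
    ]

    def query(metric_name, stat):
        return {"Namespace": NAMESPACE, "MetricName": metric_name,
                "Dimensions": dimensions, "Statistics": [stat]}

    queries = []
    if numeric:
        # Distribution metrics exist only for numerical fields.
        for stat in ("Max", "Min", "Sum", "SampleCount", "Average"):
            queries.append(query(f"feature_data_{feature_name}", stat))
    # Completeness and baseline drift apply to numerical and string fields.
    queries.append(query(f"feature_non_null_{feature_name}", "Sum"))
    queries.append(query(f"feature_baseline_drift_{feature_name}", "Sum"))
    return queries

# With boto3 (not called here), each dict could be unpacked as
# cloudwatch.get_metric_statistics(StartTime=..., EndTime=..., Period=3600, **q)
```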

Page 47

Workshop map

Page 48

Page 49

Alerting and automated training trigger

Diagram: Amazon SageMaker training job → Model → Amazon SageMaker Endpoint → Applications

Labels: Requests, predictions; Baseline statistics and constraints; Results: statistics and violations; Amazon CloudWatch metrics; Analysis of results; Notifications

• Model updates

• Training data updates

• Retraining

Page 50

MonitoringSchedule execution: CloudWatch Alarms

Page 51

Take corrective action: Retrigger model training
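The slides connect a CloudWatch alarm on a monitoring metric to retraining. As a sketch of that decision logic only, assuming an alarm that fires when every one of the last N datapoints exceeds a threshold (similar to a CloudWatch alarm whose datapoints-to-alarm equals its evaluation periods), the function below calls a retraining callback. The callback and function names are hypothetical; in practice the callback might start a SageMaker training job, for example through boto3's `create_training_job`.

```python
def should_alarm(datapoints, threshold, evaluation_periods):
    """True when each of the last `evaluation_periods` values exceeds threshold."""
    if evaluation_periods < 1 or len(datapoints) < evaluation_periods:
        return False  # not enough data yet: stay in OK state
    return all(value > threshold for value in datapoints[-evaluation_periods:])


def on_monitoring_result(drift_history, threshold, evaluation_periods, retrain):
    """Invoke the hypothetical `retrain` callback if the alarm condition holds."""
    if should_alarm(drift_history, threshold, evaluation_periods):
        retrain()
        return "ALARM"
    return "OK"
```

Requiring several consecutive breaching datapoints avoids retraining on a single noisy monitoring window.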

Page 52

Page 53

Workshop map

Page 54

Number of models can add up quickly …

Examples: Price a house; Find regulatory violations (USA, Brazil, Singapore, … …); Next best action (C-000001, C-000002, C-945821…)

Page 55

Multi-model endpoints: Flexible cost savings as the number of models scales

Diagram: 10 separate endpoints (EP-1 hosting Model 1, EP-2 hosting Model 2, …, EP-10 hosting Model 10) compared with a single endpoint (EP) hosting Model 1, Model 2, …, Model 10

Sample scenario: ml.c5.xlarge, $0.238/hr, 2 instances running 24/7

• 10 separate endpoints: $3,430/mo

• 1 multi-model endpoint: $343/mo
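The monthly figures follow from hourly price × instances × hours × endpoints. Assuming a 30-day month (720 hours), one endpoint with 2 ml.c5.xlarge instances at $0.238/hr costs 0.238 × 2 × 720 = $342.72, and ten such endpoints cost $3,427.20, which the slide rounds to $343 and $3,430. The short calculation below reproduces this; the 30-day month is an assumption, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instances run 24/7


def monthly_cost(hourly_price, instances_per_endpoint, endpoints=1):
    """Monthly cost in dollars for identical, always-on endpoints."""
    return hourly_price * instances_per_endpoint * HOURS_PER_MONTH * endpoints


separate = monthly_cost(0.238, 2, endpoints=10)  # one endpoint per model
shared = monthly_cost(0.238, 2, endpoints=1)     # one multi-model endpoint
print(f"10 separate endpoints: ${separate:,.2f}/mo")
print(f"1 multi-model endpoint: ${shared:,.2f}/mo")
```

The saving scales with the number of models only as long as one endpoint's instances can serve all of them; heavier traffic may require more instances behind the shared endpoint.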

Page 56

Multi-Model Endpoints

Diagram: Amazon SageMaker multi-model endpoint and S3 model storage

Mode: predict, load

Artifact location: s3://bucket/your-endpoint-models/

• new_york.tar.gz

• texas.tar.gz

• florida.tar.gz

• nevada.tar.gz
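With a multi-model endpoint, the caller names the artifact to use on each request (in boto3's sagemaker-runtime `invoke_endpoint` this is the `TargetModel` parameter, e.g. `texas.tar.gz`), and the instance loads that artifact from the S3 prefix on first use. The class below is a toy, in-memory simulation of that load-then-predict flow. The least-recently-used eviction policy, the capacity limit, and the loader callback are assumptions for illustration, not the service's documented behavior.

```python
from collections import OrderedDict


class MultiModelHost:
    """Toy simulation of one multi-model endpoint instance (LRU eviction)."""

    def __init__(self, artifact_prefix, capacity, loader):
        self.artifact_prefix = artifact_prefix.rstrip("/") + "/"
        self.capacity = capacity   # assumption: max models held in memory
        self.loader = loader       # hypothetical: fetch artifact, return a model
        self.loaded = OrderedDict()

    def invoke(self, target_model, payload):
        if target_model in self.loaded:
            # "predict": model already in memory, mark it most recently used.
            self.loaded.move_to_end(target_model)
        else:
            # "load": evict the least recently used model if full, then load.
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)
            self.loaded[target_model] = self.loader(
                self.artifact_prefix + target_model)
        return self.loaded[target_model](payload)
```

Only the first request for a model pays the load latency; later requests hit the in-memory copy until it is evicted.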

Page 57

multi-model-endpoint

Page 58

Page 59

Thank you!

Sireesha Muppala

AI/ML Specialist SA

Amazon Web Services

Nitin Wagh

Sr. BDM, Amazon SageMaker

Amazon Web Services

Page 60