© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Enable secure ML deployments in Financial Services
FSI404
Ilya Epshteyn
Principal Solutions Architect
Amazon Web Services
Songzhi Liu
Senior Big Data Consultant
Amazon Web Services
Workshop team
The challenge
FSI security considerations
The solution
Hands on
Agenda
The Challenge
Your company wants to enable its data scientists to deliver machine learning-based projects that are trained on highly sensitive company data.
The project teams are constrained by shared on-premises resources, so you have been tasked with determining how the business can leverage the cloud to provision environments for the data science teams.
The environment must be secure, protecting the sensitive data while also enabling the data science teams to self-service.
Authentication
• Is all access compliant with our corporate authentication standards?
• Can all user interfaces integrate with our AWS Directory Service for Microsoft Active Directory?
Authorization
• How do we authorize access to all resources?
• How do we ensure that users only access data they are allowed to see?
Private network connectivity
• Can all traffic transfer over private and secure network links?
• Can we block ingress and egress Internet access?
Data protection
• Can we encrypt all data in transit and at rest?
• Does the service support AWS Key Management Service (AWS KMS) customer master keys (CMKs)?
Artifact management
• How do we safely bring in public and private libraries and frameworks?
• How do we securely persist and protect code and model artifacts?
Auditability
• Does the service provide end-to-end auditability?
• Can audit trails be captured at user and file/object level?
Common security considerations
• All Amazon SageMaker components (notebooks, training and hosting instances) should be
deployed in and accessed over a private network with no Internet connectivity
• Configure VPC Endpoints for all services the notebook may need including:
o Amazon Simple Storage Service (Amazon S3)
o Amazon CloudWatch (training analytics)
o Amazon CloudWatch Logs (training job logging)
o AWS Security Token Service (AWS STS) (to obtain notebook IAM role ARN)
o Amazon SageMaker API (to submit a training job)
o Amazon SageMaker Runtime (to invoke hosting endpoint)
o Amazon SageMaker notebook (to access the notebook from VPC)
• Use VPC endpoint policies to further limit access to Amazon SageMaker resources
Common patterns for private networking
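As a rough illustration of the endpoint list above, the following sketch enumerates the VPC endpoint service names for a Region. The helper name `sagemaker_endpoint_services` is hypothetical, and the sketch assumes Amazon S3 is reached through a gateway endpoint while the other services use interface endpoints.

```python
def sagemaker_endpoint_services(region):
    """Return the VPC endpoint services a private notebook typically needs."""
    interface_services = [
        f"com.amazonaws.{region}.monitoring",         # Amazon CloudWatch
        f"com.amazonaws.{region}.logs",               # Amazon CloudWatch Logs
        f"com.amazonaws.{region}.sts",                # AWS STS
        f"com.amazonaws.{region}.sagemaker.api",      # Amazon SageMaker API
        f"com.amazonaws.{region}.sagemaker.runtime",  # Amazon SageMaker Runtime
        f"aws.sagemaker.{region}.notebook",           # Amazon SageMaker notebook
    ]
    # Assumption: S3 is exposed through a gateway endpoint.
    services = [
        {"ServiceName": f"com.amazonaws.{region}.s3", "VpcEndpointType": "Gateway"}
    ]
    services += [
        {"ServiceName": name, "VpcEndpointType": "Interface"}
        for name in interface_services
    ]
    return services
```

Each entry could then be combined with a `VpcId` and passed to the boto3 EC2 `create_vpc_endpoint` call, one endpoint per service.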
® 2018 Amazon Web Services Inc. or its Affiliates. All rights reserved.
Amazon SageMaker notebook – Amazon SageMaker egress, no VPC connectivity (default)
[Architecture diagram. Labels: Customer account; Customer VPC; Availability Zones 1 and 2; public and private subnets; Internet gateways; Amazon SageMaker service account; Amazon SageMaker platform VPC; Amazon SageMaker platform egress VPC; Auth Proxy Service; Notebook; S3 bucket; CloudWatch Logs; Amazon SageMaker API; Amazon SageMaker Runtime.]
Amazon SageMaker notebook – VPC connectivity with VPC endpoints (recommended)
[Architecture diagram. Labels: Customer account; Customer VPC; Availability Zones 1 and 2; private subnets; VPC endpoints; Amazon SageMaker service account; Amazon SageMaker platform VPC; Amazon SageMaker platform egress VPC; Internet gateway; Auth Proxy Service; Notebook; S3 bucket; CloudWatch Logs; Amazon SageMaker API; Amazon SageMaker Runtime.]
Amazon SageMaker notebook – Limit access to notebook by SourceIp or SourceVpce
{
  "Id": "notebook-example-1",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable Notebook Access",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedNotebookInstanceUrl",
        "sagemaker:DescribeNotebookInstance"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws:SourceVpce": [
            "vpce-111bbccc",
            "vpce-111bbddd"
          ]
        }
      }
    }
  ]
}
Reference: SourceVpce
One of the values for aws:SourceVpce is the ID of the interface endpoint for the notebook. The other is the ID of the interface endpoint for the Amazon SageMaker API.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
      "Resource": "*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "192.0.2.0/24",
            "203.0.113.0/24"
          ]
        }
      }
    }
  ]
}
Reference: SourceIp
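The SourceVpce restriction can also be generated programmatically. This is a minimal sketch, assuming the policy is built as a Python dictionary and serialized with `json.dumps`; the function name `notebook_access_policy` and the `Sid` value are hypothetical.

```python
import json


def notebook_access_policy(vpce_ids):
    """Build an IAM policy that allows notebook access only through the given VPC endpoints."""
    if not vpce_ids:
        raise ValueError("at least one VPC endpoint ID is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnableNotebookAccess",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreatePresignedNotebookInstanceUrl",
                    "sagemaker:DescribeNotebookInstance",
                ],
                "Resource": "*",
                "Condition": {
                    # Requests must arrive through one of the listed endpoints.
                    "ForAnyValue:StringEquals": {"aws:SourceVpce": list(vpce_ids)}
                },
            }
        ],
    }


# Example: one endpoint for the notebook, one for the Amazon SageMaker API.
policy_json = json.dumps(notebook_access_policy(["vpce-111bbccc", "vpce-111bbddd"]), indent=2)
```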
Amazon SageMaker notebook – Limit access to notebook by SourceVpce
[Architecture diagram. Labels: On-premise; Corp DNS server; Inbound Amazon Route 53 Resolver endpoint; Customer account; Customer shared services VPC; Customer VPC; Availability Zone; private subnets; Amazon SageMaker VPC endpoints (Amazon SageMaker notebook, Amazon SageMaker API); VPC endpoints; Amazon SageMaker service account; Amazon SageMaker platform VPC; Auth Proxy Service; Notebook; S3 bucket; CloudWatch Logs; Amazon SageMaker API; Amazon SageMaker Runtime.]
Amazon SageMaker training default deployment
[Architecture diagram. Labels: Customer account; Customer VPC; Availability Zones 1 and 2; private subnets; VPC endpoints; Amazon SageMaker service account; Amazon SageMaker platform VPC; Internet gateway; S3 bucket; CloudWatch Logs; Job 1 running on Training instance 1 and Training instance 2, each with Algorithm, Platform agent, Host agent, Data agent, and Log agent.]
Amazon SageMaker training VPC deployment (recommended)
[Architecture diagram. Labels: Customer account; Customer VPC; Availability Zones 1 and 2; private subnets; VPC endpoints; S3 bucket; CloudWatch Logs; Amazon SageMaker service account; Amazon SageMaker platform VPC; Internet gateway; Job 1 running on Training instance 1 and Training instance 2, each with Algorithm, Platform agent, Host agent, Data agent, and Log agent.]
VPC configurations must be provided as part of training job parameters. This can be simplified with lifecycle configurations or AWS Service Catalog.
Amazon SageMaker training VPC deployment
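To make the VPC requirement concrete, here is a minimal sketch that adds a `VpcConfig` block to a training job request before it is submitted. The helper name `with_vpc_config` and the sample IDs are hypothetical; `VpcConfig`, `Subnets`, and `SecurityGroupIds` are the parameter names used by the `CreateTrainingJob` API.

```python
def with_vpc_config(request, subnet_ids, security_group_ids):
    """Return a copy of a CreateTrainingJob request that runs inside the customer VPC."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("subnets and security groups are both required")
    updated = dict(request)  # leave the caller's request untouched
    updated["VpcConfig"] = {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }
    return updated


# Hypothetical base request; a real request needs further fields such as
# AlgorithmSpecification, RoleArn, OutputDataConfig, and ResourceConfig.
base_request = {"TrainingJobName": "example-job"}
vpc_request = with_vpc_config(base_request, ["subnet-aaa", "subnet-bbb"], ["sg-ccc"])
```

The resulting dictionary could be passed to the boto3 SageMaker client as `create_training_job(**vpc_request)` once the remaining required fields are filled in.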
• Use IAM conditions to limit access from specific principals
• Scope down access to specific resources
VPC endpoint policies
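One way to express both bullets in a single endpoint policy is sketched below. It assumes the policy allows all principals but conditions on the `aws:PrincipalArn` global condition key, and it scopes `Resource` to an explicit list; the function name and all ARNs are hypothetical.

```python
def endpoint_policy(allowed_role_arns, resource_arns):
    """Build a VPC endpoint policy limited to named roles and named resources."""
    if not allowed_role_arns or not resource_arns:
        raise ValueError("at least one role ARN and one resource ARN are required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowNamedRolesOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "sagemaker:*",
                # Scope down to specific resources.
                "Resource": list(resource_arns),
                # Limit access to specific principals.
                "Condition": {
                    "ArnEquals": {"aws:PrincipalArn": list(allowed_role_arns)}
                },
            }
        ],
    }


example_policy = endpoint_policy(
    ["arn:aws:iam::111122223333:role/DataScientist"],
    ["arn:aws:sagemaker:us-east-1:111122223333:training-job/*"],
)
```

Serialized with `json.dumps`, such a document could be supplied as the `PolicyDocument` argument when creating or modifying the endpoint.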
• Access to Jupyter requires IAM authentication and authorization
• Each data scientist should be provided their own Amazon SageMaker notebook (no multi-tenancy)
• Each data scientist should only be able to open, start, and stop their own Amazon SageMaker notebook
• Notebook creation within a VPC requires additional Amazon EC2 permissions (see next slides)
• Disable root access on the notebook
Common patterns for authentication & authorization
• Data scientist user role (Console access, start/stop Amazon SageMaker notebook, open Jupyter)
• Notebook creation role (used by CI/CD pipeline or AWS Service Catalog to create a notebook)
• Notebook execution role (used by notebook to access AWS resources)
• Training job execution role (used by training job to access AWS resources)
• Transform job execution role (same as training job execution role)
• Endpoint creation role (if using CI/CD pipeline to create hosting endpoints)
• Endpoint hosting role (used by hosting endpoint to access AWS resources)
• Endpoint invocation role (used by an application to call hosting endpoint)
Common IAM roles for Amazon SageMaker
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SageMakerStartStop",
      "Effect": "Allow",
      "Action": [
        "sagemaker:StartNotebookInstance",
        "sagemaker:StopNotebookInstance",
        "sagemaker:CreatePresignedNotebookInstanceUrl"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "sagemaker:ResourceTag/owner": "${aws:userid}"
        }
      }
    }
  ]
}
▪ This policy ensures that notebook users can only start, stop, or open their own notebooks, using a tag-based condition
▪ For SAML federated users, the aws:userid value is role-id:caller-specified-role-name
Limit data scientists’ access to their own notebooks
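The tag-based check can be illustrated with a small local simulation. This sketch assumes the notebook carries an `owner` tag whose value matches the caller's `aws:userid`, which for SAML federated users has the form role-id:caller-specified-role-name; the function names and the sample role ID are hypothetical, and the real evaluation is performed by IAM, not by this code.

```python
def saml_userid(role_id, session_name):
    """Compose the aws:userid value for a SAML federated user."""
    return f"{role_id}:{session_name}"


def may_operate(notebook_tags, caller_userid):
    """Mimic the StringEquals condition on sagemaker:ResourceTag/owner."""
    return notebook_tags.get("owner") == caller_userid


alice = saml_userid("AROAEXAMPLEROLEID", "alice")
bob = saml_userid("AROAEXAMPLEROLEID", "bob")
notebook_tags = {"owner": alice}  # tag applied when the notebook is created
```

Here `may_operate(notebook_tags, alice)` is true and `may_operate(notebook_tags, bob)` is false, which is why the notebook must be tagged with the exact `aws:userid` string at creation time.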
{
  "Sid": "VpcConfiguration",
  "Effect": "Allow",
  "Action": [
    "ec2:DescribeVpcs",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeNetworkInterfaces",
    "ec2:CreateNetworkInterface",
    "ec2:CreateNetworkInterfacePermission",
    "ec2:DeleteNetworkInterface",
    "ec2:DeleteNetworkInterfacePermission",
    "ec2:DescribeDhcpOptions"
  ],
  "Resource": "*"
}
▪ NetworkInterface permissions are required when creating a notebook that is associated with the customer's VPC
▪ For this reason, a common pattern is to create the notebooks with a CI/CD pipeline or AWS Service Catalog and grant those permissions to the pipeline/execution roles only
▪ The rest of the ec2:Describe permissions listed are required to view VPCs, subnets, and security groups in the Amazon SageMaker console
Additional permissions needed for VPC deployment
"Condition": {
  "ArnEquals": {
    "sagemaker:VolumeKmsKey": "kms-1234"
  }
}

"Effect": "Allow",
"Condition": {
  "ForAllValues:StringEqualsIfExists": {
    "sagemaker:VpcSubnets": [
      "{{resolve:ssm:PrivateSubnetAId:1}}",
      "{{resolve:ssm:PrivateSubnetBId:1}}"
    ],
    "sagemaker:VpcSecurityGroupIds": [
      "{{resolve:ssm:SageMakerSecurityGroupId:1}}"
    ]
  }
}
IAM condition keys for Amazon SageMaker
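As one possible preventive use of these condition keys, the sketch below builds a Deny statement that rejects training jobs submitted without any VPC subnets. It assumes the `Null` condition operator is applied to `sagemaker:VpcSubnets`; the function name and `Sid` are hypothetical.

```python
def deny_training_outside_vpc():
    """Build a statement that denies CreateTrainingJob when no VPC subnets are supplied."""
    return {
        "Sid": "DenyTrainingJobsOutsideVpc",
        "Effect": "Deny",
        "Action": ["sagemaker:CreateTrainingJob"],
        "Resource": "*",
        "Condition": {
            # "true" means the key is absent from the request, i.e. no VpcConfig.
            "Null": {"sagemaker:VpcSubnets": "true"}
        },
    }


statement = deny_training_outside_vpc()
```

Because an explicit Deny overrides any Allow, attaching a statement like this to the execution role blocks the request before the job starts, instead of terminating it afterwards.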
• All data at rest on Amazon S3 and on the notebook should be encrypted with AWS KMS customer managed keys (CMKs)
o Leverage the Amazon S3 default encryption option with an AWS KMS CMK
o Encrypt EBS volumes with a KMS CMK
• All data in transit, including traffic between Amazon SageMaker instances, must be encrypted
o Enable encryption for multi-node training intercommunication
Common patterns for data protection
• If source and output buckets are encrypted with AWS KMS CMK, the notebook execution
role will need to have the necessary encrypt/decrypt permissions to the CMK
• Enable AWS KMS CMK encryption for the data volumes when creating the notebook
Enabling KMS encryption for Amazon S3 and notebook volumes
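A minimal sketch of the notebook side of this pattern follows: it assembles `CreateNotebookInstance` parameters with a KMS key for the data volume, no direct Internet access, and root access disabled. The helper name, the default instance type, and the sample identifiers are assumptions for illustration.

```python
def secure_notebook_params(name, role_arn, subnet_id, security_group_ids, kms_key_id,
                           instance_type="ml.t3.medium"):
    """Assemble CreateNotebookInstance parameters for an encrypted, VPC-only notebook."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,   # assumed default for the example
        "RoleArn": role_arn,             # notebook execution role
        "SubnetId": subnet_id,           # private subnet in the customer VPC
        "SecurityGroupIds": list(security_group_ids),
        "KmsKeyId": kms_key_id,          # CMK used to encrypt the data volume
        "DirectInternetAccess": "Disabled",
        "RootAccess": "Disabled",
    }


params = secure_notebook_params(
    "team-notebook",
    "arn:aws:iam::111122223333:role/NotebookExecutionRole",
    "subnet-aaa",
    ["sg-ccc"],
    "1234abcd-12ab-34cd-56ef-1234567890ab",
)
```

The dictionary could be passed to the boto3 SageMaker client as `create_notebook_instance(**params)`; the execution role still needs encrypt/decrypt permissions on any CMK protecting the S3 buckets it reads or writes.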
• Outside of sandbox environments, data scientists should not be able to download packages from the Internet, only from private repos
• Use lifecycle configurations to distribute packages to the notebook
• All model artifacts should be versioned and archived if required
o Enable Amazon S3 versioning
o Enable Amazon S3 Object Lock, if required
• Use a version control system and repository management for all artifacts
Common patterns for artifact management
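As a sketch of the lifecycle configuration bullet, the code below prepares an `OnStart` script that points pip at a private repository. The repository URL is hypothetical, writing `/etc/pip.conf` is just one assumed way to configure pip, and the script content is base64-encoded because the lifecycle configuration API expects encoded content.

```python
import base64


def lifecycle_on_start(index_url):
    """Return an OnStart entry whose script configures pip to use a private index."""
    script = "\n".join([
        "#!/bin/bash",
        "set -e",
        "# Point pip at the private repository for every user on the notebook.",
        "cat > /etc/pip.conf <<EOF",
        "[global]",
        f"index-url = {index_url}",
        "EOF",
        "",
    ])
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return [{"Content": encoded}]


# Hypothetical private repository URL.
on_start = lifecycle_on_start("https://artifactory.example.internal/api/pypi/pypi/simple")
```

The list could be supplied as the `OnStart` argument of the boto3 `create_notebook_instance_lifecycle_config` call and the configuration then attached to each notebook.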
Algorithm code
• Jupyter notebook
• Scripts
• Dockerfile
Docker image
• Training image
• Inference image
Model artifacts
• model.tar.gz
Libraries
• Python libraries
• R libraries
• RPM packages
Repository management (e.g., Artifactory, Amazon S3)
Version control system (e.g., AWS CodeCommit, GitHub)
Library and repository management
• All Amazon SageMaker API calls are logged in AWS CloudTrail
• CloudTrail Amazon S3 data events should be enabled for Amazon S3 data and model
artifacts auditing
Common patterns for auditability
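Enabling S3 data events amounts to adding an event selector to the trail. The following is a minimal sketch of that selector, assuming object-level logging for whole buckets; the function name and bucket names are hypothetical.

```python
def s3_data_event_selectors(bucket_names):
    """Build CloudTrail event selectors that log object-level activity for the given buckets."""
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    return [
        {
            "ReadWriteType": "All",            # log both reads and writes
            "IncludeManagementEvents": True,   # keep API-level events as well
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # The trailing slash selects every object in the bucket.
                    "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
                }
            ],
        }
    ]


selectors = s3_data_event_selectors(["team-data-bucket", "team-model-bucket"])
```

These selectors could be applied with the boto3 CloudTrail client via `put_event_selectors(TrailName=..., EventSelectors=selectors)`.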
+ exec su -s /bin/sh -l -c 'source activate JupyterSystemEnv && exec "$0" "$@"' ec2-user -- jupyter notebook --notebook-dir=/home/ec2-user/SageMaker/ --ip=0.0.0.0 --NotebookApp.token=[REDACTED]
[I 00:10:23.163 NotebookApp] Using EnvironmentKernelSpecManager...
[I 00:10:23.165 NotebookApp] Started periodic updates of the kernel list (every 3 minutes).
[I 00:10:23.551 NotebookApp] Writing notebook server cookie secret to /home/ec2-user/.local/share/jupyter/runtime/notebook_cookie_secret
[W 00:10:25.034 NotebookApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 00:10:25.641 NotebookApp] JupyterLab extension loaded from /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/jupyterlab
[I 00:10:25.642 NotebookApp] JupyterLab application directory is /home/ec2-user/anaconda3/envs/JupyterSystemEnv/share/jupyter/lab
[I 00:10:26.697 NotebookApp] [nb_conda] enabled
[I 00:10:27.251 NotebookApp] Serving notebooks from local directory: /home/ec2-user/SageMaker
[I 00:10:27.251 NotebookApp] The Jupyter Notebook is running at:
[I 00:10:27.251 NotebookApp] https://(ip-10-10-13-171 or 127.0.0.1):8443/
CloudWatch Logs – Jupyter log
Arguments: train
[2019-10-16:22:12:23:INFO] Running standalone xgboost training.
[2019-10-16:22:12:23:INFO] File size need to be processed in the node: 0.21mb. Available memory size in the node: 56464.76mb
[22:12:23] S3DistributionType set as FullyReplicated
[22:12:23] 2923x9 matrix with 23384 entries loaded from /opt/ml/input/data/train
[22:12:23] S3DistributionType set as FullyReplicated
[22:12:23] 626x9 matrix with 5008 entries loaded from /opt/ml/input/data/validation
[22:12:23] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 30 extra nodes, 0 pruned nodes, max_depth=5
[0]#011train-rmse:8.13221#011validation-rmse:8.24165
[22:12:23] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 38 extra nodes, 2 pruned nodes, max_depth=5
[1]#011train-rmse:6.64547#011validation-rmse:6.74473
[22:12:23] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 38 extra nodes, 4 pruned nodes, max_depth=5
[2]#011train-rmse:5.48156#011validation-rmse:5.58173
CloudWatch Logs – Training job
{
"eventVersion": "1.05",
"userIdentity": {
"type": "AssumedRole",
"principalId": "AROAIWDEMPFJ343XXXXXX:SageMaker",
"arn": "arn:aws:sts::705582XXXXXX:assumed-role/AmazonSageMaker-ExecutionRole-20180418T142231/SageMaker",
……
"eventTime": "2019-07-30T03:46:51Z",
"eventSource": "sagemaker.amazonaws.com",
"eventName": "CreateTrainingJob",
"awsRegion": "us-west-2",
"userAgent": "im.amazonaws.com",
"requestParameters": {
"trainingJobName": "sagemaker-tensorflow-2019-07-30-03-46-50-373",
"hyperParameters": {
"sagemaker_requirements": "\"\"",
"sagemaker_container_log_level": "20",
"evaluation_steps": "10",
"sagemaker_program": "\"stockmarket_predictor.py\"",
"sagemaker_enable_cloudwatch_metrics": "false",
"sagemaker_region": "\"us-west-2\"",
"checkpoint_path": "\"s3://testbucket-ilya/sagemaker-tensorflow-2019-07-30-03-46-50-373/checkpoints\"",
"sagemaker_job_name": "\"sagemaker-tensorflow-2019-07-30-03-46-50-373\"",
"sagemaker_submit_directory": "\"s3://testbucket-ilya/sagemaker-tensorflow-2019-07-30-03-46-50-373/source/sourcedir.tar.gz\"",
"training_steps": "10"
},
…..
},
"responseElements": {
"trainingJobArn": "arn:aws:sagemaker:us-west-2:705582XXXXXX:training-job/sagemaker-tensorflow-2019-07-30-03-46-50-373"
},
CloudTrail – CreateTrainingJob
{
"eventVersion": "1.05",
"userIdentity": {
"type": "AssumedRole",
"principalId": "AROAIWDEMPFJ34362WJ4I:SageMaker",
"arn": "arn:aws:sts::705582XXXXXX:assumed-role/AmazonSageMaker-ExecutionRole-20180418T142231/SageMaker",
"accountId": "705582XXXXXX",
…..
},
"eventTime": "2019-07-30T04:04:28Z",
"eventSource": "s3.amazonaws.com",
"eventName": "PutObject",
"awsRegion": "us-west-2",
"sourceIPAddress": "10.0.1.51",
"userAgent": "[aws-internal/3 aws-sdk-java/1.11.550 Linux/4.14.123-86.109.amzn1.x86_64 OpenJDK_64-Bit_Server_VM/25.212-b03 java/1.8.0_212
vendor/Oracle_Corporation exec-env/AWS_ECS_EC2]",
"requestParameters": {
"bucketName": "testbucket-ilya",
"Host": "testbucket-ilya.s3.us-west-2.amazonaws.com",
"key": "sagemaker-tensorflow-2019-07-30-03-46-50-373/output/model.tar.gz"
},
"eventType": "AwsApiCall",
"recipientAccountId": "705582XXXXXX",
"vpcEndpointId": "vpce-0eb0fc2b903d61596"
}
CloudTrail – PutObject (model artifact)
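Records like the two above can be reduced to a short audit line answering who did what, when, and over which network path. This is a minimal sketch that assumes the record has already been parsed into a Python dictionary; the function name and the output field names are hypothetical, and the sample record uses placeholder values.

```python
def summarize_event(record):
    """Extract the audit-relevant fields from one CloudTrail record."""
    identity = record.get("userIdentity") or {}
    params = record.get("requestParameters") or {}
    s3_object = None
    if "bucketName" in params and "key" in params:
        s3_object = "s3://{}/{}".format(params["bucketName"], params["key"])
    return {
        "who": identity.get("arn"),
        "what": record.get("eventName"),
        "when": record.get("eventTime"),
        "object": s3_object,                           # only set for S3 data events
        "vpc_endpoint": record.get("vpcEndpointId"),   # present when a VPC endpoint was used
    }


# Placeholder record shaped like the PutObject event above.
sample = {
    "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/ExampleRole/SageMaker"},
    "eventTime": "2019-07-30T04:04:28Z",
    "eventName": "PutObject",
    "requestParameters": {"bucketName": "example-bucket", "key": "job/output/model.tar.gz"},
    "vpcEndpointId": "vpce-0123456789abcdef0",
}
summary = summarize_event(sample)
```

A missing `vpc_endpoint` value on an S3 data event is a useful signal that the request did not travel through the VPC endpoint.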
• SOC 1, 2, 3
• PCI
• ISO/IEC 27001:2013, 27017:2015, 27018:2014, and ISO/IEC 9001:2015
• HIPAA (excluding public workforce and vendor workforce)
• HITRUST CSF (excludes public workforce and vendor workforce)
• DoD CC SRG (undergoing assessment)
• FedRAMP (JAB review)
Amazon SageMaker compliance scope
• To solve the challenge, you engage a cross-functional team including:
o Cloud Center of Excellence (CCoE) – Manages the underlying cloud environment
o Data science administration team – Manages resources to support the data science teams
o Data science team – A project team tasked with delivering an ML project for the business
• During the workshop, you will be wearing multiple hats (IAM roles corresponding to the above personas)
o CloudCOE IAM Role
o DataScientistAdmin IAM Role
o DataScientist IAM Role
Cross-functional team
Lab    Description                                             Persona (IAM role)
Lab 1  Deploy base infrastructure                              CloudCOE
Lab 2  Deploy team resources                                   DataScientistAdmin
Lab 3  Deploy a secure Amazon SageMaker notebook               DataScientist
Lab 4  Create a training job and validate detective controls   DataScientist
Lab 5  Improve security posture with preventive controls       DataScientistAdmin, DataScientist
Workshop labs
• Assume CloudCOE role
• Deploy the base infrastructure AWS CloudFormation template, which creates the following:
  • VPC
    o No IGW (no Internet)
    o Two private subnets
    o Security group for the notebook
    o Seven VPC endpoints (AWS STS, CloudWatch, CloudWatch Logs, Amazon S3, Amazon SageMaker API, Runtime, and notebook)
  • IAM roles
    o DataScientistAdmin
    o DataScientist
  • AWS Service Catalog product called SageMakerNotebookExeRole
    o Enables DataScientistAdmin to safely create an IAM role for the Amazon SageMaker notebook
  • Detective control for VPC enforcement
    o CloudTrail, CloudWatch Events rule, AWS Lambda function
  • Stores VPC resource IDs in SSM Parameter Store
Lab 1 – Deploy base infrastructure
• Assume DataScientistAdmin role
• Create an IAM execution role for the Amazon SageMaker notebook (one role per team) using the AWS Service Catalog product (created in Lab 1 by the CloudCOE)
• Deploy the team resources AWS CloudFormation template, which creates the following:
o Team S3 buckets (data and model artifact)
o AWS KMS customer managed key (CMK) to encrypt data in S3 buckets
o AWS Service Catalog product called SageMakerNotebook
▪ Ensures that the notebook is deployed within a VPC
▪ Ensures that root access is disabled
▪ Ensures that all EBS volumes are encrypted with a KMS customer managed key (CMK)
▪ Copies a Jupyter notebook from Amazon S3
Lab 2 - Deploy team resources
• Assume DataScientist role
• Create an Amazon SageMaker notebook using AWS Service Catalog product
(created in Lab 2 by the DataScientistAdmin)
• Verify that the notebook is deployed within a VPC and that there is no Internet
access
• Launch Jupyter notebook and become familiar with the StockMarket Level
Predictor notebook
Lab 3 - Deploy a secure Amazon SageMaker notebook
• Assume DataScientist role
• Create a training job without specifying VPC parameters
• Verify that the detective controls (CloudWatch event rule + Lambda) detect and
terminate the non-compliant training job
• Add VPC configurations as part of training job parameters and re-launch a
compliant training job
Lab 4 - Create a training job and validate detective controls
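The detective control exercised in this lab can be pictured as a small AWS Lambda handler triggered by a CloudWatch Events rule on CloudTrail `CreateTrainingJob` events. This is an illustrative sketch, not the workshop's actual function: the camelCase field names under `requestParameters` (`vpcConfig`, `subnets`, `securityGroupIds`) are assumed from how CloudTrail renders request parameters, and the optional `sagemaker_client` argument is a hypothetical hook for testing.

```python
def find_noncompliant_job(event):
    """Return the training job name if the event shows a job created without a VPC config."""
    detail = event.get("detail") or {}
    if detail.get("eventName") != "CreateTrainingJob":
        return None
    params = detail.get("requestParameters") or {}
    vpc = params.get("vpcConfig") or {}
    if vpc.get("subnets") and vpc.get("securityGroupIds"):
        return None  # compliant: the job runs inside the customer VPC
    return params.get("trainingJobName")


def handler(event, context, sagemaker_client=None):
    """Lambda entry point: stop any training job launched outside the VPC."""
    job_name = find_noncompliant_job(event)
    if job_name is None:
        return {"stopped": None}
    if sagemaker_client is None:
        import boto3  # imported lazily so the check logic can be tested offline
        sagemaker_client = boto3.client("sagemaker")
    sagemaker_client.stop_training_job(TrainingJobName=job_name)
    return {"stopped": job_name}
```

A detective control like this reacts after the job has been created, which is why a preventive IAM condition is a natural follow-up.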
• Assume DataScientistAdmin role
• Update the SageMakerNotebookExeRole product to introduce the new preventive controls
• Assume DataScientist role
• Launch a new training job outside of the VPC to validate the effectiveness of the new preventive controls
Lab 5 - Improve security posture with preventive controls
• Add a preventive control to ensure that access to the notebook is over a VPC endpoint (i.e., via a jump host in a VPC or from an on-premises workstation over AWS Direct Connect)
• Add a preventive control to make sure that each data scientist can start, stop, or terminate only their own notebooks
• Implement a cost optimization control to auto-stop idle notebooks
Bonus challenges
https://dashboard.eventengine.run
Region is N. Virginia (us-east-1)
A few reminders
Region – NORTH VIRGINIA (US-EAST-1)
User guide contains step-by-step instructions (click to expand)
Leave default values for CloudFormation parameters; only provide the required ones
Write down your AWS account number and TeamName
It can take a few minutes for the kernel to become available in Jupyter notebook
Only modify the required values in Jupyter notebook
Check out FAQ section in the user guide
Ask the workshop team for any help you need!
https://bit.ly/2r26bLy
(https://sagemaker-workshop.com/security.html)
Workshop guide
Related breakouts
AIM337-R – [REPEAT] Deep dive into Amazon SageMaker security features
AIM337-R1 – [REPEAT 1] Deep dive into Amazon SageMaker security features
AIM327 – Security for ML environments with Amazon SageMaker, featuring Vanguard
AIM307 – Amazon SageMaker deep dive: A modular solution for machine learning
Thank you!
Ilya Epshteyn
Songzhi Liu