aws august webinar series - ec2 spot instances - 08192015
Post on 20-Mar-2017
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EC2 Spot
Save Up to 90% on your Amazon EC2 Bill with Spot Instances
Tipu Qureshi, Jafar Shameem
19th August 2015
Name your own price for EC2 Compute
• A market where price of compute changes based upon Supply and Demand
• When Bid Price exceeds Spot Market Price, instance is launched
• Instance is terminated (with 2 minute warning) if market price exceeds bid price
• Unused On-Demand Instances
What is Spot?
• Spot prices are determined via supply and demand
• There are hundreds of uncorrelated Spot markets
• Prices can, but often don’t, fluctuate wildly
About Spot…
Type
• General-purpose: M1, M3, T2
• Compute-optimized: C1, CC2, C3, C4
• Memory-optimized: M2, CR1, R3, M4
• Dense-storage: HS1, D2
• I/O-optimized: HI1, I2
• GPU: CG1, G2
• Micro: T1, T2
Size
• .micro, .medium, .large, .xlarge, .2xlarge, .4xlarge, .8xlarge
OS
• Windows, Linux
AZ
• -1a, -1b, -1c, …
Spot is not one market
Each instance family (r3) and size (4xlarge), in each Availability Zone (US-East-1b)
Uncorrelated pools of Spot Capacity
50% Bid
70% Bid
You pay the market price
Bid Price and Market Price
cc2.8xlarge: 32 cores, 60.5 GB memory
On-Demand Price: $2.00/hr
$0.00936/core/hr
On average, AWS adds enough new server capacity every day to support Amazon’s global infrastructure when it was a $7B business.
EC2 Spot - best practices
Check the Price History
Describe Spot Price History API:
• Provides historical prices on a per-pool basis
• Goes back 90 days (3 months)
• Popular instance types tend to have Spot prices that are somewhat more volatile
• Older generations (including c1.8xlarge, m1.small, cr1.8xlarge, and cc2.8xlarge) tend to be much more stable and have lower cost in general
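The price history check can be scripted. The following is a minimal sketch, assuming records shaped like the `SpotPriceHistory` entries returned by boto3's `describe_spot_price_history` call (keys `InstanceType`, `AvailabilityZone`, `SpotPrice`). The helper names `summarize_pools` and `fetch_history` and the sample prices are illustrative and not from the webinar.

```python
from collections import defaultdict
from statistics import mean


def summarize_pools(records):
    """Group price records by (instance type, AZ) and report min/mean/max."""
    prices = defaultdict(list)
    for rec in records:
        pool = (rec["InstanceType"], rec["AvailabilityZone"])
        prices[pool].append(float(rec["SpotPrice"]))  # the API returns prices as strings
    return {
        pool: {"min": min(vals), "mean": mean(vals), "max": max(vals)}
        for pool, vals in prices.items()
    }


def fetch_history(instance_types, region="us-east-1"):
    """Fetch live records; needs boto3 and AWS credentials (not called below)."""
    import boto3
    client = boto3.client("ec2", region_name=region)
    paginator = client.get_paginator("describe_spot_price_history")
    pages = paginator.paginate(InstanceTypes=instance_types,
                               ProductDescriptions=["Linux/UNIX"])
    return [rec for page in pages for rec in page["SpotPriceHistory"]]


# Made-up sample records, for illustration only.
sample = [
    {"InstanceType": "c3.xlarge", "AvailabilityZone": "us-east-1a", "SpotPrice": "0.040"},
    {"InstanceType": "c3.xlarge", "AvailabilityZone": "us-east-1a", "SpotPrice": "0.060"},
    {"InstanceType": "c3.xlarge", "AvailabilityZone": "us-east-1b", "SpotPrice": "0.200"},
]
summary = summarize_pools(sample)
for pool, stats in sorted(summary.items()):
    print(pool, stats)
```

A wide gap between `min` and `max` for a pool is a rough signal of volatility; a fuller analysis would also look at how long prices stayed at each level.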
Capacity pools
Set of EC2 instances with the same properties:
• Availability Zone
• Product/operating system (Linux/Unix or Windows)
• EC2 instance type
Each EC2 capacity pool has its own:
• Availability – number of Spot instances
• Price – based on supply and demand
Use Multiple Capacity Pools
• Run applications across multiple capacity pools to reduce your application’s sensitivity to price spikes that affect a pool
• In general, there is very little correlation between prices in different capacity pools.
• For example, if you run in five different pools your price swings and interruptions can be cut by 80%.
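The 80% figure follows from simple arithmetic. This is a small sketch, assuming the fleet is spread evenly across pools and that a price spike hits only one pool at a time (the function name `share_lost_per_spike` is illustrative):

```python
def share_lost_per_spike(num_pools):
    """Fraction of an evenly spread fleet that sits in any single pool."""
    if num_pools < 1:
        raise ValueError("num_pools must be at least 1")
    return 1.0 / num_pools


for n in (1, 2, 5, 10):
    lost = share_lost_per_spike(n)
    print(f"{n} pool(s): a spike in one pool hits {lost:.0%} of the fleet, "
          f"{1 - lost:.0%} keeps running")
```

With five pools, a spike in one pool touches one fifth of the fleet, so the impact is cut by four fifths compared with running everything in a single pool.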
Use Multiple Capacity Pools
Run across multiple Availability Zones in conjunction with:
• Auto Scaling
• Spot Fleet API
Run application across different sizes of instances within the same family
• Amazon EMR takes this approach
Your application could figure out how many vCPUs it is running on, and then launch enough worker threads to keep all of them occupied.
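A minimal sketch of that idea, using only the Python standard library. The `process` function is a placeholder for real work, and the thread pool is an assumption made for brevity:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count():
    """Number of vCPUs visible to this process (falls back to 1)."""
    return os.cpu_count() or 1


def process(task):
    # Placeholder for real work; squares a number so the result is checkable.
    return task * task


def run_all(tasks):
    # One worker per vCPU, whatever instance size we happen to land on.
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        return list(pool.map(process, tasks))


results = run_all(range(8))
print(worker_count(), "workers ->", results)
```

For CPU-bound Python code a process pool is usually the better fit than threads; the sizing logic stays the same.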
CPU and cores
• What kind of performance does your application require?
• How many cores does your application need?
Memory/core
• How much memory per core does your application need?
Networking
• Does your application need high, moderate, or low network bandwidth?
Disk
• How much local disk does your application need?
Use Normalized pools of Compute
You only pay what the Market price is
But, bid what you are willing to pay
You are charged the Spot price in effect at the start of each instance-hour
You are billed for it at the end of the hour
If AWS interrupts your instance, you don’t pay for the interrupted partial hour
Bid only what you are willing to pay.
(by default, bid limited to 10 * On Demand Price)
What about Bidding Strategy?
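A small sketch of the billing rules on this slide, assuming the hourly model as described in this 2015 webinar: each hour is charged at the market price, never at the bid, and an hour in which AWS interrupts the instance is free. The function name and the prices are made up for illustration.

```python
def spot_charge(start_of_hour_prices, interrupted_by_aws):
    """Total charge for a run under the hourly model on this slide.

    `start_of_hour_prices` holds the market price in effect at the start of
    each instance-hour that was begun. If AWS interrupted the instance during
    the last of those hours, that hour is not charged.
    """
    billable = start_of_hour_prices[:-1] if interrupted_by_aws else start_of_hour_prices
    return sum(billable)


bid = 0.50                   # the most we are willing to pay per hour
prices = [0.10, 0.12, 0.15]  # made-up market prices, all below the bid

print(f"ran to completion: ${spot_charge(prices, False):.2f}")
print(f"interrupted in hour 3: ${spot_charge(prices, True):.2f}")
print(f"same hours at the bid price: ${bid * len(prices):.2f}")
```

The last line is only a comparison point: the bid caps what you might pay, it is not what you are charged.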
AWS Spot Labs
• https://github.com/awslabs/aws-spot-labs
Helps to find capacity pools (defined as instance type and AZ) with lower price volatility by ordering these pools based on duration of time since the Spot price last exceeded the bid price. It uses AWS CLI to programmatically obtain Spot price history data.
Finding the best pools of Compute Capacity
python get_spot_duration.py \
  --region us-east-1 \
  --product-description 'Linux/UNIX' \
  --bids c3.xlarge:0.105,c3.2xlarge:0.21,c3.4xlarge:0.42,c3.8xlarge:0.84,c4.xlarge:0.110,c4.2xlarge:0.220,c4.4xlarge:0.440,c4.8xlarge:0.880,cc2.8xlarge:1.000,c1.xlarge:0.26 \
  --hours 168
Note:
• Price as of 8/15/2015
• AZ mappings may differ
• 168 hours = 1 week
• In this example, bidding the on-demand price
Using the Spot Tools Lab
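The ordering idea behind the tool can be sketched in a few lines. This is a simplified illustration, not the tool's actual code: it assumes each pool's history is a list of (timestamp, price) change points, that a price holds until the next point, and that all names and sample values below are hypothetical.

```python
from datetime import datetime, timedelta


def hours_since_bid_exceeded(history, bid, now):
    """Hours since the pool's price was last above `bid`.

    Each (timestamp, price) point holds until the next point. If the price
    never exceeded the bid, the full observed window is returned.
    """
    points = sorted(history)
    last_above = None
    for i, (ts, price) in enumerate(points):
        if price > bid:
            # The price stayed above the bid until the next change (or until now).
            last_above = points[i + 1][0] if i + 1 < len(points) else now
    if last_above is None:
        last_above = points[0][0]
    return (now - last_above).total_seconds() / 3600


def rank_pools(pools, bids, now):
    """Return (pool, hours) pairs, most stable pool first."""
    ranked = [
        (pool, hours_since_bid_exceeded(history, bids[pool[0]], now))
        for pool, history in pools.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


now = datetime(2015, 8, 15)
pools = {  # made-up histories
    ("c3.xlarge", "us-east-1a"): [
        (now - timedelta(hours=100), 0.05),
        (now - timedelta(hours=30), 0.30),
        (now - timedelta(hours=10), 0.04),
    ],
    ("c3.xlarge", "us-east-1b"): [
        (now - timedelta(hours=168), 0.04),
        (now - timedelta(hours=5), 0.05),
    ],
}
ranked = rank_pools(pools, {"c3.xlarge": 0.105}, now)
for pool, hours in ranked:
    print(pool, f"{hours:.0f} h since the bid was last exceeded")
```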
• Build stateless, distributed, scalable applications
• Choose which instance types fit your workload the best
• Ingest price feed data for AZs and regions
• Make run time decisions on which Spot pools to launch in based on price and volatility
• Manage interruptions
• Monitor and manage market prices across AZs and instance types
• Manage the capacity footprint in the fleet
• And all of this while you don’t know where the capacity is
• Serve your customers
Helping with the undifferentiated heavy lifting
UNDIFFERENTIATED HEAVY LIFTING
Instead of writing all that code to manage Spot Instances, simply specify:
• Target Capacity - The number of EC2 instances that you want in your fleet.
• Maximum Bid Price - The maximum bid price that you are willing to pay.
• Launch Specifications - Number and types of instances, AMI ID, VPC, subnets or AZs, etc.
• IAM Fleet Role - The name of an IAM role. It must allow EC2 to launch and terminate instances on your behalf.
Introducing Spot Fleet
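The four settings map onto a small request document. Below is a sketch that assembles one in Python, using the key names from the sample fleet file shown later in this deck (`TargetCapacity`, `SpotPrice`, `IamFleetRole`, `LaunchSpecifications`). The helper name, account ID, and AMI ID are placeholders.

```python
import json


def build_fleet_config(target_capacity, max_bid, fleet_role_arn, ami_id, instance_types):
    """Assemble a Spot Fleet request config as a plain dict."""
    return {
        "TargetCapacity": target_capacity,
        "SpotPrice": max_bid,
        "IamFleetRole": fleet_role_arn,
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": t} for t in instance_types
        ],
    }


config = build_fleet_config(
    target_capacity=5,
    max_bid="1.00",
    fleet_role_arn="arn:aws:iam::123456789012:role/fleetRole",  # placeholder account ID
    ami_id="ami-00000000",                                       # placeholder AMI
    instance_types=["m1.small", "m1.medium", "m1.large"],
)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and passed to `aws ec2 request-spot-fleet --spot-fleet-request-config file://...`.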
EC2 Spot - Use Cases
Stateless Web/App Server Fleets
Hadoop Workloads
Continuous Integration (CI)
High Performance Computing (HPC)
Grid Computing
Media Rendering / Transcoding
Spot Use Cases
EC2 Spot - Web Architecture
Considerations
High availability
Cost
Elasticity
Stateless Web tier
Parallelism
Stateless Web/App/API Architecture with Spot
[Diagram: Elastic Load Balancing in front of stateless web servers in an On-Demand Auto Scaling group and stateless web servers (Spot) in two Spot Auto Scaling groups, spread across Availability Zone A and Availability Zone B, with a separate session state data store]
Web Application - Auto Scaling
Multiple Auto Scaling groups
• On-Demand instances for fallback
• Multiple EC2 Spot instance Auto Scaling groups
• Each Spot Auto Scaling group using a different capacity pool (e.g. AZ, bid, instance size, instance type)
Auto Scaling groups behind the same Elastic Load Balancer.
Pick the right instance type for the job based on the price history.
Auto Scaling Policies
Aggressive scaling policies for Spot Auto Scaling groups (e.g. scale up at 75% CPU utilization and scale down at 25% CPU utilization, with a large capacity range)
More conservative scaling policies for On-Demand Auto Scaling groups.
Session state for the web application can be stored in DynamoDB.
• Data replicated across availability zones.
You can also choose other databases to maintain state in your architecture.
• Amazon RDS using Multi-AZ deployments
• Amazon ElastiCache
Where to store the state?
Spot termination considerations
Availability of Spot instances can vary based on supply and demand
Architect application to be resilient to instance termination
When the Spot price exceeds the price you named (i.e. the bid price), the instance will receive a two-minute warning that the instance will be terminated
Spot termination considerations
Check for the 2-minute Spot instance termination notification every 5 seconds, using a script invoked at instance launch. Upon notification:
• Place any session information into DynamoDB
• Use IAM roles so that the Spot instances can de-register themselves from the ELB upon termination notification
Since the Auto Scaling groups span across multiple availability zones, we highly recommend enabling cross-zone load balancing for the load balancer.
To allow in-flight requests to complete when de-registering Spot instances that are about to be terminated, connection draining can be enabled on the load balancer with a timeout of 90 seconds.
Elastic Load Balancing
Sample script
#!/bin/bash
while true
do
  # The metadata endpoint returns a timestamp only once termination is scheduled.
  if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | \
      grep -q '.*T.*Z'; then
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name my-load-balancer \
      --instances "$instance_id"
    /env/bin/flushsessiontoDBonterminationscript.sh
    # Cleanup has run once; stop polling.
    break
  else
    # Spot instance not yet marked for termination.
    sleep 5
  fi
done
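The same polling pattern can be written in Python. This is a sketch rather than a drop-in replacement: it assumes the metadata endpoint shown in the sample script is reachable without a session token, and the function names and the `on_notice` callback are illustrative.

```python
import re
import time
import urllib.error
import urllib.request

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def is_termination_notice(body):
    """True when the metadata response body is a UTC timestamp."""
    return bool(TIMESTAMP.match(body.strip()))


def fetch_termination_time(timeout=2):
    """Return the response body, or '' when no notice is posted or the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(TERMINATION_URL, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        return ""


def wait_for_notice(on_notice, poll_seconds=5, fetch=fetch_termination_time):
    """Poll until a notice appears, then run the caller's cleanup once."""
    while not is_termination_notice(fetch()):
        time.sleep(poll_seconds)
    on_notice()


print(is_termination_notice("2015-08-19T12:00:00Z"))
print(is_termination_notice("<html>404 - Not Found</html>"))
```

On an instance, `wait_for_notice` would be called with a callback that flushes session data and de-registers the instance from the load balancer; the `fetch` parameter exists so the loop can be exercised without the metadata service.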
Web Application Architecture with Spot
Studyplus Case Study
Batch Processing with Amazon EC2 Spot
Batch oriented applications can leverage on-demand processing using EC2 Spot to save up to 90% cost:
• Claims processing
• Large scale transformation
• Media processing
• Multi-part data processing work
You can also leverage EMR with spot instances.
Batch Processing with Amazon EC2 Spot
• Multi-part job processing architecture
• Auto Scaling groups to set up a heterogeneous, scalable “grid” of EC2 Spot instances with multiple capacity pools as worker nodes
• Use S3 to invoke AWS Lambda upon object upload
• Use SQS for decoupling
• DynamoDB for tracking job status
• Complete large batch processing tasks in parallel
Batch Processing with Amazon EC2 Spot
About Lambda and SQS
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information.
Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service to decouple components.
Depending on the application’s needs, multiple SQS queues might be required for functions and priorities.
Batch Processing with Amazon EC2 Spot
[Architecture diagram: objects are uploaded into the input S3 bucket. Uploads to S3 trigger a Lambda function that puts jobs in SQS and DynamoDB. The EC2 instance worker fleet (an On-Demand Auto Scaling group plus Spot Auto Scaling groups 1 and 2 across Availability Zones A and B, with EFS) checks the job SQS queue for jobs and updates job status (start time, SLA end time, etc.) in DynamoDB. Auto Scaling groups scale up based on queue depth and scale down based on CPU utilization CloudWatch metrics. Results go to the output S3 bucket.]
IAM Role for Lambda Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1438283855455",
      "Action": [
        "dynamodb:PutItem"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:us-east-1::table/demojobtable"
    },
    {
      "Sid": "Stmt1438283929844",
      "Action": [
        "sqs:SendMessage"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:us-east-1::demojobqueue"
    }
  ]
}
AWS Lambda function for SQS and DynamoDB updates
// dependencies
var AWS = require('aws-sdk');

// get reference to clients
var s3 = new AWS.S3();
var sqs = new AWS.SQS();
var dynamodb = new AWS.DynamoDB();

console.log('Loading function');

exports.handler = function(event, context) {
  // Read options from the event.
  var srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // prepare SQS message
  var params = {
    MessageBody: 'object ' + srcKey + ' ',
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com//demojobqueue',
    DelaySeconds: 0
  };

  // send SQS message
  sqs.sendMessage(params, function (err, data) {
    if (err) {
      // an error occurred
      console.error('Unable to put object' + srcKey + ' into SQS queue due to an error: ' + err);
      context.fail(srcKey, 'Unable to send message to SQS');
    } else {
      // define DynamoDB table variables
      var tableName = "demojobtable";
      var datetime = new Date().getTime().toString();

      // Put item into DynamoDB table where srcKey is the hash key and datetime is the range key
      dynamodb.putItem({
        "TableName": tableName,
        "Item": {
          "srcKey": {"S": srcKey},
          "datetime": {"S": datetime}
        }
      }, function(err, data) {
        if (err) {
          console.error('Unable to put object' + srcKey + ' into DynamoDB table due to an error: ' + err);
          context.fail(srcKey, 'Unable to put data to DynamoDB Table');
        } else {
          console.log('Successfully put object' + srcKey + ' into SQS queue and DynamoDB');
          context.succeed(srcKey, 'Data put into SQS and DynamoDB');
        }
      });
    }
  });
};
Batch Processing with Amazon EC2 Spot
• Worker nodes get job parts from the SQS and perform single tasks based on the job task state in DynamoDB
• Store the input objects in a file system such as Amazon Elastic File System (Amazon EFS), local instance store or Amazon Elastic Block Store (EBS)
• Each job can be further split into multiple sub-parts if there is a mechanism to stitch the outputs together
• Once completed, the objects will be uploaded back to S3 using multi-part upload.
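The worker flow in the bullets above can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: a `queue.Queue` plays the role of the SQS job queue, one dict plays the DynamoDB job table, another dict plays the output S3 bucket, and `process` is a placeholder for the real task.

```python
import queue


def run_worker(jobs, status, process, results):
    """Pull job keys until the queue is empty, tracking state per job."""
    while True:
        try:
            key = jobs.get_nowait()
        except queue.Empty:
            return
        status[key] = "IN_PROGRESS"
        try:
            results[key] = process(key)
            status[key] = "DONE"
        except Exception as exc:
            # Leave a trace so a retry or another worker can pick the job up.
            status[key] = f"FAILED: {exc}"


jobs = queue.Queue()  # stand-in for the SQS job queue
status = {}           # stand-in for the DynamoDB job table
results = {}          # stand-in for the output S3 bucket
for key in ("part-1", "part-2", "part-3"):
    jobs.put(key)
    status[key] = "QUEUED"

run_worker(jobs, status, process=str.upper, results=results)
print(status)
print(results)
```

Because state lives outside the worker, a Spot interruption loses at most the part in flight; with a real queue, an unacknowledged message would become visible again for another worker.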
Batch Processing with Amazon EC2 Spot
More automation?
Use a Lambda function to dynamically manage Auto Scaling groups based on the Spot market
• The Lambda function could periodically invoke the EC2 Spot APIs to assess market prices and availability and respond by creating new Auto Scaling launch configurations and groups automatically.
• This function could also delete any Spot Auto Scaling groups and launch configurations that have no instances.
AWS Data Pipeline can be used to invoke the Lambda function using the AWS CLI at regular intervals by scheduling pipelines
Automated Batch Architecture with Spot
[Architecture diagram: objects are uploaded into the input S3 bucket. Uploads to S3 trigger a Lambda function that puts jobs in DynamoDB and SQS. Workers in an On-Demand Auto Scaling group and Spot workers in Spot Auto Scaling groups 1 and 2, across Availability Zones A and B and sharing EFS, check the job SQS queue for jobs and update job status (start time, SLA end time, etc.) in DynamoDB. Auto Scaling groups scale up based on queue depth and scale down based on CPU utilization CloudWatch metrics. Data Pipeline invokes a Lambda function on a schedule to manage the Auto Scaling groups based on the Spot market. Results go to the output S3 bucket.]
Further cost optimization with Trusted Advisor
Save money on AWS by eliminating unused and idle resources
Cost Optimization TA Checks:
• Amazon EC2 Reserved Instances Optimization
• Low Utilization Amazon EC2 Instances
• Idle Load Balancers
• Underutilized Amazon EBS Volumes
• Unassociated Elastic IP Addresses
• Amazon RDS Idle DB Instances
AWS re:Invent 2015 – October 6-9
AWS re:Invent is the largest annual gathering of the global cloud community. Whether you are an existing customer or new to the cloud, AWS re:Invent will provide you with the knowledge and skills to refine your cloud strategy, improve developer productivity, increase application performance and security, and reduce infrastructure costs.
Though AWS re:Invent tickets are sold out, you can still register to view the Live Stream Broadcasts of the keynote addresses and select technical sessions on October 7 and October 8. Register now.
Details:
Wednesday, October 7
9:00am - 10:30am PT: Andrew Jassy, Sr. Vice President, AWS
11:00am - 5:15pm PT: 5 of the most popular breakout sessions (to be announced)
Thursday, October 8
9:00am - 10:30am PT: Dr. Werner Vogels, CTO, Amazon
11:00am - 6:15pm PT: 6 of the most popular breakout sessions (to be announced)
Register now for the Live Stream Broadcast by submitting your email where prompted on the AWS re:Invent home page.
Stay Connected: Follow event activities on Twitter @awsreinvent (#reinvent), or like us on Facebook.
Thank you!
Questions?
What have customers done with Spot?
Some case studies..
EBS
Submit jobs, orchestrate HPC clusters over VPC
Run 1 Million drive head designs = 70.75 core-years
90x throughput: Ran in 8 hours, not 30 days
3 days from idea to running
70,908 cores, 729 TFLOPS
c3, r3 with Intel E5-2670 v2
Cost: $5,594
Spot Instances
New Drive Head Design Workloads
World’s Largest F500 Cloud Run
Transforming drive design to store the world’s data
Encrypt, route data to AWS, return results
Cluster: 70,908 cores with Spot Instances
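The figures on this slide can be cross-checked with a little arithmetic. This sketch uses only the numbers quoted above and assumes a 365-day year and perfect scaling across all cores:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

core_years = 70.75
cores = 70_908
cost_usd = 5_594

core_hours = core_years * HOURS_PER_YEAR
wall_clock_hours = core_hours / cores
cost_per_core_hour = cost_usd / core_hours

print(f"{core_hours:,.0f} core-hours")
print(f"{wall_clock_hours:.1f} hours of wall-clock time at perfect scaling")
print(f"${cost_per_core_hour:.4f} per core-hour")
```

The wall-clock result lands between 8 and 9 hours, in line with the "ran in 8 hours" claim, and the cost works out to roughly a cent per core-hour.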
AWS Delivered Unheard-of Processing
39 years of science
10,600 AWS Instances
Saved equivalent of $40M infrastructure
10 Million compounds screened
39 drug design years in 11 hours for a cost of… $4,232
3 promising compounds identified
Scaling Hadoop Jobs with Spot
http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/
Bloomreach launches 1,500 to 2,000 Amazon EMR clusters and runs 6,000 Hadoop jobs every day.
Continuous Integration & Testing with Spot
• Tapjoy - Premier Mobile Ad Network Across iOS & Android
• Global Network (435 Million Monthly Reach)
• Jenkins + Spot Instances
• https://github.com/bwall/ec2-plugin (thanks to an RIT senior project)
• Go wide during business hours, scale back in the evenings. Automatically kicks online at 06:00 ET
• Workers scale horizontally to support dozens of simultaneous regression tests spread out over dozens of workers
• Jenkins automatically guards against Spot termination
Ooyala
• Video technology platform that serves ESPN, Bloomberg, ...
• Uses combo of OD/RI/Spot to ensure it can cover predicted volumes while keeping costs low
• http://aws.amazon.com/solutions/case-studies/ooyala/
Vevo
• Library of over 75,000 HD videos
• Must be able to rapidly transcode library to a new screen format
• Can spin up 100s of Spot instances to transcode entire library in a matter of days (instead of weeks)
Queue-based media transcoding
Using Spot Fleet
An example..
Using Spot Fleet
Create EC2 Spot Fleet IAM Role
Requesting a fleet:
• aws ec2 request-spot-fleet --spot-fleet-request-config file://mySpotFleet.json
Describe fleet:
• aws ec2 describe-spot-fleet-requests
• aws ec2 describe-spot-fleet-requests --spot-fleet-request-ids <sfr-………..>
Describe instances within the fleet:
• aws ec2 describe-spot-fleet-instances --spot-fleet-request-id <sfr-…………>
Cancel Spot Fleet (with termination):
• aws ec2 cancel-spot-fleet-requests --spot-fleet-request-ids <sfr-…………..> --terminate-instances
mySpotFleet.json
{
  "TargetCapacity": 5,
  "SpotPrice": "1.00",
  "IamFleetRole": "arn:aws:iam::962872214910:role/fleetRole",
  "LaunchSpecifications": [
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.small"
    },
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.medium"
    },
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.large"
    }
  ]
}