AWS Sticker Shock? How can containers and automation help?
Ed Lee, Mukulika Kapas
VOIP or Dial-in (see chat)
Questions? Hit the GTW chat or @applatix
May 2, 2023 2
• We have about 40 minutes of content with time for questions
• We will post and email a video and slides on Monday
• Post any questions on the GTW chat for us to answer
• If audio fails, let us know on chat! We will dial in again quickly!
• You might hear a train go by at the :24 minute mark, sorry!
But first, some quick housekeeping
Who are we?
Ed Lee, Founder & CTO
Mukulika Kapas, Product Director
• Financial – Cloud sprawl
• Operational – Unfamiliarity, steep learning curve, SLA breaches
• Business risks – Using cloud like on-prem and not meeting business agility
• Security – Open ports, no regular vulnerability assessment
Cloud disasters
Cloud management framework
Monitor
Analyze
Optimize
Govern
Across 3 dimensions: Account, Cost, Resource
Automation is key to analyze and optimize
• Manageability is key as cloud usage grows
• To bring “Order to Chaos”, you must
  o Gain visibility across all clouds, accounts, regions and services
  o Group (tag) resources to gain granular understanding of usage and costs
  o Gather and analyze real-time usage data
  o Track both operational and financial metrics
  o Analyze trends and investigate anomalies
Monitor & Analyze
Outline
• Cost and usage monitoring & analysis
• How can containers & automation help?
Cost and Usage Visibility & Analysis – Quick Wins
Standardize account hierarchy
AWS Main Account
LOB 1
Project 1
Dev
Prod
Project 2
LOB 2
Project 3
LOB 3
Create and maintain account hierarchy
1) Track resource usage and billing
2) Centrally manage access using groups, roles and policies
Best Practice Recommendations
AWS Main Account
AWS Dev Account
AWS Test Account
AWS Prod Account
Use AWS tags
Billing (project/owner)
Purpose (perf testing)
Expiration (2017/05/05)
1) Be consistent and disciplined in applying and using tags
2) Leverage automation to apply tags
3) Follow naming standards for concatenation
4) Use billing tags to generate granular billing and usage reports
Best Practice Recommendations
• Name – Used to identify individual resources
• Project/Owner – Useful for billing and point of contact for the resource
• Purpose – What is this resource being used for?
• Expiration – Date when this resource can be freed
• Cluster – Group resources used by distributed applications
• AllowedPorts – 80, 443
• Backup – daily
• Cost Allocation Tags must be activated to be reflected in billing data
• IAM policies can be conditioned on tags
Example tags
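The tagging discipline above lends itself to a simple automated check. The following is a minimal sketch, not an AWS API: it assumes tags have already been fetched into plain dictionaries, that Owner is kept as its own tag key, and that Expiration uses the YYYY/MM/DD format shown earlier. The resource IDs and tag values are made up.

```python
from datetime import date, datetime

# Hypothetical tagging policy, loosely based on the example tags above.
REQUIRED_TAGS = ("Name", "Owner", "Purpose", "Expiration")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty."""
    return [key for key in required if not tags.get(key)]


def is_expired(tags, today):
    """True if the Expiration tag (YYYY/MM/DD) is earlier than `today`."""
    raw = tags.get("Expiration")
    if not raw:
        return False  # no expiration tag; missing_tags reports that case
    return datetime.strptime(raw, "%Y/%m/%d").date() < today


# Made-up resources and tags, for illustration only.
resources = {
    "i-0001": {"Name": "perf-runner", "Owner": "ed",
               "Purpose": "perf testing", "Expiration": "2017/05/05"},
    "i-0002": {"Name": "web-1", "Purpose": "frontend"},
}

today = date(2017, 6, 1)
for resource_id, tags in resources.items():
    print(resource_id, "missing:", missing_tags(tags),
          "expired:", is_expired(tags, today))
```

A check like this could run on a schedule and notify the owner, or feed the automated clean-up policies discussed later.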
Set up consolidated billing
• Enable AWS cost and usage reports
  o Choose hourly granularity & enable resource IDs
  o Activate desired cost allocation tags
Continuously monitor spending
• Analyze trends
• Investigate anomalies
• No substitute for talking to users
• Use AWS Cost Explorer – It’s free!
  o Provides useful information related to Reserved Instances
  o Does not provide hourly granularity
  o Does not break out enough items
  o Spending categorization is not very useful
• Third party applications/services
  o Provide more functionality, but $$
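If you write your own scripts, "investigate anomalies" can start very simply. The sketch below is one possible approach under stated assumptions, not a feature of Cost Explorer: it flags any day whose spend deviates from the trailing week's average by more than three standard deviations. The window, threshold and daily figures are all illustrative.

```python
from statistics import mean, stdev


def spending_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indexes of days whose cost deviates from the trailing
    window's mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            deviates = daily_costs[i] != mu
        else:
            deviates = abs(daily_costs[i] - mu) > threshold * sigma
        if deviates:
            flagged.append(i)
    return flagged


# Made-up daily spend in dollars; the spike is on day index 9.
costs = [100, 102, 98, 101, 99, 103, 100, 101, 99, 180, 102]
print(spending_anomalies(costs))
```

A flagged day is only a prompt to go and talk to the users who own that spend.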
Example spending analysis with Claudia
• Set up AWS CloudWatch
  o Track real-time resource usage metrics
  o Monitor custom metrics
  o Collect and monitor log files
  o Set alarms
  o Automate reaction to resource changes
• Free tier of basic monitoring
• Important for rightsizing resources
Set up resource monitoring (AWS CloudWatch)
Combine cost & resource usage metrics
• Monitor cost versus resource utilization
  o Correlate cost vs utilization
    o Billing => costs
    o CloudWatch => utilization
  o Why?
    o To look for underutilized expensive cost buckets
    o Optimize resource sizing/usage to reduce costs
• Use 3rd party tools or write your own automation scripts
• Other CloudWatch limitations
  o No application level monitoring & tracing
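A home-grown script for the correlation above could be as small as a join on the billing tag. This is a sketch under assumptions: monthly cost per tag is taken to come from the billing report and average CPU utilization per tag from CloudWatch, both already reduced to dictionaries. The bucket names, figures and thresholds are hypothetical.

```python
def underutilized_cost_buckets(costs, utilization, min_cost, max_util):
    """Join cost per bucket with average utilization per bucket and
    return the buckets that are both expensive and lightly used."""
    rows = []
    for bucket, cost in costs.items():
        util = utilization.get(bucket)
        if util is None:
            continue  # no utilization data collected for this bucket
        if cost >= min_cost and util <= max_util:
            rows.append((bucket, cost, util))
    # Most expensive first, so the biggest potential savings lead the list.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up monthly cost in dollars and average CPU utilization (0..1).
monthly_cost = {"project-a/prod": 12000, "project-a/dev": 4000,
                "project-b/test": 300}
avg_cpu = {"project-a/prod": 0.55, "project-a/dev": 0.08,
           "project-b/test": 0.05}

for bucket, cost, util in underutilized_cost_buckets(
        monthly_cost, avg_cpu, min_cost=1000, max_util=0.20):
    print(f"{bucket}: ${cost}/month at {util:.0%} average CPU")
```

Only buckets that are both costly and idle are reported; a cheap idle bucket is usually not worth the investigation time.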
• Convertible RIs are attractive but require a 3-year term
• Sweet spot in many cases is partial-upfront one-year RI
• Break even for most partial-upfront one-year RIs is 7 months
  o 14 months for three-year RIs
• Break even point is more important than term of contract
Reserved Instance (RI) planning
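The break-even point is simple arithmetic: the RI has paid for itself once the monthly savings over on-demand add up to the upfront fee. The sketch below shows that calculation. The prices are invented for illustration and are not AWS list prices; they were chosen so that the result lands on the 7-month figure quoted above.

```python
import math


def break_even_months(upfront, ri_monthly, on_demand_monthly):
    """Months until an RI's cumulative cost drops to or below on-demand.

    Returns None if the RI's recurring fee is not cheaper than on-demand,
    in which case the upfront payment is never recovered.
    """
    monthly_saving = on_demand_monthly - ri_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront / monthly_saving)


# Hypothetical instance: $100/month on demand, or a one-year
# partial-upfront RI at $350 upfront plus $50/month.
print(break_even_months(upfront=350, ri_monthly=50, on_demand_monthly=100))
```

If the instance is likely to be retired before the break-even month, the RI loses money no matter how long its term is.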
• Standard RIs are always bound to a particular instance family
• In the past, RIs had to be bound to a zone within a specific region
  o Provides a capacity reservation (i.e. you can always start the RI without delay)
  o Within the same region, the RI’s zone may be manually changed
  o Within the same instance family, the RI’s size may be manually changed
• More recently, RIs may be bound to a region rather than a zone
  o No capacity reservation (i.e. there may be a delay before starting an RI)
  o Automatically applies to instances in the same region regardless of zone (Sep 2016)
  o Automatically applies to instances of the same family, regardless of size (Mar 2017)
Important Reserved Instance details
Resource & cost usage optimizations
• Right-size your resources (instances, EBS volumes, etc.)
• Take advantage of new regional RI benefits
• Automate optimization of resources with policies
  o Power down unused resources, e.g. nights and weekends
  o Delete EBS volumes not attached to EC2 instances
  o Check for open ports
• Perform “what if analysis” to optimize use of RIs
  o Based on past three months of usage, would another RI have saved money?
  o Based on projected usage for the next three months, would another RI save money?
• Use spot instances instead of RIs whenever possible
  o 5-10x cheaper than on-demand, 2-3x cheaper than RIs
  o More flexible than RIs (no term contracts)
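The “what if analysis” above can be sketched as a replay of past usage under different RI counts. The code below is a simplified model under assumptions, not a billing calculator: it considers a single instance type, takes hourly counts of running instances as input, and spreads each RI's cost into an effective hourly rate that is paid whether or not the RI is used. Prices are in cents and are made up.

```python
def fleet_cost(hourly_instances, reserved, ri_hourly, on_demand_hourly):
    """Cost of replaying a usage history with `reserved` RIs.

    RIs are paid for every hour, used or not; instances beyond the
    reserved count are billed at the on-demand rate.
    """
    ri_cost = reserved * ri_hourly * len(hourly_instances)
    overflow_hours = sum(max(0, n - reserved) for n in hourly_instances)
    return ri_cost + overflow_hours * on_demand_hourly


def best_reserved_count(hourly_instances, ri_hourly, on_demand_hourly):
    """RI count that would have minimized cost for this usage history."""
    candidates = range(max(hourly_instances, default=0) + 1)
    return min(candidates, key=lambda k: fleet_cost(
        hourly_instances, k, ri_hourly, on_demand_hourly))


# One made-up day: 2 instances for 16 hours, 6 instances for 8 hours.
usage = [2] * 16 + [6] * 8
RI_CENTS, ON_DEMAND_CENTS = 6, 10  # hypothetical effective hourly rates

for k in range(4):
    print(k, "RIs ->", fleet_cost(usage, k, RI_CENTS, ON_DEMAND_CENTS), "cents")
print("best:", best_reserved_count(usage, RI_CENTS, ON_DEMAND_CENTS))
```

In this model another RI only pays off if that extra instance runs for more than ri_hourly / on_demand_hourly of the hours, which is why the always-on baseline is reserved and the daytime burst is not.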
How Can Containers & Automation Help?
• For many use cases, bulk of spending is for EC2 instances
• Containers enable higher compute efficiency and density
• With automation, containers enable
  o On-demand computing
  o Auto-scaling
    o Power off unused resources
    o Burst large jobs
  o Effective use of spot instances
• If you are not continuously scaling your cloud infrastructure to match demand, you are not getting the full benefits of cloud
Containers + Automation => additional 2-5x improvement in efficiency
The next level of agility and efficiency
• Many enterprises start with ‘lift and shift’ to move to the cloud
• Result: 1 AWS instance per VM results in low utilization
• Low utilization => high costs
• Right sizing becomes important (time consuming, depends on historical load)
• Typical tools for managing VMs/instances: Chef/Puppet
• Example web app: Apache, Java, MySQL
Lift and shift leads to low utilization
[Diagram: ‘lift and shift’ of an Apache 2.x / Java 8.x / MySQL 5.x stack from On-Premises (multiple VMs, flexible capacity) to AWS (multiple instances, fixed capacity)]
A public cloud instance is not a VM!
• Public cloud instance is more like a server than a VM
• Lift and shift ➜ wasted compute resources
• How do Google and Facebook get 80% utilization? Containers!
• Containers are an ideal virtualization technology for the public cloud
[Diagram: On-Premises VMs, utilization 30-40%; Public Cloud instances, utilization 10-20%; containers]
Containers increase agility & efficiency
• Containerize and deploy apps and application stacks
• Greater utilization, decreased cost; new technologies, learning curve
[Diagram: Apache 2.x, Java 8.x and MySQL 5.x move from On-Premises (multiple VMs, flexible capacity; utilization 30-40%) into containers on a cluster of AWS instances in your VPC (utilization >60%)]
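Why containers raise utilization can be illustrated with a toy packing exercise. The sketch below uses first-fit decreasing, a standard bin-packing heuristic chosen here for illustration; it is not a description of how any particular container scheduler works. CPU requests and instance capacity are in millicores and are invented.

```python
def first_fit_decreasing(requests, capacity):
    """Pack container CPU requests onto instances of equal capacity.

    Requests are placed largest first, each on the first instance with
    enough room. Returns a list of instances (lists of requests).
    """
    instances = []
    for request in sorted(requests, reverse=True):
        if request > capacity:
            raise ValueError(f"request {request} exceeds capacity {capacity}")
        for instance in instances:
            if sum(instance) + request <= capacity:
                instance.append(request)
                break
        else:
            instances.append([request])  # nothing had room; add an instance
    return instances


# Eight made-up services and a hypothetical 4-vCPU instance type.
cpu_requests = [500, 300, 1200, 700, 400, 900, 600, 200]
CAPACITY = 4000

packed = first_fit_decreasing(cpu_requests, CAPACITY)
lift_and_shift = len(cpu_requests)  # one instance per service
total = sum(cpu_requests)
print("lift and shift:", lift_and_shift, "instances,",
      f"{total / (lift_and_shift * CAPACITY):.0%} utilized")
print("containers:", len(packed), "instances,",
      f"{total / (len(packed) * CAPACITY):.0%} utilized")
```

A real orchestrator also has to account for memory, availability and bursts, so treat the resulting numbers as a direction rather than a forecast.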
Container orchestration is required
• Managing containerized apps at scale requires orchestration
• Requires a high level of automation to use effectively
• Deploy containers and application stacks
• Drive orchestration with code (e.g. YAML)
• Result: High utilization, low cost, application-level visibility, infrastructure-as-code
• Eliminate configuration management tools (Chef/Puppet, etc.)
• Spot instances are 5-10x cheaper than on-demand (2-3x cheaper than RI)
  o Bid for unused EC2 capacity
  o Hourly price is set by AWS based on supply and demand
  o AWS may terminate spot instances with lower bids at any time
  o Applications must be able to tolerate restart after termination
• Good Spot Instance use cases
  o Batch processing
  o Any application that can be quickly and reliably restarted
  o By leveraging containers and automation, spot instances are suitable for most applications
• Effective use of spot instances requires careful orchestration of spot instances and on-demand instances
Automatically use spot-instances w/ orchestration
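The core decision an orchestrator makes when mixing spot and on-demand capacity can be sketched in a few lines. This is an illustrative policy under assumptions, not the behavior of any specific product: keep a fixed floor of on-demand instances, fill the rest from spot, and fall back to on-demand for whatever spot cannot currently supply. The function and parameter names are hypothetical.

```python
def plan_capacity(desired, on_demand_floor, spot_available):
    """Split `desired` instances between on-demand and spot.

    At least `on_demand_floor` instances stay on demand so the service
    survives a spot termination; any spot shortfall is made up with
    extra on-demand instances.
    """
    on_demand = min(desired, on_demand_floor)
    spot = min(desired - on_demand, spot_available)
    on_demand += desired - on_demand - spot  # fallback for the shortfall
    return {"on_demand": on_demand, "spot": spot}


# Re-plan whenever demand or spot availability changes.
print(plan_capacity(desired=10, on_demand_floor=2, spot_available=20))
print(plan_capacity(desired=10, on_demand_floor=2, spot_available=5))
print(plan_capacity(desired=10, on_demand_floor=2, spot_available=0))
```

Running the plan again after each spot termination is what lets restartable, containerized workloads ride on spot capacity without manual intervention.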
Key Takeaways
• Proper account management is critical to cloud management
• Enable consolidated billing and reporting
• Be consistent and disciplined in tagging resources
• Correlate billing with resource utilization data
• Automate cost and resource utilization mapping
• Take advantage of new regional RI benefits
• Start investigating how to use containers and automation to improve agility and resource efficiency
Follow up
• For more resources see http://applatix.com
• Feedback? Questions? [email protected] or @applatix
• Our next Webinar: Using Kubernetes on AWS, April 13
Thank you