Nuts and bolts of running a popular site in the AWS cloud

Host a hit site in the cloud without downtime or going broke David Veksler


Post on 14-Apr-2017


TRANSCRIPT

Page 1: Nuts and bolts of running a popular site in the aws cloud

Host a hit site in the cloud without downtime or going broke

David Veksler

Page 2: Nuts and bolts of running a popular site in the aws cloud

Nuts and bolts of running a popular site in the AWS cloud

• I will share how we develop and host a popular publishing platform in the cloud with a limited budget and technology team.
• We'll cover architecture, including a variety of services at Amazon Web Services such as Elastic Load Balancing, S3, Elastic Beanstalk, and RDS, in the context of a real site.
• We'll cover how we control costs with Spot and burstable instances and scale up with distributed caching.
• Finally, we'll discuss continuous deployment strategies for Windows and Linux-based cloud applications in the context of a distributed team using an agile process.

Page 3: Nuts and bolts of running a popular site in the aws cloud

Contents

1. Cloud Architecture
2. Key AWS Services
3. Keeping costs under control
4. Configuration management
5. Key tools for distributed agile development

Page 4: Nuts and bolts of running a popular site in the aws cloud

Architecture Overview

Page 5: Nuts and bolts of running a popular site in the aws cloud

Architecture Diagram

Amazon Web Services Cloud (Northern Virginia AZ):

• Users
• Cloudflare: DNS, CDN, and firewall services
• Elastic Load Balancing: lb.fee.org
• Spot Instance Fleet: Web1.fee.org and Web2.fee.org (EC2 VMs, C4.2xlarge). web#.FEE.org instances use spot pricing to bid for the best price
• FEE.org Admin Node: Admin.fee.org / Fee-dev.org (EC2 VM, C4.2xlarge). admin.fee.org hosts both live and dev, and acts as staging for deployments
• TeamCity CI: Fee-dev.org:8080
• FEE-DB security group (RDS): LIVE DB feedb2 and DEV DB fee-dev2
• Redis Cache: cache cluster fee-cache-001, fee-cache-002
• Media Storage: fee-media (US-Standard Region)
• Backups: fee-misc (US-Standard Region)
• SES: internal email

Other Services:

• Domain: Google Domains
• Performance: New Relic Pro
• Analytics & content recommendations: Parse.ly, Clicky, Google Analytics
• Uptime: Pingdom
• Marketing email: MailChimp
• Code: BitBucket

Page 6: Nuts and bolts of running a popular site in the aws cloud

High-level objectives (by priority)

1. Front end uptime should be 99.8%
2. Back Office (admin) uptime should be 95%
3. Keep personal information (payments, admin access) secure
4. Stay up during traffic surges up to 6X weekly peak
5. Keep budget under $1,600/month
6. Ongoing development should not impact uptime
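To make the uptime targets concrete, the sketch below converts an uptime percentage into the downtime it allows. It assumes a 30-day month as the measurement period, which the slides do not specify; the function name is illustrative only.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float = 30 * 24) -> float:
    """Hours of downtime permitted in one period for a given uptime target.

    The 30-day default period is an assumption, not something the slides state.
    """
    return period_hours * (1 - uptime_percent / 100)


front_end = allowed_downtime_hours(99.8)    # front end target
back_office = allowed_downtime_hours(95.0)  # back office (admin) target

print(f"Front end:   {front_end:.2f} hours/month")
print(f"Back office: {back_office:.1f} hours/month")
```

Under that assumption, the front end target leaves roughly an hour and a half of downtime per month, while the back office target leaves about a day and a half.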

Page 7: Nuts and bolts of running a popular site in the aws cloud

Design strategy

1. All components should be redundant and self-healing
2. Pay for normal load while supporting surges
3. Outsource infrastructure: let AWS cloud be responsible for as much infrastructure as feasible
4. Automate all backup processes
5. Semi-automated disaster recovery: site should recover from most outages automatically, when cost of doing so is reasonable
6. Change management integrated into architecture via imaging and cache keys
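The slides do not say how cache keys are tied to change management. One common reading, sketched here purely as an assumption, is to namespace every cache key with a release identifier so that a new release never reads entries written by the previous one. The names `versioned_key` and `get_or_render`, the `r41`/`r42` release labels, and the plain dict standing in for Redis are all hypothetical.

```python
from typing import Callable, Dict


def versioned_key(release: str, kind: str, item_id: int) -> str:
    """Build a cache key namespaced by release, e.g. 'r42:article:7'."""
    return f"{release}:{kind}:{item_id}"


def get_or_render(cache: Dict[str, str], release: str, item_id: int,
                  render: Callable[[int], str]) -> str:
    """Return the cached value for this release, rendering it on a miss."""
    key = versioned_key(release, "article", item_id)
    if key not in cache:
        cache[key] = render(item_id)
    return cache[key]


cache: Dict[str, str] = {}  # stand-in for a Redis client (assumption)
old = get_or_render(cache, "r41", 7, lambda i: f"old markup {i}")
new = get_or_render(cache, "r42", 7, lambda i: f"new markup {i}")
```

With this scheme a deployment only has to bump the release label; entries from the old release are simply never read again and can expire on their own.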

Page 8: Nuts and bolts of running a popular site in the aws cloud

Architecture Summary

• Front-end is load balanced, scalable, and self-healing
• Backend is isolated from front-end
• Automatic snapshots for servers, transaction logging for DB
• Rely on AWS services for all infrastructure services
• Combine functionality within servers to save costs
• Massively over-allocate capacity using market-based pricing
• Development process integrated with production architecture

Page 9: Nuts and bolts of running a popular site in the aws cloud

[Architecture diagram, repeated: users; Cloudflare (DNS, CDN, firewall services); Elastic Load Balancing (lb.fee.org); Spot Instance Fleet with Web1.fee.org and Web2.fee.org (EC2 VMs, C4.2xlarge); Admin.fee.org / Fee-dev.org (EC2 VM, C4.2xlarge); FEE-DB security group with LIVE DB feedb2 and DEV DB fee-dev2 (RDS); Redis cache cluster (fee-cache-001, fee-cache-002); fee-media media storage and fee-misc backups (US-Standard Region); SES internal email; all in the Northern Virginia AZ]

Page 10: Nuts and bolts of running a popular site in the aws cloud

Amazon Cloud Services Used

• Load balancing: Elastic Load Balancer
• Virtual machines: EC2 Spot Instances
• Databases: RDS (SQL Server)
• Media Storage & Backups: S3
• Distributed Cache: ElastiCache (Redis)
• CDN: CloudFlare (used in place of CloudFront)
• Email: Amazon SES

Page 11: Nuts and bolts of running a popular site in the aws cloud

Other Cloud Services

• Analytics: Parse.ly, Clicky, Google Analytics
• Performance: New Relic Pro
• Email: MailChimp (Campaigns & Automations)

Page 12: Nuts and bolts of running a popular site in the aws cloud

Selected Services in Detail

Page 13: Nuts and bolts of running a popular site in the aws cloud

Why CloudFlare is awesome

• Flat-rate CDN service (supports CDN daisy-chaining)
• Free, powerful SSL
• Active, crowd-sourced firewall
• Powerful DNS (CNAME flattening, much more)
• HTML and image minification
• Much more!
• Saves FEE.org thousands of dollars per year in bandwidth costs
• Starts at $20/month

Page 14: Nuts and bolts of running a popular site in the aws cloud

[Chart covering the last 30 days; image not included in the transcript]

Page 16: Nuts and bolts of running a popular site in the aws cloud

Elastic Load Balancer

• Point DNS at the CNAME of the load balancer
• Point the destination to specific VMs or use auto-scaling rules
• Set the destination by path pattern with Application Load Balancer
• Use TCP, HTTP, or SSL for the health check
• We use a custom health check endpoint which verifies application uptime & DB connectivity
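A custom health check of the kind described above can be sketched as follows. This is a minimal illustration, not FEE.org's actual endpoint (the site runs on .NET); the function name and the two probe callbacks are assumptions. The load balancer only needs an HTTP status: 200 keeps the instance in rotation, anything else marks it unhealthy.

```python
from typing import Callable, Tuple


def health_check(app_ready: Callable[[], bool],
                 db_ping: Callable[[], bool]) -> Tuple[int, str]:
    """Return an (HTTP status, body) pair for the load balancer to poll.

    Both probes are hypothetical callbacks: app_ready reports whether the
    application has finished starting, db_ping runs a trivial query.
    """
    checks = {"app": app_ready, "db": db_ping}
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a probe that raises counts as a failed check
        if not ok:
            failed.append(name)
    if failed:
        return 503, "FAIL: " + ",".join(failed)
    return 200, "OK"
```

Checking database connectivity inside the endpoint means an instance that is up but cannot reach the DB is pulled out of rotation instead of serving errors.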

Page 17: Nuts and bolts of running a popular site in the aws cloud

RDS: Relational Database Service

• FEE.org uses SQL Server Web
• Other sites use Aurora, which is 10X faster than MySQL
  • (With proper tuning, in specific scenarios)
• Use snapshots to create dev instances of the DB
• Schedule configuration changes for off-hours
• Be aware that RDS SQL Server restricts most admin actions. There are special stored procedures for some actions, such as renaming a DB or bringing a DB online (but not taking it offline!)
• Backup restore is not allowed: use the SQL Database Migration Wizard to restore a DB
• Use burstable SQL Server instances, especially for the dev DB

Page 18: Nuts and bolts of running a popular site in the aws cloud

S3: Media storage + backup

• FEE.org uses S3 as a media (Image/PDF/EPUB/MP4/MP3) store
• Only originals are stored in S3; thumbnails are stored on the server
• Amazon Web Services S3 IFileSystem provider for Umbraco + a custom caching layer
• XSLT transforms to specify production/dev buckets

Page 19: Nuts and bolts of running a popular site in the aws cloud

Spot Instances

• Instances only run when the market price is below the bid price
• In practical terms, Spot = 80% saving compared with hourly (on-demand) instances
• Supports auto scaling. Use it!
• Set the bid price equal to the hourly instance price and get 100% availability (so far)
• Specify a range of qualified instance types (including previous generations) to maximize the chance of availability
• FEE.org runs the master server as an xlarge hourly instance and the read-only nodes as 2xlarge Spot instances. This guarantees at least 1 cheap(er) instance even if prices spike or instances refresh at the same time
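The relationship between bid price, availability, and savings can be sketched with a small calculation. The price history and the on-demand price below are made-up numbers for illustration, not real AWS quotes, and the function names are hypothetical. The sketch assumes the Spot model the slides describe: an instance runs while the market price is at or below the bid, and it is billed at the market price rather than the bid.

```python
from typing import Sequence


def spot_availability(price_history: Sequence[float], bid: float) -> float:
    """Fraction of sampled hours in which the market price was at or below the bid."""
    if not price_history:
        raise ValueError("price_history must not be empty")
    running = [price <= bid for price in price_history]
    return sum(running) / len(running)


def average_saving(price_history: Sequence[float], bid: float,
                   on_demand_price: float) -> float:
    """Average saving versus on-demand over the hours the instance actually ran."""
    paid = [price for price in price_history if price <= bid]
    if not paid:
        return 0.0
    return 1 - (sum(paid) / len(paid)) / on_demand_price


# Hypothetical hourly market prices in dollars, including one short spike.
history = [0.08, 0.07, 0.09, 0.45, 0.08]
on_demand = 0.40  # hypothetical on-demand hourly price

availability = spot_availability(history, bid=on_demand)
saving = average_saving(history, bid=on_demand, on_demand_price=on_demand)
```

Bidding the on-demand price caps the worst-case hourly cost at what an hourly instance would have cost anyway, which is why a high bid gives up little in exchange for better availability.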

Page 20: Nuts and bolts of running a popular site in the aws cloud

Spot Pricing History

Page 21: Nuts and bolts of running a popular site in the aws cloud

Elastic Load Balancer

Page 22: Nuts and bolts of running a popular site in the aws cloud

Auto Scaling

Page 23: Nuts and bolts of running a popular site in the aws cloud

Example: Netflix• http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html

Red = # of servers; Green = CPU utilization

Page 24: Nuts and bolts of running a popular site in the aws cloud

Auto Scaling

Build cloud systems that scale automatically to meet current demand
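As a rough illustration of scaling to meet current demand, the sketch below resizes a fleet in proportion to how far average CPU is from a target. This is a simplified proportional rule chosen for illustration, not the algorithm AWS Auto Scaling uses; the function name and all parameters are assumptions.

```python
import math


def desired_instances(current: int, avg_cpu: float, target_cpu: float,
                      minimum: int, maximum: int) -> int:
    """Scale the fleet so average CPU moves toward the target, within bounds."""
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(minimum, min(maximum, wanted))


scale_out = desired_instances(4, avg_cpu=90, target_cpu=60, minimum=2, maximum=10)
scale_in = desired_instances(4, avg_cpu=15, target_cpu=60, minimum=2, maximum=10)
```

The minimum and maximum bounds matter as much as the rule itself: the minimum protects availability during quiet periods and the maximum protects the budget during surges.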

Page 25: Nuts and bolts of running a popular site in the aws cloud

When to auto-scale?

• Instances don't take very long to spin up
• Individual instances don't use too many resources
• Version release process is automated (such as with Elastic Beanstalk)
• Releases are infrequent, or the cost of snapshot management is minimal
• Large difference between minimum and peak traffic
• Unpredictable traffic trends

Page 26: Nuts and bolts of running a popular site in the aws cloud

Alternatives to auto-scaling

• Burstable instances
• Spot Instances
• Schedule on/off instance times with AutomatiCloud

Page 27: Nuts and bolts of running a popular site in the aws cloud

Why doesn't FEE.org auto-scale?

• Minimum instance count for high availability is 3
• Peak traffic (> 600 concurrent users) can be handled by 2 instances
• Each instance requires 16 GB RAM and 8 CPUs for optimal performance
• Release process is not fully automated & there are no full-time developers (we do not use Elastic Beanstalk & have to make manual snapshots post-release)
• Can spin up new instances within minutes with Spot + New Relic Alerts
• Will probably consider auto-scaling when we have more process maturity (fully automated release process)

Page 29: Nuts and bolts of running a popular site in the aws cloud

Elastic Beanstalk

Page 30: Nuts and bolts of running a popular site in the aws cloud

Elastic Beanstalk

• Upload DLLs to the AWS git repository, AWS does the rest
• AWS will deploy the code and handle load balancing, auto-scaling, health monitoring, etc.
• Environment configuration with web.config XSLT transforms and an ACL permissions (wpp.targets) file
• FREE service – only pay for resources used
• If using .NET, works with most 100% managed code projects
• GUI integrated with Visual Studio

Page 40: Nuts and bolts of running a popular site in the aws cloud

Cloud hosting on a budget

Page 41: Nuts and bolts of running a popular site in the aws cloud

Thinking about IAAS/SAAS Pricing Strategy

• Cloud services almost always cost much more per compute resource than colocation or dedicated hardware
• Cost savings come from matching infrastructure to demand and outsourcing management services
• Amazon & Azure are among the most costly cloud services per resource, but are recommended for most scenarios because of the productivity benefits from the breadth and depth of their managed services

Page 42: Nuts and bolts of running a popular site in the aws cloud

Cloud Services Pricing Summary

• Each cloud service provider has a unique bundle of services and pricing model. Different providers have unique price advantages for different products. Provider selection should be based on a typical application mix for our business.
• Azure may have a price advantage over Amazon when using cloud-optimized architecture based on Microsoft products.
• Softlayer, Digital Ocean, and Google Compute all have better prices than both for various scenarios, especially Windows VMs, but offer fewer services.
• Cost is just one of many criteria for choosing a provider! No provider has a decisive advantage for all scenarios.

Page 43: Nuts and bolts of running a popular site in the aws cloud

Pricing Recommendations

1. Use the pricing calculator offered by each provider to estimate total application cost for specific applications. Keep in mind cloud-optimized architectures may have a much lower cost. (For example, compute functions instantiated on-demand, auto-scaling, etc.)
2. Do not make pricing the primary consideration in provider selection unless the cost difference is critical to business requirements. In general, major service and quality differences between providers are more important than pricing considerations.
3. Developing deep expertise and service integration with a cloud provider is usually more important than cost differences for individual projects.

Page 44: Nuts and bolts of running a popular site in the aws cloud

Saving Money with AWS

• Reserved Instances
• Spot Instances
• Burstable Instances
• Scheduled Instances (using AWS or a third-party tool)
• This can be used with any AWS VM service – EC2, RDS, ElastiCache, etc!

Page 45: Nuts and bolts of running a popular site in the aws cloud

AWS Instance type selection criteria

• Use the latest generation of instance types (x4/t2)
• Use burstable instances for applications with high daily variability
• Evaluate whether applications are CPU, memory, or IO intensive and select the appropriate type – scale up your particular bottleneck
• For applications with consistent and predictable load, prefer larger instances; for applications with unpredictable load, auto-scale horizontally with more burstable instances

Page 46: Nuts and bolts of running a popular site in the aws cloud

Buying a reserved instance

• Unsure about your needs? Get a convertible instance! It can move up or across instance types.
• You can sell them! (I haven't tried this)
• Best savings/risk trade-off is usually with the partial upfront payment option.

Page 47: Nuts and bolts of running a popular site in the aws cloud

S3 Reduced Redundancy Store & Glacier

• "Only" duplicated across 2 facilities
• 0.01% expected annual storage failure rate ("400 times the durability of typical HDD")
• About 25% cheaper

Page 48: Nuts and bolts of running a popular site in the aws cloud

JPEGmini

• Runs as a background service, triggered by an event handler on the media-upload-completed method
• 412 GB × $0.0314 per GB per month × 12 months ≈ $155/year saved on storage alone
• Runs as an AWS Marketplace service ($39/month) or desktop app
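The storage saving quoted on this slide can be reproduced with a short calculation. The sketch assumes that $0.0314 is a per-GB monthly storage price; the slide does not say so, but that is the reading under which the quoted $155/year figure works out. The function name is illustrative.

```python
def yearly_storage_saving(gb_saved: float, price_per_gb_month: float) -> float:
    """Yearly saving from storing fewer gigabytes at a per-GB monthly price."""
    return gb_saved * price_per_gb_month * 12


saving = yearly_storage_saving(412, 0.0314)
print(f"${saving:.0f}/year saved on storage")  # prints "$155/year saved on storage"
```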

Page 49: Nuts and bolts of running a popular site in the aws cloud

Summary: FEE.org $ saving strategy:

• 2 reserved burstable RDS databases
• 1 reserved admin EC2 VM
• 2 Spot EC2 front-end server instances
• AutomatiCloud EC2 scheduling for off-hours (and backup automation)
• S3 Reduced Redundancy Store for non-critical backups and dev data
• CloudFlare CDN
• JPEGmini image optimization background service

Page 50: Nuts and bolts of running a popular site in the aws cloud

Continuous Deployment Strategies

Page 51: Nuts and bolts of running a popular site in the aws cloud

FEE Development Process

1. Post job on UpWork.com
2. Hire freelancer
3. Developer commits work to git
4. Deploy to dev environment
5. Test work
6. Create pull request for release
7. Release build
8. Staged deployment to production servers

Page 52: Nuts and bolts of running a popular site in the aws cloud

Development Process in Detail

Page 53: Nuts and bolts of running a popular site in the aws cloud

UpWork.com

Page 54: Nuts and bolts of running a popular site in the aws cloud

Orientation

• Google Doc with:
  • Architectural overview
  • FEE.org development process
  • Instructions to set up localhost environment
  • Review of tools used
  • Relevant people involved & their contact info
  • Address of FEE-Dev Skype group
  • Code quality expectations

Page 55: Nuts and bolts of running a popular site in the aws cloud

Development Environment Setup

1. Check out the git repository
2. "Just hit F5"

• NuGet for all dependencies
• XSLT for non-local environments
• Dev DB hosted in cloud
• Optional: Install Redis on localhost for better performance

Page 56: Nuts and bolts of running a popular site in the aws cloud

Continuous Integration

http://fee-dev.org:8080/

Login as guest now!

Page 57: Nuts and bolts of running a popular site in the aws cloud

Release Build

Page 58: Nuts and bolts of running a popular site in the aws cloud

Staged, Staggered Deployment

• xcopy to each production server
• ELB takes server out of production within 30 seconds
• Stagger release by ~5 minutes to let each application pool warm up
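The staggered rollout above can be sketched as a simple loop. This is an illustration under assumptions, not the actual FEE.org script: `deploy` is a hypothetical callback standing in for the xcopy step, and the `sleep` parameter exists only so the wait can be replaced in tests. The 300-second default mirrors the roughly five-minute stagger from the slide.

```python
import time
from typing import Callable, Iterable, List


def staggered_deploy(servers: Iterable[str],
                     deploy: Callable[[str], None],
                     stagger_seconds: float = 300,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Deploy to one server at a time, pausing between servers.

    The pause gives the freshly deployed server time to warm up and rejoin
    the load balancer before the next one is taken out of rotation.
    """
    targets = list(servers)
    done: List[str] = []
    for index, server in enumerate(targets):
        deploy(server)  # hypothetical stand-in for the xcopy step
        done.append(server)
        if index < len(targets) - 1:
            sleep(stagger_seconds)  # no wait needed after the last server
    return done
```

Because only one server is updated at a time, the remaining servers keep serving traffic for the whole release.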

Page 59: Nuts and bolts of running a popular site in the aws cloud

Environment Monitoring

Page 60: Nuts and bolts of running a popular site in the aws cloud

Collaboration & Internal Messaging

• SlackBot

Page 61: Nuts and bolts of running a popular site in the aws cloud

Project Management

Page 62: Nuts and bolts of running a popular site in the aws cloud

Aside: LAMP deployment strategy (highly available WordPress)

• Commit hooks on master branch in Bitbucket git repository
• Hooks call deploy.php script which runs a git pull in dev environment
• Release PHP code with git pull on production
• Image staging server (AMI), and deploy Spot fleet with AMI
• Use S3 Media storage provider, Redis cache – no persistent data on Spot instances
• EasyEngine for easy nginx configuration, etckeeper to backup/sync configuration files
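The commit-hook step above can be sketched as a small handler. The original uses a deploy.php script; this Python version is only an illustration of the same idea. The payload shape (`push.changes[].new.name`) is an assumption about what the hook sends, and the function names and the `run` callback are hypothetical.

```python
from typing import Any, Callable, Dict, List


def pushed_branches(payload: Dict[str, Any]) -> List[str]:
    """Extract the branch names touched by a push (assumed payload shape)."""
    names: List[str] = []
    for change in payload.get("push", {}).get("changes", []):
        new = change.get("new") or {}  # "new" may be null when a branch is deleted
        if new.get("type") == "branch" and "name" in new:
            names.append(new["name"])
    return names


def pull_command(repo_dir: str) -> List[str]:
    """Build the git command that fast-forwards the checkout in repo_dir."""
    return ["git", "-C", repo_dir, "pull", "--ff-only"]


def handle_push(payload: Dict[str, Any], repo_dir: str,
                run: Callable[[List[str]], None],
                branch: str = "master") -> bool:
    """Run a git pull only when the watched branch was pushed."""
    if branch not in pushed_branches(payload):
        return False
    run(pull_command(repo_dir))
    return True
```

In real use, `run` could be something like `lambda cmd: subprocess.run(cmd, check=True)`, with `repo_dir` pointing at the site's checkout.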

Page 64: Nuts and bolts of running a popular site in the aws cloud

The [email protected]

Page 65: Nuts and bolts of running a popular site in the aws cloud

@AtlCodeCamp

httpS://AtlantaCodeCamp.com/2016


Page 69: Nuts and bolts of running a popular site in the aws cloud

Surveys and Prizes

• Please complete the session and event surveys!
  • 1 ticket per session survey
  • 1 ticket for the event survey
  • 1 ticket for completing the booth game

• Drawing for prizes begins at 5pm in Q202