AWS Reference Architectures
TRANSCRIPT
WEB APPLICATION HOSTING

Highly available and scalable web hosting can be complex and expensive. Dense peak periods and wild swings in traffic patterns result in low utilization of expensive hardware. Amazon Web Services provides the reliable, scalable, secure, and high-performance infrastructure required for web applications, while enabling an elastic, scale-out and scale-down infrastructure to match IT costs in real time as customer traffic fluctuates.

Services: Amazon Route 53, Amazon S3, Amazon EC2, Elastic Load Balancing, Amazon CloudFront, Auto Scaling, Amazon RDS

[Diagram: Amazon Route 53 for DNS resolution; Amazon CloudFront as the content delivery network; Elastic Load Balancing in front of Auto Scaling groups of Amazon EC2 web servers and application servers in Availability Zones A and B; an Amazon RDS master with synchronous replication to a Multi-AZ standby; Amazon S3 for resources and static content]

System Overview
1 The user's DNS requests are served by Amazon Route 53, a highly available Domain Name System (DNS) service. Network traffic is routed to infrastructure running in Amazon Web Services.
2 Static, streaming, and dynamic content is delivered by Amazon CloudFront, a global network of edge locations. Requests are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
3 HTTP requests are first handled by Elastic Load Balancing, which automatically distributes incoming application traffic among multiple Amazon Elastic Compute Cloud (EC2) instances across Availability Zones (AZs). It enables even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic.
4 Web servers and application servers are deployed on Amazon EC2 instances. Most organizations will select an Amazon Machine Image (AMI) and then customize it to their needs. This custom AMI will then become the starting point for future web development.
5 Web servers and application servers are deployed in an Auto Scaling group. Auto Scaling automatically adjusts your capacity up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you're using increases seamlessly during demand spikes to maintain performance and decreases automatically during demand lulls to minimize costs.
6 To provide high availability, the relational database that contains the application's data is hosted redundantly on a Multi-AZ (multiple Availability Zones; zones A and B here) deployment of Amazon Relational Database Service (Amazon RDS).
7 Resources and static content used by the web application are stored on Amazon Simple Storage Service (S3), a highly durable storage infrastructure designed for mission-critical and primary data storage.
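The Auto Scaling behavior in step 5 is typically driven by an Amazon CloudWatch alarm attached to a scaling policy. Below is a minimal sketch of the request parameters; the group, policy, and alarm names are hypothetical, and with boto3 these dicts would be passed to `put_scaling_policy` and `put_metric_alarm`.

```python
# Illustrative request parameters for scaling a web tier on CPU load.
# Names like "web-asg" are hypothetical; with boto3 you would pass these to
# autoscaling.put_scaling_policy() and cloudwatch.put_metric_alarm().

scale_out_policy = {
    "AutoScalingGroupName": "web-asg",   # hypothetical group of web servers
    "PolicyName": "scale-out-on-cpu",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 2,              # add two EC2 instances per trigger
    "Cooldown": 300,                     # seconds to wait between actions
}

high_cpu_alarm = {
    "AlarmName": "web-asg-high-cpu",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 2,
    "Threshold": 70.0,                   # fire when average CPU exceeds 70%
    "ComparisonOperator": "GreaterThanThreshold",
}
```

A matching scale-in policy with a negative `ScalingAdjustment`, triggered by a low-CPU alarm, completes the scale-down half of the loop.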
ADVERTISEMENT SERVING

Internet advertising services need to serve targeted advertising and must do so within tight time limits. These are just two of the multiple technical challenges they face. Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available ad serving platforms in the cloud. In this document, we describe the two main parts of such a system: the ad serving infrastructure and the click-through collection infrastructure, featuring a data analysis cluster.

Services: Amazon EC2, Amazon EC2 Spot Instances, Amazon S3, AWS Import/Export, Amazon EMR, Amazon DynamoDB, Amazon CloudFront

[Diagram: visitors reach Auto Scaling ad servers on Amazon EC2 behind Elastic Load Balancing; the ad servers query a profiles database in Amazon DynamoDB and return links to ad resources served from a static files repository in Amazon S3 through Amazon CloudFront; click-through requests go to click-through servers whose impression and click-through log files land in Amazon S3, where an Amazon Elastic MapReduce cluster extended with Spot Instances processes them]

System Overview

1 When visitors load a web page, ad servers return a pointer to the ad resource to be displayed. These servers run on Amazon Elastic Compute Cloud (Amazon EC2) instances. They query a data set stored in an Amazon DynamoDB table to find relevant ads depending on the user's profile.
2 Ad files are downloaded from Amazon CloudFront, a content delivery service with low latency, high data-transfer speeds, and no commitments. Log information from displayed ads is stored on Amazon Simple Storage Service (Amazon S3), a highly available data store.
3 The click-through servers are a group of Amazon EC2 instances dedicated to collecting click-through data. This information is contained in the log files of the click-through web servers, which are periodically uploaded to Amazon S3.
4 Ad impression and click-through data are retrieved and processed by an Amazon Elastic MapReduce cluster using a hosted Hadoop framework to process the data in a parallel job flow. The cluster's capacity can be dynamically extended using Spot Instances to reduce the processing time and the cost of running the job flow.
5 Data processing results are pushed back into Amazon DynamoDB, a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. Amazon DynamoDB tables can store and retrieve any amount of data and serve any level of request traffic, both of which are specific requirements for storing and quickly retrieving visitors' profile information. The high availability and fast performance of Amazon DynamoDB enable ad server front ends to serve requests with predictable response times, even with high traffic volumes or large profile data sets.
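The profile-based lookup in step 1 can be sketched with a plain-Python stand-in for the DynamoDB table: a visitor ID maps to a profile whose interests are matched against an ad catalog. All names, URLs, and data below are hypothetical; in production the lookup would be a `get_item` call against the profiles table.

```python
# Hypothetical stand-in for the DynamoDB profile lookup in step 1.
# profiles maps visitor_id -> profile item; in production this would be
# a dynamodb get_item() call against the profiles table.

profiles = {
    "visitor-42": {"interests": {"cycling", "travel"}},
}

ad_catalog = [
    {"ad_id": "ad-1", "topic": "cycling", "url": "https://cdn.example/ad-1"},
    {"ad_id": "ad-2", "topic": "finance", "url": "https://cdn.example/ad-2"},
]

def relevant_ads(visitor_id):
    """Return pointers to ad resources matching the visitor's interests."""
    profile = profiles.get(visitor_id, {"interests": set()})
    return [ad["url"] for ad in ad_catalog if ad["topic"] in profile["interests"]]
```

The ad server returns only the pointer (the URL); the ad file itself is fetched by the browser from CloudFront, as step 2 describes.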
BATCH PROCESSING

Batch processing architectures are often synonymous with highly variable usage patterns that have significant usage peaks (e.g., month-end processing) followed by significant periods of underutilization. There are numerous approaches to building a batch processing architecture. This document outlines a basic batch processing architecture that supports job scheduling, job status inspection, uploading raw data, outputting job results, grid management, and reporting job performance data.

Batch processing on AWS allows for the on-demand provisioning of a multi-part job processing architecture that can be used for instantaneous or delayed deployment of a heterogeneous, scalable "grid" of worker nodes that can quickly crunch through large batch processing tasks in parallel. There are numerous batch-oriented applications in place today that can leverage this style of on-demand processing, including claims processing, large-scale transformation, media transcoding, and multi-part data processing work.

Services: Amazon EC2, Amazon RDS, Amazon SQS, Amazon S3, Auto Scaling, Amazon SimpleDB

[Diagram: an end user submits work to a Job Manager on Amazon EC2 behind an Elastic IP; the Job Manager stores job data in Amazon S3, inserts tasks into an Amazon SQS input queue, and records job info in an analytics store (an Amazon SimpleDB domain or an Amazon RDS master/slave pair); Auto Scaling worker nodes consume the input queue, store results in Amazon S3, and can optionally chain completed tasks into an output queue]

System Overview

1 Users interact with the Job Manager application, which is deployed on an Amazon Elastic Compute Cloud (EC2) instance. This component controls the process of accepting, scheduling, starting, managing, and completing batch jobs. It also provides access to the final results, job and worker statistics, and job progress information.
2 Raw job data is uploaded to Amazon Simple Storage Service (S3), a highly available and persistent data store.
3 Individual job tasks are inserted by the Job Manager into an Amazon Simple Queue Service (SQS) input queue on the user's behalf.
4 Worker nodes are Amazon EC2 instances deployed in an Auto Scaling group. This group is a container that ensures the health and scalability of worker nodes. Worker nodes pick up job parts from the input queue automatically and perform single tasks that are part of the list of batch processing steps.
5 Interim results from worker nodes are stored in Amazon S3.
6 Progress information and statistics are stored in the analytics store. This component can be either an Amazon SimpleDB domain or a relational database such as an Amazon Relational Database Service (RDS) instance.
7 Optionally, completed tasks can be inserted into an Amazon SQS queue for chaining to a second processing stage.
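The worker behavior in steps 4 and 5 can be sketched with an in-memory queue standing in for Amazon SQS; the task names and the transformation are hypothetical. In production, `poll()` would wrap `receive_message`, the acknowledgment comment would be a `delete_message` call, and results would be written to Amazon S3.

```python
# Minimal worker-node loop for steps 4 and 5, with an in-memory queue
# standing in for Amazon SQS and a list standing in for Amazon S3.

from collections import deque

input_queue = deque(["task-1", "task-2", "task-3"])   # hypothetical job parts
interim_results = []                                  # stands in for Amazon S3

def poll():
    """Fetch the next task, or None when the queue is drained."""
    return input_queue.popleft() if input_queue else None

def process(task):
    """Perform one batch processing step (placeholder transformation)."""
    return task.upper()

def run_worker():
    while (task := poll()) is not None:
        interim_results.append(process(task))   # step 5: store interim result
        # with real SQS: delete the message here so no other worker repeats it

run_worker()
```

Because each worker only ever talks to the queue, adding Auto Scaling capacity during a peak means simply launching more instances running this same loop.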
CONTENT & MEDIA SERVING

Serving digital content is one of the most basic and straightforward tasks—that is, until you have serious requirements for low latency, high availability, durability, access control, and millions of views on or under budget. In addition, because of "spiky" usage patterns, operations teams often need to provision static hardware, network, and management resources to support the maximum expected need, which guarantees waste outside of peak hours.

AWS provides a suite of services specifically tailored to deliver high-performance media serving. Each service features pay-as-you-go pricing on an elastic infrastructure, meaning that you can scale up and down according to your demand curve while paying for only the resources you use. Because this infrastructure is programmable, it can react quickly. Our advanced API provides detailed control over the infrastructure that powers your system.

Services: Amazon EC2, Amazon Route 53, Amazon S3, Amazon CloudFront

[Diagram: an end user requests http://yourdomain/LiveMovie and http://yourdomain/Content through Amazon Route 53 and Amazon CloudFront; CloudFront serves cached content and, on a cache miss, retrieves it from a custom origin: a live stream source feeding Adobe Flash Media Server on an Amazon EC2 instance, Amazon S3, Amazon EC2, or an on-premises private server]

System Overview
1 Simple and Secure: This reference architecture uses Amazon Simple Storage Service (S3) to host static content on the web. Amazon S3 is highly available, highly durable, and designed for web scale. It provides a great way to offload the work of serving static content from your web servers. You can also provide secure access to this content over HTTPS.
2 Faster and Edge Cached: As your customer base grows and becomes more geographically distributed, using a high-performance edge cache like Amazon CloudFront can provide substantial improvements in latency, fault tolerance, and cost. By using Amazon S3 as the origin server for the Amazon CloudFront distribution, you gain the advantages of fast in-network data transfer rates, a simple publishing/caching workflow, and a unified security framework. Amazon S3 and Amazon CloudFront can be configured by a web service, the AWS Management Console, or a host of third-party management tools.
3 Alternatively, you could use Amazon Elastic Compute Cloud (EC2) instead of Amazon S3 as the origin server for hosting static content. Using Amazon EC2 could allow you a greater degree of control, logging, and feature richness in serving content. For static content, you could also substitute your own on-premises or cohosted private servers as origin servers for Amazon CloudFront.
4 Live Streaming: Featuring the power of Adobe Flash Media Server hosted on Amazon EC2, combined with Amazon CloudFront for stream distribution and caching, live streaming works seamlessly on the AWS platform. This configuration uses a web server to host the manifest.xml file, Amazon DevPay EC2 instances to host Flash Media Server with hourly license pricing, and Amazon CloudFront to serve the stream.
Read more here: http://www.adobe.com/go/fmsaws
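The cache-miss behavior shown in the diagram can be sketched in a few lines: the edge cache answers directly when it holds the object and pulls from the configured origin (Amazon S3, Amazon EC2, or a private server) otherwise. The paths and content below are hypothetical stand-ins.

```python
# Sketch of CloudFront's cache-miss behavior: serve from the edge cache,
# retrieving from the origin only when the object is not cached yet.

origin = {                      # stands in for S3 / EC2 / private origin
    "/Content": "static bytes",
    "/LiveMovie": "stream manifest",
}
edge_cache = {}                 # stands in for a CloudFront edge location

def get(path):
    """Serve from the edge cache, fetching from the origin on a miss."""
    if path not in edge_cache:          # cache miss
        edge_cache[path] = origin[path]  # retrieve data from origin
    return edge_cache[path]
```

After the first request warms the cache, subsequent requests for the same path never touch the origin, which is what makes offloading static content from your web servers effective.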
DISASTER RECOVERY FOR LOCAL APPLICATIONS

Disaster recovery is about preparing for and recovering from any event that has a negative impact on your IT systems. A typical approach involves duplicating infrastructure to ensure the availability of spare capacity in the event of a disaster. Amazon Web Services allows you to scale up your infrastructure on an as-needed basis. For a disaster recovery solution, this results in significant cost savings. The following diagram shows an example of a disaster recovery setup for a local application.

Services: Amazon EC2, Amazon VPC, Amazon S3, AWS Storage Gateway, Amazon EBS

System Overview

1 A corporate data center hosts an application consisting of a database server and an application server with local storage for a content management system.
2 AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage. AWS Storage Gateway securely uploads data to the AWS cloud for cost-effective backup and rapid disaster recovery.
3 Database server backups, application server volume snapshots, and Amazon Machine Images (AMIs) of the recovery servers are stored on Amazon Simple Storage Service (Amazon S3), a highly durable and cost-effective data store. AMIs are pre-configured operating system and application software images that are used to create virtual machines on Amazon Elastic Compute Cloud (Amazon EC2). Oracle databases can back up directly to Amazon S3 using the Oracle Secure Backup (OSB) Cloud Module.
4 In case of a disaster in the corporate data center, you can recreate the complete infrastructure from the backups on Amazon Virtual Private Cloud (Amazon VPC). Amazon VPC lets you provision a private, isolated section of the AWS cloud where you can recreate your application.
5 The application and database servers are recreated using Amazon EC2. To restore volume snapshots, you can use Amazon Elastic Block Store (EBS) volumes, which are then attached to the recovered application server.
6 To remotely access the recovered application, you use a VPN connection created by using the VPC Gateway.
[Diagram: production application and database servers in the corporate data center send storage volumes through AWS Storage Gateway over secure connections to Amazon S3, which holds snapshots, AMIs, files, and Oracle Secure Backups; in a disaster, recovery application and database servers are launched on Amazon EC2 with restored Amazon EBS volumes inside Amazon VPC, and corporate users reach them over a VPN through the VPC Gateway and Internet Gateway]
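The volume restore in step 5 comes down to two API calls. Below is a sketch of the request parameters involved; every resource ID is hypothetical, and with boto3 the dicts would be passed to `ec2.create_volume` and `ec2.attach_volume`.

```python
# Step 5 restore flow as request parameters (all IDs are hypothetical).
# With boto3: ec2.create_volume(**create_volume_params) rebuilds the disk
# from the snapshot, then ec2.attach_volume(**attach_volume_params) hands
# it to the recovered application server.

create_volume_params = {
    "SnapshotId": "snap-0123456789abcdef0",  # backup snapshot stored via S3
    "AvailabilityZone": "us-east-1a",        # must match the recovery instance's AZ
}

attach_volume_params = {
    "VolumeId": "vol-0123456789abcdef0",     # ID returned by create_volume
    "InstanceId": "i-0123456789abcdef0",     # recovered application server
    "Device": "/dev/sdf",                    # device name exposed to the instance
}
```

The same pattern, snapshot to volume to attach, also covers routine disk failures, not just full disaster recovery.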
E-COMMERCE WEB SITE, PART 1: WEB FRONT-END

With Amazon Web Services, you can build a highly available e-commerce website with a flexible product catalog that scales with your business. Maintaining an e-commerce website with a large product catalog and global customer base can be challenging. The catalog should be searchable, and individual product pages should contain a rich information set that includes, for example, images, a PDF manual, and customer reviews.

Customers want to find the products they are interested in quickly, and they expect pages to load quickly. Worldwide customers want to be able to make purchases at any time, so the website should be highly available. Meeting these challenges becomes harder as your catalog and customer base grow. With the tools that AWS provides, you can build a compelling, scalable website with a searchable product catalog that is accessible with very low latency.

Services: Amazon Route 53, Amazon CloudFront, AWS Elastic Beanstalk, Amazon S3, Amazon DynamoDB, Amazon ElastiCache, Amazon CloudSearch

System Overview

1 DNS requests to the e-commerce website are handled by Amazon Route 53, a highly available Domain Name System (DNS) service.
2 Amazon CloudFront is a content distribution network (CDN) with edge locations around the globe. It can cache static and streaming content and deliver dynamic content with low latency from locations close to the customer.
3 The e-commerce application is deployed by AWS Elastic Beanstalk, which automatically handles the details of capacity provisioning, load balancing, auto scaling, and application health monitoring.
4 Amazon Simple Storage Service (Amazon S3) stores all static catalog content, such as product images, manuals, and videos, as well as all log files and clickstream information from Amazon CloudFront and the e-commerce application.
5 Amazon DynamoDB is a fully managed, high-performance NoSQL database service that is easy to set up, operate, and scale. It is used both as a session store for persistent session data, such as the shopping cart, and as the product database. Because DynamoDB does not enforce a schema, we have a great deal of flexibility in adding new product categories and attributes to the catalog.
6 Amazon ElastiCache is used as a session store for volatile data and as a caching layer for the product catalog to reduce I/O (and cost) on DynamoDB.
7 Product catalog data is loaded into Amazon CloudSearch, a fully managed search service that provides fast and highly scalable search functionality.
8 When customers check out their products, they are redirected to an SSL-encrypted checkout service.
9 A marketing and recommendation service consumes log data stored on Amazon S3 to provide the customer with product recommendations.
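The schemaless cart item from step 5 might look like the dict below; the key schema and attribute names are hypothetical. With boto3 this dict would be written with `put_item` and read back with `get_item`, while totals are computed application-side.

```python
# Hypothetical shape of the DynamoDB session/cart item from step 5.
# With boto3: table.put_item(Item=cart_item) to store, and
# table.get_item(Key={"session_id": "sess-123"}) to read it back.

cart_item = {
    "session_id": "sess-123",     # partition key (hypothetical schema)
    "customer_id": "cust-9",
    "items": [
        {"sku": "BOOK-1", "qty": 2, "price_cents": 1999},
        {"sku": "MUG-7", "qty": 1, "price_cents": 899},
    ],
}

def cart_total_cents(cart):
    """Total the cart in the application; DynamoDB only stores the item."""
    return sum(i["qty"] * i["price_cents"] for i in cart["items"])
```

Because DynamoDB does not enforce a schema, a new attribute such as a gift-wrap flag can be added to future cart items without any migration step.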
E-COMMERCE WEB SITE, PART 2: CHECKOUT SERVICE

With Amazon Web Services, you can build a secure and highly available checkout service for your e-commerce website that scales with your business. Managing the checkout process involves many steps, which have to be coordinated. Some steps, such as credit card transactions, are subject to specific regulatory requirements. Other parts of the process involve manual labor, such as picking, packing, and shipping items from a warehouse.

Customers expect their private data, such as their purchase history and their credit card information, to be managed on a secure infrastructure and application stack. AWS has achieved multiple security certifications relevant to e-commerce business, including the Payment Card Industry (PCI) Data Security Standard (DSS). With the tools that AWS provides, you can build a secure checkout service that manages the purchasing workflow from order to fulfillment.

Services: AWS Elastic Beanstalk, Amazon EC2, Amazon SWF, Amazon SES, Amazon RDS, Amazon VPC

System Overview

1 The e-commerce web front end redirects the customer to an SSL-encrypted checkout application to authenticate the customer and execute a purchase.
2 The checkout application, which is deployed by AWS Elastic Beanstalk, uses Amazon Simple Workflow Service (Amazon SWF) to authenticate the customer and trigger a new order workflow.
3 Amazon SWF coordinates all running order workflows by using SWF Deciders and SWF Workers.
4 The SWF Decider implements the workflow logic. It runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance within a private subnet that is isolated from the public Internet.
5 SWF Workers are deployed on Amazon EC2 instances within a private subnet. The EC2 instances are part of an Auto Scaling group, which can scale in and out according to demand. The Workers manage the different steps of the checkout pipeline, such as validating the order, reserving and charging the credit card, and triggering the sending of order and shipping confirmation emails.
6 SWF Workers can also be implemented on mobile devices, such as tablets or smartphones, in order to integrate pick, pack, and ship steps into the overall order workflow.
7 Amazon Simple Email Service (Amazon SES) is used to send transactional email, such as order and shipping confirmations, to the customer.
8 To provide high availability, the customer and orders databases are hosted redundantly on a Multi-AZ (multiple Availability Zone) deployment of Amazon Relational Database Service (Amazon RDS) within private subnets that are isolated from the public Internet.
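The decider's job in steps 3 through 5 reduces to choosing the next activity in the pipeline given what has already completed. The sketch below uses illustrative step names, not Amazon SWF API calls; a real decider would read the workflow history from SWF and schedule the returned step as an activity task for a worker.

```python
# The checkout decider logic, reduced to a plain state machine.
# Step names mirror the pipeline described in step 5 and are illustrative.

PIPELINE = [
    "validate_order",
    "charge_credit_card",
    "send_confirmation_email",
]

def decide(completed_steps):
    """Given the steps already completed, return the next activity or None."""
    for step in PIPELINE:
        if step not in completed_steps:
            return step
    return None  # workflow complete
```

Keeping the ordering logic in one place is the point of the Decider/Worker split: workers stay stateless and interchangeable, whether they run on EC2 instances or, as in step 6, on warehouse tablets.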
E-COMMERCE WEB SITE, PART 3: MARKETING & RECOMMENDATIONS

With Amazon Web Services, you can build a recommendation and marketing service to manage targeted marketing campaigns and offer personalized product recommendations to customers who are browsing your e-commerce site. In order to build such a service, you have to process very large amounts of data from multiple data sources. The resulting user profile information has to be available to deliver real-time product recommendations on your e-commerce website.

The insights that you gain about your customers can also be used to manage personalized marketing campaigns targeted at specific customer segments. With the tools that AWS provides, you can build highly scalable recommendation services that can be consumed by different channels, such as dynamic product recommendations on the e-commerce website or targeted email campaigns for your customers.

Services: Amazon EMR, Amazon SES, AWS Elastic Beanstalk, Amazon RDS, Amazon S3, Amazon DynamoDB

System Overview
1 Amazon Elastic MapReduce (Amazon EMR) is a hosted Hadoop framework that runs on Amazon Elastic Compute Cloud (Amazon EC2) instances. It aggregates and processes user data from server log files and from the customer's purchase history.
2 An Amazon Relational Database Service (Amazon RDS) Read Replica of the customer and order databases is used by Amazon EMR to compute user profiles and by Amazon Simple Email Service (Amazon SES) to send targeted marketing emails to customers.
3 Log files produced by the e-commerce web front end are stored on Amazon Simple Storage Service (Amazon S3) and are consumed by the Amazon EMR cluster to compute user profiles.
4 User profile information generated by the Amazon EMR cluster is stored in Amazon DynamoDB, a scalable, high-performance managed NoSQL database that can serve recommendations with low latency.
5 A recommendation web service used by the web front end is deployed by AWS Elastic Beanstalk. This service uses the profile information stored in Amazon DynamoDB to provide personalized recommendations shown on the e-commerce web front end.
6 A marketing administration application deployed by AWS Elastic Beanstalk is used by marketing managers to send targeted email campaigns to customers with specific user profiles. The application reads customer email addresses from an Amazon RDS Read Replica of the customer database.
7 Amazon SES is used to send marketing emails to customers. Amazon SES is based on the scalable technology used by Amazon websites around the world to send billions of messages a year.
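The aggregation described in steps 1 through 4 can be shown in miniature: reduce raw clickstream records into per-user profiles shaped like the items that would land in Amazon DynamoDB. The log format and field names are hypothetical; the real job runs as a parallel map/reduce flow on the EMR cluster.

```python
# Miniature of the EMR profile-building job from steps 1-4: count category
# views per user. Record shape and field names are hypothetical.

from collections import defaultdict

log_records = [
    {"user": "u1", "category": "books"},
    {"user": "u1", "category": "books"},
    {"user": "u1", "category": "garden"},
    {"user": "u2", "category": "music"},
]

def build_profiles(records):
    """Reduce raw log records into per-user category counts."""
    profiles = defaultdict(lambda: defaultdict(int))
    for r in records:
        profiles[r["user"]][r["category"]] += 1
    # Plain dicts, shaped like items to write to the DynamoDB profiles table.
    return {user: dict(counts) for user, counts in profiles.items()}
```

The recommendation web service in step 5 then only has to read these precomputed items, which is what keeps its response times low.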
FAULT TOLERANCE & HIGH AVAILABILITY

Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud. These qualities have been designed into our services, both by handling such aspects without any special action by you and by providing features that must be used explicitly and correctly.

Amazon EC2 provides infrastructure building blocks that, by themselves, may not be fault-tolerant. Hard drives may fail, power supplies may fail, and racks may fail. It is important to use combinations of the features presented in this document to achieve fault tolerance and high availability.

Services: Amazon EC2, Amazon EBS, Amazon S3, Elastic Load Balancing

[Diagram: end users reach web, application, and database servers through Elastic Load Balancing, with independent stacks in Availability Zones A and B; on failure, an instance is replaced and its Amazon EBS volume re-attached, an Elastic IP is remapped from the active to the standby application, and EBS snapshots stored in Amazon S3 support recovery; design principles: avoid unnecessary dependencies, retain the ability to fail over, and replicate the data layer]

Fault Tolerance and High Availability of Amazon Web Services

Most of the higher-level services, such as Amazon Simple Storage Service (S3), Amazon SimpleDB, Amazon Simple Queue Service (SQS), and Elastic Load Balancing (ELB), have been built with fault tolerance and high availability in mind. Services that provide basic infrastructure, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), provide specific features, such as availability zones, elastic IP addresses, and snapshots, that a fault-tolerant and highly available system must take advantage of and use correctly. Just moving a system into the cloud doesn't make it fault-tolerant or highly available.

System Overview
1 Load balancing is an effective way to increase the availability of a system. Instances that fail can be replaced seamlessly behind the load balancer while other instances continue to operate. Elastic Load Balancing can be used to balance across instances in multiple availability zones of a region.
2 Availability zones (AZs) are distinct geographical locations that are engineered to be insulated from failures in other AZs. By placing Amazon EC2 instances in multiple AZs, an application can be protected from failure at a single location. It is important to run independent application stacks in more than one AZ, either in the same region or in another region, so that if one zone fails, the application in the other zone can continue to run. When you design such a system, you will need a good understanding of zone dependencies.
3 Elastic IP addresses are public IP addresses that can be programmatically mapped between instances within a region. They are associated with the AWS account, not with a specific instance or the lifetime of an instance. Elastic IP addresses can be used to work around host or availability zone failures by quickly remapping the address to another running instance or a replacement instance that was just started. Reserved instances can help guarantee that such capacity is available in another zone.
4 Valuable data should never be stored only on instance storage without proper backups, replication, or the ability to re-create the data. Amazon Elastic Block Store (EBS) offers persistent off-instance storage volumes that are about an order of magnitude more durable than on-instance storage. EBS volumes are automatically replicated within a single availability zone. To increase durability further, point-in-time snapshots can be created to store data on volumes in Amazon S3, which is then replicated to multiple AZs. While EBS volumes are tied to a specific AZ, snapshots are tied to the region. Using a snapshot, you can create new EBS volumes in any of the AZs of the same region. This is an effective way to deal with disk failures or other host-level issues, as well as with problems affecting an AZ. Snapshots are incremental, so it is advisable to hold on to recent snapshots.
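The Elastic IP remapping in item 3 is a single API call. Below is a sketch of the request it takes; the allocation and instance IDs are hypothetical, and with boto3 the dict would be passed to `ec2.associate_address` to move the public address to the standby in seconds.

```python
# The Elastic IP failover from item 3 as request parameters (IDs are
# hypothetical). With boto3: ec2.associate_address(**failover_params(...)).

ELASTIC_IP_ALLOCATION = "eipalloc-0123456789abcdef0"  # tied to the account,
                                                      # not to any instance

def failover_params(standby_instance_id):
    """Build the associate_address request that remaps the Elastic IP."""
    return {
        "AllocationId": ELASTIC_IP_ALLOCATION,
        "InstanceId": standby_instance_id,
        "AllowReassociation": True,  # take the address from the failed host
    }
```

Because the address belongs to the account rather than to an instance, clients keep using the same IP before and after the failover.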
FILE SYNCHRONIZATION SERVICE

Given the straightforward, stateless client-server architecture in which web services are viewed as resources and can be identified by their URLs, development teams are free to create file sharing and syncing applications for their departments, for enterprises, or for consumers directly. This diagram represents the core architecture of a scalable and cost-effective file sharing and synchronization platform using Amazon Web Services.

Services: Amazon EC2, Elastic Load Balancing, Auto Scaling, Amazon Route 53, Amazon S3, Amazon DynamoDB, AWS STS, Amazon SES

System Overview

1 The file synchronization service endpoint consists of an Elastic Load Balancer distributing incoming requests to a group of application servers hosted on Amazon Elastic Compute Cloud (Amazon EC2) instances. An Auto Scaling group automatically adjusts the number of Amazon EC2 instances depending on the application needs.
2 To upload a file, a client first needs to request permission from the service and get a security token.
3 After checking the user's identity, application servers get a temporary credential from AWS Security Token Service (STS). This credential allows users to upload files.
4 Users upload files into Amazon Simple Storage Service (Amazon S3), a highly durable storage infrastructure designed for mission-critical and primary data storage. Amazon S3 makes it easy to store and retrieve any amount of data, at any time. Large files can be uploaded by the same client using multiple concurrent threads to maximize bandwidth usage.
5 File metadata, version information, and unique identifiers are stored by the application servers in an Amazon DynamoDB table. As the number of files to maintain in the application grows, Amazon DynamoDB tables can store and retrieve any amount of data, and serve any level of traffic.
6 File change notifications can be sent via email to users following the resource with Amazon Simple Email Service (Amazon SES), an easy-to-use, cost-effective email solution.
7 Other clients sharing the same files will query the service endpoint to check if newer versions are available. This query compares the list of local file checksums with the checksums listed in an Amazon DynamoDB table. If the query finds newer files, they can be retrieved from Amazon S3 and sent to the client application.
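The version check in step 7 is a checksum comparison between the client's local index and the index the service keeps (an Amazon DynamoDB table in this architecture). A minimal sketch, with file names and contents invented for illustration:

```python
# The sync check from step 7: compare local checksums against the service's
# checksum index and list the files that must be fetched from Amazon S3.

import hashlib

def checksum(data: bytes) -> str:
    """Content fingerprint; MD5 is used here purely for illustration."""
    return hashlib.md5(data).hexdigest()

def stale_files(local_index, remote_index):
    """Return file names whose remote checksum is new or differs locally."""
    return sorted(
        name for name, digest in remote_index.items()
        if local_index.get(name) != digest
    )

# Hypothetical indexes: b.txt changed remotely, c.txt is new.
local_index = {"a.txt": checksum(b"v1"), "b.txt": checksum(b"v1")}
remote_index = {"a.txt": checksum(b"v1"), "b.txt": checksum(b"v2"),
                "c.txt": checksum(b"v1")}
```

Only the files returned by `stale_files` are downloaded from Amazon S3, so an up-to-date client exchanges nothing but a small list of checksums.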
AWS Direct
Connect
Cluster S
ubnet
10.0.1.0/24
Data Subnet
10.0.3.0/24
Customer
Gateway
CORPORATE
DATA CENTER
Grid Clie
nt Subnet
10.0.2.0/24
AVAILABILITY
ZONEVPCGateway
AmazonS3
AmazonDynamoDB
Amazon
Elastic
MapReduce
Grid
ClientGrid
Client
Grid
ControllerGrid
Controller
Application
Source DataApplication
Source Data
Amazon
Glacier
AmazonEC2
AmazonEC2
AmazonEC2
AmazonEC2
AmazonRDS
AmazonRDS
Bootstrap
Gridlib
Grid
Engine TierGrid
Engine Tier
CounterpartyData Source
CounterpartyData Source
TradeData Source
TradeData Source
MarketData Source
MarketData Source
EndUsers
Financial services grid computing on the cloud provides dynamic scalability and elasticity for operation when compute jobs are required, and utilizing services for aggregation that simplify the development of grid software.On demand provisioning of hardware, and template driven deployment, combined with low latency access to existing on-premise data sources make AWS a powerful platform for high performance grid computing systems.
System Overview
FINANCIAL SERVICES GRID COMPUTING
AWS Reference Architectures
Services: Amazon EC2, AWS Direct Connect, Amazon DynamoDB, Amazon RDS, Amazon Glacier, Amazon S3, Amazon EMR
1 Data sources for market, trade, and counterparty data are loaded on startup from on-premises data sources, or
from Amazon Simple Storage Service (Amazon S3).
2 AWS Direct Connect can be used to establish a low-latency and reliable connection between the corporate
data center and AWS, in 1 Gbit/s or 10 Gbit/s increments. For situations with lower bandwidth requirements, a VPN connection to the VPC gateway can be established.
3 Private subnetworks are specifically created for customer source data, compute grid clients, and the
grid controller and engines.
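The subnet layout in step 3 can be sanity-checked with Python's `ipaddress` module. The three /24 CIDRs come from the diagram; the enclosing 10.0.0.0/16 VPC block is an assumption, since the poster does not state the VPC CIDR:

```python
import ipaddress

# Assumed VPC block; the diagram only shows the /24 subnets.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "cluster": ipaddress.ip_network("10.0.1.0/24"),      # grid controller and engines
    "grid_client": ipaddress.ip_network("10.0.2.0/24"),  # compute grid clients
    "data": ipaddress.ip_network("10.0.3.0/24"),         # application source data
}

# Every subnet must sit inside the VPC block...
assert all(net.subnet_of(vpc) for net in subnets.values())

# ...and no two subnets may overlap.
names = list(subnets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not subnets[a].overlaps(subnets[b]), f"{a} overlaps {b}"
```

Running the checks at template-deployment time catches CIDR clashes before any instance is launched.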
4 Application and corporate data can be securely stored in the cloud using the Amazon Relational Database
Service (Amazon RDS).
5 Grid controllers and grid engines run on Amazon Elastic Compute Cloud (Amazon EC2) instances
started on demand from Amazon Machine Images (AMIs) that contain the operating system and grid software.
6 Static data, such as holiday calendars and QA libraries, and additional gridlib bootstrap data can be
downloaded on startup by grid engines from Amazon S3.
7 Grid engine results can be stored in Amazon DynamoDB, a fully managed database providing
configurable read and write throughput, allowing scalability on demand.
8 Results in Amazon DynamoDB are aggregated using a map/reduce job in Amazon Elastic MapReduce
(Amazon EMR) and final output is stored in Amazon S3.
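The aggregation in step 8 has the classic map/reduce shape; this pure-Python sketch stands in for the Amazon EMR job (the record fields `book` and `pv` are hypothetical, not a published schema):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (book, present_value) pairs from raw grid-engine results.
    A real job would read these records from the Amazon DynamoDB table."""
    for rec in records:
        yield rec["book"], rec["pv"]

def reduce_phase(pairs):
    """Reduce: sum present values per book, mimicking the EMR aggregation
    whose final output would land in Amazon S3."""
    totals = defaultdict(float)
    for book, pv in pairs:
        totals[book] += pv
    return dict(totals)

results = [
    {"book": "rates", "pv": 10.0},
    {"book": "fx", "pv": -2.5},
    {"book": "rates", "pv": 4.0},
]
aggregates = reduce_phase(map_phase(results))
```

On EMR the same two functions would run as mapper and reducer steps over many engines' outputs in parallel.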
9 The compute grid client collects aggregate results from Amazon S3.
10 Aggregate results can be archived using Amazon Glacier, a low-cost, secure, and durable storage service.
[Architecture diagram: data flows between Amazon EC2, Amazon S3, and Amazon EBS. Annotated paths: high-throughput parallel upload into S3 (or AWS Import/Export); read/write data from S3 using HTTP or a FUSE layer; download and share results from S3 buckets; alternates of uploading into EC2/EBS, using EBS for staging, temporary, or result storage, downloading results from EBS, and sharing results via snapshots.]
System Overview
LARGE SCALE COMPUTING & HUGE DATA SETS
Amazon Web Services is very popular for large-scale computing scenarios such as scientific computing, simulation, and research projects. These scenarios involve huge data sets collected from scientific equipment, measurement devices, or other compute jobs. After collection, these data sets need to be analyzed by large-scale compute jobs to generate result data sets. Ideally, results will be available as soon as the data is collected. Often, these results are then made available to a larger audience.
Services: Amazon EC2, Amazon EBS, Amazon S3, AWS Import/Export
1 To upload large data sets into AWS, it is critical to make the most of the available bandwidth. You can do so by
uploading data into Amazon Simple Storage Service (Amazon S3) in parallel from multiple clients, each using multithreading to enable concurrent uploads, or multipart uploads for further parallelization. TCP settings like window scaling and selective acknowledgement can be adjusted to further enhance throughput. With the proper optimizations, uploads of several terabytes a day are possible. Another alternative for huge data sets is AWS Import/Export, which supports sending storage devices to AWS and inserting their contents directly into Amazon S3 or Amazon EBS volumes.
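The parallel multipart upload described above can be sketched without touching the network. Here an in-memory dict stands in for the S3 endpoint; a real client would call the multipart-upload API for each part:

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # S3 multipart uploads require non-final parts of at least 5 MiB

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Split a payload into numbered parts, as a multipart upload would."""
    return [
        (number, data[offset:offset + part_size])
        for number, offset in enumerate(range(0, len(data), part_size), start=1)
    ]

def parallel_upload(data: bytes, part_size: int = PART_SIZE, workers: int = 4) -> dict:
    """Upload parts concurrently. `store` stands in for the S3 bucket;
    the actual network call is elided."""
    store = {}
    def upload(part):
        number, body = part
        store[number] = body
        return number
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upload, split_into_parts(data, part_size)))
    return store

payload = b"x" * (2 * 1024 + 10)
parts = parallel_upload(payload, part_size=1024)  # small part size, for illustration only
reassembled = b"".join(parts[n] for n in sorted(parts))
```

Because each part is independent, throughput scales with the number of workers until the link is saturated, which is exactly why multipart upload helps on high-bandwidth connections.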
2 Parallel processing of large-scale jobs is critical, and existing parallel applications can typically be run on
multiple Amazon Elastic Compute Cloud (Amazon EC2) instances. A parallel application may sometimes assume large scratch areas that all nodes can efficiently read from and write to. Amazon S3 can be used as such a scratch area, either directly over HTTP or through a FUSE layer (for example, s3fs or SubCloud) if the application expects a POSIX-style file system.
3 Once the job has completed and the result data is stored in Amazon S3, Amazon EC2 instances can be
shut down, and the result data set can be downloaded. The
output data can be shared with others, either by granting read permissions to select users or to everyone, or by using time-limited URLs.
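Time-limited URLs work by embedding an expiry time and a signature over it, so the server can reject the link once it lapses. This stdlib sketch shows the general idea; S3 presigned URLs use the same principle with the (more involved) Signature Version 4 scheme, and the key and URL here are made up:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"hypothetical-signing-key"  # stands in for the AWS secret access key

def make_time_limited_url(base_url: str, expires_in: int, now: int = None) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature over it."""
    expires = int(now if now is not None else time.time()) + expires_in
    sig = hmac.new(SECRET, f"{base_url}?expires={expires}".encode(), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires, 'signature': sig})}"

def is_url_valid(url: str, now: int = None) -> bool:
    """Recompute the signature and check the expiry; tampered or expired URLs fail."""
    base, _, query = url.partition("?")
    params = dict(pair.split("=") for pair in query.split("&"))
    expires = int(params["expires"])
    expected = hmac.new(SECRET, f"{base}?expires={expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, params.get("signature", "")):
        return False
    return (now if now is not None else time.time()) < expires
```

Anyone holding the URL can download until the expiry passes; nobody without the secret can extend it, because changing `expires` invalidates the signature.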
4 Instead of using Amazon S3, you can use Amazon EBS to stage the input set, act as a temporary storage
area, and/or capture the output set. During the upload, the same concepts of parallel upload streams and TCP tuning apply. In addition, uploads that use UDP may increase speed further. The result data set can be written into Amazon EBS volumes, at which time snapshots of the volumes can be taken for sharing.
Since most businesses today have limited manpower, budget, and data center space, AWS offers a unique set of opportunities to compete and scale without having to invest in hardware, staff, or additional data center space. Utilizing AWS is not an all-or-nothing proposition. Depending on the project, different services can be used independently. This diagram shows an example of a highly available, durable, and cost-effective media sharing and processing platform.
System Overview
MEDIA SHARING
Services: Amazon EC2, Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Auto Scaling, Amazon S3,
Media sharing is one of the hottest markets on the Internet. Customers have a staggering appetite for placing photos and videos on social networking sites, and for sharing their media in custom online photo albums. The growing popularity of media sharing means scaling problems for site owners, who face ever-increasing storage and bandwidth requirements and increased go-to-market pressure to deliver faster than the competition.
Amazon RDS, Amazon SQS, Amazon EC2 Spot
[Architecture diagram: Amazon Route 53 resolves DNS; Elastic Load Balancing routes uploads to an Auto Scaling group of upload web servers on Amazon EC2; a job queue on Amazon SQS feeds a media processing subsystem of Auto Scaling processing pipelines, including Spot Instances; Amazon S3 serves as media files repository and data store; the media distribution subsystem delivers content through an edge location (Paris) of the Amazon CloudFront content delivery network, with Elastic Load Balancing in front of the website web servers.]
1 Sharing content first involves uploading media files to the online service. In this configuration, an Elastic
Load Balancer distributes incoming network traffic to upload servers, a dynamic fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon CloudWatch monitors these servers and an Auto Scaling group manages them, automatically scaling EC2 capacity up or down based on load. In this example, a separate endpoint to receive media uploads was created in order to off-load this task from the website's servers.
5 Once processing is completed, Amazon S3 stores the output files. Original files can be stored with high
durability. Processed files could use reduced redundancy.
2 Original uploaded files are stored in Amazon Simple Storage Service (Amazon S3), a highly available and
durable storage service.
4 The processing pipeline is a dedicated group of Amazon EC2 instances used to execute any kind of
post-processing task on the uploaded media files (video transcoding, image resizing, etc.). To automatically adjust the needed capacity, Auto Scaling manages this group. You can use Spot Instances to dynamically extend the capacity of the group and to significantly reduce the cost of file processing.
3 To submit a new file to be processed, upload web servers push a message into an Amazon Simple
Queue Service (Amazon SQS) queue. This queue acts as a communication pipeline between the file reception and file processing components.
6 Media-related data can be put in a relational database
like Amazon Relational Database Service (Amazon RDS) or in a key-value store like Amazon SimpleDB.
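The queue hand-off in step 3 can be sketched with Python's `queue` module standing in for Amazon SQS; the message body fields are hypothetical, and a real worker would long-poll SQS instead of reading a local queue:

```python
import queue
import threading

job_queue = queue.Queue()   # stands in for the Amazon SQS job queue
processed = []

def upload_server(file_names):
    """Upload web server: push one message per received media file."""
    for name in file_names:
        job_queue.put({"file": name, "task": "transcode"})  # hypothetical message body

def processing_worker():
    """Processing-pipeline instance: handle messages until a sentinel arrives.
    The actual transcoding/resizing work is elided."""
    while True:
        message = job_queue.get()
        if message is None:
            break
        processed.append(message["file"])
        job_queue.task_done()

worker = threading.Thread(target=processing_worker)
worker.start()
upload_server(["cat.mp4", "dog.mp4"])
job_queue.put(None)  # sentinel: no more uploads
worker.join()
```

Decoupling reception from processing this way is what lets the two fleets scale independently: the queue absorbs upload bursts while Auto Scaling adds processing capacity.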
7 A third fleet of EC2 instances is dedicated to host the website front-end of the media sharing service.
8 Media files are distributed from Amazon S3 to the end user via Amazon CloudFront, a content delivery
network. Amazon CloudFront offers low-latency delivery through a worldwide network of edge locations.
Online games back-end infrastructures can be challenging to maintain and operate. Peak usage periods, multiple players, and high volumes of write operations are some of the most common problems that operations teams face. But the most difficult challenge is ensuring flexibility in the scale of that system. A popular game might suddenly receive millions of users in a matter of hours, yet it must continue to provide a satisfactory player experience. Amazon Web Services provides different tools and services that can be used for building online games that scale under high usage traffic patterns. This document presents a cost-effective online game architecture featuring automatic capacity adjustment, a highly available and high-speed database, and a data processing cluster for player behavior analysis.
System Overview
ONLINE GAMES
Services: Amazon EC2, Elastic Load Balancing, Amazon DynamoDB, Amazon EMR, Auto Scaling, Amazon S3, Amazon SES, Amazon Route 53
1 Browser games can be represented as client-server applications. The client generally consists of static files,
such as images, sounds, flash applications, or Java applets. Those files are hosted on Amazon Simple Storage Service (Amazon S3), a highly available and reliable data store.
5 Log files generated by each web server are pushed back into Amazon S3 for long-term storage.
2 As the user base grows and becomes more geographically distributed, a high-performance cache
like Amazon CloudFront can provide substantial improvements in latency, fault tolerance, and cost. By using Amazon S3 as the origin server for the Amazon CloudFront distribution, the game infrastructure benefits from fast network data transfer rates and a simple publishing/caching workflow.
3 Requests from the game application are distributed by Elastic Load Balancing to a group of web servers
running on Amazon Elastic Compute Cloud (Amazon EC2) instances. Auto Scaling automatically adjusts the size of this group, depending on rules like network load, CPU usage, and so on.
4 Player data is persisted on Amazon DynamoDB, a fully managed NoSQL database service. As the player
population grows, Amazon DynamoDB provides predictable performance with seamless scalability.
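Player-state writes in step 4 are commonly guarded with conditional updates so that concurrent sessions cannot overwrite each other. This dict-backed sketch mimics a DynamoDB `PutItem` with a condition on a version attribute; the item layout and version scheme are assumptions for illustration:

```python
class ConditionalWriteError(Exception):
    """Raised when the stored version no longer matches, like a DynamoDB
    ConditionalCheckFailedException."""

player_table = {}  # stands in for the Amazon DynamoDB player table

def put_player(player_id: str, state: dict, expected_version: int):
    """Write player state only if the stored version matches `expected_version`,
    then bump the version, mirroring an optimistic-locking conditional put."""
    current = player_table.get(player_id, {"version": 0})
    if current["version"] != expected_version:
        raise ConditionalWriteError(f"stale write for {player_id}")
    player_table[player_id] = {"version": expected_version + 1, **state}

put_player("p1", {"score": 10}, expected_version=0)
put_player("p1", {"score": 25}, expected_version=1)
```

A session that lost the race simply re-reads the item, reapplies its change, and retries, which keeps high-volume write workloads correct without table-wide locks.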
6 Managing and analyzing high data volumes produced by online games platforms can be challenging. Amazon
Elastic MapReduce (Amazon EMR) is a service that processes vast amounts of data easily. Input data can be retrieved from web server logs stored on Amazon S3 or from player data stored in Amazon DynamoDB tables to run analytics on player behavior, usage patterns, etc. Those results can be stored again on Amazon S3, or inserted in a relational database for further analysis with classic business intelligence tools.
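The analytics in step 6 can be prototyped locally before moving to Amazon EMR. This pure-Python job counts event types across hypothetical web server log lines (the log format is an assumption):

```python
from collections import Counter
from itertools import chain

log_lines = [  # hypothetical web server log entries: date, player, event
    "2013-01-07 p1 LOGIN",
    "2013-01-07 p2 LOGIN",
    "2013-01-07 p1 LEVEL_UP",
    "2013-01-08 p1 LOGIN",
]

def mapper(line):
    """Map: emit (event, 1) for each log line."""
    _, _, event = line.split()
    yield event, 1

def reducer(pairs):
    """Reduce: sum the counts per event type, as the EMR job would."""
    counts = Counter()
    for event, n in pairs:
        counts[event] += n
    return dict(counts)

event_counts = reducer(chain.from_iterable(mapper(line) for line in log_lines))
```

The same mapper/reducer pair, pointed at logs in Amazon S3, is the kind of job EMR parallelizes across a cluster; the output could then feed a relational database for BI tools.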
7 Based on the needs of the game, Amazon Simple Email Service (Amazon SES) can be used to send
email to players in a cost-effective and scalable way.
[Architecture diagram: DNS resolution for www.mygame.com through Amazon Route 53; the Amazon CloudFront content delivery network sits in front of an Amazon S3 files repository holding the game client files (flash, applet, ...); game interaction (status, JSON, ...) flows through Elastic Load Balancing to Auto Scaling web servers on Amazon EC2; the game database runs on Amazon DynamoDB; log files are pushed to Amazon S3 and analyzed with Amazon Elastic MapReduce; Amazon SES acts as the players' email emitter.]
This elasticity is achieved by using Auto Scaling groups for ingest processing, AWS Data Pipeline for scheduling Amazon Elastic MapReduce jobs and for intersystem data orchestration, and Amazon Redshift for potentially massive-scale analysis. Key architectural throttle points, involving Amazon SQS for sensor message buffering and less frequent AWS Data Pipeline scheduling, keep the overall solution costs predictable and controlled.
System Overview
TIME SERIES PROCESSING
Services: Amazon EC2, Amazon EMR, Amazon DynamoDB, AWS Data Pipeline, Auto Scaling, Amazon S3, Amazon SQS, Amazon EC2 Spot
When data arrives as a succession of regular measurements, it is known as time series information. Processing of time series information poses systems scaling challenges that the elasticity of AWS services is uniquely positioned to address.
2 Send messages to an Amazon Simple Queue Service (Amazon SQS) queue for processing into Amazon DynamoDB using
Auto Scaling Amazon EC2 workers. Or, if the sensor source can do so, post sensor samples directly to Amazon DynamoDB. Try starting with a week-oriented, time-based DynamoDB table structure.
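A week-oriented table structure shards samples by ISO week so that each week's data can be exported and dropped as a unit. This sketch derives a table name and item key for a sample; the naming convention and key attributes are assumptions:

```python
from datetime import datetime, timezone

def week_table_for(ts: datetime) -> str:
    """Name the DynamoDB table after the sample's ISO year and week,
    so each week's samples land in their own table."""
    year, week, _ = ts.isocalendar()
    return f"samples_{year}_w{week:02d}"

def item_key(sensor_id: str, ts: datetime) -> dict:
    """Hash key on the sensor, range key on the timestamp: a common
    time-series layout for DynamoDB."""
    return {"sensor_id": sensor_id, "ts": ts.isoformat()}

sample_time = datetime(2013, 1, 7, 12, 30, tzinfo=timezone.utc)
table = week_table_for(sample_time)
key = item_key("meter-42", sample_time)
```

Rotating a whole week out of service is then a table export plus a table delete, which is far cheaper than deleting items row by row.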
3 If a Supervisory Control and Data Acquisition (SCADA) system exists, create a flow of samples to or from
Amazon DynamoDB to support additional cloud processing or other existing systems, respectively.
4 Using AWS Data Pipeline, create a pipeline with a regularly scheduled Amazon Elastic MapReduce job that both
performs expensive sample processing and delivers the samples and results.
7 The pipeline also optionally exports results in a format custom applications can accept.
[Architecture diagram: remote sensors send sampled data messages to Amazon SQS; Auto Scaling worker nodes on Amazon EC2 write samples into Amazon DynamoDB, which also exchanges data with a SCADA system in the corporate data center; AWS Data Pipeline drives Amazon Elastic MapReduce (with EC2 Spot Instances) over exports in Amazon S3 and loads results into Amazon Redshift, serving a custom application and business intelligence users.]
5 The pipeline places results into Amazon Redshift for additional analysis.
8 Amazon Redshift optionally imports historic samples to reside with calculated results.
9 Using in-house or Amazon partner business intelligence solutions, Amazon Redshift supports
additional analysis on a potentially massive scale.
1 Remote devices such as power meters, mobile clients, ad-network clients, industrial meters, satellites, and
environmental meters measure the world around them and send sampled sensor data as messages via HTTP(S) for processing.
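A device in step 1 would serialize each sample into a small JSON body and POST it over HTTP(S); this sketch shows only the message construction, and the field names are assumptions rather than a published schema:

```python
import json
from datetime import datetime, timezone

def sensor_message(sensor_id: str, value: float, unit: str, sampled_at: datetime) -> str:
    """Serialize one sample as the JSON body a device would POST for ingest.
    sort_keys keeps the wire format stable across devices."""
    return json.dumps({
        "sensor_id": sensor_id,
        "value": value,
        "unit": unit,
        "sampled_at": sampled_at.isoformat(),
    }, sort_keys=True)

msg = sensor_message("meter-42", 230.4, "volts",
                     datetime(2013, 1, 7, 12, 0, tzinfo=timezone.utc))
```

Keeping the payload small and self-describing lets the ingest tier forward it to Amazon SQS unchanged, so the workers can parse it later at their own pace.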
6 The pipeline exports historical week-oriented sample tables from Amazon DynamoDB to
Amazon Simple Storage Service (Amazon S3).