AWS Reference Architectures
TRANSCRIPT
WEB APPLICATION HOSTING

Highly available and scalable web hosting can be complex and expensive. Dense peak periods and wild swings in traffic patterns result in low utilization of expensive hardware. Amazon Web Services provides the reliable, scalable, secure, and high-performance infrastructure required for web applications, while enabling an elastic, scale-out and scale-down infrastructure to match IT costs in real time as customer traffic fluctuates.

Services: Amazon Route 53, Amazon S3, Amazon EC2, Elastic Load Balancing, Amazon CloudFront, Auto Scaling, Amazon RDS

[Diagram: Amazon Route 53 for DNS resolution; Amazon CloudFront as the content delivery network; Elastic Load Balancing in front of Auto Scaling groups of Amazon EC2 web servers and application servers in Availability Zones A and B; an Amazon RDS master with synchronous replication to a Multi-AZ standby; Amazon S3 for resources and static content]

System Overview
1 The user's DNS requests are served by Amazon Route 53, a highly available Domain Name System (DNS) service. Network traffic is routed to infrastructure running in Amazon Web Services.
2 Static, streaming, and dynamic content is delivered by Amazon CloudFront, a global network of edge locations. Requests are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
3 HTTP requests are first handled by Elastic Load Balancing, which automatically distributes incoming application traffic among multiple Amazon Elastic Compute Cloud (EC2) instances across Availability Zones (AZs). It enables even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic.
4 Web servers and application servers are deployed on Amazon EC2 instances. Most organizations will select an Amazon Machine Image (AMI) and then customize it to their needs. This custom AMI will then become the starting point for future web development.
5 Web servers and application servers are deployed in an Auto Scaling group. Auto Scaling automatically adjusts your capacity up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you're using increases seamlessly during demand spikes to maintain performance and decreases automatically during demand lulls to minimize costs.
6 To provide high availability, the relational database that contains the application's data is hosted redundantly on a Multi-AZ (multiple Availability Zones; zones A and B here) deployment of Amazon Relational Database Service (Amazon RDS).
7 Resources and static content used by the web application are stored on Amazon Simple Storage Service (S3), a highly durable storage infrastructure designed for mission-critical and primary data storage.
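The Auto Scaling behavior in step 5 is typically driven by an Amazon CloudWatch alarm attached to a scaling policy. Below is a minimal sketch of the request parameters; the group, policy, and alarm names are hypothetical, and with boto3 these dicts would be passed to `put_scaling_policy` and `put_metric_alarm`.

```python
# Illustrative request parameters for scaling a web tier on CPU load.
# Names like "web-asg" are hypothetical; with boto3 you would pass these to
# autoscaling.put_scaling_policy() and cloudwatch.put_metric_alarm().

scale_out_policy = {
    "AutoScalingGroupName": "web-asg",   # hypothetical group of web servers
    "PolicyName": "scale-out-on-cpu",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 2,              # add two EC2 instances per trigger
    "Cooldown": 300,                     # seconds to wait between actions
}

high_cpu_alarm = {
    "AlarmName": "web-asg-high-cpu",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 2,
    "Threshold": 70.0,                   # fire when average CPU exceeds 70%
    "ComparisonOperator": "GreaterThanThreshold",
}
```

A matching scale-in policy with a negative `ScalingAdjustment`, triggered by a low-CPU alarm, completes the scale-down half of the loop.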
ADVERTISEMENT SERVING

Internet advertising services need to serve targeted advertising and must do so within tight time limits. These are just two of the multiple technical challenges they face. Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available ad serving platforms in the cloud. In this document, we describe the two main parts of such a system: the ad serving infrastructure and the click-through collection infrastructure, featuring a data analysis cluster.

Services: Amazon EC2, Amazon EC2 Spot Instances, Amazon S3, AWS Import/Export, Amazon EMR, Amazon DynamoDB, Amazon CloudFront

[Diagram: visitors reach Auto Scaling ad servers on Amazon EC2 behind Elastic Load Balancing; the ad servers query a profiles database in Amazon DynamoDB and return links to ad resources served from a static files repository in Amazon S3 through Amazon CloudFront; click-through requests go to click-through servers whose impression and click-through log files land in Amazon S3, where an Amazon Elastic MapReduce cluster extended with Spot Instances processes them]

System Overview

1 When visitors load a web page, ad servers return a pointer to the ad resource to be displayed. These servers run on Amazon Elastic Compute Cloud (Amazon EC2) instances. They query a data set stored in an Amazon DynamoDB table to find relevant ads depending on the user's profile.
2 Ad files are downloaded from Amazon CloudFront, a content delivery service with low latency, high data-transfer speeds, and no commitments. Log information from displayed ads is stored on Amazon Simple Storage Service (Amazon S3), a highly available data store.
3 The click-through servers are a group of Amazon EC2 instances dedicated to collecting click-through data. This information is contained in the log files of the click-through web servers, which are periodically uploaded to Amazon S3.
4 Ad impression and click-through data are retrieved and processed by an Amazon Elastic MapReduce cluster using a hosted Hadoop framework to process the data in a parallel job flow. The cluster's capacity can be dynamically extended using Spot Instances to reduce the processing time and the cost of running the job flow.
5 Data processing results are pushed back into Amazon DynamoDB, a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. Amazon DynamoDB tables can store and retrieve any amount of data and serve any level of request traffic, both of which are specific requirements for storing and quickly retrieving visitors' profile information. The high availability and fast performance of Amazon DynamoDB enable ad server front ends to serve requests with predictable response times, even with high traffic volumes or large profile data sets.
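The profile-based lookup in step 1 can be sketched with a plain-Python stand-in for the DynamoDB table: a visitor ID maps to a profile whose interests are matched against an ad catalog. All names, URLs, and data below are hypothetical; in production the lookup would be a `get_item` call against the profiles table.

```python
# Hypothetical stand-in for the DynamoDB profile lookup in step 1.
# profiles maps visitor_id -> profile item; in production this would be
# a dynamodb get_item() call against the profiles table.

profiles = {
    "visitor-42": {"interests": {"cycling", "travel"}},
}

ad_catalog = [
    {"ad_id": "ad-1", "topic": "cycling", "url": "https://cdn.example/ad-1"},
    {"ad_id": "ad-2", "topic": "finance", "url": "https://cdn.example/ad-2"},
]

def relevant_ads(visitor_id):
    """Return pointers to ad resources matching the visitor's interests."""
    profile = profiles.get(visitor_id, {"interests": set()})
    return [ad["url"] for ad in ad_catalog if ad["topic"] in profile["interests"]]
```

The ad server returns only the pointer (the URL); the ad file itself is fetched by the browser from CloudFront, as step 2 describes.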
BATCH PROCESSING

Batch processing architectures are often synonymous with highly variable usage patterns that have significant usage peaks (e.g., month-end processing) followed by significant periods of underutilization. There are numerous approaches to building a batch processing architecture. This document outlines a basic batch processing architecture that supports job scheduling, job status inspection, uploading raw data, outputting job results, grid management, and reporting job performance data.

Batch processing on AWS allows for the on-demand provisioning of a multi-part job processing architecture that can be used for instantaneous or delayed deployment of a heterogeneous, scalable "grid" of worker nodes that can quickly crunch through large batch processing tasks in parallel. There are numerous batch-oriented applications in place today that can leverage this style of on-demand processing, including claims processing, large-scale transformation, media transcoding, and multi-part data processing work.

Services: Amazon EC2, Amazon RDS, Amazon SQS, Amazon S3, Auto Scaling, Amazon SimpleDB

[Diagram: an end user submits work to a Job Manager on Amazon EC2 behind an Elastic IP; the Job Manager stores job data in Amazon S3, inserts tasks into an Amazon SQS input queue, and records job info in an analytics store (an Amazon SimpleDB domain or an Amazon RDS master/slave pair); Auto Scaling worker nodes consume the input queue, store results in Amazon S3, and can optionally chain completed tasks into an output queue]

System Overview

1 Users interact with the Job Manager application, which is deployed on an Amazon Elastic Compute Cloud (EC2) instance. This component controls the process of accepting, scheduling, starting, managing, and completing batch jobs. It also provides access to the final results, job and worker statistics, and job progress information.
2 Raw job data is uploaded to Amazon Simple Storage Service (S3), a highly available and persistent data store.
3 Individual job tasks are inserted by the Job Manager into an Amazon Simple Queue Service (SQS) input queue on the user's behalf.
4 Worker nodes are Amazon EC2 instances deployed in an Auto Scaling group. This group is a container that ensures the health and scalability of worker nodes. Worker nodes pick up job parts from the input queue automatically and perform single tasks that are part of the list of batch processing steps.
5 Interim results from worker nodes are stored in Amazon S3.
6 Progress information and statistics are stored in the analytics store. This component can be either an Amazon SimpleDB domain or a relational database such as an Amazon Relational Database Service (RDS) instance.
7 Optionally, completed tasks can be inserted into an Amazon SQS queue for chaining to a second processing stage.
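The worker behavior in steps 4 and 5 can be sketched with an in-memory queue standing in for Amazon SQS; the task names and the transformation are hypothetical. In production, `poll()` would wrap `receive_message`, the acknowledgment comment would be a `delete_message` call, and results would be written to Amazon S3.

```python
# Minimal worker-node loop for steps 4 and 5, with an in-memory queue
# standing in for Amazon SQS and a list standing in for Amazon S3.

from collections import deque

input_queue = deque(["task-1", "task-2", "task-3"])   # hypothetical job parts
interim_results = []                                  # stands in for Amazon S3

def poll():
    """Fetch the next task, or None when the queue is drained."""
    return input_queue.popleft() if input_queue else None

def process(task):
    """Perform one batch processing step (placeholder transformation)."""
    return task.upper()

def run_worker():
    while (task := poll()) is not None:
        interim_results.append(process(task))   # step 5: store interim result
        # with real SQS: delete the message here so no other worker repeats it

run_worker()
```

Because each worker only ever talks to the queue, adding Auto Scaling capacity during a peak means simply launching more instances running this same loop.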
CONTENT & MEDIA SERVING

Serving digital content is one of the most basic and straightforward tasks—that is, until you have serious requirements for low latency, high availability, durability, access control, and millions of views on or under budget. In addition, because of "spiky" usage patterns, operations teams often need to provision static hardware, network, and management resources to support the maximum expected need, which guarantees waste outside of peak hours.

AWS provides a suite of services specifically tailored to deliver high-performance media serving. Each service features pay-as-you-go pricing on an elastic infrastructure, meaning that you can scale up and down according to your demand curve while paying for only the resources you use. Because this infrastructure is programmable, it can react quickly. Our advanced API provides detailed control over the infrastructure that powers your system.

Services: Amazon EC2, Amazon Route 53, Amazon S3, Amazon CloudFront

[Diagram: an end user requests http://yourdomain/LiveMovie and http://yourdomain/Content through Amazon Route 53 and Amazon CloudFront; CloudFront serves cached content and, on a cache miss, retrieves it from a custom origin: a live stream source feeding Adobe Flash Media Server on an Amazon EC2 instance, Amazon S3, Amazon EC2, or an on-premises private server]

System Overview
1 Simple and Secure: This reference architecture uses Amazon Simple Storage Service (S3) to host static content on the web. Amazon S3 is highly available, highly durable, and designed for web scale. It provides a great way to offload the work of serving static content from your web servers. You can also provide secure access to this content over HTTPS.
2 Faster and Edge Cached: As your customer base grows and becomes more geographically distributed, using a high-performance edge cache like Amazon CloudFront can provide substantial improvements in latency, fault tolerance, and cost. By using Amazon S3 as the origin server for the Amazon CloudFront distribution, you gain the advantages of fast in-network data transfer rates, a simple publishing/caching workflow, and a unified security framework. Amazon S3 and Amazon CloudFront can be configured by a web service, the AWS Management Console, or a host of third-party management tools.
3 Alternatively, you could use Amazon Elastic Compute Cloud (EC2) instead of Amazon S3 as the origin server for hosting static content. Using Amazon EC2 could allow you a greater degree of control, logging, and feature richness in serving content. For static content, you could also substitute your own on-premises or cohosted private servers as origin servers for Amazon CloudFront.
4 Live Streaming: Featuring the power of Adobe Flash Media Server hosted on Amazon EC2, combined with Amazon CloudFront for stream distribution and caching, live streaming works seamlessly on the AWS platform. This configuration uses a web server to host the manifest.xml file, Amazon DevPay EC2 instances to host Flash Media Server with hourly license pricing, and Amazon CloudFront to serve the stream.
Read more here: http://www.adobe.com/go/fmsaws
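The cache-miss behavior shown in the diagram can be sketched in a few lines: the edge cache answers directly when it holds the object and pulls from the configured origin (Amazon S3, Amazon EC2, or a private server) otherwise. The paths and content below are hypothetical stand-ins.

```python
# Sketch of CloudFront's cache-miss behavior: serve from the edge cache,
# retrieving from the origin only when the object is not cached yet.

origin = {                      # stands in for S3 / EC2 / private origin
    "/Content": "static bytes",
    "/LiveMovie": "stream manifest",
}
edge_cache = {}                 # stands in for a CloudFront edge location

def get(path):
    """Serve from the edge cache, fetching from the origin on a miss."""
    if path not in edge_cache:          # cache miss
        edge_cache[path] = origin[path]  # retrieve data from origin
    return edge_cache[path]
```

After the first request warms the cache, subsequent requests for the same path never touch the origin, which is what makes offloading static content from your web servers effective.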
DISASTER RECOVERY FOR LOCAL APPLICATIONS

Disaster recovery is about preparing for and recovering from any event that has a negative impact on your IT systems. A typical approach involves duplicating infrastructure to ensure the availability of spare capacity in the event of a disaster. Amazon Web Services allows you to scale up your infrastructure on an as-needed basis. For a disaster recovery solution, this results in significant cost savings. The following diagram shows an example of a disaster recovery setup for a local application.

Services: Amazon EC2, Amazon VPC, Amazon S3, AWS Storage Gateway, Amazon EBS

System Overview

1 A corporate data center hosts an application consisting of a database server and an application server with local storage for a content management system.
2 AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage. AWS Storage Gateway securely uploads data to the AWS cloud for cost-effective backup and rapid disaster recovery.
3 Database server backups, application server volume snapshots, and Amazon Machine Images (AMIs) of the recovery servers are stored on Amazon Simple Storage Service (Amazon S3), a highly durable and cost-effective data store. AMIs are pre-configured operating system and application software images that are used to create virtual machines on Amazon Elastic Compute Cloud (Amazon EC2). Oracle databases can back up directly to Amazon S3 using the Oracle Secure Backup (OSB) Cloud Module.
4 In case of a disaster in the corporate data center, you can recreate the complete infrastructure from the backups on Amazon Virtual Private Cloud (Amazon VPC). Amazon VPC lets you provision a private, isolated section of the AWS cloud where you can recreate your application.
5 The application and database servers are recreated using Amazon EC2. To restore volume snapshots, you can use Amazon Elastic Block Store (EBS) volumes, which are then attached to the recovered application server.
6 To remotely access the recovered application, you use a VPN connection created by using the VPC Gateway.
[Diagram: production application and database servers in the corporate data center send storage volumes through AWS Storage Gateway over secure connections to Amazon S3, which holds snapshots, AMIs, files, and Oracle Secure Backups; in a disaster, recovery application and database servers are launched on Amazon EC2 with restored Amazon EBS volumes inside Amazon VPC, and corporate users reach them over a VPN through the VPC Gateway and Internet Gateway]
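The volume restore in step 5 comes down to two API calls. Below is a sketch of the request parameters involved; every resource ID is hypothetical, and with boto3 the dicts would be passed to `ec2.create_volume` and `ec2.attach_volume`.

```python
# Step 5 restore flow as request parameters (all IDs are hypothetical).
# With boto3: ec2.create_volume(**create_volume_params) rebuilds the disk
# from the snapshot, then ec2.attach_volume(**attach_volume_params) hands
# it to the recovered application server.

create_volume_params = {
    "SnapshotId": "snap-0123456789abcdef0",  # backup snapshot stored via S3
    "AvailabilityZone": "us-east-1a",        # must match the recovery instance's AZ
}

attach_volume_params = {
    "VolumeId": "vol-0123456789abcdef0",     # ID returned by create_volume
    "InstanceId": "i-0123456789abcdef0",     # recovered application server
    "Device": "/dev/sdf",                    # device name exposed to the instance
}
```

The same pattern, snapshot to volume to attach, also covers routine disk failures, not just full disaster recovery.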
E-COMMERCE WEB SITE, PART 1: WEB FRONT-END

With Amazon Web Services, you can build a highly available e-commerce website with a flexible product catalog that scales with your business. Maintaining an e-commerce website with a large product catalog and global customer base can be challenging. The catalog should be searchable, and individual product pages should contain a rich information set that includes, for example, images, a PDF manual, and customer reviews.

Customers want to find the products they are interested in quickly, and they expect pages to load quickly. Worldwide customers want to be able to make purchases at any time, so the website should be highly available. Meeting these challenges becomes harder as your catalog and customer base grow. With the tools that AWS provides, you can build a compelling, scalable website with a searchable product catalog that is accessible with very low latency.

Services: Amazon Route 53, Amazon CloudFront, AWS Elastic Beanstalk, Amazon S3, Amazon DynamoDB, Amazon ElastiCache, Amazon CloudSearch

System Overview

1 DNS requests to the e-commerce website are handled by Amazon Route 53, a highly available Domain Name System (DNS) service.
2 Amazon CloudFront is a content distribution network (CDN) with edge locations around the globe. It can cache static and streaming content and deliver dynamic content with low latency from locations close to the customer.
3 The e-commerce application is deployed by AWS Elastic Beanstalk, which automatically handles the details of capacity provisioning, load balancing, auto scaling, and application health monitoring.
4 Amazon Simple Storage Service (Amazon S3) stores all static catalog content, such as product images, manuals, and videos, as well as all log files and clickstream information from Amazon CloudFront and the e-commerce application.
5 Amazon DynamoDB is a fully managed, high-performance NoSQL database service that is easy to set up, operate, and scale. It is used both as a session store for persistent session data, such as the shopping cart, and as the product database. Because DynamoDB does not enforce a schema, we have a great deal of flexibility in adding new product categories and attributes to the catalog.
6 Amazon ElastiCache is used as a session store for volatile data and as a caching layer for the product catalog to reduce I/O (and cost) on DynamoDB.
7 Product catalog data is loaded into Amazon CloudSearch, a fully managed search service that provides fast and highly scalable search functionality.
8 When customers check out their products, they are redirected to an SSL-encrypted checkout service.
9 A marketing and recommendation service consumes log data stored on Amazon S3 to provide the customer with product recommendations.
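The schemaless cart item from step 5 might look like the dict below; the key schema and attribute names are hypothetical. With boto3 this dict would be written with `put_item` and read back with `get_item`, while totals are computed application-side.

```python
# Hypothetical shape of the DynamoDB session/cart item from step 5.
# With boto3: table.put_item(Item=cart_item) to store, and
# table.get_item(Key={"session_id": "sess-123"}) to read it back.

cart_item = {
    "session_id": "sess-123",     # partition key (hypothetical schema)
    "customer_id": "cust-9",
    "items": [
        {"sku": "BOOK-1", "qty": 2, "price_cents": 1999},
        {"sku": "MUG-7", "qty": 1, "price_cents": 899},
    ],
}

def cart_total_cents(cart):
    """Total the cart in the application; DynamoDB only stores the item."""
    return sum(i["qty"] * i["price_cents"] for i in cart["items"])
```

Because DynamoDB does not enforce a schema, a new attribute such as a gift-wrap flag can be added to future cart items without any migration step.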
E-COMMERCE WEB SITE, PART 2: CHECKOUT SERVICE

With Amazon Web Services, you can build a secure and highly available checkout service for your e-commerce website that scales with your business. Managing the checkout process involves many steps, which have to be coordinated. Some steps, such as credit card transactions, are subject to specific regulatory requirements. Other parts of the process involve manual labor, such as picking, packing, and shipping items from a warehouse.

Customers expect their private data, such as their purchase history and their credit card information, to be managed on a secure infrastructure and application stack. AWS has achieved multiple security certifications relevant to e-commerce business, including the Payment Card Industry (PCI) Data Security Standard (DSS). With the tools that AWS provides, you can build a secure checkout service that manages the purchasing workflow from order to fulfillment.

Services: AWS Elastic Beanstalk, Amazon EC2, Amazon SWF, Amazon SES, Amazon RDS, Amazon VPC

System Overview

1 The e-commerce web front end redirects the customer to an SSL-encrypted checkout application to authenticate the customer and execute a purchase.
2 The checkout application, which is deployed by AWS Elastic Beanstalk, uses Amazon Simple Workflow Service (Amazon SWF) to authenticate the customer and trigger a new order workflow.
3 Amazon SWF coordinates all running order workflows by using SWF Deciders and SWF Workers.
4 The SWF Decider implements the workflow logic. It runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance within a private subnet that is isolated from the public Internet.
5 SWF Workers are deployed on Amazon EC2 instances within a private subnet. The EC2 instances are part of an Auto Scaling group, which can scale in and out according to demand. The Workers manage the different steps of the checkout pipeline, such as validating the order, reserving and charging the credit card, and triggering the sending of order and shipping confirmation emails.
6 SWF Workers can also be implemented on mobile devices, such as tablets or smartphones, in order to integrate pick, pack, and ship steps into the overall order workflow.
7 Amazon Simple Email Service (Amazon SES) is used to send transactional email, such as order and shipping confirmations, to the customer.
8 To provide high availability, the customer and orders databases are hosted redundantly on a Multi-AZ (multiple Availability Zone) deployment of Amazon Relational Database Service (Amazon RDS) within private subnets that are isolated from the public Internet.
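The decider's job in steps 3 through 5 reduces to choosing the next activity in the pipeline given what has already completed. The sketch below uses illustrative step names, not Amazon SWF API calls; a real decider would read the workflow history from SWF and schedule the returned step as an activity task for a worker.

```python
# The checkout decider logic, reduced to a plain state machine.
# Step names mirror the pipeline described in step 5 and are illustrative.

PIPELINE = [
    "validate_order",
    "charge_credit_card",
    "send_confirmation_email",
]

def decide(completed_steps):
    """Given the steps already completed, return the next activity or None."""
    for step in PIPELINE:
        if step not in completed_steps:
            return step
    return None  # workflow complete
```

Keeping the ordering logic in one place is the point of the Decider/Worker split: workers stay stateless and interchangeable, whether they run on EC2 instances or, as in step 6, on warehouse tablets.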
E-COMMERCE WEB SITE, PART 3: MARKETING & RECOMMENDATIONS

With Amazon Web Services, you can build a recommendation and marketing service to manage targeted marketing campaigns and offer personalized product recommendations to customers who are browsing your e-commerce site. In order to build such a service, you have to process very large amounts of data from multiple data sources. The resulting user profile information has to be available to deliver real-time product recommendations on your e-commerce website.

The insights that you gain about your customers can also be used to manage personalized marketing campaigns targeted at specific customer segments. With the tools that AWS provides, you can build highly scalable recommendation services that can be consumed by different channels, such as dynamic product recommendations on the e-commerce website or targeted email campaigns for your customers.

Services: Amazon EMR, Amazon SES, AWS Elastic Beanstalk, Amazon RDS, Amazon S3, Amazon DynamoDB

System Overview
1 Amazon Elastic MapReduce (Amazon EMR) is a hosted Hadoop framework that runs on Amazon Elastic Compute Cloud (Amazon EC2) instances. It aggregates and processes user data from server log files and from the customer's purchase history.
2 An Amazon Relational Database Service (Amazon RDS) Read Replica of the customer and order databases is used by Amazon EMR to compute user profiles and by Amazon Simple Email Service (Amazon SES) to send targeted marketing emails to customers.
3 Log files produced by the e-commerce web front end are stored on Amazon Simple Storage Service (Amazon S3) and are consumed by the Amazon EMR cluster to compute user profiles.
4 User profile information generated by the Amazon EMR cluster is stored in Amazon DynamoDB, a scalable, high-performance managed NoSQL database that can serve recommendations with low latency.
5 A recommendation web service used by the web front end is deployed by AWS Elastic Beanstalk. This service uses the profile information stored in Amazon DynamoDB to provide personalized recommendations shown on the e-commerce web front end.
6 A marketing administration application deployed by AWS Elastic Beanstalk is used by marketing managers to send targeted email campaigns to customers with specific user profiles. The application reads customer email addresses from an Amazon RDS Read Replica of the customer database.
7 Amazon SES is used to send marketing emails to customers. Amazon SES is based on the scalable technology used by Amazon websites around the world to send billions of messages a year.
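The aggregation described in steps 1 through 4 can be shown in miniature: reduce raw clickstream records into per-user profiles shaped like the items that would land in Amazon DynamoDB. The log format and field names are hypothetical; the real job runs as a parallel map/reduce flow on the EMR cluster.

```python
# Miniature of the EMR profile-building job from steps 1-4: count category
# views per user. Record shape and field names are hypothetical.

from collections import defaultdict

log_records = [
    {"user": "u1", "category": "books"},
    {"user": "u1", "category": "books"},
    {"user": "u1", "category": "garden"},
    {"user": "u2", "category": "music"},
]

def build_profiles(records):
    """Reduce raw log records into per-user category counts."""
    profiles = defaultdict(lambda: defaultdict(int))
    for r in records:
        profiles[r["user"]][r["category"]] += 1
    # Plain dicts, shaped like items to write to the DynamoDB profiles table.
    return {user: dict(counts) for user, counts in profiles.items()}
```

The recommendation web service in step 5 then only has to read these precomputed items, which is what keeps its response times low.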
FAULT TOLERANCE & HIGH AVAILABILITY

Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud. These qualities have been designed into our services, both by handling such aspects without any special action by you and by providing features that must be used explicitly and correctly.

Amazon EC2 provides infrastructure building blocks that, by themselves, may not be fault-tolerant. Hard drives may fail, power supplies may fail, and racks may fail. It is important to use combinations of the features presented in this document to achieve fault tolerance and high availability.

Services: Amazon EC2, Amazon EBS, Amazon S3, Elastic Load Balancing

[Diagram: end users reach web, application, and database servers through Elastic Load Balancing, with independent stacks in Availability Zones A and B; on failure, an instance is replaced and its Amazon EBS volume re-attached, an Elastic IP is remapped from the active to the standby application, and EBS snapshots stored in Amazon S3 support recovery; design principles: avoid unnecessary dependencies, retain the ability to fail over, and replicate the data layer]

Fault Tolerance and High Availability of Amazon Web Services

Most of the higher-level services, such as Amazon Simple Storage Service (S3), Amazon SimpleDB, Amazon Simple Queue Service (SQS), and Elastic Load Balancing (ELB), have been built with fault tolerance and high availability in mind. Services that provide basic infrastructure, such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), provide specific features, such as availability zones, elastic IP addresses, and snapshots, that a fault-tolerant and highly available system must take advantage of and use correctly. Just moving a system into the cloud doesn't make it fault-tolerant or highly available.

System Overview
1 Load balancing is an effective way to increase the availability of a system. Instances that fail can be replaced seamlessly behind the load balancer while other instances continue to operate. Elastic Load Balancing can be used to balance across instances in multiple availability zones of a region.
2 Availability zones (AZs) are distinct geographical locations that are engineered to be insulated from failures in other AZs. By placing Amazon EC2 instances in multiple AZs, an application can be protected from failure at a single location. It is important to run independent application stacks in more than one AZ, either in the same region or in another region, so that if one zone fails, the application in the other zone can continue to run. When you design such a system, you will need a good understanding of zone dependencies.
3 Elastic IP addresses are public IP addresses that can be programmatically mapped between instances within a region. They are associated with the AWS account, not with a specific instance or the lifetime of an instance. Elastic IP addresses can be used to work around host or availability zone failures by quickly remapping the address to another running instance or a replacement instance that was just started. Reserved instances can help guarantee that such capacity is available in another zone.
4 Valuable data should never be stored only on instance storage without proper backups, replication, or the ability to re-create the data. Amazon Elastic Block Store (EBS) offers persistent off-instance storage volumes that are about an order of magnitude more durable than on-instance storage. EBS volumes are automatically replicated within a single availability zone. To increase durability further, point-in-time snapshots can be created to store data on volumes in Amazon S3, which is then replicated to multiple AZs. While EBS volumes are tied to a specific AZ, snapshots are tied to the region. Using a snapshot, you can create new EBS volumes in any of the AZs of the same region. This is an effective way to deal with disk failures or other host-level issues, as well as with problems affecting an AZ. Snapshots are incremental, so it is advisable to hold on to recent snapshots.
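The Elastic IP remapping in item 3 is a single API call. Below is a sketch of the request it takes; the allocation and instance IDs are hypothetical, and with boto3 the dict would be passed to `ec2.associate_address` to move the public address to the standby in seconds.

```python
# The Elastic IP failover from item 3 as request parameters (IDs are
# hypothetical). With boto3: ec2.associate_address(**failover_params(...)).

ELASTIC_IP_ALLOCATION = "eipalloc-0123456789abcdef0"  # tied to the account,
                                                      # not to any instance

def failover_params(standby_instance_id):
    """Build the associate_address request that remaps the Elastic IP."""
    return {
        "AllocationId": ELASTIC_IP_ALLOCATION,
        "InstanceId": standby_instance_id,
        "AllowReassociation": True,  # take the address from the failed host
    }
```

Because the address belongs to the account rather than to an instance, clients keep using the same IP before and after the failover.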
FILE SYNCHRONIZATION SERVICE

Given the straightforward, stateless client-server architecture in which web services are viewed as resources and can be identified by their URLs, development teams are free to create file sharing and syncing applications for their departments, for enterprises, or for consumers directly. This diagram represents the core architecture of a scalable and cost-effective file sharing and synchronization platform using Amazon Web Services.

Services: Amazon EC2, Elastic Load Balancing, Auto Scaling, Amazon Route 53, Amazon S3, Amazon DynamoDB, AWS STS, Amazon SES

System Overview

1 The file synchronization service endpoint consists of an Elastic Load Balancer distributing incoming requests to a group of application servers hosted on Amazon Elastic Compute Cloud (Amazon EC2) instances. An Auto Scaling group automatically adjusts the number of Amazon EC2 instances depending on the application needs.
2 To upload a file, a client first needs to request permission from the service and get a security token.
3 After checking the user's identity, application servers get a temporary credential from AWS Security Token Service (STS). This credential allows users to upload files.
4 Users upload files into Amazon Simple Storage Service (Amazon S3), a highly durable storage infrastructure designed for mission-critical and primary data storage. Amazon S3 makes it easy to store and retrieve any amount of data, at any time. Large files can be uploaded by the same client using multiple concurrent threads to maximize bandwidth usage.
5 File metadata, version information, and unique identifiers are stored by the application servers in an Amazon DynamoDB table. As the number of files to maintain in the application grows, Amazon DynamoDB tables can store and retrieve any amount of data, and serve any level of traffic.
6 File change notifications can be sent via email to users following the resource with Amazon Simple Email Service (Amazon SES), an easy-to-use, cost-effective email solution.
7 Other clients sharing the same files will query the service endpoint to check if newer versions are available. This query compares the list of local file checksums with the checksums listed in an Amazon DynamoDB table. If the query finds newer files, they can be retrieved from Amazon S3 and sent to the client application.
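The version check in step 7 is a checksum comparison between the client's local index and the index the service keeps (an Amazon DynamoDB table in this architecture). A minimal sketch, with file names and contents invented for illustration:

```python
# The sync check from step 7: compare local checksums against the service's
# checksum index and list the files that must be fetched from Amazon S3.

import hashlib

def checksum(data: bytes) -> str:
    """Content fingerprint; MD5 is used here purely for illustration."""
    return hashlib.md5(data).hexdigest()

def stale_files(local_index, remote_index):
    """Return file names whose remote checksum is new or differs locally."""
    return sorted(
        name for name, digest in remote_index.items()
        if local_index.get(name) != digest
    )

# Hypothetical indexes: b.txt changed remotely, c.txt is new.
local_index = {"a.txt": checksum(b"v1"), "b.txt": checksum(b"v1")}
remote_index = {"a.txt": checksum(b"v1"), "b.txt": checksum(b"v2"),
                "c.txt": checksum(b"v1")}
```

Only the files returned by `stale_files` are downloaded from Amazon S3, so an up-to-date client exchanges nothing but a small list of checksums.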
AWS Direct
Connect
Cluster S
ubnet
10.0.1.0/24
Data Subnet
10.0.3.0/24
Customer
Gateway
CORPORATE
DATA CENTER
Grid Clie
nt Subnet
10.0.2.0/24
AVAILABILITY
ZONEVPCGateway
AmazonS3
AmazonDynamoDB
Amazon
Elastic
MapReduce
Grid
ClientGrid
Client
Grid
ControllerGrid
Controller
Application
Source DataApplication
Source Data
Amazon
Glacier
AmazonEC2
AmazonEC2
AmazonEC2
AmazonEC2
AmazonRDS
AmazonRDS
Bootstrap
Gridlib
Grid
Engine TierGrid
Engine Tier
CounterpartyData Source
CounterpartyData Source
TradeData Source
TradeData Source
MarketData Source
MarketData Source
EndUsers
Financial services grid computing on the cloud provides dynamic scalability and elasticity for operation when compute jobs are required, and utilizing services for aggregation that simplify the development of grid software.On demand provisioning of hardware, and template driven deployment, combined with low latency access to existing on-premise data sources make AWS a powerful platform for high performance grid computing systems.
System Overview
FINANCIAL SERVICES GRID COMPUTING
AWS Reference Architectures
Services: Amazon EC2, AWS Direct Connect, Amazon DynamoDB, Amazon RDS, Amazon Glacier, Amazon S3, Amazon EMR
1 Data sources for market, trade, and counterparty data are loaded on startup from on-premises data sources, or
from Amazon Simple Storage Service (Amazon S3).
2 AWS Direct Connect can be used to establish a low-latency and reliable connection between the corporate
data center and AWS, in 1 Gbit/s or 10 Gbit/s increments. For situations with lower bandwidth requirements, a VPN connection to the VPC gateway can be established.
3 Private subnetworks are specifically created for customer source data, compute grid clients, and the
grid controller and engines.
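The subnet layout in step 3 can be sanity-checked with Python's `ipaddress` module. The three /24 CIDRs come from the diagram; the enclosing 10.0.0.0/16 VPC block is an assumption, since the poster does not state the VPC CIDR:

```python
import ipaddress

# Assumed VPC block; the diagram only shows the /24 subnets.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "cluster": ipaddress.ip_network("10.0.1.0/24"),      # grid controller and engines
    "grid_client": ipaddress.ip_network("10.0.2.0/24"),  # compute grid clients
    "data": ipaddress.ip_network("10.0.3.0/24"),         # application source data
}

# Every subnet must sit inside the VPC block...
assert all(net.subnet_of(vpc) for net in subnets.values())

# ...and no two subnets may overlap.
names = list(subnets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not subnets[a].overlaps(subnets[b]), f"{a} overlaps {b}"
```

Running the checks at template-deployment time catches CIDR clashes before any instance is launched.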
4 Application and corporate data can be securely stored in the cloud using the Amazon Relational Database
Service (Amazon RDS).
5 Grid controllers and grid engines run on Amazon Elastic Compute Cloud (Amazon EC2) instances
started on demand from Amazon Machine Images (AMIs) that contain the operating system and grid software.
6 Static data, such as holiday calendars and QA libraries, and additional gridlib bootstrap data can be
downloaded on startup by grid engines from Amazon S3.
7 Grid engine results can be stored in Amazon DynamoDB, a fully managed database providing
configurable read and write throughput, allowing scalability on demand.
8 Results in Amazon DynamoDB are aggregated using a map/reduce job in Amazon Elastic MapReduce
(Amazon EMR) and final output is stored in Amazon S3.
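The aggregation in step 8 has the classic map/reduce shape; this pure-Python sketch stands in for the Amazon EMR job (the record fields `book` and `pv` are hypothetical, not a published schema):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (book, present_value) pairs from raw grid-engine results.
    A real job would read these records from the Amazon DynamoDB table."""
    for rec in records:
        yield rec["book"], rec["pv"]

def reduce_phase(pairs):
    """Reduce: sum present values per book, mimicking the EMR aggregation
    whose final output would land in Amazon S3."""
    totals = defaultdict(float)
    for book, pv in pairs:
        totals[book] += pv
    return dict(totals)

results = [
    {"book": "rates", "pv": 10.0},
    {"book": "fx", "pv": -2.5},
    {"book": "rates", "pv": 4.0},
]
aggregates = reduce_phase(map_phase(results))
```

On EMR the same two functions would run as mapper and reducer steps over many engines' outputs in parallel.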
9 The compute grid client collects aggregate results from Amazon S3.
10 Aggregate results can be archived using Amazon Glacier, a low-cost, secure, and durable storage service.
[Architecture diagram: data flows between Amazon EC2, Amazon S3, and Amazon EBS. Annotated paths: high-throughput parallel upload into S3 (or AWS Import/Export); read/write data from S3 using HTTP or a FUSE layer; download and share results from S3 buckets; alternates of uploading into EC2/EBS, using EBS for staging, temporary, or result storage, downloading results from EBS, and sharing results via snapshots.]
System Overview
LARGE SCALE COMPUTING & HUGE DATA SETS
Amazon Web Services is very popular for large-scale computing scenarios such as scientific computing, simulation, and research projects. These scenarios involve huge data sets collected from scientific equipment, measurement devices, or other compute jobs. After collection, these data sets need to be analyzed by large-scale compute jobs to generate result data sets. Ideally, results will be available as soon as the data is collected. Often, these results are then made available to a larger audience.
Services: Amazon EC2, Amazon EBS, Amazon S3, AWS Import/Export
1 To upload large data sets into AWS, it is critical to make the most of the available bandwidth. You can do so by
uploading data into Amazon Simple Storage Service (Amazon S3) in parallel from multiple clients, each using multithreading to enable concurrent uploads, or multipart uploads for further parallelization. TCP settings like window scaling and selective acknowledgement can be adjusted to further enhance throughput. With the proper optimizations, uploads of several terabytes a day are possible. Another alternative for huge data sets is AWS Import/Export, which supports sending storage devices to AWS and inserting their contents directly into Amazon S3 or Amazon EBS volumes.
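The parallel multipart upload described above can be sketched without touching the network. Here an in-memory dict stands in for the S3 endpoint; a real client would call the multipart-upload API for each part:

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # S3 multipart uploads require non-final parts of at least 5 MiB

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Split a payload into numbered parts, as a multipart upload would."""
    return [
        (number, data[offset:offset + part_size])
        for number, offset in enumerate(range(0, len(data), part_size), start=1)
    ]

def parallel_upload(data: bytes, part_size: int = PART_SIZE, workers: int = 4) -> dict:
    """Upload parts concurrently. `store` stands in for the S3 bucket;
    the actual network call is elided."""
    store = {}
    def upload(part):
        number, body = part
        store[number] = body
        return number
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upload, split_into_parts(data, part_size)))
    return store

payload = b"x" * (2 * 1024 + 10)
parts = parallel_upload(payload, part_size=1024)  # small part size, for illustration only
reassembled = b"".join(parts[n] for n in sorted(parts))
```

Because each part is independent, throughput scales with the number of workers until the link is saturated, which is exactly why multipart upload helps on high-bandwidth connections.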
2 Parallel processing of large-scale jobs is critical, and existing parallel applications can typically be run on
multiple Amazon Elastic Compute Cloud (Amazon EC2) instances. A parallel application may sometimes assume large scratch areas that all nodes can efficiently read from and write to. Amazon S3 can be used as such a scratch area, either directly over HTTP or through a FUSE layer (for example, s3fs or SubCloud) if the application expects a POSIX-style file system.
3 Once the job has completed and the result data is stored in Amazon S3, Amazon EC2 instances can be
shut down, and the result data set can be downloaded. The
output data can be shared with others, either by granting read permissions to select users or to everyone, or by using time-limited URLs.
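Time-limited URLs work by embedding an expiry time and a signature over it, so the server can reject the link once it lapses. This stdlib sketch shows the general idea; S3 presigned URLs use the same principle with the (more involved) Signature Version 4 scheme, and the key and URL here are made up:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"hypothetical-signing-key"  # stands in for the AWS secret access key

def make_time_limited_url(base_url: str, expires_in: int, now: int = None) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature over it."""
    expires = int(now if now is not None else time.time()) + expires_in
    sig = hmac.new(SECRET, f"{base_url}?expires={expires}".encode(), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires, 'signature': sig})}"

def is_url_valid(url: str, now: int = None) -> bool:
    """Recompute the signature and check the expiry; tampered or expired URLs fail."""
    base, _, query = url.partition("?")
    params = dict(pair.split("=") for pair in query.split("&"))
    expires = int(params["expires"])
    expected = hmac.new(SECRET, f"{base}?expires={expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, params.get("signature", "")):
        return False
    return (now if now is not None else time.time()) < expires
```

Anyone holding the URL can download until the expiry passes; nobody without the secret can extend it, because changing `expires` invalidates the signature.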
4 Instead of using Amazon S3, you can use Amazon EBS to stage the input set, act as a temporary storage
area, and/or capture the output set. During the upload, the same concepts of parallel upload streams and TCP tuning apply. In addition, uploads that use UDP may increase speed further. The result data set can be written into Amazon EBS volumes, at which time snapshots of the volumes can be taken for sharing.
Since most businesses today have limited manpower, budget, and data center space, AWS offers a unique set of opportunities to compete and scale without having to invest in hardware, staff, or additional data center space. Utilizing AWS is not an all-or-nothing proposition. Depending on the project, different services can be used independently. This diagram shows an example of a highly available, durable, and cost-effective media sharing and processing platform.
System Overview
MEDIA SHARING
Services: Amazon EC2, Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Auto Scaling, Amazon S3,
Media sharing is one of the hottest markets on the Internet. Customers have a staggering appetite for placing photos and videos on social networking sites, and for sharing their media in custom online photo albums. The growing popularity of media sharing means scaling problems for site owners, who face ever-increasing storage and bandwidth requirements and increased go-to-market pressure to deliver faster than the competition.
Amazon RDS, Amazon SQS, Amazon EC2 Spot
[Architecture diagram: Amazon Route 53 resolves DNS; Elastic Load Balancing routes uploads to an Auto Scaling group of upload web servers on Amazon EC2; a job queue on Amazon SQS feeds a media processing subsystem of Auto Scaling processing pipelines, including Spot Instances; Amazon S3 serves as media files repository and data store; the media distribution subsystem delivers content through an edge location (Paris) of the Amazon CloudFront content delivery network, with Elastic Load Balancing in front of the website web servers.]
1 Sharing content first involves uploading media files to the online service. In this configuration, an Elastic
Load Balancer distributes incoming network traffic to upload servers, a dynamic fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon CloudWatch monitors these servers and an Auto Scaling group manages them, automatically scaling EC2 capacity up or down based on load. In this example, a separate endpoint to receive media uploads was created in order to off-load this task from the website's servers.
5 Once processing is completed, Amazon S3 stores the output files. Original files can be stored with high
durability. Processed files could use reduced redundancy.
2 Original uploaded files are stored in Amazon Simple Storage Service (Amazon S3), a highly available and
durable storage service.
4 The processing pipeline is a dedicated group of Amazon EC2 instances used to execute any kind of
post-processing task on the uploaded media files (video transcoding, image resizing, etc.). To automatically adjust the needed capacity, Auto Scaling manages this group. You can use Spot Instances to dynamically extend the capacity of the group and to significantly reduce the cost of file processing.
3 To submit a new file to be processed, upload web servers push a message into an Amazon Simple
Queue Service (Amazon SQS) queue. This queue acts as a communication pipeline between the file reception and file processing components.
6 Media-related data can be put in a relational database
like Amazon Relational Database Service (Amazon RDS) or in a key-value store like Amazon SimpleDB.
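The queue hand-off in step 3 can be sketched with Python's `queue` module standing in for Amazon SQS; the message body fields are hypothetical, and a real worker would long-poll SQS instead of reading a local queue:

```python
import queue
import threading

job_queue = queue.Queue()   # stands in for the Amazon SQS job queue
processed = []

def upload_server(file_names):
    """Upload web server: push one message per received media file."""
    for name in file_names:
        job_queue.put({"file": name, "task": "transcode"})  # hypothetical message body

def processing_worker():
    """Processing-pipeline instance: handle messages until a sentinel arrives.
    The actual transcoding/resizing work is elided."""
    while True:
        message = job_queue.get()
        if message is None:
            break
        processed.append(message["file"])
        job_queue.task_done()

worker = threading.Thread(target=processing_worker)
worker.start()
upload_server(["cat.mp4", "dog.mp4"])
job_queue.put(None)  # sentinel: no more uploads
worker.join()
```

Decoupling reception from processing this way is what lets the two fleets scale independently: the queue absorbs upload bursts while Auto Scaling adds processing capacity.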
7 A third fleet of EC2 instances is dedicated to host the website front-end of the media sharing service.
8 Media files are distributed from Amazon S3 to the end user via Amazon CloudFront, a content delivery
network. Amazon CloudFront offers low-latency delivery through a worldwide network of edge locations.
Online games back-end infrastructures can be challenging to maintain and operate. Peak usage periods, multiple players, and high volumes of write operations are some of the most common problems that operations teams face. But the most difficult challenge is ensuring flexibility in the scale of that system. A popular game might suddenly receive millions of users in a matter of hours, yet it must continue to provide a satisfactory player experience. Amazon Web Services provides different tools and services that can be used for building online games that scale under high usage traffic patterns. This document presents a cost-effective online game architecture featuring automatic capacity adjustment, a highly available and high-speed database, and a data processing cluster for player behavior analysis.
System Overview
ONLINE GAMES
Services: Amazon EC2, Elastic Load Balancing, Amazon DynamoDB, Amazon EMR, Auto Scaling, Amazon S3, Amazon SES, Amazon Route 53
1 Browser games can be represented as client-server applications. The client generally consists of static files,
such as images, sounds, flash applications, or Java applets. Those files are hosted on Amazon Simple Storage Service (Amazon S3), a highly available and reliable data store.
5 Log files generated by each web server are pushed back into Amazon S3 for long-term storage.
2 As the user base grows and becomes more geographically distributed, a high-performance cache
like Amazon CloudFront can provide substantial improvements in latency, fault tolerance, and cost. By using Amazon S3 as the origin server for the Amazon CloudFront distribution, the game infrastructure benefits from fast network data transfer rates and a simple publishing/caching workflow.
3 Requests from the game application are distributed by Elastic Load Balancing to a group of web servers
running on Amazon Elastic Compute Cloud (Amazon EC2) instances. Auto Scaling automatically adjusts the size of this group, depending on rules like network load, CPU usage, and so on.
4 Player data is persisted on Amazon DynamoDB, a fully managed NoSQL database service. As the player
population grows, Amazon DynamoDB provides predictable performance with seamless scalability.
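Player-state writes in step 4 are commonly guarded with conditional updates so that concurrent sessions cannot overwrite each other. This dict-backed sketch mimics a DynamoDB `PutItem` with a condition on a version attribute; the item layout and version scheme are assumptions for illustration:

```python
class ConditionalWriteError(Exception):
    """Raised when the stored version no longer matches, like a DynamoDB
    ConditionalCheckFailedException."""

player_table = {}  # stands in for the Amazon DynamoDB player table

def put_player(player_id: str, state: dict, expected_version: int):
    """Write player state only if the stored version matches `expected_version`,
    then bump the version, mirroring an optimistic-locking conditional put."""
    current = player_table.get(player_id, {"version": 0})
    if current["version"] != expected_version:
        raise ConditionalWriteError(f"stale write for {player_id}")
    player_table[player_id] = {"version": expected_version + 1, **state}

put_player("p1", {"score": 10}, expected_version=0)
put_player("p1", {"score": 25}, expected_version=1)
```

A session that lost the race simply re-reads the item, reapplies its change, and retries, which keeps high-volume write workloads correct without table-wide locks.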
6 Managing and analyzing high data volumes produced by online games platforms can be challenging. Amazon
Elastic MapReduce (Amazon EMR) is a service that processes vast amounts of data easily. Input data can be retrieved from web server logs stored on Amazon S3 or from player data stored in Amazon DynamoDB tables to run analytics on player behavior, usage patterns, etc. Those results can be stored again on Amazon S3, or inserted in a relational database for further analysis with classic business intelligence tools.
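The analytics in step 6 can be prototyped locally before moving to Amazon EMR. This pure-Python job counts event types across hypothetical web server log lines (the log format is an assumption):

```python
from collections import Counter
from itertools import chain

log_lines = [  # hypothetical web server log entries: date, player, event
    "2013-01-07 p1 LOGIN",
    "2013-01-07 p2 LOGIN",
    "2013-01-07 p1 LEVEL_UP",
    "2013-01-08 p1 LOGIN",
]

def mapper(line):
    """Map: emit (event, 1) for each log line."""
    _, _, event = line.split()
    yield event, 1

def reducer(pairs):
    """Reduce: sum the counts per event type, as the EMR job would."""
    counts = Counter()
    for event, n in pairs:
        counts[event] += n
    return dict(counts)

event_counts = reducer(chain.from_iterable(mapper(line) for line in log_lines))
```

The same mapper/reducer pair, pointed at logs in Amazon S3, is the kind of job EMR parallelizes across a cluster; the output could then feed a relational database for BI tools.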
7 Based on the needs of the game, Amazon Simple Email Service (Amazon SES) can be used to send
email to players in a cost-effective and scalable way.
[Architecture diagram: DNS resolution for www.mygame.com through Amazon Route 53; the Amazon CloudFront content delivery network sits in front of an Amazon S3 files repository holding the game client files (flash, applet, ...); game interaction (status, JSON, ...) flows through Elastic Load Balancing to Auto Scaling web servers on Amazon EC2; the game database runs on Amazon DynamoDB; log files are pushed to Amazon S3 and analyzed with Amazon Elastic MapReduce; Amazon SES acts as the players' email emitter.]
This elasticity is achieved by using Auto Scaling groups for ingest processing, AWS Data Pipeline for scheduling Amazon Elastic MapReduce jobs and for intersystem data orchestration, and Amazon Redshift for potentially massive-scale analysis. Key architectural throttle points, involving Amazon SQS for sensor message buffering and less frequent AWS Data Pipeline scheduling, keep the overall solution costs predictable and controlled.
System Overview
TIME SERIES PROCESSING
Services: Amazon EC2, Amazon EMR, Amazon DynamoDB, AWS Data Pipeline, Auto Scaling, Amazon S3, Amazon SQS, Amazon EC2 Spot
When data arrives as a succession of regular measurements, it is known as time series information. Processing of time series information poses systems scaling challenges that the elasticity of AWS services is uniquely positioned to address.
2 Send messages to an Amazon Simple Queue Service (Amazon SQS) queue for processing into Amazon DynamoDB using
Auto Scaling Amazon EC2 workers. Or, if the sensor source can do so, post sensor samples directly to Amazon DynamoDB. Try starting with a week-oriented, time-based DynamoDB table structure.
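A week-oriented table structure shards samples by ISO week so that each week's data can be exported and dropped as a unit. This sketch derives a table name and item key for a sample; the naming convention and key attributes are assumptions:

```python
from datetime import datetime, timezone

def week_table_for(ts: datetime) -> str:
    """Name the DynamoDB table after the sample's ISO year and week,
    so each week's samples land in their own table."""
    year, week, _ = ts.isocalendar()
    return f"samples_{year}_w{week:02d}"

def item_key(sensor_id: str, ts: datetime) -> dict:
    """Hash key on the sensor, range key on the timestamp: a common
    time-series layout for DynamoDB."""
    return {"sensor_id": sensor_id, "ts": ts.isoformat()}

sample_time = datetime(2013, 1, 7, 12, 30, tzinfo=timezone.utc)
table = week_table_for(sample_time)
key = item_key("meter-42", sample_time)
```

Rotating a whole week out of service is then a table export plus a table delete, which is far cheaper than deleting items row by row.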
3 If a Supervisory Control and Data Acquisition (SCADA) system exists, create a flow of samples to or from
Amazon DynamoDB to support additional cloud processing or other existing systems, respectively.
4 Using AWS Data Pipeline, create a pipeline with a regularly scheduled Amazon Elastic MapReduce job that both
performs expensive sample processing and delivers the samples and results.
7 The pipeline also optionally exports results in a format custom applications can accept.
[Architecture diagram: remote sensors send sampled data messages to Amazon SQS; Auto Scaling worker nodes on Amazon EC2 write samples into Amazon DynamoDB, which also exchanges data with a SCADA system in the corporate data center; AWS Data Pipeline drives Amazon Elastic MapReduce (with EC2 Spot Instances) over exports in Amazon S3 and loads results into Amazon Redshift, serving a custom application and business intelligence users.]
5 The pipeline places results into Amazon Redshift for additional analysis.
8 Amazon Redshift optionally imports historic samples to reside with calculated results.
9 Using in-house or Amazon partner business intelligence solutions, Amazon Redshift supports
additional analysis on a potentially massive scale.
1 Remote devices such as power meters, mobile clients, ad-network clients, industrial meters, satellites, and
environmental meters measure the world around them and send sampled sensor data as messages via HTTP(S) for processing.
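A device in step 1 would serialize each sample into a small JSON body and POST it over HTTP(S); this sketch shows only the message construction, and the field names are assumptions rather than a published schema:

```python
import json
from datetime import datetime, timezone

def sensor_message(sensor_id: str, value: float, unit: str, sampled_at: datetime) -> str:
    """Serialize one sample as the JSON body a device would POST for ingest.
    sort_keys keeps the wire format stable across devices."""
    return json.dumps({
        "sensor_id": sensor_id,
        "value": value,
        "unit": unit,
        "sampled_at": sampled_at.isoformat(),
    }, sort_keys=True)

msg = sensor_message("meter-42", 230.4, "volts",
                     datetime(2013, 1, 7, 12, 0, tzinfo=timezone.utc))
```

Keeping the payload small and self-describing lets the ingest tier forward it to Amazon SQS unchanged, so the workers can parse it later at their own pace.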
6 The pipeline exports historical week-oriented sample tables from Amazon DynamoDB to
Amazon Simple Storage Service (Amazon S3).