![Page 1: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
DAT203 - AWS Storage and Database Architecture Best Practices
Siva Raghupathy, Amazon Web Services
![Page 2: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/2.jpg)
The Third Platform
• Built on:
– Mobile devices
– Cloud services
– Social technologies
– Big data
• Billions of users
• Millions of apps
![Page 3: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/3.jpg)
Data Volume, Velocity, Variety
• 2.7 zettabytes (ZB) of data exists in the digital universe today
– 1 ZB = 1 billion terabytes
• 450 billion transactions per day by 2020
• More unstructured data than structured data
![Page 4: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/4.jpg)
Common Questions from Database Developers
Cloud Migration
• How do I move (my data) to the cloud?
Data/Storage Technologies
• What data store should I use?
– SQL or NoSQL?
– Hadoop or DW?
– What about search?
Management Concerns
• Is my data (in the cloud) secure?
• Relational features w/o management nightmares?
• My data volume, velocity, and variety are exploding!
• How can I reduce cost?
Performance and Delivery
• Need low latency (ms or µs)
• Need high throughput
• Need to ship in days – not years!
![Page 5: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/5.jpg)
Cloud Data Tier Anti-Pattern
Data Tier
![Page 6: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/6.jpg)
Cloud Data Tier Architecture – Use the Right Tool for the Job!
App/Web Tier
Client Tier
Data Tier
Search
Hadoop
Cache
ETL
Blob Store
SQL
NoSQL
Data Warehouse
![Page 7: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/7.jpg)
![Page 8: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/8.jpg)
Compute Storage
AWS Global Infrastructure
Database
App Services
Deployment & Administration
Networking
AWS
![Page 9: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/9.jpg)
AWS Managed Database & Storage Services
Structured – Complex Query
• SQL – Amazon RDS (MySQL, Oracle, SQL Server)
• Data Warehouse – Amazon Redshift
• Search – Amazon CloudSearch
Unstructured – Custom Query
• Hadoop – Amazon Elastic MapReduce (EMR)
Structured – Simple Query
• NoSQL – Amazon DynamoDB
• Cache – Amazon ElastiCache (Memcached, Redis)
Unstructured – No Query
• Cloud Storage – Amazon S3, Amazon Glacier
![Page 10: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/10.jpg)
AWS Primitive Compute and Storage
Compute Capabilities
• Many different EC2 instance types
– General purpose
– Compute optimized
– Storage optimized
– Memory optimized
• Host any major data storage technology
– RDBMS
– NoSQL
– Cache
Raw Storage Options
• EC2 Instance store (ephemeral)
• Amazon Elastic Block Store (EBS)
– Standard volume: 1 TB, ~100 IOPS per volume
– Provisioned IOPS volume: 1 TB, up to 4000 IOPS per volume
– Stripe multiple volumes for higher IOPS or storage
Primitives add flexibility, but also come with operational burden!
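As a rough sizing sketch for the striping bullet on this slide, the following Python estimates how many Provisioned IOPS volumes a stripe set would need, using the per-volume figures quoted here (1 TB, up to 4000 IOPS). It assumes IOPS and capacity add up linearly across striped volumes, which is an idealization; the function name and the example workload are illustrative, not from the talk.

```python
import math

# Per-volume limits as quoted on the slide (2013 figures).
PIOPS_VOLUME_MAX_IOPS = 4000
VOLUME_MAX_TB = 1


def volumes_to_stripe(target_iops: int, target_tb: float) -> int:
    """Return the number of volumes needed to meet both targets.

    Assumes a RAID0-style stripe where IOPS and capacity scale linearly
    with the number of volumes (an idealization).
    """
    for_iops = math.ceil(target_iops / PIOPS_VOLUME_MAX_IOPS)
    for_capacity = math.ceil(target_tb / VOLUME_MAX_TB)
    return max(1, for_iops, for_capacity)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 IOPS and 2.5 TB of data.
    print(volumes_to_stripe(10_000, 2.5))  # 3
```

Whichever target needs more volumes drives the answer, so a small but IOPS-hungry database can still need several volumes.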
![Page 11: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/11.jpg)
AWS Data Tier Architecture - Use the right tool for the job!
Data Tier
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Glacier
Amazon Redshift
AWS Data Pipeline
![Page 12: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/12.jpg)
Reference Architecture
![Page 13: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/13.jpg)
Reference Architecture
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon EMR
Amazon S3
Amazon Glacier
AWS Data Pipeline
Amazon Redshift
![Page 14: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/14.jpg)
Use Case: A Video Streaming Application
![Page 15: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/15.jpg)
Use Case: A Video Streaming App – Upload
Amazon DynamoDB
Amazon RDS
Amazon CloudSearch
Amazon S3
![Page 16: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/16.jpg)
A Video Streaming App – Discovery
Amazon Glacier
Amazon ElastiCache
CloudFront
Amazon DynamoDB
Amazon RDS
Amazon CloudSearch
Amazon S3
![Page 17: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/17.jpg)
Use Case: A Video Streaming App – Recs
Amazon S3
Amazon Glacier
Amazon DynamoDB
Amazon EMR
![Page 18: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/18.jpg)
Use Case: A Video Streaming App – Analytics
Amazon EMR
Amazon S3
Amazon Glacier
Amazon Redshift
![Page 19: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/19.jpg)
What is the temperature of your data?
![Page 20: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/20.jpg)
Data Characteristics: Hot, Warm, Cold
| | Hot | Warm | Cold |
|---|---|---|---|
| Volume | MB–GB | GB–TB | PB |
| Item size | B–KB | KB–MB | KB–TB |
| Latency | ms | ms, sec | min, hrs |
| Durability | Low–High | High | Very High |
| Request rate | Very High | High | Low |
| Cost/GB | $$-$ | $-¢¢ | ¢ |
![Page 21: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/21.jpg)
[Figure: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon EMR, Amazon S3, and Amazon Glacier positioned along these axes: Request rate (High → Low), Cost/GB (High → Low), Latency (Low → High), Data Volume (Low → High), Structure (Low, High)]
![Page 22: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/22.jpg)
What data store should I use?
| | ElastiCache | Amazon DynamoDB | Amazon RDS | CloudSearch | Amazon Redshift | Amazon EMR (Hive) | Amazon S3 | Amazon Glacier |
|---|---|---|---|---|---|---|---|---|
| Average latency | ms | ms | ms, sec | ms, sec | sec, min | sec, min, hrs | ms, sec, min (~ size) | hrs |
| Data volume | GB | GB–TBs (no limit) | GB–TB (3 TB max) | GB–TB | TB–PB (1.6 PB max) | GB–PB (~nodes) | GB–PB (no limit) | GB–PB (no limit) |
| Item size | B–KB | KB (64 KB max) | KB (~row size) | KB (1 MB max) | KB (64 K max) | KB–MB | KB–GB (5 TB max) | GB (40 TB max) |
| Request rate | Very High | Very High | High | High | Low | Low | Low–Very High (no limit) | Very Low (no limit) |
| Storage cost $/GB/month | $$ | ¢¢ | ¢¢ | $ | ¢ | ¢ | ¢ | ¢ |
| Durability | Low–Moderate | Very High | High | High | High | High | Very High | Very High |

Hot Data → Warm Data → Cold Data
![Page 23: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/23.jpg)
AWS Data Tier Architecture - Use the right tool for the job!
Data Tier
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Glacier
Amazon Redshift
AWS Data Pipeline
![Page 24: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/24.jpg)
Cost Conscious Design
![Page 25: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/25.jpg)
Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase
my team’s use of Amazon S3. Hoping you could answer
some questions. The current iteration of the design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
| Request rate (Writes/sec) | Object size (Bytes) | Total size (GB/month) | Objects per month |
|---|---|---|---|
| 300 | 2,048 | 1,483 | 777,600,000 |
![Page 26: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/26.jpg)
Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB?
![Page 27: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/27.jpg)
| Request rate (Writes/sec) | Object size (Bytes) | Total size (GB/month) | Objects per month |
|---|---|---|---|
| 300 | 2,048 | 1,483 | 777,600,000 |
Amazon S3 or Amazon DynamoDB?
![Page 28: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/28.jpg)
| | Request rate (Writes/sec) | Object size (Bytes) | Total size (GB/month) | Objects per month |
|---|---|---|---|---|
| Scenario 1 | 300 | 2,048 | 1,483 | 777,600,000 |
| Scenario 2 | 300 | 32,768 | 23,730 | 777,600,000 |
Amazon S3
Amazon DynamoDB
use
use
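The figures in the sizing table can be reproduced from the request rate and object size alone. The sketch below assumes a 30-day month and binary gigabytes (2^30 bytes); those two assumptions are not stated on the slides, but they yield the same numbers. The function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30          # assumption: a 30-day month
BYTES_PER_GB = 2 ** 30       # assumption: binary gigabytes


def monthly_footprint(writes_per_sec: int, object_bytes: int) -> tuple[int, float]:
    """Return (objects per month, total size in GB per month)."""
    objects = writes_per_sec * SECONDS_PER_DAY * DAYS_PER_MONTH
    total_gb = objects * object_bytes / BYTES_PER_GB
    return objects, total_gb


if __name__ == "__main__":
    for name, size in (("Scenario 1", 2_048), ("Scenario 2", 32_768)):
        objects, total_gb = monthly_footprint(300, size)
        print(f"{name}: {objects:,} objects/month, {total_gb:,.0f} GB/month")
    # Scenario 1: 777,600,000 objects/month, 1,483 GB/month
    # Scenario 2: 777,600,000 objects/month, 23,730 GB/month
```

The object count is identical in both scenarios because only the object size changes; the request volume is the same, while the stored bytes grow 16-fold.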
![Page 29: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/29.jpg)
Best Practices
![Page 30: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/30.jpg)
When to use
• Transactions
• Complex queries
• Medium to high query/write rate
– Up to 30 K IOPS (15 K reads + 15 K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability
When not to use
• Massive read/write rates
– Example: 150 K write requests per second
• Data size or throughput demands sharding
– Example: 10s or 100s of terabytes
• Simple Get/Put and queries that a NoSQL store can handle
• Complex analytics
Read Replicas Push-Button Scaling
Region
Multi-AZ
AZ 1 AZ 2
Amazon RDS
![Page 31: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/31.jpg)
Amazon RDS Best Practices
• Use the right DB instance class
• Use EBS-optimized instances
– db.m1.large, db.m1.xlarge, db.m2.2xlarge, db.m2.4xlarge, db.cr1.8xlarge
• Use provisioned IOPS
• Use multi-AZ for high availability
• Use read replicas for
– Scaling reads
– Schema changes
– Additional failure recovery
![Page 32: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/32.jpg)
When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries
When not to use
• Need multi-item/row or cross table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
Amazon DynamoDB
![Page 33: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/33.jpg)
Amazon DynamoDB Best Practices
• Keep item size small
• Store metadata in Amazon DynamoDB and large blobs in Amazon S3
• Use a table with a hash key for extremely high scale
• Use table per day, week, month etc. for storing time series data
• Use conditional/OCC updates
• Use hash-range key to model
– 1:N relationships
– Multi-tenancy
• Avoid hot keys and hot partitions
Events_table_2012: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N
Events_table_2012_05_week1: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N
Events_table_2012_05_week2: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N
Events_table_2012_05_week3: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N
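A minimal sketch of the table-per-period idea for time series data: route each event to a table whose name is derived from its timestamp, following the `Events_table_2012_05_week1` naming shown on this slide. The week-of-month rule used here (days 1-7 are week 1, and so on) and the function name are assumptions; the slide does not define them.

```python
from datetime import datetime


def events_table_for(ts: datetime, prefix: str = "Events_table") -> str:
    """Map a timestamp to a per-week table name.

    Week numbering is a simple convention chosen for this sketch
    (days 1-7 are week 1, days 8-14 are week 2, and so on).
    """
    week_of_month = (ts.day - 1) // 7 + 1
    return f"{prefix}_{ts.year:04d}_{ts.month:02d}_week{week_of_month}"


if __name__ == "__main__":
    print(events_table_for(datetime(2012, 5, 3)))   # Events_table_2012_05_week1
    print(events_table_for(datetime(2012, 5, 17)))  # Events_table_2012_05_week3
```

Writers and readers compute the same name from the timestamp, so no lookup table is needed. The usual motivation for this layout is that only the current table needs high write throughput, while older tables can be scaled down or archived.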
![Page 34: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/34.jpg)
When to use
• Transient key-value store
• Need to speed up reads/writes
• Caching frequent SQL, NoSQL or DW query results
• Saving transient and frequently updated data
– Increment/decrement game scores/counters
– Web application session storage
• Best effort deduplication
When not to use
• Store infrequently used data
• Need persistence
Amazon ElastiCache (Memcached)
![Page 35: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/35.jpg)
Amazon ElastiCache (Memcached) Best Practices
• Use autodiscovery
• Share memcached client objects in application
• Use TTLs
• Consider memory needed for connection overhead
• Use Amazon CloudWatch alarms / SNS alerts
– Number of connections
– Swap memory usage
– Freeable memory
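To illustrate the "Use TTLs" bullet together with caching of query results, here is a small cache-aside sketch. `TTLCache` is an in-memory stand-in written for this example, not a Memcached client; a real deployment would call a Memcached client library against the ElastiCache endpoint. All names here are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Memcached client (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Store the value together with its absolute expiry time.
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._items[key]
            return None
        return value


def cached_query(cache: TTLCache, key: str,
                 load: Callable[[], Any], ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: return the cached value, else load and cache it.

    A loaded value of None is treated as a miss and is not cached.
    """
    value = cache.get(key)
    if value is None:
        value = load()
        if value is not None:
            cache.set(key, value, ttl_seconds)
    return value


if __name__ == "__main__":
    cache = TTLCache()
    rows = cached_query(cache, "top_videos", lambda: ["v1", "v2"], ttl_seconds=30)
    print(rows)
```

The TTL bounds how stale a cached result can get, so the backing store is re-queried at most once per TTL window for each key.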
![Page 36: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/36.jpg)
When to use
• Key-value store with advanced data structures
– Strings, lists, sets, sorted sets, hashes
• Caching
• Leader boards
• High-speed sorting
• Atomic counters
• Queuing systems
• Activity streams
When not to use
• Need “native” sharding or scale-out
• Need “hard” persistence
• Data won’t fit in memory
• Need transaction rollback even under exceptions
Amazon ElastiCache (Redis)
![Page 37: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/37.jpg)
Amazon ElastiCache (Redis) Best Practices
• Use TTL
• Use the right instance types
– Instances with high ECU/vCPU and network performance yield the highest throughput. Example: m2.4xlarge, m2.2xlarge
• Use read replicas
– Increase read throughput
– AOF cannot protect against all failure modes
– Promote read replicas to primary
• Use RDB file snapshot for on-premises to Amazon ElastiCache migration
• Key parameter group settings
– Avoid “AOF with fsync always” – huge impact on performance
– AOF (+ RDB) with fsync everysec – best durability + performance
– Pub-sub: set client-output-buffer-limit-pubsub-hard-limit and client-output-buffer-limit-pubsub-soft-limit based on the workloads
![Page 38: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/38.jpg)
When to use
• No search expertise
• Full-text search
• Ranking
• Relevance
• Structured and unstructured data
• Faceting
– $0 to $10 (4 items)
– $10 and above (3 items)
When not to use
• Not as replacement for a database
– Not as a system of record
– Transient data
– Nonatomic updates
Amazon CloudSearch
![Page 39: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/39.jpg)
• Batch documents for uploading
• Use Amazon CloudSearch for searching and another store for retrieving full records for the UI (i.e., don’t use return fields)
• Include other data like popularity scores in documents
• Use stop words to remove common terms
• Use fielded queries to reduce match sets
• Query latency depends on query specificity (the size of the match set)
Amazon CloudSearch Best Practices
![Page 40: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/40.jpg)
When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g., daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column based
• Very high durability
When not to use
• OLTP workloads
– 1000s of concurrent users
– Large number of singleton updates
Amazon Redshift
![Page 41: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/41.jpg)
Amazon Redshift Best Practices
• Use COPY command to load large data sets from Amazon S3, Amazon DynamoDB, Amazon EMR/EC2/Unix/Linux hosts
– Split your data into multiple files
– Use GZIP or LZOP compression
– Use manifest file
• Choose proper sort key
– Range or equality on WHERE clause
• Choose proper distribution key
– Join column, foreign key or largest dimension, group by column
– Avoid distribution key for denormalized data
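The COPY bullets above (split the data, compress it, use a manifest) can be sketched as a small preparation step. The code below splits rows into several gzip-compressed parts and builds a manifest in the JSON shape Redshift expects (an `entries` list of `url` and `mandatory` pairs). The bucket name, key prefix, part naming, and function names are hypothetical, and uploading the parts to Amazon S3 is left out.

```python
import gzip
import json
from typing import Dict, List


def split_and_compress(lines: List[str], parts: int) -> List[bytes]:
    """Split lines round-robin into `parts` gzip-compressed chunks."""
    buckets: List[List[str]] = [[] for _ in range(parts)]
    for i, line in enumerate(lines):
        buckets[i % parts].append(line)
    return [gzip.compress("".join(b).encode("utf-8")) for b in buckets]


def build_manifest(bucket: str, prefix: str, parts: int) -> Dict:
    """Build a manifest listing one S3 URL per part."""
    entries = [
        {"url": f"s3://{bucket}/{prefix}.part{i:03d}.gz", "mandatory": True}
        for i in range(parts)
    ]
    return {"entries": entries}


if __name__ == "__main__":
    # Hypothetical pipe-delimited rows; each line keeps its newline.
    rows = [f"{i}|item{i}\n" for i in range(10)]
    chunks = split_and_compress(rows, parts=4)
    manifest = build_manifest("example-bucket", "sales/2013-11-14", parts=4)
    print(len(chunks), "compressed parts")
    print(json.dumps(manifest, indent=2))
```

A COPY statement would then point at the uploaded manifest and specify the MANIFEST and GZIP options, so the cluster loads exactly the listed files in parallel.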
![Page 42: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/42.jpg)
When to use
• Batch analytics/processing
– Answers in minutes or hours
• Structured and unstructured data
• Parallel scans of the entire dataset with uniform query performance
• Supports Hive QL + other languages
• GB, TB, or PB of data
• Replicated data store (HDFS) for ad-hoc and real-time queries (HBase)
When not to use
• Real-time analytics (DW)
– Need answers in seconds
• 1000s of concurrent users
Amazon Elastic MapReduce
![Page 43: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/43.jpg)
Amazon Elastic MapReduce Best Practices
• Choose between transient and persistent clusters for best TCO
• Leverage Amazon S3 integration for highly durable and interim storage
• Right-size cluster instances based on each job – not one size fits all
• Leverage resizing and spot to add and remove capacity cost-effectively
• Tuning cluster instances can be easier than tuning Hadoop code
[Figure: two job flows compared. Job Flow, Duration: 14 Hours; Job Flow, Duration: 7 Hours]
![Page 44: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/44.jpg)
AWS Data Pipeline
When to use
• Automate movement and transformation of data (ETL in the cloud)
• Dependency management
– Data
– Control
• Schedule management
• Transient Amazon EMR clusters
• Regular data move pattern
– Every hour, day
– Every 30 minutes
• Amazon DynamoDB backups
– Cross region
When not to use
• Less than 15 minutes scheduling interval
• Execution latency less than a minute
• Event-based scheduling
![Page 45: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/45.jpg)
AWS Data Pipeline Best Practices
• Use dependency-based rather than time-based scheduling
• Make your activities idempotent
• Add in your tools using shell activity
• Use Amazon S3 for staging
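One way to read "make your activities idempotent" is that re-running an activity for the same scheduled period must leave the staging area in the same state. The sketch below uses a plain dict as a stand-in for an Amazon S3 staging bucket; the key layout and function name are hypothetical and are not part of the AWS Data Pipeline API.

```python
from typing import Dict, List


def run_hourly_export(store: Dict[str, List[str]], period: str,
                      rows: List[str]) -> str:
    """Write the rows for one scheduled period to a deterministic key.

    `store` is a plain dict standing in for a staging bucket. Because the
    key depends only on the period and the write replaces any earlier
    value, re-running the activity after a retry leaves the same state.
    """
    key = f"staging/export/{period}/part-0000"
    store[key] = list(rows)  # overwrite, never append
    return key


if __name__ == "__main__":
    staging: Dict[str, List[str]] = {}
    run_hourly_export(staging, "2013-11-14T10", ["a", "b"])
    run_hourly_export(staging, "2013-11-14T10", ["a", "b"])  # simulated retry
    print(staging)
```

An append-style write would duplicate rows on retry; keying the output by period and overwriting avoids that.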
![Page 46: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/46.jpg)
When to use
• Store large objects
• Key-value store - Get/Put/List
• Unlimited storage
• Versioning
• Very high durability – 99.999999999%
• Very high throughput (via parallel clients)
• Use for storing persistent data
– Backups
– Source/target for EMR
– Blob store with metadata in SQL or NoSQL
When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions
Amazon S3
![Page 47: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/47.jpg)
Amazon S3 Best Practices
• Use random hash prefix for keys
• Ensure a random access pattern
• Use Amazon CloudFront for high throughput GETs and PUTs
• Leverage the high durability, high throughput design of Amazon S3 for backup and as a common storage sink
– Durable sink between data services
– Supports de-coupling and asynchronous delivery
• Consider RRS for lower cost, lower durability storage of derivatives or copies
• Consider parallel threads and multipart upload for faster writes
• Consider parallel threads and range get for faster reads
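A minimal sketch of the hash-prefix bullet: derive a short prefix from a hash of the natural key so that keys written in sequence do not share a common leading string. SHA-256 and a four-character prefix are choices made for this example; the slide only says "random hash prefix". The function name and sample key are hypothetical.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex prefix derived from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


if __name__ == "__main__":
    for n in range(3):
        print(hashed_key(f"videos/2013/11/14/clip-{n:04d}.mp4"))
```

Because the prefix is computed from the key rather than drawn at random, a reader can rebuild the full object key from the original name without a lookup.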
![Page 48: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/48.jpg)
When to use
• Infrequently accessed data sets
• Very low cost storage
• Data retrieval times of several hours are acceptable
• Encryption at rest
• Very high durability
– 99.999999999%
• Unlimited amount of storage
When not to use
• Frequent access
• Low latency access
Amazon Glacier
![Page 49: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/49.jpg)
Amazon Glacier Best Practices
• Reduce request and storage costs with aggregation
– Aggregate your files into bigger files before sending them to Amazon Glacier
– Store checksums along with your files
– Use a format that allows you to access files within your aggregate archive
• Improve speed and reliability with multipart upload
• Reduce costs with ranged retrievals
• Maintain your own index in a highly durable store
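The aggregation advice can be sketched with the Python standard library: pack many small files into one tar archive, record a checksum per file, and keep that index separately. The choice of an uncompressed tar and SHA-256 is an assumption made for this example, as are the function names; uploading the archive to Amazon Glacier is not shown.

```python
import hashlib
import io
import tarfile
from typing import Dict, Tuple


def aggregate(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, str]]:
    """Pack files into one tar archive and return (archive, checksum index)."""
    index: Dict[str, str] = {}
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
            index[name] = hashlib.sha256(data).hexdigest()
    return buffer.getvalue(), index


def extract_verified(archive_bytes: bytes, name: str,
                     index: Dict[str, str]) -> bytes:
    """Read one member back out of the archive and verify its checksum."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        member = archive.extractfile(name)
        if member is None:
            raise KeyError(name)
        data = member.read()
    if hashlib.sha256(data).hexdigest() != index[name]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


if __name__ == "__main__":
    logs = {"2013-11-14/a.log": b"alpha\n", "2013-11-14/b.log": b"beta\n"}
    archive_bytes, checksums = aggregate(logs)
    print(len(archive_bytes), "bytes in archive")
    print(extract_verified(archive_bytes, "2013-11-14/b.log", checksums))
```

The checksum index is the piece to keep in a separate durable store, since it records what each archive contains without having to retrieve the archive.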
![Page 50: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/50.jpg)
When to use
• Alternate data store technologies
• Hand-tuned performance needs
• Direct/admin access required
When not to use
• When a managed service will do the job
• When operational experience is low
Amazon EC2 + Amazon EBS/Instance Storage
![Page 51: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/51.jpg)
Amazon EBS Best Practices
• Pick the right EC2 instance type
– Higher “network performance” instances for driving more Amazon EBS IOPS
– EBS-Optimized EC2 instances for dedicated throughput between EC2 & Amazon EBS
• Use provisioned IOPS volumes for database workloads requiring consistent IOPS
• Use standard volumes for workloads requiring low to moderate IOPS & occasional bursts
• Stripe multiple Amazon EBS volumes for higher IOPS or storage
– RAID0 for higher I/O
– RAID10 for highest local durability
• Amazon EBS snapshots
– Quiesce the file system and take a snapshot
![Page 52: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/52.jpg)
Amazon EC2 Best Practices
[Figure: EC2 instance families by best value. Labels: HI – Best IOPS/$, HS – Best GB/$, Best vCPU/$, Best Memory-GiB/$]
![Page 53: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/53.jpg)
Summary
![Page 54: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/54.jpg)
Cloud Data Tier Architecture Anti-Pattern
Data Tier
![Page 55: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/55.jpg)
AWS Data Tier Architecture - Use the right tool for the job!
Data Tier
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon Elastic MapReduce
Amazon S3
Amazon Glacier
Amazon Redshift
AWS Data Pipeline
![Page 56: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/56.jpg)
Reference Architecture
Amazon RDS
Amazon CloudSearch
Amazon DynamoDB
Amazon ElastiCache
Amazon EMR
Amazon S3
Amazon Glacier
AWS Data Pipeline
Amazon Redshift
![Page 57: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/57.jpg)
Cost Conscious Design
![Page 58: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/58.jpg)
![Page 59: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/59.jpg)
![Page 60: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/60.jpg)
![Page 61: AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022042606/540dd54d8d7f72747e8b4b8e/html5/thumbnails/61.jpg)
Remember…