(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014

Upload: amazon-web-services

Post on 29-Jun-2015

DESCRIPTION

This session drills deep into the Amazon S3 technical best practices that help you maximize storage performance for your use case. We provide real-world examples and discuss the impact of object naming conventions and parallelism on Amazon S3 performance, and describe the best practices for multipart uploads and byte-range downloads.

TRANSCRIPT

Page 1: (PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
Page 2:

in data transfer from S3

not including Amazon Web Services use

Page 3:

• Architecture
  – Choosing a region
  – Building a naming scheme
  – Considering LISTs
• Optimizing PUTs
  – Multipart upload
  – Demo
• Optimizing GETs
  – Using CloudFront
  – Range-based GETs
  – Demo
• Customer Case
  – BigData Corp

Page 6:

TIP: Request Rate and Performance Considerations

http://amzn.to/18oF5LC

Page 7:

100/8 = 12.5 events/sec

100,000 users @ 10 events an hour = 224 TPS
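As a rough check on figures like the ones above, an average request rate can be derived from a user count and a per-user event rate. The sketch below is illustrative only: the function name is hypothetical, and it assumes events are spread evenly across the hour. Under that assumption the 100,000-user scenario works out to roughly 278 requests per second, somewhat above the 224 TPS quoted on the slide, which may rest on assumptions not captured in this transcript.

```python
def requests_per_second(users: int, events_per_user_per_hour: float) -> float:
    """Average request rate, assuming events are spread evenly across the hour."""
    seconds_per_hour = 3600
    return users * events_per_user_per_hour / seconds_per_hour


if __name__ == "__main__":
    # 100,000 users at 10 events per hour each (the slide's scenario).
    rate = requests_per_second(100_000, 10)
    print(f"{rate:.1f} requests/sec")  # prints "277.8 requests/sec"
```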

Page 8:

<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg

Page 9:

[Diagram: items labeled 1, 2, ..., N and four boxes labeled Partition]

Page 10:

<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
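The difference between date-first keys and number-first keys like these can be made concrete by counting how many distinct leading characters each scheme produces. This is a minimal sketch, not a model of S3 itself: how S3 assigns keys to partitions is not described in this transcript, and `leading_prefix_counts` is a hypothetical helper. The sample keys are a subset of the ones shown on the slides.

```python
from collections import Counter
from typing import Iterable


def leading_prefix_counts(keys: Iterable[str], prefix_len: int = 1) -> Counter:
    """Count how many keys start with each prefix of length prefix_len."""
    return Counter(key[:prefix_len] for key in keys)


# Date-first keys: every key begins with the same characters.
date_first = [
    "2013_11_13-164533125.jpg",
    "2013_11_13-164533126.jpg",
    "2013_11_12-164533129.jpg",
    "2013_11_11-164533133.jpg",
]

# Number-first keys: the leading characters vary from key to key.
number_first = [
    "521335461-2013_11_13.jpg",
    "465330151-2013_11_13.jpg",
    "987331160-2013_11_13.jpg",
    "125631151-2013_11_13.jpg",
]

if __name__ == "__main__":
    print(len(leading_prefix_counts(date_first)))    # distinct first characters
    print(len(leading_prefix_counts(number_first)))
```

With these samples, the date-first keys share a single leading character, while the number-first keys start with four different ones.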

Page 11:

[Diagram: items labeled 1, 2, ..., N and four boxes labeled Partition]

Page 12:

• Store objects as a hash of their name
  – add the original name as metadata
    • “deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
  – prepend key name with short hash
    • 0aa3-deadmau5_mix.mp3
• Epoch time (reverse)
  – 5321354831-deadmau5_mix.mp3
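The two naming approaches above (a short hash prefix and a reversed epoch timestamp) might be sketched as follows. The function names are hypothetical. The slide does not name the hash function; SHA-1 is assumed here only because the example digest is 40 hex characters long, and the sketch is not claimed to reproduce that digest.

```python
import hashlib
import time
from typing import Optional


def hashed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend the first prefix_len hex characters of a hash of the name."""
    # Assumption: SHA-1; the slide does not say which hash was used.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_epoch_key(name: str, epoch_seconds: Optional[int] = None) -> str:
    """Prepend the epoch time with its digits reversed, so the fast-changing digits lead."""
    if epoch_seconds is None:
        epoch_seconds = int(time.time())
    return f"{str(epoch_seconds)[::-1]}-{name}"


if __name__ == "__main__":
    print(hashed_key("deadmau5_mix.mp3"))
    # 1384531235 reversed is 5321354831, matching the slide's example key.
    print(reversed_epoch_key("deadmau5_mix.mp3", 1384531235))
```

Reversing the timestamp puts the seconds digits first, so keys written close together in time no longer share a long common prefix.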

Page 14:

<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg

Page 15:

TIP: Request Rate and Performance Considerations

http://amzn.to/18oF5LC

Page 19:

faster, flexible

set of parts

presents all parts as a single object

parallel, pausing, resuming

beginning uploads before you know the total object size
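The idea of uploading an object as a set of parts in parallel can be sketched without any S3 client at all. In the minimal example below, `plan_parts` and `upload_in_parallel` are hypothetical names, and `upload_part` is a caller-supplied stand-in for whatever call actually sends a part to S3. The 5 MiB minimum part size (for all parts except the last) and the 10,000-part maximum are S3's documented multipart limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, start, end) tuples; end is inclusive, numbering starts at 1."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    for number, start in enumerate(range(0, total_size, part_size), start=1):
        end = min(start + part_size, total_size) - 1
        parts.append((number, start, end))
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


def upload_in_parallel(data: bytes, part_size: int,
                       upload_part: Callable[[int, bytes], str],
                       max_workers: int = 4) -> List[str]:
    """Send each part through upload_part concurrently; results come back in part order."""
    parts = plan_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_part, n, data[s:e + 1]) for n, s, e in parts]
        return [f.result() for f in futures]
```

Because each part is submitted independently, a failed part could in principle be retried on its own rather than restarting the whole transfer; retry handling is left out of this sketch.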

Page 22:

DEMO: Multipart Uploads

Page 26:

DEMO: Amazon CloudFront vs. Amazon S3 download performance

Page 27:

• Align your ranges with your parts!
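One way to read the advice above is to issue range requests whose boundaries fall exactly on the part boundaries used at upload time. The sketch below builds the corresponding HTTP `Range` header values. It assumes the upload part size is known to the downloader, and `aligned_range_headers` is a hypothetical helper name; each value would be sent as the `Range` header of a separate GET.

```python
from typing import List

MIB = 1024 * 1024


def aligned_range_headers(object_size: int, part_size: int) -> List[str]:
    """Build HTTP Range header values whose boundaries match the upload part boundaries."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    headers = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range ends are inclusive
        headers.append(f"bytes={start}-{end}")
    return headers


if __name__ == "__main__":
    # A 12 MiB object that was uploaded in 5 MiB parts.
    for header in aligned_range_headers(12 * MIB, 5 * MIB):
        print(header)
```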

Page 28:

DEMO: Range-based GETs

Page 31:

DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon EC2

Page 36:

[Architecture diagram]

• Maestro (Reserved Instance)
• List of crawl URLs
• Main workers (Spot Instances): execute crawling and process data
• Secondary workers, queue listeners (Spot Instances): reprocess data, query additional services, store data on MongoDB
• Secondary work queues – processed data
• MongoDB cluster
• Command and Control Queue

Page 38:

• Architecture
  – Choosing a region
  – Building a naming scheme
  – Considering LISTs
• Optimizing PUTs
  – Multipart upload
  – Demo
• Optimizing GETs
  – Using CloudFront
  – Range-based GETs
  – Demo
• Customer Case
  – BigData Corp

Page 40:

Please give us your feedback on this presentation