(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
DESCRIPTION
This session drills deep into the Amazon S3 technical best practices that help you maximize storage performance for your use case. We provide real-world examples and discuss the impact of object naming conventions and parallelism on Amazon S3 performance, and describe the best practices for multipart uploads and byte-range downloads.
TRANSCRIPT
Architecture
Choosing a region
Building a naming scheme
Considering LISTs
Optimizing PUTs
Multipart upload
Demo
Optimizing GETs
Using CloudFront
Range-based GETs
Demo
Customer Case
BigData Corp
Example sizing: 100 events / 8 seconds = 12.5 events/sec
100,000 users @ 10 events an hour ≈ 278 TPS
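As a sanity check, the sizing arithmetic above can be reproduced directly (the workload numbers are the slide's example, not measured values):

```python
# Back-of-envelope request-rate sizing from the example workload.
users = 100_000
events_per_user_per_hour = 10

events_per_hour = users * events_per_user_per_hour  # 1,000,000 events/hour
tps = events_per_hour / 3600                        # transactions per second

print(round(tps))  # ≈ 278
```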
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
[Diagram: with sequential, date-led key names, load concentrates on a single one of partitions 1, 2, …, N]
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
[Diagram: with randomized key prefixes, load spreads evenly across partitions 1, 2, …, N]
• Store objects under a hash of their name
– add the original name as metadata
– “deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
• Or prepend the key name with a short hash
– 0aa3-deadmau5_mix.mp3
• Or prefix with epoch time, reversed
– 5321354831-deadmau5_mix.mp3
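A minimal sketch of the two prefix schemes above, assuming a 4-character SHA-1 prefix (the hash function and prefix length are illustrative; any well-distributed short prefix works):

```python
import hashlib


def hashed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend a short hash of the object name so keys spread across
    S3's index partitions instead of clustering on one prefix."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_epoch_key(name: str, epoch: int) -> str:
    """Alternative: prefix with the reversed epoch timestamp, so the
    fastest-changing digit comes first and successive keys diverge."""
    return f"{str(epoch)[::-1]}-{name}"
```

Either way, the original file name survives at the end of the key (or in metadata), while the leading characters distribute writes.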
<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
Multipart upload lets you upload an object as a set of parts; Amazon S3 then presents all parts as a single object. It is faster and more flexible: parts can be uploaded in parallel, uploads can be paused and resumed, and you can begin uploading before you know the total object size.
DEMO: Multipart Uploads
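The core of a multipart upload is splitting the object into numbered byte ranges. A minimal sketch of that split (the 8 MB default part size is an assumption; S3 requires every part except the last to be at least 5 MB):

```python
def part_ranges(object_size: int, part_size: int = 8 * 1024 * 1024):
    """Yield (part_number, start, end) inclusive byte ranges for a
    multipart upload. S3 part numbers start at 1, and only the last
    part may be smaller than part_size."""
    part_number, start = 1, 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1
        yield part_number, start, end
        part_number += 1
        start = end + 1
```

Each (start, end) slice would then be sent as its own part-upload request, in parallel, before a final "complete multipart upload" call stitches the parts into one object.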
DEMO: Amazon CloudFront vs. Amazon S3 download performance
• Align your ranges with your parts!
DEMO: Range-based GETs
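The "align your ranges with your parts" advice can be sketched as follows: generate Range header values whose boundaries match the part size used on upload, so each parallel GET fetches exactly one stored part (the part size here is illustrative):

```python
def aligned_range_headers(object_size: int, part_size: int):
    """HTTP Range header values aligned with multipart part boundaries,
    one per parallel GET."""
    headers, start = [], 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # Range ends are inclusive
        headers.append(f"bytes={start}-{end}")
        start = end + 1
    return headers
```

Each value is sent as `Range: bytes=start-end` on its own GET request, and the responses are reassembled in order on the client.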
[Architecture diagram: BigData Corp crawl pipeline, using Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, and Amazon EC2]
• Maestro (Reserved Instance): holds the list of crawl URLs
• Main workers (Spot Instances): execute crawling and process data
• Secondary workers, queue listeners (Spot Instances): reprocess data, query additional services, store data on MongoDB
• Secondary work queues: processed data
• MongoDB cluster
• Command and Control Queue
Please give us your feedback on this presentation.