s3 intro, tips and filling it up with data aws ug be #2

S3 Intro, tips and filling it up with data quickly
@fdenkens, http://skyscrape.rs
AWS User Group Belgium #2 - 2013/11/06

Uploaded by frederik-denkens, posted on 13-Jan-2015


DESCRIPTION

Read the blogpost: http://skyscrape.rs/2013/11/15/awsugbe-2-aws-use-cases-and-s3-best-practicesupload-performance/

At the second AWS User Group Belgium, I presented “S3 Intro, tips and filling it up with data quickly”. The first half was a general introduction to S3 and how to use it. The second half focused on getting your data onto S3 as quickly as possible using standard tools. After some theory on best practices, we ran some tests and drew conclusions.

The tests started at around 18 megabytes per second of data transferred from an EC2 ramdisk to S3. Through some simple optimisations we got up to 248 megabytes per second using just standard command-line tools. The two main contributors to this dramatic performance increase were:
- instance type and the related IO performance class
- the use of multiple upload threads

Theoretically a Very High I/O instance should go up to 10 Gbit/s, or about 1.1 gigabytes per second. Some people on the internet (http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/) claim to have reached such speeds. Alex shed some light on how we might be able to reach that goal by taking into account how S3 indexing and partitioning work (these two might help: http://www.slideshare.net/AmazonWebServices/building-scalable-applications-on-amazon-s3-stg303 http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html). Unfortunately I haven't had the time to test that out yet. Any takers? :-)

TRANSCRIPT

Page 1: S3 intro, tips and filling it up with data   aws ug be #2

S3 Intro, tips and filling it up with data quickly

@fdenkens

http://skyscrape.rs

AWS User Group Belgium #2 - 2013/11/06

Page 2

The Skyscrapers ...

● help companies figure out cloud

● design and build platforms in the cloud

● take care of the complete lifecycle, so you can focus on your business

Page 3

S3 in a nutshell

Page 4

What is S3

● Object storage architecture
● Abstraction from storage (black) magic
● Operates at object level (aka ‘file’, blob, ...)
● Simple API

Page 5

Benefits

● Scalability
● High availability
● Low cost
● 99.999999999% durability
● Secure
● Low latency/high speed

Page 6

Some advantages for webapps

● Easier to scale apps
● Cleaner apps
● Better ‘mobility’ of your app
● Simpler hosting platforms
● No storage worries

Page 7

Use cases

● Asset storage and CDN
● Data storage
● Static site
● Backups
● Mobile storage backend
● File distribution
● ...

Page 8

Buckets

● Collection of objects
● Globally unique id
● Allowed characters: a-z A-Z 0-9 . -
● Max 100 buckets/user
● No limit on number of objects

Page 9

Buckets

Best practices on naming
● DNS compatible
● FQDN
  ○ Allows for vhost
  ○ Watch out for SSL: no dots :-(
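
The naming advice above can be turned into a quick check. This is a minimal sketch, assuming the DNS-compatible naming rules as commonly documented (3 to 63 characters; lowercase letters, digits, dots and hyphens; every label starts and ends with a letter or digit; not shaped like an IP address). The function names are made up for illustration.

```python
import re

# One DNS label: lowercase letters/digits, hyphens allowed only inside.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_DNS_NAME = re.compile(rf"^{_LABEL}(?:\.{_LABEL})*$")
_IPV4_LIKE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def is_dns_compatible(bucket: str) -> bool:
    """True if the bucket name is usable as a sequence of hostname labels."""
    if not 3 <= len(bucket) <= 63:
        return False
    if _IPV4_LIKE.match(bucket):
        return False
    return _DNS_NAME.match(bucket) is not None


def works_with_wildcard_ssl(bucket: str) -> bool:
    """True if <bucket>.s3.amazonaws.com can match a *.s3.amazonaws.com cert.

    A wildcard covers a single label only, so any dot in the name breaks it.
    """
    return is_dns_compatible(bucket) and "." not in bucket
```

A name such as assets.example.com passes the first check (and allows vhost-style access) but fails the second, which is the "no dots" caveat from the slide.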

Page 10

Objects

● Blob
● File format doesn’t matter
● Metadata can be added (like MIME type)
● Maximum 5 TB/object

Page 11

Keys

● = Name
● Max 1024 bytes
● UTF-8
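
The key length limit is counted on the UTF-8 encoding, so non-ASCII keys run out of room sooner than their character count suggests. A small sketch of that check (the helper name is hypothetical):

```python
MAX_KEY_BYTES = 1024  # limit applies to the UTF-8 encoded key, not the character count


def key_fits(key: str) -> bool:
    """True if the key is non-empty and at most 1024 bytes once UTF-8 encoded."""
    return 0 < len(key.encode("utf-8")) <= MAX_KEY_BYTES
```

For example, a key of 1024 ASCII letters fits, while 1024 copies of "é" do not, because each one takes two bytes.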

Page 12

Accessing your data

● ARN
  ○ arn:aws:s3:::bucketname
  ○ arn:aws:s3:::bucketname/objectpath
● HTTP
  ○ http://s3.amazonaws.com/bucket/key
  ○ http://bucket.s3.amazonaws.com/key
  ○ http://bucket/key (vhost style)
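
The address forms listed above are plain string patterns, so they are easy to generate. A minimal sketch; the function names and the dictionary keys are invented for this example, and percent-encoding the key is an assumption about what a client would normally do.

```python
from urllib.parse import quote


def s3_arn(bucket: str, key: str = "") -> str:
    """ARN for a bucket, or for an object when a key is given."""
    return f"arn:aws:s3:::{bucket}/{key}" if key else f"arn:aws:s3:::{bucket}"


def s3_urls(bucket: str, key: str, endpoint: str = "s3.amazonaws.com") -> dict:
    """The three HTTP address styles for one object."""
    path = quote(key, safe="/")  # percent-encode everything except path separators
    return {
        "path_style": f"http://{endpoint}/{bucket}/{path}",
        "subdomain_style": f"http://{bucket}.{endpoint}/{path}",
        # Only resolves if the bucket name is a hostname that you CNAME to S3.
        "vhost_style": f"http://{bucket}/{path}",
    }
```

Passing endpoint="s3-eu-west-1.amazonaws.com" produces the same forms against a regional endpoint.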

Page 13

Performance tip

Use the region-specific endpoint instead of the generic one: replace s3.amazonaws.com with s3-eu-west-1.amazonaws.com

Page 14

Getting data on S3

Page 15

Getting data on S3

● Through AWS services
● Tools
● Libraries
● Filesystem mapping
● Direct from client (pre-signed URLs)
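
To illustrate the last bullet: a pre-signed URL lets a client upload straight to S3 without holding your credentials. The sketch below uses the legacy query-string authentication scheme (signature version 2), which is an assumption about what was in common use at the time of the talk; newer regions require signature version 4, and in practice an SDK should generate these URLs. The function name and defaults are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def presign_v2(access_key: str, secret_key: str, bucket: str, key: str,
               expires_epoch: int, method: str = "PUT", content_type: str = "",
               endpoint: str = "s3.amazonaws.com") -> str:
    """Build a time-limited URL using legacy query-string auth (SigV2)."""
    resource = f"/{bucket}/{quote(key, safe='/')}"
    # Verb, Content-MD5 (left empty), Content-Type, Expires, canonical resource.
    string_to_sign = f"{method}\n\n{content_type}\n{expires_epoch}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    query = urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": expires_epoch,
        "Signature": base64.b64encode(digest).decode(),
    })
    return f"https://{endpoint}{resource}?{query}"
```

The client then issues the request with exactly the method and Content-Type that were signed; any difference makes the signature invalid.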

Page 16

Command-line tools

● s3cmd
● s3-multipart
● s3funnel
● ...

Page 17

Getting data on S3: what matters?

Page 18

Location

● Bandwidth
● Latency

Page 19

Parallelization

● Multiple upload threads
● Multipart
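
The two ideas combine naturally: cut the blob into parts and push the parts through a pool of threads. The sketch below shows only the splitting and scheduling; `upload_part` is a hypothetical callable standing in for whatever client or tool sends the actual request, and this is not how s3-multipart itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    return [
        (i + 1, offset, min(part_size, total_size - offset))
        for i, offset in enumerate(range(0, total_size, part_size))
    ]


def upload_in_parallel(data: bytes, part_size: int, threads: int,
                       upload_part: Callable[[int, bytes], str]) -> List[str]:
    """Upload all parts concurrently; return the per-part results in part order.

    upload_part is a hypothetical callable (part_number, body) -> etag.
    """
    ranges = split_ranges(len(data), part_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order even though the calls overlap in time.
        return list(pool.map(
            lambda r: upload_part(r[0], data[r[1]:r[1] + r[2]]), ranges))
```

Threads are a reasonable fit here because the work is network-bound, so the workers spend most of their time waiting on I/O.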

Page 20

Limit SSL

● Negotiation overhead
● Encryption overhead on smaller instances

Page 21

Instance type: I/O matters

IO class     Theoretical speed
Low          100 Mbit/s (?)
Moderate     250 - 500 Mbit/s
High         1 Gbit/s
Very High    10 Gbit/s
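
Since the test results are reported in megabytes per second, it helps to convert these line rates to the same unit. A small sketch, taking the upper bound of each class from the table and counting 8 bits per byte with no allowance for protocol overhead:

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


# Upper bounds taken from the table above, in Mbit/s.
IO_CLASS_MBIT = {"Low": 100, "Moderate": 500, "High": 1000, "Very High": 10000}

IO_CLASS_MBYTE = {name: mbit_to_mbyte_per_s(v) for name, v in IO_CLASS_MBIT.items()}
# Low: 12.5, Moderate: 62.5, High: 125.0, Very High: 1250.0 (MB/s)
```

These are raw line rates, so the achievable payload throughput will be somewhat lower.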

Page 22

Other things

● Network stack optimisations
● Tool/method of upload

Page 23

Performance tests

Page 24

Parameters

● 1 GB blob
● 25 MB parts
● Single region
● Various IO classes
● 1, 10, 40, 50 threads
● Upload only
● s3-multipart tool
● Standard OS install
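
With these parameters the number of parts follows from simple arithmetic. A minimal sketch, assuming the S3 multipart limits as commonly documented (5 MB minimum part size except for the last part, at most 10,000 parts); whether "1 GB" and "25 MB" are binary or decimal units is not stated in the slides, so both are shown in the note below.

```python
import math

MIN_PART_SIZE = 5 * 1024**2   # S3 minimum for every part except the last
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def part_count(total_size: int, part_size: int) -> int:
    """Number of parts needed, validated against the multipart limits."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 5 MB minimum")
    count = math.ceil(total_size / part_size)
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase the part size")
    return count
```

part_count(1024**3, 25 * 1024**2) gives 41 parts, and 40 with decimal units. If the tool uploads one part per thread, thread counts beyond that have no extra parts to work on in this setup.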

Page 25

Demo time

Page 26

Some numbers

           moderate IO     high IO         very high IO
threads    avg     max     avg     max     avg     max
1          18      23      21      20      19      19
10         90      112     100     118     153     164
40         86      114     114     119     248     248
50         86      117     119     122     207     242

(Megabytes per second)
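
To put the table in perspective, the averages can be compared with the single-thread result and with the raw line rate. A short sketch using only the figures above; the function names are invented here.

```python
# Average throughput in MB/s per thread count, copied from the table above.
AVG_MB_S = {
    "moderate IO":  {1: 18, 10: 90,  40: 86,  50: 86},
    "high IO":      {1: 21, 10: 100, 40: 114, 50: 119},
    "very high IO": {1: 19, 10: 153, 40: 248, 50: 207},
}


def speedup(io_class: str, threads: int) -> float:
    """Average throughput at `threads` relative to the single-thread average."""
    row = AVG_MB_S[io_class]
    return row[threads] / row[1]


def share_of_line_rate(mb_per_s: float, line_rate_gbit: float) -> float:
    """Fraction of the raw line rate used, counting 8 bits per byte."""
    return (mb_per_s * 8) / (line_rate_gbit * 1000)
```

On these figures, 40 threads on the very high IO instance is roughly 13 times the single-thread average, yet 248 MB/s is only about 20% of a 10 Gbit/s line.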

Page 27

Conclusions (1)

● Optimisation is certainly possible
● Single stream maxes out around 150 Mbit/s (about 20 MB/s)
● Newer generations are slightly faster
● Couldn’t get to 10 Gbit/s

Page 28

Conclusions (2)

● Instance IO classes are a relative concept
● 50 threads seems to be the sweet spot
● Part size did not seem to matter much
● Do error control on multipart uploads
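
The last point deserves a concrete shape: with dozens of parts in flight, some will fail or arrive corrupted, and each part should be verified and retried on its own. A minimal sketch, where `upload_part` is a hypothetical callable and the expectation that a part's ETag equals the hex MD5 of its body is an assumption that does not hold for every encryption setting.

```python
import base64
import hashlib
from typing import Callable


def content_md5(body: bytes) -> str:
    """Base64-encoded MD5 digest, the format of the HTTP Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode()


def upload_with_retry(part_number: int, body: bytes,
                      upload_part: Callable[[int, bytes, str], str],
                      attempts: int = 3) -> str:
    """Try one part up to `attempts` times and check the returned ETag.

    upload_part is a hypothetical callable
    (part_number, body, content_md5) -> etag.
    """
    expected = hashlib.md5(body).hexdigest()
    last_error = None
    for _ in range(attempts):
        try:
            etag = upload_part(part_number, body, content_md5(body)).strip('"')
            if etag == expected:
                return etag
            last_error = ValueError(f"part {part_number}: ETag mismatch")
        except OSError as exc:  # network-level failures
            last_error = exc
    raise RuntimeError(
        f"part {part_number} failed after {attempts} attempts") from last_error
```

Sending Content-MD5 lets the server reject a corrupted part, and the ETag comparison gives the client a second check before the upload is completed.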

Page 29

Some excuses disclaimers

● Not scientific
● No tuning at all
● Bottlenecks
● Library/app used not optimal