s3 intro, tips and filling it up with data aws ug be #2
DESCRIPTION
Read the blog post: http://skyscrape.rs/2013/11/15/awsugbe-2-aws-use-cases-and-s3-best-practicesupload-performance/

At the second AWS User Group Belgium, I presented “S3 Intro, tips and filling it up with data quickly”. The first half was a general introduction to S3 and how to use it. The second half focused on how to get your data onto S3 as quickly as possible using standard tools. After some theory on best practices, we ran some tests and drew conclusions.

The tests started at around 18 megabytes per second of data transferred from an EC2 ramdisk to S3. Through some simple optimisations we got up to 248 megabytes per second using just standard command line tools. The two main contributors to this performance increase were:
- instance type and the related IO performance class
- the use of multiple upload threads

Theoretically a Very High I/O instance should go up to 10 Gbit, or about 1.1 gigabytes per second. Some people on the internet (http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/) claim to have reached such speeds. Alex shed some light on how we might reach that goal by taking into account how S3 indexing and partitioning work (these two might help: http://www.slideshare.net/AmazonWebServices/building-scalable-applications-on-amazon-s3-stg303 and http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html). Unfortunately I haven't had the time to test that out yet. Any takers? :-)
TRANSCRIPT
S3 Intro, tips and filling it up with data quickly
http://skyscrape.rs
AWS User Group Belgium #2 - 2013/11/06
The Skyscrapers ...
● help companies figure out cloud
● design and build platforms in the cloud
● take care of the complete lifecycle, so you can focus on your business
S3 in a nutshell
What is S3
● Object storage architecture
● Abstraction from storage (black) magic
● Operates at object level (aka ‘file’, blob, ...)
● Simple API
Benefits
● Scalability
● High availability
● Low cost
● 99.999999999% durability
● Secure
● Low latency/high speed
Some advantages for webapps
● Easier to scale apps
● Cleaner apps
● Better ‘mobility’ of your app
● Simpler hosting platforms
● No storage worries
Use cases
● Asset storage and CDN
● Data storage
● Static site
● Backups
● Mobile storage backend
● File distribution
● ...
Buckets
● Collection of objects
● Globally unique id
● a-z A-Z 0-9 . -
● Max 100 buckets/user
● No limit on number of objects
Buckets
Best practices on naming
● DNS compatible
● FQDN
○ Allows for vhost
○ Watch out for SSL: no dots :-(
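The naming advice can be turned into a quick check. This is a minimal sketch, not an official validator: the rules encoded here (3 to 63 characters, lowercase letters, digits, hyphens and dots, labels starting and ending with a letter or digit, no IP-address look-alikes) are the commonly documented DNS-compatible naming rules, and the function names are made up for illustration.

```python
import re

# Assumed rules for a DNS-compatible bucket name: 3-63 characters, made of
# dot-separated labels that use lowercase letters, digits and hyphens, and
# that start and end with a letter or digit. IP-like names are rejected.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_dns_compatible(name: str) -> bool:
    """Return True if the name looks usable as a DNS host label sequence."""
    if not 3 <= len(name) <= 63:
        return False
    if _IPV4.match(name):
        return False
    return all(_LABEL.match(label) for label in name.split("."))


def works_with_wildcard_ssl(name: str) -> bool:
    """A wildcard certificate covers one label only, so dots break HTTPS
    on bucket.s3.amazonaws.com style URLs."""
    return is_dns_compatible(name) and "." not in name


if __name__ == "__main__":
    for candidate in ("assets.example.com", "my-bucket", "My_Bucket"):
        print(candidate, is_dns_compatible(candidate),
              works_with_wildcard_ssl(candidate))
```

An FQDN-style name such as assets.example.com passes the first check (so vhost-style access works) but fails the second, which is the SSL caveat from the slide.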
Objects
● Blob
● Don’t care about file formats
● Metadata can be added (like mimetype)
● Maximum 5 TB/object
Keys
● = Name
● Max 1024 bytes
● UTF-8
Accessing your data
● ARN
○ arn:aws:s3:::bucketname
○ arn:aws:s3:::bucketname/objectpath
● HTTP
○ http://s3.amazonaws.com/bucket/key
○ http://bucket.s3.amazonaws.com/key
○ http://bucket/key (vhost style)
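The three HTTP forms differ only in where the bucket name goes. A small sketch that builds each of them; the function names and the example bucket and key are hypothetical, and the default endpoint is the generic one from the slide.

```python
from urllib.parse import quote


def path_style_url(bucket: str, key: str,
                   endpoint: str = "s3.amazonaws.com") -> str:
    """Bucket as the first path segment: http://endpoint/bucket/key."""
    return f"http://{endpoint}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket: str, key: str,
                       endpoint: str = "s3.amazonaws.com") -> str:
    """Bucket as a subdomain: http://bucket.endpoint/key."""
    return f"http://{bucket}.{endpoint}/{quote(key)}"


def vhost_url(bucket: str, key: str) -> str:
    """Bucket name is itself a hostname (needs a DNS CNAME to S3)."""
    return f"http://{bucket}/{quote(key)}"


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    print(path_style_url("assets.example.com", "img/logo 1.png"))
    print(virtual_hosted_url("assets.example.com", "img/logo 1.png"))
    print(vhost_url("assets.example.com", "img/logo 1.png"))
```

`quote` keeps the slashes in the key and percent-encodes characters such as spaces, so keys with "directory" prefixes still map to readable URLs.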
Performance tip
s3.amazonaws.com > s3-eu-west-1.amazonaws.com
Getting data on S3
● Through AWS services
● Tools
● Libraries
● Filesystem mapping
● Direct from client (pre-signed URLs)
CMD line tools
● s3cmd
● s3-multipart
● s3funnel
● ...
Getting data on S3: what matters?
Location
● Bandwidth
● Latency
Parallelization
● Multiple upload threads
● Multipart
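Both ideas combine naturally: cut the blob into parts and hand the parts to a pool of threads. The sketch below only shows that pattern. `upload_part` is a hypothetical stand-in that transfers nothing; a real version would call whatever S3 library or tool you use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data: bytes, part_size: int) -> list:
    """Cut the blob into consecutive chunks of at most part_size bytes."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def upload_part(part_number: int, chunk: bytes) -> tuple:
    """Hypothetical stand-in for a real part upload.

    It only reports what it was given, so the sketch runs without network.
    """
    return part_number, len(chunk)


def parallel_upload(data: bytes, part_size: int, threads: int) -> list:
    """Upload all parts using a fixed number of worker threads."""
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() keeps the results in part order, whatever the finish order.
        return list(pool.map(upload_part, range(1, len(parts) + 1), parts))


if __name__ == "__main__":
    print(parallel_upload(b"x" * 100, part_size=30, threads=4))
```

Threads are a reasonable fit here because uploads spend most of their time waiting on the network rather than on the CPU.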
Limit SSL
● Negotiation overhead
● Encryption overhead on smaller instances
Instance type: I/O matters
IO class     Theoretical speed
Low          100 Mbit (?)
Moderate     250 - 500 Mbit
High         1 Gbit
Very High    10 Gbit
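To compare these link speeds with the megabytes-per-second figures used in the tests, divide by eight. A rough conversion sketch; it ignores protocol overhead, and for the Moderate class it uses the upper bound of the range as an assumption.

```python
def mbit_to_mbyte_per_s(mbit: float) -> float:
    """Megabits per second to megabytes per second (decimal units)."""
    return mbit / 8


def mbit_to_gib_per_s(mbit: float) -> float:
    """Megabits per second to binary gigabytes (GiB) per second."""
    return mbit * 1_000_000 / 8 / 2**30


# Theoretical speeds from the table, in Mbit/s. "Moderate" uses the upper
# bound of the 250 - 500 Mbit range (an assumption made for this sketch).
IO_CLASSES = {"Low": 100, "Moderate": 500, "High": 1_000, "Very High": 10_000}

if __name__ == "__main__":
    for name, mbit in IO_CLASSES.items():
        print(f"{name:<10} {mbit_to_mbyte_per_s(mbit):8.1f} MB/s")
    print(f"10 Gbit is about {mbit_to_gib_per_s(10_000):.2f} GiB/s")
```

In decimal units 10 Gbit is 1250 MB/s; counted in binary gigabytes it comes out a little under 1.2 per second.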
Other things
● Network stack optimisations
● Tool/method of upload
Performance tests
Parameters
● 1 GB blob
● 25 MB parts
● Single region
● Various IO classes
● 1, 10, 40, 50 threads
● Only upload
● s3-multipart tool
● Standard OS install
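The blob and part sizes determine how many parts there are to spread over the threads. A small sketch of that arithmetic, assuming 1 GB means 1024 MB and that the last part simply holds the remainder; how the s3-multipart tool actually splits the data was not checked.

```python
MB = 1024 * 1024


def part_plan(total_size: int, part_size: int) -> tuple:
    """Return (number of parts, size of the last part) in bytes."""
    count = -(-total_size // part_size)  # ceiling division
    last = total_size - (count - 1) * part_size
    return count, last


def busy_threads(total_size: int, part_size: int, threads: int) -> int:
    """Upper bound on threads that can be busy at once: one part each."""
    count, _ = part_plan(total_size, part_size)
    return min(threads, count)


if __name__ == "__main__":
    count, last = part_plan(1024 * MB, 25 * MB)
    print(f"{count} parts, last part {last // MB} MB")
    for threads in (1, 10, 40, 50):
        print(threads, "threads ->", busy_threads(1024 * MB, 25 * MB, threads))
```

Under these assumptions the test blob splits into 41 parts, so a 50-thread run cannot keep every thread occupied.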
Demo time
Some numbers
threads   moderate IO   high IO      very high IO
          avg   max     avg   max    avg   max
1          18    23      21    20     19    19
10         90   112     100   118    153   164
40         86   114     114   119    248   248
50         86   117     119   122    207   242
(Megabytes per second)
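To put these numbers in perspective, the sketch below computes the gain over a single thread per IO class and the share of a 10 Gbit link that the best run used. It uses the average columns exactly as shown on the slide; the helper names are made up for illustration.

```python
# Average throughput in MB/s per IO class, keyed by thread count,
# copied from the table above.
AVG = {
    "moderate IO": {1: 18, 10: 90, 40: 86, 50: 86},
    "high IO": {1: 21, 10: 100, 40: 114, 50: 119},
    "very high IO": {1: 19, 10: 153, 40: 248, 50: 207},
}


def best_speedup(io_class: str) -> float:
    """Best multi-thread average divided by the single-thread average."""
    runs = AVG[io_class]
    return max(runs.values()) / runs[1]


def share_of_line_rate(mb_per_s: float, gbit: float) -> float:
    """Fraction of a link of the given Gbit/s speed that was used."""
    return mb_per_s * 8 / (gbit * 1000)


if __name__ == "__main__":
    for io_class in AVG:
        print(f"{io_class:<13} {best_speedup(io_class):.1f}x")
    print(f"248 MB/s uses {share_of_line_rate(248, 10):.0%} of 10 Gbit")
```

The best very high IO run is roughly thirteen times the single-thread figure, yet still only about a fifth of the theoretical 10 Gbit.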
Conclusions (1)
● Optimisation is certainly possible
● Single stream maxes out at about 150 Mbit (roughly 20 MB/s)
● Newer instance generations are slightly faster
● Couldn’t get to 10 Gbit
Conclusions (2)
● Instance IO classes = relative concept
● 50 threads seems to be the sweet spot
● Part size did not seem that important
● Do error control on multipart uploads
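One possible shape for the error control on multipart uploads: checksum each part locally, compare it with what the receiving side reports, and retry on a mismatch or a transport error. This is a sketch under assumptions. `send` is a hypothetical callable that transfers one part and returns the MD5 hex digest reported by the receiving side; it is not the API of any particular tool.

```python
import hashlib


def upload_with_check(chunk: bytes, send, max_attempts: int = 3) -> int:
    """Send one part, verify its MD5, retry on failure.

    Returns the number of attempts that were needed.
    """
    expected = hashlib.md5(chunk).hexdigest()
    for attempt in range(1, max_attempts + 1):
        try:
            if send(chunk) == expected:
                return attempt
        except OSError:
            # Transport problem: fall through and try again.
            pass
    raise RuntimeError(f"part failed after {max_attempts} attempts")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_send(chunk: bytes) -> str:
        """Hypothetical sender that fails once, then succeeds."""
        calls["n"] += 1
        if calls["n"] == 1:
            raise OSError("connection reset")
        return hashlib.md5(chunk).hexdigest()

    print(upload_with_check(b"part data", flaky_send))
```

Retrying per part rather than per file is the point of the design: one failed part out of many does not force the whole blob to be sent again.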
Some excuses (disclaimers)
● Not scientific
● No tuning at all
● Bottlenecks
● Library/app used not optimal
Thank you.
Questions?
http://skyscrape.rs
@skyscrapers