Deep Dive on Amazon S3
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lee Atkinson, Solutions Architect, AWS
Jey Jeyasingam, CTO, Y-cam
7 July 2016
AWS storage services
• File: Amazon EFS
• Block: Amazon EBS, Amazon EC2 instance store
• Object: Amazon S3, Amazon Glacier
• Data transfer: AWS Direct Connect, Snowball, ISV connectors, Amazon Kinesis Firehose, Transfer Acceleration, AWS Storage Gateway
Innovation for Amazon S3 (1/2)
• Cross-region replication
• Amazon CloudWatch metrics for Amazon S3 & AWS CloudTrail support
• VPC endpoint for Amazon S3
• Read-after-write consistency in all regions
• Event notifications
• Amazon S3 bucket limit increase

Innovation for Amazon S3 (2/2)
• Amazon S3 Standard-IA
• Transfer Acceleration
• Incomplete multipart upload expiration
• Expired object delete marker
Choice of storage classes on Amazon S3
• Active data: Standard
• Active archive: Standard - Infrequent Access
• Archive data: Amazon Glacier

Some use cases have different requirements
• File sync and share / consumer file storage
• Backup and archive / disaster recovery
• Long-retained data
Standard-Infrequent Access storage
• Durable: 11 9s of durability
• Available: designed for 99.9% availability
• High performance: same throughput as Amazon S3 Standard storage
• Secure: server-side encryption, use your encryption keys, KMS-managed encryption keys
• Integrated: lifecycle management, versioning, event notifications, metrics
• Easy to use: no impact on user experience, simple REST API, single bucket
Lifecycle policies
Automatic tiering and cost controls. Includes two possible actions:
• Transition: to Standard-IA or Glacier after a specified time
• Expiration: deletes objects after a specified time
Actions can be combined, and policies are set at the key prefix level.
Example lifecycle policy
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD-IA</StorageClass>
    </Transition>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
This rule moves objects from Standard storage to Standard-IA after 30 days, then from Standard-IA to Glacier after 365 days.
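The same rule can also be expressed programmatically. A minimal sketch, assuming boto3's `put_bucket_lifecycle_configuration` API; the bucket name is a placeholder, and only the rule dictionary is built here:

```python
# Lifecycle rule equivalent to the XML above, in the dict form accepted by
# boto3's put_bucket_lifecycle_configuration (bucket name is hypothetical).
lifecycle_rules = {
    "Rules": [
        {
            "ID": "sample-rule",
            "Filter": {"Prefix": "documents/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# To apply it (requires AWS credentials; not executed here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_rules)
```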
Versioning S3 buckets
Protects from accidental overwrites and deletes. A new version is created with every upload, giving easy retrieval of deleted objects and rollback. There are three states of an Amazon S3 bucket:
• Unversioned (default)
• Versioning-enabled
• Versioning-suspended
Expired object delete marker policy
Deleting a versioned object makes a delete marker the current version of the object. There is no storage charge for a delete marker, and removing delete markers can improve LIST performance. Use a lifecycle policy to automatically remove the current-version delete marker when previous versions of the object no longer exist.
Example lifecycle policy to remove current versions:
<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <Days>60</Days>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>
Leverage lifecycle to expire current and non-current versions; S3 Lifecycle will automatically remove any expired object delete markers.
Example lifecycle policy for non-current version expiration
A lifecycle configuration with a NoncurrentVersionExpiration action removes all noncurrent versions:
<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>
The ExpiredObjectDeleteMarker element removes expired object delete markers.
Restricting deletes with MFA
Bucket policies can restrict deletes. For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to:
• Change the versioning state of your bucket
• Permanently delete an object version
MFA delete requires both your security credentials and a code from an approved authentication device.
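As a command-line fragment (not run here), MFA delete is enabled alongside versioning via the AWS CLI; the bucket name, account ID, device name, and six-digit token below are all placeholders:

```shell
# Enable versioning with MFA delete on a bucket (placeholder values;
# requires appropriate credentials and a configured MFA device).
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/my-device 123456"
```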
Parallel PUTs with multipart uploads
Increase throughput by parallelizing PUTs, and increase resiliency to network errors with fewer large restarts on error-prone networks. Strike a balance between part size and number of parts:
• Small parts increase connection overhead
• Large parts provide fewer of the benefits of multipart
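The part-size trade-off above can be sketched with a small helper (an illustration, not an AWS API) that splits an object into the fixed-size parts a multipart upload would send in parallel:

```python
def part_ranges(object_size, part_size):
    """Split an object of `object_size` bytes into (start, end) byte ranges
    of at most `part_size` bytes each; the last part may be smaller."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # inclusive end byte
        ranges.append((start, end))
        start = end + 1
    return ranges

# A 25 MB object with 10 MB parts yields three parts, each of which
# could be PUT on its own thread.
MB = 1024 * 1024
print(part_ranges(25 * MB, 10 * MB))
```

Smaller parts mean more entries in this list (more connections to open); larger parts mean fewer, but each failed part costs a bigger retry.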
Incomplete multipart upload expiration policy
The multipart upload feature improves PUT performance. A partial upload does not appear in the bucket list, but it does incur storage charges. Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days.
Example lifecycle policy
Abort incomplete multipart uploads seven days after initiation:
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>SomeKeyPrefix/</Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>
Parallel GETs
Use range-based GETs to get multithreaded performance when downloading objects. This compensates for unreliable networks and provides the benefits of multithreaded parallelism. Align your ranges with your parts!
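A minimal sketch of preparing such a download: the `Range` header syntax (`bytes=start-end`, inclusive) is standard HTTP, and the helper names below are illustrative:

```python
def range_header(start, end):
    """HTTP Range header for an inclusive byte range, e.g. bytes=0-1048575."""
    return {"Range": f"bytes={start}-{end}"}

def aligned_ranges(object_size, part_size):
    """Byte ranges aligned with multipart-upload part boundaries, so each
    ranged GET retrieves exactly one uploaded part."""
    return [(s, min(s + part_size, object_size) - 1)
            for s in range(0, object_size, part_size)]

# Each range could then be fetched on its own thread, e.g. with
# urllib.request and a Request carrying range_header(start, end).
print(aligned_ranges(25, 10))
print(range_header(0, 9))
```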
Parallel LISTs
Parallelize LIST when you need a sequential list of your keys. A secondary index gives you a faster alternative to LIST:
• Sorting by metadata
• Searchability
• Objects by timestamp
Distributing object keys
This matters most if you regularly exceed 100 TPS on a bucket. Distribute keys uniformly across the keyspace by using a key-naming scheme with randomness at the beginning.
Don't do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
…because this is going to happen:
[Diagram: all requests land on a single partition while the other partitions, 1 through N, sit idle]
Add randomness to the beginning of the key name…
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
…so your transactions can be distributed across the partitions:
[Diagram: requests spread evenly across partitions 1 through N]
Techniques for distributing keys
Store as a hash:
• 83d02a66a0fee41b5767e4f4dd377d29
Prepend with a short hash:
• 83d02013_11_13-164533125.jpg
Reverse the key name:
• 521335461-31_11_3102.jpg
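The "prepend a short hash" technique can be sketched with the standard library; the helper name and prefix length are illustrative choices:

```python
import hashlib

def hashed_key(key, prefix_len=5):
    """Prepend the first few hex characters of the key's MD5 digest so that
    sequential key names spread uniformly across the keyspace."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + key

# Sequential date-based names now start with effectively random prefixes.
print(hashed_key("2013_11_13-164533125.jpg"))
print(hashed_key("2013_11_13-164533126.jpg"))
```

Because the hash is deterministic, the application can recompute the full key from the original name whenever it needs to GET the object back.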
Data ingestion into Amazon S3
AWS Import/Export Snowball
• Accelerate PBs with AWS-provided appliances
• 80 TB and global availability
AWS Storage Gateway
• Up to 120 MB/s cloud upload rate (4x improvement)
• 10 Gb networking for VMware
Amazon Kinesis Firehose
• Ingest data streams directly into AWS data stores
AWS Direct Connect
ISV connectors
Transfer Acceleration
• Move data up to 300% faster using the AWS network
Introducing Amazon S3 Transfer Acceleration
Up to 300% faster. Change your endpoint, not your code. 56 global edge locations. No firewall exceptions. No client software required.
[Diagram: Uploader -> AWS edge location -> S3 bucket, with optimized throughput]
How fast is Transfer Acceleration?
[Chart: time in hours for a 500 GB upload to a bucket in Singapore, comparing S3 Transfer Acceleration against the public Internet from edge locations including Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore]
Getting Started
1. Enable S3 Transfer Acceleration on your S3 bucket
2. Update your application/destination URL to <bucket-name>.s3-accelerate.amazonaws.com
3. Done!

How much will it help me?
Use the Amazon S3 Transfer Acceleration Speed Comparison page:
http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
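Step 2 amounts to swapping the endpoint host while the bucket and key stay the same; a minimal illustration with a placeholder bucket name (with boto3, the equivalent is passing `Config(s3={"use_accelerate_endpoint": True})` to the client):

```python
def accelerate_url(bucket, key):
    """Build an S3 Transfer Acceleration URL for an object: same bucket
    and key, only the endpoint host changes."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("my-bucket", "videos/clip.mp4"))
# -> https://my-bucket.s3-accelerate.amazonaws.com/videos/clip.mp4
```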
Y-cam Solutions Ltd Confidential and proprietary
Who we are...
• Initially used S3 just to store videos and thumbnails, 6 years ago
• But now we also use S3 for so much more
• 120 million objects
• 2 million videos
Challenges
Handling the expiration of videos
Legacy scripts
Reducing servers, cutting costs
Video Expiration
• Created multiple buckets with different lifecycles
• Improved code to decide which bucket to save each video to
Legacy Script
• Moved thumbnail creation and DynamoDB updates from the script to a Lambda function
• Lambda is triggered by an S3 event notification
• Gained the extra benefits of using Lambda
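A minimal sketch of such a handler, assuming the standard S3 event notification shape; the bucket name is hypothetical and the thumbnail/DynamoDB calls are placeholders, not Y-cam's actual code:

```python
import urllib.parse

def handler(event, context):
    """Triggered by an S3 ObjectCreated event notification: for each new
    video, create a thumbnail and record it in DynamoDB (placeholders)."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # create_thumbnail(bucket, key)   # placeholder
        # update_dynamodb(bucket, key)    # placeholder
        results.append((bucket, key))
    return results

# Sample event with the fields a real S3 notification carries:
sample_event = {"Records": [
    {"s3": {"bucket": {"name": "videos-bucket"},
            "object": {"key": "uploads/clip+1.mp4"}}}]}
print(handler(sample_event, None))
```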
Future Plans
• Reducing the number of servers
• Some servers were only serving the web app's JS code; moved this to be hosted by S3, reducing cost
• Moving towards a serverless architecture
Summary
• Amazon S3 Standard-Infrequent Access
• Amazon S3 management policies
• Versioning for Amazon S3 + MFA Delete
• Amazon S3 Transfer Acceleration