deep dive on amazon s3 - march 2017 aws online tech talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lee Kear Storage Specialist Solutions Architect March 2017 Deep Dive on Amazon S3

Post on 05-Apr-2017


Page 1: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Lee Kear
Storage Specialist Solutions Architect
March 2017

Deep Dive on Amazon S3

Page 2: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Batches and Streams
• AWS Direct Connect
• AWS Snowball, Snowball Edge, Snowmobile
• 3rd Party Connectors
• Transfer Acceleration
• AWS Storage Gateway
• Amazon Kinesis Firehose

File
• Amazon EFS

Block
• Amazon EBS (persistent)
• Amazon EC2 Instance Store (ephemeral)

Object
• Amazon S3
• Amazon Glacier

Page 3: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

What to Expect from the Session

• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Page 4: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Data transfer into Amazon S3

• AWS Direct Connect
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway

Page 5: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Amazon Storage Partner Solutions

• Primary Storage: solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage
• Backup and Recovery: solutions that leverage Amazon S3 for durable data backup
• Archive: solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup

aws.amazon.com/backup-recovery/partner-solutions/
Note: Represents a sample of storage partners

Page 6: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Choice of storage classes on S3

• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data

Page 7: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Storage classes designed for your use case

S3 Standard
• Big data analysis
• Content distribution
• Static website hosting

Standard - IA
• Backup & archive
• Disaster recovery
• File sync & share
• Long-retained data

Amazon Glacier
• Long term archives
• Digital preservation
• Magnetic tape replacement

Page 8: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

When should you move to Standard-IA?

S3 Analytics - storage class analysis

• Visualize the access pattern on your data over time

• Measure the object age where data is infrequently accessed

• Dive deep by bucket, prefixes, or specific object tag

• Easily create a lifecycle policy based on the analysis

Page 9: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Visualize access pattern on your data

Page 10: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 11: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Export S3 Analytics to the tools of your choice

Page 12: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 13: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Page 14: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Automate data management
Lifecycle policies

• Automatic tiering and cost controls
• Includes two possible actions:
  • Transition: archives to Standard - IA or Amazon Glacier based on the object age you specify
  • Expiration: deletes objects after a specified time
• Actions can be combined
• Set policies by bucket, prefix, or tags
• Set policies for current version or non-current versions
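The transition and expiration actions above can be combined in one lifecycle rule. The sketch below only builds the rule as a plain Python dictionary in the shape the S3 lifecycle configuration API accepts. The `logs/` prefix, the rule ID, and the day counts are illustrative assumptions, and applying the configuration (for example through an SDK call such as boto3's `put_bucket_lifecycle_configuration`) is left out.

```python
import json


def build_lifecycle_config(prefix, ia_days, glacier_days, expire_days):
    """Return a lifecycle configuration with one rule that combines
    two transitions and an expiration for objects under `prefix`."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                # Hypothetical rule ID derived from the prefix.
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


# Assumed example values: 30 days to Standard - IA, 90 to Glacier, delete at 365.
config = build_lifecycle_config("logs/", ia_days=30, glacier_days=90, expire_days=365)
print(json.dumps(config, indent=2))
```

Keeping the rule as data makes it easy to review or store in version control before it is applied to a bucket.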

Page 15: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Set up a lifecycle policy on the AWS Management Console

Page 16: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 17: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 18: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Protect your data from accidental deletes

• Protects from unintended user deletes or application logic failures

• New version with every upload

• Easy retrieval of deleted objects and roll back to previous versions

Best Practice

Versioning

Page 19: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Easily recover from unintended delete
Tip: Create a recycle bin for your storage

• Versioning: recycle bin
• Lifecycle policy with non-current expiration: automatic cleaning

Best Practice
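A minimal sketch of the recycle-bin tip: versioning keeps deleted or overwritten objects as non-current versions, and a lifecycle rule with non-current expiration empties the bin automatically. The function name, rule ID, and the 30-day retention are assumptions for illustration; the dictionaries follow the shapes used by the S3 versioning and lifecycle configuration APIs, and the calls that would apply them are omitted.

```python
def build_recycle_bin_config(keep_noncurrent_days):
    """Return (versioning, lifecycle) configurations that together
    act as a recycle bin with automatic cleaning."""
    if keep_noncurrent_days < 1:
        raise ValueError("keep_noncurrent_days must be at least 1")
    # Versioning turns every overwrite or delete into a recoverable version.
    versioning = {"Status": "Enabled"}
    # The lifecycle rule removes non-current versions after the retention window.
    lifecycle = {
        "Rules": [
            {
                "ID": "recycle-bin-cleanup",  # hypothetical rule name
                "Filter": {"Prefix": ""},     # empty prefix: whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": keep_noncurrent_days
                },
            }
        ]
    }
    return versioning, lifecycle


versioning, lifecycle = build_recycle_bin_config(keep_noncurrent_days=30)
print(versioning)
print(lifecycle["Rules"][0]["NoncurrentVersionExpiration"])
```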

Page 20: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Automate with trigger-based workflow
Amazon S3 event notifications

Events are delivered to:
• SNS topic
• SQS queue
• Lambda function

• Notification when objects are created (via PUT, POST, Copy, or Multipart Upload) or deleted (via DELETE)

• Filter on prefixes and suffixes

• Trigger workflow with Amazon SNS, Amazon SQS, and AWS Lambda functions
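As a rough illustration of the Lambda side of this workflow, the sketch below reads the records of an S3 event notification and keeps those whose key matches a prefix and suffix. In practice S3 can apply the prefix and suffix filter itself in the notification configuration; the check is repeated here only to show the idea. The handler name, the `uploads/` prefix, the `.jpg` suffix, and the sample event are assumptions.

```python
from urllib.parse import unquote_plus


def matching_objects(event, prefix="", suffix=""):
    """Yield (event_name, bucket, key) for records whose key matches."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(s3["object"]["key"])
        if key.startswith(prefix) and key.endswith(suffix):
            yield record["eventName"], s3["bucket"]["name"], key


def handler(event, context):
    """Hypothetical Lambda entry point: report each matching object."""
    results = list(matching_objects(event, prefix="uploads/", suffix=".jpg"))
    for name, bucket, key in results:
        print(f"{name}: s3://{bucket}/{key}")
    return results


# Hand-written sample event with only the fields the sketch reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/cat+photo.jpg"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "logs/app.txt"},
            },
        },
    ]
}

handler(sample_event, None)
```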

Page 21: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions

Use cases:
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to regional customers
• Security - create remote replicas managed by separate AWS accounts

How it works:
• Only replicates new PUTs. Once configured, all new uploads into the source bucket will be replicated
• Entire bucket or prefix based
• 1:1 replication between any 2 regions
• Versioning required
• Deletes and lifecycle actions are not replicated

Page 22: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Summary – automate management tasks

• Automate transition and expiration with lifecycle policies
• Trigger-based workflow with event notification
• Easily recover from accidental delete with versioning
• Cross-region replication

Page 23: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Topics

• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Page 24: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Faster upload over long distances
S3 Transfer Acceleration

Uploader → AWS Edge Location → S3 Bucket (optimized throughput)

Change your endpoint, not your code

No firewall changes or client software

Longer distance, larger files, more benefit

Faster or free

68 global edge locations

Try it at S3speedtest.com
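"Change your endpoint, not your code" can be pictured as a one-line switch in how the bucket hostname is built. The helper below is a hypothetical sketch: it returns the virtual-hosted-style URL for a bucket, using the `s3-accelerate.amazonaws.com` endpoint when acceleration is requested. It assumes acceleration has already been enabled on the bucket, and it rejects bucket names containing periods, which the accelerate endpoint does not support.

```python
def s3_endpoint(bucket, accelerate=False):
    """Return the HTTPS endpoint for `bucket`, optionally accelerated."""
    if not bucket:
        raise ValueError("bucket name must not be empty")
    if accelerate and "." in bucket:
        # Accelerate endpoints require a bucket name without periods.
        raise ValueError("accelerated buckets cannot contain '.' in the name")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"


print(s3_endpoint("example-bucket"))
print(s3_endpoint("example-bucket", accelerate=True))
```

The rest of the upload code stays the same; only the hostname the client talks to changes.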

Page 25: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Faster upload of large objects
Parallelize PUTs with multipart uploads

• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs

• Increase resiliency to network errors; fewer large restarts on error-prone networks

Best Practice
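Before parts can be uploaded in parallel, the object has to be divided into parts. The sketch below plans that split under the documented multipart limits (parts of at least 5 MiB except the last, at most 10,000 parts). The 8 MiB default part size and the function name are assumptions, and the actual upload calls are left out.

```python
import math

MIN_PART_SIZE = 5 * 1024**2   # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000            # a multipart upload can have at most 10,000 parts


def plan_parts(object_size, part_size=8 * 1024**2):
    """Return a list of (part_number, offset, length) covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size when needed so the part count stays within the limit.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


for part in plan_parts(20 * 1024**2):
    print(part)
```

Each tuple is independent of the others, so the parts can be handed to a pool of worker threads, and a failed part can be retried without restarting the whole upload.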

Page 26: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Faster download
You can parallelize GETs as well as PUTs

    GET /example-object HTTP/1.1
    Host: example-bucket.s3.amazonaws.com
    x-amz-date: Fri, 28 Jan 2016 21:32:02 GMT
    Range: bytes=0-9
    Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=

For large objects, use range-based GETs
Align your GET ranges with your parts

For content distribution, enable Amazon CloudFront
• Caches objects at the edge
• Low latency data transfer to end user
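To parallelize a download, each worker needs its own `Range` header value. The sketch below generates non-overlapping byte ranges for an object of known size; choosing `part_size` equal to the part size used at upload time keeps the ranges aligned with the parts. The function name is an assumption, and issuing the requests is left out.

```python
def range_headers(object_size, part_size):
    """Return Range header values that cover the object without gaps or overlap."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    headers = []
    for start in range(0, object_size, part_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + part_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


print(range_headers(25, 10))  # ['bytes=0-9', 'bytes=10-19', 'bytes=20-24']
```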

Page 27: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

SQL Query on S3

Amazon Athena

• No loading of data

• Serverless

• Supports text, CSV, TSV, JSON, Avro, and columnar formats such as Apache ORC and Apache Parquet

• Access via Console or JDBC driver

• $5 per TB scanned from S3
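Because pricing is per terabyte scanned, the cost of a query follows directly from the "data scanned" figure. The sketch below applies the $5 per TB price from the slide; treating a terabyte as 2^40 bytes is an assumption, and any minimum charge or rounding that applies per query is ignored.

```python
PRICE_PER_TB_USD = 5.0      # price quoted on the slide
BYTES_PER_TB = 1024**4      # assumption: binary terabyte


def scan_cost_usd(bytes_scanned):
    """Estimate the cost of a query from the bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: a full scan of 1 TB versus a query that
# reads only a tenth of the data.
print(scan_cost_usd(1024**4))
print(scan_cost_usd(1024**4 // 10))
```

Columnar formats let a query read only the columns it needs, which reduces bytes scanned and therefore cost.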

Page 28: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Getting Started – Athena with Console

Page 29: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Query your S3 data using SQL

Run time and data scanned

Page 30: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Higher TPS by distributing key names

Use a key-naming scheme with randomness at the beginning for high TPS

• Most important if you regularly exceed 100 TPS on a bucket
• Avoid starting with a date or monotonically increasing numbers

Don’t do this…

<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg

Page 31: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Distributing key names

Add randomness to the beginning of the key name with a hash or reversed timestamp (ssmmhhddmmyy)

<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
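Both approaches named above can be sketched in a few lines. The first helper prepends a short hash of the original name, and the second formats a timestamp in the reversed `ssmmhhddmmyy` order. The function names, the choice of SHA-256, and the four-character prefix length are assumptions made for illustration.

```python
import hashlib
from datetime import datetime


def hashed_key(key, prefix_len=4):
    """Prefix `key` with the first hex characters of its SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


def reversed_timestamp(ts):
    """Format a datetime as ssmmhhddmmyy so the fastest-changing digits lead."""
    return ts.strftime("%S%M%H%d%m%y")


print(hashed_key("2013_11_13-164533125.jpg"))
print(reversed_timestamp(datetime(2013, 11, 13, 16, 45, 33)))  # 334516131113
```

Because the hash is derived from the original name, the full key can be recomputed whenever the object needs to be read back.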

Page 32: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Let S3 do the list for you
S3 Inventory

• Save time
• Daily or weekly delivery
• Deliver to S3 bucket
• CSV flat file output
• Half the price of LIST API at $0.0025 per million objects listed
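Once an inventory file has been delivered, it can be processed like any other CSV instead of calling the LIST API. The sketch below totals object sizes by top-level prefix. It assumes a header-less file with the columns bucket, key, and size in that order and URL-encoded keys; the real column set depends on how the inventory was configured, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from urllib.parse import unquote


def bytes_by_prefix(csv_text, columns=("Bucket", "Key", "Size")):
    """Sum object sizes per top-level prefix from inventory CSV text."""
    totals = Counter()
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    for row in reader:
        key = unquote(row["Key"])  # assumption: keys are URL-encoded
        # Keys without a "/" are grouped under the empty prefix.
        top = key.split("/", 1)[0] if "/" in key else ""
        totals[top] += int(row["Size"])
    return dict(totals)


# Made-up sample rows in the assumed column order.
sample = (
    '"example-bucket","logs/2017/a.log","100"\n'
    '"example-bucket","logs/2017/b.log","250"\n'
    '"example-bucket","images/cat.jpg","4096"\n'
)
print(bytes_by_prefix(sample))
```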

Page 33: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 34: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Best Practices - performance

Faster upload over long distances with S3 Transfer Acceleration

Faster upload for large objects with S3 multipart upload

Optimize GET performance with Range GET and CloudFront

SQL Query on S3 with Athena

Distribute key name for high TPS workload

Optimize list with S3 inventory

Page 35: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Topics

• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Page 36: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Organize your data with object tags

Manage data based on what it is as opposed to where it's located

• Classify your data, up to 10 tags per object

• Tag your objects with key-value pairs

• Write policies once based on the type of data

• Put object with tag or add tag to existing objects

Use tags with:
• Storage Metrics & Analytics
• Lifecycle Policy
• Access Control
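Tags can be attached when an object is written or added to an existing object. The sketch below prepares both forms from one dictionary of key-value pairs: a URL-encoded string of the kind carried in the `x-amz-tagging` request header on a PUT, and a tag-set structure of the kind used when tagging an existing object. It enforces the 10-tag limit mentioned above; the function names and example tags are assumptions.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10  # limit stated on the slide


def _check(tags):
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"at most {MAX_TAGS_PER_OBJECT} tags per object")


def tagging_header(tags):
    """Return tags as a URL-encoded query string for a PUT request."""
    _check(tags)
    return urlencode(tags)


def tag_set(tags):
    """Return tags in the structure used to tag an existing object."""
    _check(tags)
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}


example_tags = {"Project": "X", "Classification": "internal"}
print(tagging_header(example_tags))  # Project=X&Classification=internal
print(tag_set(example_tags))
```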

Page 37: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Manage access with object tags

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "X"}}
    }
  ]
}

User permission by tags

Page 38: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Audit and monitor access
AWS CloudTrail data events

Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity

How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events

Page 39: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Monitor performance and operation
Amazon CloudWatch metrics for S3

• Generate metrics for data of your choice
• Entire bucket, prefixes, and tags
• Up to 1,000 groups per bucket
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
• $0.30 per metric per month

Page 40: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Page 41: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

CloudWatch Metrics for S3

Metric name            Value
AllRequests            Count
PutRequests            Count
GetRequests            Count
ListRequests           Count
DeleteRequests         Count
HeadRequests           Count
PostRequests           Count
BytesDownloaded        MB
BytesUploaded          MB
4xxErrors              Count
5xxErrors              Count
FirstByteLatency       ms
TotalRequestLatency    ms

Page 42: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Example

Page 43: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Summary – manage your storage

Classify storage and manage access with S3 object tags

Audit and monitor access with CloudTrail

Monitor operational performance and set alarm with S3 CloudWatch metrics

Page 44: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Recap

• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage

Page 45: Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks

Thank you!