storage on aws - amazon s3 · pdf fileebs volume types comparison magnetic general purpose...
TRANSCRIPT
Storage on AWS
©2015AmazonWebServices,Inc.anditsaffiliates.Allrightsserved.Maynotbecopied,modified,ordistributedinwholeorinpartwithouttheexpressconsentofAmazonWebServices,Inc.
Agenda
• Storage Primer• Block Storage• Shared File Systems• Object Store• On-Premises Storage Integration• Structured Data Store
Block vs File vs ObjectBlock StorageRaw StorageData organized as an array of unrelated blocksHost File System places data on diske.g.: Microsoft NTFS, Unix ZFS
File StorageUnrelated data blocks managed by a file (serving) systemNative file system places data on disk
Object StorageStores Virtual containers that encapsulate the data, data attributes, metadata and Object IDsAPI Access to dataMetadata Driven, Policy-based, etc
Structured storage - Databases
Relational DatabasesStatic SchemaHighly structured table organizationRigid data format
Document StoreDynamic SchemaKey/Value DatabaseCollection of complex documentsArbitrary, nested data format
Storage - Characteristics
Durability Availability Security Cost Scalability Performance Integration
Measure of expected data loss
Measure of expected downtime
Security measures in place
Amount per storage unit, e.g. $ / GB
Upwardflexibility
Performancemetrics
Ability to interact with
Some of the ways we look at storage
AWS has a variety of storage optionsAmazon EBS (Elastic Block Storage)
Amazon Elastic File System (EFS)
Amazon EC2 Instance Store (Ephemeral Volumes)
Amazon S3 (Simple Storage Service)
Amazon Glacier
AWS Storage Gateway
Amazon Import/Export Snowball
AWS also has a variety of database options
Amazon EC2 (Self Managed)
Amazon RDS (Relational Database Service)
Amazon DynamoDB
Amazon ElastiCache
Amazon Redshift
Amazon EBS
• Persistent block level storage for EC2• Pay only for what you provision• Native redundancy and write cache• Consistent and low-latency performance• Optimized for random I/O• Native support for encryption at rest (data volumes)
Amazon EBS
• Network attached block device– Independent data lifecycle– Virtual disks– Multiple volumes per EC2 instance– Only one EC2 instance at a time per volume– Can be detached from an instance and attached to a different one
• Raw block devices– Unformatted block devices– Ideal for databases, filesystems
• Available in multiple types
EBS Volume Types ComparisonMagnetic General Purpose
(SSD)Provisioned IOPS (SSD)
Performance Lowest Cost Burstable PredictableUse Cases Infrequent Data
AccessBoot volumesSmall to Medium DBsDev & Test
I/O IntensiveRelational & NoSQL
Media Magnetic (HDD) SSD SSDMax IOPS 100 on average with
the ability to burst to hundreds of IOPS
Baseline 3 IOPS/GBBurstable to 3,000 IOPS
Consistently performed at provisioned level, up to 20,000 IOPS
Price $.05/GB/Month$.05/million I/O
$.10/GB/MonthI/O Operations - Free
$.125/GB/Month$.065/provisioned IOPS
IOPS Token Bucket Model
分
• Each token represents an “I/O credit” that pays for one read or one write.
• A bucket is associated with each General Purpose (SSD) volume, and can hold up to 5.4 million tokens.
• Tokens accumulate at a rate of 3 per configured GB per second, up to the capacity of the bucket.
• Tokens can be spent at up to 3000 per second per volume.
• The baseline performance of the volume is equal to the rate at which tokens are accumulated — 3 IOPS per GB per second.
Magnetic General Purpose (SSD)
Provisioned IOPS (SSD)
Performance
Lowest Cost Burstable Predictable
Use Cases
Infrequent Data Access
Boot volumesSmall to Medium DBsDev & Test
I/O IntensiveRelational & NoSQL
Media Magnetic (HDD) SSD SSD
Max IOPS
100 on averagewith the ability to burst to hundreds of IOPS
Baseline 3 IOPS/GBBurstable to 3,000 IOPS
Consistently performed at provisioned level, up to 20,000 IOPS
Price $.05/GB/Month$.05/million I/O
$.10/GB/MonthI/O Operations -Free
$.125/GB/Month$.065/provisioned IOPS
EBS Provisioned IOPS
• EBS Optimized Instances• Dedicated storage throughput
• Predictable Performance• 100-20000 IOPS per volume• Single digit millisecond latency
• Performance Design• Deliver within 10% of PIOPs, 99.9% of
the time
Magnetic General Purpose (SSD)
Provisioned IOPS (SSD)
Performance
Lowest Cost Burstable Predictable
Use Cases
Infrequent Data Access
Boot volumesSmall to Medium DBsDev & Test
I/O IntensiveRelational & NoSQL
Media Magnetic (HDD) SSD SSD
Max IOPS
100 on averagewith the ability to burst to hundreds of IOPS
Baseline 3 IOPS/GBBurstable to 3,000 IOPS
Consistently performed at provisioned level, up to 20,000 IOPS
Price $.05/GB/Month$.05/million I/O
$.10/GB/MonthI/O Operations -Free
$.125/GB/Month$.065/provisioned IOPS
Enhanced Throughput for PIOPS & GP2 Volumes
• Maximum attainable throughput to each volume was doubled to 128 MB/s read or write traffic
• An I/O request of up to 256 KB is now counted as a single I/O operation (IOP)
• In many cases you can configure the block size used by your application
• Capable of dramatically reducing your storage costs
Magnetic General Purpose (SSD)
Provisioned IOPS (SSD)
Performance
Lowest Cost Burstable Predictable
Use Cases
Infrequent Data Access
Boot volumesSmall to Medium DBsDev & Test
I/O IntensiveRelational & NoSQL
Media Magnetic (HDD) SSD SSD
Max IOPS
100 on averagewith the ability to burst to hundreds of IOPS
Baseline 3 IOPS/GBBurstable to 3,000 IOPS
Consistently performed at provisioned level, up to 20,000 IOPS
Price $.05/GB/Month$.05/million I/O
$.10/GB/MonthI/O Operations -Free
$.125/GB/Month$.065/provisioned IOPS
Amazon EBS at 20,000 IOPS
• Provisioned IOPS (SSD)– Max Volume 16 TB– Max I/O rate 20,000 IOPS– Max throughput 320 MB/s
• General Purpose (SSD)– Max Volume 16 TB– Max I/O rate 10,000 IOPS– Max throughput 160 MB/s
Magnetic General Purpose (SSD)
Provisioned IOPS (SSD)
Performance
Lowest Cost Burstable Predictable
Use Cases
Infrequent Data Access
Boot volumesSmall to Medium DBsDev & Test
I/O IntensiveRelational & NoSQL
Media Magnetic (HDD) SSD SSD
Max IOPS
100 on averagewith the ability to burst to hundreds of IOPS
Baseline 3 IOPS/GBBurstable to 3,000 IOPS
Consistently performed at provisioned level, up to 20,000 IOPS
Price $.05/GB/Month$.05/million I/O
$.10/GB/MonthI/O Operations -Free
$.125/GB/Month$.065/provisioned IOPS
Internet
AWS Cloud
EBS Snapshots
EC2 Availability Zone
EC2
Amazon S3
EBS
EC2 EC2
EBS EBS EBS EBS EBSEBS Snapshot
EBS Snapshot
EBS Snapshot
EBS Snapshot
EBS Snapshot
Create Snapshot
Clone From Snapshot
EBS Volume
How Do Snapshots Work?Time
Snapshot 1 Snapshot 2 Snapshot 3
S3
Block 1Block 2Block 3Block 4
Chunk 1Chunk 2Chunk 3Chunk 4
EC2 Instance Store (Ephemeral Volumes)
• Free with your EC2 instance– SAS and SSD options– Size/type based on instance type
• Local, direct attached resource• Consistent sequential reads and writes• Use only for non-persistent data
Elastic File System (EFS)
• Fully managed file system for EC2 instances• Provides standard file system semantics• Works with standard operating system APIs• Sharable across thousands of instances• Elastically grows to petabyte scale• Delivers performance for a wide variety of workloads• Highly available and durable• NFS v4–based
EFS – MountingEFS
EC2EC2 EC2 EC2EC2
EFSDNS Nameavailability-zone.file-system-id.efs.aws-region.amazonaws.com
Mountonmachinesudo mount -t nfs4 mount-target-DNS:/ ~/efs-mount-point
EC2
Amazon S3 (Simple Storage Service)
• Web accessible object store• Pay for exactly what you use• Highly durable (99.999999999% design)• Limitlessly scalable• Natively online• Two flavors:
– Standard Storage - $0.0300* per GB / mo– Standard – Infrequent Access Storage (min size 128KB) – $0.0125* per GB / mo + Data
retrieval cost* (US East (N Virginia) pricing)
Amazon S3 (Simple Storage Service)• Parallel I/O for max speed (Multipart Upload, Ranged GETs)
• Resource-level IAM permissions• Bucket Policies & ACLs• Direct access through APIs• Server Side Encryption• Static Website Hosting• Data Lifecycle Rules
Amazon Glacier
• Low-Cost Archival Storage• Secure
• SSL & AES-256
• Durable• Designed for 99.999999999% durability
• Optimized for data archiving and backup• Suitable for RTO measured in hours• Includes storage costs and retrieval costs
• $0.007 per GB/Month (US East pricing)• Integrated with S3
Amazon CloudFront• Easy-to-use Content Delivery Network (CDN)• Pay-as-you-go pricing• Multiple origins: S3, EC2, on-premise
• Worldwide network of 53+ edge locations• Video streaming• Geo Restriction• Custom SSL Certificates• Dynamic Content• POST/PUT
AWS Storage Gateway• VM Appliance run on-premise• Creates iSCSI volume mount points• Directly interfaces with S3 or Glacier
• Gateway-Stored Volumes• Gateway-Cached Volumes• Virtual Tape Library
Amazon Import/Export Snowball
• Petabyte scale data transport• Uses secure appliances• Economic and fast• Faster than Internet for significant data sets• Import into S3
A fully managed SQL database service
Simple to deploy and scale
Without any operational burden
Reliable and cost effective
Choice of Database engines
Amazon RDS
Amazon Aurora
If you host your databases on-premises
Power, HVAC, netRack & stack
Server maintenance
OS patches
DB s/w patchesDatabase backups
ScalingHigh availability
DB s/w installs
OS installation
you
App optimization
If you host your databases on-premises
Power, HVAC, netRack & stack
Server maintenance
OS patches
DB s/w patchesDatabase backups
ScalingHigh availability
DB s/w installs
OS installation
you
App optimization
If you host your databases in EC2
Power, HVAC, netRack & stack
Server maintenance
OS patches
DB s/w patchesDatabase backups
ScalingHigh availability
DB s/w installs
OS installation
you
App optimization
If you host your databases in EC2
OS patches
DB s/w patchesDatabase backups
ScalingHigh availability
DB s/w installs
you
App optimization
Power, HVAC, netRack & stack
Server maintenanceOS installation
If you choose a managed DB service like RDS
Power, HVAC, netRack & stack
Server maintenance
OS patches
DB s/w patchesDatabase backups
App optimization
High availability
DB s/w installs
OS installation
you
Scaling
• key-value access• complex queries• transactions• analytics
Traditional Database Architecture
App/Web Tier
Client Tier
RDBMS
Data Tier
Cache Data Warehouse
RDBMSNoSQL
App/Web Tier
Client Tier
bestdatabaseforeachworkload
Cloud Data Tier Architecture
Data Tier
Cache Data Warehouse
RDBMSNoSQL
key/valuesimplequery
hotreads analytics
complexqueries&transactions
Workload Driven Data Store Selection
Data Tierkey/value
simplequery
hotreads analytics
complexqueries&transactions
Workload Driven Data Store Selection
Amazon ElastiCache
Amazon DynamoDB
Amazon Redshift
Amazon RDS
Amazon DynamoDB
• Fully managed NoSQL database service• Massively scalable, distributed key/value store• Reserved capacity model• Fast and predictable• Built-in fault tolerance• Strong consistency model• Unlimited potential storage and throughput
Amazon ElastiCache
• In-memory cache in the cloud• Improve latency and throughput for read-heavy
workloads• Supports open-source caching engines
– Memcached– Redis
• Examples– Caching of MySQL database query results– Caching of complex query post-processing results
Amazon Redshift• Fast and powerful, petabyte-scale data
warehouse– Fully managed– Highly-parallel– Columnar Data Store
• Data warehouse-type queries– Aggregations, historical analysis– BI Tool integration
• Grow with your data– 160 GB à 1.6 PB
• SSD and SAS Options– SSD provides 10-15x perf @ 5.5x the cost/tb/year
Using Multiple Storage Options Together
• EBS + S3: snapshots
• S3 + EC2 Instance Store: caching
• S3 + CloudFront: edge caching
• S3 + Glacier: data lifecycle archiving
• RDS + ElastiCache: cached queries