Storage on AWS
© 2017 Amazon Web Services, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.
Agenda
• Introduction
• Storage Primer
• Block Storage
• Shared File Systems
• Object Store
• On-Premises Storage Integration
Introduction: Why choose AWS for storage
Compelling Economics
• Pay as you go
• No upfront investment
• No commitment
• No risky capacity planning
• No need to provision for redundancy or overhead
Easy to Use
• Self service administration
• SDKs for simple integration
Reduce Risk
• Durable and secure
• Avoid risks of physical media handling
Speed, Agility, Scale
• Reduce time to market
• Focus on your business, not your infrastructure
Block vs File vs Object
Block Storage
• Raw storage
• Data organized as an array of unrelated blocks
• Host file system places data on disk, e.g. Microsoft NTFS, Unix ZFS
File Storage
• Unrelated data blocks managed by a file (serving) system
• Native file system places data on disk
Object Storage
• Stores virtual containers that encapsulate the data, data attributes, metadata, and object IDs
• API access to data
• Metadata driven, policy-based, etc.
Storage - Characteristics
• Durability: measure of expected data loss
• Availability: measure of expected downtime
• Security: security measures in place
• Cost: amount per storage unit, e.g. $ / GB
• Scalability: upward flexibility
• Performance: performance metrics
• Integration: ability to interact with
Some of the ways we look at storage
AWS has a variety of storage options
Amazon EBS (Elastic Block Store)
Amazon Elastic File System (EFS)
Amazon EC2 Instance Store (Ephemeral Volumes)
Amazon S3 (Simple Storage Service)
Amazon Glacier
AWS Storage Gateway: File Gateway
Amazon Snowball & Snowball Edge
AWS Snowmobile
AWS also has a variety of database options
Amazon EC2 (Self Managed)
Amazon RDS (Relational Database Service)
Amazon DynamoDB
Amazon ElastiCache
Amazon Redshift
Amazon EBS
• Persistent block level storage for EC2
• Pay only for what you provision
• Native redundancy and write cache
• Consistent and low-latency performance
• Optimized for random I/O
• Native support for encryption at rest (data volumes)
Amazon EBS
• Network attached block device
– Independent data lifecycle
– Virtual disks
– Multiple volumes per EC2 instance
– Only one EC2 instance at a time per volume
– Can be detached from an instance and attached to a different one
• Raw block devices
– Unformatted block devices
– Ideal for databases, filesystems
• Available in multiple types
AWS EBS Features
Durable
• Designed for five 9’s reliability
• Redundant storage across multiple devices within an AZ
Secure
• Identity and Access Policies
• Encryption
Scalable
• Unlimited capacity when you need it
• Easily scale up and down
Performance
• Low-latency SSD
• Consistent I/O performance
• Stripe multiple volumes for higher I/O performance
Backup
• Point-in-time snapshots
• Copy snapshots across AZs and Regions
Amazon EBS
• Highly available block storage for all types of data
• Internet-scale storage: grow without limits
• Benefit from AWS’s massive security investments
• Built-in redundancy: designed for 99.999% availability
• Low price per GB per month: no commitment, no up-front cost
EBS Volume Types Comparison
Magnetic
• Performance: Lowest cost
• Use cases: Infrequent data access
• Media: Magnetic (HDD)
• Max IOPS: 100 on average, with the ability to burst to hundreds of IOPS
• Price: $.05/GB/month, $.05/million I/O
General Purpose (SSD)
• Performance: Burstable
• Use cases: Boot volumes, small to medium DBs, dev & test
• Media: SSD
• Max IOPS: Baseline 3 IOPS/GB, burstable to 3,000 IOPS
• Price: $.10/GB/month, I/O operations free
Provisioned IOPS (SSD)
• Performance: Predictable
• Use cases: I/O intensive, relational & NoSQL
• Media: SSD
• Max IOPS: Consistently performs at provisioned level, up to 20,000 IOPS
• Price: $.125/GB/month, $.065/provisioned IOPS
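To make the price row concrete, here is a minimal sketch that turns the per-unit prices from the comparison into a monthly estimate. The function name, the short volume-type labels, and the example sizes are illustrative assumptions; the prices are the slide's figures and may not match current pricing.

```python
def monthly_cost(volume_type, size_gb, provisioned_iops=0, million_ios=0.0):
    """Estimate monthly cost in USD using the per-unit prices from the slide.

    volume_type labels ("magnetic", "gp2", "piops") are hypothetical names
    chosen for this sketch, not AWS API identifiers.
    """
    if volume_type == "magnetic":
        # $0.05 per GB-month plus $0.05 per million I/O requests
        return 0.05 * size_gb + 0.05 * million_ios
    if volume_type == "gp2":
        # $0.10 per GB-month, I/O operations are free
        return 0.10 * size_gb
    if volume_type == "piops":
        # $0.125 per GB-month plus $0.065 per provisioned IOPS
        return 0.125 * size_gb + 0.065 * provisioned_iops
    raise ValueError(f"unknown volume type: {volume_type}")


if __name__ == "__main__":
    for kind, kwargs in [
        ("magnetic", {"million_ios": 100}),
        ("gp2", {}),
        ("piops", {"provisioned_iops": 2000}),
    ]:
        print(kind, round(monthly_cost(kind, 500, **kwargs), 2))
```

For a 500 GB volume, the Provisioned IOPS charge quickly dominates the storage charge, which is why it is positioned for I/O-intensive workloads only.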
EBS Volume Types
Solid-State Drives (SSD)
General Purpose SSD (gp2)*
• Description: General purpose SSD volume that balances price and performance for a wide variety of transactional workloads
• Use cases: Recommended for most workloads; system boot volumes; virtual desktops; low-latency interactive apps; dev and test environments
• Volume size: 1 GiB - 16 TiB
• Max. IOPS**/volume: 10,000
• Max. throughput/volume†: 160 MiB/s
• Max. IOPS/instance: 65,000
• Max. throughput/instance: 1,250 MiB/s
• Dominant performance attribute: IOPS
Provisioned IOPS SSD (io1)
• Description: Highest-performance SSD volume designed for mission-critical applications
• Use cases: Critical business applications that require sustained IOPS performance, or more than 10,000 IOPS or 160 MiB/s of throughput per volume; large database workloads
• Volume size: 4 GiB - 16 TiB
• Max. IOPS**/volume: 20,000
• Max. throughput/volume†: 320 MiB/s
• Max. IOPS/instance: 65,000
• Max. throughput/instance: 1,250 MiB/s
• Dominant performance attribute: IOPS
Hard Disk Drives (HDD)
Throughput Optimized HDD (st1)
• Description: Low cost HDD volume designed for frequently accessed, throughput-intensive workloads
• Use cases: Streaming workloads requiring consistent, fast throughput at a low price; big data; data warehouses; log processing; cannot be a boot volume
• Volume size: 500 GiB - 16 TiB
• Max. IOPS**/volume: 500
• Max. throughput/volume†: 500 MiB/s
• Max. IOPS/instance: 65,000
• Max. throughput/instance: 1,250 MiB/s
• Dominant performance attribute: MiB/s
Cold HDD (sc1)
• Description: Lowest cost HDD volume designed for less frequently accessed workloads
• Use cases: Throughput-oriented storage for large volumes of data that is infrequently accessed; scenarios where the lowest storage cost is important; cannot be a boot volume
• Volume size: 500 GiB - 16 TiB
• Max. IOPS**/volume: 250
• Max. throughput/volume†: 250 MiB/s
• Max. IOPS/instance: 65,000
• Max. throughput/instance: 1,250 MiB/s
• Dominant performance attribute: MiB/s
* Default volume type
** gp2/io1 based on 16 KiB I/O size, st1/sc1 based on 1 MiB I/O size
† To achieve this throughput, you must have an instance that supports it, such as r3.8xlarge or x1.32xlarge.
IOPS Token Bucket Model
• Each token represents an “I/O credit” that pays for one read or one write.
• A bucket is associated with each General Purpose (SSD) volume, and can hold up to 5.4 million tokens.
• Tokens accumulate at a rate of 3 per configured GB per second, up to the capacity of the bucket.
• Tokens can be spent at up to 3000 per second per volume.
• The baseline performance of the volume is equal to the rate at which tokens are accumulated: 3 IOPS per provisioned GB.
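The bucket arithmetic above can be worked through in a few lines. This is a rough sketch that uses only the numbers on the slide (5.4 million tokens, 3 tokens per GB per second, 3,000 IOPS burst ceiling); the function names are illustrative, and it assumes the bucket starts full and ignores any minimum baseline the service may apply to small volumes.

```python
import math

BUCKET_CAPACITY = 5_400_000  # maximum I/O credits a volume's bucket can hold
REFILL_PER_GB = 3            # credits earned per second per provisioned GB
MAX_BURST_IOPS = 3_000       # maximum rate at which credits can be spent


def baseline_iops(size_gb):
    """Baseline IOPS equals the token accumulation rate."""
    return REFILL_PER_GB * size_gb


def burst_seconds(size_gb, demand_iops=MAX_BURST_IOPS):
    """Seconds a full bucket sustains demand_iops before running dry.

    Returns math.inf when the refill rate meets or exceeds the spend rate,
    meaning the bucket never empties.
    """
    spend = min(demand_iops, MAX_BURST_IOPS)
    net_drain = spend - baseline_iops(size_gb)
    if net_drain <= 0:
        return math.inf
    return BUCKET_CAPACITY / net_drain


if __name__ == "__main__":
    for size in (100, 500, 1000):
        print(size, "GB:", baseline_iops(size), "IOPS baseline,",
              burst_seconds(size), "s of full burst")
```

Under these assumptions a 100 GB volume has a 300 IOPS baseline and can burst at 3,000 IOPS for 2,000 seconds, while a 1,000 GB volume already earns tokens as fast as it can spend them.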
EBS Provisioned IOPS
• EBS Optimized Instances
– Dedicated storage throughput
• Predictable Performance
– 100-20,000 IOPS per volume
– Single digit millisecond latency
• Performance Design
– Deliver within 10% of PIOPS, 99.9% of the time
Enhanced Throughput for PIOPS & GP2 Volumes
• Maximum attainable throughput to each volume is now 500 MB/s of read or write traffic (on an instance that supports it, such as r3.8xlarge or x1.32xlarge)
• An I/O request of up to 256 KB is now counted as a single I/O operation (IOP)
• In many cases you can configure the block size used by your application
• Capable of dramatically reducing your storage costs
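The effect of block size on I/O accounting can be sketched as follows. This is an illustration only: it assumes "256 KB" means 256 KiB, that a request larger than that is counted as multiple operations, and it does not model per-volume or per-instance throughput caps. The function names are hypothetical.

```python
import math

MAX_IO_BYTES = 256 * 1024  # assumption: the slide's "256 KB" read as 256 KiB


def io_operations(request_bytes):
    """Number of I/O operations one request is counted as."""
    return max(1, math.ceil(request_bytes / MAX_IO_BYTES))


def iops_needed(target_bytes_per_s, block_bytes):
    """IOPS consumed to sustain a target throughput at a given block size."""
    requests_per_s = math.ceil(target_bytes_per_s / block_bytes)
    return requests_per_s * io_operations(block_bytes)


if __name__ == "__main__":
    target = 64 * 1024 * 1024  # 64 MiB/s, an arbitrary example workload
    for block_kib in (16, 64, 256):
        print(block_kib, "KiB blocks ->", iops_needed(target, block_kib * 1024), "IOPS")
```

With these assumptions the same 64 MiB/s workload needs 4,096 IOPS at 16 KiB blocks but only 256 IOPS at 256 KiB blocks, which is where the potential cost reduction for provisioned IOPS comes from.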
Amazon EBS at 20,000 IOPS
• Provisioned IOPS (SSD)
– Max volume 16 TB
– Max I/O rate 20,000 IOPS
– Max throughput 320 MB/s
• General Purpose (SSD)
– Max volume 16 TB
– Max I/O rate 10,000 IOPS
– Max throughput 160 MB/s
EBS Snapshots
[Diagram: EC2 instances in an Availability Zone with attached EBS volumes. "Create Snapshot" stores EBS snapshots in Amazon S3; "Clone From Snapshot" creates a new EBS volume from a snapshot.]
How Do Snapshots Work?
[Diagram: over time, Snapshot 1, Snapshot 2, and Snapshot 3 capture Blocks 1-4 of the volume, which are stored in S3 as Chunks 1-4.]
EC2 Instance Store (Ephemeral Volumes)
• Free with your EC2 instance
– SAS and SSD options
– Size/type based on instance type
• Local, direct attached resource
• Consistent sequential reads and writes
• Use only for non-persistent data
Elastic File System (EFS)
• Fully managed file system for EC2 instances
• Provides standard file system semantics
• Works with standard operating system APIs
• Sharable across thousands of instances
• Elastically grows to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable
• NFS v4-based
• Accessible from on-prem servers (New!)
Amazon EFS is Simple
• Fully managed
– No hardware, network, file layer
– Create a scalable file system in seconds!
• Seamless integration with existing tools and apps
– NFS v4.1: widespread, open
– Standard file system access semantics
– Works with standard OS file system APIs
• Simple pricing = simple forecasting
Amazon EFS is Elastic
• File systems grow and shrink automatically as you add and remove files
• No need to provision storage capacity or performance
• You pay only for the storage space you use, with no minimum fee
Amazon EFS is Scalable
• File systems can grow to petabyte scale
• Throughput and IOPS scale automatically as file systems grow
• Consistent low latencies regardless of file system size
• Support for thousands of concurrent NFS connections
Highly Durable and Highly Available
• Designed to sustain AZ offline conditions
• Resources aggregated across multiple AZs
• Superior to traditional NAS availability models
• Appropriate for Production / Tier 0 applications
Example use cases
• Big Data Analytics
• Media Workflow Processing
• Web Serving
• Content Management
• Home Directories
EFS – Mounting
[Diagram: multiple EC2 instances mounting one EFS file system.]
• EFS DNS name: availability-zone.file-system-id.efs.aws-region.amazonaws.com
• Mount on machine: sudo mount -t nfs4 mount-target-DNS:/ ~/efs-mount-point
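As a small illustration of the naming pattern on this slide, the sketch below assembles a mount-target DNS name and the matching mount command. It only builds strings and does not run anything; the Availability Zone, file system ID, and region values are made-up placeholders, and the helper names are hypothetical.

```python
def mount_target_dns(availability_zone, file_system_id, region):
    """Build a name following the slide's pattern:
    availability-zone.file-system-id.efs.aws-region.amazonaws.com
    """
    return f"{availability_zone}.{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(dns_name, mount_point="~/efs-mount-point"):
    """Return the NFSv4 mount command shown on the slide as a string."""
    return f"sudo mount -t nfs4 {dns_name}:/ {mount_point}"


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    dns = mount_target_dns("us-east-1a", "fs-12345678", "us-east-1")
    print(mount_command(dns))
```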
Amazon S3 (Simple Storage Service)
• Web accessible object store
• Pay for exactly what you use
• Highly durable (99.999999999% design)
• Limitlessly scalable
• Natively online
• Two flavors:
– Standard Storage: $0.023* per GB / mo
– Standard – Infrequent Access Storage (min size 128 KB): $0.0125* per GB / mo + data retrieval cost
* US East (N. Virginia) pricing
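To see how the two storage prices trade off, here is a rough cost sketch. The storage prices are the ones listed above; the $0.01 per GB retrieval price is taken from the tiering slide later in this deck. Request charges and the 128 KB minimum object size are deliberately ignored, and the function names are illustrative.

```python
STANDARD_PER_GB = 0.023      # S3 Standard, USD per GB-month
IA_PER_GB = 0.0125           # S3 Standard-IA, USD per GB-month
IA_RETRIEVAL_PER_GB = 0.01   # Standard-IA retrieval, USD per GB


def standard_cost(stored_gb):
    """Monthly storage cost in S3 Standard."""
    return STANDARD_PER_GB * stored_gb


def ia_cost(stored_gb, retrieved_gb):
    """Monthly storage plus retrieval cost in Standard-IA."""
    return IA_PER_GB * stored_gb + IA_RETRIEVAL_PER_GB * retrieved_gb


def break_even_retrieval_ratio():
    """GB retrieved per GB stored per month at which both classes cost the same."""
    return (STANDARD_PER_GB - IA_PER_GB) / IA_RETRIEVAL_PER_GB


if __name__ == "__main__":
    print("Standard, 1000 GB:", round(standard_cost(1000), 2))
    print("IA, 1000 GB stored, 100 GB read:", round(ia_cost(1000, 100), 2))
    print("Break-even ratio:", round(break_even_retrieval_ratio(), 2))
```

On these numbers alone, Standard-IA stays cheaper until monthly retrievals exceed roughly the full stored volume, so request charges and minimum object size, which this sketch leaves out, matter for a real decision.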
Amazon S3 (Simple Storage Service)
• Parallel I/O for max speed (Multipart Upload, Ranged GETs)
• Resource-level IAM permissions
• Bucket Policies & ACLs
• Direct access through APIs
• Server Side Encryption
• Static Website Hosting
• Data Lifecycle Rules
• Amazon Athena (New)
– Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
Object Storage Tiering
S3 Standard
• Primary data
• Big Data Analytics
• Small objects
• Temporary scratch space
S3 - IA
• File sync and share
• Active Archive
• Enterprise backup
• Media transcoding
• Geo-redundancy/DR
Glacier
• Deep/offline archives
• Tape vaulting replacement
• WORM-compliant data
Data tiering using S3 Life Cycle Policies
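A lifecycle policy for this kind of tiering might look like the sketch below. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` as I understand it; the rule ID, prefix, and the 30 and 90 day thresholds are assumptions chosen to match the minimum ages shown on the tiering slide. The code only builds the configuration and does not call AWS.

```python
def tiering_rule(prefix="", ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier.

    The rule ID and default thresholds are illustrative assumptions.
    """
    if not 0 <= ia_after_days < glacier_after_days:
        raise ValueError("IA transition must come before the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "tier-to-ia-then-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},     # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    import json
    print(json.dumps(tiering_rule(prefix="logs/"), indent=2))
```

The resulting dictionary could then be passed as the `LifecycleConfiguration` argument when applying the policy to a bucket.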
Object Storage Use Cases
[Diagram: use cases by storage class, linked by lifecycle transitions.]
S3
• Cloud Applications
• Big Data Analytics
• Content Distribution
• Primary Data
• Temporary & Small Objects
S3-IA
• File Sync & Share
• Active Archive
• Enterprise Backup
• Media Transcoding
• Disaster Recovery / Geo Redundancy
Glacier
• Deep / Offline Archives
• Tape Vaulting Replacement
• WORM Compliant Data
Storage Tiered To Your Requirements
All tiers
• Available: S3 99.99%, S3-IA 99.9%
• Performant: low latency, high throughput
• Durable: 99.999999999%
• Scalable: elastic capacity, no preset limits
S3: “Hot” data (active and/or temporary data)
• ≥ 0 days, > 0K
• Starts at $0.023 / GB per month
S3-IA: “Warm” data (infrequently accessed data)
• ≥ 30 days, ≥ 128K
• $0.0125 / GB per month
• $0.01 / GB retrieval
Glacier: “Cold” data (archive and compliance data)
• ≥ 90 days, > 0K
• $0.004 / GB per month
• 3 new retrieval options:
– Expedited: 1-5 mins, $0.03 / GB
– Standard: 3-5 hrs, $0.01 / GB
– Bulk: 5-12 hrs, $0.0025 / GB
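Choosing among the three Glacier retrieval options is a trade-off between wait time and price per GB. The sketch below picks the cheapest option whose worst-case wait fits a deadline, using only the times and prices on this slide; the data structure and function names are illustrative, and per-request fees are not modeled.

```python
# (name, worst-case wait in hours, USD per GB retrieved), from the slide
RETRIEVAL_OPTIONS = [
    ("Expedited", 5 / 60, 0.03),  # 1-5 minutes
    ("Standard", 5.0, 0.01),      # 3-5 hours
    ("Bulk", 12.0, 0.0025),       # 5-12 hours
]


def cheapest_option(deadline_hours):
    """Return the cheapest option that finishes within the deadline, or None."""
    eligible = [opt for opt in RETRIEVAL_OPTIONS if opt[1] <= deadline_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda opt: opt[2])


def retrieval_cost(option_name, size_gb):
    """Per-GB retrieval cost for the named option."""
    prices = {name: price for name, _, price in RETRIEVAL_OPTIONS}
    return prices[option_name] * size_gb


if __name__ == "__main__":
    for deadline in (0.5, 6, 24):
        opt = cheapest_option(deadline)
        print(deadline, "h ->", opt[0], round(retrieval_cost(opt[0], 1000), 2), "USD per TB")
```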
S3 Storage Management Features New!
S3 Object Tagging: manage and control access for Amazon S3 objects
• Object tags are key-value pairs applied to S3 objects, which can be created, updated, or deleted at any time during the lifetime of the object.
– Provide the ability to create Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, and customize storage metrics.
– Manage transitions between storage classes and expire objects in the background.
S3 Analytics, Storage Class Analysis: analyze storage access patterns and transition the right data to the right storage class
• Automatically identifies the optimal lifecycle policy to transition less frequently accessed storage to SIA.
• Configure a storage class analysis policy to monitor an entire bucket, a prefix, or an object tag. Once an infrequent access pattern is observed, easily create a new lifecycle age policy based on the results.
• Provides daily visualizations of your storage usage in the AWS Management Console that can be exported to an S3 bucket to analyze using the business intelligence tools of your choice, such as Amazon QuickSight.
S3 Inventory: simplify and speed up business workflows and big data jobs
• Provides a CSV (comma-separated values) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.
S3 CloudWatch Metrics: understand and improve the performance of your applications that use S3
• Monitoring and alarming on 13 new S3 CloudWatch metrics
• Receive 1-minute CloudWatch metrics, set CloudWatch alarms, and access CloudWatch dashboards to view real-time operations and performance, such as bytes downloaded and the 4xx HTTP response count of your Amazon S3 storage.
• For web and mobile applications that depend on cloud storage, these let you quickly identify and act on operational issues.
Amazon Glacier
• Low-cost archival storage
• Secure
– SSL & AES-256
• Durable
– Designed for 99.999999999% durability
• Optimized for data archiving and backup
• Suitable for RTO measured in hours
• Includes storage costs and retrieval costs
• Three retrieval options: Expedited, Standard, Bulk
• As little as $0.004 per GB/month (US East pricing)
• Integrated with S3
Amazon CloudFront
• Easy-to-use Content Delivery Network (CDN)
• Pay-as-you-go pricing
• Multiple origins: S3, EC2, on-premises
• Worldwide network of 68+ edge locations
• Video streaming
• Geo Restriction
• Custom SSL Certificates
• Dynamic Content
• POST/PUT
Storage Gateway hybrid storage solutions
Enables using standard storage protocols to access AWS storage services
[Diagram: AWS Storage Gateway presents files, volumes, and tapes on premises, backed by Amazon EBS snapshots, Amazon S3, and Amazon Glacier, and integrated with AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS CloudTrail, and Amazon CloudWatch.]
Storage Gateway – Files, volumes, and tapes
• File gateway: NFS (v3 and v4.1) interface. On-premises file storage backed by Amazon S3 objects
• Tape gateway: iSCSI virtual tape library interface. Virtual tape storage in Amazon S3 and Glacier with VTL management
• Volume gateway: iSCSI block interface. On-premises block storage backed by S3 with EBS snapshots
Storage Gateway – Common capabilities
Standard storage protocols integrate with on-premises applications
Local caching for low-latency access to frequently used data
Efficient data transfer with buffering and bandwidth management
Native data storage in AWS
Stateless virtual appliance for resiliency
Integrated with AWS management and security
File gateway
On-premises file storage maintained as objects in Amazon S3
• Data stored and retrieved from your S3 buckets
• One-to-one mapping from files to objects
• File metadata stored in object metadata
• Bucket access managed by IAM role you own and manage
• Use S3 Lifecycle Policies, versioning, or CRR to manage data
[Diagram: application servers on the customer premises connect over NFS v3 / v4.1 to the file gateway, which connects over HTTPS to S3 Standard, S3 Standard - Infrequent Access, and Glacier.]
Volume gateway
On-premises volume storage backed by Amazon S3 with EBS snapshots
• Block storage in S3 accessed via the volume gateway
• Data compressed in-transit and at-rest
• Backup on-premises volumes to EBS snapshots
• Create on-premises volumes from EBS snapshots
• Up to 1PB of total volume storage per gateway
[Diagram: the customer premises connect over iSCSI to the volume gateway, which connects over HTTPS to the Storage Gateway bucket in Amazon S3 and to Amazon EBS snapshots.]
Tape gateway
Virtual tape storage in Amazon S3 and Glacier with VTL management
• Virtual tape storage in S3 and Glacier accessed via tape gateway
• Data compressed in-transit and at-rest
• Unlimited virtual tape storage, with up to 1PB of tapes active in library
• Supports leading backup applications
[Diagram: a backup server on the customer premises connects over iSCSI to the tape gateway (media changer and tape drive), which connects over HTTPS to virtual tapes stored in Amazon S3 and archived tapes stored in Amazon Glacier.]
Hybrid storage use cases with Storage Gateway
• Enabling cloud workloads: move data to AWS storage for Big Data, cloud bursting, or migration
• Tiered cloud storage: easily add AWS storage to your on-premises environment
• Backup, archive, and disaster recovery: cost effective storage in AWS with local or cloud restore
Storage Gateway – Key Benefits
Seamless integration across standard storage protocols
Low-latency access
Durability, cost, and elasticity of AWS Storage services
Efficient data transfer
Data encryption
Integrated with AWS monitoring, management, and security
Amazon Snowball & Snowball Edge
• Petabyte scale data transport
• Uses secure appliances
• Economic and fast
• Faster than Internet for significant data sets
• Import into S3
• HIPAA compliant (New)
What is Snowball? Petabyte scale data transport
• E-ink shipping label
• Ruggedized case, “8.5G impact”
• All data encrypted end-to-end
• 80 TB, 10G network
• Rain & dust resistant
• Tamper-resistant case & electronics
How fast is Snowball?
• Less than 1 day to transfer 250TB via 5x10G connections with 5 Snowballs, less than 1 week including shipping
• Number of days to transfer 250TB via the Internet at typical utilizations
Utilization   Internet connection speed
              1 Gbps   500 Mbps   300 Mbps   150 Mbps
25%           95       190        316        632
50%           47       95         158        316
75%           32       63         105        211
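The figures in the table can be reproduced with simple arithmetic. The sketch below is an approximation: the unit convention (1 TB taken as 1,024 GB, 1 Gbps as 1,000 Mbps) is an assumption inferred because it matches the table's values, and the Snowball estimate assumes every appliance copies at full 10G line rate with no overhead. Function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def internet_transfer_days(data_tb, link_mbps, utilization):
    """Days to move data_tb over a link used at the given fraction (0-1)."""
    gigabits = data_tb * 1024 * 8                 # assumption: 1 TB = 1,024 GB
    effective_gbps = link_mbps / 1000 * utilization
    return gigabits / effective_gbps / SECONDS_PER_DAY


def snowball_copy_hours(data_tb, appliances, link_gbps=10):
    """Hours to load data onto several Snowballs in parallel at line rate."""
    gigabits = data_tb * 1024 * 8
    return gigabits / (appliances * link_gbps) / 3600


if __name__ == "__main__":
    for util in (0.25, 0.50, 0.75):
        row = [round(internet_transfer_days(250, mbps, util))
               for mbps in (1000, 500, 300, 150)]
        print(f"{util:.0%}", row)
    print("5 Snowballs:", round(snowball_copy_hours(250, 5), 1), "hours")
```

Under these assumptions, copying 250 TB onto five Snowballs takes roughly half a day, consistent with the "less than 1 day" claim above, while the same data needs months over a typical Internet link.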
Amazon Snowmobile
• Exabyte-scale data transfer service
• Each Snowmobile can transfer up to 100PB
• Delivered to your site like a container
• Connects to your network via removable high-speed network switch
• Appears as network-attached data store
• Once connected, secure, high-speed data transfer begins
• After data transfer, Snowmobile is driven back to AWS and data is loaded into the AWS service you select, e.g. S3, Redshift, Glacier
Using Multiple Storage Options Together
• EBS + S3: snapshots
• S3 + EC2 Instance Store: caching
• S3 + CloudFront: edge caching
• S3 + Glacier: data lifecycle archiving
Amazon Athena (GA: US East (N. Virginia) and US West (Oregon))
• Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
• Analyze data directly in Amazon S3
• Use standard (ANSI) SQL
• No ETL required
• Fast performance. Scales automatically
• Serverless. Zero infrastructure. Zero admin
• Pay only for the queries you run
Amazon Aurora with PostgreSQL compatibility (Preview)
• Full PostgreSQL compatibility and up to twice the performance
• All the features of Amazon Aurora
• Availability: failover time of < 30 seconds
• Durability: 6 copies across 3 Availability Zones
• Read Replicas: single-digit millisecond lag times on up to 15 replicas
• Cloud-native security and encryption with AWS KMS, IAM, etc.
• Easy migration using AWS Database Migration Service and AWS Schema Conversion Tool
Introducing AWS Snowball Edge
Petabyte-scale data transport with on-board compute
• Lambda functions on-board
• Snowball clusters
• S3 compatible endpoint, NFS mount point
• 100 TB of capacity
AWS Snowball Edge Use Cases
• Extension of your data center: encrypted, secure, and embedded compute
• Process data: write data directly as data is generated
• Expedites move: offers a fast and cost effective way to ensure data can be quickly transferred to and from the cloud
• Simplifies data transfer: uses standard and familiar tools for the data transfer process
Introducing AWS Snowmobile
• 45-foot long ruggedized shipping container
• Up to 100PB of capacity
• Load data into S3 or Glacier
• Dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and optional escort security while in transit
• Data encrypted with 256-bit encryption keys, managed through KMS
AWS Snowmobile Use Cases
• Move storage to cloud (images, media files, archives)
• Data center shut down
• Available to customers in US only
• Each engagement will have customized pricing based on:
– Volume of data the customer would like to migrate
– Data center set up
– Duration of use
• Published pricing guideline will be $0.005/GB per month