AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)

Post on 23-Jan-2018


TRANSCRIPT

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 30, 2016

Amazon EFS Deep Dive

Edward Naim, Head of Product

Darryl Osborne, Storage Solutions Architect

David Green, Enterprise Solutions Architect

STG202

What to expect from this session

Recognize why and when to use Amazon EFS

Understand key technical/security concepts

Learn how to leverage EFS’s performance

See EFS in action (hands-on)

Review EFS’s economics

Discover some of our upcoming feature plans


How EFS fits into the AWS storage platform

Data transfer: Direct Connect, Snowball, 3rd-party connectors, Transfer Acceleration, Storage Gateway, Kinesis Firehose

• File: Amazon EFS

• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)

• Object: Amazon S3, Amazon Glacier


We focused on changing the game

1. Simple

2. Elastic

3. Scalable

Highly durable

Highly available

Amazon EFS is simple

• Fully managed

- No hardware, network, or file layer to manage

- Create a scalable file system in seconds!

• Seamless integration with existing tools and apps

- NFS v4.1—widespread, open

- Standard file system access semantics

- Works with standard OS file system APIs

• Simple pricing = simple forecasting


Amazon EFS is elastic

• File systems grow and shrink automatically as you add and remove files

• No need to provision storage capacity or performance

• You pay only for the storage space you use, with no minimum fee

Amazon EFS is scalable

• File systems can grow to petabytes of capacity

• Throughput scales automatically as file systems grow

• Consistent low latencies regardless of file system size

• Support for thousands of concurrent NFS connections

Highly durable and highly available (Multi-AZ)

• Every file system object is redundantly stored across multiple Availability Zones in a Region

• Designed to sustain Availability Zone offline conditions

• Superior to traditional NAS availability models

• Appropriate for production/tier 0 applications

In which Regions can I use EFS today?

• US West (Oregon)

• US East (N. Virginia)

• US East (Ohio)

• EU (Ireland)

More coming soon!

Do you need an EFS file system?

If you have an EC2 application or use case that requires a file system AND that:

• Requires multi-attach (access from multiple instances at once), OR

• Requires Multi-AZ availability/durability, OR

• Requires GB/s of throughput, OR

• Requires automatic scaling (grow/shrink) of storage

Operating your own multi-attach file storage in the cloud is complex and expensive

Use an NFS server or shared file layer:

• Complex to set up and maintain

• Scale challenges

• HA challenges

• Costly (compute + storage)

Replicate EBS volumes (1 per EC2 instance):

• Substantial management overhead (sync data, provision and manage volumes)

• Costly (one volume per instance)

What customers are using EFS for today

• Web serving

• Content management

• Analytics

• Media and entertainment workflows

• Workflow management

• Home directories

• Container storage

• Database backups


What is a file system?

• The primary resource in EFS

• Where you store files and directories

• You can create up to 125 file systems per account

What is a mount target?

• To access your file system within a VPC, you create mount targets in the VPC

• A mount target is an NFS endpoint that lives in your VPC

• A mount target has an IP address and a DNS name you use in your mount command

• A mount target is highly available

[Diagram: a VPC spanning Availability Zones 1, 2, and 3 in a Region, with EC2 instances reaching the file system through mount targets]

How to access a file system from an instance

• You “mount” a file system on an Amazon EC2 instance with the standard mount command; the file system then appears as a local set of directories and files

• An NFS v4.1 client is standard on Linux distributions

mount -t nfs4 -o nfsvers=4.1 [file system DNS name]:/ /[user’s target directory]

How does it all fit together?

[Diagram: one file system in a Region, accessed by EC2 instances in a VPC across Availability Zones 1, 2, and 3]

Data can be accessed from any AZ in the Region while maintaining full consistency

Several security mechanisms

• Control network traffic to and from file systems (mount targets) by using VPC security groups and network ACLs

• Control file and directory access by using POSIX permissions

• Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM)

- EFS supports action-level and resource-level permissions
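As a sketch of the network-level control, assuming the mount targets have their own security group: the function below prints (or, with `RUN=1`, runs) an AWS CLI call that opens TCP port 2049, the standard NFS port, only to instances in a client security group. The function name and both group IDs are hypothetical placeholders.

```shell
#!/bin/sh
# Allow NFS (TCP 2049) into a mount target's security group, but only from
# members of a client security group. Group IDs are hypothetical placeholders.
# Dry run by default: prints the AWS CLI call; set RUN=1 to execute it.
allow_nfs_from_sg() {
  mount_target_sg="$1"   # security group attached to the mount target(s)
  client_sg="$2"         # security group attached to the EC2 clients
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$mount_target_sg" --protocol tcp --port 2049 \
    --source-group "$client_sg"
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

allow_nfs_from_sg sg-aaaa1111 sg-bbbb2222
```

Referencing a source security group rather than a CIDR range means a new instance gains NFS access simply by joining the client group.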

The AWS Management Console, CLI, and SDK each let you perform a variety of management tasks:

• Create a file system

• Create and manage mount targets

• Tag a file system

• Delete a file system

• View details on file systems in your AWS account
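The same tasks can be scripted with the AWS CLI's `efs` commands. A minimal sketch, assuming a configured AWS CLI: it only prints the calls unless `RUN=1` is set, and every ID, token, and tag value in it is a hypothetical placeholder.

```shell
#!/bin/sh
# EFS management tasks as AWS CLI calls. Dry run by default; set RUN=1 to execute.
# All IDs, tokens, and tag values are hypothetical placeholders.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

FS_ID=fs-12345678          # hypothetical; returned by create-file-system
SUBNET_ID=subnet-0abc1234  # hypothetical subnet in your VPC
SG_ID=sg-0abc1234          # hypothetical security group for the mount target
MT_ID=fsmt-0abc1234        # hypothetical; returned by create-mount-target

efs_lifecycle_demo() {
  # Create a file system (performance mode: generalPurpose or maxIO)
  run aws efs create-file-system --creation-token demo-token --performance-mode generalPurpose
  # Create a mount target in one subnet (repeat for each Availability Zone)
  run aws efs create-mount-target --file-system-id "$FS_ID" --subnet-id "$SUBNET_ID" --security-groups "$SG_ID"
  # Tag the file system
  run aws efs create-tags --file-system-id "$FS_ID" --tags Key=Name,Value=demo
  # View details
  run aws efs describe-file-systems --file-system-id "$FS_ID"
  run aws efs describe-mount-targets --file-system-id "$FS_ID"
  # Delete: mount targets first, then the file system
  run aws efs delete-mount-target --mount-target-id "$MT_ID"
  run aws efs delete-file-system --file-system-id "$FS_ID"
}

efs_lifecycle_demo
```

With `RUN=1` you would first substitute the IDs returned by `create-file-system` and `create-mount-target`. The deletion order matters because a file system cannot be deleted while it still has mount targets.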


Amazon EFS is designed for a wide spectrum of performance needs

Spectrum: from high throughput and parallel I/O to low latency and serial I/O

Workloads along the spectrum: genomics, big data analytics, scale-out jobs, home directories, content management, web serving, metadata-intensive jobs

Choose the performance mode best suited to your workload

General Purpose (default)

• What it’s for: latency-sensitive applications and general-purpose workloads

• Advantages: lowest latencies for file operations

• Tradeoffs: limit of 7,000 ops/sec

• When to use: best choice for most workloads

Max I/O

• What it’s for: large-scale and data-heavy applications

• Advantages: virtually unlimited ability to scale out throughput/IOPS

• Tradeoffs: slightly higher latencies

• When to use: consider if tens (or more) of instances access your file system concurrently

Use the PercentIOLimit CloudWatch metric to determine if you’re constrained by General Purpose mode

Amazon EFS has a distributed data storage design

[Diagram: many EC2 instances accessing one file system that is spread across many servers]

• File systems are distributed across an unconstrained number of servers

• Avoids bottlenecks/constraints of traditional file servers

• Enables high levels of aggregate IOPS/throughput

• Data is also distributed across Availability Zones (durability, availability)

How to think about EFS performance relative to EBS (Amazon EFS vs. Amazon EBS PIOPS)

Performance

• Per-operation latency: EFS low, consistent; EBS PIOPS lowest, consistent

• Throughput scale: EFS multiple GBs per second; EBS PIOPS single GB per second

Characteristics

• Data availability/durability: EFS stored redundantly across multiple AZs; EBS PIOPS stored redundantly in a single AZ

• Access: EFS 1 to 1000s of EC2 instances, from multiple AZs, concurrently; EBS PIOPS single EC2 instance in a single AZ

Use cases

• Amazon EFS: Big Data and analytics, media processing workflows, content management, web serving, home directories

• Amazon EBS PIOPS: boot volumes, transactional and NoSQL databases, data warehousing & ETL

An implication of per-operation latency: I/O size impacts throughput of serialized operations

[Chart: throughput vs. I/O size (4 KB, 32 KB, 256 KB, 2 MB, 16 MB)]
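The slide's point reduces to simple arithmetic: when operations are serialized, each waits for the previous one to finish, so throughput is roughly I/O size divided by per-operation latency. A minimal sketch, assuming a fixed latency that you supply (the 5 ms below is illustrative, not an EFS measurement) and ignoring bandwidth limits, so the figures it prints for large I/O sizes overstate what a real link can carry.

```shell
#!/bin/sh
# Serialized I/O: one operation in flight at a time, so
#   throughput = I/O size / per-operation latency
# latency_ms is an assumed input, not a measured EFS value; bandwidth caps
# are ignored, so large sizes give optimistic numbers.
serial_throughput_mibps() {
  io_size_kib="$1"   # size of each I/O request in KiB
  latency_ms="$2"    # assumed latency per operation in milliseconds
  awk -v s="$io_size_kib" -v l="$latency_ms" \
    'BEGIN { printf "%.2f\n", (s / 1024) / (l / 1000) }'
}

# The chart's I/O sizes (4 KiB to 16 MiB) at an illustrative 5 ms latency
for size in 4 32 256 2048 16384; do
  printf '%6s KiB -> %s MiB/s\n' "$size" "$(serial_throughput_mibps "$size" 5)"
done
```

Doubling the I/O size doubles the modeled throughput, which is why aggregating small writes into larger ones helps serialized workloads.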

How to take advantage of EFS’s distributed architecture: parallelize

Parallelize via multiple threads and/or multiple instances

[Chart: aggregate IOPS of parallel writes using 10 m4.xlarge instances (x-axis: total number of threads, 0 to 160; y-axis: IOPS, 0 to 30,000)]
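To try the parallelization advice on your own instances, here is a minimal sketch, assuming a POSIX shell and a target directory on the mounted file system. It starts a chosen number of background writers, each creating small files under its own names, and waits for all of them. The function name, file naming, and 4 KiB write size are illustrative choices; this is not the benchmark behind the chart.

```shell
#!/bin/sh
# Parallel small-file writes: start N background writers, each creating its
# own set of files, then wait for all of them to finish.
# Function name, file names, and the 4 KiB size are illustrative choices.
parallel_small_writes() {
  dir="$1"; threads="$2"; files_per_thread="$3"
  mkdir -p "$dir" || return 1
  t=1
  while [ "$t" -le "$threads" ]; do
    (
      i=1
      while [ "$i" -le "$files_per_thread" ]; do
        # One 4 KiB write per file; each writer is an independent process
        dd if=/dev/zero of="$dir/w${t}_f${i}" bs=4096 count=1 2>/dev/null
        i=$((i + 1))
      done
    ) &
    t=$((t + 1))
  done
  wait   # block until every background writer has finished
}
```

Timing it at different thread counts, for example `time parallel_small_writes /efs/bench 16 100` in bash (the path is hypothetical), shows whether aggregate throughput keeps rising as writers are added. Repeating the run from several instances at once covers the "multiple instances" half of the advice.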

Use CloudWatch for a number of views of file system performance

• DataReadIOBytes, DataWriteIOBytes, MetadataIOBytes, TotalIOBytes: measure throughput (‘Sum’ of bytes divided by seconds in the time period) or ops/sec (‘Data Samples’ divided by seconds in the time period)

• BurstCreditBalance: monitor your burst credit usage over time to ensure sufficient throughput capacity

• PermittedThroughput: compare to actual throughput to determine whether you’re being constrained by the burst model

• ClientConnections: view the number of clients connected to your file system

• PercentIOLimit: determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
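The rate calculations in the table are simple division. A minimal sketch, assuming you have already read one datapoint's values from CloudWatch (in the console or with `aws cloudwatch get-metric-statistics`). The example numbers are made up, and the 95% cut-offs used for "at or near" are assumptions, not AWS-defined thresholds.

```shell
#!/bin/sh
# Turn one CloudWatch datapoint for an EFS metric into rates and checks.
# All inputs are values read from CloudWatch for the same period.

# Throughput (MiB/s) = 'Sum' of bytes / seconds in the period
throughput_mibps() {
  sum_bytes="$1"; period_s="$2"
  awk -v b="$sum_bytes" -v p="$period_s" 'BEGIN { printf "%.2f\n", b / p / 1048576 }'
}

# Operations/sec = 'Data Samples' (SampleCount) / seconds in the period
ops_per_sec() {
  samples="$1"; period_s="$2"
  awk -v n="$samples" -v p="$period_s" 'BEGIN { printf "%.2f\n", n / p }'
}

# Succeeds if PercentIOLimit is at or near 100%.
# ASSUMPTION: "near" means >= 95 unless you pass another threshold.
gp_mode_constrained() {
  awk -v pct="$1" -v thr="${2:-95}" 'BEGIN { exit !(pct >= thr) }'
}

# Succeeds if actual throughput has reached PermittedThroughput.
# Pass both in the same units; ASSUMPTION: >= 95% of permitted counts as reached.
burst_constrained() {
  awk -v a="$1" -v p="$2" -v f="${3:-0.95}" 'BEGIN { exit !(a >= p * f) }'
}

throughput_mibps 6291456000 60   # made-up TotalIOBytes Sum over a 60 s period
ops_per_sec 120000 60            # made-up Data Samples over the same period
```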

Recommended kernel version and NFS mount options

Kernel version

• Use Linux kernel 4.0+ (e.g., Amazon Linux 2016.03.0, Ubuntu 15.10 or 16.04)

Mount options

• Mount via NFSv4.1

• Specify 1 MB read/write buffers (“rsize”/“wsize”)

• Ensure operations are asynchronous

• Recommended mount options: -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
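A small helper keeps these options consistent across instances. This sketch assembles the command from the options listed above and only prints it unless `EFS_DO_MOUNT=1` is set. The function names and the `fs-12345678` DNS name are hypothetical.

```shell
#!/bin/sh
# Mount an EFS file system with the options recommended above.
# Dry run by default (prints the command); set EFS_DO_MOUNT=1 to mount,
# which needs root and an NFSv4.1 client. Names below are placeholders.
EFS_MOUNT_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async"

efs_mount_cmd() {
  dns_name="$1"      # file system (mount target) DNS name
  mount_point="$2"   # local directory to mount on
  printf 'mount -t nfs4 -o %s %s:/ %s\n' "$EFS_MOUNT_OPTS" "$dns_name" "$mount_point"
}

efs_mount() {
  if [ "${EFS_DO_MOUNT:-0}" = "1" ]; then
    mkdir -p "$2" && mount -t nfs4 -o "$EFS_MOUNT_OPTS" "$1:/" "$2"
  else
    efs_mount_cmd "$1" "$2"
  fi
}

efs_mount fs-12345678.efs.us-west-2.amazonaws.com /efs
```

Once the file system is mounted, `nfsstat -m` or `/proc/mounts` shows the options the client actually negotiated, which is a quick way to confirm that rsize/wsize took effect.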

Key recommendations

• Test your application!

• Use General Purpose mode for lowest latency, Max I/O for scale-out

• Use Linux kernel version 4.0 or newer, and mount via NFSv4.1

• To optimize, look for opportunities to:

- Aggregate I/O

- Perform async operations

- Parallelize (demo later)

- Cache (demo later)

• Don’t forget to check your burst credit earn/spend rate when testing, and make sure the file system stores enough data, since throughput scales with the amount of data stored
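The burst-credit advice matters because baseline throughput under the bursting model grows with the amount of data stored, so a small test file system can run out of credits and look slower than production will. The slides don't give the model's numbers. The sketch below assumes the 50 MiB/s-per-TiB baseline rate AWS has documented for EFS bursting throughput, and makes it a parameter so you can substitute the current figure.

```shell
#!/bin/sh
# Size-proportional baseline throughput under the bursting model.
# ASSUMPTION: 50 MiB/s of baseline throughput per TiB stored; pass a
# different rate as the last argument if your file system differs.
baseline_mibps() {
  size_gib="$1"; rate_per_tib="${2:-50}"
  awk -v g="$size_gib" -v r="$rate_per_tib" \
    'BEGIN { printf "%.2f\n", g / 1024 * r }'
}

# Inverse: GiB you would need to store for a target baseline (MiB/s)
storage_for_baseline_gib() {
  target_mibps="$1"; rate_per_tib="${2:-50}"
  awk -v t="$target_mibps" -v r="$rate_per_tib" \
    'BEGIN { printf "%.0f\n", t / r * 1024 }'
}

baseline_mibps 100             # a 100 GiB test file system
storage_for_baseline_gib 100   # data needed for a 100 MiB/s baseline
```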

See EFS in action: Copying data (hands-on)

Goal: Move Data Quickly!!

Two Scenarios:

Transferring media assets to EFS

• File sizes range from a few GB to 100+ GB per file

• Data sources: Amazon S3, Amazon EBS

Transferring many small files to EFS

• File sizes range from 64 KB to 256 KB

• Data sources: Amazon S3, Amazon EBS

Serial vs Parallel

Serial file transfer

Parallel file transfer

How do we do this?

GNU parallel

• Tool for executing jobs in parallel

• Similar to xargs

• Can replace loops in shell scripts

• GNU parallel makes sure the output from the commands is the same as you would get if you had run the commands sequentially

https://www.gnu.org/software/parallel/

For people who live life in the parallel lane

Use parallel threads – GNU parallel

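# Placeholder values: point these at your source tree and EFS destination
SRC_DIR=/path/to/source DST_DIR=/efs/destination N_THREADS=32
cd "$SRC_DIR" || exit 1

# Create destination directory tree from source
find . -type d -print0 | parallel -j "$N_THREADS" -0 "mkdir -p ${DST_DIR}/{}" > /dev/null 2>&1

# Copy files
find . ! -type d -print0 | parallel -j "$N_THREADS" -0 "cp -f {} ${DST_DIR}/{}"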
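Because GNU parallel is described as similar to xargs, the same two-pass copy can be sketched with `xargs -0 -P` on hosts where GNU parallel isn't installed. This is a swapped-in alternative, not the tool used in the session. `-P` is a GNU/BSD extension rather than POSIX, xargs does not group each job's output the way GNU parallel does, and the function name is hypothetical.

```shell
#!/bin/sh
# Two-pass parallel tree copy with xargs -P (alternative to GNU parallel).
# DST must be an absolute path because the copy runs from inside SRC.
# The function body is a subshell, so the cd does not affect the caller.
copy_tree_parallel() (
  src="$1"; dst="$2"; n="$3"
  cd "$src" || exit 1
  # 1) Recreate the directory tree; mkdir -p makes ordering irrelevant
  find . -type d -print0 | xargs -0 -P "$n" -I{} mkdir -p "$dst/{}"
  # 2) Copy everything that is not a directory, n copies at a time
  find . ! -type d -print0 | xargs -0 -P "$n" -I{} cp -f {} "$dst/{}"
)
```

Usage: `copy_tree_parallel /data/src /efs/dst 200`, where both paths are hypothetical and 200 echoes the per-instance thread count found in the small-file tests.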

Optimizing Transfers

Monitoring performance

• Data-driven results

• Repeatable outcomes

• Optimize for costs

Benchmark different instance types

• Determine the optimal instance size

• What is best? T2, C3, C4, M3, M4, R3, X?

• Transfer a test set of 1,000 small files

• Increase thread count from 1 to 1,024 concurrent threads

Tools

• Command orchestration

• Instance configuration

• Log collection

• Visualization

• Instance performance
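The thread-count sweep can be automated with a small harness. A minimal sketch, assuming the benchmark you want to repeat is a command that takes the thread count as its last argument. Doubling from 1 is an arbitrary spacing chosen here, not the session's test plan, and the function name is hypothetical.

```shell
#!/bin/sh
# Thread-count sweep: run a benchmark command once per thread count
# (1, 2, 4, ... up to max) and print "threads elapsed_seconds".
# The command receives the thread count as its final argument.
# date +%s gives whole seconds, so use a test set that takes a while.
sweep_threads() {
  max="$1"; shift
  t=1
  while [ "$t" -le "$max" ]; do
    start=$(date +%s)
    "$@" "$t" >/dev/null 2>&1
    end=$(date +%s)
    echo "$t $((end - start))"
    t=$((t * 2))
  done
}
```

For example, `sweep_threads 1024 copy_test_set` would work if `copy_test_set` were a hypothetical wrapper that copies your 1,000-file test set with GNU parallel, using its argument as the `-j` value. Running the same sweep on each candidate instance type and logging the output gives the per-family comparison the slide describes.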

Test Results – Large Files

Large Files: Four Instances

Adding Additional Instances

Large Files: 50 Instances

Test Results – Small Files

Small File Performance - Instance Family Test

~200 threads

c3.large – 5,342 files per minute @ 200 threads

Increase Instance Count

• Using optimal instance size

• c3.large

• Using optimal thread counts

• ~200 per instance

• Increase instance count

• 300 instances

• Optimize for costs

• EC2 Spot Market

EC2 Spot

c3.large – 300 instances

Summary / tl;dr

Results

Small files – 300 instances

Large files – 50 instances

• Parallelize everything

• Threads

• Instances

• Test, test, test

• Capture & analyze test data

• Less than $5/hr for 300 instances

See EFS in action: WordPress (hands-on)

Content management & web serving

Web-based applications for creating and managing website content, such as wikis, blogs, and discussion boards

Free and open-source content management system hosted on a web platform

Web software to create beautiful websites, blogs, or apps

“Free and priceless at the same time” – WordPress.org

CODE IS POETRY

27% of all websites (November 2016) – Web Technology Surveys

Easiest and most popular blogging system in use on the Web – CMS Usage Statistics

Supporting more than 60 million websites – Forbes

Available as:

• Managed Web Hosting Service

• Software package from WordPress.org installed on a self-provisioned web platform

Components and resources

• Structured data (posts, pages, comments, categories, tags, etc.): Amazon RDS

• Unstructured data (directories, PHP files, config, themes, plugins, etc.): Amazon EFS

AWS Architecture

[Diagram: Route 53 and CloudFront in front of WordPress instances in AZ-A and AZ-B, each with an OP cache, sharing one EFS file system through a mount target in each AZ]

Demo


Simple and predictable pricing

• With Amazon EFS, you pay only for the storage space you use

- No minimum commitments or up-front fees

- No need to provision storage in advance

- No other fees, charges, or billing dimensions

• EFS price: $0.30/GB-month (US Regions)

Typical multi-AZ file system setup without EFS

[Diagram: in a Region with three Availability Zones, EC2 compute nodes manage a 3rd-party file system layer on replicated EBS storage volumes, with inter-AZ traffic for replication; an EC2 NFS client accesses the file system over NFS]

TCO example

Let’s say you need to store ~500 GB and require high availability and durability

Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization) and fully replicate the data to a second Availability Zone for availability/durability

Example comparative cost:

Storage (2x 600 GB EBS gp2 volumes): $120 per month

Compute (2x m4.xlarge instances): $350 per month

Inter-AZ data transfer costs (est.): $129 per month

Total $599 per month

EFS cost: 500 GB × $0.30/GB-month = $150 per month, with no additional charges
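The comparison is straightforward arithmetic. A minimal awk sketch that reproduces it from the slide's own line items lets you substitute your own volume sizes, instance prices, and transfer estimates. The only derived number is the implied gp2 rate ($120 / 1,200 GB); the function name is hypothetical.

```shell
#!/bin/sh
# Recompute the TCO example. Line items are the slide's figures (inter-AZ
# transfer is the slide's estimate); the gp2 rate is derived from them.
tco_report() {
  awk 'BEGIN {
    ebs_gb = 600; volumes = 2; ebs_storage = 120   # 2x 600 GB gp2, $/month
    compute = 350                                  # 2x m4.xlarge, $/month
    inter_az = 129                                 # replication traffic, $/month
    efs_gb = 500; efs_rate = 0.30                  # EFS $/GB-month (US Regions)

    self_managed = ebs_storage + compute + inter_az
    efs = efs_gb * efs_rate
    printf "implied gp2 rate: $%.2f/GB-month\n", ebs_storage / (ebs_gb * volumes)
    printf "self-managed total: $%.2f/month\n", self_managed
    printf "EFS total: $%.2f/month\n", efs
    printf "EFS as share of self-managed: %.0f%%\n", 100 * efs / self_managed
  }'
}

tco_report
```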


Exciting new features coming late this year and in early 2017…

Coming soon: Encryption of data at rest

• Additional layer of protection: helps you meet your organization’s regulatory/compliance requirements

• Integrated with AWS KMS

• Encryption/decryption handled transparently

• No extra cost

Coming early 2017

Coming soon: Easier mounting

• Single DNS name associated with a file system

• DNS name automatically resolves to the mount target in the local Availability Zone

• Simpler mount command

Coming early 2017

mount -t nfs4 -o nfsvers=4.1 [file system DNS name]:/ /[user’s target directory]

New DNS name will resolve to the local mount target’s IP address

mount -t nfs4 -o nfsvers=4.1 fs-096f99a0.efs.us-west-2.amazonaws.com:/ /efs

Today’s DNS name

[AZ].[fs-id].efs.[region].amazonaws.com

Future DNS name

[fs-id].efs.[region].amazonaws.com

Coming early 2017
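The two naming patterns can be built mechanically. A minimal sketch, assuming AZ names are the Region name plus one letter (as in `us-west-2a`); the function names are hypothetical, and the example file system ID is the one from the slide.

```shell
#!/bin/sh
# Build the two DNS name forms from the slide.
# Today:  [AZ].[fs-id].efs.[region].amazonaws.com  (one name per AZ's mount target)
# Future: [fs-id].efs.[region].amazonaws.com       (resolves to the local mount target)
efs_dns_az() {
  fs_id="$1"; az="$2"
  # ASSUMPTION: an AZ name is its Region name plus one letter (us-west-2a)
  region="${az%?}"
  printf '%s.%s.efs.%s.amazonaws.com\n' "$az" "$fs_id" "$region"
}

efs_dns() {
  fs_id="$1"; region="$2"
  printf '%s.efs.%s.amazonaws.com\n' "$fs_id" "$region"
}

efs_dns_az fs-096f99a0 us-west-2a
efs_dns fs-096f99a0 us-west-2
```

On an instance, the AZ for today's form can be read from the instance metadata service, e.g. `curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone`. Either name resolves only from inside the VPC.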

Four scenarios for working with file data across on-premises environments and EFS

Migration

• Move entire data set permanently to EFS

• Access the data from applications running on EC2 instances

Bursting

• Move data set temporarily to EFS

• Access the data from applications running on EC2 instances

• Move data back on premises once processing finishes

Tiering

• Store part of data set permanently on EFS, and keep part of data set on premises

• Access the entire data set from applications running on on-premises servers

Backup and Disaster Recovery

• Maintain copy of entire data set on EFS

• Restore the data to on-premises storage or (for DR) access the data from failed-over applications running on EC2 instances

Now announcing: Access your EFS file system via AWS Direct Connect

[Diagram: on-premises servers connect over AWS Direct Connect to EFS in your Amazon VPC]

Direct Connect support addresses three of the scenarios

Bursting

Migration

Tiering

Backup / DR

Latency of the AWS Direct Connect connection impacts performance

• Added latency can be tens of milliseconds (propagation delay over long distances)

• If you serialize I/O, the latency of each operation directly limits the rate of data transfer
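A back-of-the-envelope model shows why latency matters and why threads help: if every file costs a fixed round-trip latency and nothing else is a bottleneck, N threads cut the number of sequential waits by about a factor of N. The sketch below is an idealized model under those assumptions (no contention, no bandwidth limit, one latency per file, whereas a real file copy can take several NFS round trips). The 20 ms latency is hypothetical and the function name is illustrative.

```shell
#!/bin/sh
# Idealized copy-time estimate when per-file latency dominates:
#   time ~= ceil(files / threads) * per-file latency
# Assumes threads never contend and bandwidth is never the bottleneck.
copy_time_s() {
  files="$1"; latency_ms="$2"; threads="$3"
  awk -v f="$files" -v l="$latency_ms" -v t="$threads" 'BEGIN {
    batches = int((f + t - 1) / t)   # ceil(files / threads)
    printf "%.1f\n", batches * l / 1000
  }'
}

# 26,200 files (the file count in the chart) at a hypothetical 20 ms per file
for t in 1 2 4 8 16; do
  echo "$t threads: $(copy_time_s 26200 20 "$t") s"
done
```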

As with copying from within EC2, using a script based on the GNU parallel tool reduces transfer time

[Chart: total time to copy 26,200 files vs. number of threads (x-axis: number of threads, 0 to 18; y-axis: time, 0 to 900)]

AWS Direct Connect access available today in three Regions

• US West (Oregon)

• US East (Ohio)

• EU (Ireland)

Coming soon to US East (N. Virginia)

Wrapping up

Related Sessions

STG207 – EFS Case Study w/ Atlassian – Wed @ 11:30am and Friday @ 12:30pm

STG206 – EFS Case Study w/ Spokeo – Friday @ 9:30am

STG208 – EFS Case Study w/ Monsanto – Friday @ 11:00am

EFS Resources

AWS Storage Booth @ re:Invent

10 Minute Demos @ AWS Booth

• Wednesday @ 3:30pm

• Wednesday @ 4:50pm

• Thursday @ 3:30pm

Reference Architecture - https://aws.amazon.com/architecture/

qwikLABS - https://aws.qwiklabs.com/

Thank you!

Remember to complete your evaluations!
