AWS re:Invent 2016: Strategic Planning for Long-Term Data Archiving with Amazon Glacier (STG209)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Henry Zhang, Senior Product Manager, AWS; Rich Sutton, VP of Engineering, Digital Risk, Proofpoint. November 30, 2016. STG209: Strategic Planning for Long-Term Data Archiving with Amazon Glacier.

Upload: amazon-web-services

Post on 16-Apr-2017

TRANSCRIPT

Page 1: AWS re:Invent 2016: Strategic Planning for Long-Term Data Archiving with Amazon Glacier (STG209)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Henry Zhang, Senior Product Manager, AWS

Rich Sutton, VP of Engineering, Digital Risk, Proofpoint

November 30, 2016

STG209

Strategic Planning for Long-Term Data Archiving with Amazon Glacier

Page 2:

AWS storage maturity

• File: Amazon EFS

• Block: Amazon Elastic Block Store, Amazon EC2 Instance Store

• Object: Amazon S3, Amazon Glacier

• Data Transfer: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, Amazon S3 Transfer Acceleration, AWS Storage Gateway

Page 3:

Video archives

• Media distribution backbone (Ve.nue platform)

• Over-The-Top (OTT) broadcast service

• 20 PB of media assets, 800,000 hours of high-res content

• Assets to be archived and retained for decades

Page 4:

Patient data–Philips Healthcare

• HealthSuite digital platform powered by AWS

• 15 petabytes of patient data

• Archived for decades (beyond the lifetime of patients)

• Uses AWS HIPAA-eligible services in the BAA

Page 5:

Public sector–King County

• Most populous county in Washington state

• Replaced tape solution for backup from 17 agencies

• Meets compliance requirement

• Saved $1MM in first year; no more tape refresh or management churn

Page 6:

Data archiving needs are growing everywhere

Archive: Data retained for the long term, for compliance or potential future reference

• Media assets, 4K, 8K

• Health care/life sciences

• Financial services

• Regulated industries

• Oil and gas/geospatial

• Digital preservation

• Long-term backups

• Logs

Page 7:

Consideration 1 – Total Archive Cost

Page 8:

Traditional archiving approaches

• Tape libraries, robots, drives, media

• Onsite (online and offline)

• Offsite tape out/vaulting

• Specialized software and personnel

• Tape refresh every 3-5 years

Page 9:

How can AWS help with your archival?

• Metered usage: pay as you go

• No capital investment

• No commitment

• No risky capacity planning

• Avoid risks of physical media handling

• Control your geographic locality for performance and compliance

Page 10:

Storage pricing - pay only for what you use

• 1 PB raw storage

• 800 TB usable storage

• 600 TB allocated storage

• 400 TB application data

AWS Cloud Storage

Amazon Glacier starts at $0.004/GB/month

Price dropped by 43% on 11/21
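The capacity figures on this slide can be turned into a rough monthly comparison. The sketch below is illustrative only. It assumes decimal units (1 TB = 1,000 GB), a flat $0.004/GB/month rate for every GB, and no request or retrieval charges. The pre-drop price is back-computed from the 43% figure, not taken from a price list.

```python
GB_PER_TB = 1_000  # assumption: decimal units; actual billing units may differ

GLACIER_PRICE_PER_GB_MONTH = 0.004  # from the slide
PRICE_DROP = 0.43                   # from the slide


def monthly_cost(terabytes: float,
                 price_per_gb: float = GLACIER_PRICE_PER_GB_MONTH) -> float:
    """Monthly storage cost in USD for the given capacity."""
    return terabytes * GB_PER_TB * price_per_gb


# The four capacity levels listed on the slide, in TB.
capacity_tb = {
    "raw": 1000,
    "usable": 800,
    "allocated": 600,
    "application data": 400,
}

for name, tb in capacity_tb.items():
    print(f"{name:>16}: {tb:>5} TB -> ${monthly_cost(tb):,.2f}/month")

# Price implied before the 43% drop (derived, not quoted).
implied_previous_price = GLACIER_PRICE_PER_GB_MONTH / (1 - PRICE_DROP)
print(f"implied previous price: ${implied_previous_price:.4f}/GB/month")
```

Under these assumptions, paying only for the 400 TB of application data comes to about $1,600 per month, versus about $4,000 if the full 1 PB of raw capacity were billed at the same rate.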

Page 11:

Consideration 2 – Durability

Page 12:

Durability for long-term preservation

• 99.999999999% durability

• Built-in fixity checking

• Automatic recovery
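The slide does not describe how the built-in fixity checking works internally. As a client-side illustration of the same idea, the sketch below computes the SHA-256 tree hash that the Glacier API uses as an upload checksum (1 MiB leaves, hashes combined pairwise) and compares it with a previously recorded value. The function names are hypothetical.

```python
import hashlib

MIB = 1024 * 1024  # tree-hash leaf size


def tree_hash(data: bytes) -> str:
    """SHA-256 tree hash: hash each 1 MiB chunk, then combine hashes pairwise."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                # An unpaired hash is carried up to the next level unchanged.
                next_level.append(pair[0])
        level = next_level
    return level[0].hex()


def verify_fixity(data: bytes, recorded_tree_hash: str) -> bool:
    """True if the data still matches the checksum recorded at archive time."""
    return tree_hash(data) == recorded_tree_hash
```

Recording the tree hash alongside the archive ID at upload time lets a later download be checked end to end, independent of the service-side checks.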

Page 13:

Consideration 3 – Accessibility

Page 14:

Amazon Glacier – Data Retrieval Tiers

Expedited Retrieval: $0.03/GB

• Emergency access

• 1-5 minutes

• Last-minute play-out schedule swap

Standard Retrieval: $0.01/GB

• Current model

• 3-5 hours

• Disaster recovery

Bulk Retrieval: $0.0025/GB

• Batch/bulk access

• 5-12 hours

• PB-scale re-transcoding or video/image analysis

On-site tape replacement / Off-site tape replacement
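A small sketch of choosing among the three tiers follows. It assumes the three per-GB prices on the slide map to Expedited, Standard, and Bulk in that order. It treats the quoted time ranges as typical windows and ignores per-request fees. The `Tier` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    price_per_gb: float                 # USD per GB retrieved
    typical_hours: Tuple[float, float]  # (fastest, slowest) from the slide


TIERS = [
    Tier("Expedited", 0.03, (1 / 60, 5 / 60)),  # 1-5 minutes
    Tier("Standard", 0.01, (3, 5)),             # 3-5 hours
    Tier("Bulk", 0.0025, (5, 12)),              # 5-12 hours
]


def retrieval_cost(gigabytes: float, tier: Tier) -> float:
    """Per-GB retrieval charge only; request fees are not modeled."""
    return gigabytes * tier.price_per_gb


def cheapest_tier_within(deadline_hours: float) -> Optional[Tier]:
    """Cheapest tier whose slowest typical time still meets the deadline."""
    candidates = [t for t in TIERS if t.typical_hours[1] <= deadline_hours]
    return min(candidates, key=lambda t: t.price_per_gb) if candidates else None


for tier in TIERS:
    print(f"{tier.name:>9}: 10 TB costs ${retrieval_cost(10_000, tier):,.2f}")
```

For example, a job that can wait overnight (12 hours) falls into Bulk, while a six-hour deadline rules Bulk out and leaves Standard as the cheapest option.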

Page 15:

Consideration 4 - Application & Data Management

Page 16:

Amazon Glacier – 3 ways to access

• Direct Glacier API/SDK

• S3 lifecycle integration

• Third-party tools and gateways

Page 17:

Amazon Glacier – Direct access/APIs

Data Upload

• Create Vault

• Configure Access

• Upload Archives

• Register Archive ID

Data Retrieval

• Initiate Retrieval

• Async Retrieval Completion

• Completion Notification

• Download Data
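The upload and retrieval steps on this slide can be sketched in Python as follows. The `glacier` argument is assumed to be a boto3 Glacier client (for example `boto3.client("glacier")`), and the `index` dictionary is a hypothetical stand-in for the database that registers archive IDs. The "Configure Access" step is omitted, and polling is used here as a simplification in place of a completion notification.

```python
import time


def archive_payload(glacier, vault_name, payload, description, index):
    """Create the vault if needed, upload one archive, and register its ID."""
    glacier.create_vault(vaultName=vault_name)  # idempotent for existing vaults
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=payload,
    )
    archive_id = response["archiveId"]
    # Glacier returns an opaque ID; it has to be recorded somewhere searchable.
    index[description] = archive_id
    return archive_id


def retrieve_payload(glacier, vault_name, archive_id,
                     tier="Standard", poll_seconds=900):
    """Start an asynchronous retrieval job, wait for it, then download."""
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    job_id = job["jobId"]
    # Simplification: poll until the job completes instead of waiting
    # for a notification.
    while not glacier.describe_job(vaultName=vault_name, jobId=job_id)["Completed"]:
        time.sleep(poll_seconds)
    output = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Passing the client in as an argument keeps the sketch testable without AWS credentials. A production version would more likely react to the completion notification than sleep in a loop.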

Page 18:

Use Glacier via S3 Object Lifecycle

• S3 Standard: active data, synchronous access, $0.023/GB/mo.

• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.

• Amazon Glacier: archive data, async access, $0.004/GB/mo.

Page 19:

Data lifecycle management

- Transition Standard to Standard-IA

- Transition Standard-IA to Amazon Glacier

- Transition based on object tags

- Expiration and versioning

[Figure: data access frequency over time, from T to T + 365 days]
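The transitions listed on this slide correspond to an S3 lifecycle configuration. The sketch below builds one as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, rule ID, and day counts are illustrative assumptions, not values from the talk.

```python
def build_lifecycle_configuration(prefix="videos/",
                                  ia_after_days=30,
                                  glacier_after_days=90,
                                  expire_after_days=3650):
    """Standard -> Standard-IA -> Glacier -> expiration, for one key prefix."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: Standard-IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": "archive-" + prefix.strip("/"),
                # A tag filter could be used instead:
                # {"Tag": {"Key": ..., "Value": ...}}
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration()
# Applying it would look like this (bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

The day counts would normally be chosen from an access-frequency curve like the one on the slide, moving objects down a tier once reads become rare.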

Page 20:

Transition older videos to Standard-IA

Page 21:

Save money on storage

45% saving over S3 Standard

44% saving over S3 Standard-IA

* Assumes the highest public pricing tier

Page 22:

Amazon Glacier – Third-party tools and gateways

• Consumer grade: less than $50

• Example: Cloudberry, FastGlacier, Arq (Haystack Software)

• Small / medium business: $500 - $1,000

• Example: Synology, Veeam, QNap

• Enterprise gateway and data management software

• Example: NetApp AltaVault, CommVault, StorNext, Vidispine

Page 23:

Which option should I choose?

• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability

• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database

• Use 3rd party tools to minimize coding

• Does the tool write data in proprietary or native format in AWS?

Page 24:

Media Archive and Metadata (cloud transition)

[Diagram]

• Corporate data center: Metadata (Asset Manager), Processing Tasks, Hierarchical Storage Manager, Onsite Archive, On-Premise Tape

• Offsite Tape Archive

Page 25:

Media Archive (transition to the cloud)

[Diagram]

• Corporate data center: Metadata (Asset Manager), Processing Tasks, Hierarchical Storage Manager, Onsite Archive, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Cloud DAM (syncing metadata from on-prem)

Page 26:

Media Archive (transition to the cloud)

[Diagram]

• Corporate data center: Metadata (Asset Manager), Processing Tasks, Hierarchical Storage Manager, Onsite Archive, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud Based Processing Tasks

Page 27:

Media Archive (transition to the cloud)

[Diagram]

• Corporate data center: Metadata (Asset Manager), Processing Tasks, Hierarchical Storage Manager, Onsite Archive, Onsite Cache, On-Premise Tape

• Offsite Tape Archive

• AWS Direct Connect

• AWS Region: Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud Based Processing Tasks

Page 28:

Consideration 5 - Compliance and Retention

Page 29:

Compliance storage with Vault Lock

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy

• Time-based retention

• MFA authentication

• Controls govern all records in a vault

• Immutable policy

• Two-step locking

Page 30:

Vault Lock for compliance storage

• Non-overwrite, non-erasable records

• Time-based retention with “ArchiveAgeInDays” control

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Configure optional designated third-party access and grant temporary access
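A Vault Lock policy combining time-based retention and a tag-driven legal hold might look like the sketch below. It is modeled on the policy shape in the Glacier documentation. The vault ARN, retention period, and tag name are placeholders, and this is not presented as the exact policy any speaker used.

```python
import json


def build_vault_lock_policy(vault_arn, retention_days, legal_hold_tag="LegalHold"):
    """Deny archive deletion until the retention age is reached or while on hold."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            },
            {
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {
                        f"glacier:ResourceTag/{legal_hold_tag}": ["true", ""]
                    }
                },
            },
        ],
    }


policy = build_vault_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault",  # placeholder
    retention_days=365,
)
policy_json = json.dumps(policy)

# Two-step locking with a boto3 Glacier client (not executed here):
#   lock = glacier.initiate_vault_lock(vaultName="examplevault",
#                                      policy={"Policy": policy_json})
#   ... test the policy while the lock is in progress ...
#   glacier.complete_vault_lock(vaultName="examplevault", lockId=lock["lockId"])
```

The first call puts the policy into a testable in-progress state. Only the second call makes it immutable, which is what leaves room to validate the policy before committing to it.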

Page 31:

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c).

Page 32:

Rich Sutton, VP of Engineering

Digital Risk, Social Media Security, and Compliance

Proofpoint SocialPatrol Archive: Amazon Glacier and Vault Lock Use Case

Page 33:

Proofpoint

• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk

• Founded 2002, public in 2012

• $350M annual revenue, $3B market cap

• Huge AWS user

Page 34:

Proofpoint SocialPatrol

Policy controls and enforcement for social

• Combats fraudulent brand impersonation

• Moderates content at scale

• Ensures compliance in publishing

• Integrates with social APIs

• 150+ classifiers using NLP and ML

• Text, links, images, metadata

• Ingesting >1M social posts per day

• Built in AWS

Page 35:

Proofpoint SocialPatrol

How it works:

[Diagram: Social; PFPT in AWS (Policy engine, MySQL/C*/Solr); Enterprise Archive]

“Awesome. Help me with retention by integrating with my existing email archive.”

Page 36:

Proofpoint SocialPatrol archiving integration

Imperfect …

• Social != Email

• Every archive is different

• Requires internal collaboration

Page 37:

Proofpoint SocialPatrol Archive

SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock

[Diagram: Social; PFPT in AWS (Policy engine, MySQL/C*/Solr); Amazon Glacier & Vault Lock]

Page 38:

Proofpoint SocialPatrol Archive

The customer specifies the retention period in Proofpoint Social:

Page 39:

Proofpoint SocialPatrol Archive

Via AWS API we create a vault for that customer:

Page 40:

Proofpoint SocialPatrol Archive

Via AWS API, we lock the vault and specify a policy to observe a legal hold via a tag.

Page 41:

Proofpoint SocialPatrol Archive

As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.

Page 42:

Proofpoint SocialPatrol Archive

Search UI uses the copy of the data we already had.

As archives expire, we purge them.
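Recording a purge date per archive and later selecting the expired ones can be sketched as below. This is a hypothetical illustration of the bookkeeping described on these slides, not Proofpoint's implementation. The `ArchivedPost` record and its fields are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class ArchivedPost:
    archive_id: str          # ID returned when the content was archived
    archived_on: date
    retention_days: int      # retention period chosen by the customer
    legal_hold: bool = False

    @property
    def purge_date(self) -> date:
        """First day on which the archive may be deleted."""
        return self.archived_on + timedelta(days=self.retention_days)


def due_for_purge(posts: List[ArchivedPost], today: date) -> List[ArchivedPost]:
    """Archives whose retention has ended and that are not under legal hold."""
    return [p for p in posts if not p.legal_hold and today >= p.purge_date]


posts = [
    ArchivedPost("a-1", date(2015, 1, 1), 365),
    ArchivedPost("a-2", date(2015, 1, 1), 365, legal_hold=True),
    ArchivedPost("a-3", date(2015, 6, 1), 365),
]
for post in due_for_purge(posts, today=date(2016, 1, 1)):
    print(post.archive_id, "is due for purge since", post.purge_date)
```

Keeping the purge date in the application's own index means expiry can be shown to the user and acted on without querying the vault, while the locked vault policy still blocks any deletion attempted too early.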

Page 43:

Proofpoint SocialPatrol Archive

• Legal hold can be put in place by Proofpoint Support

• Data can be exported from Amazon Glacier by Proofpoint Support when necessary

• Amazon Glacier with Vault Lock allowed us to build a product that complies with SEC Rule 17a-4(f) and CFTC Rule 1.31(b)-(c)

What would it have cost for us to build a WORM data store, get it certified, and scale it … ?

Page 44:

Data ingestion into AWS storage services

Snowball Edge

• Accelerate PBs with AWS-provided appliances

• NEW 100 TB model with compute

Storage Gateway

• Instant hybrid cloud

• Up to 120 MB/s cloud upload rate (4x improvement)

Firehose

• Ingest data streams directly into AWS data stores

Direct Connect

• COLO to AWS

ISV Connectors

• Commvault

• Veritas

• etcetera

NEW S3 Transfer Acceleration

• Accelerate object transfer up to 300% using AWS’s private network
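When choosing between network ingestion and a shipped appliance, a rough transfer-time estimate helps. The sketch below assumes decimal units and a sustained rate with no protocol overhead or retries, so real transfers would take longer. The 120 MB/s and 100 TB figures are taken from this slide only as example inputs.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # assumption: decimal units


def transfer_days(terabytes: float, megabytes_per_second: float) -> float:
    """Days needed to move the data at a constant sustained rate."""
    seconds = terabytes * MB_PER_TB / megabytes_per_second
    return seconds / SECONDS_PER_DAY


for size_tb in (10, 100, 1000):
    print(f"{size_tb:>5} TB at 120 MB/s: {transfer_days(size_tb, 120):6.1f} days")
```

At a sustained 120 MB/s, 100 TB takes roughly 9.6 days and 1 PB roughly 96 days, which is the kind of gap that makes appliance-based transfer worth considering at petabyte scale.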

Page 45:

Related Sessions

STG302 - Deep Dive on Amazon Glacier

STG210 - Simplified Data Center Migration: Lessons Learned by Live Nation

STG312 - Workshop: Working with AWS Snowball - Accelerating Data Ingest into the Cloud

Page 47:

Remember to complete your evaluations!

Page 48:

Thank you!