AWS January 2016 Webinar Series - Amazon Aurora for Enterprise Database Applications

Post on 23-Jan-2018


TRANSCRIPT

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Abdul Sathar Sait

Principal Product Manager

Amazon Aurora for Enterprise Database Applications

Enterprise database requirements

1. Database engine with enterprise-class availability, performance, scalability, and security.

2. Managed service: instant provisioning, push-button scaling, automated backups, patching, monitoring, migration.

Goal: provide a fully managed, enterprise-class database service without the cost and complexity of commercial database software.

Traditional relational databases

Gradual improvements on a decades-old design

Accommodate different server and storage hardware

Too complex to tune for optimal performance

Layers of software to mitigate potential points of failure

"Cloudification" by virtue of additional layers

High cost; complex and punitive licensing terms

Multiple layers of functionality (SQL, transactions, caching, logging) in a monolithic stack.

Relational database re:Imagined

We started with a blank sheet of paper and reimagined the relational database for the cloud.

Amazon Aurora is purpose-built for the cloud:

Designed from the ground up using AWS technology

Distributed component architecture with built-in redundancy

High availability and scale-out are part of the core database design

Self-healing components designed for resilience

Architected for security and performance

Security

Encryption at rest and in transit

Isolates your data within an Amazon VPC

Encryption at rest using keys you create and manage with AWS KMS

Data, automated backups, snapshots, and replicas in the same cluster are all automatically encrypted

Seamless encryption and decryption, requiring no changes to your application

Automatic encryption in transit

Enterprise-class performance

Write performance (console screenshot): MySQL Sysbench, R3.8XL with 32 cores and 244 GB RAM, 4 client machines with 1,000 threads each.

Read performance (console screenshot): MySQL Sysbench, R3.8XL with 32 cores and 244 GB RAM, single client with 1,000 threads.

Writes scale with table count

Tables   Amazon Aurora   MySQL I2.8XL local SSD   MySQL I2.8XL RAM disk   RDS MySQL 30K IOPS (single AZ)
10       60,000          18,000                   22,000                  25,000
100      66,000          19,000                   24,000                  23,000
1,000    64,000          7,000                    18,000                  8,000
10,000   54,000          4,000                    8,000                   5,000

Write-only workload, 1,000 connections. Query cache on by default for Amazon Aurora, off for MySQL.

Writes scale with the number of connections

Connections   Amazon Aurora   RDS MySQL 30K IOPS (single AZ)
50            40,000          10,000
500           71,000          21,000
5,000         110,000         13,000

OLTP workload, variable connection count, 250 tables. Query cache on by default for Amazon Aurora, off for MySQL.

Do less work:

Fewer I/Os to the backend

Effective query caching

Replica management

Do it efficiently:

Latch-free lock management

Adaptive thread pools

Asynchronous commits

Consistent, low-latency writes

Diagram: MySQL with a standby runs a primary instance in AZ 1 and a standby instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) volumes with EBS mirrors, with backups to Amazon S3. Aurora runs a primary instance in AZ 1 and a replica instance in AZ 2, sharing a storage volume that spans AZ 1-3, with continuous backup to Amazon S3.

Improvements

Consistency—tolerance to outliers

Latency—synchronous vs. asynchronous replication

Efficiency—significantly more efficient use of network I/O

Type of writes:

MySQL with standby writes log records, binlog, data, the double-write buffer, and FRM files/metadata as sequential writes, replicated asynchronously to the standby.

Amazon Aurora writes only log records, as distributed writes acknowledged on a 4/6 quorum, with point-in-time recovery (PiTR).

Limitation of the MySQL lock manager: a single global latch allows only one active thread in the lock manager at a time. In legacy systems, inserting a lock into the lock table holds that latch for the lifetime of the lock.

Latch-free lock manager: in the new design, a lock exists for the same lifetime but is non-blocking, so other threads can operate on the lock table concurrently.

Latch-free atomic lock insert

• Read-After-Write (RAW) with memory barriers for fast synchronization

• Staged allocation and de-allocation of locks for the lock hash table

Identical semantics to MySQL locks; concurrent latch-free operation; uses a specialized resource manager; implements lock compression.
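As a toy illustration of the idea (not Aurora's internals, which use RAW with memory barriers), the point of a latch-free design is that inserting a lock request is a single atomic step rather than a critical section guarded by one global latch. In the sketch below, `dict.setdefault` stands in for the atomic insert; the class and states are made up for illustration.

```python
# Toy sketch of a latch-free lock table: each insert is one atomic step
# (dict.setdefault here), so no global latch serializes all threads.
class LockTable:
    def __init__(self):
        self._table = {}  # resource -> ordered list of (txn, state)

    def insert(self, resource, txn):
        # Atomic creation of the per-resource queue; no global latch held.
        queue = self._table.setdefault(resource, [])
        state = "GRANTED" if not queue else "WAITING"
        queue.append((txn, state))
        return state

table = LockTable()
print(table.insert("row:42", "T1"))  # → GRANTED (first requester)
print(table.insert("row:42", "T2"))  # → WAITING (queued behind T1)
```

The semantics stay the same as a latched lock table (first requester granted, later ones wait); only the synchronization strategy differs.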

Asynchronous group commits

Diagram: transactions T1 through Tn each interleave reads and writes before committing. As the durable LSN at the head node grows, pending commits (Commit(T1) through Commit(T8), at LSNs such as 10, 12, 20, 22, 30, 34, 41, 47, 49, and 50) are held in a commit queue in LSN order and executed together as a group commit.

• Pending commits are queued in LSN order for asynchronous execution

• Commit threads scan the queue and execute multiple commits at a time

• Eliminates wait time for writes to be durable at the storage nodes

• Group execution of multiple commits at a time improves efficiency
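The queueing described above can be sketched in a few lines. This is an illustration of the group-commit idea, not Aurora's implementation: commits wait in a min-heap keyed by LSN, and when the durable LSN advances, one pass acknowledges every commit at or below it.

```python
import heapq

# Sketch: pending commits queued in LSN order; a commit thread
# acknowledges all commits covered by the new durable LSN as a group.
class CommitQueue:
    def __init__(self):
        self._pending = []  # min-heap of (commit_lsn, txn_id)

    def enqueue(self, commit_lsn, txn_id):
        heapq.heappush(self._pending, (commit_lsn, txn_id))

    def on_durable(self, durable_lsn):
        # Acknowledge every queued commit whose LSN is now durable.
        acked = []
        while self._pending and self._pending[0][0] <= durable_lsn:
            acked.append(heapq.heappop(self._pending)[1])
        return acked

q = CommitQueue()
for lsn, txn in [(22, "T3"), (10, "T1"), (12, "T2"), (50, "T8")]:
    q.enqueue(lsn, txn)
print(q.on_durable(30))  # → ['T1', 'T2', 'T3']
```

No transaction waits individually for its own write to become durable; each is released as soon as a group acknowledgment covers its LSN.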

Designed for high availability

Aurora storage

Highly available by default
• 6-way replication across 3 AZs
• 4 of 6 write quorum
• Automatic fallback to 3 of 4 if an Availability Zone (AZ) is unavailable
• 3 of 6 read quorum

SSD-based, scale-out, multi-tenant storage
• Seamless storage scalability
• Up to 64 TB database size
• Only pay for what you use

Log-structured storage
• Many small segments, each with its own redo log
• Log pages used to generate data pages
• Eliminates chatter between database and storage
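The quorum arithmetic behind these availability claims is simple to check. A minimal sketch, using the 6-copy / 4-of-6 write / 3-of-6 read numbers from the slide:

```python
# Quorum rules from the slide: 6 copies across 3 AZs,
# writes need 4/6 acknowledgments, reads need 3/6.
COPIES, WRITE_QUORUM, READ_QUORUM = 6, 4, 3

def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM

def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM

# Losing an entire AZ (2 of 6 copies) leaves 4: reads and writes still OK.
print(can_write(6 - 2), can_read(6 - 2))  # → True True
# Losing 3 copies leaves 3: reads OK, writes blocked until repair.
print(can_write(6 - 3), can_read(6 - 3))  # → False True
```

This is why an AZ failure costs no availability, while a three-copy loss preserves read availability until automatic repair restores the write quorum.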

Diagram: SQL, transaction, and caching layers above a storage volume replicated across AZ 1, AZ 2, and AZ 3, with backup to Amazon S3.

Lose two copies, or an entire AZ, with no impact to read or write availability.

Lose three copies with no impact to read availability.

Automatic detection, replication, and repair.

Self-healing, fault-tolerant

Continuous backup

Diagram: segment snapshots and log records for segments 1-3 over time, with a recovery point.

Take periodic snapshots of each segment in parallel and stream the redo logs to Amazon S3. Backup happens continuously without performance or availability impact.

At restore, retrieve the appropriate segment snapshots and log streams to the storage nodes, then apply the log streams to the segment snapshots in parallel and asynchronously.
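The restore step above is naturally parallel because each segment carries its own snapshot and log stream. A minimal sketch, with a hypothetical data model (pages as dicts, redo records as `(lsn, key, value)` tuples) that is not Aurora's actual format:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: restore = per-segment snapshot + its redo log stream,
# applied to all segments in parallel (no single global log replay).
def apply_log(snapshot, log_stream):
    page = dict(snapshot)
    for lsn, key, value in sorted(log_stream):  # redo records in LSN order
        page[key] = value
    return page

def restore(snapshots, log_streams):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(apply_log, snapshots, log_streams))

snapshots = [{"a": 1}, {"b": 2}]
logs = [[(10, "a", 5)], [(11, "c", 7)]]
print(restore(snapshots, logs))  # → [{'a': 5}, {'b': 2, 'c': 7}]
```

Because no segment's restore depends on another's, restore time is bounded by the slowest segment rather than by total database size.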

Traditional databases: have to replay logs since the last checkpoint; single-threaded in MySQL, requiring a large number of disk accesses.

Amazon Aurora: the underlying storage replays redo records on demand as part of a disk read; parallel, distributed, and asynchronous.

With checkpointed data plus a redo log, a crash at T0 requires re-application of the SQL in the redo log since the last checkpoint. In Aurora, a crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.

Instant crash recovery
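The "on demand as part of a disk read" idea can be sketched with a toy model (made-up structures, not Aurora internals): instead of replaying the whole redo log at startup, a page's pending redo records are applied the first time that page is read.

```python
# Toy model: materialized pages plus per-page pending redo records.
pages = {"p1": {"v": 1}, "p2": {"v": 2}}
pending_redo = {"p1": [(12, {"v": 3}), (10, {"v": 9})]}  # (lsn, new image)

def read_page(page_id):
    # Apply any pending redo in LSN order, lazily, at read time.
    for lsn, image in sorted(pending_redo.pop(page_id, [])):
        pages[page_id] = image
    return pages[page_id]

print(read_page("p1"))  # redo applied on demand → {'v': 3}
print(read_page("p2"))  # no pending redo → {'v': 2}
```

Startup cost no longer scales with log length: pages that are never read never pay for replay, and hot pages pay only for their own records.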

Faster, more predictable failover

MySQL: after a DB failure, the app waits through failure detection, DNS propagation, and recovery (15-20 sec).

Aurora with MariaDB driver: after a DB failure, the app waits through failure detection, DNS propagation, and recovery (3-20 sec).

Simulate failures using SQL

To cause the failure of a component at the database node:

ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:

ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:

ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval

To simulate the failure of an Aurora Replica:

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT READ REPLICA FAILURE [TO ALL | TO "replica name"] FOR INTERVAL interval

Delivered as a managed service

Databases are hard to manage

RDS platform: managing databases made easy

You: schema design, query construction, query optimization.

RDS: backup & recovery, isolation & security, industry compliance, push-button scaling, automated patching, advanced monitoring, routine maintenance.

Amazon RDS takes care of your time-consuming database management tasks, freeing you to focus on your applications and business.

Advanced monitoring

Single-page dashboard for OS and process diagnostics in the AWS console

Customize the dashboard with your choice of metrics and layout

Add alarms on specific metrics

Metrics egress via CloudWatch Logs into third-party monitoring tools such as Graphite

Support for metrics crossover into CloudWatch

Metrics such as load average, detailed CPU utilization, detailed disk I/O, and per-process statistics are provided at granularities ranging from 60 seconds down to 1 second.

Applications becoming more complex

Diagram: application stacks have evolved from on-premises databases behind .NET middleware, to Web 2.0 stacks (browser logic, AJAX, web frameworks on Amazon EC2 and Amazon RDS), to cloud and big data stacks spanning Amazon EC2, Amazon RDS, Amazon ElastiCache, middleware, Hadoop, and Cassandra.

Monitoring across the stack is key to minimizing downtime:

Access to information from every potential point of failure

Alarm and notification system for pre-emptive action

Rich visualization of aggregated data at users' convenience

Integrations with tools and dashboards

AWS Database Migration Service

Move data to the same or a different database engine

Keep your apps running during the migration

Start your first migration in 10 minutes or less

Replicate within, to, or from Amazon EC2 or RDS

http://aws.amazon.com/dms

Diagram: a replication instance in AWS connects over the internet or a VPN to a source database on customer premises, while application users keep using the source.

Start a replication instance

Connect to source and target databases

Select tables, schemas, or databases

Let the AWS Database Migration Service create tables, load data, and keep them in sync

Switch applications over to the target at your convenience

Keep your apps running during the migration

Migrate off Oracle and SQL Server

Move your tables, views, stored procedures, and DML to MySQL, MariaDB, and Amazon Aurora

The AWS Schema Conversion Tool highlights where manual edits are needed

http://aws.amazon.com/sct

The AWS Database Migration Service is in open preview now. Try it out yourself.

Perfect fit for the enterprise

Enterprise-class availability: 6-way replication across 3 AZs, failover in less than 30 secs, near-instant crash recovery.

Performance and scale: up to 500K/sec reads and 100K/sec writes, 15 low-latency (10 ms) read replicas, up to 64 TB optimized storage volume.

Fully managed service: instant provisioning and deployment, automated patching and software upgrades, backup and point-in-time recovery, compute and storage scaling.

Many features are unique to Amazon Aurora.

Comparing features

Compared to traditional commercial databases like Oracle:

• Comparable features are available only in the most expensive edition (Enterprise Edition)
• Failover and replicas: Oracle Active Data Guard, extra $$$ per core
• Backup to S3: Oracle Secure Backup Cloud Module, extra $$$ per channel
• Encryption: Oracle Advanced Security, extra $$$ per core

Don't be constrained by licenses, cost, or capacity.

Simple pricing

No licenses, no lock-in; pay only for what you use.

Discounts: 44% with a 1-year RI, 63% with a 3-year RI.

Instance         vCPU   Mem (GiB)   Hourly Price
db.r3.large      2      15.25       $0.29
db.r3.xlarge     4      30.5        $0.58
db.r3.2xlarge    8      61          $1.16
db.r3.4xlarge    16     122         $2.32
db.r3.8xlarge    32     244         $4.64

• Storage consumed, up to 64 TB, is $0.10/GB-month
• I/Os consumed are billed at $0.20 per million I/Os
• Prices are for the Virginia region

Enterprise grade, open source pricing
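Putting the slide's rates together, a monthly bill is a short calculation. The function below uses the rates quoted above (instance $/hr, $0.10/GB-month storage, $0.20 per million I/Os, Virginia pricing from this 2016 deck); the workload numbers in the example are made-up inputs for illustration only.

```python
# Back-of-envelope monthly Aurora bill from the slide's 2016 Virginia rates.
def aurora_monthly_cost(hourly_rate, storage_gb, ios, hours=730):
    instance = hourly_rate * hours            # instance-hours
    storage = storage_gb * 0.10               # $0.10 per GB-month
    io = ios / 1_000_000 * 0.20               # $0.20 per million I/Os
    return round(instance + storage + io, 2)

# e.g. a db.r3.large ($0.29/hr) with 500 GB and 100M I/Os in a month:
print(aurora_monthly_cost(0.29, 500, 100_000_000))  # → 281.7
```

Because storage and I/O are metered, the bill tracks actual usage rather than provisioned capacity.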

Cost of ownership: Aurora vs. commercial databases

Oracle on EC2 configuration, hourly cost: a primary (r3.8XL), a standby (r3.8XL), and two replicas (r3.8XL) at $2.93/hr each, each with its own 6 TB / 30K PIOPS storage volume at $3.75/hr and an Oracle Enterprise Edition license at $15.78/hr.

Instance cost: $11.72/hr

License cost: $63.12/hr

Storage cost: $15.00/hr

Total cost: $89.84/hr

Cost of ownership: Aurora configuration, hourly cost: a primary (r3.8XL) and two replicas (r3.8XL) at $4.64/hr each, sharing a single 6 TB storage volume at $5.15/hr.

Instance cost: $13.92/hr

Storage cost: $5.15/hr

Total cost: $19.07/hr

Storage IOPS assumptions:

1. Average IOPS is 50% of max IOPS

2. 50% savings from shipping logs vs. full pages

78.7% savings:

No idle standby instance

Single shared storage volume

No PIOPS; pay-per-use I/O

Reduction in overall IOPS
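The comparison arithmetic from the two cost configurations above checks out directly:

```python
# Oracle on EC2: 4 instances + 4 licenses + 4 storage volumes (per hour).
oracle_total = 4 * 2.93 + 4 * 15.78 + 4 * 3.75
# Aurora: 3 instances + one shared storage volume (per hour).
aurora_total = 3 * 4.64 + 5.15
savings = (oracle_total - aurora_total) / oracle_total

print(round(oracle_total, 2))   # → 89.84
print(round(aurora_total, 2))   # → 19.07
print(round(savings * 100, 1))  # → 78.8 (the slide quotes 78.7%)
```

Most of the gap is the per-core Enterprise Edition license line, which has no Aurora counterpart.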

Enterprise use cases

Fastest-growing service in AWS history

1,000+ customers within 10 days of launch

Web and mobile

Content management

E-commerce, retail

Internet of Things

Search, advertising

BI and analytics

Games, media

Common Customer Use Cases

Expedia: online travel marketplace

World's leading online travel company, with a portfolio that includes 150+ travel sites in 70 countries.

Needed real-time business intelligence and analytics on a growing corpus of online travel marketplace data. The current SQL Server-based architecture is too expensive, and performance degrades as data volume grows. Cassandra with a Solr index requires a large memory footprint and hundreds of nodes, adding cost.

Aurora benefits: Aurora meets the scale and performance requirements at much lower cost: 25,000 inserts/sec with peaks up to 70,000, and 30 ms average response time for writes and 17 ms for reads, with 1 month of data.

Alfresco: Enterprise Content Management

Provides Enterprise Content Management software built on open standards. Needed the database to scale without any degradation in performance.

Benefits: Alfresco on Amazon Aurora scaled to 1 billion documents with a throughput of 3 million per hour, 10 times faster than their MySQL environment.

Alfresco One architecture

Diagram: Alfresco Share, the Alfresco Repository, and the Activiti workflow engine run on EC2; the database runs on RDS; the file-system content store sits on S3 (or Glacier); Alfresco SOLR indexes sit on EBS or ephemeral storage / PIOPS EBS.

Benchmark environment – 1.2B docs

UI test: 20 × m3.2xlarge simulating 500 users (Selenium / Firefox, 1 hour constant load, 10 sec think time), behind an ELB.

Alfresco: 10 × c3.2xlarge running Alfresco with Share and Repo.

Solr: 20 × m3.2xlarge running a sharded Solr Cloud.

Aurora: 1 × db.r3.xlarge.

Data loaded in place, simulating AWS Import/Export:

Sites     Folders     Files           Transactions   DB size (GB)
10,804    1,168,206   1,168,206,000   15,475,064     3,185

Benchmark results

• Document load rate of 1,000 documents per second (with 10 nodes)

• Load rate was consistent even past the 1B-document mark

• Sub-second login times and good response times for other actions: Open Library 4.5 s, Page Results 1 s, Navigate to Site 2.3 s

• Aurora indexes used efficiently at 3.2 TB

• No indications of any size-related bottlenecks with 1.1 billion documents

• CPU loads: database 8-10%; Alfresco (each of 10 nodes) 25-30%

Insurance claims processing

• ISCS provides fully integrated policy management, claims, and billing solutions for property/casualty insurance organizations

• For the last 12 years ISCS has used SQL Server and Oracle commercial databases for operational and warehouse data

• The cost and maintenance of traditional commercial databases has increasingly become the biggest expenditure and maintenance headache

• Maintaining its customer SLAs requires complex, difficult-to-manage replication and redundancy across multiple geographic locations

• As customer data grows, backup/restore times for its largest data sets have grown to unacceptable levels

Aurora benefits

SQL Server backups that once took 5-6 hours daily now happen continuously on Aurora. Snapshots from one customer database (~ 5TB in size) take 5 minutes to make and less than an hour to restore. ISCS can actually test disaster recovery daily if it wanted to.

Data that was once only available “daily, batch” into Redshift can now be migrated continuously using Aurora read-replicas and Change Data Capture (CDC).

Performance at scale is linear since ISCS’s application, like Aurora, is optimized for multiple, concurrent read requests to the database.

Multi-AZ Aurora read-replicas also eliminate the need for additional licenses/deployments of SQL Server.

The cost of a “more capable” deployment on Aurora has proven to be about 70% less than ISCS’s SQL Server deployments.

Amazon Aurora: Earth Networks

Operates the world's largest weather and lightning sensor networks and technology. Earth Networks processes over 25 terabytes of real-time data daily, so it needs a scalable database that can grow rapidly with expanding data analysis.

Benefits: Aurora's performance and scalability work well with their rapid data growth. Moving from SQL Server to Aurora was very easy.

Thank you!
