[rightscale webinar] architecting databases in the cloud: how rightscale does it

ARCHITECTING DATABASES FOR SCALABILITY & AVAILABILITY IN THE CLOUD: HOW RIGHTSCALE DOES IT

Upload: rightscale

Post on 14-Jun-2015


Category: Technology



DESCRIPTION

Your database is the foundation of your application. With cloud comes new advantages and considerations for architecting and deployment. Find out how RightScale uses SQL and NoSQL databases such as MySQL, MongoDB, and Cassandra to provide a scalable, distributed, and highly available service around the globe.

TRANSCRIPT

Page 1: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

ARCHITECTING DATABASES FOR SCALABILITY & AVAILABILITY IN THE CLOUD: HOW RIGHTSCALE DOES IT

Page 2: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Josep Blanquer, Chief Architect, RightScale

• Raphael Simon, Senior Systems Architect, RightScale

• Ali Khajeh-Hosseini, Director of Development, RightScale

Q&A

• Ben Ingalls, Sales Development Representative, RightScale

Please use the “Questions” window to ask questions at any time

Your Panel Today


Page 3: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Main Technologies Used

• Data Storage and Design for:

• Cloud Management

• Self Service

• Cloud Analytics

• Conclusions

• Q&A

Agenda


Page 4: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• RightScale uses a mix of RDBMS and NoSQL technologies:

• MySQL, Cassandra, MongoDB, Redshift and S3

• The choice for each of them is commonly due to features such as:

• Transactionality

• Availability

• Sharding

• Queryability

• Raw performance

• Etc…

Intro: Tools and Technologies

Page 5: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Strong ACID properties

• Availability through async replication (for “HA” and DR)

• Read scalability through multiple slaves

• Powerful SQL “queryability”

• Examples of data from our Cloud Management product:

• Users, Plans, Settings

• Published marketplace assets

• Local assets like:

• ServerTemplates, Scripts

• Deployments and server configurations

• Alert definitions

Strong Points: MySQL

Page 6: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• High-availability properties

• Distributed, master-less

• Easy to horizontally scale (automatic data sharding and rebalancing)

• Tunable replication (including multi-DC)

• Tunable consistency

• TTL (Time To Live) in data elements

• Examples from our Cloud Management product:

• Events

• Audits

• Across-cloud message routing

• Session data

• Tags

Strong Points: Cassandra
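
The "tunable consistency" point is usually reasoned about with the replica-overlap rule: with N replicas, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and nothing here reflects RightScale's actual settings.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


# With 3 replicas, quorum reads plus quorum writes overlap ...
strong = reads_see_latest_write(3, quorum(3), quorum(3))
# ... while reading one replica after writing to one replica does not.
weak = reads_see_latest_write(3, 1, 1)
```

Lowering R or W trades that overlap guarantee for latency and availability, which is the knob the slide refers to.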

Page 7: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Mostly offline data retrieval

• Large scale and availability

• Large amounts of data

• When no querying is necessary

• Examples from our Cloud Management product

• Archived audits (encrypted)

• Scraped git repositories

• Archived monitoring data

Strong Points: S3

Page 8: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Document oriented storage

• Built-in replication support

• Built-in sharding support

• Test and set query

• Examples from our Self Service product

• Cloud Application Templates (CATs)

• Catalog Applications

• Running Applications

Strong Points: MongoDB

Page 9: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Simple to get started and manage

• Scales to handle up to a petabyte of data

• Powerful SQL “queryability”: we can explore the data easily

• Examples from our Cloud Analytics product

• Storing years of usage, cost and pricing data, e.g.:

• Instance-id-1 with x, y, z params, launched on T1 and terminated at T2

• Price of instance-type-X with x, y, z params at T1 was $0.01

Strong Points: Redshift

Page 10: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Let’s take a peek at:

• How the data storage architecture is designed

• How some of these technologies are deployed

• With examples in each of our three main products:

• Cloud Management

• Self Service

• Cloud Analytics

Storage Architecture and Deployment


Page 11: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

RightScale Cloud Management

Streamline operations

• Unify management of compute, storage, and network

• Design portable, multi-cloud service configurations

• Orchestrate large globally distributed systems

• Control access across clouds, data centers, and tenants

Page 12: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

Data Accessibility and Scope

[Diagram: data required by Users and by Instances, split by scope: for a single account, or global, to all accounts]

Page 13: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: grid of Users and Instances against Account and X-Account scope]

Page 14: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: Users and Instances against Account and X-Account scope; labels: "global", "Custom replication"]

Global: MySQL

• ACID semantics

• Master-Slave replication

Custom replication. Why custom? More control:

• Multiple sources

• Individual columns

• Apply transformations

• Smart re-sync features
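
The reasons listed for custom replication (multiple sources, individual columns, transformations) can be pictured with a small sketch. This is only an illustration under stated assumptions: `ReplicationRule`, `replicate`, and the table and column names are hypothetical and are not taken from RightScale's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class ReplicationRule:
    """Hypothetical rule: which columns to copy from one source, and how."""
    source: str                      # name of the source table (assumed)
    columns: List[str]               # only these columns are replicated
    transforms: Dict[str, Callable] = field(default_factory=dict)


def replicate(rule: ReplicationRule, rows: Iterable[Row]) -> List[Row]:
    """Project each source row onto the chosen columns and apply transforms."""
    out: List[Row] = []
    for row in rows:
        copied: Row = {}
        for col in rule.columns:
            transform = rule.transforms.get(col)
            copied[col] = transform(row[col]) if transform else row[col]
        copied["_source"] = rule.source  # remember where the row came from
        out.append(copied)
    return out


# Two sources feed one global table; only id and email are copied.
rules = [
    ReplicationRule("cluster_1.users", ["id", "email"], {"email": str.lower}),
    ReplicationRule("cluster_2.users", ["id", "email"], {"email": str.lower}),
]
source_rows = {
    "cluster_1.users": [{"id": 1, "email": "A@Example.com", "secret": "x"}],
    "cluster_2.users": [{"id": 2, "email": "b@example.com", "secret": "y"}],
}
global_rows = [
    row for rule in rules for row in replicate(rule, source_rows[rule.source])
]
```

Stock master-slave replication copies whole tables from one master, so per-column selection and per-source merging of this kind is the sort of control the slide has in mind.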

Page 15: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: Users and Instances against Account and X-Account scope; stores shown: global, dash, events, tags, audit, S3]

Dashboard: MySQL

• ACID semantics

• Master-Slave replication (N slaves)

• Slave reads

• Rows tagged by account

Other systems: Cassandra

• Simpler Key-Value access

• Great scalability

• Great replica control

• High write availability

• Time-to-live expiration as cache

• Rows tagged by account

Data archive: S3

• Low read rate

• Globally accessible

Page 16: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: the same grid, with the account-scoped stores (dash, events, tags, audit) shown twice next to global and S3]

So we can horizontally scale our dashboard by partitioning objects based on account groups:

Clusters

Page 17: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: RightScale Accounts divided into account sets (Account Set 1, Account Set 2, …), each served by its own cluster (Cluster 1 … Cluster N) containing dash, S3, events, tags, and audit]

Features:

• 1 cluster: N accounts

• 1 account: 1 home

• Migratable accounts

Benefits:

• Great horizontal growth

• Better failure isolation

• Independent scale

• Load rebalancing

• Versionable code

• Differentiated service
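
The cluster features above (one cluster holds many accounts, each account has one home, accounts can be migrated) amount to a directory lookup. A minimal sketch, assuming a hypothetical `ClusterDirectory` class and made-up account and cluster names; moving the underlying data between clusters is out of scope here.

```python
from typing import Dict, Set


class ClusterDirectory:
    """Hypothetical account-to-cluster map: each account has exactly one home."""

    def __init__(self) -> None:
        self._home: Dict[int, str] = {}

    def assign(self, account_id: int, cluster: str) -> None:
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home cluster")
        self._home[account_id] = cluster

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def accounts_in(self, cluster: str) -> Set[int]:
        return {a for a, c in self._home.items() if c == cluster}

    def migrate(self, account_id: int, new_cluster: str) -> None:
        # Only the pointer moves in this sketch; copying data is not modeled.
        if account_id not in self._home:
            raise KeyError(account_id)
        self._home[account_id] = new_cluster


directory = ClusterDirectory()
for account_id in (101, 102):
    directory.assign(account_id, "cluster-1")
directory.assign(201, "cluster-2")
directory.migrate(102, "cluster-2")  # e.g. rebalance load off cluster-1
```

Because requests are routed by looking up the account's home, a cluster can be scaled, upgraded, or fail without touching accounts that live elsewhere.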

Page 18: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: the same grid, now with instance-facing services (routing, polling, monitor) alongside the dash, events, tags, audit, global, and S3 stores]

Page 19: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: the same grid, with the instance-facing services (routing, polling, monitor) shown twice]

And partition our cloud objects based on the cloud the instances of an account run on:

Islands

Page 20: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: Cloud 1, Cloud 2, … Cloud N, each with its own island (Island 1, Island 2, … Island N) running routing, polling, and monitor services co-located with that cloud's resources]

Polling Clouds: MySQL

• Master-Slave replication

• Can port to NoSQL easily

• Mostly a resource cache

• But cloud partitionable

Monitoring: Custom

• Replicated files

• Backup to S3

• Archive to S3

Routing: Cassandra

• Simpler Key-Value access

• Very high availability

• Great scalability

• Great replica control

• Plus cross DC replication*

Page 21: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

[Diagram: clusters (Cluster 1 … Cluster N, each with dash, S3, events, tags, audit) and islands (Island 1 … Island N, each with routing, polling, monitor), spread across different geographies and different clouds]

What if the cloud where the cluster is deployed fails?

Page 22: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

Sister Clusters

[Diagram: the same clusters and islands, with clusters paired as sister clusters, each holding a full replica of the other]

Features:

• Each master has an extra remote slave

• Each cluster in a pair is a DC replica of the other’s local ring

At Disaster Recovery time:

• Apps are told to start serving an extra shard

• No need to provision more infrastructure to recover (try to avoid it, since everybody is in the same boat)

• New resources can be allocated over time to help offload existing ones
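
The disaster-recovery step "apps are told to start serving an extra shard" can be sketched as a lookup over cluster pairs. This is an assumption-laden illustration: the `SISTERS` table, the cluster names, and `shards_served` are hypothetical, and promoting the remote slave to master is not modeled.

```python
from typing import Dict, List, Set

# Hypothetical pairing: each cluster already holds a replica of its sister.
SISTERS: Dict[str, str] = {
    "cluster-1": "cluster-2",
    "cluster-2": "cluster-1",
}


def shards_served(failed: Set[str]) -> Dict[str, List[str]]:
    """For each healthy cluster, list the shards its apps should serve."""
    serving: Dict[str, List[str]] = {}
    for cluster, sister in SISTERS.items():
        if cluster in failed:
            continue
        shards = [cluster]
        if sister in failed:
            # No new infrastructure: serve the sister's data from the replica.
            shards.append(sister)
        serving[cluster] = shards
    return serving


normal = shards_served(set())
after_failure = shards_served({"cluster-2"})
```

Recovery is then a configuration change on machines that are already running, which matters when a cloud outage has every other tenant trying to provision capacity at the same time.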

Page 23: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

RightScale Self-Service

Increase innovation

• Reduce development cycles and increase agility

• Eliminate manual work with automation and orchestration

• Drive down spend with built-in cost controls

• Reduce risks with policy-based governance

Page 24: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Self-Service deals with documents (CATs)

• AngularJS application built on top of REST API

• JSON compatibility

• High availability and good scalability with “test and set” building block query

• No built-in join but not needed

• Use case allows for heavy use of denormalization

• praxis-mapper for efficient client side joins

Why MongoDB?


Page 25: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• 3-node MongoDB replica set per shard

• Each replica in its own AZ

• Security groups for access control

• Write concern of 2

• Apps read from master (need consistency)

• BI and internal tools read from slaves

Self-Service HA (today)
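
The write concern and read routing on this slide map onto standard MongoDB connection string options (`replicaSet`, `w`, `readPreference`). A minimal sketch that only builds the two URIs; the host names, the replica set name `rs0`, and the `mongo_uri` helper are hypothetical, and no connection is made.

```python
from typing import Sequence
from urllib.parse import urlencode


def mongo_uri(hosts: Sequence[str], replica_set: str, read_preference: str) -> str:
    """Build a replica-set connection string with a write concern of 2."""
    options = urlencode(
        {
            "replicaSet": replica_set,
            "w": 2,  # a write is acknowledged by two members before returning
            "readPreference": read_preference,
        }
    )
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts, one per availability zone.
HOSTS = [
    "db-a.example.internal:27017",
    "db-b.example.internal:27017",
    "db-c.example.internal:27017",
]

# Applications need consistent reads, so they read from the primary.
app_uri = mongo_uri(HOSTS, "rs0", "primary")
# BI and internal tools tolerate replication lag and read from secondaries.
bi_uri = mongo_uri(HOSTS, "rs0", "secondary")
```

With three members and a write concern of 2, an acknowledged write already exists in at least two availability zones.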


Page 26: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Hidden replica in a different region (the application does not send requests to hidden replicas)

• Deployments in VPC

• VPN between regions

Self-Service DR (EOY)


Page 27: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

RightScale Cloud Analytics

Optimize cloud spend

• Visualize all your cloud costs

• Forecast, budget, and optimize cloud costs

• Optimize your spend and reduce waste

• Implement chargeback and showback with automated reports

Page 28: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

Cloud Analytics and Redshift

[Diagram: Cloud Analytics data flow]

• Data fetching jobs pull from multiple data sources and produce CSV files on S3

• Data load jobs write to all clusters (Redshift cluster 1, Redshift cluster 2, … Redshift cluster N)

• Servers that read and process data randomly pick one cluster and read from it

Page 29: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Each Redshift cluster is deployed in one availability zone. What if that AZ has issues, or if the cluster goes offline?

• Our architecture makes it easy to have replicas, as there is a single “data stream” of changes, which can be written to all clusters

• Sacrificed consistency across clusters for increased availability and scalability

• If one AZ has issues:

• Writes to clusters get delayed until the AZ is online or we take the affected cluster offline

• Reads from clusters continue to work as servers can connect to another cluster

• We run a “create replica” rake task that stops all the writes, takes a snapshot from a working cluster, and creates a new cluster in a different AZ

Redshift HA
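
The behaviour described above (one stream of changes written to every cluster, reads served by any one cluster, writes delayed while a cluster's AZ is unavailable) can be modeled in a few lines. This is a toy in-memory sketch, not the actual load jobs: `ReplicatedClusters`, its methods, and the cluster names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List


class ReplicatedClusters:
    """Toy model of N independent clusters fed by a single change stream."""

    def __init__(self, names: List[str]) -> None:
        self.data: Dict[str, List[str]] = {n: [] for n in names}
        self.reachable: Dict[str, bool] = {n: True for n in names}
        self.backlog: Deque[str] = deque()

    def write(self, change: str) -> None:
        self.backlog.append(change)
        self._drain()

    def _drain(self) -> None:
        # Writes are delayed while any registered cluster is unreachable.
        if not all(self.reachable.values()):
            return
        while self.backlog:
            change = self.backlog.popleft()
            for applied in self.data.values():
                applied.append(change)

    def mark(self, name: str, reachable: bool) -> None:
        self.reachable[name] = reachable
        self._drain()

    def take_offline(self, name: str) -> None:
        # Drop the affected cluster so writes to the others can resume.
        del self.data[name]
        del self.reachable[name]
        self._drain()

    def pick_reader(self) -> str:
        candidates = [n for n, up in self.reachable.items() if up]
        return random.choice(candidates)


clusters = ReplicatedClusters(["az-a", "az-b"])
clusters.write("load-1")
clusters.mark("az-b", False)            # az-b's availability zone has issues
clusters.write("load-2")
delayed = list(clusters.backlog)        # load-2 waits; nothing is lost
reader = clusters.pick_reader()         # reads continue on the healthy cluster
clusters.take_offline("az-b")           # operator decision; writes resume
```

A replacement cluster would then be created from a snapshot of a working cluster, as the last bullet describes; that step is outside this sketch.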


Page 30: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

• Redshift supports a “copy snapshot to different region” functionality

• A new cluster can be created from a snapshot

• Cluster configs are not stored in the snapshot and need to be configured

• EC2 instances connect to Redshift using security groups, but the instances and the cluster must be in the same region for the security groups to work

• We use Cloud Management’s monitoring system to monitor health and other metrics of clusters, and alert on them

Redshift DR


Page 31: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

“Shown how RightScale uses several database technologies”

• For well-known relational data: MySQL (with high replication)

• For archiving and blob storage we use S3

• For very High-Availability and geo-replication we use Cassandra

• For TTL support and fast writes we also use Cassandra

• For JSON documents we use Mongo (with sharding and replica-sets)

• For large data analytics we use AWS Redshift

Conclusions


Page 33: [RightScale Webinar] Architecting Databases in the cloud:  How RightScale Does It

THANK YOU.
