
Managing Data in the Cloud

CS271 2

Scaling in the Cloud

[Diagram: multiple client sites send requests through a load balancer (proxy) to a tier of app servers, all backed by a MySQL master DB that replicates to a MySQL slave DB.]

The database becomes the scalability bottleneck: the app-server tier scales out, but the database tier cannot leverage the cloud's elasticity.

CS271 3

Scaling in the Cloud

[Diagram repeated from the previous slide: client sites, load balancer (proxy), app servers, and MySQL master/slave replication.]

CS271 4

Scaling in the Cloud

[Diagram: the MySQL tier is replaced by key value stores; client sites go through the load balancer (proxy) to Apache + app servers, which read and write the key value stores.]

CS271 5

CAP Theorem (Eric Brewer)

• “Towards Robust Distributed Systems” PODC 2000.

• “CAP Twelve Years Later: How the ‘Rules’ Have Changed,” IEEE Computer 2012

CS271 6

Key Value Stores

• Key-value data model (see the sketch below)
  – Key is the unique identifier
  – Key is the granularity for consistent access
  – Value can be structured or unstructured
• Gained widespread popularity
  – In house: Bigtable (Google), PNUTS (Yahoo!), Dynamo (Amazon)
  – Open source: HBase, Hypertable, Cassandra, Voldemort
• Popular choice for the modern breed of web applications
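A minimal sketch of this data model (illustrative only; the class and method names are hypothetical, not from any particular system). The key is the unit of lookup and of consistent access; the value is opaque bytes that may or may not carry internal structure.

```python
from typing import Optional

# Minimal key-value store sketch: the key is the unit of lookup and of
# consistent access; the value is an opaque byte string (structured or not).
class KeyValueStore:
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}   # key -> value

    def put(self, key: str, value: bytes) -> None:
        # A single-key write is the only unit of consistent (atomic) access.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')   # value may be structured (JSON) ...
store.put("img:7", b"\x89PNG...")          # ... or unstructured raw bytes
print(store.get("user:42"))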

CS271 7

Big Table (Google)

• Data model: a sparse, persistent, multi-dimensional sorted map.
• Data is partitioned across multiple servers.
• The map is indexed by a row key, column key, and a timestamp.
• The value is an uninterpreted array of bytes (see the sketch below):
  – (row: byte[ ], column: byte[ ], time: int64) → byte[ ]
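The map above can be pictured as an ordinary sorted map keyed by (row, column, timestamp). A rough sketch, assuming non-negative timestamps and hypothetical names (this is not Bigtable's actual API or on-disk layout):

```python
import bisect

# Bigtable-style map: (row, column, timestamp) -> uninterpreted bytes,
# kept sorted by key so a whole row can be scanned in order.
class SparseSortedMap:
    def __init__(self):
        self._keys = []    # sorted list of (row, column, timestamp) tuples
        self._vals = {}    # (row, column, timestamp) -> bytes

    def put(self, row: bytes, col: bytes, ts: int, value: bytes) -> None:
        key = (row, col, ts)
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, row: bytes, col: bytes, ts: int) -> bytes:
        return self._vals[(row, col, ts)]

    def scan_row(self, row: bytes):
        # Yield all (column, timestamp, value) cells of one row, in sorted order.
        i = bisect.bisect_left(self._keys, (row, b"", 0))
        while i < len(self._keys) and self._keys[i][0] == row:
            r, c, t = self._keys[i]
            yield c, t, self._vals[(r, c, t)]
            i += 1

table = SparseSortedMap()
table.put(b"com.cnn.www", b"contents:", 3, b"<html>...")
print(list(table.scan_row(b"com.cnn.www")))
```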

CS271 8

Architecture Overview

• Shared-nothing architecture consisting of thousands of nodes (commodity PCs).

[Diagram: Google's Bigtable data model, layered on top of the Google File System.]

CS271 9

Atomicity Guarantees in Big Table

• Every read or write of data under a single row is atomic.
• Objective: make read operations single-sited!

CS271 10

Big Table's Building Blocks

• Google File System (GFS)
  – Highly available distributed file system that stores log and data files
• Chubby
  – Highly available, persistent, distributed lock manager
• Tablet servers
  – Handle reads and writes to their tablets and split tablets
  – Each tablet is typically 100-200 MB in size
• Master server
  – Assigns tablets to tablet servers
  – Detects the addition and deletion of tablet servers
  – Balances tablet-server load

CS271 11

Overview of Bigtable Architecture

[Diagram: the master, together with Chubby, handles control operations and lease management; each tablet server hosts a set of tablets (T1, T2, …, Tn) and contains master/Chubby proxies, a log manager, and a cache manager; all tablet data is stored in the Google File System.]

CS271 12

GFS Architectural Design

• A GFS cluster
  – A single master
  – Multiple chunkservers per master
    • Accessed by multiple clients
  – Running on commodity Linux machines
• A file
  – Represented as fixed-size chunks (see the sketch below)
    • Labeled with 64-bit unique global IDs
    • Stored at chunkservers
    • 3-way replication across chunkservers
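To make the numbers concrete, here is a toy sketch of how a file might be split into fixed-size 64 MB chunks with 64-bit IDs and placed on three chunkservers. The names and the random placement policy are assumptions for illustration, not GFS's actual code.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024      # 64 MB fixed-size chunks
REPLICATION = 3                    # 3-way replication across chunkservers

chunkservers = [f"chunkserver-{i}" for i in range(8)]
next_chunk_id = 0                  # stand-in for a 64-bit globally unique ID

def split_into_chunks(file_size: int):
    """Return (chunk_id, replica_locations) for every 64 MB chunk of a file."""
    global next_chunk_id
    chunks = []
    for _ in range((file_size + CHUNK_SIZE - 1) // CHUNK_SIZE):
        chunk_id = next_chunk_id
        next_chunk_id += 1
        # Toy placement: pick 3 distinct chunkservers at random.
        replicas = random.sample(chunkservers, REPLICATION)
        chunks.append((chunk_id, replicas))
    return chunks

# A 200 MB file becomes 4 chunks (the last one only partially filled).
for chunk_id, replicas in split_into_chunks(200 * 1024 * 1024):
    print(chunk_id, replicas)
```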

CS271 13

Architectural Design

[Diagram: an application uses the GFS client library; the client asks the GFS master "chunk location?" and then requests "chunk data?" directly from the GFS chunkservers, each of which stores chunks in its local Linux file system.]

CS271 14

Single-Master Design

• Simple
• The master answers only chunk-location requests
• A client typically asks for multiple chunk locations in a single request
• The master also predictively provides the chunk locations immediately following those requested (see the sketch below)
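A hedged sketch of this interaction: the client batches several chunk indexes into one request, and the master replies with those plus the next few, so sequential reads need no extra round trips. `ToyMaster` and `PREFETCH` are hypothetical names chosen for illustration.

```python
# Toy master that answers chunk-location requests and prefetches ahead.
PREFETCH = 2   # assumption: how many extra chunk locations to piggyback

class ToyMaster:
    def __init__(self, file_to_chunks):
        # file name -> list of (chunk_id, [replica locations])
        self.file_to_chunks = file_to_chunks

    def lookup(self, filename, chunk_indexes):
        chunks = self.file_to_chunks[filename]
        wanted = set(chunk_indexes)
        # Also return locations immediately following those requested.
        wanted.update(range(max(chunk_indexes) + 1,
                            min(max(chunk_indexes) + 1 + PREFETCH, len(chunks))))
        return {i: chunks[i] for i in sorted(wanted) if i < len(chunks)}

master = ToyMaster({"/logs/web.0": [(c, [f"cs-{c % 3}"]) for c in range(10)]})
# One request covers chunks 0-2; the reply also includes chunks 3-4.
print(master.lookup("/logs/web.0", [0, 1, 2]))
```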

CS271 15

Metadata

• The master stores three major types
  – File and chunk namespaces, persistent in the operation log
  – File-to-chunk mappings, persistent in the operation log
  – Locations of a chunk's replicas, not persistent
• All kept in memory: fast!
  – Quick global scans
    • For garbage collection and reorganization
  – Only 64 bytes of metadata per 64 MB of data (see the estimate below)
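A quick back-of-envelope check of why keeping all metadata in memory is feasible at roughly 64 bytes per 64 MB chunk. The 1 PB cluster size is an arbitrary example, not a figure from the GFS paper.

```python
# Rough metadata-memory estimate: ~64 bytes of master metadata per 64 MB chunk.
BYTES_PER_CHUNK_METADATA = 64
CHUNK_SIZE = 64 * 1024**2            # 64 MB

data_stored = 1 * 1024**5            # assume 1 PB of file data in the cluster
num_chunks = data_stored // CHUNK_SIZE
metadata_bytes = num_chunks * BYTES_PER_CHUNK_METADATA

print(f"{num_chunks:,} chunks -> {metadata_bytes / 1024**3:.1f} GB of metadata")
# ~16.8 million chunks -> about 1 GB of metadata, which fits in one master's RAM.
```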

CS271 16

Mutation Operation in GFS

• Mutation: any write or append operation
• The data needs to be written to all replicas
• Guarantee: all replicas apply mutations in the same order, even when multiple clients issue mutations concurrently (see the sketch below)
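A simplified sketch of that ordering guarantee: one replica acting as primary assigns serial numbers to incoming mutations, and every replica applies them in that serial order, so concurrent clients observe a single mutation order. This omits GFS's leases and data-flow pipelining; the class names are illustrative.

```python
# Simplified mutation ordering: the primary serializes concurrent mutations
# and all replicas apply them in the same serial order.
class Replica:
    def __init__(self, name):
        self.name = name
        self.applied = []            # mutations in the order they were applied

    def apply(self, serial, mutation):
        self.applied.append((serial, mutation))

class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, mutation):
        serial = self.next_serial      # pick a single global order ...
        self.next_serial += 1
        for replica in [self] + self.secondaries:
            replica.apply(serial, mutation)   # ... and push it to every replica

secondaries = [Replica("s1"), Replica("s2")]
primary = Primary("p", secondaries)
for m in ["append A", "append B", "append C"]:   # possibly from different clients
    primary.mutate(m)
assert primary.applied == secondaries[0].applied == secondaries[1].applied
```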

CS271 17

GFS Revisited

• “GFS: Evolution on Fast-Forward,” an interview with the GFS designers in CACM, March 2011.
• The single master was critical for early deployment.
• “the choice to establish 64MB …. was much larger than the typical file-system block size, but only because the files generated by Google's crawling and indexing system were unusually large.”
• As the application mix changed over time, …. deal efficiently with large numbers of files requiring far less than 64MB (think in terms of Gmail, for example). The problem was not so much with the number of files itself, but rather with the memory demands all of those files made on the centralized master, thus exposing one of the bottleneck risks inherent in the original GFS design.

CS271 18

GFS Revisited (Cont'd)

• “the initial emphasis in designing GFS was on batch efficiency as opposed to low latency.”
• On the original single-master design: “A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving.”
• Future directions: a distributed master, etc.
• Interesting and entertaining read.

CS271 19

PNUTS Overview

• Data model:
  – Simple relational model (really a key-value store)
  – Single-table scans with predicates
• Fault tolerance:
  – Redundancy at multiple levels: data, metadata, etc.
  – Leverages relaxed consistency for high availability: reads & writes despite failures
• Pub/sub message system:
  – Yahoo! Message Broker for asynchronous updates

CS271 20

Asynchronous Replication

[Diagram: updates propagate asynchronously between PNUTS regions via the message broker.]

CS271 21

Consistency Model

• Hide the complexity of data replication
• Between the two extremes:
  – One-copy serializability, and
  – Eventual consistency
• Key assumption:
  – Applications manipulate one record at a time
• Per-record timeline consistency:
  – All replicas of a record preserve the update order

CS271 22

Implementation

• A read returns a consistent version
• One replica is designated as the master (per record)
• All updates are forwarded to that master
• Master designation is adaptive: the replica receiving most of the writes becomes the master (see the sketch below)
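A sketch of per-record timeline consistency as described above: each record has its own master replica that assigns increasing version numbers, and other replicas apply updates strictly in version order. This is illustrative pseudocode, not PNUTS's implementation; in the real system propagation happens asynchronously via the message broker.

```python
# Per-record timeline consistency: one master replica per record orders updates.
class RecordReplica:
    def __init__(self):
        self.versions = {}           # key -> (version, value), latest applied

    def apply(self, key, version, value):
        current = self.versions.get(key, (0, None))[0]
        if version == current + 1:   # apply updates strictly in version order
            self.versions[key] = (version, value)

class RecordMaster(RecordReplica):
    def __init__(self, other_replicas):
        super().__init__()
        self.others = other_replicas

    def write(self, key, value):
        # All writes to a record are forwarded here; the master picks the
        # next version number and (asynchronously, in reality) propagates it.
        version = self.versions.get(key, (0, None))[0] + 1
        self.apply(key, version, value)
        for r in self.others:
            r.apply(key, version, value)
        return version

replica = RecordReplica()
master = RecordMaster([replica])
master.write("Brian", "v1-data")
master.write("Brian", "v2-data")
print(replica.versions["Brian"])     # (2, 'v2-data') -- same order everywhere
```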

CS271 23

Consistency model

• Goal: make it easier for applications to reason about updates and cope with asynchrony
• What happens to a record with primary key "Brian"?

[Timeline: the record is inserted and then goes through a sequence of updates and a delete, producing versions v.1 through v.8 of generation 1.]

CS271 24

Consistency model

[Timeline: versions v.1 through v.8 of generation 1. A plain read may return a stale version rather than the current version v.8.]

CS271 25

Consistency model

[Same timeline: a "read up-to-date" call always returns the current version v.8.]

CS271 26

Consistency model

[Same timeline: a "read ≥ v.6" call may return any version at least as new as v.6, i.e., v.6, v.7, or the current v.8.]

CS271 27

Consistency model

[Same timeline: a write produces a new current version after v.8.]

CS271 28

Consistency model

[Same timeline: a test-and-set "write if = v.7" fails with an error because the current version is v.8. These API variants are summarized in the sketch below.]
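The calls illustrated on the preceding timelines can be condensed into a small sketch. The method names and signatures here are hypothetical stand-ins for PNUTS's read-any, read-critical, write, and test-and-set write operations.

```python
# Sketch of the per-record API variants shown on the timeline slides.
class TimelineRecord:
    def __init__(self):
        self.history = []                 # version 1, 2, 3, ... of the record

    def write(self, value):
        self.history.append(value)
        return len(self.history)          # the new (current) version number

    def read_any(self, replica_lag=0):
        # May return a stale version, e.g. from a lagging replica.
        return self.history[len(self.history) - 1 - replica_lag]

    def read_critical(self, required_version):
        # Returns a version at least as new as required_version.
        assert required_version <= len(self.history)
        return self.history[-1]

    def test_and_set_write(self, expected_version, value):
        if expected_version != len(self.history):
            raise ValueError("ERROR: record has moved past the expected version")
        return self.write(value)

rec = TimelineRecord()
for i in range(1, 9):
    rec.write(f"v.{i}")
print(rec.read_any(replica_lag=3))       # may be stale: returns "v.5"
print(rec.read_critical(6))              # up to date enough: returns "v.8"
rec.test_and_set_write(8, "v.9")         # succeeds: current version is 8
# rec.test_and_set_write(7, "x")         # would raise, as on the last slide
```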

CS271 29

PNUTS Architecture

[Diagram of the data-path components: clients, a REST API, routers, the tablet controller, storage units, and the message broker.]

CS271 30

PNUTS Architecture (Regions)

[Diagram: each region (one local, several remote) contains clients, a REST API, routers, a tablet controller, and storage units; regions are connected through the Yahoo! Message Broker (YMB).]

CS271 31

System Architecture: Key Features

• Pub/sub mechanism: Yahoo! Message Broker
• Physical storage: storage units
• Mapping of records: tablet controller
• Record location: routers

CS271 32

Highlights of the PNUTS Approach

• Shared-nothing architecture
• Multiple datacenters for geographic distribution
• Timeline consistency and access to stale data
• A publish/subscribe system for reliable, fault-tolerant communication
• Replication with per-record masters

CS271 33

AMAZON’S KEY-VALUE STORE: DYNAMO

Adapted from Amazon’s Dynamo Presentation

CS271 34

Highlights of Dynamo

• High write availability
• Optimistic replication: vector clocks for conflict resolution
• Consistent hashing (as in Chord), but in a controlled environment (see the sketch below)
• Quorums for relaxed consistency
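A minimal consistent-hashing sketch in the spirit of Dynamo's partitioning, simplified for illustration: no virtual nodes, and the hash function and replica count are arbitrary choices rather than Dynamo's actual parameters.

```python
import bisect
import hashlib

# Consistent hashing: nodes and keys hash onto the same ring; a key is stored
# on the first N distinct nodes encountered clockwise from its hash position.
def ring_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def preference_list(self, key: str):
        """First `replicas` distinct nodes clockwise from the key's position."""
        start = bisect.bisect(self.ring, (ring_hash(key), ""))
        nodes = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == self.replicas:
                break
        return nodes

ring = HashRing([f"node-{i}" for i in range(6)])
print(ring.preference_list("cart:alice"))   # the 3 replicas for this key
```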

CS271 35

TOO MANY CHOICES – WHICH SYSTEM SHOULD I USE?

Cooper et al., SOCC 2010

CS271 36

Benchmarking Serving Systems

• A standard benchmarking tool for evaluating key value stores: the Yahoo! Cloud Serving Benchmark (YCSB)

• Evaluate different systems on common workloads

• Focus on performance and scale out

CS271 37

Benchmark Tiers

• Tier 1 – Performance
  – Latency versus throughput as throughput increases (see the sketch below)
• Tier 2 – Scalability
  – Latency as database and system size increase ("scale-out")
  – Latency as we elastically add servers ("elastic speedup")
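A hedged sketch of the Tier-1 experiment: record average latency while the offered throughput is swept upward. This is just the shape of the measurement, not YCSB itself, and `do_operation` is a stand-in for a real client call against the store under test.

```python
import time
import statistics

def do_operation():
    # Stand-in for one read/update against the store under test.
    time.sleep(0.001)

def measure(target_ops_per_sec, duration_sec=5):
    """Issue operations at a target rate and report average latency (ms)."""
    latencies = []
    interval = 1.0 / target_ops_per_sec
    end = time.time() + duration_sec
    while time.time() < end:
        start = time.perf_counter()
        do_operation()
        latencies.append((time.perf_counter() - start) * 1000)
        time.sleep(max(0.0, interval - (time.perf_counter() - start)))
    return statistics.mean(latencies)

# Tier 1: sweep the offered throughput and watch how latency degrades.
for target in (100, 200, 400, 800):
    print(target, "ops/sec ->", round(measure(target), 2), "ms")
```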

Workload A – Update heavy: 50/50 read/update

[Figure: Workload A read latency. Average read latency (ms) versus throughput (ops/sec) for Cassandra, HBase, PNUTS, and MySQL.]

38

Cassandra (based on Dynamo) is optimized for heavy updates; Cassandra uses hash partitioning.

CS271 39

Workload B – Read heavy: 95/5 read/update

[Figure: Workload B read latency. Average read latency (ms) versus throughput (operations/sec) for Cassandra, HBase, PNUTS, and MySQL.]

PNUTS uses MySQL as its storage layer, and MySQL is optimized for read operations.

Workload E – Short scans: scans of 1-100 records of size 1KB

[Figure: Workload E scan latency. Average scan latency (ms) versus throughput (operations/sec) for HBase, PNUTS, and Cassandra.]

CS271 40

HBase uses an append-only log, so it is optimized for scans; the same holds for MySQL and PNUTS. Cassandra uses hash partitioning, so its scan performance is poor.

CS271 41

Summary

• Different databases suitable for different workloads

• Evolving systems – landscape changing dramatically

• Active development community around open source systems

Two approaches to scalability:

• Scale-up
  – Classical enterprise setting (RDBMS)
  – Flexible ACID transactions
  – Transactions in a single node
• Scale-out
  – Cloud friendly (key value stores)
  – Execution at a single server
    • Limited functionality & guarantees
  – No multi-row or multi-step transactions

CS271 42

Key-Value Store Lessons

What are the design principles learned?

CS271

Design Principles [DNIS 2010]

• Separate System and Application State
  – System metadata is critical but small
  – Application data has varying needs
  – Separation allows use of different classes of protocols

44

CS271

Design Principles

• Decouple Ownership from Data Storage
  – Ownership is exclusive read/write access to data
  – Decoupling allows lightweight ownership migration

[Diagram: a classical DBMS stacks the cache manager, transaction manager/recovery, ownership (multi-step transactions or read/write access), and storage in one node; the decoupled design separates ownership from storage.]

45

CS271

Design Principles

• Limit most interactions to a single node
  – Allows horizontal scaling
  – Graceful degradation during failures
  – No distributed synchronization

Thanks: Curino et al., VLDB 2010

46

CS271

Design Principles

• Limited distributed synchronization is practical
  – Maintenance of metadata
  – Provide strong guarantees only for data that needs it

47

SBBD 2012

Fault-tolerance in the Cloud

• Need to tolerate catastrophic failures
  – Geographic replication
• How to support ACID transactions over data replicated at multiple datacenters?
  – One-copy serializability: clients can access data in any datacenter; it appears as a single copy with atomic access

48

SBBD 2012

Megastore: Entity Groups (Google, CIDR 2011)

• Entity groups are sub-databases (see the toy sketch below)
  – Static partitioning
  – Cheap transactions within entity groups (common)
  – Expensive cross-entity-group transactions (rare)

49
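A toy sketch of the cheap-vs-expensive distinction: transactions whose keys all fall in one entity group commit locally, while cross-group transactions must fall back to a heavier protocol. The grouping function and names are illustrative, not Megastore's actual scheme.

```python
# Toy entity-group check: a transaction is "cheap" only if every key it
# touches maps to the same (statically partitioned) entity group.
def entity_group(key: str) -> str:
    # Illustrative grouping: "alice@example.com/inbox/17" -> "alice@example.com"
    return key.split("/")[0]

def commit(keys, apply_fn):
    groups = {entity_group(k) for k in keys}
    if len(groups) == 1:
        apply_fn()          # common case: single-group ACID transaction
        return "committed within entity group " + groups.pop()
    # Rare case: would need two-phase commit or asynchronous messaging.
    return "cross-group transaction: requires 2PC / async messaging"

print(commit(["alice@example.com/inbox/1", "alice@example.com/sent/9"],
             lambda: None))
print(commit(["alice@example.com/inbox/1", "bob@example.com/inbox/2"],
             lambda: None))
```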

SBBD 2012

Megastore Entity Groups

Semantically predefined:
• Email
  – Each email account forms a natural entity group
  – Operations within an account are transactional: a user's sent message is guaranteed to be observed despite fail-over to another replica
• Blogs
  – A user's profile is an entity group
  – Operations such as creating a new blog rely on asynchronous messaging with two-phase commit
• Maps
  – Divide the globe into non-overlapping patches
  – Each patch can be an entity group

50

SBBD 2012 51

Megastore

Slides adapted from authors’ presentation

SBBD 2012 52

Google's Spanner: Database Tech That Can Span the Planet (OSDI 2012)

The Big Picture (OSDI 2012)

[Diagram: tablets, logs, and SSTables are stored on the Colossus File System; 2PC provides atomicity, Paxos provides consistency (replication), and 2PL with wound-wait provides isolation; movedir handles load balancing; TrueTime, powered by GPS and atomic clocks, provides time.]

TrueTime

• TrueTime: an API that provides real time with bounds on error (see the sketch below).
  o Powered by GPS and atomic clocks.
• Enforces external consistency:
  o If the start of T2 occurs after the commit of T1, then the commit timestamp of T2 must be greater than the commit timestamp of T1.
• Concurrency control:
  o Update transactions: 2PL.
  o Read-only transactions: use real time to return a consistent snapshot.
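A sketch of the TrueTime idea: now() returns an interval [earliest, latest] bounding the true time, and a transaction "commit waits" until its chosen timestamp is guaranteed to be in the past, which yields the external-consistency property above. The epsilon value and function names are illustrative assumptions, not Spanner's implementation.

```python
import time

EPSILON = 0.004          # assumed clock uncertainty (seconds), e.g. ~4 ms

def tt_now():
    """TrueTime-style interval [earliest, latest] bounding the true time."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit(transaction_name):
    # Pick the commit timestamp at the upper bound of the current interval...
    _, commit_ts = tt_now()
    # ...then "commit wait": block until the timestamp is certainly in the past.
    while tt_now()[0] < commit_ts:
        time.sleep(0.0005)
    print(f"{transaction_name} committed at {commit_ts:.6f}")
    return commit_ts

# If T2 starts after T1's commit completes, T2's timestamp must be larger.
t1 = commit("T1")
t2 = commit("T2")
assert t2 > t1
```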

CS271 55

Primary References

• Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, Gruber: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006.
• Ghemawat, Gobioff, Leung: The Google File System. SOSP 2003.
• McKusick, Quinlan: GFS: Evolution on Fast-Forward. Communications of the ACM, 2010.
• Cooper, Ramakrishnan, Srivastava, Silberstein, Bohannon, Jacobsen, Puz, Weaver, Yerneni: PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
• DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels: Dynamo: Amazon's Highly Available Key-Value Store. SOSP 2007.
• Cooper, Silberstein, Tam, Ramakrishnan, Sears: Benchmarking Cloud Serving Systems with YCSB. SoCC 2010.
