Topics

ACID vs BASE

Starfish Availability

TACC Model

Transend Measurements

SNS Architecture

Extensible Cluster Based Network Services

Armando Fox, Steven Gribble, Yatin Chawathe, Eric Brewer, Paul Gauthier

University of California, Berkeley

Inktomi Corporation

Presenter: Ashish Gupta

Advanced Operating Systems

Motivation

Proliferation of network-based services

Two critical issues must be addressed by Internet services:

System scalability: incremental and linear scalability

Availability and fault tolerance: 24x7 operation

Clusters of workstations meet these requirements

Commodity PCs as unit of scaling

Good Cost/performance

Incremental Scalability

“Embarrassingly parallel” workloads

Map well onto workstations

Redundancy of clusters

Masks transient failures

Contribution of this work

Isolate common requirements of cluster-based Internet apps into a reusable substrate: the Scalable Network Services (SNS) framework

Goal: complete separation of *ility concerns from application logic

Legacy code encapsulation

Insulate programmers from nasty engineering

Contribution of this work

Architecture for SNS, exploiting the strength of cluster computing

Separation of content of network services from implementation

Encapsulation of low level functions in a lower layer

Example of a new service

A Programming Model to go with the architecture

The SNS architecture

[Figure: SNS architecture. Components joined by the Interconnect: Front Ends, Caches, User Profile Database, Workers, Manager (Load Balancing & Fault Tolerance), Administration Interface. Diagram labels: C$, LB/FT, FE, $, WWWT, WWWA, GUI.]


Workers and Front-ends

All control decisions for satisfying user requests are localized in the front-ends: which servers to invoke, access to the profile database, notifying the end-user, etc.

Workers are simple and stateless

Behaviour of a service is defined entirely at the front-end

Analogy: processes in a Unix pipeline, e.g. ls -l | grep .pl | wc
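The pipeline analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FrontEnd` class, the `Worker` signature, and the two toy workers are hypothetical names invented here, not part of the SNS code. The point it shows is that workers hold no state and no control logic, while the front-end alone decides which workers run and in what order.

```python
from typing import Callable, Dict, List

# Hypothetical worker signature: a stateless function from bytes to bytes.
Worker = Callable[[bytes], bytes]


def strip_blank_lines(data: bytes) -> bytes:
    """Toy worker: drop empty lines from the payload."""
    return b"\n".join(line for line in data.split(b"\n") if line.strip())


def upper_case(data: bytes) -> bytes:
    """Toy worker: upper-case the payload."""
    return data.upper()


class FrontEnd:
    """Holds all control decisions: which workers to invoke and in what order."""

    def __init__(self, workers: Dict[str, Worker]) -> None:
        self.workers = workers

    def handle(self, payload: bytes, pipeline: List[str]) -> bytes:
        # Chain the named workers like the stages of a Unix pipeline.
        for name in pipeline:
            payload = self.workers[name](payload)
        return payload


front_end = FrontEnd({"strip": strip_blank_lines, "upper": upper_case})
result = front_end.handle(b"hello\n\nworld", ["strip", "upper"])
```

Because each worker is a pure function of its input, any instance can serve any request, which is what lets a separate component replicate or restart workers freely.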


Front-ends

User interface to SNS

Queue requests for service

Can maintain state for many simultaneous outstanding requests


User Profile

Allows mass customization of request processing


Workers

Caches, service-specific modules

Multiple instantiation possible

Each worker just performs a specific task; it is not responsible for load balancing or fault tolerance


Administrative Interface

Tracking and visualization of the system’s behaviour

Administrative actions


Manager

Collects load information from the workers

Balances load across workers

Spawns additional workers on increased load or faults



Separating the content from implementation

Layered Software model

Previous Components

SNS: Scalable Network Service support

TACC: Transformation, Aggregation, Caching, Customization

Service: service-specific code

SNS

Provides Scalability

Load Balancing

Fault tolerance

High Availability

The SNS Layer

Scalability

Replicate well-encapsulated components

Prolonged bursts: notion of an overflow pool

Load Balancing

Centralized: simple to implement and predictable

The SNS Layer

Soft state for fault tolerance and availability

Process peers watch each other

Because there is no hard state, “recovery” == “restart”

Load balancing, hot updates, migration are “easy”

Shoot down a worker, and it will recover

Upgrade == install new software, shoot down old

Mostly graceful degradation

“Starfish” Availability: LB Death

FE detects via broken pipe/timeout, restarts LB

New LB announces itself (multicast), is contacted by workers, gradually rebuilds load tables

If partition heals, extra LBs commit suicide

FEs operate using cached LB info during failure
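The recovery sequence above relies on the load table being soft state: nothing needs to be restored after a crash, because workers re-announce themselves. The sketch below illustrates that idea only. The `LoadTable` class, the beacon format, and the 5-second expiry are assumptions made for the example, and the multicast transport is replaced by direct method calls.

```python
import time
from typing import Dict, Optional, Tuple


class LoadTable:
    """Soft-state table of worker loads, built purely from worker beacons."""

    def __init__(self, ttl: float = 5.0) -> None:
        self.ttl = ttl  # assumed expiry in seconds for silent workers
        self.entries: Dict[str, Tuple[int, float]] = {}  # worker -> (queue length, last seen)

    def on_beacon(self, worker: str, queue_length: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.entries[worker] = (queue_length, now)

    def live_workers(self, now: Optional[float] = None) -> Dict[str, int]:
        # Entries that have not been refreshed within ttl are treated as dead.
        now = time.monotonic() if now is None else now
        return {w: q for w, (q, seen) in self.entries.items() if now - seen <= self.ttl}


# Before the failure: the front end keeps a cached copy of the load info.
old_lb = LoadTable()
old_lb.on_beacon("distiller-1", 3, now=0.0)
cached = old_lb.live_workers(now=1.0)

# The LB dies. "Recovery" is just starting a new, empty table.
new_lb = LoadTable()
empty_at_start = new_lb.live_workers(now=2.0)

# Workers contact the new LB and the table is gradually rebuilt.
new_lb.on_beacon("distiller-1", 4, now=2.5)
new_lb.on_beacon("distiller-2", 0, now=2.6)
rebuilt = new_lb.live_workers(now=3.0)
```

While the new table is still empty, a front end can keep routing with `cached`; the information is slightly stale but good enough to keep serving requests.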

Question: How do we build the services in the higher layers?

Transformation

Operation on a single data object that changes its content

Aggregation

Collecting data from several sources and collating it

Caching

Storing/re-computing easier than moving across internet

Can also store post-transformation (or post-aggregation) content

Customization

Per user: for content generation

Per device: data delivery, content “packaging”

The TACC Model: a model for structuring services

Programming model based on composable building blocks

Many existing services fit well within the TACC model
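To make the four TACC building blocks concrete, here is a minimal sketch that composes them in one request path. Everything named here is hypothetical: `fetch` returns canned strings instead of contacting the network, the `profiles` table and the `max_chars` setting stand in for a real user profile database, and truncation stands in for a real transformation.

```python
from typing import Dict, List, Tuple


def fetch(source: str) -> str:
    # Stand-in for a network fetch; the returned strings are made-up sample data.
    return {"a": "alpha result", "b": "beta result"}[source]


def aggregate(sources: List[str]) -> str:
    """Aggregation: collect data from several sources and collate it."""
    return "\n".join(fetch(s) for s in sources)


def transform(text: str, max_chars: int) -> str:
    """Transformation: change the content of one object (here, truncate each line)."""
    return "\n".join(line[:max_chars] for line in text.split("\n"))


# Customization: per-user settings, a stand-in for the user profile database.
profiles: Dict[str, Dict[str, int]] = {
    "phone_user": {"max_chars": 5},
    "desktop_user": {"max_chars": 80},
}

# Caching: stores post-transformation content, keyed by request and settings.
cache: Dict[Tuple[Tuple[str, ...], int], str] = {}


def serve(user: str, sources: List[str]) -> str:
    max_chars = profiles[user]["max_chars"]
    key = (tuple(sources), max_chars)
    if key not in cache:
        cache[key] = transform(aggregate(sources), max_chars)
    return cache[key]


phone_view = serve("phone_user", ["a", "b"])
desktop_view = serve("desktop_user", ["a", "b"])
```

The two users receive different renderings of the same aggregated content, and a repeated request with the same settings is answered from the cache without re-running the workers.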

A Meta-Search Engine In TACC

Uses existing services to create a new service

2.5 hours to write using the TACC framework

[Figure: Metasearch Web UI, Internet]

An Example Service

TRANSEND

Datatype-specific distillation

Lossy compression that preserves semantic content

Tailor content for each client

Reduce end-to-end latency when link is slow

Meaningful presentation for range of clients

[Figure: distillation examples. Sample document text shown in the figure: “1.2 The Remote Queue Model. We introduce Remote Queues (RQ), ….” Size-reduction labels: 65x and 6.8x.]

TranSend SNS Components

Workers = distillers here

Simple restart mechanism for fault tolerance

Each distiller took 5-6 hrs to write

SNS fault tolerance removes worries about occasional bugs/crashes

Measurements

Request Generation:

High performance HTTP request playback engine

Burstiness Handled by the overflow pool

Load Balancing

Metric: Queue Length at distillers

Load reaches threshold:

Manager spawns a new distiller
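The load-balancing policy above (queue length as the metric, spawn on threshold, overflow pool for bursts) can be sketched as follows. This is a simplified illustration, not the TranSend manager: the `Manager` class, the node names, the threshold value, and the rule "spawn when even the least-loaded distiller is at or above the threshold" are all assumptions chosen for the example.

```python
from typing import Dict, List


class Manager:
    """Tracks distiller queue lengths and spawns a new distiller under load."""

    def __init__(self, spawn_threshold: int, dedicated: List[str], overflow: List[str]) -> None:
        self.spawn_threshold = spawn_threshold   # assumed policy parameter
        self.dedicated = list(dedicated)          # idle dedicated nodes
        self.overflow = list(overflow)            # idle overflow-pool nodes
        self.queue_lengths: Dict[str, int] = {}   # running distiller -> queue length

    def spawn(self) -> str:
        # Prefer dedicated nodes; recruit from the overflow pool only when none are left.
        pool = self.dedicated or self.overflow
        if not pool:
            raise RuntimeError("no idle nodes available")
        node = pool.pop(0)
        self.queue_lengths[node] = 0
        return node

    def report(self, node: str, queue_length: int) -> None:
        self.queue_lengths[node] = queue_length

    def pick(self) -> str:
        """Return the distiller that should receive the next request."""
        if not self.queue_lengths:
            return self.spawn()
        node = min(self.queue_lengths, key=lambda n: self.queue_lengths[n])
        has_idle = bool(self.dedicated or self.overflow)
        if self.queue_lengths[node] >= self.spawn_threshold and has_idle:
            return self.spawn()
        return node


manager = Manager(spawn_threshold=10, dedicated=["n1", "n2"], overflow=["o1"])
first = manager.pick()      # nothing is running yet, so a distiller is spawned
manager.report(first, 12)
second = manager.pick()     # the only distiller is over threshold, spawn another
manager.report(second, 15)
third = manager.pick()      # dedicated nodes are exhausted, recruit an overflow node
```

Once every node is in use, `pick` falls back to the least-loaded running distiller instead of failing.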

Scalability

Strategy:

Begin with minimal instance

Increase offered load until saturation

Add more resources to eliminate saturation

Observations:

Nearly perfect linear growth

1 Distiller ~ 23 requests/sec

Front end ~ 70 requests/sec

Ultimate bottleneck:

Shared components of the system (Manager and the SAN)

SAN could be a bottleneck for communication-intensive workloads (example: 10 Mb/s Ethernet)

Topic for future research
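As a worked example of what the per-component rates imply, the arithmetic below estimates how many distillers and front ends a given offered load needs. It assumes the near-linear scaling reported above continues to hold and ignores the shared bottlenecks (manager and SAN); the `plan` function is a hypothetical helper, not something from the measured system.

```python
import math
from typing import Dict

DISTILLER_RATE = 23.0   # requests/sec per distiller, from the observations above
FRONT_END_RATE = 70.0   # requests/sec per front end, from the observations above


def plan(offered_load: float) -> Dict[str, int]:
    """Estimate component counts for a load in requests/sec, assuming linear scaling."""
    return {
        "distillers": math.ceil(offered_load / DISTILLER_RATE),
        "front_ends": math.ceil(offered_load / FRONT_END_RATE),
    }


plan_70 = plan(70)     # one saturated front end
plan_160 = plan(160)
```

Three distillers deliver about 69 requests/sec, just under what one front end can accept, so a fourth is needed to fully feed a saturated front end under these numbers.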

Conclusion

A layered architecture for cluster-based scalable network services

Authors shielded from software complexity of automatic scaling, high availability, and failure management

New services as composition of stateless workers

A useful paradigm for deploying new Internet services

ACID vs BASE semantics

An approximate answer delivered quickly is more useful than the exact answer delivered slowly

ACID | BASE

Strong consistency (data precise or NOT OK) | Weak consistency (stale data OK)

Availability??? (not always) | Availability first

Focus on “commit” | Best effort

Guarantees accurate answers | Approximate answers OK

Difficult evolution | Easy evolution

Conservative (pessimistic) | Aggressive (optimistic)

 | Simpler, faster (?)

ACID vs BASE semantics

Search Engine as a database

1 big table

Unknown but large growth

Must be truly highly available

An approximate answer delivered quickly is more useful than the exact answer slowly

A DBMS would be too slow

Choose availability over consistency

Graceful degradation: OK to temporarily lose small random subsets of data due to faults

Consistency

Atomicity

Isolation

Durability

Replace with

Availability

Graceful degradation

Performance

BASE

Basically Available

Soft-State

Eventual Consistency

Database research is about ACID

Why BASE ?

Idea: focus on looser semantics rather than ACID semantics

ACID => data unavailable rather than available but inconsistent

BASE => data available, but could be stale, inconsistent or approximate

Real systems use BOTH semantics

Claim: BASE can lead to simpler systems and better performance

Performance: caching and avoidance of communication and some locks (e.g. ACID requires strict locking and communication with replicas for every write and any reads without locks)

Simpler: soft state leads to easy recovery and interchangeable components

BASE fits clusters well due to partial failure
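The "available but possibly stale" behaviour can be illustrated with a small read path that prefers the primary data source and falls back to a cached copy when the primary is unreachable. This is a sketch under assumptions: `BaseStore`, the `(value, is_fresh)` return shape, and the use of `ConnectionError` to signal a partial failure are choices made for the example.

```python
from typing import Callable, Dict, Tuple


class BaseStore:
    """Availability-first reads: use the primary if possible, else a stale cached copy."""

    def __init__(self, primary: Callable[[str], str]) -> None:
        self.primary = primary
        self.cache: Dict[str, str] = {}  # soft state: safe to lose, rebuilt on later reads

    def read(self, key: str) -> Tuple[str, bool]:
        """Return (value, is_fresh)."""
        try:
            value = self.primary(key)
        except ConnectionError:
            if key in self.cache:
                return self.cache[key], False  # stale, but the service stays available
            raise
        self.cache[key] = value
        return value, True


# Simulated primary whose reachability can be toggled (hypothetical test fixture).
backend = {"q": "v1"}
state = {"up": True}


def primary(key: str) -> str:
    if not state["up"]:
        raise ConnectionError("primary unreachable")
    return backend[key]


store = BaseStore(primary)
fresh = store.read("q")

backend["q"] = "v2"     # the data changes...
state["up"] = False     # ...and then the primary fails
stale = store.read("q")
```

An ACID-style design would refuse the second read; here the caller gets the older value together with a flag saying it may be out of date.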

More BASE…

Reduces complexity of service implementation, trading consistency for simplicity

Fault tolerance

Availability

Opportunities for better performance optimizations in the SNS framework

ACID: durable and consistent state across partial failures

This is relaxed in the BASE model

Example of HotBot

Thank You

Backup Slides

Question 1: Why are cluster-based network services well suited to Internet services?

Answer: The workload is highly parallel (many independent simultaneous users).

The grain size typically corresponds to at most a few CPU seconds on a commodity PC.

Question 2: Why does the cluster-based network service use BASE semantics?

Answer: BASE semantics allow us to handle partial failure in clusters with less complexity and cost.

Question 3: When the overflow machines are being recruited unusually often, what should be done?

Answer: It is time to add new machines.

Question 4: Does a front-end crash lose any information? If so, what kind of information is lost?

Answer: User requests will be lost; the user needs to handle the timeout and resend the request.

Clustering and Internet Workloads

Internet vs. “traditional” workloads

e.g. database workloads (TPC benchmarks)

e.g. traditional scientific codes (matrix multiply, simulated annealing and related simulations, etc.)

Some characteristic differences

Read mostly

Quality of service (best-effort vs. guarantees)

Task granularity

“Embarrassingly parallel”…why?

HTTP is stateless with short-lived requests

Web’s architecture has already forced app designers to work around this! (not obvious in 1990)

Meeting the Cluster Challenges

Software & programming models

Partial failure and application semantics

System administration

Two case studies to contrast programming models

GLUnix goal: support “all” traditional Unix apps, providing a single system image

SNS/TACC goal: simple programming model for Internet services (caching, transformation, etc.), with good robustness and easy administration
