cs519 - cloud types for eventual consistency

CLOUD TYPES FOR EVENTUAL CONSISTENCY

Class: CS 519 – Software Evolution for Mobility

Presenter: Sergii Shmarkatiuk

Date: 10/09/2013


MOTIVATION

Today there are more mobile applications available for download and use than desktop applications.

MOTIVATION

And the number of mobile applications is not going down

MOTIVATION

It goes up!


MOTIVATION

Mobile applications need to communicate with online services and with each other


MOTIVATION: WHY DO MOBILE APPS COMMUNICATE?

Personal Publishing: Blog, Facebook Wall, Website

Games: Real-time, Turn-based

Data Collection: Surveys, High Scores

Collaboration: Shared Lists, Shared Calendar, Shared Spreadsheet

Sync and Backup: Music, Video, SkyDrive

Transactions: Store, Auction, Matchmaking

Remote Control: Home Control, Robotics, Media Player

MOTIVATION

An increased need for communication means more data

MOTIVATION

One of the most effective ways of storing data for mobile devices is the cloud

MOTIVATION

Although the cloud is an effective way to store data, manipulating cloud data poses some challenges for developers

MOTIVATION

Developers are usually responsible for managing several aspects of cloud data manipulation

Cloud data: storage, synchronization, caching, manipulation, conflict resolution

MOTIVATION

Cloud data manipulation should be simplified for more convenient use


PROPOSED SOLUTION

Implementation of eventually consistent storage at the programming language level using cloud data types.


WHAT IS EVENTUAL CONSISTENCY?

A weak consistency model for storing shared data in a distributed system that allows clients to perform updates against any replica at any time.

One of the approaches to the CAP (consistency, availability, partition tolerance) problem.

Guarantees that all updates are eventually delivered to all replicas, and that they are applied in a consistent order.

A transactional model that provides the basic requirements of atomicity and isolation, without the possibility of serializing transactions.
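
The delivery and ordering guarantee can be illustrated with a minimal sketch. It assumes every update carries a sequence number that fixes the agreed order; the `Replica` class and the `(seq, key, value)` tuple layout are hypothetical names chosen for illustration, not something defined in the paper.

```python
class Replica:
    """One copy of the shared data; accepts updates in any arrival order."""

    def __init__(self):
        self.log = set()  # updates seen so far, as (seq, key, value) tuples

    def deliver(self, update):
        self.log.add(update)  # re-delivering the same update is harmless

    def state(self):
        # Replay every known update in one agreed order (by seq),
        # no matter in which order the updates arrived.
        result = {}
        for _seq, key, value in sorted(self.log):
            result[key] = value
        return result


updates = [(1, "item", "milk"), (2, "item", "bread"), (3, "qty", "2")]

a, b = Replica(), Replica()
a.deliver(updates[0])              # a has seen only the first update so far
for u in reversed(updates):        # b receives everything, in reverse order
    b.deliver(u)
diverged = a.state() != b.state()  # replicas may disagree for a while

for u in updates[1:]:              # the remaining updates reach a later
    a.deliver(u)
converged = a.state() == b.state()
print(diverged, converged, a.state())
```

The replicas disagree while delivery is incomplete and agree once both have received every update, which is the sense of "eventual" used here.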


TRANSACTIONS: PICK ANY TWO

Atomicity

Isolation

Serializability


TRANSACTIONS: CHOICE OF EVENTUAL CONSISTENCY MODEL

Atomicity

Isolation

Serializability (given up)

Revisions (used instead)

REVISION DIAGRAMS

BURCKHARDT’2011: SEMANTICS OF CONCURRENT REVISIONS

Data propagate only along edges

DISTRIBUTED SYSTEMS

[Figure: two architectures side by side. Basic (peer-to-peer): three nodes, each with its own storage and compute. Using Cloud Infrastructure: layers of clients, compute, and storage.]

HOW TO PROGRAM USING CLOUD CONCEPTS?

[Figure: the cloud architecture from the previous slide, annotated by layer]

Client: not physically secure; unreliable; cannot detect failures; potentially many

Cloud Compute: physically secure, not so many; not reliable (no persistent state); can detect failures somewhat; relatively expensive

Cloud Storage: secure; reliable; can be very cheap

MOTIVATING EXAMPLE: GROCERY LIST

milk, bread, eggs

cilantro, sardines, guava

DATA MANAGEMENT MODEL

[Figure: device 1 and device 2, each connected to the cloud]

• Client code:
  - declare data types
  - read/update data
  - yield (= polite sync)
  - flush (= forced sync)
• Under the hood: revision diagram rules
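
The client-side view can be sketched as follows. This is a single-process simulation under stated assumptions: `Cloud`, `Device`, the `online` flag and the method names `yield_` and `flush` are hypothetical stand-ins, and the only shared value is one integer counter whose local changes are kept as a pending delta.

```python
class Cloud:
    """Stands in for the cloud: holds the main copy of one counter."""

    def __init__(self):
        self.value = 0


class Device:
    def __init__(self, cloud):
        self.cloud = cloud
        self.base = cloud.value  # last value received from the cloud
        self.pending = 0         # local adds not yet sent
        self.online = True       # assumption: connectivity is a simple flag

    def get(self):
        return self.base + self.pending

    def add(self, n):
        self.pending += n

    def _sync(self):
        # Send local changes, then adopt the cloud's current value.
        self.cloud.value += self.pending
        self.pending = 0
        self.base = self.cloud.value

    def yield_(self):
        """Polite sync: exchange updates if possible, never fail."""
        if self.online:
            self._sync()

    def flush(self):
        """Forced sync: must reach the cloud, otherwise time out."""
        if not self.online:
            raise TimeoutError("no connection to the cloud")
        self._sync()


cloud = Cloud()
d1, d2 = Device(cloud), Device(cloud)
d1.add(2)
d2.add(3)
isolated = (d1.get(), d2.get())             # each device sees only its own update

d1.yield_()
d2.yield_()
d1.yield_()                                 # d1 picks up d2's update
synced = (d1.get(), d2.get(), cloud.value)

d2.online = False
d2.add(1)
d2.yield_()                                 # offline: nothing is exchanged
offline = (d2.get(), cloud.value)

d2.online = True
d2.flush()
after_flush = (d2.get(), cloud.value)
print(isolated, synced, offline, after_flush)
```

Between two yields a device reads only its own changes, so local code keeps working while offline and the updates travel at the next successful sync.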


IMPLICIT TRANSACTIONS

• At yield: the runtime has permission to send or receive updates. Call this frequently, e.g. automatically “on idle”.
• In between yields: the runtime is not allowed to send or receive updates.
• Implies: all client code executes in an (eventually consistent) transaction.

[Figure: timelines for device 1, device 2 and the cloud, with yield points marking the boundaries of the implicit transactions]

STRONG CONSISTENCY ON-DEMAND

The flush primitive blocks until the local state has reached the main revision and the result has come back to the device.

This is sufficient to implement strong consistency.

flush blocks, and times out if the server connection is not available.

[Figure: device timeline showing flush (blocks), then (continue)]
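
One way flush can give a strongly consistent answer is a seat reservation, sketched below as a single-process simulation. Everything here is an assumption made for illustration: the `Cloud` and `Device` classes, the `reachable` flag, and in particular the conditional write `set_if_empty`, which the cloud applies only when the seat is still free.

```python
class Cloud:
    def __init__(self):
        self.seats = {}        # main copy: seat -> owner
        self.reachable = True  # assumption: connectivity is a simple flag


class Device:
    def __init__(self, cloud):
        self.cloud = cloud
        self.local = {}        # local replica of the seat map
        self.pending = []      # conditional writes not yet sent

    def get(self, seat):
        return self.local.get(seat)

    def set_if_empty(self, seat, owner):
        # Takes effect locally at once; the cloud decides later.
        if seat not in self.local:
            self.local[seat] = owner
            self.pending.append((seat, owner))

    def flush(self):
        if not self.cloud.reachable:
            raise TimeoutError("flush timed out: cloud not reachable")
        for seat, owner in self.pending:
            self.cloud.seats.setdefault(seat, owner)  # first writer keeps it
        self.pending.clear()
        self.local = dict(self.cloud.seats)           # adopt the main state


cloud = Cloud()
alice, bob = Device(cloud), Device(cloud)

alice.set_if_empty("12A", "alice")
bob.set_if_empty("12A", "bob")
optimistic = (alice.get("12A"), bob.get("12A"))  # both believe they won

alice.flush()
bob.flush()
confirmed = (alice.get("12A"), bob.get("12A"))   # the cloud's decision

cloud.reachable = False
timed_out = False
try:
    bob.flush()
except TimeoutError:
    timed_out = True
print(optimistic, confirmed, timed_out)
```

Only the value read after flush is treated as final; the value read before it is a local guess that the merge may overrule.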


YIELD/FLUSH EXAMPLE

yield (=polite sync)

flush (=forced sync)

FORK-JOIN AUTOMATON (FJA)

Data set is copied on forking

Data is manipulated in isolation after fork

When data is joined, changes are merged.

The merge is fully defined by the data type declarations:
• some types may include custom merge functions
• there is no failure, rollback, or retry
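
The copy-on-fork and merge-on-join behaviour can be sketched with two simplified types. The class names follow the cloud types named on these slides, but the implementation is a hypothetical reduction: this `CInt` supports only `add`, this `CString` supports only `set`, and `Revision`, `start_revision` and `merge` are names invented for the sketch.

```python
import copy


class CInt:
    """Integer whose concurrent adds are summed on join."""

    def __init__(self, value=0):
        self.value, self.delta = value, 0

    def add(self, n):
        self.value += n
        self.delta += n          # remember what changed since the fork

    def start_revision(self):
        self.delta = 0

    def merge(self, joined):
        self.value += joined.delta
        self.delta += joined.delta


class CString:
    """String where a write in the joined revision overwrites ours."""

    def __init__(self, value=""):
        self.value, self.written = value, False

    def set(self, s):
        self.value, self.written = s, True

    def start_revision(self):
        self.written = False

    def merge(self, joined):
        if joined.written:
            self.value, self.written = joined.value, True


class Revision:
    def __init__(self, data):
        self.data = data         # name -> cloud value

    def fork(self):
        child = copy.deepcopy(self.data)  # the data set is copied on fork
        for value in child.values():
            value.start_revision()
        return Revision(child)

    def join(self, other):
        # The merge is decided per type; it cannot fail or roll back.
        for name, value in self.data.items():
            value.merge(other.data[name])


a = Revision({"eggs": CInt(6), "note": CString("none")})
b, c = a.fork(), a.fork()
b.data["eggs"].add(6)            # b works in isolation
c.data["eggs"].add(-2)
c.data["note"].set("organic")
seen_by_b = b.data["eggs"].value
a.join(b)
a.join(c)
print(seen_by_b, a.data["eggs"].value, a.data["note"].value)
```

Each type carries its own merge rule, so `join` needs no conflict handler and always succeeds.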

[Figure: revision diagram with revisions A, B, C and D connected by fork and join edges]

PAPER CONTRIBUTIONS

• Cloud types definition (CInt, CString, CSet)
• Formal description of language constructs (big-step notation)
• Fork-join automaton operations formalization (create, delete, propagate, fork, join, …)
• Formalization of distribution operations (yield-pull, yield-flush, sync-pull, sync-flush, …)
• Formalization of language constructs used by developers in client applications (new, delete, entries, all, yield, flush)

PAPER CONTRIBUTIONS

No evaluation

Reference/specification for developers implementing eventual consistency with cloud types

PITFALLS

Last writer wins!

Subtle difference between cloud type methods set and add
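
The difference between set and add shows up when two devices increment the same counter concurrently. The sketch below is a simplified model, not the paper's formal definition: this `CInt` remembers whether the revision called `set` since its fork, and a joined `set` overwrites the target (last writer wins) while joined `add` calls are summed.

```python
class CInt:
    def __init__(self, value=0):
        self.value = value
        self.was_set = False  # did this revision call set() since its fork?
        self.delta = 0        # sum of add() calls since the fork or last set()

    def get(self):
        return self.value

    def set(self, n):
        self.value = n
        self.was_set = True
        self.delta = 0

    def add(self, n):
        self.value += n
        self.delta += n

    def fork(self):
        return CInt(self.value)

    def join(self, other):
        if other.was_set:
            # Last writer wins: the joined value replaces ours.
            self.value = other.value
            self.was_set = True
            self.delta = 0
        else:
            # Adds are accumulated instead of overwritten.
            self.value += other.delta
            self.delta += other.delta


def count_twice(increment):
    """Two devices fork from 0, each increments once, then both join."""
    main = CInt(0)
    d1, d2 = main.fork(), main.fork()
    increment(d1)
    increment(d2)
    main.join(d1)
    main.join(d2)
    return main.get()


with_set = count_twice(lambda c: c.set(c.get() + 1))  # one increment is lost
with_add = count_twice(lambda c: c.add(1))            # both increments count
print(with_set, with_add)
```

Writing `set(get() + 1)` looks like an increment but is merged as a plain overwrite, so one of the two updates disappears silently.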


RELATED WORK

• CRDT (commutative replicated data types): cloud types allow non-commutative operations; no integration with fork-join automaton
• Concurrent revisions approach: necessity of explicit merge (rfork, rjoin, …)
• Persistent data types: do not take into account transactions or distribution
• Operational transformations: very similar to eventual consistency with cloud types, but difficult to implement; more focused on correctness checks
• OLAP/OLTP: not distributed
• Google Drive Realtime API, Dropbox Sync API, Firebase: complicated, too many things to take care of

QUESTIONS TO DISCUSS

Sergii: Is the model of eventual consistency with cloud types really suitable for managing user data?

Sergii: How does eventual consistency with cloud types ensure that there are no cloned entities?

Sergii: How could developers use yield/flush operations effectively in their code? What is a typical example of flush or yield usage?

Sergii: How does data actually become eventually consistent? What mechanism ensures that the data distribution model is correct?

QUESTIONS TO DISCUSS

Michael: Since cloud types make cloud synchronization available to the user as a type, how will that affect tools such as debuggers? Will debugging tools show users the revision diagrams and expose the underlying cloud functionality, or will they hide it? Will this cloud functionality confuse users as to how these types behave?

Michael: How scalable is the solution? At what point will it start to break down because there are too many users hitting it? Will it start to break down with 10 users accessing the same data? 1000? 1M?


FOLLOW-UP RESEARCH SUGGESTIONS

Sergii: Research on how existing types could be mapped to cloud types

Sergii: Study model limitations (silent conflict resolution)

Michael: While auto-synchronizing primitive cloud types are great for certain applications, it would be interesting to research ways to give users the power of eventual consistency while still giving them finer-grained control over the data. One use case would be to let users choose which other users' changes they want to become consistent with, and whose changes they choose to ignore.
