cs519 - cloud types for eventual consistency
DESCRIPTION
Overview of the paper "Cloud Types for Eventual Consistency" by Burckhardt et al. presented at Oregon State University for "Software Evolution for Mobility" class on Oct 10th 2013. Presentation time: 20 min
TRANSCRIPT
CLOUD TYPES FOR EVENTUAL CONSISTENCY
Class: CS 519 – Software Evolution for Mobility
Presenter: Sergii Shmarkatiuk
Date: 10/09/2013
2
MOTIVATION
Today there are more mobile applications available for download and usage than
desktop applications.
3
MOTIVATION
And the number of mobile applications is not going down
4
MOTIVATION
It goes up!
5
MOTIVATION
Mobile applications need to communicate with online services and with each other
6
MOTIVATION: WHY DO MOBILE APPS COMMUNICATE?
Personal Publishing: Blog, Facebook Wall, Website
Games: Real-time, Turn-based
Data Collection: Surveys, High Scores
Collaboration: Shared Lists, Shared Calendar, Shared Spreadsheet
Sync and Backup: Music, Video, SkyDrive
Transactions: Store, Auction, Matchmaking
Remote Control: Home Control, Robotics, Media Player
7
MOTIVATION
An increased need for communication means more data
8
MOTIVATION
One of the most effective ways of storing data for mobile devices is the cloud
9
MOTIVATION
Although the cloud is an effective way to store data, manipulating cloud data poses some challenges for developers
10
MOTIVATION
Developers are usually responsible for managing different aspects of cloud data manipulation
Cloud data
Storage
Synchronization
Caching
Manipulation
Conflict resolution
11
MOTIVATION
Cloud data manipulation should be simplified for more convenient use
12
PROPOSED SOLUTION
Implementation of eventually consistent storage at the programming language level using cloud data types.
13
WHAT IS EVENTUAL CONSISTENCY?
A weak consistency model for storing shared data in a distributed system that allows clients to perform updates against any replica at any time.
One of the approaches to the CAP (consistency, availability, partition tolerance) problem.
Guarantees that all updates are eventually delivered to all replicas, and that they are applied in a consistent order.
A transactional model that provides the basic requirements of atomicity and isolation, but without serializability of transactions.
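The two guarantees above (eventual delivery and a consistent order of application) can be illustrated with a minimal sketch. This is a generic toy model, not the mechanism from the paper, which uses revision diagrams. The class Replica, its methods, and the (clock, replica name) ordering rule are all assumptions made for illustration.

```python
class Replica:
    """Toy replica of a shared list (hypothetical names throughout)."""

    def __init__(self, name):
        self.name = name
        self.clock = 0        # logical clock, used only to order updates
        self.updates = set()  # every update seen so far: (clock, replica, item)

    def append(self, item):
        # Updates are accepted locally at any time, with no coordination.
        self.clock += 1
        self.updates.add((self.clock, self.name, item))

    def receive(self, other):
        # Delivery: learn every update the other replica knows about.
        self.updates |= other.updates
        self.clock = max(self.clock, other.clock)

    def value(self):
        # Applying updates in sorted (clock, replica) order gives every
        # replica the same result once it holds the same set of updates.
        return [item for _, _, item in sorted(self.updates)]


a, b = Replica("a"), Replica("b")
a.append("milk")
b.append("bread")
a.append("eggs")
print(a.value(), b.value())    # the replicas differ before delivery

a.receive(b)
b.receive(a)
print(a.value() == b.value())  # the replicas agree after delivery
```

Between deliveries each replica may show a different list; the only promise is that replicas holding the same set of updates compute the same value.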
14
TRANSACTIONS: PICK ANY TWO
Atomicity
Isolation
Serializability
15
TRANSACTIONS: CHOICE OF EVENTUAL CONSISTENCY MODEL
Atomicity
Isolation
Serializability
Revisions
16
REVISION DIAGRAMS
BURCKHARDT’2011: SEMANTICS OF CONCURRENT REVISIONS
Data propagates only along edges
18
DISTRIBUTED SYSTEMS
[Diagram: two architectures. Basic (peer-to-peer): three nodes, each with its own storage and compute. Using Cloud Infrastructure: clients connect to compute servers backed by storage.]
19
HOW TO PROGRAM USING CLOUD CONCEPTS?
[Diagram: clients connect to cloud compute servers, which are backed by cloud storage]
Layers:
Client: not physically secure; unreliable; cannot detect failures; potentially many
Cloud Compute: physically secure, not so many; not reliable (no persistent state); can detect failures somewhat; relatively expensive
Cloud Storage: secure; reliable; can be very cheap
20
MOTIVATING EXAMPLE: GROCERY LIST
milk, bread, eggs
cilantro, sardines, guava
21
DATA MANAGEMENT MODEL
[Diagram: device 1 and device 2, each connected to the cloud]
• Client code:
  Declare data types
  read/update data
  yield (= polite sync)
  flush (= forced sync)
• Under the hood: Revision diagram rules
22
IMPLICIT TRANSACTIONS
[Diagram: device 1 and device 2 exchanging updates with the cloud at yield points]
• At yield: Runtime has permission to send or receive updates. Call this frequently, e.g. automatically “on idle”.
• In between yields: Runtime is not allowed to send or receive updates.
• Implies: all client code executes in an (eventually consistent) transaction.
STRONG CONSISTENCY ON-DEMAND
The flush primitive blocks until the local state has reached the main revision and the result has come back to the device.
Sufficient to implement strong consistency.
Flush blocks; it times out if the server connection is not available.
[Diagram: flush (blocks), then (continue)]
24
YIELD/FLUSH EXAMPLE
yield (= polite sync)
flush (= forced sync)
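The difference between the two primitives can be sketched roughly as follows. This is a simplified, synchronous toy model rather than the runtime described in the paper: the classes Cloud and Device, the online flag, and the single shared counter are assumptions made for illustration, and yield_ carries a trailing underscore only because yield is a reserved word in Python.

```python
class Cloud:
    """Toy server holding one shared counter (the main revision)."""

    def __init__(self):
        self.total = 0


class Device:
    """Toy client with a local copy of the counter (hypothetical model)."""

    def __init__(self, cloud):
        self.cloud = cloud
        self.online = True
        self.base = cloud.total  # last state received from the cloud
        self.pending = 0         # local updates not yet sent

    def add(self, n):
        # Updates apply locally at once; nothing is sent here.
        self.pending += n

    def get(self):
        return self.base + self.pending

    def _sync(self):
        self.cloud.total += self.pending
        self.pending = 0
        self.base = self.cloud.total

    def yield_(self):
        # Polite sync: the runtime MAY sync now. If the device is
        # offline, nothing happens and the client code keeps running.
        if self.online:
            self._sync()

    def flush(self):
        # Forced sync: must reach the server, otherwise it fails.
        # A real flush would block and then time out.
        if not self.online:
            raise TimeoutError("no server connection")
        self._sync()


cloud = Cloud()
d1, d2 = Device(cloud), Device(cloud)
d1.add(1)
d2.add(2)
d1.yield_()
print(d2.get())  # d2 has not yielded yet, so it still sees only its own update
d2.yield_()
print(d2.get())  # now d2 also sees the update made by d1
```

In this sketch yield_ never fails, so it is safe to call often; flush is the call to use when the code needs to know that its updates reached the server.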
25
FORK-JOIN AUTOMATON (FJA)
Data set is copied on forking
Data is manipulated in isolation after fork
When data is joined, changes are merged.
The merge is fully defined by the data type declarations.
  some types may include custom merge functions
  there is no failure, rollback, or retry
[Diagram: nodes A, B, C, D connected by fork and join edges]
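The fork, isolate, and join steps listed above can be sketched for an add-only cloud integer. This is a minimal illustration under stated assumptions, not the formal automaton from the paper: the Revision class and the base/delta representation of CInt are hypothetical, and a joined revision is assumed not to be used again.

```python
class CInt:
    """Toy cloud integer that supports only add (hypothetical model)."""

    def __init__(self, base=0):
        self.base = base  # value inherited at the fork
        self.delta = 0    # sum of the adds made in this revision

    def get(self):
        return self.base + self.delta

    def add(self, n):
        self.delta += n

    def fork(self):
        # The copy starts from the current value with no adds of its own.
        return CInt(self.get())

    def join(self, other):
        # The merge is defined by the type: adds from the joined
        # revision are applied on top of the current state.
        self.delta += other.delta


class Revision:
    """A set of named cloud values that can be forked and joined."""

    def __init__(self, data):
        self.data = data  # name -> cloud value

    def fork(self):
        return Revision({k: v.fork() for k, v in self.data.items()})

    def join(self, other):
        for name, value in other.data.items():
            self.data[name].join(value)


main = Revision({"eggs": CInt(6)})
a, b = main.fork(), main.fork()
a.data["eggs"].add(6)           # changes made in isolation
b.data["eggs"].add(-2)
print(main.data["eggs"].get())  # 6: main is unaffected until a join
main.join(a)
main.join(b)
print(main.data["eggs"].get())  # 10: both sets of changes are merged
```

Because join is total for this type, there is nothing to fail, roll back, or retry: every pair of revisions can be merged.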
26
PAPER CONTRIBUTIONS
Cloud types definition (CInt, CString, CSet)
Formal description of language constructs (big-step notation)
Fork-join automaton operations formalization (create, delete, propagate, fork, join, …)
Formalization of distribution operations (yield-pull, yield-flush, sync-pull, sync-flush, …)
Formalization of language constructs used by developers in client applications (new, delete, entries, all, yield, flush)
27
PAPER CONTRIBUTIONS
No evaluation
Reference/specification for developers implementing eventual consistency with cloud types
28
PITFALLS
Last writer wins!
Subtle difference between cloud type methods set and add
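Both pitfalls can be seen in one small sketch of a cloud integer that offers set and add. The merge rule used here (the updates of the joined revision are replayed after those already recorded) is an assumption based on one reading of the paper, and the update-log representation and the helper two_devices_increment are hypothetical.

```python
class CInt:
    """Toy cloud integer with set and add (hypothetical model)."""

    def __init__(self, value=0):
        self.base = value
        self.log = []  # updates since the fork: ("set", n) or ("add", n)

    def get(self):
        value = self.base
        for op, n in self.log:
            value = n if op == "set" else value + n
        return value

    def set(self, n):
        self.log.append(("set", n))

    def add(self, n):
        self.log.append(("add", n))

    def fork(self):
        return CInt(self.get())

    def join(self, other):
        # Assumed merge rule: replay the joined updates after our own.
        self.log.extend(other.log)


def two_devices_increment(use_add):
    """Two devices each increment a shared counter once, then sync."""
    cloud = CInt(0)
    d1, d2 = cloud.fork(), cloud.fork()
    for device in (d1, d2):
        if use_add:
            device.add(1)
        else:
            device.set(device.get() + 1)
    cloud.join(d1)
    cloud.join(d2)
    return cloud.get()


print(two_devices_increment(use_add=False))  # 1: the last set wins, one increment is lost
print(two_devices_increment(use_add=True))   # 2: both adds are kept
```

The lost increment in the set case is resolved silently: neither device is told that its write was overwritten.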
29
RELATED WORK
CRDT (commutative replicated data types)
  Cloud types allow non-commutative operations
  No integration with fork-join automaton
Concurrent revisions approach
  Necessity of explicit merge (rfork, rjoin, …)
Persistent data types
  Do not take into account transactions or distribution
Operational transformations
  Very similar to eventual consistency with cloud types, but difficult to implement
  More focused on correctness checks
OLAP/OLTP
  Not distributed
Google Drive Realtime API, Dropbox Sync API, Firebase
  Complicated, too many things to take care of
30
QUESTIONS TO DISCUSS
Sergii: Is the model of eventual consistency with cloud types really suitable for managing user data?
Sergii: How does eventual consistency with cloud types ensure that there are no cloned entities?
Sergii: How could developers use yield/flush operations effectively in their code? What is a typical case/example of flush or yield statement usage?
Sergii: How does data actually become eventually consistent? What mechanism ensures that the data distribution model is correct?
31
QUESTIONS TO DISCUSS
Michael: Since the development of cloud types makes the functionality of cloud synchronization available to the user as a type, how will that affect tools such as debuggers? Will the debugging tools show the users the revision diagrams and expose the underlying cloud functionality to the users, or will they hide it from them? Will this cloud functionality confuse users as to how these types are behaving?
Michael: How scalable is the solution? At what point will it start to break down because there are too many users hitting it? Will it start to break down with 10 users accessing the same data? 1000? 1M?
32
FOLLOW-UP RESEARCH SUGGESTIONS
Sergii: Research on how existing types could be mapped to cloud types
Sergii: Study model limitations (silent conflict resolution)
Michael: While auto-synchronizing primitive cloud types are great for certain applications, it would be interesting to research ways to give users the power of eventual consistency while still providing a way for them to have finer-grained control over the data. One use case would be to allow users to choose which other users' changes they would like to become consistent with, and whose changes they choose to ignore.