Concurrent Revisions and Cloud Types
Concurrent Revisions and Cloud Types
Sebastian Burckhardt
In collaboration with Daan Leijen, Manuel Fähndrich, Alexandro Baldassin, Benjamin Wood, Mooly Sagiv, Yuelu Duan, Alexey Gotsman, Hongseok Yang
Overview
Part I: Concurrent Revisions
Summary of prior work(What led me here)
Part II: Concurrent Revisions and Distributed Systems
Motivation:
a) Programming Apps for Mobile + Cloud
Proposed Solution:
b) Revision Consistency
c) Cloud Types
Part III: How to make it real: TouchDevelop
CONCURRENT REVISIONS
Part I
Parallel tasks: Pick any two
• Parallel Performance
• Frequent Conflicts
• Serializability
Our pick
[Diagram: the same three options (Parallel Performance, Frequent Conflicts, Serializability), annotated with "Revisions"]
Concurrent Revisions 101
• When forking a task, state is copied as well.
• Task operates on its copy of the data in isolation.
• When task is joined, changes are merged.
• The merge is fully defined by the data type declarations.
  • Some types may include custom merge functions.
  • There is no failure, rollback, or retry.
[Revision diagram: revisions A, B, C, and D, connected by three forks and two joins]
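The fork and join rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Revision` class and its methods are hypothetical names, state is deep-copied on fork (the talk mentions copy-on-write for efficiency), and the merge rule "values written by the joined revision win" is one simple choice of merge function.

```python
import copy


class Revision:
    """Hypothetical revision: a private copy of the state plus the keys it wrote."""

    def __init__(self, state):
        self.state = state
        self.written = set()

    def fork(self):
        # Forking copies the state, so the forked task works in isolation.
        return Revision(copy.deepcopy(self.state))

    def set(self, key, value):
        self.state[key] = value
        self.written.add(key)

    def get(self, key):
        return self.state[key]

    def join(self, child):
        # Deterministic merge: values written by the joined revision win.
        # There is no failure, rollback, or retry.
        for key in child.written:
            self.state[key] = child.state[key]
        self.written |= child.written


main = Revision({"x": 0, "y": 0})
task = main.fork()
task.set("x", 1)               # invisible to main until the join
main.set("y", 2)               # invisible to the forked task
x_before_join = main.get("x")
main.join(task)
```

Before the join, each side sees only its own writes; after the join, `main` holds both.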
Good for Parallel Programming
• On multicore: efficient thanks to copy-on-write
• Studied game application [OOPSLA 2010]: revisions provide more parallelization, better performance
• Programming model has good properties: Deterministic Parallel Programming [ESOP 2010]
• Can be extended to express both parallel and incremental computation: “Two for the price of one” [OOPSLA 2011], Distinguished Paper Award
Application Example: SpaceWars3D Game
Revisions helped with these challenges:
- Need stable snapshot for rendering task
- Need to parallelize tasks that may write to the same data (physics, collisions, network)
- Need to allow slow background tasks (e.g. autosave) to work on snapshot
Revision Diagram of Parallelized Game Loop
[Revision diagram: Render, Physics, network, autosave (long running), and Collision Detection parts 1 to 4 run as separate revisions]
Eliminated Read-Write Conflicts
All tasks see stable snapshot
Eliminated Write-Write Conflicts
Network after CD (Collision Detection) after Physics
Understanding Concurrent Revisions
Operation-Based Interpretation
• The current state is determined by the update sequence along the path from the root.
• The tip of a join arrow (the arrow marks the end of a revision) counts as the aggregate of all operations along that revision.
A : integer = 0;
B : integer = 0;
[Revision diagram: one revision performs A.Get() -> 0 (it sees only the initialization operation) and then A.Set(1); the other performs A.Set(2) and B.Set(2); the final reads return A.Get() -> 1 and B.Get() -> 2]
Updates along the path to the final reads: Set(A,0), Set(B,0), A.Set(2), B.Set(2), A.Set(1)
Puzzle 1
A : integer = 0
[Revision diagram: one revision performs A.Add(1), the other performs A.Add(2); then A.Get() -> ?]
Answer:
Updates along path: A.Set(0), A.Add(1), A.Add(2)
Result: 3
Puzzle 2
A : integer = 0
[Revision diagram: the operations A.Add(1), A.Set(1), and A.Add(1) are spread over two revisions; then A.Get() -> ?]
Answer:
Updates along path: A.Set(0), A.Add(1), A.Set(1), A.Add(1)
Result: 2
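The answers to Puzzles 1 and 2 can be checked by replaying the update sequence along the path. The sketch below assumes updates are encoded as `(operation, argument)` pairs; `replay` is a hypothetical helper, not part of any library.

```python
def replay(updates, state=0):
    """Apply a sequence of ("set", x) / ("add", x) updates in order."""
    for op, arg in updates:
        if op == "set":
            state = arg
        elif op == "add":
            state += arg
        else:
            raise ValueError(f"unknown update: {op}")
    return state


# Update sequences taken from the two answer slides.
puzzle1 = [("set", 0), ("add", 1), ("add", 2)]
puzzle2 = [("set", 0), ("add", 1), ("set", 1), ("add", 1)]

result1 = replay(puzzle1)
result2 = replay(puzzle2)
```

In Puzzle 2 the `set` overwrites the earlier `add`, which is why only the last `add` contributes.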
Puzzle 3
S : string = ""
[Revision diagram: the operations S.Append("1"), S.Append("2"), S.Append("3"), and S.Append("4") are spread over two revisions, with two reads S.Get() -> ?]
Answer 1
Updates along path: S.Set(""), S.Append("1"), S.Append("3")
Result: "13"
Answer 2
Updates along path: S.Set(""), S.Append("1"), S.Append("2"), S.Append("3"), S.Append("4")
Result: "1234"
Visibility & Arbitration in Revision Diagrams
• Visibility (who can see what updates?) = Reachability (is there a directed path?)
• Arbitration (whose update goes first?) = Cactus Walk
[Revision diagram: vertices labeled 1 through 9]
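One way to see visibility and arbitration at work is to keep an update log per revision. The sketch below is an assumption-laden illustration: `LogRevision` is a hypothetical class, and the fork/join layout used for Puzzle 3 is one diagram consistent with the two answers, not necessarily the one on the slide. On a join, the joined revision's updates are placed after the joiner's own updates.

```python
class LogRevision:
    """Hypothetical revision that records updates instead of state."""

    def __init__(self, prefix=()):
        self.prefix = list(prefix)   # updates visible at fork time
        self.own = []                # updates made (or joined) since the fork

    def update(self, op):
        self.own.append(op)

    def fork(self):
        # The forked revision sees everything on the path so far.
        return LogRevision(self.prefix + self.own)

    def join(self, other):
        # Arbitration: the joined revision's updates go after ours.
        self.own.extend(other.own)

    def visible(self):
        return self.prefix + self.own


def read_string(rev):
    # S starts as "" and every update is an Append.
    return "".join(rev.visible())


main = LogRevision()
main.update("1")
side = main.fork()
main.update("2")
side.update("3")
answer1 = read_string(side)   # sees "1" and "3", but not "2"
side.update("4")
main.join(side)
answer2 = read_string(main)
```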
Not everything is a revision diagram: The join condition
Revision diagrams are subject to the join condition: a revision can only be joined into vertices that are reachable from the fork.
[Diagram: an invalid join, with no path from the fork]
Without the join condition, causality may be violated.
[Diagram: revisions A, B, C]
• B sees updates of A
• C sees updates of B
• But C does not see updates of A
• Without the join condition, visibility is not transitive.
• We prove in the paper: enforcing the join condition is sufficient to guarantee transitive visibility.
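The join condition is a reachability check, which can be sketched as a graph search. The graph encoding (a dictionary from vertex to successor list) and the vertex names below are assumptions made for this example.

```python
def reachable(edges, start, goal):
    """Depth-first search: is there a directed path from start to goal?"""
    stack, seen = [start], set()
    while stack:
        vertex = stack.pop()
        if vertex == goal:
            return True
        if vertex in seen:
            continue
        seen.add(vertex)
        stack.extend(edges.get(vertex, []))
    return False


def join_allowed(edges, fork_vertex, join_target):
    # A revision may only be joined into vertices reachable from its fork.
    return reachable(edges, fork_vertex, join_target)


# Hypothetical diagram: main revision m0 -> m1 -> m2,
# revision s forked at m0, revision r forked at m1.
edges = {"m0": ["m1", "s0"], "m1": ["m2", "r0"]}

valid = join_allowed(edges, "m1", "m2")     # r joined back into main
invalid = join_allowed(edges, "m1", "s0")   # r joined into s: no path from fork
```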
Conclusion of Part I
• Revision Diagrams
  • Make replication explicit
  • Provide a principled way to understand and define the effect of concurrent conflicting updates
• Concurrent Revisions
  • Can use revisions as a programming model to achieve better performance, or to express incremental + parallel algorithms
REVISIONS & DISTRIBUTED SYSTEMS
Part II
Revisions + Distributed Systems
• Can think of many applications.
• Revision pattern is commonly used for:
  • Source control systems (data structured as file systems, with per-file merge operations)
  • Classic web applications: load HTML form, edit locally, submit, server does merge
  • Modern web applications: read REST object, JavaScript modifies locally, write REST object
• We are currently focusing on this programming domain: Apps for Mobile + Cloud.
MOTIVATION
Part IIa
Why apps communicate
• Personal Publishing: Blog, Facebook Wall, Website
• Games: Real-time, Turn-based
• Data Collection: Surveys, High Scores
• Collaboration: OneNote, Shared Lists, Shared Calendar, Shared Spreadsheet
• Sync and Backup: Music, Video, SkyDrive
• Transactions: Store, Auction, Matchmaking
• Remote Control: Home Control, Robotics, Media Player
Requirements
• Persistence
  • Data is not deleted when we: quit the app, lose connection to server, take the battery out, crash due to bug, close the browser, replace the phone
• Reliability
  • Process is not lost (resume at last stable point)
  • Data integrity is protected
• Offline support
  • App continues to work without connection to cloud
• Security
  • Control who can do what
• Scalability
  • Support many users and/or large databases at low cost
Our focus.
Implies:
- Need replicas on client
- Must support eventual consistency
[Figure: grocery list with the items milk, bread, eggs, cilantro, sardines, guava]
Implementation Architecture?
• Peer-to-Peer
  • Program runs on clients only, no server
  • Popular with researchers
  • Not all that common in practice
• Client-Server, or more recently Client-Service
  • Very common these days
  • Service typically hosted on virtualized infrastructure (cloud) -> makes "economy of scale" accessible to everybody
[Diagram: three nodes, each with Storage and Compute]
Basic Using Cloud Infrastructure
[Diagram: six clients connected to a cloud made of Compute and Storage nodes]
[Diagram: the same six clients and the same Compute and Storage nodes, with a Distributed Systems Layer]
How to program this machine?
Client
• Not physically secure
• Unreliable
• Cannot detect failures
• Potentially many
Cloud Compute
• Physically secure, not so many
• Not reliable: no persistent state
• Can detect failures somewhat
• Relatively expensive
Cloud Storage
• Secure
• Reliable
• Can be very cheap
Extensive Replication
[Diagram: each replica runs the App with Local State, GUI State Binding, and a Storage Backend (save/restore) on Local Storage; replicas sync by exchanging messages through several Compute Layer nodes]
Cache Coherence? Consistency Model?
grocery list
App programmers should not have to think that much.
All this stuff is about the implementation, not about the problem domain.
• Program runs on server, client is just a view
  • Example: classic HTML approach
  • Client clicks link/button to submit, gets next page
• Program runs on client, uses server as a resource
  • Client issues web requests (e.g. REST)
• Program runs on client and on server
  • Example: websockets are full-duplex
  • Client and server can send messages, causing event handlers to launch at the other end
• Peer-to-Peer
  • Program runs on clients only, no server
  • Rare for apps, as far as I know
Abstractions, please.
We propose:
- Revision Consistency
- Cloud Types
Papers:
- Eventually Consistent Transactions (ESOP 2011)
- Cloud Types (ECOOP 2012)
Closely Related Work
• CRDTs (Conflict-Free Replicated Data Types)
  • [Shapiro, Preguica, Baquero, Zawirski]
  • Similar motivation and similar techniques
• Bayou
  • User-defined conflict resolution (merge functions)
REVISION CONSISTENCY
Part IIb
Revision Consistency
• Client code:
  - Declare data types
  - Read/update data
  - yield (= polite sync)
  - flush (= forced sync)
• Under the hood: revision diagram rules
[Diagram: device 1, cloud, device 2]
Implicit Transactions
• At yield: the runtime has permission to send or receive updates. Call this frequently, e.g. automatically "on idle".
• In between yields: the runtime is not allowed to send or receive updates.
• Implies: all client code executes in an (eventually consistent) transaction.
[Timeline: segments of client code separated by yield calls]
On-Demand Stronger Consistency
• The flush primitive blocks until the local state has reached the main revision and the result has come back to the device
• Sufficient to implement strong consistency
• Flush blocks, and times out if the server connection is not available
[Diagram: flush (blocks), then (continue)]
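The difference between yield and flush can be illustrated with a toy simulation of one cloud integer. Everything here is an assumption made for the sketch: the `Cloud` and `Device` classes, the `online` flag, and the choice to raise `TimeoutError` immediately instead of blocking until a timeout. The method is named `yield_` because `yield` is a Python keyword.

```python
class Cloud:
    def __init__(self):
        self.log = []            # main revision as an append-only list of add() amounts


class Device:
    def __init__(self, cloud):
        self.cloud = cloud
        self.known = 0           # length of the main-log prefix this device has seen
        self.pending = []        # local updates not yet sent
        self.online = True

    def add(self, n):
        self.pending.append(n)   # never blocks, works offline

    def get(self):
        # Local view: the seen prefix of the main log plus our own pending updates.
        return sum(self.cloud.log[:self.known]) + sum(self.pending)

    def _sync(self):
        self.cloud.log.extend(self.pending)
        self.pending = []
        self.known = len(self.cloud.log)

    def yield_(self):
        # Polite sync: exchange updates only if it is possible right now.
        if self.online:
            self._sync()

    def flush(self):
        # Forced sync: fails (here: immediately) when the cloud is unreachable.
        if not self.online:
            raise TimeoutError("no connection to the cloud")
        self._sync()


cloud = Cloud()
d1, d2 = Device(cloud), Device(cloud)
d2.online = False
d1.add(1)
d2.add(1)
d1.yield_()
d2.yield_()                      # offline: nothing is exchanged
offline_view = d2.get()
d2.online = True
d2.flush()
synced_view = d2.get()
stale_view = d1.get()            # d1 has not yielded since d2's flush
d1.yield_()
final_view = d1.get()
```

Between two yields a device's view does not change except through its own updates, which is what makes each segment behave like a transaction.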
Revision consistency
• Global state evolves as a revision diagram
• Main revision (center) is kept in reliable cloud storage
• Seamless offline support
• Never blocks, except when the client issues a flush (fence)
[Revision diagram: Client 1 and Client 2 on either side of the main revision, with revisions A to G synchronized by yield and flush]
Nice things about Revision Consistency
Strong guarantees:
• Guarantees causal eventual consistency
• Supports eventually consistent transactions
• Supports on-demand stronger consistency
Opportunities for efficient implementation:
• Naturally supports storage hierarchies
• Consistent with full-duplex pull & push updates between service and client (e.g. websockets)
• Can be combined with "log reduction" techniques
Related to Log-Based Implementations
• Main Revision = Master Log
• Suggested Implementation:
  - Main Log in Cloud Storage
  - Scalable read & write
• It is possible to scale reading/writing of the log
• Log Reduction
[Diagram: the revision diagram of Client 1 and Client 2 (revisions A to G) next to a reduced log with entries B, AD, CFG]
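The slides do not spell out how log reduction works. As a rough illustration of the idea for a single integer, a run of set/add updates can be collapsed into at most one equivalent update. The function name and the `(operation, argument)` encoding are assumptions made for this sketch.

```python
def reduce_int_log(updates):
    """Collapse ("set", x) / ("add", x) updates into at most one equivalent update."""
    base = None      # value of the most recent set, if any
    added = 0        # sum of the adds since that set (or since the start)
    for op, arg in updates:
        if op == "set":
            base, added = arg, 0     # a set overwrites everything before it
        elif op == "add":
            added += arg
        else:
            raise ValueError(f"unknown update: {op}")
    if base is not None:
        return [("set", base + added)]
    return [("add", added)] if added else []


reduced_with_set = reduce_int_log([("add", 1), ("set", 5), ("add", 2), ("add", 3)])
reduced_adds_only = reduce_int_log([("add", 1), ("add", 2)])
```

Replaying the reduced log on any starting value gives the same result as replaying the original one, so a log segment can be shipped or stored in constant size per integer.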
[Diagram: example execution whose steps are labeled with the transitions YieldPull, YieldPush, FlushPush, FlushPull, SyncPush, and SyncPull, annotated with counter values 0 to 4]
Can build layered service
[Diagram: Reliable Storage, two Compute Layer nodes, two Clients]
CLOUD TYPES
Part IIc
What is a cloud type?
• An abstract data type with
  • Initial value, e.g. { 0 }
  • Query operations, e.g. { get }
    • No side effects
  • Update operations, e.g. { set(x), add(x) }
    • Total (no preconditions)
• Good cloud types minimize programmer surprises.
Our goals for finding cloud types…
• Select only a few
  • But ensure many others can be derived
• Choose types with minimal anomalies
  • Updates should make sense even if state changes
Forces us to rethink basic data structuring.
• Objects & pointers fail the second criterion
• Entities & relations do better
Example App: Birdwatching
• An app for a birdwatching family.
• Start simple: let’s count the number of eagles seen.
var eagles : cloud integer;
Eventually consistent counting
var eagles : cloud integer;
[Diagram: device 1, cloud, device 2. The devices issue eagles.Set(1) and three eagles.Add(1) updates; intermediate reads return eagles.Get() -> 1, eagles.Get() -> 2, and eagles.Get() -> 3 as updates propagate]
Counting by bird
var birds: cloud array [name: string] {count : cloud integer}
[Diagram: device 1, cloud, device 2]
birds["jay"].count.Add(1)
birds["gull"].count.Add(2)
birds["jay"].count.Add(5)
birds["jay"].count.Get() -> 6
Important: all entries are already there, no need to insert key-value pairs.
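The "all entries are already there" rule can be sketched as follows. This is a simplified model, not the actual runtime: `CloudArray` is a hypothetical class that supports only add, each device's instance holds the updates it made since the last sync, and merging sums the adds per key.

```python
from collections import defaultdict


class CloudArray:
    """Sketch of a cloud array of counters: every key implicitly starts at 0."""

    def __init__(self):
        self.adds = defaultdict(int)     # accumulated add() amount per key

    def add(self, key, n):
        self.adds[key] += n              # no insert step, no precondition

    def get(self, key):
        return self.adds.get(key, 0)     # unseen keys read as the initial value

    def merge(self, other):
        for key, n in other.adds.items():
            self.adds[key] += n

    def entries(self):
        # Only keys whose value differs from the initial value are listed.
        return sorted(k for k, v in self.adds.items() if v != 0)


device1, device2, cloud = CloudArray(), CloudArray(), CloudArray()
device1.add("jay", 1)
device1.add("gull", 2)
device2.add("jay", 5)
cloud.merge(device1)
cloud.merge(device2)
jay_count = cloud.get("jay")
```

Because add is total and commutative, the merged count does not depend on which device syncs first.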
Standard Map Semantics Would Not Work!
[Diagram: device 1, cloud, device 2; the merged result is marked "?"]
One device:
if birds.contains("jay")
  birds["jay"].Add(5)
else
  birds.insert("jay", 5)
The other device:
if birds.contains("jay")
  birds["jay"].Add(3)
else
  birds.insert("jay", 3)
Our Collection of Cloud Types
Primitive cloud types
• Cloud Integers: { get } { set(x), add(x) }
• Cloud Strings: { get } { set(s), set-if-empty(s) }
Structured cloud types
• Cloud Tables (cf. entities, tables with implicit primary key)
• Cloud Arrays (cf. key-value stores, relations)
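The two primitive cloud types can be modeled as an initial value plus total update operations. The sketch below is illustrative only: the class layout and the `replay` helper are assumptions, and only the operation names come from the slide.

```python
class CloudInteger:
    initial = 0

    @staticmethod
    def apply(state, op, arg):
        if op == "set":
            return arg
        if op == "add":
            return state + arg
        raise ValueError(f"unknown update: {op}")


class CloudString:
    initial = ""

    @staticmethod
    def apply(state, op, arg):
        if op == "set":
            return arg
        if op == "set-if-empty":
            # Total operation: it always applies, but changes only an empty string.
            return arg if state == "" else state
        raise ValueError(f"unknown update: {op}")


def replay(cloud_type, updates):
    """Compute the value from the initial value and an ordered update sequence."""
    state = cloud_type.initial
    for op, arg in updates:
        state = cloud_type.apply(state, op, arg)
    return state


count = replay(CloudInteger, [("add", 1), ("set", 1), ("add", 1)])
first_wins = replay(CloudString, [("set-if-empty", "a"), ("set-if-empty", "b")])
```

With set-if-empty, whichever update comes first in the arbitration order wins, and later ones have no effect.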
Cloud Tables
• Declares
  • Fixed columns
  • Regular columns
• Initial value: empty
• Operations:
  • new E(f1,f2): add new row (at end)
  • all E: return all rows (top to bottom)
  • delete e: permanently delete row
  • e.f1: read fixed column
  • e.col_i.op: perform operation on cell
cloud table E
(
  f1: index_type1;
  f2: index_type2;
)
{
  col1: cloud_type1;
  col2: cloud_type2;
}
Cloud Arrays
• Initial value: for all keys, fields have initial value
• Operations:
  • A[i1,i2].val_i.op: perform operation on value
  • entries A: return entries for which at least one val_i is not the initial value
cloud array A
[
  idx1: index_type1;
  idx2: index_type2;
]
{
  val1: cloud_type1;
  val2: cloud_type2;
}
Arrays + Tables = Relational Data
• Tables
  • Define entities
  • Row identity = invisible primary key
• Arrays
  • Define arbitrary relations
• Code can access data using queries
  • For example, LINQ queries
Arrays + Tables = Relational Data
• Example: shopping cart
cloud table Customer
{
  name: cloud string;
}
cloud table Product
{
  description: cloud string;
}
cloud array ShoppingCart
[
  customer: Customer;
  product: Product;
]
{
  quantity: cloud integer;
}
function Add(c: Customer; p: Product; x: int)
{
  ShoppingCart[c,p].quantity.Add(x);
}
Arrays + Tables = Relational Data
• Example: binary relation
cloud table User
{
  name: cloud string;
}
cloud array friends
[
  user1 : User;
  user2 : User;
]
{
  value: cloud boolean;
}
Standard math: { relations AxBxC } = { functions AxBxC -> bool }
Arrays + Tables = Relational Data
• Example: linked tables
• Cascading delete: an Order is deleted automatically when its owning Customer is deleted
cloud table Customer
{
  name: cloud string;
}
cloud table Order
(
  owner: Customer
)
{
  description: cloud string;
}
Linked tables solve the following problem:
[Diagram: device 1, cloud, device 2; the merged result is marked "?"]
One device:
delete customer;
foreach o in Orders
  if (o.owner = customer)
    delete o;
The other device, concurrently:
new Order(customer);
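How a cascading delete avoids the orphaned order can be sketched with a small model. The `Store` class, its merge rule, and the string identifiers are all assumptions for illustration; the point is that an order is treated as deleted whenever its owner is deleted, regardless of the order in which replicas merge.

```python
class Store:
    """Hypothetical replica holding customer deletions and orders."""

    def __init__(self):
        self.deleted_customers = set()
        self.orders = {}                 # order id -> owning customer

    def delete_customer(self, customer):
        self.deleted_customers.add(customer)

    def new_order(self, order_id, owner):
        self.orders[order_id] = owner

    def merge(self, other):
        self.deleted_customers |= other.deleted_customers
        self.orders.update(other.orders)

    def all_orders(self):
        # Cascading delete: orders of deleted customers are never returned.
        return sorted(o for o, owner in self.orders.items()
                      if owner not in self.deleted_customers)


cloud, device1, device2 = Store(), Store(), Store()
device1.delete_customer("alice")         # sees no orders for alice yet
device2.new_order("o1", "alice")         # concurrent order for the same customer
device2.new_order("o2", "bob")
cloud.merge(device1)
cloud.merge(device2)
remaining = cloud.all_orders()
```

Without the link, the explicit `foreach` on the first device would miss the order created concurrently on the second.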
Flush can be used to implement a lock
We don’t recommend you actually do this in practice. (why?)
lock: cloud string;
function Lock()
{
  while (lock != my_id)
  {
    lock.setIfEmpty(my_id);
    flush;
  }
}
function Unlock()
{
  lock.set("");
}
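The locking idea can be mimicked in a single process, assuming that after a flush every device sees the value decided by the main revision. `LockCell`, `try_lock`, and `unlock` are hypothetical names; the real loop would call flush between the conditional update and the read, and could block or time out there.

```python
class LockCell:
    """Stand-in for a cloud string whose updates are applied in arrival order."""

    def __init__(self):
        self.value = ""

    def set(self, s):
        self.value = s

    def set_if_empty(self, s):
        if self.value == "":
            self.value = s


def try_lock(lock, my_id):
    lock.set_if_empty(my_id)     # conditional update: first arrival wins
    # A flush would go here, so that the read below reflects the main revision.
    return lock.value == my_id


def unlock(lock):
    lock.set("")


lock = LockCell()
first = try_lock(lock, "device1")
second = try_lock(lock, "device2")   # fails while device1 holds the lock
unlock(lock)
third = try_lock(lock, "device2")
```

Each attempt costs a round trip to the cloud and cannot make progress offline, which hints at why the slide advises against this pattern.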
HOW TO MAKE IT REAL
Part III
The hypothesis
• Anyone with basic programming skills can write simple apps that share data in the cloud
  • No harder than writing a BASIC program or an Excel script
  • Just declare your cloud table or cloud indexes, and off you go.
• Success will mean: things work without pain, and users won’t even appreciate that there is a research problem behind it
What is TouchDevelop?
• A simple programming language
• An integrated development environment (IDE)
• Optimized for devices (small screens, touch input): no PC required, no keyboard required
• Runs on almost everything (iPad, iPhone, Android, PC, Windows Phone …)
• You can share your scripts online (public domain) or convert them to Windows 8 apps and sell them in the store
News about TouchDevelop
• Conversations by Nokia: "Create your own Lumia apps with TouchDevelop"
• Nikkei Computer: "Developing smartphone apps on a smartphone" (in Japanese)
• atmarkIT: "Microsoft's development environment 'TouchDevelop'" (in Japanese)
• PC-Welt: "Microsoft launches web-based app creator for Windows 8" (in German)
• neowin.net: "Microsoft launches web-based Windows 8 app creator"
• NYTimes: "Fostering Tech Talent in Schools"
• c't magazine: "Apps for Windows Phone" (in German)
• TechRepublic: "Fun with TouchDevelop, an IDE for Windows Phone 7"
• Social Times: "Microsoft Research TouchDevelop for Windows Phone: The First Social Cloud Programming Environment"
• c't magazine: "[TouchDevelop]" (in German)
• Social Times: "Microsoft Research [TouchDevelop] Makes Windows Phone a Game Changing Platform: Prepare to be Amazed"
• Geek Wire: "Microsoft ‘[TouchDevelop]’ uses phone to program phone"