Data Structures of the Future – Diaspora Științifică, 2016

TRANSCRIPT
Data Structures of the Future: Concurrent, Optimistic, and Relaxed
Dan Alistarh ETH Zurich
Background
• The First Graph Problem (Euler, 1735) • 4 vertices, 7 edges
• Graph Problems, circa 2010 • Social Graphs: 1 billion vertices, 100 billion edges (~1 TB storage)
• Graph Problems Today • Web/Office/Brain Graphs: 100 billion vertices, 100 trillion edges
• More than 1 PetaByte storage
The Bridges of Königsberg
We distribute computation across processors, computers, and data centers.
This changes the way data structures are designed, built, and deployed.
Why Concurrent?
To get speedup on newer hardware.
Scaling: more threads should imply more useful work.
The Problem with Concurrency
[Figure: Throughput of Parallel Event Processing — throughput (events/second, 0 to 6×10⁶) vs. number of threads (0 to 70)]
Concurrency can be very bad value for money.
Is this problem inherent?
< $1000 / machine vs. > $10000 / machine

[Diagram: shared Event Queue holding event1, event2, event3]
Inherent Sequential Bottlenecks
Data structures with strong ordering semantics • Stacks, Queues, Priority Queues, Exact Counters
This is important because of Amdahl’s Law • Assume single-threaded computation takes 1 week
• Inherently sequential component (e.g., queue) takes 15% = 1 day
• Then maximum speedup < 7x, even with infinite threads
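The bound above follows directly from Amdahl's Law; a quick check in Python (the function name is mine, not from the talk):

```python
def max_speedup(sequential_fraction, threads=float("inf")):
    """Amdahl's Law: speedup achievable with a fixed sequential fraction s.

    With infinitely many threads, the parallel part vanishes,
    so the speedup is capped at 1 / s.
    """
    s = sequential_fraction
    if threads == float("inf"):
        return 1.0 / s
    return 1.0 / (s + (1.0 - s) / threads)

# A week-long job whose sequential queue accounts for 15% (~1 day):
print(max_speedup(0.15))      # cap at 1/0.15 ≈ 6.67, i.e. < 7x
print(max_speedup(0.15, 64))  # even 64 threads stay well below the cap
```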
Theorem: Assuming n threads, any deterministic, strongly ordered data structure has an execution in which a processor takes time linear in n to return. [Alistarh, Aspnes, Gilbert, Guerraoui, Journal of the ACM, 2014]
To get performance, it is critical to speed up shared data structures.
Concurrent Data Structures
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
New Algorithmic and Analytic Ideas.
New Hardware Designs!
New Data Structures!
Discrete Event Simulation

Tasks in the queue: task 1, task 3, task 4, task 5, task 7, task 8, task 11, task 15, task 18

Methods:
• Get Top Task
• Insert a Task
• Search for a Task
Priority Queue <key, value>
Search(key)
Insert/Delete(k, v)
DeleteMin()
Extremely useful:
• Graph Operations (Shortest Paths)
• Operating System Kernel
• Time-Based Simulations
We are looking for a fast concurrent Priority Queue.
The Problem
Target: fast, concurrent Priority Queue.
Lots of work on the topic: [Sanders97], [Lotan&Shavit00], [Sundell&Tsigas07],
[Linden&Jonsson13], [Lenhart et al. 14], [Wimmer et al.14]
Current solutions are hard to scale: DeleteMin is highly contended.
Everyone wants the same element!
Concurrent Solution: the SkipList [Pugh90]
● Linked list, sorted by priority
● Each node has a random “height” (geometrically distributed with parameter ½)
● Elements at the same height form their own lists
● Search, Insert, and Delete take logarithmic time on average and can run concurrently [Pugh98, Fraser04]

head → H 1 3 4 5 9 … T ← tail

Search(5): descend from the top level — [H, 9] → [1, 9] → [5, 9] — stop, found!
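A minimal sequential Python sketch of the skiplist just described (class and method names are mine, not from the talk; the concurrent versions cited above add synchronization on top of this structure):

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor at level i

class SkipList:
    """Minimal sequential skiplist: a sorted bottom list plus express lanes."""
    MAX_HEIGHT = 32

    def __init__(self):
        self.head = Node(float("-inf"), self.MAX_HEIGHT)

    def _random_height(self):
        # Geometric with parameter 1/2: flip coins until tails.
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1
        return h

    def search(self, key):
        cur = self.head
        for i in reversed(range(self.MAX_HEIGHT)):  # top level down
            while cur.next[i] is not None and cur.next[i].key < key:
                cur = cur.next[i]
        cur = cur.next[0]
        return cur is not None and cur.key == key

    def insert(self, key):
        update = [self.head] * self.MAX_HEIGHT
        cur = self.head
        for i in reversed(range(self.MAX_HEIGHT)):
            while cur.next[i] is not None and cur.next[i].key < key:
                cur = cur.next[i]
            update[i] = cur                  # last node before key, per level
        node = Node(key, self._random_height())
        for i in range(len(node.next)):      # splice into each level
            node.next[i] = update[i].next[i]
            update[i].next[i] = node

sl = SkipList()
for k in [1, 3, 4, 5, 9]:
    sl.insert(k)
print(sl.search(5))  # True
print(sl.search(2))  # False
```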
The SkipList as a PQ [I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. 2000]
● DeleteMin: simply remove the smallest element from the bottom list
● All processors compete for the smallest element
● Does not scale!
The Idea: Relax!
● We want to choose an item at random, with ‘good’ guarantees
● Minimize loss of exactness by choosing only items near the front of the list
● Minimize contention by keeping the collision probability low

DeleteMin: The Spray [Alistarh, Kopinsky, Li, Shavit, PPoPP 2015]
procedure Spray()
● At each skiplist level, flip a coin to stay or jump forward
● Repeat for each level, from log n down to 1 (the bottom)
● As if removing a random element with priority near the head
(Two example walks from starting height 4: jump, stay, jump, jump, …)
Spray and pray?

SprayList Probabilistic Guarantees
Let p(x) = the probability that a Spray returns the value at index x.
✓ The maximum value returned by a Spray has rank Õ(𝑛) — Sprays aren’t too wide
✓ For all x, p(x) = Õ(1/𝑛) — Sprays don’t cluster too much
✓ If x > y is returned by some Spray, then p(y) = Ω(1/𝑛) — elements do not starve in the list
The Benchmark
• Discrete Event Simulation
• Exact algorithms show negative scaling beyond 8 threads
• The SprayList is competitive with a random remover (which offers no guarantees — an incorrect execution)
In many practical settings (Discrete Event Simulation, shortest paths), priority inversions are not expensive.
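Why are priority inversions cheap in shortest paths? A sketch of Dijkstra's algorithm over a deliberately relaxed queue illustrates the point (the relaxed pop here is my own stand-in for a Spray, not the talk's implementation): popping a non-minimal node only causes some re-processing, because a node is re-inserted whenever its distance improves, so the final distances are still exact.

```python
import heapq
import random

def relaxed_pop(heap, width=4):
    """Pop one of the `width` smallest entries at random,
    simulating a relaxed DeleteMin (a priority inversion)."""
    k = min(width, len(heap))
    candidates = [heapq.heappop(heap) for _ in range(k)]
    chosen = candidates.pop(random.randrange(k))
    for c in candidates:
        heapq.heappush(heap, c)
    return chosen

def dijkstra_relaxed(graph, src):
    """Shortest paths using a relaxed priority queue.

    Out-of-order pops turn this into a label-correcting algorithm:
    extra work, but the same exact answers."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = relaxed_pop(heap)
        if d > dist.get(u, float("inf")):
            continue                         # stale entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph: exact answers despite out-of-order pops.
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)]}
print(dijkstra_relaxed(g, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```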
node* DeleteMin() {
    cur <- head;                  // starting node
    i <- log n;                   // starting height
    while (i > 0) {
        repeat rand(0, 1) times { // stay or jump forward?
            cur <- cur->next[i];
        }
        i <- i - 1;               // descend one level
    }
    v <- cur->val;                // reached the bottom list
    flag <- Compare-and-Swap(cur->val, v, NULL); // try to acquire the node
    if (flag == SUCCESS) return cur;
    else RETRY;                   // optimistic: spray again
}
DeleteMin: The Spray
The SprayList relaxes progress as well: a failed Compare-and-Swap does not block other threads — the caller simply retries with a fresh Spray.
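The Spray walk above can be sketched as a sequential Python simulation over an idealized skiplist, to check that returned ranks stay near the front of the list. One assumption (an idealization, not from the talk): a pointer at level i skips about 2^i bottom-level elements.

```python
import math
import random

def spray_rank(n):
    """Simulate one Spray over an idealized n-element skiplist.

    Start at height log2(n); at each level flip a coin (rand(0, 1))
    to stay or jump once, where a level-i jump skips ~2**i elements.
    Returns the bottom-list rank the walk lands on.
    """
    rank = 0
    for i in range(int(math.log2(n)), 0, -1):  # log n down to 1
        for _ in range(random.randint(0, 1)):  # stay or jump forward
            rank += 2 ** i
    return rank

# Sprays stay near the head: the worst case is sum_{i=1}^{log n} 2^i < 2n.
ranks = [spray_rank(1024) for _ in range(10_000)]
print(max(ranks))  # always below 2 * 1024
```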
Relaxed Data Structures
The data structures of our childhood are changing.
A relaxation renaissance
[KarpZhang93], [DeoP92], [Sanders98], [HenzingerKPSS13], [NguyenLP13], [WimmerCVTT14],
[LenhartNP15], [RihaniSD15], [JeffreySYES16]
The SprayList merges both relaxed semantics and optimistic progress to achieve scalability.
My Research
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
Algorithms and Data Structures.
Architectures and Systems.
Applications.
Interested?
Internship / Master / PhD.
What’s Next?
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
Algorithms and Data Structures.
Architectures and Systems.
Applications.
Next-Generation Data Structures
Distributed Machine Learning
Population Protocols
Low-Latency Transactional Systems
Optimization for the Cloud
Large-Scale Graph Processing
Backup: SprayList Shortest Paths
(Same Spray pseudocode as above.)
The parameters — the starting height (log n) and the per-level jump length (rand(0, 1)) — can be tuned!
DeleteMin: The Spray
The Scheduler (Intel™ machine, single socket)
The Stochastic Scheduler Model
• Short version: every (non-faulty) thread can be scheduled in each step with probability > 0.
• Definition: A scheduler is a triple (Dt, At, θ)t > 0, where Dt is the scheduling distribution at time t, At is the set of active processes at time t, and θ is a probability threshold, such that:
  • at time t, scheduling probabilities are given by Dt;
  • only processes in At can be scheduled at time t;
  • each process in At is scheduled with probability ≥ θ.
A scheduler is stochastic if θ > 0.
* Lottery OS Scheduling, e.g. [Petrou, Milford, Gibson]
Examples
• Assume n processes.
• The uniform stochastic scheduler: θ = 1 / n; each process is scheduled uniformly at random.
• A standard shared-memory adversary: take any adversarial strategy, and let Dt give probability 1 to the process picked by the strategy and 0 to all others. Not stochastic (θ = 0).
• Quantum-based schedulers are also easily modeled.
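The uniform stochastic scheduler from the first example can be sketched in Python (a toy simulation with names of my choosing, not an OS scheduler):

```python
import random

def uniform_scheduler(n_processes, steps, seed=0):
    """Uniform stochastic scheduler: at each step, every active process
    is chosen with probability 1/n, so theta = 1/n > 0."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.randrange(n_processes) for _ in range(steps)]

counts = [0] * 4
for p in uniform_scheduler(4, 10_000):
    counts[p] += 1
print(counts)  # each process runs roughly 2500 times
```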