Data Structures of the Future – Diaspora Științifică, 2016

TRANSCRIPT
Data Structures of the Future: Concurrent, Optimistic, and Relaxed
Dan Alistarh ETH Zurich
Background
• The First Graph Problem (Euler, 1735) • 4 vertices, 7 edges
• Graph Problems, circa 2010 • Social Graphs: 1 billion vertices, 100 billion edges (~1 TB storage)
• Graph Problems Today • Web/Office/Brain Graphs: 100 billion vertices, 100 trillion edges
• More than 1 PetaByte storage
The Bridges of Königsberg
We distribute computation across processors, computers, and data centers.
This changes the way data structures are designed, built, and deployed.
Why Concurrent?
To get speedup on newer hardware.
Scaling: more threads should imply more useful work.
The Problem with Concurrency
[Figure: Throughput of Parallel Event Processing — throughput (events/second, 0 to 6×10⁶) vs. number of threads (0 to 70)]
Concurrency can be very bad value for money.
Is this problem inherent?
< $1000 / machine vs. > $10000 / machine

[Diagram: shared Event Queue holding event1, event2, event3]
Inherent Sequential Bottlenecks
Data structures with strong ordering semantics • Stacks, Queues, Priority Queues, Exact Counters
This is important because of Amdahl’s Law • Assume single-threaded computation takes 1 week
• Inherently sequential component (e.g., queue) takes 15% = 1 day
• Then maximum speedup < 7x, even with infinite threads
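The bound above follows directly from Amdahl's Law; a quick check in Python (the function name is mine, not from the talk):

```python
def max_speedup(sequential_fraction, threads=float("inf")):
    """Amdahl's Law: speedup achievable with a fixed sequential fraction s.

    With infinitely many threads, the parallel part vanishes,
    so the speedup is capped at 1 / s.
    """
    s = sequential_fraction
    if threads == float("inf"):
        return 1.0 / s
    return 1.0 / (s + (1.0 - s) / threads)

# A week-long job whose sequential queue accounts for 15% (~1 day):
print(max_speedup(0.15))      # cap at 1/0.15 ≈ 6.67, i.e. < 7x
print(max_speedup(0.15, 64))  # even 64 threads stay well below the cap
```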
Theorem: Assuming n threads, any deterministic, strongly ordered data structure has an execution in which a processor takes time linear in n to return. [Alistarh, Aspnes, Gilbert, Guerraoui, Journal of the ACM, 2014]
To get performance, it is critical to speed up shared data structures.
Concurrent Data Structures
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
New Algorithmic and Analytic Ideas.
New Hardware Designs!
New Data Structures!
Discrete Event Simulation

Tasks in the queue: task 1, task 3, task 4, task 5, task 7, task 8, task 11, task 15, task 18

Methods:
• Get Top Task
• Insert a Task
• Search for a Task
Priority Queue <key, value>
Search(key)
Insert/Delete(k, v)
DeleteMin()
Extremely useful:
• Graph Operations (Shortest Paths)
• Operating System Kernel
• Time-Based Simulations
We are looking for a fast concurrent Priority Queue.
The Problem
Target: fast, concurrent Priority Queue.
Lots of work on the topic: [Sanders97], [Lotan&Shavit00], [Sundell&Tsigas07],
[Linden&Jonsson13], [Lenhart et al. 14], [Wimmer et al.14]
Current solutions are hard to scale: DeleteMin is highly contended.
Everyone wants the same element!
Concurrent Solution: the SkipList [Pugh90]
● Linked list, sorted by priority
● Each node has a random “height” (geometrically distributed with parameter ½)
● Elements at the same height form their own lists
● Search, Insert, and Delete take logarithmic time on average and can run concurrently [Pugh98, Fraser04]

head → H 1 3 4 5 9 … T ← tail

Search(5): descend from the top level — [H, 9] → [1, 9] → [5, 9] — stop, found!
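A minimal sequential Python sketch of the skiplist just described (class and method names are mine, not from the talk; the concurrent versions cited above add synchronization on top of this structure):

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] = successor at level i

class SkipList:
    """Minimal sequential skiplist: a sorted bottom list plus express lanes."""
    MAX_HEIGHT = 32

    def __init__(self):
        self.head = Node(float("-inf"), self.MAX_HEIGHT)

    def _random_height(self):
        # Geometric with parameter 1/2: flip coins until tails.
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1
        return h

    def search(self, key):
        cur = self.head
        for i in reversed(range(self.MAX_HEIGHT)):  # top level down
            while cur.next[i] is not None and cur.next[i].key < key:
                cur = cur.next[i]
        cur = cur.next[0]
        return cur is not None and cur.key == key

    def insert(self, key):
        update = [self.head] * self.MAX_HEIGHT
        cur = self.head
        for i in reversed(range(self.MAX_HEIGHT)):
            while cur.next[i] is not None and cur.next[i].key < key:
                cur = cur.next[i]
            update[i] = cur                  # last node before key, per level
        node = Node(key, self._random_height())
        for i in range(len(node.next)):      # splice into each level
            node.next[i] = update[i].next[i]
            update[i].next[i] = node

sl = SkipList()
for k in [1, 3, 4, 5, 9]:
    sl.insert(k)
print(sl.search(5))  # True
print(sl.search(2))  # False
```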
The SkipList as a PQ [I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. 2000]
● DeleteMin: simply remove the smallest element from the bottom list
● All processors compete for the smallest element
● Does not scale!
The Idea: Relax!
● We want to choose an item at random, with ‘good’ guarantees
● Minimize loss of exactness by choosing only items near the front of the list
● Minimize contention by keeping the collision probability low

DeleteMin: The Spray [Alistarh, Kopinsky, Li, Shavit, PPoPP 2015]
procedure Spray()
● At each skiplist level, flip a coin to stay or jump forward
● Repeat for each level, from log n down to 1 (the bottom)
● As if removing a random element with priority near the head
(Two example walks from starting height 4: jump, stay, jump, jump, …)
Spray and pray?

SprayList Probabilistic Guarantees
Let p(x) = the probability that a Spray returns the value at index x.
✓ The maximum value returned by a Spray has rank Õ(𝑛) — Sprays aren’t too wide
✓ For all x, p(x) = Õ(1/𝑛) — Sprays don’t cluster too much
✓ If x > y is returned by some Spray, then p(y) = Ω(1/𝑛) — elements do not starve in the list
The Benchmark
• Discrete Event Simulation
• Exact algorithms show negative scaling beyond 8 threads
• The SprayList is competitive with a random remover (which offers no guarantees — an incorrect execution)
In many practical settings (Discrete Event Simulation, shortest paths), priority inversions are not expensive.
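Why are priority inversions cheap in shortest paths? A sketch of Dijkstra's algorithm over a deliberately relaxed queue illustrates the point (the relaxed pop here is my own stand-in for a Spray, not the talk's implementation): popping a non-minimal node only causes some re-processing, because a node is re-inserted whenever its distance improves, so the final distances are still exact.

```python
import heapq
import random

def relaxed_pop(heap, width=4):
    """Pop one of the `width` smallest entries at random,
    simulating a relaxed DeleteMin (a priority inversion)."""
    k = min(width, len(heap))
    candidates = [heapq.heappop(heap) for _ in range(k)]
    chosen = candidates.pop(random.randrange(k))
    for c in candidates:
        heapq.heappush(heap, c)
    return chosen

def dijkstra_relaxed(graph, src):
    """Shortest paths using a relaxed priority queue.

    Out-of-order pops turn this into a label-correcting algorithm:
    extra work, but the same exact answers."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = relaxed_pop(heap)
        if d > dist.get(u, float("inf")):
            continue                         # stale entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph: exact answers despite out-of-order pops.
g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)]}
print(dijkstra_relaxed(g, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```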
node* DeleteMin() {
    cur <- head;                  // starting node
    i <- log n;                   // starting height
    while (i > 0) {
        repeat rand(0, 1) times { // stay or jump forward?
            cur <- cur->next[i];
        }
        i <- i - 1;               // descend one level
    }
    v <- cur->val;                // reached the bottom list
    flag <- Compare-and-Swap(cur->val, v, NULL); // try to acquire the node
    if (flag == SUCCESS) return cur;
    else RETRY;                   // optimistic: spray again
}
DeleteMin: The Spray
The SprayList relaxes progress as well: a failed Compare-and-Swap does not block other threads — the caller simply retries with a fresh Spray.
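The Spray walk above can be sketched as a sequential Python simulation over an idealized skiplist, to check that returned ranks stay near the front of the list. One assumption (an idealization, not from the talk): a pointer at level i skips about 2^i bottom-level elements.

```python
import math
import random

def spray_rank(n):
    """Simulate one Spray over an idealized n-element skiplist.

    Start at height log2(n); at each level flip a coin (rand(0, 1))
    to stay or jump once, where a level-i jump skips ~2**i elements.
    Returns the bottom-list rank the walk lands on.
    """
    rank = 0
    for i in range(int(math.log2(n)), 0, -1):  # log n down to 1
        for _ in range(random.randint(0, 1)):  # stay or jump forward
            rank += 2 ** i
    return rank

# Sprays stay near the head: the worst case is sum_{i=1}^{log n} 2^i < 2n.
ranks = [spray_rank(1024) for _ in range(10_000)]
print(max(ranks))  # always below 2 * 1024
```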
Relaxed Data Structures
The data structures of our childhood are changing.
A relaxation renaissance
[KarpZhang93], [DeoP92], [Sanders98], [HenzingerKPSS13], [NguyenLP13], [WimmerCVTT14],
[LenhartNP15], [RihaniSD15], [JeffreySYES16]
The SprayList merges both relaxed semantics and optimistic progress to achieve scalability.
My Research
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
Algorithms and Data Structures.
Architectures and Systems.
Applications.
Interested?
Internship / Master / PhD.
What’s Next?
Algorithms, data structures, and architectures for scalable distributed computation.
Theory ↔ Software ↔ Hardware
Algorithms and Data Structures.
Architectures and Systems.
Applications.
Next-Generation Data Structures
Distributed Machine Learning
Population Protocols
Low-Latency Transactional Systems
Optimization for the Cloud
Large-Scale Graph Processing
Backup: SprayList Shortest Paths
(Same Spray pseudocode as above.)
The parameters — the starting height (log n) and the per-level jump length (rand(0, 1)) — can be tuned!
DeleteMin: The Spray
The Scheduler (Intel™ machine, single socket)
The Stochastic Scheduler Model
• Short version: every (non-faulty) thread can be scheduled in each step with probability > 0.
• Definition: A scheduler is a triple (Dt, At, θ)t > 0, where Dt is the scheduling distribution at time t, At is the set of active processes at time t, and θ is a probability threshold, such that:
  • at time t, scheduling probabilities are given by Dt;
  • only processes in At can be scheduled at time t;
  • each process in At is scheduled with probability ≥ θ.
A scheduler is stochastic if θ > 0.
* Lottery OS Scheduling, e.g. [Petrou, Milford, Gibson]
Examples
• Assume n processes.
• The uniform stochastic scheduler: θ = 1 / n; each process is scheduled uniformly at random.
• A standard shared-memory adversary: take any adversarial strategy, and let Dt give probability 1 to the process picked by the strategy and 0 to all others. Not stochastic (θ = 0).
• Quantum-based schedulers are also easily modeled.
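The uniform stochastic scheduler from the first example can be sketched in Python (a toy simulation with names of my choosing, not an OS scheduler):

```python
import random

def uniform_scheduler(n_processes, steps, seed=0):
    """Uniform stochastic scheduler: at each step, every active process
    is chosen with probability 1/n, so theta = 1/n > 0."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.randrange(n_processes) for _ in range(steps)]

counts = [0] * 4
for p in uniform_scheduler(4, 10_000):
    counts[p] += 1
print(counts)  # each process runs roughly 2500 times
```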