Data Structures of the Future - Diaspora Stiintifica – 2016


Post on 19-Jul-2020


Data Structures of the Future: Concurrent, Optimistic, and Relaxed

Dan Alistarh ETH Zurich


Background

• The First Graph Problem (Euler, 1735): The Bridges of Königsberg
  • 4 vertices, 7 edges

• Graph Problems, circa 2010
  • Social Graphs: 1 billion vertices, 100 billion edges (~1 TB storage)

• Graph Problems Today
  • Web/Office/Brain Graphs: 100 billion vertices, 100 trillion edges
  • More than 1 PetaByte storage

We distribute computation across processors, computers, and data centers.

This changes the way data structures are designed, built, and deployed.


Why Concurrent?

To get speedup on newer hardware.

Scaling: more threads should imply more useful work.


The Problem with Concurrency

[Figure: Throughput of Parallel Event Processing. x-axis: Number of Threads (0 to 70); y-axis: Throughput (Events/Second), 0 to 6.00E+06.]

Concurrency can be very bad value for money.

Is this problem inherent?

< $1000 / machine

> $10000 / machine

[Diagram: Event Queue holding event1, event2, event3]


Inherent Sequential Bottlenecks

Data structures with strong ordering semantics
• Stacks, Queues, Priority Queues, Exact Counters

This is important because of Amdahl’s Law
• Assume single-threaded computation takes 1 week
• Inherently sequential component (e.g., queue) takes 15% ≈ 1 day
• Then maximum speedup < 7x, even with infinite threads
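The arithmetic behind this bound can be checked with a minimal sketch of Amdahl's Law. The function names are illustrative and not from the talk; the 15% sequential fraction is the figure quoted above.

```python
def amdahl_speedup(sequential_fraction: float, threads: int) -> float:
    """Speedup over one thread when only the parallel part is split among threads."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / threads)


def amdahl_limit(sequential_fraction: float) -> float:
    """Upper bound on the speedup as the number of threads grows without bound."""
    return 1.0 / sequential_fraction


if __name__ == "__main__":
    for threads in (1, 8, 64, 1024):
        print(threads, round(amdahl_speedup(0.15, threads), 2))
    print("limit", round(amdahl_limit(0.15), 2))
```

With a 15% sequential component the limit is 1 / 0.15, which stays below 7x no matter how many threads are added.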

Theorem: Assuming n threads, any deterministic, strongly ordered data structure has an execution in which a processor takes time linear in n to return. [Alistarh, Aspnes, Gilbert, Guerraoui, Journal of the ACM, 2014]

To get performance, it is critical to speed up shared data structures.


Concurrent Data Structures

Algorithms, data structures, and architectures for scalable distributed computation.

Theory ↔ Software ↔ Hardware

New Algorithmic and Analytic Ideas.

New Hardware Designs!

New Data Structures!


Discrete Event Simulation

[Figure: task queue with keys 1, 3, 4, 5, 7, 8, 11, 15, 18]

Methods:
• Get Top Task
• Insert a Task
• Search for Task

Priority Queue <key, value>
• Search(key)
• Insert/Delete(k, v)
• DeleteMin()

Extremely useful:
• Graph Operations (Shortest Paths)
• Operating System Kernel
• Time-Based Simulations

We are looking for a fast concurrent Priority Queue.
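To make the role of the priority queue concrete, here is a minimal sequential sketch of a discrete event simulation loop built on Python's `heapq`. The event format `(time, name)`, the function names, and the arrival/departure rule are assumptions chosen for illustration; the talk does not specify them.

```python
import heapq


def run_simulation(initial_events, handler, limit=1000):
    """Process events in timestamp order; the handler may schedule follow-up events."""
    queue = list(initial_events)
    heapq.heapify(queue)
    processed = []
    while queue and len(processed) < limit:
        time, name = heapq.heappop(queue)      # DeleteMin: earliest pending event
        processed.append((time, name))
        for event in handler(time, name):      # Insert: newly scheduled events
            heapq.heappush(queue, event)
    return processed


def example_handler(time, name):
    # Hypothetical rule: each "arrival" schedules a "departure" 2 time units later.
    if name == "arrival":
        return [(time + 2, "departure")]
    return []


if __name__ == "__main__":
    print(run_simulation([(1, "arrival"), (4, "arrival")], example_handler))
```

Every iteration goes through DeleteMin, which is why this operation becomes the bottleneck once many threads share the queue.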


The Problem

Target: fast, concurrent Priority Queue.

Lots of work on the topic: [Sanders97], [Lotan&Shavit00], [Sundell&Tsigas07], [Linden&Jonsson13], [Lenhart et al. 14], [Wimmer et al. 14]

Current solutions are hard to scale: DeleteMin is highly contended.

Everyone wants the same element!


Concurrent Solution: the SkipList [Pugh90]

● Linked list, sorted by priority
● Each node has random “height” (geometrically distributed with parameter ½)
● Elements at the same height form their own lists
● Search, Insert, and Delete take logarithmic time on average and work concurrently [Pugh98, Fraser04]

[Figure: skiplist from head (H) to tail (T) with keys H 1 3 4 5 9 … T. Search( 5 ) narrows the interval level by level: [H, 9], [H, 9], [1, 9], [5, 9], then stops.]
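The structure described above can be sketched as a small sequential skiplist. This is a single-threaded illustration only: the concurrent versions cited on the slide need atomic pointer updates that are not shown here, and the class and method names are assumptions.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # next[i]: successor in the level-i list


class SkipList:
    def __init__(self, max_height=16, seed=None):
        self.max_height = max_height
        self.head = Node(None, max_height)   # sentinel; its key is never compared
        self.rng = random.Random(seed)

    def _random_height(self):
        # Geometric with parameter 1/2: grow while a fair coin keeps coming up heads.
        height = 1
        while height < self.max_height and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        # For every level, the last node whose key is strictly smaller than `key`.
        preds = [self.head] * self.max_height
        cur = self.head
        for level in range(self.max_height - 1, -1, -1):
            while cur.next[level] is not None and cur.next[level].key < key:
                cur = cur.next[level]
            preds[level] = cur
        return preds

    def search(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        preds = self._predecessors(key)
        node = Node(key, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def keys(self):
        cur, out = self.head.next[0], []
        while cur is not None:
            out.append(cur.key)
            cur = cur.next[0]
        return out
```

The search starts at the top level of the head and only drops a level when the next key would overshoot, which is what gives the logarithmic expected cost.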


The SkipList as a PQ

● DeleteMin: simply remove the smallest element from the bottom list
● All processors compete for smallest element
● Does not scale!

I. Lotan and N. Shavit. Skiplist-Based Concurrent Priority Queues. 2000.


The Idea: Relax!

● We want to choose an item at random with ‘good’ guarantees
● Minimize loss of exactness by only choosing items near the front of the list
● Minimize contention by keeping collision probability low
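A toy model illustrates the trade-off. Assume each thread independently picks a uniformly random index among the first `window` items, and every duplicate pick counts as a lost race. The uniform choice is an assumption made here for simplicity and is not the distribution used by the SprayList.

```python
import random


def collisions_per_round(threads, window, rounds=2000, seed=0):
    """Average number of threads per round whose pick duplicates another thread's pick."""
    rng = random.Random(seed)
    wasted = 0
    for _ in range(rounds):
        picks = [rng.randrange(window) for _ in range(threads)]
        wasted += threads - len(set(picks))   # each duplicate pick loses the race
    return wasted / rounds


if __name__ == "__main__":
    for window in (1, 8, 64):
        print(window, collisions_per_round(threads=8, window=window))
```

With `window=1` (exact DeleteMin) all but one thread collide in every round; widening the window trades exactness for fewer collisions.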


DeleteMin: The Spray [Alistarh, Kopinsky, Li, Shavit, PPoPP 2015]

procedure Spray()
● At each skiplist level, flip coin to stay or jump forward
● Repeat for each level from log n down to 1 (the bottom)
● As if removing a random priority element near the head

[Figure: two example sprays for starting height 4, with per-level choices labeled jump, stay, jump, jump]

Spray and pray?


SprayList Probabilistic Guarantees

p(x) = probability that a spray returns the value at index x

✓ Maximum value returned by Spray has rank Õ(n)
  ‐ Sprays aren’t too wide
✓ For all x, p(x) = Õ(1/n)
  ‐ Sprays don’t cluster too much
✓ If x > y is returned by some Spray, then p(y) = Ω(1/n)
  ‐ Elements do not starve in the list


The Benchmark

• Discrete Event Simulation
  • Exact algorithms have negative scaling after 8 threads
  • SprayList competitive with the random remover (no guarantees, incorrect execution)

In many practical settings (Discrete Event Simulation, shortest paths), priority inversions are not expensive.


DeleteMin: The Spray

node* DeleteMin ( ) {
    cur <- head;                    //starting node
    i <- log n;                     //starting height
    while (i > 0) {
        repeat(rand(0, 1)) {        //stay or skip?
            cur <- cur->next[i];
        };
        i <- i-1;                   //decrease level
    }
    v <- cur->val;                  //reached bottom
    if ( v == NULL ) RETRY;         //node was already acquired by another thread
    flag <- Compare-and-Swap ( cur->val, v, NULL ); //acquire node
    if ( flag == SUCCESS ) return cur;
    else RETRY;
}

The SprayList relaxes progress as well!
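The pseudocode can be exercised with a single-threaded Python sketch. Several things here are assumptions made for illustration: the skiplist is built deterministically ("perfect" heights) so the walk is easy to follow, levels are indexed from 0, a `taken` flag stands in for the Compare-and-Swap on the value field, and instead of retrying, the sketch walks forward to the next free node so that it always terminates without physically unlinking nodes.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # next[i]: successor in the level-i list
        self.taken = False            # sequential stand-in for the CAS-ed value field


def build_skiplist(sorted_keys, max_height):
    """Perfect skiplist: the j-th key gets height 1 + (number of trailing zero bits of j)."""
    head = Node(None, max_height)
    last = [head] * max_height        # last node linked so far at each level
    for j, key in enumerate(sorted_keys, start=1):
        height = min(max_height, (j & -j).bit_length())
        node = Node(key, height)
        for level in range(height):
            last[level].next[level] = node
            last[level] = node
    return head


def spray(head, start_level, rng):
    """Descend from start_level to level 0, flipping a coin at each level: stay or jump."""
    cur = head
    for level in range(start_level, -1, -1):
        if rng.random() < 0.5 and cur.next[level] is not None:
            cur = cur.next[level]
    return head.next[0] if cur is head else cur


def delete_min(head, start_level, rng):
    """Claim a free node near the landing point of a spray; None if the list is exhausted."""
    landing = spray(head, start_level, rng)
    for start in (landing, head.next[0]):   # second pass covers nodes before the landing point
        cur = start
        while cur is not None:
            if not cur.taken:
                cur.taken = True             # in a concurrent setting this would be a CAS
                return cur.key
            cur = cur.next[0]
    return None


if __name__ == "__main__":
    head = build_skiplist(list(range(1, 65)), max_height=7)
    rng = random.Random(0)
    print([delete_min(head, 3, rng) for _ in range(10)])
```

With starting level 3 on this perfect skiplist, a spray can land only on one of the first 15 keys, which mirrors the "not too wide" property claimed for sprays.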


Relaxed Data Structures

The data structures of our childhood are changing.

A relaxation renaissance

[KarpZhang93], [DeoP92], [Sanders98], [HenzingerKPSS13], [NguyenLP13], [WimmerCVTT14], [LenhartNP15], [RihaniSD15], [JeffreySYES16]

The SprayList combines relaxed semantics with optimistic progress to achieve scalability.


My Research

Algorithms, data structures, and architectures for scalable distributed computation.

Theory ↔ Software ↔ Hardware

Algorithms and Data Structures.

Architectures and Systems.

Applications.


Interested?

Internship / Master / PhD.


What’s Next?

Algorithms, data structures, and architectures for scalable distributed computation.

Theory ↔ Software ↔ Hardware

Algorithms and Data Structures.

Architectures and Systems.

Applications.

Next-Generation Data Structures

Distributed Machine Learning

Population Protocols

Low-Latency Transactional Systems

Optimization for the Cloud

Large-Scale Graph Processing


Backup: SprayList Shortest Paths


DeleteMin: The Spray

node* DeleteMin ( ) {
    cur <- head;                    //starting node
    i <- log n;                     //starting height
    while (i > 0) {
        repeat(rand(0, 1)) {        //stay or skip?
            cur <- cur->next[i];
        };
        i <- i-1;                   //decrease level
    }
    v <- cur->val;                  //reached bottom
    if ( v == NULL ) RETRY;         //node was already acquired by another thread
    flag <- Compare-and-Swap ( cur->val, v, NULL ); //acquire node
    if ( flag == SUCCESS ) return cur;
    else RETRY;
}

Parameters in red can be tuned!


The Scheduler (Intel™ machine, single socket)


The Stochastic Scheduler Model

• Short version: Every (non-faulty) thread can be scheduled in each step, with probability > 0.

• Definition: A scheduler is a triple (D_t, A_t, θ)_{t > 0}, where D_t is the distribution at time t, A_t is the active set at time t, and θ is the probability threshold, such that
  • At time t, scheduling probabilities are given by D_t
  • Only processes in A_t can be scheduled at t
  • Each process in A_t is scheduled with probability ≥ θ.

A scheduler is stochastic if θ > 0.

* Lottery OS Scheduling, e.g. [Petrou, Milford, Gibson]


Examples

• Assume n processes

• The uniform stochastic scheduler:
  • θ = 1 / n
  • Each process gets scheduled uniformly

• A standard shared-memory adversary:
  • Take any adversarial strategy
  • Take D_t to give probability 1 to the process picked by the strategy, 0 to all others
  • Not stochastic

• Quantum-based schedulers also easily modeled
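The two examples can be written down as a small sketch, assuming the distribution D_t is represented as a dictionary from process id to probability. The function names are illustrative and not part of the model's definition.

```python
import random


def uniform_distribution(active):
    """D_t of the uniform stochastic scheduler: each active process has probability 1/|A_t|."""
    return {p: 1.0 / len(active) for p in active}


def adversarial_distribution(active, chosen):
    """D_t of an adversary: probability 1 on the process it picks, 0 on all others."""
    return {p: (1.0 if p == chosen else 0.0) for p in active}


def threshold(distribution, active):
    """Largest theta such that every active process is scheduled with probability >= theta."""
    return min(distribution[p] for p in active)


def schedule_step(distribution, rng):
    """Sample the process scheduled in one step according to D_t."""
    processes = sorted(distribution)
    weights = [distribution[p] for p in processes]
    return rng.choices(processes, weights=weights, k=1)[0]


if __name__ == "__main__":
    active = {0, 1, 2, 3}
    print(threshold(uniform_distribution(active), active))          # theta = 1/n
    print(threshold(adversarial_distribution(active, 2), active))   # theta = 0: not stochastic
```

A scheduler is stochastic exactly when `threshold` stays above 0 at every step, which holds for the uniform scheduler and fails for the adversary whenever more than one process is active.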