

Sanders: Algorithm Engineering, April 22, 2014

Algorithm Engineering for Fundamental Data Structures and Algorithms

Peter Sanders

What are the fastest implemented algorithms for the ABCs of algorithmics: lists, sorting, priority queues, sorted lists, hash tables, graph algorithms?

Useful Prior Knowledge

Algorithmen I

Algorithmen II

some computer architecture (or read c't ;-)

passive knowledge of C/C++

Specialization area: Algorithmics

Material

Slides

Scientific papers: see the lecture homepage

Basic knowledge: algorithm textbooks, e.g. Mehlhorn/Sanders, Cormen et al.

Mehlhorn, Näher: The LEDA Platform of Combinatorial and Geometric Computing. Good for the advanced papers.

Additional Assignment

Choice between a talk and a programming project. For topics see the website. Counts for 20% of the grade.

Talks: 20 min talk + 10 min discussion

Block session towards the end of, or shortly after, the lecture period

No written report

Usually: read one paper and present it understandably

Submit well-advanced draft slides at least 1 week beforehand

Programming Projects

5–10 min talk + 5 min discussion

2-page write-up + sources, deadline 1.4.2013 (preferably earlier)

Afterwards, improvements are still possible until the start of the lecture period

Overview

What is algorithm engineering, models, ...

First steps: arrays, linked lists, stacks, FIFOs, ...

Sorting up and down

Priority queues

Sorted lists

Hash tables

Minimum spanning trees

Shortest paths

Selected advanced algorithms, e.g. maximum flows

Methodology: in excursuses

Algorithmics

= the systematic design of efficient software and hardware

[Figure: diagram placing algorithmics within computer science; labels: logics, correct, efficient, soft- & hardware, theoretical, practical]

(Caricatured) Traditional View: Algorithm Theory

[Figure: on the theory side, models, design, analysis, and performance guarantees linked by deduction; on the practice side, implementation and applications]

Gaps Between Theory & Practice

Theory                                  Practice
simple            appl. model           complex
simple            machine model         real
complex           algorithms            FOR simple
advanced          data structures       arrays, ...
worst case, max   complexity measure    inputs
asympt. O(·)      efficiency            42% constant factors

Algorithmics as Algorithm Engineering

[Figure, built up over several slides: the algorithm engineering cycle. Realistic models, design, analysis, implementation, and experiments form a loop driven by falsifiable hypotheses (induction); analysis yields performance guarantees (deduction); implementations feed algorithm libraries; experiments use real inputs; application engineering connects the cycle to applications]

Goals

bridge gaps between theory and practice

accelerate transfer of algorithmic results into applications

keep the advantages of theoretical treatment: generality of solutions, and reliability and predictability from performance guarantees

[Figure: the algorithm engineering cycle]

Bits of History

1843–          Algorithms in theory and practice
1950s, 1960s   Still infancy
1970s, 1980s   Paper and pencil algorithm theory. Exceptions exist, e.g., [J. Bentley, D. Johnson]
1986           Term used by [T. Beth], lecture "Algorithmentechnik" in Karlsruhe
1988–          Library of Efficient Data Types and Algorithms (LEDA) [K. Mehlhorn]
1990–          DIMACS Implementation Challenges [D. Johnson]
1997–          Workshop on Algorithm Engineering; ESA applied track [G. Italiano]
1997           Term used in US policy paper [Aho, Johnson, Karp, et al.]
1998           Alex workshop in Italy; ALENEX

Why This Lecture?

Every computer scientist knows some textbook algorithms, so we can get started with algorithm engineering right away

Many applications benefit

It is striking that there is still uncharted territory here

Basis for bachelor's and master's theses

What This Lecture Is Not:

Not a rehash of Algorithmen I/II or the like

Introductory courses often "simplify" the truth

Partly advanced algorithms

Steeper learning curve

Implementation details

Emphasis on measurement results

What This Lecture Is Not:

Not a theory lecture

No (few?) proofs

Real performance before asymptotics

What This Lecture Is Not:

Not an implementation lecture

Some algorithm analysis, ...

Little software engineering

Excursus: Machine Models

RAM / von Neumann model

[Figure: ALU with O(1) registers attached to a large, freely programmable memory; 1 word = O(log n) bits]

Analysis: count machine instructions: load, store, arithmetic, branch, ...

Simple

Very successful

Increasingly unrealistic, because real hardware is becoming ever more complex

The Secondary Memory Model

[Figure: ALU with registers attached to a freely programmable fast memory of capacity M; data moves in blocks of B words between the fast memory and a large memory]

M: fast memory of size M

B: block size

Analysis: count (only?) block accesses (I/Os)
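As a small illustration of counting I/Os in this model, the following sketch simulates a scan over n consecutive words when data moves in blocks of B words. The function name `scanIOs` and the convention of charging one I/O per newly touched block are assumptions made for this example; they are not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of block transfers (I/Os) needed to read
// n consecutive words that start at a block boundary, with block size B.
// Every block is fetched exactly once, so the result equals ceil(n / B).
std::size_t scanIOs(std::size_t n, std::size_t B) {
    assert(B > 0);
    std::size_t ios = 0;
    std::size_t currentBlock = static_cast<std::size_t>(-1);  // nothing loaded yet
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t block = i / B;   // block that holds word i
        if (block != currentBlock) {       // not in fast memory: one I/O
            ++ios;
            currentBlock = block;
        }
    }
    return ios;
}
```

A scan therefore costs about n/B I/Os, whereas n accesses to random positions can cost up to n I/Os. This gap is what the model is meant to expose.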

Interpretations of the Secondary Memory Model

               External memory        Caches
large memory   disk(s)                main memory
M              main memory            one cache level
B              disk block (MBytes!)   cache block (16–256 bytes)

Possibly also two cache levels.

Variant: SSDs

Parallel Disks

[Figure: two variants of a fast memory M connected to D disks (1, 2, ..., D) with blocks of size B]

Independent disks [Vitter Shriver 94]

Multi-head model [Aggarwal Vitter 88]

More Model Aspects

Instruction parallelism (superscalar, VLIW, EPIC, SIMD, ...)

Pipelining

What does a branch misprediction cost?

Multilevel caches (currently 2–3 levels), "cache-oblivious algorithms"

Parallel processors, multithreading

Communication networks

...

1 Arrays, Linked Lists, and Derived Data Structures

Bounded Arrays

Built-in data structure.

Size must be known from the start

Unbounded Array

e.g. std::vector

pushBack: append an element

popBack: remove the last element

Idea: double the capacity when space runs out, halve it when space is being wasted

If done right, n pushBack/popBack operations take O(n) time

In algorithmese: pushBack/popBack have constant amortized complexity.

What can go wrong?
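One way the doubling/halving idea could look in code is sketched below. The class name `UArray`, the `int` element type, and the starting capacity of 1 are assumptions for illustration. The sketch shrinks only when at most a quarter of the capacity is used. Halving as soon as the array is half empty is a well-known pitfall, because alternating pushBack and popBack at the boundary would then trigger a reallocation on every operation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Illustrative unbounded array of int (all names are assumptions).
// Grow:   double the capacity when the array is full.
// Shrink: halve the capacity only when at most a quarter is in use.
class UArray {
    std::size_t cap_ = 1;                      // allocated slots
    std::size_t n_ = 0;                        // used slots
    std::unique_ptr<int[]> data_{new int[1]};

    void reallocate(std::size_t newCap) {      // copies n_ elements: O(n_) time
        std::unique_ptr<int[]> fresh(new int[newCap]);
        for (std::size_t i = 0; i < n_; ++i) fresh[i] = data_[i];
        data_ = std::move(fresh);
        cap_ = newCap;
    }

public:
    std::size_t size() const { return n_; }
    std::size_t capacity() const { return cap_; }
    int& operator[](std::size_t i) { assert(i < n_); return data_[i]; }

    void pushBack(int x) {
        if (n_ == cap_) reallocate(2 * cap_);
        data_[n_++] = x;
    }

    void popBack() {
        assert(n_ > 0);
        --n_;
        if (cap_ > 1 && 4 * n_ <= cap_) reallocate(cap_ / 2);
    }
};
```

After each reallocation the array is half full, so a linear number of cheap operations must happen before the next reallocation. This is the intuition behind the constant amortized cost.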

Doubly Linked Lists

Class Item of Element       // one link in a doubly linked list
    e : Element
    next : Handle
    prev : Handle
    invariant next→prev = prev→next = this

Trick: use a dummy header

[Figure: circular doubly linked list whose dummy header item stores ⊥]

Procedure splice(a, b, t : Handle)
    assert b is not before a ∧ t ∉ ⟨a, ..., b⟩
    // cut out ⟨a, ..., b⟩
    a′ := a→prev
    b′ := b→next
    a′→next := b′
    b′→prev := a′
    // insert ⟨a, ..., b⟩ after t
    t′ := t→next
    b→next := t′
    a→prev := t
    t→next := a
    t′→prev := b

[Figure: pointer diagrams showing the list a′, a, ..., b, b′ and t, t′ after each step]
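The splice operation with a dummy header might be sketched in C++ as follows. The `Item` layout with an `int` payload, raw pointers in place of `Handle`, and the helper `insertAfter` are assumptions for illustration; the preconditions from the pseudocode are stated but not checked.

```cpp
#include <cassert>

// Illustrative item type; raw pointers play the role of "Handle".
struct Item {
    int e = 0;
    Item* next = this;   // a fresh item is a one-element circular list,
    Item* prev = this;   // which doubles as the dummy header of an empty list
};

// Move the sublist <a, ..., b> so that it follows t.
// Preconditions (not checked): b is not before a, and t is not in <a, ..., b>.
void splice(Item* a, Item* b, Item* t) {
    // cut out <a, ..., b>
    Item* aPrev = a->prev;
    Item* bNext = b->next;
    aPrev->next = bNext;
    bNext->prev = aPrev;
    // insert <a, ..., b> after t
    Item* tNext = t->next;
    b->next = tNext;
    a->prev = t;
    t->next = a;
    tNext->prev = b;
}

// Hypothetical convenience: insert a single detached item x after t.
void insertAfter(Item* x, Item* t) { splice(x, x, t); }
```

Because the list is circular and always contains the dummy header, splice needs no special cases for an empty list or for the first and last element.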

Singly Linked Lists

[Figure: singly linked list]

Comparison with doubly linked lists

Less space

Space is often also time

More restricted, e.g. no delete

Awkward user interface, e.g. deleteAfter

Memory Management for Lists

can easily cost 90 % of the time!

Rather move items between (free) lists than perform real mallocs

Allocate many items at once

Free everything at the end?

Store "parasitically", e.g. graphs: node array. Each node stores a ListItem. A partition of the nodes can then be stored as linked lists. Used for MST, shortest paths.

Challenge: garbage collection, many data types. This is also a software engineering problem and is not covered here.
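The free-list idea can be sketched as follows: items are obtained in chunks and recycled through a free list, so the general-purpose allocator is called once per chunk rather than once per item. The names `ItemPool` and `ListItem` as used here, and the chunk size of 1024, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct ListItem {
    int e;
    ListItem* next;
};

// Illustrative pool: allocates many items at once and recycles them.
class ItemPool {
    static constexpr std::size_t kChunk = 1024;          // items per allocation (assumed)
    std::vector<std::unique_ptr<ListItem[]>> chunks_;    // owns all memory
    ListItem* free_ = nullptr;                           // head of the free list

    void refill() {
        chunks_.emplace_back(new ListItem[kChunk]);
        ListItem* chunk = chunks_.back().get();
        for (std::size_t i = 0; i < kChunk; ++i) {       // thread chunk onto free list
            chunk[i].next = free_;
            free_ = &chunk[i];
        }
    }

public:
    ListItem* allocate(int e) {
        if (free_ == nullptr) refill();
        ListItem* it = free_;
        free_ = it->next;
        it->e = e;
        it->next = nullptr;
        return it;
    }

    void release(ListItem* it) {   // O(1): push the item back onto the free list
        it->next = free_;
        free_ = it;
    }

    std::size_t chunkCount() const { return chunks_.size(); }
    // All chunks are returned to the system together when the pool is destroyed.
};
```

Releasing and re-allocating an item only moves a pointer between lists, and destroying the pool frees everything at the end in one go.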

Example: Stack

[Figure: a stack stored as a singly linked list, as a bounded array, and as an unbounded array]

                  SList        B-Array      U-Array
dynamic           +            −            +
wasted space      pointer      too large?   too large?
                                            release?
wasted time       cache miss   +            copying
worst case time   (+)          +            −

Was that it?

Does every implementation have serious weaknesses?

The Best From Both Worlds

[Figure: linked list of array blocks of size B]

                  hybrid
dynamic           +
wasted space      n/B + B
wasted time       +
worst case time   +
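A hybrid stack of this kind could be sketched as a singly linked list of blocks, each holding up to B elements. The class name `BlockStack`, the `int` element type, and the policy of freeing a block as soon as it becomes empty are assumptions for illustration. The sketch uses roughly n/B pointers plus at most B unused slots in the topmost block, and every operation does constant work apart from at most one block allocation or deallocation.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative hybrid stack of int: a linked list of blocks of B elements.
template <std::size_t B>
class BlockStack {
    static_assert(B > 0, "block size must be positive");
    struct Block {
        int data[B];
        Block* below;            // next block toward the bottom of the stack
    };
    Block* top_ = nullptr;       // topmost block, nullptr if the stack is empty
    std::size_t used_ = 0;       // used slots in the topmost block
    std::size_t size_ = 0;

public:
    BlockStack() = default;
    BlockStack(const BlockStack&) = delete;
    BlockStack& operator=(const BlockStack&) = delete;
    ~BlockStack() {
        while (top_ != nullptr) { Block* b = top_->below; delete top_; top_ = b; }
    }

    std::size_t size() const { return size_; }

    void push(int x) {
        if (top_ == nullptr || used_ == B) {    // need a fresh block
            top_ = new Block{{}, top_};
            used_ = 0;
        }
        top_->data[used_++] = x;
        ++size_;
    }

    int pop() {
        assert(size_ > 0);
        const int x = top_->data[--used_];
        --size_;
        if (used_ == 0) {                       // topmost block is empty: free it
            Block* b = top_->below;
            delete top_;
            top_ = b;
            used_ = (top_ != nullptr) ? B : 0;  // blocks below are always full
        }
        return x;
    }
};
```

Keeping one spare empty block instead of freeing it immediately would avoid repeated allocation when pushes and pops alternate at a block boundary.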

A Variant

[Figure: a directory array pointing to element blocks of size B]

− Reallocation at the top level: not worst-case constant time

+ Indexed access to S[i] in constant time

Beyond Stacks...

[Figure: stack, FIFO queue, and deque with the operations pushBack, popBack, pushFront, popFront]

FIFO: BArray → cyclic array

[Figure: cyclic array b with indices 0..n, head h and tail t]

Exercise: an array that supports "[i]" in constant time and insertion/deletion of elements in O(√n) time

Exercise: an external stack that supports n push/pop operations with O(n/B) I/Os
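The cyclic-array FIFO mentioned on this slide might look as follows. The class name `BoundedFIFO`, the `int` element type, and the convention of leaving one slot unused so that h == t means "empty" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative bounded FIFO on a cyclic array b[0..n] with head h and tail t.
// One slot always stays unused so that h == t unambiguously means "empty".
class BoundedFIFO {
    std::vector<int> b_;
    std::size_t h_ = 0;   // index of the first element
    std::size_t t_ = 0;   // index one past the last element
public:
    explicit BoundedFIFO(std::size_t n) : b_(n + 1) {}

    bool empty() const { return h_ == t_; }
    std::size_t size() const { return (t_ + b_.size() - h_) % b_.size(); }

    void pushBack(int x) {
        assert(size() + 1 < b_.size());   // at most n elements fit
        b_[t_] = x;
        t_ = (t_ + 1) % b_.size();        // wrap around at the end of the array
    }

    int popFront() {
        assert(!empty());
        const int x = b_[h_];
        h_ = (h_ + 1) % b_.size();
        return x;
    }
};
```

Both operations touch only one array slot and one index, so they take constant time in the worst case and are cache friendly.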

Exercise: complete the table for hybrid data structures

Operation      List   SList   UArray   CArray   explanation of '∗'
[·]            n      n       1        1
|·|            1∗     1∗      1        1        not with inter-list splice
first          1      1       1        1
last           1      1       1        1
insert         1      1∗      n        n        insertAfter only
remove         1      1∗      n        n        removeAfter only
pushBack       1      1       1∗       1∗       amortized
pushFront      1      1       n        1∗       amortized
popBack        1      n       1∗       1∗       amortized
popFront       1      1       n        1∗       amortized
concat         1      1       n        n
splice         1      1       n        n
findNext, ...  n      n       n∗       n∗       cache efficient

What Is Missing?

Facts, facts, facts

Measurements for

different implementation variants

different architectures

different input sizes

Effects on real applications

Plots of these

Interpretation, possibly theory building

Exercise: traversing an array versus traversing a randomly allocated linked list

Sorting: Overview

You think you understand quicksort?

Avoiding branch mispredictions: Super Scalar Sample Sort

(Parallel disk) external sorting

Multicore sorting

Quicksort

Function quickSort(s : Sequence of Element) : Sequence of Element
    if |s| ≤ 1 then return s                 // base case
    pick p ∈ s uniformly at random           // pivot key
    a := ⟨e ∈ s : e < p⟩
    b := ⟨e ∈ s : e = p⟩
    c := ⟨e ∈ s : e > p⟩
    return concatenate(quickSort(a), b, quickSort(c))
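A direct transcription of this sequence-based quicksort into C++ could look like the sketch below. The `int` element type, the use of `std::mt19937`, and the fixed seed (chosen for reproducibility) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Not-in-place transcription of the three-way quicksort pseudocode.
std::vector<int> quickSort(const std::vector<int>& s) {
    if (s.size() <= 1) return s;                        // base case
    static std::mt19937 rng(12345);                     // fixed seed: an assumption
    std::uniform_int_distribution<std::size_t> pick(0, s.size() - 1);
    const int p = s[pick(rng)];                         // pivot key
    std::vector<int> a, b, c;
    for (int e : s) {
        if (e < p) a.push_back(e);
        else if (e == p) b.push_back(e);
        else c.push_back(e);
    }
    std::vector<int> result = quickSort(a);
    result.insert(result.end(), b.begin(), b.end());
    const std::vector<int> sortedC = quickSort(c);
    result.insert(result.end(), sortedC.begin(), sortedC.end());
    return result;
}
```

This version is easy to reason about but allocates new sequences at every level of the recursion.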

Engineering Quicksort

array

2-way comparisons

sentinels for inner loop

in-place swaps

recursion on smaller subproblems → O(log n) additional space

break recursion for small (20–100) inputs, insertion sort (not one big insertion sort)

Procedure qSort(a : Array of Element; ℓ, r : N)        // Sort a[ℓ..r]
    while r − ℓ ≥ n0 do                                // Use divide-and-conquer
        j := pickPivotPos(a, ℓ, r)
        swap(a[ℓ], a[j])                               // Helps to establish the invariant
        p := a[ℓ]
        i := ℓ; j := r
        repeat                                         // a: ℓ, i →, ← j, r
            while a[i] < p do i++                      // Scan over elements (A)
            while a[j] > p do j−−                      // on the correct side (B)
            if i ≤ j then swap(a[i], a[j]); i++; j−−
        until i > j                                    // Done partitioning
        if i < (ℓ + r)/2 then qSort(a, ℓ, j); ℓ := i
        else qSort(a, i, r); r := j
    insertionSort(a[ℓ..r])                             // faster for small r − ℓ
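The array-based procedure could be written in C++ roughly as follows. The middle-element pivot rule standing in for `pickPivotPos`, the threshold `n0 = 16`, the `int` element type, and the wrapper `sortInPlace` are assumptions for illustration. After the recursive call on the smaller part, the loop continues on the remaining part, a[i..r] or a[ℓ..j].

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

void insertionSort(std::vector<int>& a, int l, int r) {
    for (int i = l + 1; i <= r; ++i) {
        const int x = a[i];
        int j = i - 1;
        while (j >= l && a[j] > x) { a[j + 1] = a[j]; --j; }
        a[j + 1] = x;
    }
}

// Sorts a[l..r]; recursion only on the smaller part, loop on the larger one.
void qSort(std::vector<int>& a, int l, int r, int n0 = 16) {
    assert(n0 >= 1);
    while (r - l >= n0) {
        int j = l + (r - l) / 2;       // pivot position: middle element (assumed rule)
        std::swap(a[l], a[j]);         // pivot at a[l] acts as sentinel for both scans
        const int p = a[l];
        int i = l;
        j = r;
        do {
            while (a[i] < p) ++i;      // skip elements already on the correct side
            while (a[j] > p) --j;
            if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
        } while (i <= j);              // done partitioning once i > j
        if (i < (l + r) / 2) { qSort(a, l, j, n0); l = i; }
        else                 { qSort(a, i, r, n0); r = j; }
    }
    insertionSort(a, l, r);            // faster for small ranges
}

// Hypothetical convenience wrapper.
void sortInPlace(std::vector<int>& a) {
    if (!a.empty()) qSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

Because each recursive call handles the smaller part, the recursion depth stays logarithmic in the input size.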

Picking Pivots Painstakingly: Theory

probabilistically: expected 1.4 n log n element comparisons

median of three: expected 1.2 n log n element comparisons

perfect: → n log n element comparisons (approximate using large samples)

Practice

3 GHz Pentium 4 Prescott, g++
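The constants quoted under Theory are consistent with the standard leading terms for the expected number of comparisons, converted from natural to binary logarithms. The formulas below are the textbook results, not taken from the slides, and lower-order terms are omitted:

```latex
\begin{align*}
\mathbb{E}[C_{\text{random pivot}}]
  &\approx 2\,n \ln n = (2 \ln 2)\, n \log_2 n \approx 1.39\, n \log_2 n \\
\mathbb{E}[C_{\text{median of 3}}]
  &\approx \tfrac{12}{7}\, n \ln n = \bigl(\tfrac{12}{7} \ln 2\bigr)\, n \log_2 n \approx 1.19\, n \log_2 n
\end{align*}
```

Rounded to one decimal place these give the 1.4 and 1.2 stated above.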

Picking Pivots Painstakingly: Instructions

[Plot: instructions / (n lg n) (y-axis ticks 6 to 12) against n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11]

Picking Pivots Painstakingly: Time

[Plot: nanoseconds / (n lg n) (y-axis ticks 6.8 to 8.4) against n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11]

Picking Pivots Painstakingly: Branch Misses

[Plot: branch misses / (n lg n) (y-axis ticks 0.3 to 0.5) against n (x-axis ticks 10 to 26) for random pivot, median of 3, exact median, skewed pivot n/10, and skewed pivot n/11]

Can We Do Better? Previous Work

Integer keys

+ Can be 2–3 times faster than quicksort

− Naive ones are cache inefficient and slower than quicksort

− Simple ones are distribution dependent.

Cache-efficient sorting

k-ary merge sort [Nyberg et al. 94, Arge et al. 04, Ranade et al. 00, Brodal et al. 04]

+ Factor log k fewer cache faults

− Only ≈ 20 % speedup, and only for laaarge inputs

Sample Sort

Function sampleSort(e = ⟨e_1, ..., e_n⟩, k)
    if n/k is "small" then return smallSort(e)
    let S = ⟨S_1, ..., S_{ak−1}⟩ denote a random sample of e
    sort S
    ⟨s_0, s_1, s_2, ..., s_{k−1}, s_k⟩ := ⟨−∞, S_a, S_{2a}, ..., S_{(k−1)a}, ∞⟩
    for i := 1 to n do
        find j ∈ {1, ..., k} such that s_{j−1} < e_i ≤ s_j
        place e_i in bucket b_j
    return concatenate(sampleSort(b_1), ..., sampleSort(b_k))

[Figure: splitters s_1, ..., s_7 dividing the keys into buckets; locate the bucket by binary search?]

Why Sample Sort?

traditionally: parallelizable on coarse-grained machines

+ Cache efficient ≈ merge sort

− Binary search not much faster than merging

− Complicated memory management

Super Scalar Sample Sort

Binary search → implicit search tree

Eliminate all conditional branches

Exploit instruction parallelism

Cache efficiency comes to bear

"Steal" memory management from radix sort

Classifying Elements

t := ⟨s_{k/2}, s_{k/4}, s_{3k/4}, s_{k/8}, s_{3k/8}, s_{5k/8}, s_{7k/8}, ...⟩
for i := 1 to n do
    j := 1
    repeat log k times            // unroll
        j := 2j + (a_i > t_j)
    j := j − k + 1
    |b_j|++
    oracle[i] := j                // oracle

[Figure: implicit binary search tree stored in the splitter array (s_4 at index 1, s_2 and s_6 at indices 2 and 3, s_1, s_3, s_5, s_7 at indices 4 to 7); each decision maps index j to 2j on "<" and to 2j+1 on ">", ending in one of the buckets]

Now the compiler should:

use predicated instructions

interleave for-loop iterations (unrolling ∨ software pipelining)

    template <class T>
    void findOraclesAndCount(const T* const a,
                             const int n, const int k, const T* const s,
                             Oracle* const oracle, int* const bucket) {
      for (int i = 0; i < n; i++) {
        int j = 1;                      // start at the root of the splitter tree
        while (j < k) {
          j = j*2 + (a[i] > s[j]);      // comparison result used as 0/1
        }
        int b = j-k;                    // bucket index in 0..k-1
        bucket[b]++;
        oracle[i] = b;
      }
    }

Predication

Hardware mechanism that allows instructions to be conditionally executed

Boolean predicate registers (1–64) hold condition codes

Predicate registers p are additional inputs of predicated instructions I

At runtime, I is executed if and only if p is true

+ Avoids branch misprediction penalty

+ More flexible instruction scheduling

− Switched-off instructions still take time

− Longer opcodes

− Complicated hardware design

Example (IA-64)

Translation of: if (r1 > r2) r3 := r3 + 4

With a conditional branch:        Via predication:
    cmp.gt p6,p7=r1,r2                cmp.gt p6,p7=r1,r2
    (p7) br.cond .label               (p6) add r3=4,r3
    add r3=4,r3
    .label:
    (...)

Other current architectures: conditional moves only
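At the source level, the same statement can be written with and without a data-dependent branch, as in the sketch below. The function names are assumptions for illustration. Whether the second form compiles to a predicated instruction, a conditional move, or plain arithmetic depends on the compiler and the target.

```cpp
#include <cassert>

// With a conditional branch: the compiler may emit a compare-and-jump.
int addIfGreaterBranch(int r1, int r2, int r3) {
    if (r1 > r2) r3 = r3 + 4;
    return r3;
}

// Branch-free formulation: the comparison result (0 or 1) is used as data,
// in the same way as the "j*2 + (a[i] > s[j])" step of the classifier.
int addIfGreaterBranchFree(int r1, int r2, int r3) {
    return r3 + 4 * static_cast<int>(r1 > r2);
}
```

Both functions compute the same result, and only the second avoids a branch that the hardware has to predict.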

Unrolling (k = 16)

    template <class T>
    void findOraclesAndCountUnrolled([...]) {
      for (int i = 0; i < n; i++) {
        int j = 1;
        j = j*2 + (a[i] > s[j]);        // log k = 4 tree levels, fully unrolled
        j = j*2 + (a[i] > s[j]);
        j = j*2 + (a[i] > s[j]);
        j = j*2 + (a[i] > s[j]);
        int b = j-k;
        bucket[b]++;
        oracle[i] = b;
      }
    }

More Unrolling (k = 16, n even)

    template <class T>
    void findOraclesAndCountUnrolled2([...]) {
      for (int i = n & 1; i < n; i+=2) {
        int j0 = 1; int j1 = 1;         // two independent tree descents
        T ai0 = a[i]; T ai1 = a[i+1];
        j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
        j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
        j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
        j0=j0*2+(ai0>s[j0]); j1=j1*2+(ai1>s[j1]);
        int b0 = j0-k; int b1 = j1-k;
        bucket[b0]++; bucket[b1]++;
        oracle[i] = b0; oracle[i+1] = b1;
      }
    }

Distributing Elements

for i := 1 to n do a′[B[oracle[i]]++] := a_i

[Figure: element a_i is moved to the output array a′ at the position that B refers to for bucket o_i, where o is the oracle array]

Why Oracles?

simplifies memory management: no overflow tests or re-copying

simplifies software pipelining

separates computation and memory access aspects

small (n bytes)

sequential, predictable memory access

can be hidden using prefetching / write buffering

Distributing Elements

    template <class T>
    void distribute(const T* const a0, T* const a1,
                    const int n, const int k,
                    const Oracle* const oracle, int* const bucket) {
      // exclusive prefix sums: bucket[i] becomes the start offset of bucket i
      // (the loop also touches bucket[k], so bucket needs k+1 entries)
      for (int i = 0, sum = 0; i <= k; i++) {
        int t = bucket[i]; bucket[i] = sum; sum += t;
      }
      // move each element to the next free slot of its bucket
      for (int i = 0; i < n; i++) {
        a1[bucket[oracle[i]]++] = a0[i];
      }
    }
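To show how classification and distribution fit together, the sketch below performs one k-way distribution step on a vector of int. The helper names `buildTree` and `distributeOnce`, the use of `std::vector`, and the recursive construction of the splitter tree are assumptions for illustration. It is not the tuned implementation from the slides, only a compact model of the same two passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Place sorted splitters into heap order: tree[1] is the median,
// tree[2] and tree[3] are the quartiles, and so on (tree[0] is unused).
void buildTree(const std::vector<int>& sorted, std::vector<int>& tree,
               std::size_t node, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;                       // empty range [lo, hi)
    const std::size_t mid = lo + (hi - lo) / 2;
    tree[node] = sorted[mid];
    buildTree(sorted, tree, 2 * node, lo, mid);
    buildTree(sorted, tree, 2 * node + 1, mid + 1, hi);
}

// One distribution step with k = sortedSplitters.size() + 1 buckets;
// k must be a power of two and the splitters must be in ascending order.
// On return, bucketStart[b] is the offset of bucket b in the result.
std::vector<int> distributeOnce(const std::vector<int>& a,
                                const std::vector<int>& sortedSplitters,
                                std::vector<std::size_t>& bucketStart) {
    const std::size_t k = sortedSplitters.size() + 1;
    assert((k & (k - 1)) == 0);
    std::vector<int> tree(k);
    buildTree(sortedSplitters, tree, 1, 0, k - 1);

    // Pass 1: classify each element, remember its bucket, count bucket sizes.
    std::vector<std::size_t> oracle(a.size());
    std::vector<std::size_t> bucket(k + 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t j = 1;
        while (j < k) j = 2 * j + (a[i] > tree[j]);   // comparison used as 0/1
        oracle[i] = j - k;
        ++bucket[j - k];
    }

    // Exclusive prefix sums turn the counts into bucket start offsets.
    std::size_t sum = 0;
    for (std::size_t b = 0; b <= k; ++b) {
        const std::size_t t = bucket[b]; bucket[b] = sum; sum += t;
    }
    bucketStart = bucket;                             // bucketStart[k] == a.size()

    // Pass 2: move each element to the next free slot of its bucket.
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[bucket[oracle[i]]++] = a[i];
    return out;
}
```

With splitters 2, 5, 8 the four buckets hold the keys ≤ 2, in (2, 5], in (5, 8], and > 8. Elements keep their relative input order within each bucket.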

Experiments: 1.4 GHz Itanium 2

restrict keyword from ANSI/ISO C99 to indicate nonaliasing

Intel's C++ compiler v8.0 uses predicated instructions automatically

Profiling gives 9% speedup

k = 256 splitters

Use stl:sort from g++ (n ≤ 1000)!

insertion sort for n ≤ 100

Random 32 bit integers in [0, 10^9]

Comparison with Quicksort

[Plot: time / (n log n) in ns (y-axis ticks 0 to 8) against n from 4096 to 2^24 for GCC STL qsort and sss-sort]

Breakdown of Execution Time

[Plot: time / (n log n) in ns (y-axis ticks 0 to 6) against n from 4096 to 2^24 for Total, findBuckets + distribute + smallSort, findBuckets + distribute, and findBuckets]

A More Detailed View

                              dynamic instr.   dynamic cycles   IPC small n   IPC n = 2^25
findBuckets, 1× outer loop    63               11               5.4           4.5
distribute, one element       14               4                3.5           0.8

Page 62: KITalgo2.iti.kit.edu/sanders/courses/algen14/vorlesung.pdf · Sanders: Algorithm EngineeringApril 22, 2014 15 Goals bridge gaps between theory and practice accelerate transfer of

Sanders: Algorithm EngineeringApril 22, 2014 62

Comparison with Quicksort Pentium 4

0

2

4

6

8

10

12

14

4096 16384 65536 218 220 222 224

time

/ n lo

g n

[ns]

n

Intel STLGCC STL

sss-sort

Problems: fewregisters, onecondition codeonly, compilerneeds “help”

Breakdown of Execution Time Pentium 4

[Plot: time / (n log n) [ns] versus n, for n = 4096 to 2^24; curves: Total, findBuckets + distribute + smallSort, findBuckets + distribute, findBuckets]

Analysis

                          mem. acc.       branches   data dep.    I/Os           registers   instructions
k-way distribution:
  sss-sort                n log k         O(1)       O(n)         3.5 n/B        3×unroll    O(log k)
  quicksort log k lvls.   2n log k        n log k    O(n log k)   2(n/B) log k   4           O(1)
k-way merging:
  memory                  n log k         n log k    O(n log k)   2n/B           7           O(log k)
  register                2n              n log k    O(n log k)   2n/B           k           O(k)
  funnel k′ log_{k′} k    2n log_{k′} k   n log k    O(n log k)   2n/B           2k′+2       O(k′)

Conclusions

sss-sort up to twice as fast as quicksort on Itanium

comparisons ≠ conditional branches

algorithm analysis is not just instructions and caches

New result: GPU-Sample-Sort is the best comparison based sorting algorithm on graphics hardware [Leischner/Osipov/Sanders 2009]


Criticism I

Why only random keys?

Answer I

Sample sort hardly depends on input distribution

Criticism I′

What if there are many equal keys?

They all end up in the same bucket

Answer I′

It's not a bug, it's a feature:

$s_i = s_{i+1} = \cdots = s_j$ indicates a frequent key!

Set $s_i := \max\{x \in \mathrm{Key} : x < s_i\}$ (optional: drop $s_{i+2}, \ldots, s_j$)

Now bucket i+1 need not be sorted!

Exercise: Explain how to support equality buckets using a single additional comparison per element.

Criticism II

Quicksort is in-place

Answer II

Use hybrid List-Array Representation of sequences

needs $O(\sqrt{kn})$ extra space for k-way sample sort

Criticism II′

But I WANT arrays for input and output

Answer II′

In-place conversion

input: easy

output: tricky. Exercise: develop a rough idea. Hint: permute blocks. Each permutation is a product of cyclic permutations. An in-place cyclic permutation is easy.
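The hint about cyclic permutations can be illustrated with a minimal Python sketch. It is not the solution to the exercise: it only shows how an arbitrary permutation is applied in place by following its cycles. The function name `permute_in_place` and the convention that `dest[i]` is the target position of `items[i]` are assumptions made for this illustration; in the sample sort setting the items would be whole blocks.

```python
def permute_in_place(items, dest):
    """Move items[i] to position dest[i] for every i, using O(1) extra items.

    dest must be a permutation of range(len(items)). It is overwritten
    with the identity permutation, which marks positions as finished.
    """
    for start in range(len(items)):
        # Follow the cycle through `start` until the right item sits here.
        while dest[start] != start:
            target = dest[start]
            # Put the item at `start` into its final place ...
            items[start], items[target] = items[target], items[start]
            # ... and take over the destination of the item swapped in.
            dest[start], dest[target] = dest[target], target


blocks = ["a", "b", "c", "d"]
permute_in_place(blocks, [2, 0, 3, 1])
print(blocks)
```

Every swap places one item at its final position, so the total number of swaps is at most the number of items.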

Future Work

better small case sorter

handle large buckets

high level fine-tuning, e.g., clever choice of k

modern architectures

almost in-place implementation

multilevel cache-aware or cache-oblivious generalization (oracles help)

External Sorting

n: input size

M: size of the fast memory

B: block size

[Figure: external memory model. Labels: ALU, registers, 1 word, freely programmable fast memory of capacity M, B words, large memory]

Procedure externalMerge(a, b, c : File of Element)
    x := a.readElement          // Assume emptyFile.readElement = ∞
    y := b.readElement
    for j := 1 to |a| + |b| do
        if x ≤ y then c.writeElement(x); x := a.readElement
        else c.writeElement(y); y := b.readElement

[Figure: files a and b are read through the current elements x and y and merged into file c]
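As a minimal sketch of the same control flow, the following Python generator merges two sorted iterables. Files are modelled as plain iterables of numbers, which is an assumption of this example; block-wise buffering is left to the I/O layer. As on the slide, reading past the end yields ∞, so genuine elements must be finite.

```python
import math


def external_merge(a, b):
    """Yield the elements of the sorted iterables a and b in sorted order."""
    it_a, it_b = iter(a), iter(b)
    # Reading from an exhausted file returns +infinity.
    x = next(it_a, math.inf)
    y = next(it_b, math.inf)
    while x != math.inf or y != math.inf:
        if x <= y:
            yield x
            x = next(it_a, math.inf)
        else:
            yield y
            y = next(it_b, math.inf)


print(list(external_merge([1, 4, 9], [2, 3, 10, 11])))
```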

External (Binary) Merging: I/O Analysis

Read file a: $\lceil |a|/B \rceil \le |a|/B + 1$.

Read file b: $\lceil |b|/B \rceil \le |b|/B + 1$.

Write file c: $\lceil (|a|+|b|)/B \rceil \le (|a|+|b|)/B + 1$.

Total: $\le 3 + 2\,\frac{|a|+|b|}{B} \approx 2\,\frac{|a|+|b|}{B}$

Condition: We need 3 buffer blocks, i.e., M > 3B.

Run Formation

Sort input portions of size M

[Figure: input file f is read in portions of size M starting at i = 0, i = M, i = 2M; each portion is sorted internally and written as a run to file t]

I/Os: $\approx 2n/B$

Sorting by External Binary Merging

formRuns:   make_things_   as_simple_as   _possible_bu   t_no_simpler
            __aeghikmnst   __aaeilmpsss   __bbeilopssu   __eilmnoprst
merge:      ____aaaeeghiiklmmnpsssst      ____bbeeiillmnoopprssstu
merge:      ________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu

Procedure externalBinaryMergeSort            // I/Os ≈
    run formation                            // 2n/B
    while more than one run left do          // ⌈log(n/M)⌉ ×
        merge pairs of runs                  // 2n/B
    output remaining run                     // Σ: (2n/B)·(1 + ⌈log(n/M)⌉)

Numerical Example: PC 2007

$n = 2^{38}$ bytes

$M = 2^{31}$ bytes

$B = 2^{20}$ bytes

One I/O takes $2^{-6}$ s

Time: $\frac{2n}{B}\left(1 + \left\lceil \log\frac{n}{M} \right\rceil\right) = 2 \cdot 2^{18} \cdot (1+7) \cdot 2^{-6}\,\mathrm{s} = 2^{16}\,\mathrm{s} \approx 18\,\mathrm{h}$

Idea: 8 passes → 2 passes

Numerical Example: PC 2007 → 2012

$n = 2^{38} \to 2^{39}$ bytes

$M = 2^{31} \to 2^{33}$ bytes

$B = 2^{20} \to 2^{21}$ bytes

One I/O takes $2^{-6}$ s

Time: $\frac{2n}{B}\left(1 + \left\lceil \log\frac{n}{M} \right\rceil\right) = 2 \cdot 2^{18} \cdot (1+6) \cdot 2^{-6}\,\mathrm{s} = 7 \cdot 2^{13}\,\mathrm{s} \approx 16\,\mathrm{h}$
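The arithmetic of the two numerical examples can be checked with a few lines of Python. This is only a sketch of the formula $\frac{2n}{B}(1 + \lceil \log_2(n/M) \rceil)$; the function name `binary_merge_sort_seconds` is made up for this illustration, and the inputs are assumed to be powers of two so that the logarithm is exact.

```python
import math


def binary_merge_sort_seconds(n, M, B, io_seconds):
    """Estimated running time of external binary merge sort in seconds."""
    merge_phases = math.ceil(math.log2(n / M))
    ios = 2 * (n // B) * (1 + merge_phases)
    return ios * io_seconds


t2007 = binary_merge_sort_seconds(2**38, 2**31, 2**20, 2**-6)
t2012 = binary_merge_sort_seconds(2**39, 2**33, 2**21, 2**-6)
print(t2007 / 3600, t2012 / 3600)  # hours for the 2007 and 2012 parameters
```

With the 2007 parameters this gives 65536 s (about 18 h), with the 2012 parameters 57344 s (about 16 h).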

Multiway Merging

Procedure multiwayMerge(a_1, …, a_k, c : File of Element)
    for i := 1 to k do x_i := a_i.readElement
    for j := 1 to Σ_{i=1}^{k} |a_i| do
        find i ∈ 1..k that minimizes x_i     // no I/Os!, O(log k) time
        c.writeElement(x_i)
        x_i := a_i.readElement

[Figure: k input runs are read through internal buffers and merged into one output run]
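A minimal Python sketch of multiwayMerge follows. It uses a binary heap from the standard `heapq` module to find the minimum in O(log k) time; the heap is a stand-in for whatever priority queue is used in practice. Runs are modelled as in-memory iterables, which is an assumption of this example.

```python
import heapq

_END = object()  # sentinel for an exhausted run


def multiway_merge(runs):
    """Yield all elements of the sorted iterables in `runs` in sorted order."""
    iters = [iter(r) for r in runs]
    heap = []
    for i, it in enumerate(iters):
        first = next(it, _END)
        if first is not _END:
            heap.append((first, i))  # (current element x_i, run index i)
    heapq.heapify(heap)
    while heap:
        x, i = heap[0]               # i minimizes x_i
        yield x
        nxt = next(iters[i], _END)   # x_i := a_i.readElement
        if nxt is _END:
            heapq.heappop(heap)
        else:
            heapq.heapreplace(heap, (nxt, i))


print(list(multiway_merge([[1, 5, 9], [2, 6], [0, 7, 8]])))
```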

Multiway Merging: Analysis

[Figure: k input runs are read through internal buffers and merged into one output run]

I/Os: Read file a_i: $\approx |a_i|/B$.

Write file c: $\approx \sum_{i=1}^{k} |a_i|/B$

Total: $\lesssim 2\,\frac{\sum_{i=1}^{k} |a_i|}{B}$

Condition: We need k buffer blocks, i.e., k < M/B.

Internal work (use a priority queue!): $O\left(\log k \sum_{i=1}^{k} |a_i|\right)$

Sorting by Multiway Merging

Sort ⌈n/M⌉ runs with M elements each: 2n/B I/Os

Merge M/B runs at a time: 2n/B I/Os

until only one run is left: × $\left\lceil \log_{M/B}\frac{n}{M} \right\rceil$ merge phases

-------------------------------------------------

Total: $\mathrm{sort}(n) := \frac{2n}{B}\left(1 + \left\lceil \log_{M/B}\frac{n}{M} \right\rceil\right)$ I/Os

formRuns:      make_things_   as_simple_as   _possible_bu   t_no_simpler
               __aeghikmnst   __aaeilmpsss   __bbeilopssu   __eilmnoprst
multi merge:   ________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu
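To see how the merge fan-in M/B reduces the number of passes, here is a small Python sketch that evaluates sort(n) with exact integer arithmetic. The name `sort_ios` and the loop-based computation of the number of merge phases are choices made for this illustration.

```python
def sort_ios(n, M, B):
    """I/Os of multiway merge sort: (2n/B) * (1 + ceil(log_{M/B}(n/M)))."""
    fan_in = M // B
    if fan_in < 2:
        raise ValueError("need M >= 2B to merge at all")
    runs = -(-n // M)                 # ceil(n / M) runs after run formation
    phases = 0
    while runs > 1:                   # each phase merges fan_in runs into one
        runs = -(-runs // fan_in)
        phases += 1
    return 2 * -(-n // B) * (1 + phases)


# Parameters of the "PC 2007" example: one merge phase, i.e. 2 passes in total.
ios = sort_ios(2**38, 2**31, 2**20)
print(ios, ios * 2**-6 / 3600)        # number of I/Os and hours at 2^-6 s per I/O
```

For these parameters the result is 2^20 I/Os, a quarter of the 2^22 I/Os needed by binary merging.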

Sorting by Multiway Merging

Internal work:

$O\Big(\overbrace{n \log M}^{\text{run formation}} + \underbrace{n \log\frac{M}{B}}_{\text{PQ access per phase}} \cdot \overbrace{\left\lceil \log_{M/B}\frac{n}{M} \right\rceil}^{\text{phases}}\Big) = O(n \log n)$

More than one merge phase? Not for the hierarchy main memory, hard disk.

Reason: $\overbrace{\frac{M}{B}}^{>1000} > \overbrace{\frac{\text{RAM Euro/bit}}{\text{disk Euro/bit}}}^{<1000}$

11-2012: $\frac{8\,\mathrm{GB}}{4\,\mathrm{MB}} = 2048 > \frac{27/8\,\mathrm{GB}}{100/2\,\mathrm{TB}} = 67.5$

More on External Sorting

Lower bound $\approx \frac{2(?)\,n}{B}\left(1 + \left\lceil \log_{M/B}\frac{n}{M} \right\rceil\right)$ I/Os [Aggarwal Vitter 1988]

Upper bound $\approx \frac{2n}{DB}\left(1 + \left\lceil \log_{M/B}\frac{n}{M} \right\rceil\right)$ I/Os (expected) for D parallel disks [Hutchinson Sanders Vitter 2005, Dementiev Sanders 2003]

Open question: deterministic?

Sorting with Parallel Disks

I/O Step := Access to a single physical block per disk

[Figure: fast memory of size M connected to D independent disks 1, 2, …, D with block size B] [Vitter Shriver 94]

Theory: Balance Sort [Nodine Vitter 93]. Deterministic, complex, asymptotically optimal O(·)

Multiway merging: “Usually” factor 10? fewer I/Os. Not asymptotically optimal. 42%

Basic Approach: Improve Multiway Merging

Striping

[Figure: one physical block on each of the disks 1, 2, …, D forms one logical block of an emulated disk]

That takes care of run formation and writing the output

But what about merging?

[Figure: multiway merging with internal buffers, annotated “???”]

Naive Striping

Run single disk merge-sort on striped logical disk:

$\frac{2n}{DB}\left(1 + \left\lceil \log_{M/(DB)}\frac{n}{M} \right\rceil\right)$ I/Os

Theory: $\Theta(\log(M/B))$ worse when D ≈ M/B

Practice: 2 → 3 passes in some cases

Prediction [Folklore, Knuth]

Smallest element of each block triggers fetch.

Prefetch buffers allow parallel access of next blocks

[Figure: runs on disk, prefetch buffers and internal buffers; the prediction sequence controls which blocks are fetched into the prefetch buffers]

Warmup: Multihead Model

[Figure: Multihead Model [Aggarwal Vitter 88]: fast memory of size M, block size B, heads 1, 2, …, D; the prediction sequence controls the prefetch buffers that feed the internal buffers]

D prefetch buffers yield an optimal algorithm

$\mathrm{sort}(n) := \frac{2n}{DB}\left(1 + \left\lceil \log_{M/B}\frac{n}{M} \right\rceil\right)$ I/Os

Bigger Prefetch Buffer

[Figure: prediction sequence, prefetch buffers and internal buffers for runs 1, 2, …, k]

Dk prefetch buffers: good deterministic performance

O(D) would yield an optimal algorithm.

Possible?

Randomized Cycling [Vitter Hutchinson 01]

Block i of stripe j goes to disk $\pi_j(i)$ for a random permutation $\pi_j$

[Figure: prediction sequence, prefetch buffers and internal buffers]

Good for naive prefetching and Ω(D log D) buffers
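The placement rule stated on this slide can be written down directly. The following Python sketch assigns consecutive blocks of a run to disks, drawing one random permutation per stripe of D blocks. The function name, the list-based return value and the `seed` parameter are assumptions of this illustration; the sketch follows the rule as worded here, not any particular library.

```python
import random


def randomized_cycling_layout(num_blocks, D, seed=None):
    """Return a list whose entry b is the disk (0..D-1) storing block b."""
    rng = random.Random(seed)
    layout = []
    for first in range(0, num_blocks, D):       # one stripe = D consecutive blocks
        perm = list(range(D))
        rng.shuffle(perm)                       # random permutation pi_j
        stripe_len = min(D, num_blocks - first)
        layout.extend(perm[i] for i in range(stripe_len))
    return layout


print(randomized_cycling_layout(10, 4, seed=1))
```

Within a stripe every disk is used at most once, so a full stripe can be read or written in a single parallel I/O step.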

Buffered Writing [S-Egner-Korst SODA 00, Hutchinson-S-Vitter ESA 01]

[Figure: a sequence Σ of blocks passes through a randomized mapping into queues 1, 2, 3, …, D, one per disk, with W/D buffers per queue on average]

write whenever one of W buffers is free

otherwise, output one block from each nonempty queue

Theorem: Buffered Writing is optimal

But how good is optimal?

Theorem: Rand. cycling achieves efficiency 1 − O(D/W).

Analysis: negative association of random variables, application of queueing theory to a “throttled” Alg.
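The buffered writing rule can be simulated in a few lines. This Python sketch counts output steps for a given sequence of target disks; the function name and the representation of Σ as a list of disk numbers are assumptions of this illustration, and only the rule itself (accept a block while one of the W buffers is free, otherwise output one block from each nonempty queue) is taken from the slide.

```python
from collections import deque


def buffered_writing_steps(target_disks, D, W):
    """Number of parallel output steps needed to write all blocks."""
    queues = [deque() for _ in range(D)]
    buffered = 0
    steps = 0

    def output_step():
        nonlocal buffered, steps
        for q in queues:             # one block from each nonempty queue
            if q:
                q.popleft()
                buffered -= 1
        steps += 1

    for block, disk in enumerate(target_disks):
        if buffered == W:            # no free buffer: force an output step
            output_step()
        queues[disk].append(block)
        buffered += 1
    while buffered > 0:              # flush what is left at the end
        output_step()
    return steps


print(buffered_writing_steps([0, 1, 2, 3, 0, 1, 2, 3], D=4, W=4))
```

With a perfectly balanced sequence every output step writes D blocks; if all blocks go to the same disk, every step writes only one.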

Optimal Offline Prefetching

Theorem: For buffer size W:

∃ (offline) prefetching schedule for Σ with T input steps

⇔ ∃ (online) write schedule for $\Sigma^R$ with T output steps

[Figure, animated over several slides: the sequence Σ = abcdefghijklmnopqr and its reversal $\Sigma^R$ pass through the buffer; output steps 1 to 8 (order of writing) correspond to input steps 8 down to 1 (order of reading)]


Synthesis

Multiway merging

+ prediction [60s Folklore]

+ optimal (randomized) writing [S-Egner-Korst SODA 2000]

+ randomized cycling [Vitter Hutchinson 2001]

+ optimal prefetching [Hutchinson-S-Vitter ESA 2002]

(1+o(1))·sort(n) I/Os

“answers” question in [Knuth 98]; difficulty 48 on a 1..50 scale.


We are not done yet!

Internal work

Overlapping I/O and computation

Reasonable hardware

Interfacing with the Operating System

Parameter Tuning

Software engineering

Pipelining

Key Sorting

The I/O bandwidth of our machine is about 1/3 of its main memory bandwidth

If key size ≪ element size: sort key pointer pairs to save memory bandwidth during run formation

[Figure: elements with keys 3 1 4 1 5 and payloads a b c d e; extract the (key, pointer) pairs, sort them into key order 1 1 3 4 5, then permute the elements accordingly]
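The three steps of key sorting (extract, sort, permute) can be sketched in Python as follows. List indices play the role of pointers here, and the function name `key_sort` and its `key` parameter are assumptions made for this example.

```python
def key_sort(elements, key):
    """Sort large elements by sorting small (key, pointer) pairs first."""
    pairs = [(key(e), i) for i, e in enumerate(elements)]   # extract
    pairs.sort()                                            # sort the pairs only
    return [elements[i] for _, i in pairs]                  # permute the elements


records = [(3, "a"), (1, "b"), (4, "c"), (1, "d"), (5, "e")]
print(key_sort(records, key=lambda r: r[0]))
```

Only the small pairs are moved during sorting; each large element is moved once in the final permutation step.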

Tournament Trees for Multiway Merging

Assume $k = 2^K$ runs

K level complete binary tree

Leaves: smallest current element of each run

Internal nodes: loser of a competition for being smallest

Above root: global winner

[Figure: tournament tree over the leaves 6 2 7 9 1 4 7 4 with global winner 1; after deleteMin + insertNext the leaf 1 is replaced by 3 and the new global winner is 2]

Why Tournament Trees

Exactly log k element comparisons

Implicit layout in an array: simple index arithmetics (shifts)

Predictable load instructions and index computations

(Unrollable) inner loop:

    for (int i = (winnerIndex + kReg) >> 1; i > 0; i >>= 1) {
        currentPos = entry + i;              // node on the path to the root
        currentKey = currentPos->key;
        if (currentKey < winnerKey) {        // stored loser beats current winner: swap
            currentIndex = currentPos->index;
            currentPos->key = winnerKey;
            currentPos->index = winnerIndex;
            winnerKey = currentKey;
            winnerIndex = currentIndex;
        }
    }
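To place the inner loop in context, here is a minimal Python sketch of a complete multiway merge built on a loser tree. It is an illustration under assumptions, not the lecture's implementation: runs are in-memory iterables of finite numbers, exhausted runs deliver ∞, the number of leaves is padded to a power of two, and ties are broken by run index.

```python
import math


def loser_tree_merge(runs):
    """Merge sorted iterables with a tournament (loser) tree."""
    k = 1
    while k < len(runs):
        k *= 2                                   # pad to k = 2^K leaves
    iters = [iter(r) for r in runs] + [iter(()) for _ in range(k - len(runs))]
    # Build: winner[p] wins the subtree at array position p, loser[p] lost there.
    winner = [None] * (2 * k)
    loser = [None] * k
    for j in range(k):
        winner[k + j] = (next(iters[j], math.inf), j)
    for p in range(k - 1, 0, -1):
        left, right = winner[2 * p], winner[2 * p + 1]
        winner[p], loser[p] = (left, right) if left <= right else (right, left)
    win_key, win_idx = winner[1]                 # global winner "above the root"
    while win_key != math.inf:
        yield win_key                            # deleteMin
        win_key = next(iters[win_idx], math.inf) # insertNext from the same run
        i = (win_idx + k) >> 1
        while i > 0:                             # replay the path to the root
            if loser[i] < (win_key, win_idx):
                loser[i], (win_key, win_idx) = (win_key, win_idx), loser[i]
            i >>= 1


print(list(loser_tree_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]])))
```

The inner `while` loop corresponds to the C loop on the slide: one comparison per tree level, with index computations that do not depend on the comparison outcomes.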

Overlapping I/O and Computation

One thread for each disk (or asynchronous I/O)

Possibly additional threads

Blocks filled with elements are passed by reference between different buffers

Overlapping During Run Formation

First post read requests for runs 1 and 2

Thread A: Loop wait-read i; sort i; post-write i;

Thread B: Loop wait-write i; post-read i+2;

[Figure: timelines of read, sort and write for runs 1, 2, 3, 4, …, k−1, k in the compute bound case and in the I/O bound case, with the control flow in thread A and in thread B]

Overlapping During Merging

Bad example:

$1^{B-1}2\ \ 3^{B-1}4\ \ 5^{B-1}6\ \cdots$

$1^{B-1}2\ \ 3^{B-1}4\ \ 5^{B-1}6\ \cdots$

Overlapping During Merging

[Figure: data flow during merging: O(D) prefetch buffers filled by disk scheduling, read buffers 1 … k, k+3D overlap buffers, merging into merge buffers (D blocks), overlapping, 2D write buffers]

I/O Threads: Writing has priority over reading

I/O bound case: The I/O thread never blocks

y = # of elements merged during one I/O step.

I/O bound: y > DB/2, y ≤ DB

[Figure: state diagram; horizontal axis w = #elements in output buffers (marks DB−y, DB, 2DB−y, 2DB), vertical axis r = #elements in overlap+merge buffers (marks kB+DB, kB+DB+y, kB+2DB, kB+2DB+y, kB+3DB); transitions labelled fetch and output; region labelled blocking]

Compute bound case: The merging thread never blocks

[Figure: state diagram; horizontal axis w = #elements in output buffers (marks DB, 2DB−y, 2DB), vertical axis r = #elements in overlap+merge buffers (marks kB, kB+y, kB+2y, kB+DB, kB+2DB, kB+3DB); transitions labelled fetch and output; regions labelled block I/O and block merging]

Hardware (mid 2002)

Linux

(2× 2 GHz Xeon × 2 Threads)

Several 66 MHz PCI-buses (SuperMicro P4DPE3)

Several fast IDE controllers (4× Promise Ultra100 TX2)

Many fast IDE disks (8× IBM IC35L080AVVA07)

[Figure: system diagram. Labels: 2x Xeon, 4 Threads; Intel E7500 Chipset; 1 GB DDR RAM; 400x64 Mb/s; PCI-Busses 2x64x66 Mb/s; Controller; Channels 4x2x100 MB/s; 128; disks 8x80 GB, 8x45 MB/s]

cost effective I/O-bandwidth (real 360 MB/s for ≈ 3000 €)

Hardware (end 2009), estimated

Linux

(2× 2.4 GHz Xeon E5530 × 4 Cores × 2 Threads)

PCIe x8 SATA controller

16–24 1.5 TByte SATA disks

(8× IBM IC35L080AVVA07)

24 GByte RAM

cost effective I/O-bandwidth (real 2 GB/s for ≈ 6000 €)

Software Interface

Goals: efficient + simple + compatible

STXXL layers, from top to bottom:

Applications

STL-user layer. Containers: vector, stack, set, priority_queue, map. Algorithms: sort, for_each, merge

Streaming layer: pipelined sorting, zero-I/O scanning

Block management layer: typed block, block manager, buffered streams, block prefetcher, buffered block writer

Asynchronous I/O primitives layer: files, I/O requests, disk queues, completion handlers

Operating System

Default Measurement Parameters

t := number of available buffer blocks

Input Size: 16 GByte

Element Size: 128 Byte

Keys: Random 32 bit integers

Run Size: 256 MByte

Block size B: 2 MByte

Compiler: g++ 3.2 -O6

Write Buffers: max(t/4, 2D)

Prefetch Buffers: $2D + \frac{3}{10}(t - w - 2D)$

[Figure: system diagram of the “Hardware (mid 2002)” machine]

Element sizes (16 GByte, 8 disks)

[Plot: time [s] (0 to 400) versus element size [byte] (16 to 1024); components: run formation, merging, I/O wait in merge phase, I/O wait in run formation phase]

parallel disks: bandwidth “for free”; internal work and overlapping are relevant

Earlier Academic Implementations

Single disk, at most 2 GByte, old measurements use artificial M

[Plot: sort time [s] (0 to 800) versus element size [byte] (16 to 1024); curves: LEDA-SM, TPIE, <stxxl> comparison based]

Earlier Acad. Implementations: Multiple Disks

[Plot: sort time [s] (logarithmic scale, 26 to 616) versus element size [byte] (16 to 1024); curves: LEDA-SM Soft-RAID, TPIE Soft-RAID, <stxxl> Soft-RAID, <stxxl>]

What are good block sizes (8 disks)?

[Plot: sort time [ns/byte] (12 to 26) versus block size [KByte] (128 to 8192); curves: 128 GBytes 1x merge, 128 GBytes 2x merge, 16 GBytes]

B is not a technology constant

Optimal Versus Naive Prefetching

[Plot: total merge time, difference to naive prefetching [%] (−3 to 1) versus fraction of prefetch buffers [%] (0 to 100); curves: 176 read buffers, 112 read buffers, 48 read buffers]

Impact of Prefetch and Overlap Buffers

[Plot: merging time [s] (115 to 150) versus number of read buffers (16 to 172); curves: no prefetch buffer, heuristic schedule]

Tradeoff: Write Buffer Size Versus Read Buffer Size

[Plot: merge time [s] (110 to 145) versus number of read buffers (0 to 200)]

Scalability

[Plot: sort time [ns/byte] (0 to 20) versus input size [GByte] (16 to 128); curves: 128-byte elements, 512-byte elements, 4N/DB bulk I/Os]

Discussion

Theory and practice harmonize

No expensive server hardware necessary (SCSI, …)

No need to work with artificial M

No 2/4 GByte limits

Faster than academic implementations

(Must be) as fast as commercial implementations but with performance guarantees

Blocks are much larger than often assumed. Not a technology constant

Parallel disks: bandwidth “for free”; don’t neglect internal costs

Page 124: KITalgo2.iti.kit.edu/sanders/courses/algen14/vorlesung.pdf · Sanders: Algorithm EngineeringApril 22, 2014 15 Goals bridge gaps between theory and practice accelerate transfer of

Sanders: Algorithm EngineeringApril 22, 2014 124

More Parallel Disk Sorting?

Pipelining: Input does not come from disk but from a logical input stream. Output goes to a logical output stream. Only half the I/Os for sorting, often no I/Os for scanning. Todo: better overlapping

Parallelism: This is the only way to go for really many disks

Tuning and Special Cases: ssssort, permutations, balance work between merging and run formation? ...

Longer Runs: not done with guaranteed overlapping, fast internal sorting!

Distribution Sorting: Better for seeks etc.?

Inplace Sorting: Could also be faster

Determinism: A practical and theoretically efficient algorithm?


Procedure formLongRuns
  q, q′ : PriorityQueue
  for i := 1 to M do q.insert(readElement)
  invariant |q| + |q′| = M
  loop
    while q ≠ ∅
      writeElement(e := q.deleteMin)
      if input exhausted then break outer loop
      if e′ := readElement < e then q′.insert(e′)
      else q.insert(e′)
    q := q′; q′ := ∅
  output q in sorted order; output q′ in sorted order

Knuth: average run length 2M

todo: cache-efficient implementation
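The run formation procedure above can be sketched in C++ as follows. This is a minimal in-memory illustration, not the lecture's implementation: the input stream is assumed to be a `std::vector<int>`, each run is returned as its own vector instead of being written to disk, and the name and signature of `formLongRuns` here are choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Replacement selection with two priority queues holding at most M elements
// in total. Returns the sorted runs in the order they are produced.
std::vector<std::vector<int>> formLongRuns(const std::vector<int>& input, std::size_t M) {
    assert(M > 0);
    using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
    MinHeap q, qNext;  // q feeds the current run, qNext collects the next run
    std::size_t pos = 0;
    while (pos < input.size() && q.size() < M) q.push(input[pos++]);

    std::vector<std::vector<int>> runs;
    if (!q.empty()) runs.emplace_back();
    while (!q.empty()) {
        const int e = q.top();
        q.pop();
        runs.back().push_back(e);  // "writeElement"
        if (pos < input.size()) {
            const int next = input[pos++];
            // Elements smaller than the one just written cannot join this run.
            (next < e ? qNext : q).push(next);
        }
        if (q.empty() && !qNext.empty()) {  // current run finished, start the next
            std::swap(q, qNext);
            runs.emplace_back();
        }
    }
    return runs;
}
```

With M = 3 and the input 5 1 9 2 8 3 7 4 6 0, the first run already has five elements, which illustrates why runs tend to be longer than M.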


2 Priority Queues (insert, deleteMin)

Binary Heaps: best comparison based “flat memory” algorithm

+ On average constant time for insertion

+ On average log n + O(1) key comparisons per delete-Min using the “bottom-up” heuristics [Wegener 93].

− ≈ 2 log(n/M) block accesses per delete-Min

[Figure labels: Cache, M]


Bottom Up Heuristics

[Figure: deleteMin on an example heap; legend: move, compare, swap]

delete Min: O(1)

sift down hole: log(n)

sift up: average O(1)

Factor two faster than naive implementation
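The three phases named above (remove the minimum, sift the hole down to a leaf, sift the displaced last element up) can be sketched as follows. This is an illustrative version under the slides' convention that the heap occupies a[1..n]; slot a[0] is unused, and the helper `heapInsert` is added here only so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain sift-up insertion into a min-heap stored in a[1..n] (a[0] unused).
void heapInsert(std::vector<int>& a, int x) {
    a.push_back(x);
    std::size_t i = a.size() - 1;
    while (i > 1 && x < a[i / 2]) { a[i] = a[i / 2]; i /= 2; }
    a[i] = x;
}

// deleteMin with the bottom-up heuristic.
int deleteMinBottomUp(std::vector<int>& a) {
    assert(a.size() >= 2);  // at least one element besides the unused slot
    const int minimum = a[1];
    const int last = a.back();
    a.pop_back();
    const std::size_t n = a.size() - 1;
    if (n == 0) return minimum;

    // Phase 1: move the hole down to a leaf, one comparison per level.
    std::size_t hole = 1;
    while (2 * hole <= n) {
        std::size_t child = 2 * hole;
        if (child < n && a[child + 1] < a[child]) ++child;
        a[hole] = a[child];
        hole = child;
    }
    // Phase 2: put the former last element into the hole and sift it up;
    // on average this takes only O(1) steps.
    while (hole > 1 && last < a[hole / 2]) {
        a[hole] = a[hole / 2];
        hole /= 2;
    }
    a[hole] = last;
    return minimum;
}
```

Compared with a top-down siftDown, phase 1 saves the comparison against the element being placed on every level.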


The competitor, made fit:

int i = 1, m = 2, t = a[1];          // hole at the root, t is the element to place
m += (m != n && a[m] > a[m + 1]);    // m becomes the index of the smaller child
if (t > a[m]) {
  do {
    a[i] = a[m];                     // move the smaller child up
    i = m;
    m = 2*i;
    if (m > n) break;                // the hole reached a leaf
    m += (m != n && a[m] > a[m + 1]);
  } while (t > a[m]);
  a[i] = t;
}

No significant performance differences on my machine (heapsort of random integers)


Comparison

Memory accesses: O(1) fewer than top down. O(log n) worst case with an efficient implementation

Element comparisons: ≈ log n fewer for bottom up (average case), but these are easy to predict

Exercise: siftDown with worst case log n + O(log log n) element comparisons


Heap Construction

Procedure buildHeapBackwards
  for i := ⌊n/2⌋ downto 1 do siftDown(i)

Procedure buildHeapRecursive(i : N)
  if 4i ≤ n then
    buildHeapRecursive(2i)
    buildHeapRecursive(2i+1)
  siftDown(i)

The recursive function is 2× faster for large inputs!

(Unroll the recursion for the 2 lowest levels)

Exercise: explanation
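Both construction procedures can be written down directly. The following is a sketch under the assumption that the heap lives in a[1..n] with a[0] unused; `siftDown` is a plain top-down version and `isMinHeap` is a checking helper added for this example. It does not unroll the two lowest levels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Top-down siftDown for a min-heap stored in a[1..n] (a[0] unused).
void siftDown(std::vector<int>& a, std::size_t i) {
    const std::size_t n = a.size() - 1;
    const int t = a[i];
    for (std::size_t m = 2 * i; m <= n; m = 2 * i) {
        if (m < n && a[m + 1] < a[m]) ++m;  // pick the smaller child
        if (!(a[m] < t)) break;
        a[i] = a[m];
        i = m;
    }
    a[i] = t;
}

// Iterative construction: visits the array from back to front.
void buildHeapBackwards(std::vector<int>& a) {
    assert(!a.empty());
    for (std::size_t i = (a.size() - 1) / 2; i >= 1; --i) siftDown(a, i);
}

// Recursive construction: finishes one subtree before touching its sibling.
void buildHeapRecursive(std::vector<int>& a, std::size_t i = 1) {
    assert(!a.empty());
    const std::size_t n = a.size() - 1;
    if (i > n) return;  // guard for the empty heap
    if (4 * i <= n) {   // i has grandchildren, so its children need work first
        buildHeapRecursive(a, 2 * i);
        buildHeapRecursive(a, 2 * i + 1);
    }
    siftDown(a, i);
}

// Helper for checking the heap property.
bool isMinHeap(const std::vector<int>& a) {
    for (std::size_t i = 2; i < a.size(); ++i)
        if (a[i] < a[i / 2]) return false;
    return true;
}
```

The two functions perform the same siftDown calls in a different order, which is a useful starting point for the exercise.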


Medium-Sized PQs: km ≪ M²/B insertions

[Figure labels: insertion buffer PQ (size m), sorted sequences 1 2 ... k, k-merge PQ, B]

Insert: initially into the insertion buffer. Overflow → sort; flush; smallest key into merge-PQ

Delete-Min: deleteMin from the PQ with the smaller min


Analysis: I/Os

deleteMin: each element is read ≤ 1×, together with B others; amortized 1/B penalty for insert.


Analysis: Comparisons (Measure of Internal Work)

deleteMin: 1 + O(max(log k, log m)) = O(log m)

more precise argument: amortized 1 + log k with a suitable PQ

insert: ≈ m log m every m ops. Amortized log m

In total only log km amortized!
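The medium-sized queue described above (insertion buffer plus a merge PQ over sorted sequences) can be sketched in memory as follows. This is an illustration only: the class name `MediumPQ`, the use of `std::priority_queue` for both PQs, and keeping the sequences in vectors rather than in blocks of size B on disk are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

class MediumPQ {
    using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
    using Head = std::pair<int, std::size_t>;  // (smallest remaining key, sequence index)

    std::size_t m_;                            // capacity of the insertion buffer
    MinHeap insertionBuffer_;
    std::vector<std::vector<int>> sequences_;  // sorted descending: back() is the minimum
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> mergePQ_;

    // Overflow: sort the buffer, flush it as a new sequence, and register
    // the sequence's smallest key in the merge PQ.
    void flush() {
        std::vector<int> run;
        run.reserve(insertionBuffer_.size());
        while (!insertionBuffer_.empty()) {
            run.push_back(insertionBuffer_.top());
            insertionBuffer_.pop();
        }
        std::reverse(run.begin(), run.end());
        mergePQ_.push({run.back(), sequences_.size()});
        sequences_.push_back(std::move(run));
    }

public:
    explicit MediumPQ(std::size_t m) : m_(m) { assert(m > 0); }

    bool empty() const { return insertionBuffer_.empty() && mergePQ_.empty(); }

    void insert(int x) {
        if (insertionBuffer_.size() == m_) flush();
        insertionBuffer_.push(x);
    }

    // deleteMin from whichever PQ currently holds the smaller minimum.
    int deleteMin() {
        assert(!empty());
        if (mergePQ_.empty() ||
            (!insertionBuffer_.empty() && insertionBuffer_.top() <= mergePQ_.top().first)) {
            const int x = insertionBuffer_.top();
            insertionBuffer_.pop();
            return x;
        }
        const Head smallest = mergePQ_.top();
        mergePQ_.pop();
        std::vector<int>& seq = sequences_[smallest.second];
        seq.pop_back();
        if (!seq.empty()) mergePQ_.push({seq.back(), smallest.second});
        return smallest.first;
    }
};
```

The sketch drains the buffer heap to sort it, which corresponds to using heap sort for the insertion buffer; exhausted sequences are simply left empty.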


Large Queues [Sanders 00]

[Figure: sequence heap. Labels: insertion buffer (m), deletion buffer (m′), R-merge, group buffers 1 to 3 (m each), groups 1 to 3 with sequences 1 2 ... k and one k-merge per group, T1, T2, T3; parts marked cached and external/swapped]

insert: group full → merge group; shift into next group. Merge invalid group buffers and move them into group 1.

Delete-Min: Refill. m′ ≪ m. Nothing else


Example

[Figures: state of the sequence heap after each of the following steps]

insert(3): merge insertion buffer, deletion buffer, and leftmost group buffer

Merge group 1

Merge group 2

Merge group buffers

DeleteMin 3; DeleteMin a

DeleteMin b


Analysis

I insertions, buffer sizes m = Θ(M)

merging degree k = Θ(M/B)

block accesses: sort(I) + “small terms”

key comparisons: I log I + “small terms” (on average)

[Figure: read (R) and write (W) accesses]

Other (similar, earlier) data structures [Arge 95, Brodal-Katajainen 98, Brengel et al. 99, Fadel et al. 97] spend a factor ≥ 3 more I/Os to replace I by queue size.


Implementation Details

Fast routines for 2–4 way merging keeping smallest elements in registers

Use sentinels to avoid special case treatments (empty sequences, ...)

Currently heap sort for sorting the insertion buffer

k ≠ M/B: multiple levels, limited associativity, TLB
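The first two points can be illustrated with a 2-way merge. In this sketch, which is not the lecture's code, both inputs are assumed to end with a sentinel key larger than every real key (here the largest `int`, named `kSentinel` for this example), so the loop needs no separate test for an exhausted input, and the current head of each input is kept in a local variable that a compiler can hold in a register.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Assumed sentinel: must be larger than every real key.
constexpr int kSentinel = std::numeric_limits<int>::max();

// Merges two sorted, sentinel-terminated sequences; the result carries no sentinel.
std::vector<int> mergeWithSentinels(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!a.empty() && a.back() == kSentinel);
    assert(!b.empty() && b.back() == kSentinel);
    std::vector<int> out;
    out.reserve(a.size() + b.size() - 2);
    const int* pa = a.data();
    const int* pb = b.data();
    int x = *pa;  // smallest remaining key of a
    int y = *pb;  // smallest remaining key of b
    while (true) {
        if (x <= y) {
            if (x == kSentinel) break;  // then y is the sentinel as well
            out.push_back(x);
            x = *++pa;
        } else {
            out.push_back(y);  // y < x, so y cannot be the sentinel
            y = *++pb;
        }
    }
    return out;
}
```

An empty input is just a sequence consisting of the sentinel alone, so it needs no special case.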


Experiments

Keys: random 32 bit integers

Associated information: 32 dummy bits

Deletion buffer size: 32

Group buffer size: 256

Merging degree k: 128

Compiler flags: Highly optimizing, nothing advanced

Operation Sequence: (Insert-DeleteMin-Insert)^N (DeleteMin-Insert-DeleteMin)^N

Near optimal performance on all machines tried!


[Plots, one per machine: (T(deleteMin) + T(insert))/log N [ns] versus N (up to 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap]

MIPS R10000, 180 MHz

Ultra-SparcIIi, 300 MHz

Alpha-21164, 533 MHz

Pentium II, 300 MHz

Core2 Duo Notebook, 1.??? GHz (curves: bottom up binary heap, sequence heap)


(insert (deleteMin insert)^s)^N (deleteMin (insert deleteMin)^s)^N

[Plot: (T(deleteMin) + T(insert))/log N [ns] versus N (256 to 2^23); curves: s=0 binary heap; s=0 4-ary heap; s=0; s=1; s=4; s=16]


Methodological Lessons

If you want to compare small constant factors in execution time:

Reproducibility demands publication of source codes (4-ary heaps, old study in Pascal)

Highly tuned codes, in particular for the competitors (binary heaps have factor 2 between good and naive implementation). How do you compare two mediocre implementations?

Careful choice/description of inputs

Use multiple different hardware platforms

Augment with theory (e.g., comparisons, data dependencies, cache faults, locality effects ...)


Open Problems

Dependence on size rather than number of insertions

Parallel disks

Space efficient implementation

Multi-level cache aware or cache-oblivious variants


3 van Emde-Boas Search Trees

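[Figure: vEB tree with a K′-bit vEB top structure and a hash table pointing to the K′-bit vEB structures bot_i]

Store set $M$ of $K = 2^k$-bit integers.

later: associated information

$K = 1$ or $|M| = 1$: store directly

$K' := K/2$

$M_i := \{x \bmod 2^{K'} : x \in M,\ x \operatorname{div} 2^{K'} = i\}$

root points to nonempty $M_i$-s

top $t = \{i : M_i \neq \emptyset\}$

insert, delete, search in $O(\log K)$ time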


Locate

// return min{x ∈ M : y ≤ x}
Function locate(y : N) : ElementHandle
  if y > max M then return ∞
  if K = 1 then return locateLocally(y)
  if M = {x} then return x
  i := y div 2^(K/2)
  if M_i = ∅ ∨ y > max M_i then return min M_(top.locate(i+1))
  else return i·2^(K/2) + M_i.locate(y mod 2^(K/2))


Comparison with Comparison Based Search Trees

Ideally: log n → log log n

Problems: many special case tests; high space consumption

[Plot: time for random locate [ns] versus n (256 to 2^23); curves: orig-STree, LEDA-STree, STL map, (2,16)-tree]


Efficient 32 bit Implementation

top → 3 layers of bit arrays

[Figure: three layers of bit arrays above the hash table of K′-bit vEB structures bot_i; index labels 0..65535, 0..4095, 0..63]


Layers of Bit Arrays

$t^1[i] = 1$ iff $M_i \neq \emptyset$

$t^2[i] = t^1[32i] \lor t^1[32i+1] \lor \cdots \lor t^1[32i+31]$

$t^3[i] = t^2[32i] \lor t^2[32i+1] \lor \cdots \lor t^2[32i+31]$


Efficient 32 bit Implementation

Break recursion after 3 layers

[Figure: Level 1 (root), bits 31-16; Level 2, bits 15-8; Level 3, bits 7-0; each level with hash tables and bit arrays]


Efficient 32 bit Implementation

root hash table → root array

[Figure: same structure, with the root hash table replaced by an array with entries 0..65535]


Efficient 32 bit Implementation

Sorted doubly linked lists for associated information and range queries

[Figure: same structure, with the elements linked in a sorted doubly linked list]


Example

[Figure: example STree for M = {1, 11, 111, 1111, 111111}, showing the root, L2 and L3 tables, the bit arrays t1, t2, t3 (root-top), the hash tables, and the element list]


Locate High Level

// return handle of min{x ∈ M : y ≤ x}
Function locate(y : N) : ElementHandle
  if y > max M then return ∞
  i := y[16..31]   // Level 1
  if r[i] = nil ∨ y > max M_i then return min M_(t1.locate(i+1))
  if M_i = {x} then return x
  j := y[8..15]   // Level 2
  if r_i[j] = nil ∨ y > max M_ij then return min M_(i, t1_i.locate(j+1))
  if M_ij = {x} then return x
  return r_ij[t1_ij.locate(y[0..7])]   // Level 3


Locate in Bit Arrays

// find the smallest j ≥ i such that tk[j] = 1
Method locate(i) for a bit array tk consisting of n-bit words
  // n = 32 for t1, t2, t1_i, t1_ij; n = 64 for t3; n = 8 for t2_i, t2_ij
  assert some bit in tk to the right of i is nonzero
  j := i div n   // which word?
  a := tk[nj..nj+n−1]
  set a[(i mod n)+1..n−1] to zero   // bit order: n−1 ··· i mod n ··· 0
  if a = 0 then
    j := tk+1.locate(j+1)   // next nonzero word after word j
    a := tk[nj..nj+n−1]
  return nj + msbPos(a)   // e.g. floating point conversion
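The layered bit arrays and their locate method can be sketched as follows. This is a simplified illustration rather than the STree code: it uses 64-bit words on every layer, stores index i in bit i mod 64 counted from the least significant bit (so the lowest set bit plays the role of msbPos), finds that bit with a plain loop instead of a hardware instruction, and returns `npos` instead of asserting that a set bit exists. The names `LayeredBitArray`, `lowestSetBit` and `npos` are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position of the lowest set bit; a must be nonzero.
inline std::size_t lowestSetBit(std::uint64_t a) {
    std::size_t p = 0;
    while ((a & 1u) == 0) { a >>= 1; ++p; }
    return p;
}

class LayeredBitArray {
    static constexpr std::size_t W = 64;  // bits per word
    // level_[0] holds the bits; bit j of level_[k+1] is set iff word j of level_[k] is nonzero.
    std::vector<std::vector<std::uint64_t>> level_;

    std::size_t locate(std::size_t k, std::size_t i) const {
        const std::vector<std::uint64_t>& t = level_[k];
        std::size_t j = i / W;  // which word?
        if (j >= t.size()) return npos;
        std::uint64_t a = t[j] & (~std::uint64_t{0} << (i % W));  // clear bits below i
        if (a == 0) {
            if (k + 1 == level_.size()) return npos;
            j = locate(k + 1, j + 1);  // next nonzero word after word j
            if (j == npos) return npos;
            a = t[j];
        }
        return j * W + lowestSetBit(a);
    }

public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit LayeredBitArray(std::size_t universe) {
        assert(universe > 0);
        std::size_t bits = universe;
        do {  // add layers until the top layer fits into one word
            const std::size_t words = (bits + W - 1) / W;
            level_.emplace_back(words, 0);
            bits = words;
        } while (bits > 1);
    }

    void set(std::size_t i) {
        for (std::vector<std::uint64_t>& t : level_) {
            t[i / W] |= std::uint64_t{1} << (i % W);
            i /= W;
        }
    }

    // Smallest j >= i with bit j set, or npos if there is none.
    std::size_t locate(std::size_t i) const { return locate(0, i); }
};
```

With a universe of 2^16 bits this yields three layers, so a locate touches at most a handful of words.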


Random Locate

Random Insert

Delete Random Elements

[Plots: time for locate [ns], time for insert [ns], and time for delete [ns] versus n (up to 2^23); curves: orig-STree, LEDA-STree, STL map, (2,16)-tree, STree]


Open Problems

Measurement for “worst case” inputs

Measure performance for realistic inputs

– IP lookup etc.

– Best first heuristics like, e.g., bin packing

More space efficient implementation

(A few) more bits


4 Hashing

“to hash” ≈ “to bring into complete disorder”

paradoxically, this helps us to find things more easily!

store set M ⊆ Element.

key(e) is unique for e ∈ M.

support dictionary operations in O(1) time:

M.insert(e : Element): M := M ∪ {e}

M.remove(k : Key): M := M \ {e}, e = k

M.find(k : Key): return e ∈ M with e = k; ⊥ if none present

(Convention: key is implicit, e.g. e = k iff key(e) = k)


An (Over)optimistic approach

[Figure: h maps M into table t]

A (perfect) hash function h maps elements of M to unique entries of table t[0..m−1], i.e., t[h(key(e))] = e


Collisions

perfect hash functions are difficult to obtain

[Figure labels: M, h, t]

Example: Birthday Paradox


Collision Resolution

for example by closed hashing

entries: elements → sequences of elements

[Figure: key k in M is mapped by h to the sequence t[h(k)] in table t]


Hashing with Chaining

Implement sequences in closed hashing by singly linked lists

insert(e): Insert e at the beginning of t[h(e)]. Constant time

remove(k): Scan through t[h(k)]. If an element e with key(e) = k is encountered, remove it and return.

find(k): Scan through t[h(k)]. If an element e with key(e) = k is encountered, return it. Otherwise, return ⊥.

O(|M|) worst case time for remove and find
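The three operations of hashing with chaining can be sketched with the standard singly linked list. This is a minimal illustration under simplifying assumptions: elements are plain `int` keys (so e = k), `std::hash<int>` reduced modulo the table size stands in for h, and ⊥ is represented by a null pointer. The class name `ChainedHashSet` is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <vector>

class ChainedHashSet {
    std::vector<std::forward_list<int>> t_;  // t[0..m-1], one chain per entry

    std::size_t h(int k) const { return std::hash<int>{}(k) % t_.size(); }

public:
    explicit ChainedHashSet(std::size_t m) : t_(m) { assert(m > 0); }

    // Insert at the beginning of t[h(e)]: constant time, no duplicate check.
    void insert(int e) { t_[h(e)].push_front(e); }

    // Scan t[h(k)]; remove the first matching element. Returns whether one was found.
    bool remove(int k) {
        std::forward_list<int>& chain = t_[h(k)];
        for (auto prev = chain.before_begin(), it = chain.begin(); it != chain.end();
             prev = it, ++it) {
            if (*it == k) {
                chain.erase_after(prev);
                return true;
            }
        }
        return false;
    }

    // Scan t[h(k)]; return a pointer to the element, or nullptr for "not present".
    const int* find(int k) const {
        for (const int& e : t_[h(k)])
            if (e == k) return &e;
        return nullptr;
    }
};
```

A singly linked list suffices because removal only needs the predecessor within the chain being scanned.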


A Review of Probability

Each concept is followed by its example from hashing:

sample space $\Omega$; example: random hash functions $\{0..m-1\}^{\mathrm{Key}}$

events: subsets of $\Omega$; example: $E_{42} = \{h \in \Omega : h(4) = h(2)\}$

$p_x$ = probability of $x \in \Omega$; example: uniform distribution, $p_h = m^{-|\mathrm{Key}|}$

$P[E] = \sum_{x \in E} p_x$; example: $P[E_{42}] = \frac{1}{m}$

random variable $X_0 : \Omega \to \mathbb{R}$; example: $X_0 = |\{e \in M : h(e) = 0\}|$

expectation $E[X_0] = \sum_{y \in \Omega} p_y X_0(y)$; example: $E[X_0] = \frac{|M|}{m}$ (*)

Linearity of Expectation: $E[X + Y] = E[X] + E[Y]$

Proof of (*): Consider the 0-1 RV $X_e = 1$ if $h(e) = 0$ for $e \in M$ and $X_e = 0$ else.

$E[X_0] = E[\sum_{e \in M} X_e] = \sum_{e \in M} E[X_e] = \sum_{e \in M} P[X_e = 1] = |M| \cdot \frac{1}{m}$


Analysis for Random Hash Functions

Theorem 1. The expected execution time of remove(k) and find(k) is O(1) if |M| = O(m).

Proof. Constant time plus the time for scanning t[h(k)]. $X := |t[h(k)]| = |\{e \in M : h(e) = h(k)\}|$. Consider the 0-1 RV $X_e = 1$ if $h(e) = h(k)$ for $e \in M$ and $X_e = 0$ else.

$E[X] = E[\sum_{e \in M} X_e] = \sum_{e \in M} E[X_e] = \sum_{e \in M} P[X_e = 1] = \frac{|M|}{m} = O(1)$

This is independent of the input set M.


Universal Hashing

Idea: use only certain “easy” hash functions

Definition: $U \subseteq \{0..m-1\}^{\mathrm{Key}}$ is universal if for all $x$, $y$ in Key with $x \neq y$ and random $h \in U$, $P[h(x) = h(y)] = \frac{1}{m}$.

Theorem 2. Theorem 1 also applies to universal families of hash functions.

Proof. For $\Omega = U$ we still have $P[X_e = 1] = \frac{1}{m}$. The rest is as before.


[Figure labels: H, Ω]


A Simple Universal Family

Assume $m$ is prime, $\mathrm{Key} \subseteq \{0,\ldots,m-1\}^k$

Theorem 3. For $a = (a_1,\ldots,a_k) \in \{0,\ldots,m-1\}^k$ define $h_a(x) = a \cdot x \bmod m$, $H^{\cdot} = \{h_a : a \in \{0..m-1\}^k\}$.

$H^{\cdot}$ is a universal family of hash functions

[Figure: $h_a(x) = (a_1 x_1 + a_2 x_2 + a_3 x_3) \bmod m$]


Proof. Consider $x = (x_1,\ldots,x_k)$, $y = (y_1,\ldots,y_k)$ with $x_j \neq y_j$ and count the $a$-s with $h_a(x) = h_a(y)$. For each choice of the $a_i$-s, $i \neq j$, there exists exactly one $a_j$ with $h_a(x) = h_a(y)$:

$\sum_{1 \le i \le k} a_i x_i \equiv \sum_{1 \le i \le k} a_i y_i \pmod{m}$

$\Leftrightarrow a_j (x_j - y_j) \equiv \sum_{i \neq j,\, 1 \le i \le k} a_i (y_i - x_i) \pmod{m}$

$\Leftrightarrow a_j \equiv (x_j - y_j)^{-1} \sum_{i \neq j,\, 1 \le i \le k} a_i (y_i - x_i) \pmod{m}$

There are $m^{k-1}$ ways to choose the $a_i$ with $i \neq j$, and $m^k$ is the total number of $a$-s, i.e.,

$P[h_a(x) = h_a(y)] = \frac{m^{k-1}}{m^k} = \frac{1}{m}$.
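The counting argument can be checked by brute force for small parameters. The sketch below is only an illustration: `dotHash` evaluates h_a(x) = a·x mod m, and `countCollidingA` (a helper invented for this example) enumerates all m^k vectors a and counts those for which x and y collide. It assumes m is prime, all components are smaller than m, and m < 2^32 so that the products do not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// h_a(x) = (a_1 x_1 + ... + a_k x_k) mod m
std::uint64_t dotHash(const std::vector<std::uint64_t>& a,
                      const std::vector<std::uint64_t>& x, std::uint64_t m) {
    assert(a.size() == x.size() && m > 0);
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < a.size(); ++i) h = (h + a[i] * x[i]) % m;
    return h;
}

// Number of vectors a in {0..m-1}^k with h_a(x) = h_a(y), by full enumeration.
std::size_t countCollidingA(const std::vector<std::uint64_t>& x,
                            const std::vector<std::uint64_t>& y, std::uint64_t m) {
    assert(x.size() == y.size());
    const std::size_t k = x.size();
    std::vector<std::uint64_t> a(k, 0);
    std::size_t count = 0;
    while (true) {
        if (dotHash(a, x, m) == dotHash(a, y, m)) ++count;
        // Advance a like an odometer with digits in 0..m-1.
        std::size_t pos = 0;
        while (pos < k && ++a[pos] == m) {
            a[pos] = 0;
            ++pos;
        }
        if (pos == k) break;  // wrapped around: all vectors visited
    }
    return count;
}
```

For m = 7, k = 3 and two keys that differ in one component, the count is 7^2 = 49 out of 7^3 vectors, which is the collision probability 1/m from the proof.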


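Bit Based Universal Families

Let $m = 2^w$, $\mathrm{Key} = \{0,1\}^k$

Bit-Matrix Multiplication: $H^{\oplus} = \{h_M : M \in \{0,1\}^{w \times k}\}$ where $h_M(x) = Mx$ (arithmetics mod 2, i.e., xor, and)

Table Lookup: $H^{\oplus[]} = \{h^{\oplus[]}_{(t_1,\ldots,t_b)} : t_i \in \{0..m-1\}^{\{0..w-1\}}\}$ where $h^{\oplus[]}_{(t_1,\ldots,t_b)}((x_0, x_1, \ldots, x_b)) = x_0 \oplus \bigoplus_{i=1}^{b} t_i[x_i]$

[Figure labels: x_0, x_1, x_2, a, w, k]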
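Both bit based families can be sketched for 32-bit keys. The concrete parameters are assumptions of this example, not taken from the slides: `BitMatrixHash<W>` stores the W rows of a random matrix as 32-bit masks (k = 32), and `TableLookupHash` uses w = 16 with x_0 the low 16 bits of the key and two 256-entry tables indexed by the two upper bytes. Random entries come from a caller-supplied `std::mt19937`.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// h_M(x) = Mx over GF(2): output bit i is the parity of (row i AND x).
template <std::size_t W>
struct BitMatrixHash {
    static_assert(W >= 1 && W <= 32, "output width must fit into 32 bits");
    std::array<std::uint32_t, W> rows;

    explicit BitMatrixHash(std::mt19937& rng) {
        for (std::uint32_t& r : rows) r = static_cast<std::uint32_t>(rng());
    }

    std::uint32_t operator()(std::uint32_t x) const {
        std::uint32_t h = 0;
        for (std::size_t i = 0; i < W; ++i) {
            const std::uint32_t parity =
                static_cast<std::uint32_t>(std::bitset<32>(rows[i] & x).count() & 1u);
            h |= parity << i;
        }
        return h;
    }
};

// h(x) = x0 xor t1[x1] xor t2[x2] with 16-bit table entries.
struct TableLookupHash {
    std::array<std::array<std::uint16_t, 256>, 2> t;

    explicit TableLookupHash(std::mt19937& rng) {
        for (auto& table : t)
            for (std::uint16_t& e : table) e = static_cast<std::uint16_t>(rng());
    }

    std::uint16_t operator()(std::uint32_t x) const {
        const std::uint16_t x0 = static_cast<std::uint16_t>(x);  // low 16 bits
        const std::uint16_t h1 = t[0][(x >> 16) & 0xFFu];        // byte x1
        const std::uint16_t h2 = t[1][x >> 24];                  // byte x2
        return static_cast<std::uint16_t>(x0 ^ h1 ^ h2);
    }
};
```

The matrix variant is linear over GF(2), so h(x xor y) = h(x) xor h(y); the table variant replaces the bitwise work by two memory lookups.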


Hashing with Linear Probing

Open hashing: go back to original idea.

Elements are directly stored in the table.

Collisions are resolved by finding other entries.

linear probing: search for next free place by scanning the table.

Wrap around at the end.

simple

space efficient

cache efficient



The Easy Part

Class BoundedLinearProbing(m, m′ : N; h : Key → 0..m−1)
  t = [⊥, ..., ⊥] : Array [0..m+m′−1] of Element
  invariant ∀i : t[i] ≠ ⊥ ⇒ ∀j ∈ {h(t[i])..i−1} : t[j] ≠ ⊥

  Procedure insert(e : Element)
    for i := h(e) to ∞ while t[i] ≠ ⊥ do ;
    assert i < m+m′−1
    t[i] := e

  Function find(k : Key) : Element
    for i := h(k) to ∞ while t[i] ≠ ⊥ do
      if t[i] = k then return t[i]
    return ⊥

[Figure labels: M, h, t, m, m′]

Remove

Example: t = [. . . , x, y, z, . . .] where x is stored at position h(z); remove(x)

invariant ∀i : t[i] ≠ ⊥ ⇒ ∀j ∈ {h(t[i])..i−1} : t[j] ≠ ⊥

Procedure remove(k : Key)
  for i := h(k) to ∞ while k ≠ t[i] do    // search k
    if t[i] = ⊥ then return               // nothing to do
  // we plan for a hole at i.
  for j := i+1 to ∞ while t[j] ≠ ⊥ do
    // Establish invariant for t[j].
    if h(t[j]) ≤ i then
      t[i] := t[j]                        // Overwrite removed element
      i := j                              // move planned hole
  t[i] := ⊥                               // erase freed entry
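The insert, find and remove procedures can be tried out with a small Python sketch. It assumes that elements are their own keys, that `None` plays the role of ⊥, and that the last cell is kept empty so every scan terminates; the class and parameter names are illustrative only.

```python
class BoundedLinearProbing:
    """Sketch of bounded linear probing; elements double as keys (an assumption)."""

    def __init__(self, m, m_extra, h):
        self.h = h                           # h maps a key to 0..m-1
        self.t = [None] * (m + m_extra)      # None stands for the empty entry

    def insert(self, e):
        i = self.h(e)
        while self.t[i] is not None:         # scan for the next free place
            i += 1
        if i >= len(self.t) - 1:             # keep the last cell empty
            raise OverflowError("overflow area exhausted")
        self.t[i] = e

    def find(self, k):
        i = self.h(k)
        while self.t[i] is not None:
            if self.t[i] == k:
                return self.t[i]
            i += 1
        return None

    def remove(self, k):
        i = self.h(k)
        while self.t[i] != k:                # search k
            if self.t[i] is None:
                return                       # nothing to do
            i += 1
        j = i + 1                            # we plan for a hole at i
        while self.t[j] is not None:
            if self.h(self.t[j]) <= i:       # t[j] may legally move into the hole
                self.t[i] = self.t[j]
                i = j                        # move planned hole
            j += 1
        self.t[i] = None                     # erase freed entry

table = BoundedLinearProbing(m=5, m_extra=5, h=lambda x: x % 5)
for e in (0, 5, 10, 1):
    table.insert(e)
table.remove(5)
print(table.t[:4])
```

After `remove(5)` the elements 10 and 1 slide back toward their hash positions, so later `find` calls never run into a hole in front of an element.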

More Hashing Issues

High probability and worst case guarantees
more requirements on the hash functions

Hashing as a means of load balancing in parallel systems, e.g., storage servers
– Different disk sizes and speeds
– Adding disks / replacing failed disks without much copying

Space Efficient Hashing with Worst Case Constant Access Time

Represent a set of n elements (with associated information) using space (1+ε)n.

Support operations insert, delete, lookup, (doall) efficiently.

Assume a truly random hash function h

Related Work

Uniform hashing: Expected time ≈ 1/ε
[Figure: probe sequence h1, h2, h3]

Dynamic Perfect Hashing [Dietzfelbinger et al. 94]: Worst case constant time for lookup but ε is not small.

Approaching the Information Theoretic Lower Bound [Brodnik Munro 99, Raman Rao 02]: Space (1+o(1)) × lower bound without associated information; [Pagh 01] static case.

Cuckoo Hashing

[Pagh Rodler 01]
Table of size (2+ε)n.
Two choices for each element.
Insert moves elements; rebuild if necessary.
Very fast lookup and insert.
Expected constant insertion time.

[Figure: an element with its two possible cells h1 and h2]
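A compact Python sketch of the idea follows. It is a simplification under several assumptions that are not in the slides: one shared table instead of two, salted built-in `hash` calls standing in for truly random hash functions, a fixed eviction limit `max_kicks`, and a rebuild that only draws new hash functions without growing the table.

```python
import random

class CuckooHashSet:
    """Illustrative cuckoo hash set; all names and parameters are hypothetical."""

    def __init__(self, capacity, max_kicks=64, seed=0):
        self.size = capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.cells = [None] * capacity
        self._new_salts()

    def _new_salts(self):
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _positions(self, x):
        # the two choices h1(x), h2(x)
        return [hash((s, x)) % self.size for s in self.salts]

    def lookup(self, x):
        return any(self.cells[p] == x for p in self._positions(x))

    def delete(self, x):
        for p in self._positions(x):
            if self.cells[p] == x:
                self.cells[p] = None

    def insert(self, x):
        if self.lookup(x):
            return
        pos = self._positions(x)[0]
        for _ in range(self.max_kicks):
            x, self.cells[pos] = self.cells[pos], x    # place x, evict the occupant
            if x is None:
                return
            p1, p2 = self._positions(x)
            pos = p2 if pos == p1 else p1              # evicted element tries its other cell
        self._rebuild(x)

    def _rebuild(self, pending):
        items = [y for y in self.cells if y is not None] + [pending]
        self.cells = [None] * self.size
        self._new_salts()                              # new hash functions
        for y in items:
            self.insert(y)

s = CuckooHashSet(capacity=128, seed=1)
for x in range(20):
    s.insert(x)
print(s.lookup(3), s.lookup(100))
```

Lookup and delete inspect at most two cells; only insert may move elements around. A production version would also have to bound repeated rebuilds when the table is too full.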

d-ary Cuckoo Hashing

d choices for each element.
Worst case d probes for delete and lookup.
Task: maintain perfect matching in the bipartite graph
(L = Elements, R = Cells, E = Choices),
e.g., insert by BFS or random walk.

[Figure: an element with three possible cells h1, h2, h3]

Tradeoff: Space ↔ Lookup/Deletion Time

Lookup and Delete: $d = O(\log\frac{1}{\varepsilon})$ probes

Insert: $\left(\frac{1}{\varepsilon}\right)^{O(\log(1/\varepsilon))}$, (experiments) → $O(1/\varepsilon)$?

Experiments

[Figure: ε · #probes for insert (y axis, 0 to 5) versus space utilization (x axis, 0 to 1), one curve each for d=2, d=3, d=4, d=5]

Open Questions and Further Results

Tight analysis of insertion
Two choices with d slots each [Dietzfelbinger et al.]
cache efficiency
Good Implementation?
Automatic rehash
Always correct

5 Minimum Spanning Trees

undirected Graph G = (V, E).
nodes V, n = |V|, e.g., V = {1, …, n}
edges e ∈ E, m = |E|, two-element subsets of V.
edge weight c(e), c(e) ∈ R+.
G is connected, i.e., ∃ path between any two nodes.

[Figure: example graph with four nodes and weighted edges]

Find a tree (V, T) with minimum weight $\sum_{e \in T} c(e)$ that connects all nodes.

MST: Overview

Basics: Cut property and cycle property
Jarník-Prim Algorithm
Kruskal's Algorithm
Filter-Kruskal
Comparison
(Advanced algorithms using the cycle property)
External MST

Applications

Clustering
Subroutine in combinatorial optimization, e.g., Held-Karp lower bound for TSP.
Challenging real world instances???
Image segmentation → [Diss. Jan Wassenberg]
Anyway: almost ideal "fruit fly" problem

Selecting and Discarding MST Edges

The Cut Property
For any S ⊂ V consider the cut edges
C = {{u,v} ∈ E : u ∈ S, v ∈ V \ S}
The lightest edge in C can be used in an MST.

The Cycle Property
The heaviest edge on a cycle is not needed for an MST

[Figure: the example graph, once with a cut and once with a cycle highlighted]

The Jarník-Prim Algorithm [Jarník 1930, Prim 1957]

Idea: grow a tree

T := ∅
S := {s} for arbitrary start node s
repeat n−1 times
  find (u,v) fulfilling the cut property for S
  S := S ∪ {v}
  T := T ∪ {(u,v)}

[Figure: example graph]

Implementation Using Priority Queues

Function jpMST(V, E, c) : Set of Edge
  dist = [∞, …, ∞] : Array [1..n]   // dist[v] is distance of v from the tree
  pred : Array of Edge               // pred[v] is shortest edge between S and v
  q : PriorityQueue of Node with dist[·] as priority
  dist[s] := 0; q.insert(s) for any s ∈ V
  for i := 1 to n−1 do
    u := q.deleteMin()               // new node for S
    dist[u] := 0
    foreach (u,v) ∈ E do
      if c((u,v)) < dist[v] then
        dist[v] := c((u,v)); pred[v] := (u,v)
        if v ∈ q then q.decreaseKey(v) else q.insert(v)
  return {pred[v] : v ∈ V \ {s}}

[Figure: example graph]
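The pseudocode above can be approximated in Python with the standard `heapq` module. Since `heapq` offers no decreaseKey, this sketch swaps in lazy deletion (stale queue entries are skipped) and uses an `in_tree` flag instead of setting dist[u] := 0; node numbering from 0 and the adjacency-list input format are assumptions made for the example.

```python
import heapq

def jp_mst(n, adj, s=0):
    """Jarník-Prim sketch. adj[u] is a list of (v, weight); the graph must be connected."""
    dist = [float("inf")] * n
    pred = [None] * n                  # pred[v]: lightest known edge between the tree and v
    in_tree = [False] * n
    dist[s] = 0
    q = [(0, s)]
    while q:
        _, u = heapq.heappop(q)
        if in_tree[u]:
            continue                   # stale entry, stands in for decreaseKey
        in_tree[u] = True              # new node for S
        for v, c in adj[u]:
            if not in_tree[v] and c < dist[v]:
                dist[v] = c
                pred[v] = (u, v, c)
                heapq.heappush(q, (c, v))
    return [pred[v] for v in range(n) if v != s]

# Made-up example graph: a 4-cycle with weights 7, 2, 5, 9.
edges = [(0, 1, 7), (0, 3, 9), (1, 2, 2), (2, 3, 5)]
adj = [[] for _ in range(4)]
for u, v, c in edges:
    adj[u].append((v, c))
    adj[v].append((u, c))
print(jp_mst(4, adj))
```

With lazy deletion the queue can hold up to m entries, so the bound becomes O(m log m) rather than the O(m + n log n) obtained with Fibonacci heaps.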

Graph Representation for Jarník-Prim

We need node → incident edges

[Figure: adjacency array for the example graph. The node array V = 1 3 5 7 9 (positions 1..n+1 = 5) points into the edge array E; each undirected edge appears twice in E, with a parallel array c of edge weights]

+ fast (cache efficient)
+ more compact than linked lists
− difficult to change
− Edges are stored twice
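One way to build such an adjacency array is a counting pass followed by a prefix sum. The Python sketch below uses 0-based indices and a made-up four-node example, so the concrete numbers differ from the slide's 1-based figure; the function name is hypothetical.

```python
def build_adjacency_array(n, edges):
    """edges: list of (u, v, c) with nodes 0..n-1. Returns arrays (V, E, C).

    The neighbours of node u are E[V[u]:V[u+1]], with weights C[V[u]:V[u+1]].
    """
    V = [0] * (n + 1)
    for u, v, _ in edges:              # count the degree of every node
        V[u + 1] += 1
        V[v + 1] += 1
    for i in range(n):                 # prefix sums turn counts into start offsets
        V[i + 1] += V[i]
    E = [0] * (2 * len(edges))         # every undirected edge is stored twice
    C = [0] * (2 * len(edges))
    nxt = V[:-1]                       # next free slot per node (slicing copies)
    for u, v, c in edges:
        for a, b in ((u, v), (v, u)):
            E[nxt[a]] = b
            C[nxt[a]] = c
            nxt[a] += 1
    return V, E, C

V, E, C = build_adjacency_array(4, [(0, 1, 7), (0, 3, 9), (1, 2, 2), (2, 3, 5)])
print(V, E, C)
```

Scanning the neighbours of a node touches one contiguous slice, which is where the cache efficiency comes from; inserting an edge later would require shifting the arrays.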

Analysis

O(m + n) time outside priority queue
n deleteMin (time O(n log n))
O(m) decreaseKey (time O(1) amortized)
O(m + n log n) using Fibonacci Heaps

practical implementation using simpler pairing heaps.
But analysis is still partly open!

Kruskal's Algorithm [1956]

T := ∅   // subforest of the MST
foreach (u,v) ∈ E in ascending order of weight do
  if u and v are in different subtrees of T then
    T := T ∪ {(u,v)}   // Join two subtrees
return T

The Union-Find Data Structure

Class UnionFind(n : N)   // Maintain a partition of 1..n
  parent = [n+1, …, n+1] : Array [1..n] of 1..n+⌈log n⌉

  Function find(i : 1..n) : 1..n
    if parent[i] > n then return i
    else i′ := find(parent[i])
      parent[i] := i′
      return i′

  Procedure link(i, j : 1..n)
    assert i and j are leaders of different subsets
    if parent[i] < parent[j] then parent[i] := j
    else if parent[i] > parent[j] then parent[j] := i
    else parent[j] := i; parent[i]++   // next generation

  Procedure union(i, j)
    if find(i) ≠ find(j) then link(find(i), find(j))

[Figure: example forest on the nodes 1..6 with its parent array; leaders store n+1+rank instead of a parent]

Kruskal Using Union Find

T : UnionFind(n)
sort E in ascending order of weight
kruskal(E)

Procedure kruskal(E)
  foreach (u,v) ∈ E do
    u′ := T.find(u)
    v′ := T.find(v)
    if u′ ≠ v′ then
      output (u,v)
      T.link(u′, v′)
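A runnable Python counterpart of Kruskal with union-find might look as follows. It deviates from the slides in two labeled ways: ranks live in a separate array instead of being encoded in `parent`, and `find` is iterative. Edges are assumed to be `(weight, u, v)` tuples over nodes 0..n-1.

```python
class UnionFind:
    """Partition of 0..n-1 with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))   # a leader is its own parent (assumption of this sketch)
        self.rank = [0] * n

    def find(self, i):
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[i] != root:  # path compression
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def link(self, i, j):
        """i and j must be leaders of different subsets."""
        if self.rank[i] < self.rank[j]:
            self.parent[i] = j
        elif self.rank[i] > self.rank[j]:
            self.parent[j] = i
        else:
            self.parent[j] = i
            self.rank[i] += 1          # next generation

def kruskal(n, edges):
    """Return the MST edges of a connected graph in the order they are selected."""
    uf = UnionFind(n)
    mst = []
    for c, u, v in sorted(edges):      # ascending order of weight
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:
            mst.append((c, u, v))
            uf.link(ru, rv)
    return mst

print(kruskal(4, [(7, 0, 1), (9, 0, 3), (2, 1, 2), (5, 2, 3)]))
```

The input is just an edge sequence, which matches the plain array representation discussed next.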

Graph Representation for Kruskal

Just an edge sequence (array)!

+ very fast (cache efficient)
+ Edges are stored only once
+ more compact than adjacency array

Analysis

O(sort(m) + m α(m,n)) = O(m log m) where α is the inverse Ackermann function

Kruskal versus Jarník-Prim I

Kruskal wins for very sparse graphs
Prim seems to win for denser graphs
Switching point is unclear
– How is the input represented?
– How many decreaseKeys are performed by JP? (average case: $n \log\frac{m}{n}$ [Noshita 85])
– Experimental studies are quite old [Moret Shapiro 91], use slow graph representation for both algs, and artificial inputs

see attached slides.

5.1 Filtering by Sampling Rather Than Sorting

R := random sample of r edges from E
F := MST(R)   // Wlog assume that F spans V
L := ∅        // "light edges" with respect to R
foreach e ∈ E do   // Filter
  C := the unique cycle in {e} ∪ F
  if e is not heaviest in C then L := L ∪ {e}
return MST((L ∪ F))
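The filter step can be illustrated with a deliberately naive Python sketch: the heaviest edge on the tree path is found by a plain search through F, not by the interval-maxima technique of the lecture. Edge tuples `(weight, u, v)`, distinct weights, and all function names are assumptions of this example.

```python
import random

def kruskal(n, edges):
    """Minimum spanning forest of (weight, u, v) edges via a simple union-find."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i
    forest = []
    for c, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((c, u, v))
    return forest

def heaviest_on_path(adj, u, v):
    """Largest weight on the u-v path in the forest, or None if no path exists."""
    stack, seen = [(u, float("-inf"))], {u}
    while stack:
        x, best = stack.pop()
        if x == v:
            return best
        for y, c in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, max(best, c)))
    return None

def filter_mst(n, edges, r, rng):
    R = rng.sample(edges, r)                   # random sample of r edges
    F = kruskal(n, R)
    adj = [[] for _ in range(n)]
    for c, a, b in F:
        adj[a].append((b, c))
        adj[b].append((a, c))
    L = []                                     # "light" edges with respect to R
    for c, u, v in edges:
        h = heaviest_on_path(adj, u, v)
        if h is None or c < h:                 # e is not heaviest on its cycle in {e} ∪ F
            L.append((c, u, v))
    return kruskal(n, L + F)

rng = random.Random(0)
edges = [(7, 0, 1), (9, 0, 3), (2, 1, 2), (5, 2, 3), (4, 0, 2)]
print(filter_mst(4, edges, r=3, rng=rng))
```

Each path query here costs O(n), so the sketch only shows correctness of the filtering idea; the speed of the real algorithm depends on answering these queries in constant time.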

5.1.1 Analysis

[Chan 98, KKK 95]

Observation: e ∈ L only if e ∈ MST(R ∪ {e}).
(Otherwise e could replace some heavier edge in F).

Lemma 4. $E[|L \cup F|] \le \frac{mn}{r}$

MST Verification by Interval Maxima

Number the nodes by the order they were added to the MST by Prim's algorithm.
w_i = weight of the edge that inserted node i.
Largest weight on path(u,v) = max{w_j : u < j ≤ v}.

[Figure: example tree with edge weights and the resulting weight sequence in insertion order]

Interval Maxima

Preprocessing: build n log n size array PreSuf.

[Figure: example array with PreSuf levels 0 to 3; the query interval 2..6 is marked]

To find max a[i..j]:
Find the level of the LCA: ℓ = ⌊log2(i ⊕ j)⌋.
Return max(PreSuf[ℓ][i], PreSuf[ℓ][j]).

Example: 2 ⊕ 6 = 010 ⊕ 110 = 100 ⇒ ℓ = 2
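The query rule can be checked with a short Python sketch. The layout of PreSuf is an assumption inferred from its name: on level ℓ the array is cut into blocks of size 2^(ℓ+1), the left half of each block stores suffix maxima and the right half stores prefix maxima. Indices are 0-based and the sample values are arbitrary.

```python
def build_presuf(a):
    """Build the assumed PreSuf table: one row of len(a) entries per level."""
    n = len(a)
    levels = max(1, (n - 1).bit_length())
    presuf = []
    for l in range(levels):
        half = 1 << l
        row = list(a)
        for start in range(0, n, 2 * half):
            mid = min(start + half, n)
            end = min(start + 2 * half, n)
            for i in range(mid - 2, start - 1, -1):   # suffix maxima in the left half
                row[i] = max(a[i], row[i + 1])
            for i in range(mid + 1, end):             # prefix maxima in the right half
                row[i] = max(a[i], row[i - 1])
        presuf.append(row)
    return presuf

def range_max(a, presuf, i, j):
    """Return max(a[i..j]) for i <= j with two table lookups."""
    if i == j:
        return a[i]
    level = (i ^ j).bit_length() - 1                  # floor(log2(i xor j))
    return max(presuf[level][i], presuf[level][j])

a = [98, 15, 65, 75, 34, 77, 41, 62, 80]
table = build_presuf(a)
print(range_max(a, table, 2, 6))
```

For i = 2 and j = 6 the highest differing bit is bit 2, so the two positions lie in different halves of one level-2 block and the suffix and prefix maxima together cover exactly a[2..6].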

A Simple Filter Based Algorithm

Choose $r = \sqrt{mn}$.

We get expected time

$T_{\mathrm{Prim}}(\sqrt{mn}) + O(n \log n + m) + T_{\mathrm{Prim}}\left(\frac{mn}{\sqrt{mn}}\right) = O(n \log n + m)$

The constant factor in front of the m is very small.

Results

10 000 nodes, SUN-Fire-15000, 900 MHz UltraSPARC-III+

[Figure: Worst Case Graph. Time per edge [ns] (0 to 600) versus edge density (0.1 to 1), curves for Prim and Filter]

Results

10 000 nodes, SUN-Fire-15000, 900 MHz UltraSPARC-III+

[Figure: Random Graph. Time per edge [ns] (0 to 600) versus edge density (0.1 to 1), curves for Prim and Filter]

Results

10 000 nodes, NEC SX-5 Vector Machine, "worst case"

[Figure: Time per edge [ns] (0 to 1400) versus edge density (0.1 to 1), curves for Prim (Scalar), I-Max (Scalar), Prim (Vectorized), I-Max (Vectorized)]

Edge Contraction

Let {u,v} denote an MST edge.
Eliminate v:

forall (w,v) ∈ E do
  E := E \ (w,v) ∪ {(w,u)}   // but remember original terminals

[Figure: the example graph before and after contracting an edge; a relinked edge keeps a note of its original endpoints, e.g., "7 (was 2,3)"]

Boruvka's Node Reduction Algorithm

For each node find the lightest incident edge.
Include them into the MST (cut property)
contract these edges,
Time O(m)
At least halves the number of remaining nodes

[Figure: example graph with the lightest incident edge of each node marked]
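Repeating this node reduction until one node remains gives a complete MST algorithm. The Python sketch below takes a shortcut that is not in the slides: contraction is simulated with a union-find structure over component leaders instead of rewriting the edge list. Edges are `(weight, u, v)` tuples, and ties are broken by comparing whole tuples.

```python
def boruvka_mst(n, edges):
    """Borůvka rounds on a connected graph with nodes 0..n-1."""
    comp = list(range(n))
    def find(i):
        while comp[i] != i:
            comp[i] = comp[comp[i]]            # path halving
            i = comp[i]
        return i

    mst = []
    components = n
    while components > 1:
        lightest = {}                          # component leader -> lightest incident edge
        for e in edges:
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                       # edge inside a contracted node
            for r in (ru, rv):
                if r not in lightest or e < lightest[r]:
                    lightest[r] = e
        if not lightest:
            break                              # graph was not connected
        for c, u, v in set(lightest.values()):
            ru, rv = find(u), find(v)
            if ru != rv:                       # contract the selected edge
                comp[ru] = rv
                mst.append((c, u, v))
                components -= 1
    return mst

print(sorted(boruvka_mst(4, [(7, 0, 1), (9, 0, 3), (2, 1, 2), (5, 2, 3)])))
```

Every component picks one edge per round, so the number of components at least halves and O(log n) rounds of O(m) work suffice.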

5.2 Simpler and Faster Node Reduction

for i := n downto n′+1 do
  pick a random node v
  find the lightest edge (u,v) out of v and output it
  contract (u,v)

E[degree(v)] ≤ 2m/i

$\sum_{n' < i \le n} \frac{2m}{i} = 2m \left( \sum_{0 < i \le n} \frac{1}{i} - \sum_{0 < i \le n'} \frac{1}{i} \right) \approx 2m (\ln n - \ln n') = 2m \ln\frac{n}{n'}$

[Figure: two reduction steps on the example graph with the labels "output 2,3", "output 1,2", "7 (was 2,3)" and "9 (was 1,4)"]
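The quality of the logarithmic approximation is easy to check numerically. The following Python lines compare the exact sum with 2m ln(n/n′) for made-up values of m, n and n′; the function names are illustrative.

```python
import math

def reduction_work_exact(m, n, n_prime):
    """Sum of 2m/i over n' < i <= n (bound on the expected total degree scanned)."""
    return sum(2 * m / i for i in range(n_prime + 1, n + 1))

def reduction_work_approx(m, n, n_prime):
    """Closed-form approximation 2m ln(n/n')."""
    return 2 * m * math.log(n / n_prime)

m, n, n_prime = 10**6, 10**5, 10**3      # hypothetical instance sizes
print(reduction_work_exact(m, n, n_prime))
print(reduction_work_approx(m, n, n_prime))
```

For these sizes the two values differ by roughly one part in ten thousand, since the error of approximating harmonic numbers by logarithms largely cancels in the difference.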

5.3 Randomized Linear Time Algorithm

1. Factor 8 node reduction (3× Boruvka or sweep algorithm). O(m+n).
2. R ⇐ m/2 random edges. O(m+n).
3. F ⇐ MST(R) [Recursively].
4. Find light edges L (edge reduction). O(m+n).
   $E[|L|] \le \frac{mn/8}{m/2} = n/4$.
5. T ⇐ MST(L ∪ F) [Recursively].

T(n,m) ≤ T(n/8, m/2) + T(n/8, n/4) + c(n+m)

T(n,m) ≤ 2c(n+m) fulfills this recurrence.
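That the bound solves the recurrence can be verified by substitution. The following worked step assumes the induction hypothesis T(n′,m′) ≤ 2c(n′+m′) for the two smaller subproblems:

```latex
\begin{align*}
T(n,m) &\le T(n/8,\, m/2) + T(n/8,\, n/4) + c(n+m) \\
       &\le 2c\left(\tfrac{n}{8} + \tfrac{m}{2}\right)
          + 2c\left(\tfrac{n}{8} + \tfrac{n}{4}\right) + c(n+m) \\
       &= c\left(\tfrac{n}{4} + m\right) + c\left(\tfrac{n}{4} + \tfrac{n}{2}\right) + c(n+m) \\
       &= 2cn + 2cm = 2c(n+m).
\end{align*}
```

The terms add up exactly, which shows why a factor 8 node reduction combined with a sample of m/2 edges is enough.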

5.4 External MSTs

Semiexternal Algorithms

Assume n ≤ M − 2B:
run Kruskal's algorithm using external sorting

Streaming MSTs

If M is yet a bit larger we can even do it with m/B I/Os:

T := ∅   // current approximation of MST
while there are any unprocessed edges do
  load any Θ(M) unprocessed edges E′
  T := MST(T ∪ E′)   // for any internal MST alg.

Corollary: we can do it with linear expected internal work

Disadvantages to Kruskal:
Slower in practice
Smaller max. n
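The streaming loop is short enough to sketch directly. In this Python illustration an in-memory iterator stands in for the external edge file, `chunk_size` plays the role of Θ(M), and a simple Kruskal routine serves as the internal MST algorithm; all names are hypothetical.

```python
def kruskal(n, edges):
    """Minimum spanning forest of (weight, u, v) edges."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i
    forest = []
    for c, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((c, u, v))
    return forest

def streaming_mst(n, edge_stream, chunk_size):
    """Process the edges chunk by chunk, keeping at most n-1 forest edges in between."""
    T = []                                     # current approximation of the MST
    chunk = []
    for e in edge_stream:
        chunk.append(e)
        if len(chunk) == chunk_size:
            T = kruskal(n, T + chunk)
            chunk = []
    if chunk:
        T = kruskal(n, T + chunk)
    return T

edges = [(7, 0, 1), (9, 0, 3), (2, 1, 2), (5, 2, 3), (4, 0, 2)]
print(streaming_mst(4, iter(edges), chunk_size=2))
```

An edge dropped from T in some round is the heaviest edge on a cycle and can never return to the MST, so the final forest equals the MST of the whole input.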

General External MST

while n > M − 2B do
  perform some node reduction
use semi-external Kruskal

[Figure: diagram with axes n and m/n and the semiexternal region n′ < M]

Theory: O(sort(m)) expected I/Os by externalizing the linear time algorithm.
(i.e., node reduction + edge reduction)

External Implementation I: Sweeping

π : random permutation V → V
sort edges (u,v) by max(π(u), π(v))
for i := n downto n′+1 do
  pick the node v with π(v) = i
  find the lightest edge (u,v) out of v and output it
  contract (u,v)

[Figure: sweep line over the nodes; one edge of v is output, the others are relinked to u]

Problem: how to implement relinking?

Relinking Using Priority Queues

Q : priority queue   // Order: max node, then min edge weight
foreach ({u,v}, c) ∈ E do Q.insert(({π(u),π(v)}, c, {u,v}))
current := n+1
loop
  ({u,v}, c, {u0,v0}) := Q.deleteMin()
  if current ≠ max{u,v} then
    if current = M+1 then return
    output {u0,v0}, c
    current := max{u,v}
    connect := min{u,v}
  else Q.insert(({min{u,v}, connect}, c, {u0,v0}))

≈ sort(10 m ln(n/M)) I/Os with opt. priority queues [Sanders 00]

Problem: Compute bound

[Figure: sweep line over the nodes; one edge of v is output, the others are relinked to u]

Sweeping with linear internal work

Assume m = O(M²/B)
k = Θ(M/B) external buckets with n/k nodes each
M nodes for last "semiexternal" bucket
split current bucket into internal buckets for each node

Sweeping:
Scan current internal bucket twice:
1. Find minimum
2. Relink
New external bucket: scan and put in internal buckets
Large degree nodes: move to semiexternal bucket

[Figure: sequence of external buckets, the current bucket split into internal buckets, and the semiexternal bucket at the end]

Experiments

Instances from "classical" MST study [Moret Shapiro 1994]
sparse random graphs
random geometric graphs
grids
O(sort(m)) I/Os for planar graphs by removing parallel edges!

Other instances are rather dense or designed to fool specific algorithms.

[Figure: test machine: 2x Xeon (4 threads), Intel E7500 chipset, 1 GB DDR RAM, PCI busses, disk controllers and channels, 8 disks of 80 GB each]

m ≈ 2n

[Figure: t / m [µs] (1 to 6) versus m / 1 000 000 (5 to 2560); curves for Kruskal and Prim and for random, geometric and grid graphs]

m ≈ 4n

[Figure: t / m [µs] (1 to 6) versus m / 1 000 000 (5 to 2560); curves for Kruskal and Prim and for random and geometric graphs]

m ≈ 8n

[Figure: t / m [µs] (1 to 6) versus m / 1 000 000 (5 to 2560); curves for Kruskal and Prim and for random and geometric graphs]

MST Summary

Edge reduction helps for very dense, "hard" graphs
A fast and simple node reduction algorithm
4× fewer I/Os than previous algorithms
Refined semiexternal MST, use as base case
Simple pseudo random permutations (no I/Os)
A fast implementation
Experiments with huge graphs (up to n = 4·10^9 nodes)
External MST is feasible


Open Problems

New experiments for (improved) Kruskal versus

Jarník-Prim

Realistic (huge) inputs

Parallel external algorithms

Implementations for other graph problems

Conclusions

Even fundamental, "simple" algorithmic problems still raise interesting questions

Implementation and experiments are important and were neglected by parts of the algorithms community

Theory is an (at least) equally important, essential component of the algorithm design process

More Algorithm Engineering on Graphs

Count triangles in very large graphs. Interesting as a measure of clusteredness. (Cooperation with AG Wagner)

External BFS (Master thesis Deepak Ajwani)

Maximum flows: Is the theoretical best algorithm any good? (Yes and no)

Approximate max. weighted matching (student research project Jens Maue)

Maximal Flows

Theory: $O(m \Lambda \log(n^2/m) \log U)$ binary blocking flow algorithm with $\Lambda = \min\{m^{1/2}, n^{2/3}\}$ [Goldberg-Rao-97].

Problem: best case ≈ worst case

[Hagerup Sanders Träff WAE 98]:
Implementable generalization
best case ≪ worst case
best algorithms for some "difficult" instances

Result

[Figure: sweep line over the nodes; one edge of v is output, the others are relinked to u]

Simple to implement externally
n′ = M: semiexternal Kruskal algorithm
In total $O(\mathrm{sort}(m \ln\frac{n}{m}))$ expected I/Os
For realistic inputs at least 4× better than previously known algorithms
Implementation in <stxxl> runs "overnight" on graphs of up to 96 GByte

More On Experimental Methodology

Scientific Method:
Experiments need a possible outcome that falsifies a hypothesis
Reproducible
– keep data/code for at least 10 years
– clear and detailed description in papers / TRs
– share instances and code

[Figure: the algorithm engineering cycle: realistic models, design, analysis (deduction, performance guarantees), implementation, experiments on real inputs (induction, falsifiable hypotheses), algorithm libraries, application engineering, applications]

Quality Criteria

Beat the state of the art, globally (not your own toy codes or the toy codes used in your community!)

Clearly demonstrate this!
– both codes use same data, ideally from accepted benchmarks (not just your favorite data!)
– comparable machines or fair (conservative) scaling
– Avoid incomparabilities like: "Yeah we are worse but twice as fast"
– real world data wherever possible
– as many different inputs as possible
– it's fine if you are better just on some (important) inputs

Not Here but Important

describing the setup
finding sources of measurement errors
reducing measurement errors (averaging, median, unloaded machine . . . )
measurements in the creative phase of experimental algorithmics.

The Starting Point

(Several) Algorithm(s)

A few quantities to be measured: time, space, solution quality, comparisons, cache faults, . . . There may also be measurement errors.

An unlimited number of potential inputs. Condense to a few characteristic ones (size, |V|, |E|, . . . or problem instances from applications)

Usually there is not a lack but an abundance of data ≠ many other sciences


The Process

Waterfall model?

1. Design

2. Measurement

3. Interpretation

Perhaps the paper should at least look like that.

The Process

Eventually stop asking questions (Advisors/Referees listen!)
build measurement tools
automate (re)measurements
Choice of Experiments driven by risk and opportunity

Distinguish mode
explorative: many different parameter settings, interactive, short turnaround times
consolidating: many large instances, standardized measurement conditions, batch mode, many machines


Of Risks and Opportunities

Example: Hypothesis= my algorithm is the best

big risk: untried main competitor

small risk: tuning of a subroutine that takes 20 % of the time.

big opportunity: use algorithm for a new application

new input instances


Presenting Data from Experiments in

Algorithmics

Restrictions

black and white easy and cheap printing

2D (stay tuned)

no animation

no realism desired

Basic Principles

Minimize nondata ink (form follows function, not a beauty contest, . . . )
Letter size ≈ surrounding text
Avoid clutter and overwhelming complexity
Avoid boredom (too little data per m²).
Make the conclusions evident

Tables

+ easy
− easy overuse
+ accurate values (≠ 3D)
+ more compact than bar chart
+ good for unrelated instances (e.g. solution quality)
− boring
− no visual processing

rule of thumb that "tables usually outperform a graph for small data sets of 20 numbers or less" [Tufte 83]

Curves in main paper, tables in appendix?

2D Figures

default: x = input size, y = f(execution time)

x Axis

Choose unit to eliminate a parameter?

[Figure: improvement min(T*_1, T*_∞)/T*_* (1 to 1.8) versus k/t0 (10 to 1e+06, logarithmic), curves for P=64, P=1024, P=16384]

length k fractional tree broadcasting. latency t0 + k

x Axis

logarithmic scale?

[Figure: improvement min(T*_1, T*_∞)/T*_* (y axis, 1 to 1.8) versus k/t0 (x axis, logarithmic, 10 to 10^6); curves for P = 64, P = 1024, P = 16384.]

yes if x range is wide

x Axis

logarithmic scale, powers of two (or √2)

[Figure: (T(deleteMin) + T(insert))/log N [ns] (y axis, 0 to 150) versus N (x axis, logarithmic, tick marks 1024, 4096, 16384, 65536, 2^18, 2^20, 2^22, 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

with tic marks (plus a few small ones)
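A minimal sketch in Python of how measurement points could be placed on such a grid, so that every power-of-two tick mark coincides with a measured input size. The function name `geometric_sizes` and its parameters are illustrative assumptions, not part of the lecture material.

```python
def geometric_sizes(lo_exp, hi_exp, steps_per_doubling=1):
    """Input sizes from 2^lo_exp to 2^hi_exp on a geometric grid.

    steps_per_doubling=1 yields powers of two, 2 yields powers of sqrt(2).
    Sizes are rounded because input sizes have to be whole numbers.
    """
    first = lo_exp * steps_per_doubling
    last = hi_exp * steps_per_doubling
    return [round(2 ** (e / steps_per_doubling)) for e in range(first, last + 1)]


print(geometric_sizes(10, 13))     # powers of two from 2^10 to 2^13
print(geometric_sizes(10, 12, 2))  # powers of sqrt(2) from 2^10 to 2^12
```

With two steps per doubling, every second measured size still falls exactly on a power-of-two tick mark.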

gnuplot

set xlabel "N"

set ylabel "(time per operation)/log N [ns]"

set xtics (256, 1024, 4096, 16384, 65536, "2^18" 262144,

set size 0.66, 0.33

set logscale x 2

set style data linespoints

set key left

set terminal postscript portrait enhanced 10

set output "r10000timenew.eps"

plot [1024:10000000][0:220]\

"h2r10000new.log" using 1:3 title "bottom up binary heap"

"h4r10000new.log" using 1:3 title "bottom up aligned 4-ary

"knr10000new.log" using 1:3 title "sequence heap" with linespoints

Data File

256 703.125 87.8906

512 729.167 81.0185

1024 768.229 76.8229

2048 830.078 75.4616

4096 846.354 70.5295

8192 878.906 67.6082

16384 915.527 65.3948

32768 925.7 61.7133

65536 946.045 59.1278

131072 971.476 57.1457

262144 1009.62 56.0902

524288 1035.69 54.51

1048576 1055.08 52.7541

2097152 1113.73 53.0349

4194304 1150.29 52.2859

8388608 1172.62 50.9836
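The third column of this file appears to be the second column divided by log2 N (for example, 703.125 / 8 = 87.8906 for N = 256). That would match the y label "(time per operation)/log N [ns]" and the `using 1:3` clause of the plot command. A small Python sketch of this normalization, under that assumption; the function name `normalized_time` is hypothetical.

```python
import math


def normalized_time(n, time_ns):
    """Time per operation divided by log2(n), the scaling suggested by theory."""
    return time_ns / math.log2(n)


# (N, time per operation) pairs copied from the data file above
rows = [(256, 703.125), (1024, 768.229), (4096, 846.354)]
for n, t in rows:
    print(n, t, round(normalized_time(n, t), 4))
```

Plotting the normalized column rather than the raw time lets a curve that follows the predicted log N behavior show up as a roughly horizontal line.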

x Axis

linear scale for ratios or small ranges (#processors, ...)

[Figure: time per edge [ns] (y axis, 0 to 600) versus edge density (x axis, 0.1 to 1); curves: Prim, Filter.]

x Axis

An exotic scale: arrival rate 1 − ε of saturation point

[Figure: average delay (y axis, 1 to 8) versus 1/ε (x axis, 2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

y Axis

Avoid log scale! Scale such that theory gives ≈ horizontal lines

[Figure: (T(deleteMin) + T(insert))/log N [ns] (y axis, 0 to 150) versus N (x axis, logarithmic, tick marks 1024, 4096, 16384, 65536, 2^18, 2^20, 2^22, 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

but give easy interpretation of the scaling function

y Axis

give units

[Figure: (T(deleteMin) + T(insert))/log N [ns] (y axis, 0 to 150) versus N (x axis, logarithmic, tick marks 1024, 4096, 16384, 65536, 2^18, 2^20, 2^22, 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

y Axis

start from 0 if this does not waste too much space

[Figure: (T(deleteMin) + T(insert))/log N [ns] (y axis, 0 to 150) versus N (x axis, logarithmic, tick marks 1024, 4096, 16384, 65536, 2^18, 2^20, 2^22, 2^23); curves: bottom up binary heap, bottom up aligned 4-ary heap, sequence heap.]

you may assume readers to be out of kindergarten

y Axis

clip outclassed algorithms

[Figure: average delay (y axis, 1 to 8) versus 1/ε (x axis, 2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

y Axis

vertical size: weighted average of the slants of the line segments in the figure should be about 45° [Cleveland 94]

[Figure: improvement min(T*_1, T*_∞)/T*_* (y axis, 1 to 1.8) versus k/t0 (x axis, logarithmic, 10 to 10^6); curves for P = 64, P = 1024, P = 16384.]
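One way to turn this rule into a number is to search for the aspect ratio (height divided by width) at which the weighted mean slant of all segments reaches 45°. The Python sketch below assumes that the weights are the drawn segment lengths and that a simple bisection is good enough; the function names are hypothetical, and the sample points are taken from the heap data file shown earlier, with log2 N as the x-coordinate.

```python
import math


def mean_orientation(xs, ys, aspect):
    """Length-weighted mean slant (in degrees) of the segments of one curve
    when the plot is drawn with the given aspect ratio (height / width)."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    weighted_sum = 0.0
    total_length = 0.0
    for i in range(len(xs) - 1):
        dx = abs(xs[i + 1] - xs[i]) / x_range           # plot width scaled to 1
        dy = aspect * abs(ys[i + 1] - ys[i]) / y_range  # plot height scaled to aspect
        length = math.hypot(dx, dy)
        weighted_sum += length * math.degrees(math.atan2(dy, dx))
        total_length += length
    return weighted_sum / total_length


def bank_to_45(xs, ys, lo=1e-3, hi=1e3, iterations=60):
    """Bisection (on a logarithmic scale) for the aspect ratio whose
    weighted mean slant is 45 degrees."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if mean_orientation(xs, ys, mid) < 45.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# x = log2 N because the x axis is logarithmic; y = normalized time [ns]
xs = [10, 12, 14, 16]
ys = [76.8229, 70.5295, 65.3948, 59.1278]
print(bank_to_45(xs, ys))
```

For a plot with a logarithmic axis, the coordinates passed in should be the transformed ones, since the slants are those of the drawn segments.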

y Axis

graph a bit wider than high, e.g., golden ratio [Tufte 83]

[Figure: improvement min(T*_1, T*_∞)/T*_* (y axis, 1 to 1.8) versus k/t0 (x axis, logarithmic, 10 to 10^6); curves for P = 64, P = 1024, P = 16384.]

Multiple Curves

+ high information density

+ better than 3D (reading off values)

− Easily overdone

≤ 7 smooth curves

Reducing the Number of Curves

use ratios

[Figure: improvement min(T*_1, T*_∞)/T*_* (y axis, 1 to 1.8) versus k/t0 (x axis, logarithmic, 10 to 10^6); curves for P = 64, P = 1024, P = 16384.]
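The y label of this plot combines three running times into a single ratio, so one curve per value of P suffices. A Python sketch of that idea follows. It reads the label as "better of two reference times divided by the time of the method under study", which is an assumption; the names `t_1`, `t_inf`, and `t_star` only mirror the symbols of the label, and all timing values are made-up placeholders.

```python
def improvement(t_1, t_inf, t_star):
    """Better of the two reference times divided by the time of the method
    under study. Values above 1 mean the studied method beats both references."""
    return min(t_1, t_inf) / t_star


# Hypothetical running times in arbitrary units: (k/t0, t_1, t_inf, t_star)
measurements = [
    (10, 4.0, 6.0, 3.2),
    (1000, 9.0, 7.5, 5.0),
    (100000, 30.0, 21.0, 20.0),
]
for x, t_1, t_inf, t_star in measurements:
    print(x, improvement(t_1, t_inf, t_star))
```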

Reducing the Number of Curves

omit curves

outclassed algorithms (for case shown)

equivalent algorithms (for case shown)
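One conservative way to decide which curves are candidates for omission is to look for algorithms that are dominated in the case shown. The Python sketch below uses that definition as an assumption; the lecture does not prescribe a formal criterion, and the function name and sample data are hypothetical.

```python
def outclassed(times):
    """Return the names of algorithms dominated by some other algorithm.

    `times` maps an algorithm name to its running times, measured at the same
    x-values for every algorithm. A is dominated by B if B is never slower
    than A and strictly faster at least once.
    """
    dominated = set()
    for a, times_a in times.items():
        for b, times_b in times.items():
            if a == b:
                continue
            never_slower = all(tb <= ta for ta, tb in zip(times_a, times_b))
            sometimes_faster = any(tb < ta for ta, tb in zip(times_a, times_b))
            if never_slower and sometimes_faster:
                dominated.add(a)
                break
    return dominated


# Hypothetical measurements at three x-values
sample = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [3, 1, 5]}
print(outclassed(sample))
```

Algorithms with identical measurements are not reported as outclassed; they fall under the "equivalent algorithms" case and can be merged into one curve by hand.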

Reducing the Number of Curves

split into two graphs

[Figure: average delay (y axis, 1 to 8) versus 1/ε (x axis, 2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

Reducing the Number of Curves

split into two graphs

[Figure: average delay (y axis, 1 to 2) versus 1/ε (x axis, 2 to 20); curves: shortest queue, hybrid, lazy sharing, matching.]

Keeping Curves apart: log y scale

[Figure: time per edge [ns] (y axis, tick marks 0 to 1400) versus edge density (x axis, 0.1 to 1); curves: Prim (Scalar), I-Max (Scalar), Prim (Vectorized), I-Max (Vectorized).]

Keeping Curves apart: smoothing

[Figure: average delay (y axis, 1 to 8) versus 1/ε (x axis, 2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

Keys

[Figure: average delay (y axis, 1 to 8) versus 1/ε (x axis, 2 to 20); curves: nonredundant, mirror, ring shortest queue, ring with matching, shortest queue.]

same order as curves

Keys

place in white space

[Figure: average delay (y axis, 1 to 2) versus 1/ε (x axis, 2 to 20); curves: shortest queue, hybrid, lazy sharing, matching.]

consistent in different figures

Deadly Sins

1. forgetting to explain the axes

2. connecting unrelated points by lines

3. mindless use/overinterpretation of double-log plots

4. cryptic abbreviations

5. microscopic lettering

6. excessive complexity

7. pie charts

Arranging Instances

bar charts

stack components of execution time

careful with shading

[Figure: stacked bar chart of execution time; components from top to bottom: postprocessing, phase 2, phase 1, preprocessing.]

Arranging Instances

scatter plots

[Figure: scatter plot of n / active set size (y axis, logarithmic, 1 to 1000) versus n/m (x axis, logarithmic, 1 to 1000).]

Measurements and Connections

straight lines between points do not imply a claim of linear interpolation

different with higher order curves

no points imply an even stronger claim. Good for very dense smooth measurements.

Grids and Ticks

Avoid grids or make them light gray

usually round numbers for tic marks!

sometimes plot important values on the axis

[Figure: y axis from 0 to 20 versus n (x axis, logarithmic, tick marks 32, 1024, 32768); curves: Measurement, log n + log ln n + 1, log n.]

errors may not be of statistical nature!

3D

− you cannot read off absolute values

− interesting parts may be hidden

− only one surface

+ good impression of shape

Caption

what is displayed

how has the data been obtained

surrounding text has more.

Check List

Should the experimental setup from the exploratory phase be redesigned to increase conciseness or accuracy?

What parameters should be varied? What variables should be measured? How are parameters chosen that cannot be varied?

Can tables be converted into curves, bar charts, scatter plots or any other useful graphics?

Should tables be added in an appendix or on a web page?

Should a 3D-plot be replaced by collections of 2D-curves?

Can we reduce the number of curves to be displayed?

How many figures are needed?

Scale the x-axis to make y-values independent of some parameters?

Should the x-axis have a logarithmic scale? If so, do the x-values used for measuring have the same basis as the tick marks?

Should the x-axis be transformed to magnify interesting subranges?

Is the range of x-values adequate?

Do we have measurements for the right x-values, i.e., nowhere too dense or too sparse?

Should the y-axis be transformed to make the interesting part of the data more visible?

Should the y-axis have a logarithmic scale?

Is it misleading to start the y-range at the smallest measured value?

Clip the range of y-values to exclude useless parts of curves?

Can we use banking to 45°?

Are all curves sufficiently well separated?

Can noise be reduced using more accurate measurements?

Are error bars needed? If so, what should they indicate? Remember that measurement errors are usually not random variables.

Use points to indicate for which x-values actual data is available.

Connect points belonging to the same curve.

Only use splines for connecting points if interpolation is sensible.

Do not connect points belonging to unrelated problem instances.

Use different point and line styles for different curves.

Use the same styles for corresponding curves in different graphs.

Place labels defining point and line styles in the right order and without concealing the curves.

Captions should make figures self-contained.

Give enough information to make experiments reproducible.