A Deep Dive into Clojure's Data Structures - EuroClojure 2015


Upload: mohit-thatte

Post on 03-Aug-2015


Page 1: A deep dive into Clojure's data structures - EuroClojure 2015

What Lies Beneath

Mohit Thatte

EUROCLOJURE 2015, Barcelona

A Deep Dive into Clojure’s data structures

@mohitthatte @pastafari

Page 2: A deep dive into Clojure's data structures - EuroClojure 2015

A DAY IN THE LIFE

Image: User:Joonspoon Wikimedia Commons

Page 3: A deep dive into Clojure's data structures - EuroClojure 2015

TOWERS OF ABSTRACTION

Programs that use Maps

Map API

Map Implementation

Primitives (JVM, et al)

Page 4: A deep dive into Clojure's data structures - EuroClojure 2015

“Any sufficiently advanced data structure is indistinguishable from magic”

- Me

With apologies to Arthur Clarke

Page 5: A deep dive into Clojure's data structures - EuroClojure 2015

IMMUTABILITY IS GOOD

Page 6: A deep dive into Clojure's data structures - EuroClojure 2015

PERFORMANCE IS NECESSARY

Page 7: A deep dive into Clojure's data structures - EuroClojure 2015

By U.S. Navy photo [Public domain], via Wikimedia Commons

IMMUTABILITY

PERF

Page 8: A deep dive into Clojure's data structures - EuroClojure 2015

Image: Maj. Gen. William Anders, Apollo 8

Page 9: A deep dive into Clojure's data structures - EuroClojure 2015

“… functional programming’s stricture against destructive updates (assignments) is a staggering handicap, tantamount to confiscating a master chef’s knives.”

- Chris Okasaki

Page 10: A deep dive into Clojure's data structures - EuroClojure 2015

ABSTRACT DATA TYPE

NAME: QUEUE

INTERFACE, INVARIANTS:

enqueue: add an element to the end

head: first element

tail: remaining elements
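The queue interface on this slide maps onto Clojure's built-in persistent queue. A minimal sketch, assuming `clojure.lang.PersistentQueue` as the implementation; the names `q` and `q2` are illustrative only.

```clojure
;; enqueue -> conj, head -> peek, tail -> pop
(def q (into clojure.lang.PersistentQueue/EMPTY [1 2 3]))
(def q2 (conj q 4))   ; enqueue adds to the end and returns a new queue

(peek q2)             ;=> 1        head: first element
(seq (pop q2))        ;=> (2 3 4)  tail: remaining elements
(seq q)               ;=> (1 2 3)  the original queue is unchanged
```

Every operation returns a new queue, which is the immutability half of the challenge on the next slide.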

Page 11: A deep dive into Clojure's data structures - EuroClojure 2015

THE CHALLENGE

Correct

Performant

Immutable

Page 12: A deep dive into Clojure's data structures - EuroClojure 2015

CHALLENGE ACCEPTED

Page 13: A deep dive into Clojure's data structures - EuroClojure 2015

KEY IDEAS

Structural Sharing

Structural Bootstrapping

Hybrid Structures

Page 14: A deep dive into Clojure's data structures - EuroClojure 2015

STRUCTURAL SHARING

:a :b :c :d :e

(assoc v 2 :zz)

:a :b :zz
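The `assoc` call on this slide can be tried at a REPL. The sketch below wraps each element in a small vector (an assumption made only for the demo, since keywords are interned) so that `identical?` shows the untouched elements are reused rather than copied.

```clojure
(def v  [[:a] [:b] [:c] [:d] [:e]])
(def v2 (assoc v 2 [:zz]))

v                                   ;=> [[:a] [:b] [:c] [:d] [:e]]  old version intact
v2                                  ;=> [[:a] [:b] [:zz] [:d] [:e]]
(identical? (nth v 0) (nth v2 0))   ;=> true, same object in both versions
(identical? (nth v 2) (nth v2 2))   ;=> false, only the changed slot differs
```

This only observes sharing at the element level; the internal nodes are not visible without reflection.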

Page 15: A deep dive into Clojure's data structures - EuroClojure 2015

STRUCTURAL SHARING

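(assoc v 4 :zz)

[Diagram: a tree with nodes :a, :b, :c, :d, :e, :f, :m before the update; nodes :d, :f and :zz are drawn for the updated version]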

Page 16: A deep dive into Clojure's data structures - EuroClojure 2015

Image: Alan Levine

Page 17: A deep dive into Clojure's data structures - EuroClojure 2015

STRUCTURAL DECOMPOSITION

Image: Alan Chia (Lego Color Bricks)

Page 18: A deep dive into Clojure's data structures - EuroClojure 2015

HYBRID STRUCTURES

Page 19: A deep dive into Clojure's data structures - EuroClojure 2015

LET'S DIVE IN!

Page 20: A deep dive into Clojure's data structures - EuroClojure 2015

CLOJURE DATA STRUCTURES

'(1 2 3) Lists: Code manipulation

[1 2 3] Vectors: All things sequential

{:a 1 :b 2} Maps: Structured Data

#{\a \e \i \o \u} Sets: Ermm, Sets

Page 21: A deep dive into Clojure's data structures - EuroClojure 2015

MAPS

Page 22: A deep dive into Clojure's data structures - EuroClojure 2015

THE MAP INTERFACE

GET: get value for given key

ASSOC: add key, value to map

DISSOC: remove key, value from map

MERGE: merge two maps together
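The four operations correspond directly to core functions. A short REPL-style sketch; the map `m` is an illustrative value.

```clojure
(def m {:a 1 :b 2})

(get m :a)               ;=> 1
(get m :z :not-found)    ;=> :not-found
(assoc m :c 3)           ;=> {:a 1, :b 2, :c 3}
(dissoc m :a)            ;=> {:b 2}
(merge m {:b 20 :d 4})   ;=> {:a 1, :b 20, :d 4}
m                        ;=> {:a 1, :b 2}  none of the calls changed m
```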

Page 23: A deep dive into Clojure's data structures - EuroClojure 2015

WHAT MAKES A GOOD MAP?

Constant time operations independent of number of keys

Efficient space utilization even with mutation

Objects as keys, Objects as values

Page 24: A deep dive into Clojure's data structures - EuroClojure 2015

IDEAS

Page 25: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #1

ARRAYS

Page 26: A deep dive into Clojure's data structures - EuroClojure 2015

:a 1 :b 2 :c 3

KEY VALUE PAIRS

Page 27: A deep dive into Clojure's data structures - EuroClojure 2015

NOT A GREAT MAP!

Time complexity O(n)

Space efficiency NO

Objects as keys YES
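To make the O(n) verdict concrete, here is a minimal sketch of lookup over a flat array of alternating keys and values. `linear-get` is a hypothetical helper, not a core function.

```clojure
(def kvs [:a 1 :b 2 :c 3])   ; key, value, key, value, ...

(defn linear-get
  "Scans the keys at even positions; returns the value that follows a match."
  [kvs k]
  (loop [i 0]
    (cond
      (>= i (count kvs)) nil
      (= k (nth kvs i))  (nth kvs (inc i))
      :else              (recur (+ i 2)))))

(linear-get kvs :c)   ;=> 3, found only after comparing against :a and :b
(linear-get kvs :z)   ;=> nil, after comparing against every key
```

Any object with a sensible `=` works as a key, which is why the slide scores "Objects as keys" as YES. Clojure's `array-map` keeps this layout for small maps, where a scan is cheap.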

Page 28: A deep dive into Clojure's data structures - EuroClojure 2015

HOW DO WE DO BETTER?

Page 29: A deep dive into Clojure's data structures - EuroClojure 2015

Image: www.pooktre.com

TREES TO THE RESCUE

Page 30: A deep dive into Clojure's data structures - EuroClojure 2015

Ramon Llull, Catalunya c. 1250

Arbol de ciencia

Page 31: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #2

BINARY SEARCH TREE

Page 32: A deep dive into Clojure's data structures - EuroClojure 2015

[Diagram: a binary search tree with integer keys and letter values; root 13 (a), with keys 8, 17, 1, 11, 6, 15, 25, 22 and 27 below it]

Page 33: A deep dive into Clojure's data structures - EuroClojure 2015

[Diagram: an unbalanced binary search tree; keys 13, 17, 25 and 27 form a single chain to the right]

Page 34: A deep dive into Clojure's data structures - EuroClojure 2015

NOT A GREAT MAP!

Time complexity worst case O(n)

Space efficiency POSSIBLY

Objects as keys YES
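The worst case is easy to reproduce. Below is a minimal sketch of an unbalanced persistent binary search tree built from plain maps; the node layout and the names `bst-insert` and `depth` are assumptions made for illustration.

```clojure
(defn bst-insert
  "Returns a new tree with k -> v added; only nodes on the search path are copied."
  [node k v]
  (cond
    (nil? node)       {:key k :val v :left nil :right nil}
    (< k (:key node)) (assoc node :left  (bst-insert (:left node) k v))
    (> k (:key node)) (assoc node :right (bst-insert (:right node) k v))
    :else             (assoc node :val v)))

(defn depth [node]
  (if (nil? node)
    0
    (inc (max (depth (:left node)) (depth (:right node))))))

(defn build [ks] (reduce (fn [t k] (bst-insert t k k)) nil ks))

(depth (build [4 2 6 1 3 5 7]))   ;=> 3, a well-shuffled insertion order
(depth (build (range 1 8)))       ;=> 7, sorted input degenerates into a list
```

With sorted input every lookup walks all n nodes, which is the O(n) on the slide.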

Page 35: A deep dive into Clojure's data structures - EuroClojure 2015

How do we keep our trees in ‘balance’?

Page 36: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #3

BALANCED BINARY SEARCH TREES

Page 37: A deep dive into Clojure's data structures - EuroClojure 2015

RED BLACK TREES

ALWAYS BALANCED, 100% MONEY BACK GUARANTEE

Guibas, Sedgewick 1978

Page 38: A deep dive into Clojure's data structures - EuroClojure 2015

RED BLACK TREES

Root is black

Every path from root to an empty node contains the same number of black nodes

Every node is colored red or black

No red node can have a red child
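The invariants can be written down as a checker. This is a sketch under an assumed node shape (`nil` for an empty node, otherwise a map with `:color`, `:left`, `:right`); it validates a tree and is not the balancing algorithm itself.

```clojure
(defn black-height
  "Number of black nodes on every path to an empty node, or nil if the
   paths disagree or a red node has a red child."
  [node]
  (if (nil? node)
    0
    (let [{:keys [color left right]} node
          lh   (black-height left)
          rh   (black-height right)
          red? #(= :red (:color %))]
      (when (and lh rh (= lh rh)
                 (not (and (= color :red) (or (red? left) (red? right)))))
        (+ lh (if (= color :black) 1 0))))))

(defn red-black? [root]
  (and (or (nil? root) (= :black (:color root)))   ; root is black
       (some? (black-height root))))

(def leaf-red {:color :red :left nil :right nil})

(red-black? {:color :black :left leaf-red :right leaf-red})   ;=> true
(red-black? {:color :red :left nil :right nil})               ;=> false, red root
(red-black? {:color :black
             :left {:color :red :left leaf-red :right nil}
             :right nil})                                     ;=> false, red child of red
```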

Page 39: A deep dive into Clojure's data structures - EuroClojure 2015

RED BLACK TREES

Okasaki ’96

Page 40: A deep dive into Clojure's data structures - EuroClojure 2015

A PRETTY GOOD MAP!

Time complexity O(log_2 N)

Space efficiency YES

Objects as keys YES

Page 41: A deep dive into Clojure's data structures - EuroClojure 2015

Clojure’s sorted-maps are Red Black Trees

Page 42: A deep dive into Clojure's data structures - EuroClojure 2015

CONSTRAINTS

KEYS MUST BE COMPARABLE

KEYS ARE COMPARED AT EVERY NODE, THIS CAN BE EXPENSIVE
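Both the ordering benefit and the comparability constraint show up at the REPL. A short sketch using `sorted-map`; the keys and values are illustrative.

```clojure
(def sm (sorted-map 13 :a 8 :f 17 :r 1 :q))

(class sm)                 ;=> clojure.lang.PersistentTreeMap
(keys sm)                  ;=> (1 8 13 17)  always in key order
(first (subseq sm >= 9))   ;=> [13 :a]      range queries come for free

;; Keys must be mutually comparable: a String cannot be compared with a Long.
(try
  (assoc sm "x" :boom)
  (catch ClassCastException _ :keys-must-be-comparable))
;=> :keys-must-be-comparable
```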

Page 43: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #4

TRIE - SEARCH BY DIGIT

Page 44: A deep dive into Clojure's data structures - EuroClojure 2015

[Diagram: a trie over the letters t, a, p, with its nodes arranged in LEVEL 0, LEVEL 1 and LEVEL 2]

Page 45: A deep dive into Clojure's data structures - EuroClojure 2015

FINITE STATE MACHINE

Symbols #{a..z}

Nodes, Edges

next(node, symbol)
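The state-machine view fits in a few lines when each node is a nested map from symbol to child. This is a sketch for intuition, not how Clojure stores anything; `next-node`, `trie-add`, `trie-get` and the `:value` marker are hypothetical names.

```clojure
(defn next-node [node sym] (get node sym))   ; next(node, symbol)

(defn trie-add [trie word v]
  (assoc-in trie (conj (vec word) :value) v))

(defn trie-get [trie word]
  (:value (reduce next-node trie word)))     ; one transition per symbol

(def t (-> {}
           (trie-add "tap" 1)
           (trie-add "tea" 2)
           (trie-add "top" 3)))

(trie-get t "tea")   ;=> 2
(trie-get t "te")    ;=> nil, a prefix but not a stored key
(trie-get t "cat")   ;=> nil, no edge for \c at the root
```

Lookup cost depends on the length of the key, not on how many keys are stored.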

Page 46: A deep dive into Clojure's data structures - EuroClojure 2015

TRIE IMPLEMENTATIONS

Page 47: A deep dive into Clojure's data structures - EuroClojure 2015

LOOKUP TABLES

Associate each symbol with an offset, e.g. a=0, b=1, …

next = lookup(node, offset)
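A sketch of the lookup-table variant, assuming lowercase a..z keys and one 26-slot table per node. All names here are hypothetical. The last expression counts the empty slots, which is the space problem this design runs into.

```clojure
(def alphabet-size 26)

(defn offset [ch] (- (int ch) (int \a)))   ; a=0, b=1, ...

(defn new-node []
  {:end? false :next (vec (repeat alphabet-size nil))})

(defn insert [node word]
  (let [node (or node (new-node))]
    (if-let [ch (first word)]
      (update-in node [:next (offset ch)] insert (rest word))
      (assoc node :end? true))))

(defn lookup [node off] (get (:next node) off)) ; next = lookup(node, offset)

(defn contains-word? [root word]
  (boolean (:end? (reduce (fn [node ch] (lookup node (offset ch))) root word))))

(def root (reduce insert nil ["tap" "tea" "top"]))

(contains-word? root "tea")          ;=> true
(contains-word? root "te")           ;=> false
(count (remove some? (:next root)))  ;=> 25 of the 26 root slots are null
```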

Page 48: A deep dive into Clojure's data structures - EuroClojure 2015

Fast and space efficient trie searches, Bagwell 2000

ADD

Page 49: A deep dive into Clojure's data structures - EuroClojure 2015

NOT A GREAT MAP!

Time complexity O(log_m N)

Space efficiency NO

Objects as keys NO

Page 50: A deep dive into Clojure's data structures - EuroClojure 2015

How do we avoid null nodes?

Page 51: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #4

BST + TRIE = TST

Bentley, Sedgewick 1998

Page 52: A deep dive into Clojure's data structures - EuroClojure 2015

Fast and space efficient trie searches, Bagwell 2000

ADD

Page 53: A deep dive into Clojure's data structures - EuroClojure 2015

A DECENT MAP

Time complexity ~O(log_2 N)

Space efficiency YES

Objects as keys NO
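A ternary search tree replaces each lookup table with a small binary search tree over the symbols actually present, so no null slots are allocated. A minimal sketch with map-based nodes; the keys `:ch`, `:lo`, `:eq`, `:hi`, `:val` and both function names are assumptions made for illustration.

```clojure
(defn tst-insert [node [ch & more :as word] v]
  (let [node (or node {:ch ch})
        c    (compare ch (:ch node))]
    (cond
      (neg? c)   (update node :lo tst-insert word v)   ; smaller symbol: go left
      (pos? c)   (update node :hi tst-insert word v)   ; larger symbol: go right
      (seq more) (update node :eq tst-insert more v)   ; match: next symbol
      :else      (assoc node :val v))))

(defn tst-get [node [ch & more :as word]]
  (when node
    (let [c (compare ch (:ch node))]
      (cond
        (neg? c)   (recur (:lo node) word)
        (pos? c)   (recur (:hi node) word)
        (seq more) (recur (:eq node) more)
        :else      (:val node)))))

(def tst (reduce (fn [t [w v]] (tst-insert t w v))
                 nil
                 [["tap" 1] ["tea" 2] ["top" 3]]))

(tst-get tst "tea")   ;=> 2
(tst-get tst "te")    ;=> nil
(tst-get tst "tx")    ;=> nil
```

The price is a comparison at each step instead of an array index, which is where the roughly log_2 behaviour on the slide comes from.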

Page 54: A deep dive into Clojure's data structures - EuroClojure 2015

No null nodes, but can we do better than log_2 N?

Page 55: A deep dive into Clojure's data structures - EuroClojure 2015

CHALLENGE ACCEPTED

Page 56: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #5

Array Mapped Trie

Fast and space efficient trie searches, Bagwell 2000

Page 57: A deep dive into Clojure's data structures - EuroClojure 2015

Use bitmaps to determine presence or absence of symbol

Page 58: A deep dive into Clojure's data structures - EuroClojure 2015

Let's say we have 16 symbols, 0…15

Page 59: A deep dive into Clojure's data structures - EuroClojure 2015

USING BITMAPS

Offset: 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

Bitmap:  0  1  0  0  0  1  0  0  1  1  1  0  0  0  0  0

Does the symbol with offset 6 exist?

bitwise AND with a mask

mask = 1 << offset

bitmap & mask

Mask:    0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
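The presence test, written out against the bitmap from this slide. A small sketch; `present?` is a hypothetical helper name.

```clojure
(def bitmap 2r0100010011100000)   ; bits 14, 10, 7, 6 and 5 are set

(defn present?
  "True when the bit for the given symbol offset is set."
  [bitmap offset]
  (let [mask (bit-shift-left 1 offset)]
    (not (zero? (bit-and bitmap mask)))))

(present? bitmap 6)   ;=> true
(present? bitmap 4)   ;=> false
```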

Page 60: A deep dive into Clojure's data structures - EuroClojure 2015

There’s an array alongside that only contains entries for the 1’s. NOT pre-allocated.

Page 61: A deep dive into Clojure's data structures - EuroClojure 2015

What offset in the dynamic array should I look at?

Page 62: A deep dive into Clojure's data structures - EuroClojure 2015

Image: Martin Fisch, flickr.com

USE THE 1’S AS TALLY MARKS

Page 63: A deep dive into Clojure's data structures - EuroClojure 2015

Bitmap: 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0

Array index: 0 1 2 3 4

Array entries: MapEntry | MapEntry | SubTrie Pointer | MapEntry | MapEntry

Page 64: A deep dive into Clojure's data structures - EuroClojure 2015

USING BITMAPS

Offset: 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

Bitmap:  0  1  0  0  0  1  0  0  1  1  1  0  0  0  0  0

Where in the array is the entry for ‘6’?

Count tally marks to the ‘right’ of offset

How do I create a mask to do that?

mask = (1 << 6) - 1

Mask:    0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1

bitmap & mask:

         0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0

Integer.bitCount(bitmap & mask)
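The tally-mark count as code, using the same bitmap. The slide uses `Integer.bitCount`; this sketch uses `Long/bitCount` because Clojure integers are longs. `index-of` is a hypothetical helper name.

```clojure
(def bitmap 2r0100010011100000)

(defn index-of
  "Position in the dense array for a symbol offset: the number of set bits
   strictly to the right of that offset."
  [bitmap offset]
  (let [mask (dec (bit-shift-left 1 offset))]   ; (1 << offset) - 1
    (Long/bitCount (bit-and bitmap mask))))

(index-of bitmap 5)    ;=> 0
(index-of bitmap 6)    ;=> 1, only bit 5 lies to the right of bit 6
(index-of bitmap 14)   ;=> 4
```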

Page 65: A deep dive into Clojure's data structures - EuroClojure 2015

What happens if I insert a new map entry?

Page 66: A deep dive into Clojure's data structures - EuroClojure 2015

Bitmap: 0 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0

Array index: 0 1 2 3 4

Array entries: MapEntry | MapEntry | MapEntry | MapEntry | MapEntry

Page 67: A deep dive into Clojure's data structures - EuroClojure 2015

Bitmap: 0 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0

Array index: 0 1 2 3 4 5

Array entries: Map Entry | Map Entry | SubTrie Pointer | Map Entry | Map Entry | Map Entry
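Inserting a new symbol means setting its bit and splicing the entry into the dense array at the position the tally count gives. A sketch assuming the offset is not already present; the node shape, the entry labels and the names `index-of` and `node-insert` are hypothetical.

```clojure
(defn index-of [bitmap offset]
  (Long/bitCount (bit-and bitmap (dec (bit-shift-left 1 offset)))))

(defn node-insert
  "Returns a new node; the old node and its array are left untouched."
  [{:keys [bitmap entries]} offset entry]
  (let [i (index-of bitmap offset)]
    {:bitmap  (bit-or bitmap (bit-shift-left 1 offset))
     :entries (vec (concat (subvec entries 0 i) [entry] (subvec entries i)))}))

(def node {:bitmap  2r0100010011100000
           :entries [:e5 :e6 :e7 :e10 :e14]})

(node-insert node 13 :e13)
;=> {:bitmap 25824, :entries [:e5 :e6 :e7 :e10 :e13 :e14]}
;   25824 is 2r0110010011100000, the bitmap on this slide
```

The array grows by exactly one slot, so no space is reserved for symbols that are absent.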

Page 68: A deep dive into Clojure's data structures - EuroClojure 2015

A DECENT MAP

Time complexity O(log_m N)

Space efficiency YES

Objects as keys NO

Page 69: A deep dive into Clojure's data structures - EuroClojure 2015

How do we support arbitrary Objects as keys?

Page 70: A deep dive into Clojure's data structures - EuroClojure 2015

IDEA #6

Hashing + AMT

Ideal hash trees, Bagwell 2001

Page 71: A deep dive into Clojure's data structures - EuroClojure 2015

STEP 1

Use a good hash function to generate an integer key.

hasheq

0010 1101 1011 1110 1100 1111 1111 1001

Ideal hash trees, Bagwell 2001

Page 72: A deep dive into Clojure's data structures - EuroClojure 2015

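STEP 2

Divide the 32 bit integer into ‘symbols’ 5 bits at a time.

[Diagram: a 32-bit hash cut into 5-bit groups, each labelled with its decimal value, e.g. 00111 = 7, 10100 = 20, 10101 = 21, 00011 = 3, 00101 = 5]

Use the ‘symbols’ to walk down an AMT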

Page 73: A deep dive into Clojure's data structures - EuroClojure 2015

t bits per symbol give 2^t symbols

Page 74: A deep dive into Clojure's data structures - EuroClojure 2015

Why 5 bits?

Page 75: A deep dive into Clojure's data structures - EuroClojure 2015

BIT JUGGLING!

Compute ‘symbols’ by shifting and masking

hash: 00 11100 01100 10110 10010 10101 00101

mask: 00 00000 00000 00000 00000 00000 11111

(hash >>> shift) & 0x01f

How to calculate nth digit?

Shift by 5*n and mask with 0x1f
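The shift-and-mask step in Clojure, applied to the 32-bit pattern on this slide (0x38CB4AA5). A sketch: `symbol-at` and `symbols` are hypothetical names, and the extra mask with `0xffffffff` is an assumption needed because Clojure holds the hash in a 64-bit long rather than a Java `int`.

```clojure
(defn symbol-at
  "The 5-bit symbol found after shifting a 32-bit hash right by `shift`."
  [h shift]
  (bit-and (unsigned-bit-shift-right (bit-and h 0xffffffff) shift) 0x1f))

(defn symbols
  "All symbols of a 32-bit hash, lowest 5 bits first (shift = 0, 5, ..., 30)."
  [h]
  (mapv #(symbol-at h %) (range 0 35 5)))

(symbols 0x38CB4AA5)          ;=> [5 21 18 22 12 28 0]
(symbol-at 0x38CB4AA5 (* 5 2)) ;=> 18, the digit for n = 2
```

The last symbol only ever has 2 meaningful bits, since 32 is not a multiple of 5.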

Page 76: A deep dive into Clojure's data structures - EuroClojure 2015

BEST COMMENT EVER.

A persistent rendition of Phil Bagwell's Hash Array Mapped Trie

Uses path copying for persistence

HashCollision leaves vs. extended hashing

Node polymorphism vs. conditionals

No sub-tree pools or root-resizing

Any errors are my own

Hickey R., Grand C., Emerick C., Miller A., Fingerhut A.

PersistentHashMap.java:19

Page 77: A deep dive into Clojure's data structures - EuroClojure 2015

NODE POLYMORPHISM

ArrayNode - 32 wide pointers to sub-tries

BitmapIndexedNode - bitmap + dynamic array

HashCollisionNode - array for things that collide

Page 78: A deep dive into Clojure's data structures - EuroClojure 2015

EXAMPLE

(let [h (zipmap (range 1e6) (range 1e6))] (get h 123456))

Page 79: A deep dive into Clojure's data structures - EuroClojure 2015

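[Diagram: the hash of the key cut into 5-bit symbols, one consumed per level; labels on the slide include 10111 = 23, 11100 = 28, 11001 = 25, 01001 = 9 and 00010 = 2]

shift = 0: ArrayNode

shift = 5: ArrayNode

shift = 10: ArrayNode

shift = 15: BitmapIndexedNode

… and then follow the AMT down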

Page 80: A deep dive into Clojure's data structures - EuroClojure 2015

A GOOD MAP

Time complexity O(log_32 N)

Space efficiency YES

Objects as keys YES

Page 81: A deep dive into Clojure's data structures - EuroClojure 2015

HAMT

Key compared only once

Bit juggling for great performance!

~6 hops to a leaf node
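A back-of-the-envelope check of the O(log_32 N) claim. The sketch assumes an idealized trie whose 32-way nodes fill evenly; real hash distributions will differ, and `levels-needed` is a hypothetical helper.

```clojure
(defn levels-needed
  "Smallest number of 32-way levels whose capacity covers n keys."
  [n]
  (loop [levels 1, capacity 32]
    (if (>= capacity n)
      levels
      (recur (inc levels) (* capacity 32)))))

(levels-needed 32)        ;=> 1
(levels-needed 33)        ;=> 2
(levels-needed 1000000)   ;=> 4, since 32^4 = 1048576
```

Even a million keys need only a handful of hops, which is why the depth is often treated as effectively constant.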

Page 82: A deep dive into Clojure's data structures - EuroClojure 2015

REGULAR HASH TABLE?

NEEDS ROOT RESIZING

NOT AMENABLE TO STRUCTURAL SHARING

Page 83: A deep dive into Clojure's data structures - EuroClojure 2015

UPDATES?

Search for the key, clone leaf nodes and path to root
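Path copying can be observed with nested maps standing in for trie nodes. This is an analogy under that assumption, not the HAMT's own node classes: `assoc-in` rebuilds only the nodes between the root and the changed leaf.

```clojure
(def tree  {:left  {:left {:val 1} :right {:val 2}}
            :right {:left {:val 3} :right {:val 4}}})

(def tree2 (assoc-in tree [:left :right :val] :zz))

;; Nodes on the path from the root to the change are new objects ...
(identical? (:left tree) (:left tree2))                           ;=> false
;; ... everything off the path is shared between both versions.
(identical? (:right tree) (:right tree2))                         ;=> true
(identical? (get-in tree [:left :left]) (get-in tree2 [:left :left])) ;=> true

(get-in tree [:left :right :val])                                 ;=> 2, old version intact
```

The cost of an update is therefore proportional to the depth of the tree, not to its size.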

Page 84: A deep dive into Clojure's data structures - EuroClojure 2015

VECTORS

Page 85: A deep dive into Clojure's data structures - EuroClojure 2015

INTUITION

ArrayNodes all the way. Break ‘index’ into digits and walk down levels.

(let [arr (vec (range 1e6))] (nth arr 123456))

Page 86: A deep dive into Clojure's data structures - EuroClojure 2015

123456

Index bits, 5 at a time: 00011 11000 10010 00000

Digits: 3 24 18 0

shift = 15: ArrayNode

shift = 10: ArrayNode

shift = 5: ArrayNode

shift = 0: ArrayNode
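Breaking the index into base-32 digits, root level first. A sketch: `index-digits` is a hypothetical helper, and the root shift of 15 is taken from this slide's million-element vector.

```clojure
(defn index-digits
  "Base-32 digits of index i, from the root level (root-shift) down to shift 0."
  [i root-shift]
  (mapv #(bit-and (unsigned-bit-shift-right i %) 0x1f)
        (range root-shift -1 -5)))

(index-digits 123456 15)   ;=> [3 24 18 0]

;; The digits are just the index written in base 32:
(reduce (fn [acc d] (+ (* acc 32) d)) 0 [3 24 18 0])   ;=> 123456

(nth (vec (range 1000000)) 123456)                     ;=> 123456
```

Unlike the hash map, no hashing is involved: the index itself supplies the symbols.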

Page 87: A deep dive into Clojure's data structures - EuroClojure 2015

THE TAIL OPTIMIZATION

PersistentVector

count shift root tail
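One way to read these four fields: the last block of up to 32 elements lives in `tail`, outside the trie under `root`, so most appends touch only that small array. A sketch of how an index could be routed under that assumption; `tail-offset` and `in-tail?` are hypothetical helper names, not Clojure API.

```clojure
(defn tail-offset
  "Index of the first element held in the tail, for a vector of cnt elements."
  [cnt]
  (if (< cnt 32)
    0
    (bit-shift-left (unsigned-bit-shift-right (dec cnt) 5) 5)))

(defn in-tail? [cnt i] (>= i (tail-offset cnt)))

(tail-offset 20)            ;=> 0, a small vector is all tail
(tail-offset 1000000)       ;=> 999968
(in-tail? 1000000 123456)   ;=> false, found by walking the trie
(in-tail? 1000000 999999)   ;=> true, read straight from the tail array
```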

Page 88: A deep dive into Clojure's data structures - EuroClojure 2015

RIGHT TOOL FOR THE JOB

By Schnobby (Own work) [CC BY-SA 3.0], via Wikimedia Commons

Page 89: A deep dive into Clojure's data structures - EuroClojure 2015

HashMaps do not merge efficiently

Page 90: A deep dive into Clojure's data structures - EuroClojure 2015

MAP CATENATION

data.int-map

Okasaki & Gill’s “Fast Mergeable Integer Maps”

Zach Tellman

Page 91: A deep dive into Clojure's data structures - EuroClojure 2015

Vectors do not concat efficiently

Vectors do not subvec efficiently

Page 92: A deep dive into Clojure's data structures - EuroClojure 2015

VECTOR CATENATION

Based on Bagwell and Rompf, “RRB-Trees: Efficient Immutable Vectors”

logarithmic catenation and slicing

Michal Marczyk

core.rrb-vector

TODO: benchmarks

Page 93: A deep dive into Clojure's data structures - EuroClojure 2015

CTRIES

Michał Marczyk

Tomorrow at 0850

Page 94: A deep dive into Clojure's data structures - EuroClojure 2015

1959 de la Briandais, Fredkin Trie

1960 Windley, Booth, Colin, Hibbard Binary Search Trees

1962 Adelson-Velsky, Landis AVL Trees

1978 Guibas, Sedgewick Red Black Trees

1985 Sleator, Tarjan Splay Trees

1996 Okasaki Purely Functional Data Structures

1998 Bentley, Sedgewick Ternary Search Trees

2000 Phil Bagwell AMT

2001 Phil Bagwell HAMT

2007 Rich Hickey Clojure!

Page 95: A deep dive into Clojure's data structures - EuroClojure 2015

Reading List

Ideal Hash Trees, Bagwell, 2001

Fast and Space Efficient Trie Searches, Bagwell, 2000

Fast Mergeable Integer Maps, Okasaki & Gill, 1998

The World's Fastest Scrabble Program, Appel & Jacobson, 1988

File Searching Using Variable Length Keys, de la Briandais, 1959

Purely Functional Data Structures, Okasaki, 1996

Page 96: A deep dive into Clojure's data structures - EuroClojure 2015

Polymatheia: Jean Niklas L’Orange

Page 97: A deep dive into Clojure's data structures - EuroClojure 2015

QUESTIONS?

Ask Michal or Zach or Jean Niklas :)

Page 98: A deep dive into Clojure's data structures - EuroClojure 2015

THANK YOU