programming network data planes - advanced topics in ... · advanced topics in communication...
TRANSCRIPT
![Page 1: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/1.jpg)
Advanced Topics in Communication Networks
Programming Network Data Planes
ETH Zürich
Alexander Dietmüller
Oct. 11 2018
nsg.ee.ethz.ch
![Page 2: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/2.jpg)
2
Last week on
Advanced Topics in Communication Networks
![Page 3: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/3.jpg)
3
Probabilistic data structures like Bloom Filters
help to trade resources with accuracy
1
0
0
0
0
1
0
0
1
0
hash_a(“Hello”)
hash_b(“Hello”)
hash_c(“Hello”)
INSERT“Hello”
Recap
QUERY“Hello”
hash_a(“Hello”)
hash_b(“Hello”)
hash_c(“Hello”)
![Page 4: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/4.jpg)
4
Bloom Filters take a fixed number of operations,
but hash collisions can cause false positives.
1
0
0
0
0
1
0
0
1
0
hash_a(“Hello”)
hash_b(“Hello”)
hash_c(“Hello”)
INSERT“Hello”
Recap
QUERY“Hello”
hash_a(“Hello”)
hash_b(“Hello”)
hash_c(“Hello”)
![Page 5: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/5.jpg)
5
Bloom Filters take a fixed number of operations,
but hash collisions can cause false positives
1
0
0
0
0
1
0
0
1
0
hash_a(“Hello”)
hash_b(“Hello”)
hash_c(“Hello”)
INSERT“Hello”
QUERY“Bye”
hash_a(“Bye”)
hash_c(“Bye”)
hash_b(“Bye”)
Recap
![Page 6: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/6.jpg)
6
A bloom filter is a streaming algorithm
answering specific questions approximately.
Recap
![Page 7: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/7.jpg)
7
A bloom filter is a streaming algorithm
answering specific questions approximately.
Is X in the stream?What is in the stream?Invertible Bloom Filter
Recap
![Page 8: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/8.jpg)
8
A bloom filter is a streaming algorithm
answering specific questions approximately.
Is X in the stream?What is in the stream?Invertible Bloom Filter
What about other questions?
![Page 9: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/9.jpg)
9
This week on
Advanced Topics in Communication Networks
![Page 10: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/10.jpg)
10
Today we’ll talk about: important questions,
how ‘sketches’ answer them,
and limitations of ‘sketches’
![Page 11: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/11.jpg)
11
Is a certain element in the stream?
Bloom Filter
How frequently does an element appear?
Count Sketch, CountMin Sketch, ...
How many elements belong to a certain subnet?
SketchLearn SigComm ‘18
How many distinct elements are in the stream?
HyperLogLog Sketch, ...
What are the most frequent elements?
Count/CountMin + Heap, …
![Page 12: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/12.jpg)
12
In networking, we talk about packet flows,
but these questions apply to other domains as well,
e.g. search engines and databases.
![Page 13: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/13.jpg)
13
Is a certain element in the stream?
Bloom Filter
How frequently does an element appear?
Count Sketch, CountMin Sketch, ...
How many elements belong to a certain subnet?
SketchLearn SigComm ‘18
How many distinct elements are in the stream?
HyperLogLog Sketch, ...
What are the most frequent elements?
Count/CountMin + Heap, …
![Page 14: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/14.jpg)
14
We are going to look at frequencies,
i.e. how often an element occurs in a data stream.
vector of frequencies (counts)
of all distinct elements xi
x=[x1x2⋮
]
![Page 15: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/15.jpg)
15
We are going to look at frequencies,
i.e. how often an element occurs in a data stream.
vector of frequencies (counts)
of all distinct elements xi
x=[x1x2⋮
]distinct flows
![Page 16: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/16.jpg)
16
In the worst case, an algorithm providing
exact frequencies requires linear space.
![Page 17: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/17.jpg)
17
In the worst case, an algorithm providing
exact frequencies requires linear space.
Data Stream
n elements in total
![Page 18: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/18.jpg)
18
In the worst case, an algorithm providing
exact frequencies requires linear space.
Data Stream
n elements in total
→ n distinct elements
(in the worst case)
![Page 19: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/19.jpg)
19
In the worst case, an algorithm providing
exact frequencies requires linear space.
Data Stream
n elements in total
→ n distinct elements
(in the worst case)
→ n counters required? :(
![Page 20: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/20.jpg)
20
Bloom Filtersquickly “filter” only those
elements that might be in
the set
More efficient by allowing
false positives.
Probabilistic datastructures can help again!
![Page 21: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/21.jpg)
21
Bloom Filtersquickly “filter” only those
elements that might be in
the set
More efficient by allowing
false positives.
Sketchesprovide a approximate
frequencies of elements
in a data stream.
More efficient by allowing
mis-counting.
Probabilistic datastructures can help again!
![Page 22: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/22.jpg)
22
Today we’ll talk about: important questions,
how ‘sketches’ answer them,
limitations of ‘sketches’
![Page 23: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/23.jpg)
23
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
![Page 24: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/24.jpg)
24
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
x=[x1x2⋮ ]
Notation reminder:
vector of frequencies (counts)
of all distinct elements xi
![Page 25: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/25.jpg)
25
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
![Page 26: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/26.jpg)
26
Pr [ x̂ iestimatedfrequency
− x itrue
frequency
≥ ε‖x‖1sum offrequencies
]≤δ
The estimation error exceeds
with a probability smaller than
ε‖x‖1δ
![Page 27: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/27.jpg)
27
Pr [ x̂ iestimatedfrequency
− x itrue
frequency
≥ ε‖x‖1sum offrequencies
]≤δ
relative to L1 norm
The estimation error exceeds
with a probability smaller than
ε‖x‖1δ
![Page 28: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/28.jpg)
28
Pr [ x̂ iestimatedfrequency
− x itrue
frequency
≥ ε‖x‖1sum offrequencies
]≤δ
Let ‖x‖1=10000 , ε=0.01 , δ=0.05
The probability for any estimate to be
off by more than 100 is less than 5%(after counting 10000 elements)
![Page 29: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/29.jpg)
29
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
![Page 30: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/30.jpg)
30
A CountMin Sketch uses multiple arrays and hashes.
"
w indicesper array(range of hashes)
d arrays(one hash function per array)
w⋅d counters(total size)
counters
![Page 31: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/31.jpg)
31
xa+1hash_a(“Hello”)
hash_c(“Hello”)
COUNT“Hello”
xc+1
xb+1hash_b(“Hello”)
![Page 32: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/32.jpg)
32
Hash collisions cause over-counting.
xa +...hash_a(“Hello”)
hash_c(“Hello”)
xc +...
xb +...hash_b(“Hello”)
hash_a(“Test”)hash_a(“Net”)
hash_b(“Bye”)hash_b(“UDP”)hash_b(“FUBAR”)
hash_c(“TCP”)
![Page 33: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/33.jpg)
33
Returning the minimum value minimizes the error.
xa
hash_a(“Hello”)
hash_c(“Hello”)
QUERY“Hello”
xc
xb
hash_b(“Hello”)
returnmin(xa,xb,xc)
![Page 34: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/34.jpg)
34
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
Pr [ x̂iestimatedfrequency
− xitrue
frequency
≥ ε‖x‖1sum offrequencies
]≤δ
![Page 35: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/35.jpg)
35
Understanding the error bounds allows
dimensioning the sketch optimally.
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 36: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/36.jpg)
37
x̂ iestimatedfrequency
= minh∈h1 .. hd
x̂ih
estimate forspecific hash
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 37: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/37.jpg)
38
Pr [X≥ c⋅E [X ]]≤1c
The error bounds can be derived
with Markov’s Inequality
wikipedia.org/wiki/Markov's_inequality
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 38: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/38.jpg)
39
The error bounds can be derived
with Markov’s Inequality
wikipedia.org/wiki/Markov's_inequality
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 39: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/39.jpg)
40
x̂ih= x i + ∑
x j≠ xi
x j 1h (xi , x j)
truefrequency
over-countingfrom hash collisions
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 40: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/40.jpg)
41
={1, if h (x i)=h (x j)0, otherwise
x̂ih= x i + ∑
x j≠ xi
x j 1h (xi , x j)
hash collision
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 41: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/41.jpg)
42
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
estimationerror
over-countingfrom hash collisions
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 42: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/42.jpg)
43
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
E [ x̂ ih− xi ] = E [ ∑x j≠ x i
x j 1h (xi , x j)]
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 43: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/43.jpg)
44
E [ x̂ ih− xi ] = E [ ∑x j≠ x i
x j 1h (xi , x j)]
We treat the data as a constant and the
hash as a random function with certain properties.
constantrandom
wikipedia.org/wiki/Universal_hashing
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 44: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/44.jpg)
45
E [ x̂ ih− xi ] = ∑
x j≠ xi
x j E [1h (x i , x j)]
wikipedia.org/wiki/Universal_hashing
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
We treat the data as a constant and the
hash as a random function with certain properties.
![Page 45: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/45.jpg)
46
E [ x̂ ih− xi ] = ∑
x j≠ xi
x j E [1h (x i , x j) ]⏟≤1w
wikipedia.org/wiki/Universal_hashing
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
We treat the data as a constant and the
hash as a random function with certain properties.
![Page 46: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/46.jpg)
47
E [ x̂ih− xi ] ≤ ∑
x j≠ x i
x j1w
wikipedia.org/wiki/Universal_hashing
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
We treat the data as a constant and the
hash as a random function with certain properties.
![Page 47: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/47.jpg)
48
E [ x̂ih− xi ] ≤ ∑
x j≠ x i
x j1w
≤ ∑x j
x j1w
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 48: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/48.jpg)
49
E [ x̂ih− xi ] ≤ ∑
x j≠ x i
x j1w
≤ ‖x‖11w
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
x̂ih− xi = ∑
x j≠ x i
x j 1h (x i , x j)
Pr [ x̂ih− x i≥c⋅E [ x̂ i
h− x i]]≤
1c
![Page 49: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/49.jpg)
50
Pr [ x̂ih− x i≥c⋅E [ x̂i
h− xi ]⏟
≤1w
‖x‖1
]≤1cError Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 50: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/50.jpg)
51
Pr [ x̂ih− x i≥
cw
‖x‖1]≤1cError Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 51: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/51.jpg)
52
Pr [ x̂ih− x i≥ ε
h⏟cw
‖x‖1]≤ δh
⏟1c
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 52: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/52.jpg)
53
The estimate for each hash has
a well defined L1 error bound.
Pr [ x̂ih− x i≥ ε
h⏟cw
‖x‖1]≤ δh
⏟1c
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 53: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/53.jpg)
54
The estimate for each hash has
a well defined L1 error bound.
What about the minimum?
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ih− x i≥ ε
h⏟cw
‖x‖1]≤ δh
⏟1c
![Page 54: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/54.jpg)
55
Pr [ x̂ i− x i≥cw
‖x‖1] ≤ ?
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 55: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/55.jpg)
56
Pr [ minh∈h1 .. hd
x̂ ih
⏟x̂ i
− xi≥cw
‖x‖1] ≤ ?
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 56: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/56.jpg)
57
∏h∈h1 .. hd
Pr [ x̂ ih− x i≥
cw
‖x‖1] ≤ ?
Multiple hash functions work like independent trials.
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ minh∈h1 .. hd
x̂ ih
⏟x̂ i
− xi≥cw
‖x‖1] ≤ ?
⇔
![Page 57: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/57.jpg)
58
∏h∈h1 .. hd
Pr [ x̂ ih− x i≥
cw
‖x‖1]⏟
≤1c
≤ ?
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
error bound per hash
Pr [ minh∈h1 .. hd
x̂ ih
⏟x̂i
− xi≥cw
‖x‖1] ≤ ?
⇔
![Page 58: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/58.jpg)
59
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
∏h∈h1 .. hd
Pr [ x̂ ih− x i≥
cw
‖x‖1]⏟
≤1c
≤1
cd
Pr [ minh∈h1 .. hd
x̂ ih
⏟x̂i
− xi≥cw
‖x‖1] ≤ ?
⇔
![Page 59: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/59.jpg)
60
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ minh∈h1 .. hd
x̂ ih
⏟x̂i
− xi≥cw
‖x‖1] ≤1
cd
![Page 60: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/60.jpg)
61
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ i− x i≥cw
‖x‖1] ≤1
cd
![Page 61: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/61.jpg)
62
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
Pr [ x̂ i− x i≥ ε⏟cw
‖x‖1]≤ δ⏟1cd
We have proven the error bounds!
But what about the constant c?
![Page 62: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/62.jpg)
63
For every c, there is a pair ( ) achieving
the error bound and confidence ( ).ε ,δd ,w
ε=cw
⇒ w= ⌈ cε ⌉
δ=1
cd⇒ d=⌈ logc 1δ ⌉
(hash range)
(#hashes)
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 63: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/63.jpg)
64
Choosing c=e minimizes the
total number of counters.
ε=ew
⇒ w= ⌈ eε ⌉
δ=1
ed⇒ d= ⌈ ln 1δ ⌉
d⋅w=cε logc
1δ
=minimize e
ε ln1δ
(hash range)
(#hashes)
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 64: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/64.jpg)
65
A CountMin sketch recipe
w= ⌈ eε ⌉
d= ⌈ ln 1δ ⌉
(hash range)
(#hashes)
Given , choosing
requires the minimum number of
counters s.t. the CountMin Sketch
can guarantee that
x̂i− xi≥ε‖x‖1with a probability less than δ
ε ,δ
Error Bounds
per hash/array
Optimal Size
Error Bounds
for the minimum
![Page 65: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/65.jpg)
66
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
![Page 66: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/66.jpg)
67
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
CountMin sketch recipe
Choose d= ⌈ ln 1δ ⌉ , w= ⌈eε ⌉
Then x̂i− x i≥ε‖x‖1 with a probability less than δ
![Page 67: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/67.jpg)
68
A CountMin sketch uses the same principles as a
counting bloom filter, but is designed to have
provable L1 error bounds for frequency queries.
→ only one design out of many!
![Page 68: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/68.jpg)
69
A Count sketchMin uses the same principles as a
counting bloom filter, but is designed to have
provable L2 error bounds for frequency queries.
![Page 69: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/69.jpg)
70
The Count sketch uses additional hashing to
give L2 error bounds, but requires more resources.
CountMin sketch
h1, …, h
d: U → {1, …, w}
COUNT xi:
for h in h1, …, h
d:
Regh[h(x
i)] + 1
QUERY xi:
return minh in h1, …, hd
(
Regh[h(x
i)]
)
![Page 70: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/70.jpg)
71
CountMin sketch
h1, …, h
d: U → {1, …, w}
COUNT xi:
for h in h1, …, h
d:
Regh[h(x
i)] + 1
QUERY xi:
return minh in h1, …, hd
(
Regh[h(x
i)]
)
Count sketch
h1, …, h
d: U → {1, …, w}
g: U → {+1, -1}
COUNT xi:
for h in h1, …, h
d:
Regh[h(x
i)] + g(x
i)
QUERY xi:
return medianh in h1, …, hd
(
Regh[h(x
i)] * g(x
i)
)
The Count sketch uses additional hashing to
give L2 error bounds, but requires more resources.
![Page 71: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/71.jpg)
72
CountMin sketch recipe
Choose d= ⌈ ln 1δ ⌉ , w= ⌈eε ⌉
Then x̂i− x i≥ε‖x‖1 with a probability less than δ
The Count sketch uses additional hashing to
give L2 error bounds, but requires more resources.
![Page 72: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/72.jpg)
73
CountMin sketch recipe
Choose d= ⌈ ln 1δ ⌉ , w= ⌈eε ⌉
Then x̂i− x i≥ε‖x‖1 with a probability less than δ
Count sketch recipe
Choose d= ⌈ ln 1δ ⌉ , w=⌈eε2 ⌉Then x̂i− x i≥ε‖x‖2 with a probability less than δ
The Count sketch uses additional hashing to
give L2 error bounds, but requires more resources.
![Page 73: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/73.jpg)
Sketches are the new black
OpenSketch
NSDI ‘13
UnivMon
SIGCOMM ‘16
SketchLearn
SIGCOMM ‘18
...and many more!
[source] [source] [source]
![Page 74: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/74.jpg)
Sketches are the new black
OpenSketch
NSDI ‘13
UnivMon
SIGCOMM ‘16
SketchLearn
SIGCOMM ‘18
[source] [source][source]
![Page 75: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/75.jpg)
76
SketchLearn combines multiple sketches with
elaborate post-processing for flexibility
![Page 76: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/76.jpg)
77
Today we’ll talk about: important questions,
how ‘sketches’ answer them,
limitations of ‘sketches’
![Page 77: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/77.jpg)
78
Sketches compute statistical summaries,
favoring elements with high frequency.
Pr [ x̂i− x iestimationerror
≥ε‖x‖1]≤δrelative to sum of all elements
![Page 78: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/78.jpg)
79
Sketches compute statistical summaries,
favoring elements with high frequency.
Let ε=0.01, ‖x‖1=10000 (⇒ ε⋅‖x‖1=100)
Assume two flows xa , xb ,
with ‖xa‖1=1000, ‖xb‖1=50
high frequency
low frequency
![Page 79: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/79.jpg)
80
Sketches compute statistical summaries,
favoring elements with high frequency.
Let ε=0.01, ‖x‖1=10000 (⇒ ε⋅‖x‖1=100)
Assume two flows xa , xb ,
with ‖xa‖1=1000, ‖xb‖1=50
Error relative to stream size: 1%
![Page 80: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/80.jpg)
81
Sketches compute statistical summaries,
favoring elements with high frequency.
Error relative to stream size: 1%
flow size: xa: 10%, xb: 200%
Let ε=0.01, ‖x‖1=10000 (⇒ ε⋅‖x‖1=100)
Assume two flows xa , xb ,
with ‖xa‖1=1000, ‖xb‖1=50
![Page 81: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/81.jpg)
82
Other Problems a Sketch can’t handle
causality patterns rare things
![Page 82: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/82.jpg)
83
Regardless of their limitations, sketches provide
trade-offs between resources and error, and
provable guarantees to rely on.
![Page 83: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct](https://reader034.vdocuments.us/reader034/viewer/2022050210/5f5cd29f3d613f0817608a20/html5/thumbnails/83.jpg)
84
Advanced Topics in Communication Networks
Programming Network Data Planes
ETH Zürich
Alexander Dietmüller
Oct. 11 2018
nsg.ee.ethz.ch