compact data structures: data streamingbecchett/elective/slide/stream.pdf · 30 billions emails, 1...

31
Streaming L. Becchetti What is streaming? Probabilistic excursus Count-Min sketch Heavy hitters Inner product and join Compact data structures: data streaming Luca Becchetti “Sapienza” Universit` a di Roma – Rome, Italy April 26, 2012

Upload: others

Post on 12-Nov-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Compact data structures: data streaming

Luca Becchetti

“Sapienza” Universita di Roma – Rome, Italy

April 26, 2012

Page 2: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

1 What is streaming?

2 Probabilistic excursus

3 Count-Min sketch

4 Heavy hitters

5 Inner product and join

Page 3: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Motivations

US Bbones [Odlyzko, 2003] [Ipoque GMBH, 2007]

Traffic explosion in past years [Muthukrishnan, 2005a]30 billions emails, 1 billions SMS, IMs daily (2005)≈ 1 billion packets/router x hr

Page 4: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Motivations

LAN

LAN

INTERNET

Router

SNMP logFlow log

Packet log

Logs

SNMP: (Router ID, Interface ID, Timestamp, Bytes sent sincelast obs.)

Flow: (Source IP, Dest IP, Start Time, Duration, No. Packets,No. Bytes)

(Source IP, Dest IP, Src/Dest Port Numbers, Time, No. Bytes)

Page 5: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Challenges [Muthukrishnan, 2005a]

1 link with 2 Gb/s. Say avg packet size is 50 bytes

Number of pkts/sec = 5 Million

Time per pkt = 0.2 µsec (time available for processing)

If we capture pkt headers per packet: src/dest IP, time,no of bytes, etc. at least 10 bytes

Space per second is 50 MB. Space per day is 4.5 TB perlink

ISPs have hundreds of links.

Focus is on solutions for real applications

Note: we seek solutions that work in practice → easy toimplement, require small space, allow fast updates and queries

Page 6: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Some typical queries [Muthukrishnan, 2005a]

How many distinct IP addresses use a given link currentlyor anytime during the day?

What are the top k voluminous flows currently inprogress in a link?

How many distinct flows were observed?

Are traffic patterns in two routers correlated? What are(un)usual trends?

Network monitoring just one possible application

On-line statistics on search engines’ query logs

On-line statistics on server logs

Finding near-duplicate Web pages

Page 7: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Informally...

Streaming involves ([Muthukrishnan, 2005a]):

Small number of passes over data. (Typically 1?)

Sublinear space (sublinear in the universe or number ofstream items?)

A model of computation...

Similar to dynamic, online, approximation or randomizedalgorithms, but with more constraints

Constraints impose limitations that make many “easy”problems hard (further in this lecture)

Being poly-time/poly-space no longer sufficient

Page 8: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

The streaming model [Muthukrishnan, 2005b]

Processingunit

Uj

Uj+1

Uj+2

Uj+3

Streaming model

Data flow

Data item arrive over time

Uj processed before Uj+1 arrives

Only one pass (or few passes, at most O(log))

Page 9: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

The streaming model [Muthukrishnan, 2005b]

Underlying data (signal): an n-dimensional array A, ntypically large (e.g., the size of the IP address space)

Update arrive over time. The j-th update is a pairUj = (i , x), where i is an item (index):

Ai = Ai + x

In general, x can be any

Initially: Ai = 0,∀i = 1, . . . n

A(t): the state of the array after the first t updates

Goal

Compute and maintain functions over A in small space,with fast updates and computation

typically, space << n

Page 10: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Caveats about updates, items and values

j-th update Uj = (i , x)

item i is from a discrete universe of finite size

value x

Example

i = (Source IP, Dest. IP, Protocol)x = packet size (in bytes)

Wlog, i can be considered an integer

Hash to an integer otherwise. Example:

i = (“151.100.12.3”, “210.15.0.2”, “TCP”)

i → h(i), where h(·) maps strings to integers

For example, the integer corresponding to theconcatenation of the strings

Page 11: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Special cases of the model[Muthukrishnan, 2005b]

j-th update changes A(j) (Time series model)

Cash register: x ≥ 0

Turnstile model: most general model

In this lecture

Compact summaries of data streams (Count-Minsketches [Cormode and Muthukrishnan, 2005])

Statistics: point queries, heavy hitters, join-size, No.distinct items

Only a drop in a sea of results...

Page 12: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Statistics and aggregates

Point query Q(i): estimate Ai

basic building block for more complex queries

φ-heavy hitters of A: return all i : Ai > φ‖A‖1

Other aggregates (see further)

Join size of two DB relations observed in a streamingfashion

Scalar product of two streams (viewed as two vectors ofsame size)

Number of distinct items observed in A

Page 13: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Markov’s and Chebyshev’s inequalities

Theorem (Markov’s inequality)

Let X denote a random variable that assumes onlynon-negative values. Then, for every a > 0:

P[X ≥ a] ≤ E[X ]

a.

Theorem (Chebyshev’s inequality)

Let X denote a random variable. Then, for every a > 0:

P[|X − E[X ] | ≥ a] ≤ var [X ]

a2.

Page 14: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Hash functions

- Often used to implement dictionaries- In general, a hash function h : U → [n] maps elements ofsome discrete universe U onto integers belonging to somerange [n] = 0, 1, . . . , n− 1. Typically, |U| >> n. Ideally, themapping should be uniform. We assume without loss ofgenerality that U is some subset of the integers (why can westate this?)

U h(x)

0

n-1

Ideal behaviour: items in U mapped uniformly at random in 0, ... , n-1

x

Page 15: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Hash functions/2

Mapping should look “random” → If m items are mapped,then every i ∈ [n] should be the image of ≈ m/n items.

E.g.: if U = 0, . . . ,m − 1 consider h(x) = x mod p,with p a suitable prime.

Problem: this works if items from U appear at random→ often many correlations present

Create an adversarial sequence that maps all elements ofthe sequence onto the same i

Main question in many applications: mitigate the impactof adversarial sequences

Page 16: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Randomizing the hash function

Use a randomly generated hash function to map items tointegers.Idea: even if correlations present, items are mappedrandomly. Ideal behaviour (hard to achieve in practice)

For each x ∈ U, P[h(x) = j ] = 1/n, for everyj = 1, . . . , n

The values h(x) are independent

Caveats

This does not mean that every evaluation of h(x) yields adifferent random mapping, but only that h(x) is equallylikely to take any value in [0, . . . , n − 1]

Not easy to design an “ideal” hash function (many trulyrandom bits necessary)

Page 17: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Families of universal hash functions

We assume we have a suitably defined family F of hashfunctions, such that every member of h ∈ F is a functionh : U → [n].

Definition

F is a 2-universal hash family if, for any h(·) chosen uniformlyat random from F and for every x , y ∈ U we have:

P[h(x) = h(y)] ≤ 1

n.

Definitions generalizes to k-universality[Mitzenmacher and Upfal, 2005, Section 13.3]

Problem: define “compact” universal hash families

Page 18: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

A 2-universal family

Assume U = [m] and assume the range of the hash functionswe use is [n], where m ≥ n (typically, m >> n). We considerthe family F defined by hab(x) = ((ax + b) mod p) mod n,where a ∈ 1, . . . , p − 1, b ∈ 0, . . . , p and p is a primep ≥ m.

How to choose u.a.r. from FFor a given p: Simply choose a u.a.r. from 1, . . . , p − 1 andb u.a.r. from 0, . . . , p

Page 19: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

A 2-universal family/cont.

Theorem ([Carter and Wegman, 1979,Mitzenmacher and Upfal, 2005])

F is a 2-universal hash family. In particular, if a, b are chosenuniformly at random:

P[hab(x) = i ] =1

n,∀x ∈ U, i ∈ [n].

P[hab(x) = hab(y)] ≤ 1

n,∀x , y ∈ U.

Page 20: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

The CM sketch[Cormode and Muthukrishnan, 2005]

In the remainder: cash-register model

2-Dimensional array whose size is determined by designparameters ε and δ (their meaning explained further)

Array is C [j , l ], where j = 1, . . . , d and l = 1, . . . ,w

d =⌈ln 1

δ

⌉(depth)

w =⌈eε

⌉(width)

Every entry initially 0

d hash functions h1, . . . , hd chosen uniformly at randomfrom a 2-universal (pairwise-independent) family (seefirst lecture)

hr : 1, . . . , n → 1, . . . ,w

Update

Pair (i , c) is observed, meaning that, ideally, Ai = Ai + c

Page 21: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

CM sketch: Update procedure

w

d

hd

h1

i

+c

+c

+c

+c

CM sketch update

update(i, c)

Require: i: array index, c: value

1: for j : 1 . . . d do2: C[j, hj(i)] = C[j, hj(i)] + c

3: end for

Page 22: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Point query

Basic query, building block for the others

Q(i): estimate Ai

Point query estimate

PQ(i)

Require: i: array index

1: return Ai = minj C[j, hj(i)]

Theorem ([Cormode and Muthukrishnan, 2005])

Ai ≥ Ai . Furthermore, P[Ai > Ai + ε‖A‖1

]≤ δ, where

‖A‖1 =∑n

i=1 |Ai | is the 1-norm of A.

Page 23: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Proof of theorem

Define Iijk = 1 if (i 6= k)⋂

(hj(i) = hj(k)), 0 otherwise

P[hj(i) = hj(k)] ≤ 1w ≤

εe by pairwise independence

Define Xij =∑n

k=1 IijkAk

Xij ≥ 0 and C [j , hj(i)] = Ai + Xij → Ai ≥ Ai

Xij is the error introduced by collisions

E[Xij ] = E[∑n

k=1 IijkAk ] =∑n

k=1 AkE[Iijk ] ≤ εe ‖A‖1

Notice that the only random variables are the Iijk ’s andthe Xij ’s

Furthermore,

P[Ai > Ai + ε‖A‖1

]= P[∀j : C [j , hj(i)] > Ai + ε‖A‖1]

= P[∀j : Ai + Xij > Ai + ε‖A‖1] = P[∀j : Xij > eE[Xij ]]

= Πdj=1P[Xij > eE[Xij ]] < e−d ≤ δ,

where the fifth inequality follows from Markov’s inequality.

Page 24: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Heavy hitters[Cormode and Muthukrishnan, 2005]

φ-heavy hitters of A: i : Ai > φ‖A‖1Fact: No. φ-heavy hitters between 0 and 1/φ

Approximate heavy hitters: accept i such thatAi ≥ (φ− ε)‖A‖1 for some specified ε < φ

We consider the cash-register model

Heavy hitters algorithm: ingredients

CM sketch and point query basic building blocks

Return items whose estimate exceeds φ‖A‖1

Assume cs is the s-th update → ‖A‖1 =∑t

s=1 cs →‖A‖1 can be easily maintained and updated in smallspace

Page 25: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Heavy hitters: update and query

update(i, c, S, H)

Require: i: array index, c ≥ 0, S: CM sketch, H: heap

1: update(i, c, S) Update CM sketch2: Ai = PQ(i)

3: if Ai > φ‖A‖1 then4: HeapUpdate(i, Ai, H) Insert if i 6∈ H5: end if6: s = HeapMin(H)

7: while PQ(s) ≤ φ‖A‖1 do

8: HeapDelete(s)

9: s = HeapMin(H)

10: end while

Heap and query

Generic heap element: pair (i , Ai ) ordered by Ai

Heavy hitters: return all elements i in H such thatAi > φ‖A‖1

Page 26: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Heavy hitters: performance

Theorem ([Cormode and Muthukrishnan, 2005])

Assume Inserts only (cash register model). With CM sketches

using space O(

1ε log ‖A‖1

δ

)and update time O

(log ‖A‖1

δ

)per

item:

Every heavy hitter is output

With probability at least 1− δ : i) no item whose realcount is ≤ (φ− ε)‖A‖1 is output and ii) the number of

items in the heap is O(

1φ−ε

)Question

Assume d =⌈eε

⌉and w =

⌈ln n

δ

⌉and let T be the estimated

set of heavy hitters. Recall that Ai ≥ Ai . After any number tof insertions, define Sε = i : Ai < (φ− ε)‖A‖1. Prove that

P[Sε ∩ T 6= ∅] ≤ δ.

Page 27: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Solution

Consider i ∈ Sε. If i ∈ T ,Ai > φ‖A‖1 → ∀j : C [j , hj(i)] > φ‖A‖1. Hence:

Ai < (φ− ε)‖A‖1 → C [j , hj(i)]− Ai > ε‖A‖1,∀j .

This implies:

P[i ∈ T ] = P[Ai > Ai + ε‖A‖1

]= P[∀j : C [j , hj(i)] > Ai + ε‖A‖1] < e−d =

δ

n,

where the third inequality follows from the general result seenfor PQ(i). Finally, since |Sε| ≤ n:

P[Sε ∩ T 6= ∅] = P[∪i∈Sε(i ∈ T )] ≤ δ

n· |Sε| ≤ δ.

Page 28: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Join size

4 2 2 3 4 2

142142

R.A

S.A

1 2 3 4

1 2 3 4

**R.B

R.C

Join of two relations R and S on same attribute A

Tuple

We want to compute COUNT(R 1A S)

Space Ω(n) required (in the picture: n = 4)In the example: COUNT(R 1A S) = 3x2 + 2x2 = 10

Underlying operation: compute scalar product of twonon-negative vectors of same size

Page 29: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Inner product

Given two streams A and B, we want to keep a compactsummary of A · BWe use two CM sketches CA and CB for A and B

Estimation

Let Sj =∑w

k=1 CA[j , k]CB [j , k]

Estimation: return S ← minj Sj

Theorem ([Cormode and Muthukrishnan, 2005])

S ≥ A · B. Furthermore:

P[S > A · B + ε‖A‖1‖B‖1] ≤ δ.

Q1: Prove theorem above. Proof follows same lines as forpoint query

Page 30: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Carter, J. L. and Wegman, M. N. (1979).

Universal classes of hash functions.

Journal of Computer and System Sciences, 18(2):143–154.

Cormode, G. and Muthukrishnan, S. (2005).

An improved data stream summary: the count-min sketch andits applications.

J. Algorithms, 55(1):58–75.

Ipoque GMBH, G. (2007).

Internet study 2007. URL: http://www.ipoque.com/.

Mitzenmacher, M. and Upfal, E. (2005).

Probability and Computing : Randomized Algorithms andProbabilistic Analysis.

Cambridge University Press.

Muthukrishnan, S. (2005a).

Data stream algorithms. URL:http://www.cs.rutgers.edu/˜muthu/str05.html.

Page 31: Compact data structures: data streamingbecchett/Elective/slide/stream.pdf · 30 billions emails, 1 billions SMS, IMs daily (2005) ˇ1 billion packets/router x hr. Streaming L. Becchetti

Streaming

L. Becchetti

What isstreaming?

Probabilisticexcursus

Count-Minsketch

Heavy hitters

Inner productand join

Muthukrishnan, S. (2005b).

Data streams: Algorithms and applications.

In Foundations and Trends in Theoretical Computer Science,Now Publishers or World Scientific, volume 1.

Draft at authors homepage:http://www.cs.rutgers.edu/˜muthu/stream-1-1.ps.

Odlyzko, A. M. (2003).

Internet traffic growth: sources and implications.

In Proc. of SPIE conference on Optical Transmission Systemsand Equipment for WDM Networking, pages 1–15.