basic algorithmic techniques for data streams
TRANSCRIPT
![Page 1: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/1.jpg)
Basic algorithmic techniques for data streams
Mariano Zelke
Goethe-Universitat Frankfurt am MainInstitut fur Informatik
Arbeitsgruppe Theorie Komplexer Systeme
DEIS’10, Dagstuhl, Nov. 2010
![Page 2: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/2.jpg)
Contents
Basic Definitions
SamplingGeneral IdeaReservoir SamplingSampling Applications
Advanced SamplingAMS SamplingSliding Window Sampling
Count-Min Sketch
Mariano Zelke Basic algorithmic techniques for data streams 2/12
![Page 3: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/3.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 4: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/4.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 5: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/5.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , n
I Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 6: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/6.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 7: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/7.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 8: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/8.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted to
I sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 9: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/9.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted toI sequential access to items in stream
I limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 10: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/10.jpg)
Data Stream Model
I Stream: m elements of some universe of size n
a1, a2, a3, . . . , am ai ∈ 1, . . . , nI Goal: gain information about stream
statistical information (median, frequency moments, ...),longest increasing subsequence,...
I But: algorithms are restricted toI sequential access to items in streamI limited memory, sublinear in m and n
Mariano Zelke Basic algorithmic techniques for data streams 3/12
![Page 11: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/11.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 12: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/12.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 13: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/13.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 14: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/14.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 15: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/15.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 16: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/16.jpg)
Sampling
General idea:
I sample (= select) items from stream according to some rule
I rule is randomized or deterministic
I use sampled items to get information about whole stream
I only need to store sampled items⇒ good for streaming
Mariano Zelke Basic algorithmic techniques for data streams 4/12
![Page 17: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/17.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 18: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/18.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 19: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/19.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 20: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/20.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,m
I Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 21: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/21.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 22: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/22.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 23: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/23.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 24: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/24.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 25: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/25.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 26: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/26.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 27: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/27.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 28: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/28.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 29: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/29.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 30: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/30.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 31: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/31.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 32: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/32.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 33: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/33.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 34: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/34.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 35: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/35.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item
ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 36: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/36.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 37: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/37.jpg)
Easy Starter
Sample out a single item uniformly at random from the stream
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ar , . . . , am
I Easy if m is known in advance:
I Pick a random number r ∈ 1, 2, . . . ,mI Go over the stream and snatch out sampled item ar
But what if m is not known in advance?
Mariano Zelke Basic algorithmic techniques for data streams 5/12
![Page 38: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/38.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 39: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/39.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 40: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/40.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample:
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 41: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/41.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: a1
take as sample
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 42: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/42.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ar
replace with probability 1/2
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 43: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/43.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ar
replace with probability 1/3
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 44: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/44.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ar
replace with probability 1/4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 45: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/45.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ar
replace with probability 1/i
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 46: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/46.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ar
replace with probability 1/m
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 47: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/47.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
final sample: ar
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 48: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/48.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample:
Pr [final sample ai ] =
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 49: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/49.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ai
replace
Pr [final sample ai ] = 1i
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 50: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/50.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ai
do not replace
Pr [final sample ai ] = 1i × (1− 1
i+1 )
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 51: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/51.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ai
do not replace
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 52: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/52.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
current sample: ai
do not replace
. . .
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )× . . . × (1− 1
m )
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 53: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/53.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
final sample: ai
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )× . . . × (1− 1
m )
= 1i ×
ii+1 × i+1
i+2 × . . . × m−1m
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 54: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/54.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
final sample: ai
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )× . . . × (1− 1
m )
= 1i ×
ii+1 × i+1
i+2 × . . . × m−1m
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 55: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/55.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
final sample: ai
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )× . . . × (1− 1
m )
= 1m
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 56: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/56.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out a single item uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , ai+1, ai+2, . . . , am
final sample: ai
Pr [final sample ai ] = 1i × (1− 1
i+1 )× (1− 1i+2 )× . . . × (1− 1
m )
= 1m
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 57: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/57.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 58: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/58.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 59: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/59.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 60: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/60.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a2 a3 a4
take into sample
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 61: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/61.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a2 a3 a4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 62: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/62.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a2 a3 a4
a5
sample with probability 1/5
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 63: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/63.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a2 a3 a4
a5
replace a sampled elementuniformly at random 1
414
14
14
sample with probability 1/5
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 64: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/64.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a2 a3 a4
a5
replace a sampled elementuniformly at random
sample with probability 1/5
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 65: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/65.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a5 a3 a4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 66: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/66.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a5 a3 a4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 67: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/67.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a5 a3 a4
a6
sample with probability 1/6
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 68: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/68.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a5 a3 a4
a6
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 69: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/69.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples a1 a5 a3 a4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 70: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/70.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples ar 1 ar 2 ar 3 ar 4
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 71: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/71.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples ar 1 ar 2 ar 3 ar 4
ai
sample with probability 1/i
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 72: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/72.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples ar 1 ar 2 ar 3 ar 4
ai
replace a sampled elementuniformly at random 1
414
14
14
sample with probability 1/i
Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 73: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/73.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples ar 1 ar 2 ar 3 ar 4
ai
replace a sampled elementuniformly at random 1
414
14
14
sample with probability 1/i
Memory usage for sampling k items:Mariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 74: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/74.jpg)
Reservoir Sampling [J. Vitter ’85]
Sample out several items uniformly at random from the streamwithout knowing its length
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current samples ar 1 ar 2 ar 3 ar 4
ai
replace a sampled elementuniformly at random 1
414
14
14
sample with probability 1/i
Memory usage for sampling k items: k · log nMariano Zelke Basic algorithmic techniques for data streams 6/12
![Page 75: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/75.jpg)
Sampling Applications
stream:
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 76: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/76.jpg)
Sampling Applications
stream:
sampling
samples:
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 77: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/77.jpg)
Sampling Applications
stream:
sampling
samples:
Goal: determine query selectivity on items of stream
(assume query to be invariant of stream order)
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 78: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/78.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 79: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/79.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 80: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/80.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Query selects mc stream items ⇒ Exp[ s+ ] = s
c
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 81: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/81.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Query selects mc stream items ⇒ Exp[ s+ ] = s
c
Chernoff-Hoeffding-Ineq.: Pr [ |s+ − Exp[ s+ ]| > ε · s+ ] ≤ e−Θ(ε2s)
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 82: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/82.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Query selects mc stream items ⇒ Exp[ s+ ] = s
c
Chernoff-Hoeffding-Ineq.: Pr [ |s+ − Exp[ s+ ]| > ε · s+ ] ≤ e−Θ(ε2s)
sample O(
1ε2 · log 1
δ
)items
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 83: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/83.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Query selects mc stream items ⇒ Exp[ s+ ] = s
c
Chernoff-Hoeffding-Ineq.: Pr [ |s+ − Exp[ s+ ]| > ε · s+ ] ≤ e−Θ(ε2s)
Memory usage: O(
1ε2 · log 1
δ
)items × log n bits
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 84: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/84.jpg)
Sampling Applications
stream:
sampling
samples:
determine selectivity of query on samples
To get (1± ε)-estimate with probability 1− δ: What sample size s ?
Query selects mc stream items ⇒ Exp[ s+ ] = s
c
Chernoff-Hoeffding-Ineq.: Pr [ |s+ − Exp[ s+ ]| > ε · s+ ] ≤ e−Θ(ε2s)
Memory usage: O(
1ε2 · log 1
δ · log n)
bits
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 85: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/85.jpg)
Sampling Applications
stream:
sampling
samples:
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 86: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/86.jpg)
Sampling Applications
stream:
sampling
samples:
Goal: find the median of the stream
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 87: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/87.jpg)
Sampling Applications
stream:
sampling
samples:
sort samples
sorted
samples: ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 88: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/88.jpg)
Sampling Applications
stream:
sampling
samples:
sort samples
sorted
samples: ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤
Pick median of samples as an estimate of stream’s median
median of samples
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 89: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/89.jpg)
Sampling Applications
stream:
sampling
samples:
sort samples
sorted
samples: ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤median of samples
Pick median of samples as an estimate of stream’s median
To get (1± ε)-estimate with probability 1− δ:
sample O(
1ε2 · log 1
δ
)items
Mariano Zelke Basic algorithmic techniques for data streams 7/12
![Page 90: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/90.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:
I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 91: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/91.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:
I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 92: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/92.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣
kth frequency moment Fk =n∑
i=1f ki
Frequency moments provide useful statistics:
I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 93: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/93.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣Example: stream 2, 3, 3, 2, 3, 1, 2, 3
f 1 = 1, f 2 = 3, f 3 = 4
Frequency moments provide useful statistics:
I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 94: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/94.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:
I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 95: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/95.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 96: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/96.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 97: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/97.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
Stream a1, a2, a3, . . . , am
Frequency of an item: f i =∣∣j : a j = i
∣∣kth frequency moment Fk =
n∑i=1
f ki
Frequency moments provide useful statistics:I F0: number of distinct elements in streamI F1: length of stream, mI F2: size of self joinI Fk , k ≥ 2: skew of distribution
Trivial determination of Fk : maintain counters for each f i
⇒ Ω(n) bits needed
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 98: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/98.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)
1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 99: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/99.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream
2. Compute r =∣∣j ′ : j ′ ≥ j , aj′ = aj
∣∣3. At the end of the stream calculate
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 100: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/100.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 101: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/101.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
Example: stream 2, 3, 3, 2, 3, 1, 2, 3
aj = 3, r = 3
3. At the end of the stream calculate
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 102: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/102.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 103: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/103.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 104: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/104.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k)
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 105: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/105.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k)
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 106: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/106.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k) +. . .+ m(2k−1k)
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 107: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/107.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k) +. . .+ m(2k−1k) + m · 1k
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 108: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/108.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k) +. . .+ m(2k−1k) + m · 1k
+m(f k2 −(f2−1)k) + m((f2−1)k−(f2−2)k) +. . .+ m(2k−1k) + m · 1k
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 109: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/109.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k) +. . .+ m(2k−1k) + m · 1k
+m(f k2 −(f2−1)k) + m((f2−1)k−(f2−2)k) +. . .+ m(2k−1k) + m · 1k
. . .
+m(f kn −(fn−1)k) + m((fn−1)k−(fn−2)k) +. . .+ m(2k−1k) + m · 1k]
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 110: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/110.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
=[m(f k1 −(f1−1)k) + m((f1−1)k−(f1−2)k) +. . .+ m(2k−1k) + m · 1k
+m(f k2 −(f2−1)k) + m((f2−1)k−(f2−2)k) +. . .+ m(2k−1k) + m · 1k
. . .
+m(f kn −(fn−1)k) + m((fn−1)k−(fn−2)k) +. . .+ m(2k−1k) + m · 1k]
1m
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 111: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/111.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
= f k1 + f k2 + f k3 + . . . + f kn
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 112: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/112.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
for k ≥ 1 : Exp[m(rk − (r − 1)k) ]
= f k1 + f k2 + f k3 + . . . + f kn = Fk
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 113: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/113.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 114: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/114.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Run O(n1−1/k
ε2
)parallel instances, take average A
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 115: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/115.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Run O(n1−1/k
ε2
)parallel instances, take average A
⇒ Chebyshev-Ineq.: Pr [ |A− Fk | > ε · Fk ] ≤ p < 12
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 116: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/116.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Run O(n1−1/k
ε2
)parallel instances, take average A
⇒ Chebyshev-Ineq.: Pr [ |A− Fk | > ε · Fk ] ≤ p < 12
Take median M over O(log 1
δ
)such averages
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 117: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/117.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Run O(n1−1/k
ε2
)parallel instances, take average A
⇒ Chebyshev-Ineq.: Pr [ |A− Fk | > ε · Fk ] ≤ p < 12
Take median M over O(log 1
δ
)such averages
⇒ Chernoff-Ineq.: Pr [ |M − Fk | > ε · Fk ] ≤ δ
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 118: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/118.jpg)
AMS Sampling [Alon, Matias, Szegedy ’96]
AMS Sampling (for estimating Fk)1. Pick random item aj from stream2. Compute r =
∣∣j ′ : j ′ ≥ j , aj′ = aj∣∣
3. At the end of the stream calculate
m(rk − (r − 1)k)
To get (1± ε)-estimate of Fk with probability 1− δ:
Run O(n1−1/k
ε2
)parallel instances, take average A
⇒ Chebyshev-Ineq.: Pr [ |A− Fk | > ε · Fk ] ≤ p < 12
Take median M over O(log 1
δ
)such averages
⇒ Chernoff-Ineq.: Pr [ |M − Fk | > ε · Fk ] ≤ δ
Memory consumption: O(n1−1/k
ε2 · log 1δ · (log n + logm)
)bits
Mariano Zelke Basic algorithmic techniques for data streams 8/12
![Page 119: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/119.jpg)
Sliding Window Sampling
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 120: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/120.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 121: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/121.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 122: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/122.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current sample: a2
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 123: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/123.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current sample: a2
Current sample might be ”far away“ from actual item
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 124: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/124.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current sample: a2
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 125: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/125.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current sample: a2
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 126: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/126.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 127: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/127.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 128: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/128.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 129: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/129.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 130: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/130.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 131: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/131.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 132: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/132.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 133: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/133.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 134: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/134.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 135: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/135.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Current sample might be ”far away“ from actual item
- fine for some applications
- for others only w recent items matter
⇒ only consider items in a sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 136: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/136.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 137: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/137.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Trivial: - memorize whole window content ⇒ w · log n bits
- impractical if w is large
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 138: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/138.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Trivial: - memorize whole window content ⇒ w · log n bits
- impractical if w is large
Better idea:
Algorithm:
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 139: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/139.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Trivial: - memorize whole window content ⇒ w · log n bits
- impractical if w is large
Better idea:
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 140: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/140.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Trivial: - memorize whole window content ⇒ w · log n bits
- impractical if w is large
Better idea:
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 141: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/141.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Goal: sample an item uniformly from sliding window of size w
Trivial: - memorize whole window content ⇒ w · log n bits
- impractical if w is large
Better idea:
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 142: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/142.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 143: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/143.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 144: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/144.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
random value r 1 = 0.2
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 145: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/145.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
random value r 1 = 0.2
a1
0.2
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 146: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/146.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 1 = 0.2
a1
0.2
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 147: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/147.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 2 = 0.4
a1
0.2
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 148: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/148.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 2 = 0.4
a1
0.2
a2
0.4
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 149: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/149.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 2 = 0.4
a1
0.2
a2
0.4
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 150: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/150.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 3 = 0.7
a1
0.2
a2
0.4
a3
0.7
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 151: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/151.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 4 = 0.8
a1
0.2
a2
0.4
a3
0.7
a4
0.8
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 152: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/152.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 5 = 0.6
a1
0.2
a2
0.4
a3
0.7
a4
0.8
a5
0.6
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 153: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/153.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 5 = 0.6
a1
0.2
a2
0.4
a3
0.7
a4
0.8
a5
0.6
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 154: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/154.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 5 = 0.6
a1
0.2
a2
0.4
a3
0.7
a4
0.8
a5
0.6
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 155: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/155.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 5 = 0.6
a1
0.2
a2
0.4
a5
0.6
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 156: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/156.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 6 = 0.5
a1
0.2
a2
0.4
a5
0.6
a6
0.5
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 157: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/157.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 6 = 0.5
a1
0.2
a2
0.4
a5
0.6
a6
0.5
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 158: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/158.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 6 = 0.5
a1
0.2
a2
0.4
a6
0.5
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 159: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/159.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 6 = 0.5
a1
0.2
a2
0.4
a6
0.5
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 160: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/160.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Algorithm: 1: For each ai pick random value ri ∈ (0, 1)
2: In window (ai−w+1, . . . , ai ) choose aj with smallest rj
3: Only maintain items in window whose r -value is
3: minimal among subsequent r -values
current
sample
random value r 6 = 0.5
a2
0.4
a6
0.5
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 161: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/161.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
current
sample
axrx
a5
0.6
ayry
· · ·
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 162: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/162.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 163: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/163.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Worst case: |`| = w But that is very unlikely.
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 164: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/164.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] =
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 165: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/165.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] +
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 166: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/166.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] +
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 167: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/167.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 168: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/168.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
= 1 +
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 169: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/169.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
= 1 + 12 +
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 170: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/170.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
= 1 + 12 + 1
3 +
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 171: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/171.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
= 1 + 12 + 1
3 + 14 + . . .+ 1
w
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 172: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/172.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis: What is the length of ` ?
Exp[ |`| ] = Pr [ai ∈ ` ] + Pr [ai−1 ∈ ` ] + . . .+ Pr [ai−w+1 ∈ ` ]
= 1 + 12 + 1
3 + 14 + . . .+ 1
w = O(logw)
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 173: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/173.jpg)
Sliding Window Sampling
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , ai , . . . , am
Analysis:
Exp[ memory usage ] = O(logw · log n)
current
sample
axrx
a5
0.6
ayry
· · ·
linked list `
Mariano Zelke Basic algorithmic techniques for data streams 9/12
![Page 174: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/174.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 175: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/175.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 176: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/176.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Trivial solution: maintain n counters
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 177: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/177.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 178: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/178.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 179: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/179.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 180: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/180.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 181: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/181.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1 +1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 182: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/182.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1 +1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 183: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/183.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1 +1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 184: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/184.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 185: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/185.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 186: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/186.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
2/ε
rand. hash funct. h1
i
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 187: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/187.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 188: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/188.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Give fi as estimate for fi
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 189: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/189.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Give fi as estimate for fi
Analysis:
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 190: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/190.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Give fi as estimate for fi
Analysis: fi ≥ fi
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 191: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/191.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Give fi as estimate for fi
Analysis: fi ≥ fi
Exp[ fi ] ≤ fi + ε ·m/2
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 192: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/192.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
Point query: fi = ?
Give fi as estimate for fi
Analysis: fi ≥ fi
Exp[ fi ] ≤ fi + ε ·m/2
Markov-Ineq.: Pr [ fi > fi + ε ·m ] ≤ 12
2/ε
rand. hash funct. h1
i
fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 193: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/193.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 194: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/194.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 195: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/195.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 196: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/196.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
+1
+1
+1
+1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 197: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/197.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
+1
+1
+1
+1
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 198: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/198.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
Point query: fi = ?
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 199: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/199.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
i
Point query: fi = ?
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 200: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/200.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
i
Point query: fi = ?
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 201: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/201.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
i
Point query: fi = ?
Give fi := minimum of -values as estimate for fi
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 202: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/202.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
i
Analysis: Pr [ fi > fi + ε ·m ] ≤(
12
)d= δ
Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 203: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/203.jpg)
Count-Min Sketch [Cormode, Muthukrishnan ’04]
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, . . . , aj , . . . , am
2/ε
rand. hash funct. h1
rand. hash funct. h2
rand. hash funct. h3
rand. hash funct. hd
d =
log 1δ
i
Analysis: Pr [ fi > fi + ε ·m ] ≤(
12
)d= δ
Memory consumption: O(
1ε · log 1
δ · logm + log 1δ · log n
)Mariano Zelke Basic algorithmic techniques for data streams 10/12
![Page 204: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/204.jpg)
Recap
I Reservoir sampling for uniform selection
I AMS sampling for frequency moments
I Sliding window sampling
I Count-Min sketch for point queries
Mariano Zelke Basic algorithmic techniques for data streams 11/12
![Page 205: Basic algorithmic techniques for data streams](https://reader030.vdocuments.us/reader030/viewer/2022012809/61bf4cd74bfa8e1cc516713f/html5/thumbnails/205.jpg)
References
I N. Alon, Y. Matias, and M. Szegedy: The space complexity ofapproximating the frequency moments. J. Comput. Syst. Sci.58, pp. 137-147, 1999.
I G. Cormode and S. Muthukrishnan: An Improved DataStream Summary: The Count-Min Sketch and itsApplications. Journal of Algorithms 55(1), pp. 58-75, 2005.
I B. Lahiri and S. Tirthapura: Stream Sampling. Pages2838-2842 in: Ling Liu, M. Tamer zsu (Eds.): Encyclopedia ofDatabase Systems. Springer US, 2009.
I S. Muthukrishnan: Data Streams: Algorithms andApplications. Foundations and Trends in TheoreticalComputer Science, 1(2), 2005.
I J. Vitter: Random Sampling with a Reservoir. ACMTransactions on Mathematical Software, 11(1), pp. 37-57,1985.
Mariano Zelke Basic algorithmic techniques for data streams 12/12