Group Testing and New Algorithmic Applications
Ely Porat, Bar-Ilan University. Topics: compressive sensing, theory of big data, pattern matching, distributed, coding theory, group testing, game theory, succinct data structures, streaming algorithms, sketching & LSH.
![Page 1: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/1.jpg)
Ely Porat
Bar-Ilan University
Group Testing and New Algorithmic Applications
![Page 2: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/2.jpg)
Theory of Big data Pattern matching
Game theory Coding theory
Compressive sensing
Group testing Distributed
![Page 3: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/3.jpg)
Bloom filters
Theory of Big data
Succinct data structures
Streaming algorithm Sketching & LSH
Big Databases
![Page 4: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/4.jpg)
Group Testing Overview
Test a soldier for a disease
WWII example: syphilis
![Page 5: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/5.jpg)
Group Testing Overview
Test an army for a disease
WWII example: syphilis
What if only one soldier has the disease?
We can pool blood samples and check whether at least one soldier has the disease.
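To make the pooling idea concrete, here is a minimal sketch of finding a single sick soldier by repeatedly halving the pool. The function name and the `pool_is_positive` oracle are hypothetical stand-ins for a real pooled blood test; the slides do not prescribe this code.

```python
def find_single_sick(n, pool_is_positive):
    """Locate the one sick soldier among 0..n-1 using pooled tests.

    pool_is_positive(pool) is a hypothetical oracle: it returns True iff
    the pooled sample of the soldiers in `pool` contains the disease.
    Returns (index_of_sick_soldier, number_of_tests_used).
    """
    lo, hi = 0, n  # invariant: the sick soldier is in range(lo, hi)
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if pool_is_positive(range(lo, mid)):
            hi = mid  # the sick soldier is in the tested half
        else:
            lo = mid  # the sick soldier is in the untested half
    return lo, tests
```

With exactly one sick soldier this uses about log2(n) pooled tests instead of n individual ones.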
![Page 6: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/6.jpg)
More Motivations
• Syphilis, HIV [Dor43]
• Mapping genomes [BLC91, BBK+95, TJP00]
• Quality control in product testing [SG59]
• Searching files in storage systems [KS64]
• Sequential screening of experimental variables [Li62]
• Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
• Data compression [HL00]
• Software testing [BG02, CDFP97]
• DNA sequencing [PL94]
• Molecular biology [DH00, FKKM97, ND00, BBKT96]
![Page 7: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/7.jpg)
Adaptive group testing
Number of sick d ≤ 2
![Page 8: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/8.jpg)
Adaptive general case
Number of sick ≤ d
Split the n soldiers into 2d groups and test each group.
At most d groups are positive => at most n/2 candidates remain.
Run in recursion.
Total: O(d log(n/d)) tests.
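The recursion above can be sketched as follows. This is an illustrative implementation under stated assumptions: `pool_is_positive` is a hypothetical test oracle, d is at least 1, and the final individual tests on the last 2d candidates are a simplification chosen here, not something stated on the slide.

```python
def adaptive_group_test(n, d, pool_is_positive):
    """Find up to d sick soldiers among 0..n-1 with adaptive pooled tests.

    Each round splits the remaining candidates into at most 2d groups and
    keeps only the groups that test positive. Returns (sick_list, tests).
    """
    candidates = list(range(n))
    tests = 0
    while len(candidates) > 2 * d:
        size = -(-len(candidates) // (2 * d))  # ceiling division
        survivors = []
        for start in range(0, len(candidates), size):
            group = candidates[start:start + size]
            tests += 1
            if pool_is_positive(group):
                survivors.extend(group)
        if len(survivors) >= len(candidates):
            raise ValueError("no progress: more than d sick soldiers?")
        candidates = survivors
    sick = []
    for c in candidates:  # few candidates left: test them one by one
        tests += 1
        if pool_is_positive([c]):
            sick.append(c)
    return sick, tests
```

Each round costs at most 2d tests and roughly halves the candidate set, which matches the O(d log(n/d)) count up to constants.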
![Page 9: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/9.jpg)
Non adaptive group testing
• All the tests are set in advance.
(Figure: a test matrix with t rows and n columns.)
![Page 10: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/10.jpg)
Non adaptive group testing
(Figure: a t × n 0/1 test matrix, here 6 × 12, multiplied by the unknown vector x = (0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0) equals the result vector (1, 1, 0, 1, 0, 1).)
(and,or) matrix vector multiplication
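As a small illustration of the (and, or) product, the sketch below replaces multiplication by AND and addition by OR. The matrix and vector are made-up toy values, not the ones drawn on the slide.

```python
def and_or_multiply(matrix, x):
    """Return r where r[i] = OR over j of (matrix[i][j] AND x[j])."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in matrix]

# Toy example (hypothetical values): 3 tests over 4 soldiers, soldier 2 is sick.
M = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
x = [0, 0, 1, 0]
r = and_or_multiply(M, x)
```

A test comes back positive exactly when its row includes at least one sick soldier.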
![Page 11: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/11.jpg)
Non adaptive group testing
(Figure: the t × n test matrix, with rows 1..t and columns 1..n, is to be designed; the vector x = (x1, ..., xn) is unknown; the results r = (r1, ..., rt) are observed.)
Upper bound: t = O(d^2 log n) [PR08]
Lower bound: t = Ω(d^2 log_d n) [DR82]
![Page 12: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/12.jpg)
Non adaptive group testing
![Page 13: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/13.jpg)
2-Stage group testing
![Page 14: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/14.jpg)
2-Stage group testing
We misclassified 2 soldiers.
Using O(d log(n/d)) measurements, we will misclassify O(d) soldiers, whom we can easily test one by one in a second stage.
Property of unbalanced expanders.
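A rough sketch of the two-stage idea follows. It assumes a random pooling design in place of the unbalanced expander mentioned on the slide; the inclusion probability 1/d and the number of pools t are illustrative choices, and `sick` stands in for the real test outcomes.

```python
import random

def two_stage_group_test(n, d, sick, t, rng):
    """Stage 1: t random pools tested in parallel. Stage 2: individual tests.

    Returns (confirmed_sick, number_of_stage2_tests).
    """
    # Stage 1: every soldier joins each pool independently with prob. 1/d.
    pools = [[j for j in range(n) if rng.random() < 1.0 / d] for _ in range(t)]
    results = [any(j in sick for j in pool) for pool in pools]

    # Anyone appearing in a negative pool is certainly healthy.
    cleared = set()
    for pool, positive in zip(pools, results):
        if not positive:
            cleared.update(pool)
    candidates = [j for j in range(n) if j not in cleared]

    # Stage 2: test the remaining candidates one by one.
    confirmed = [j for j in candidates if j in sick]
    return confirmed, len(candidates)

found, stage2_tests = two_stage_group_test(
    1000, 3, {4, 250, 777}, t=60, rng=random.Random(0)
)
```

The second stage removes any false positives left by the first, so the answer is exact while only two rounds of testing are needed.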
![Page 15: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/15.jpg)
Adaptive vs Non adaptive
Time: if one test takes a day to perform, adaptive testing might take a month; 2-stage group testing takes 2 days.
Store less, to be checked later.
![Page 16: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/16.jpg)
Group testing for Pattern Matching
Text: n
Pattern: m
![Page 17: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/17.jpg)
Group testing for Pattern Matching
Part of a 20M€ consortium project which is supported by MOI (cyber security)
![Page 18: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/18.jpg)
Motivation…
• Stock market
![Page 19: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/19.jpg)
Motivation…
• Espionage
The rest we monitor
![Page 20: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/20.jpg)
Motivation…
• Viruses and malware
Software solutions: Snort: 73.5Mb, ClamAV: 1.48Gb
Using TCAMs: Snort: 680Kb, ClamAV: 25Mb
Our solution (software): Snort: 51Kb, ClamAV: 216Kb
![Page 21: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/21.jpg)
Group testing for Pattern Matching
Text: n
Pattern: m
• Pattern matching with wildcards: O(n log m) [CH02]
• Up to k mismatches [CEPR07, CEPR09].
• Sketching Hamming distance [PL07, AGGP13].
• Pattern matching in the streaming model [PP09]
![Page 22: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/22.jpg)
Group testing for Pattern Matching
Text:
Pattern:
• Up to k mismatches using group testing
Group testing scheme
Performing the tests is easy. However, how can we analyze the results?
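One way to read the reduction, offered here as an assumption rather than the talk's exact construction: for a fixed alignment, the items are the pattern positions, a position is "sick" if it mismatches the text, and a pooled test asks whether any position in the pool mismatches. The brute-force sketch below shows only what a test result means; the pools and strings are hypothetical.

```python
def mismatch_test_results(text, pattern, offset, pools):
    """For one alignment, return the 0/1 outcome of every pooled test.

    A pool (a list of pattern positions) is positive iff at least one of
    its positions mismatches the text at this alignment.
    """
    return [
        int(any(text[offset + j] != pattern[j] for j in pool))
        for pool in pools
    ]

pools = [[0], [1], [2], [0, 1]]  # hypothetical pools over positions 0..2
at_3 = mismatch_test_results("abcabdabc", "abc", 3, pools)
at_0 = mismatch_test_results("abcabdabc", "abc", 0, pools)
```

Decoding the test outcomes then identifies the mismatching positions, provided there are at most k of them.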
![Page 23: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/23.jpg)
Fast Decoding
The naïve decoding takes O(nt) time.
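For reference, a minimal sketch of the naïve decoder: scan every negative test and clear every item it contains, which touches each of the t × n matrix entries at most once. The toy matrix is a made-up example, and the function name is hypothetical.

```python
def naive_decode(matrix, results):
    """Return the items that appear in no negative test.

    matrix is a t x n 0/1 list of lists, results is the length-t 0/1
    outcome vector. Runs in O(n * t) time.
    """
    n = len(matrix[0])
    possibly_sick = [True] * n
    for row, outcome in zip(matrix, results):
        if outcome == 0:
            for j, included in enumerate(row):
                if included:
                    possibly_sick[j] = False
    return [j for j in range(n) if possibly_sick[j]]

# Toy example (hypothetical): item 1 is the only sick one.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
decoded = naive_decode(M, [1, 1, 0, 0])
```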
![Page 24: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/24.jpg)
Fast Decoding
We perform 3 GT schemes.
1. The original.
2. First projection.
3. Second projection.
![Page 25: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/25.jpg)
Fast Decoding
We first decode the projections.
Then we check the d^2 options naively.
In [NPR11] we manage to obtain a scheme with an optimal number of measurements and decoding time O(d^2 log^2 n) (using recursion and 2-stage GT).
If we use the 2-stage GT scheme, we will have 4d^2 candidates to check.
![Page 26: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/26.jpg)
Faster Decoding
According to the LW theorem, the number of candidates in the join is d^1.5. In [NPRR12] we show how to do the join in optimal time (best paper award).
This gives a scheme with an optimal number of measurements, which can be decoded in time O(d^(1+ε) poly(log n)).
![Page 27: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/27.jpg)
Compressive Sensing
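(Figure: a t × n measurement matrix applied to an unknown vector of length n; the measurements shown are 2, 2, 0, 1, 0, 1.)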
![Page 28: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/28.jpg)
Compressive Sensing
(Figure: a t × n 0/1 matrix, here 6 × 12, multiplied by the vector x = (0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0) equals the measurement vector (2, 2, 0, 1, 0, 1).)
![Page 29: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/29.jpg)
Compressive Sensing
(Figure: a t × n 0/1 matrix, here 6 × 12, multiplied by the noisy vector x = (0.1, 0.2, 0.1, 5.8, 0.1, 0.3, 0.1, 0.2, 0.1, 7.3, 0.1, 0.2) equals the measurement vector (13.7, 13.9, 0.7, 6.4, 1.0, 8.2).)
![Page 30: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/30.jpg)
Compressive SensingProblem definition
Find a matrix Φ and an algorithm A s.t.:
$\forall x \in \mathbb{R}^n:\ y = \Phi x,\ x^* = A(y)$
$\|x - x^*\|_p \le C\,\|x - x_d\|_q$
$x_d = \arg\min_{x_k :\, |\mathrm{support}(x_k)| \le d} \|x - x_k\|_q$
In [PS12] we gave the first scheme with an optimal number of measurements and sublinear decoding time, for p=q=1.
In [GLPS09, GNPRS13] we gave a randomized solution (foreach) for p=q=2 with sublinear decoding.
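To illustrate the right-hand side of the guarantee, the sketch below computes the best d-term approximation of a vector (keep the d largest-magnitude entries, zero the rest) and the norm of what is left over. The vector is the noisy one from the earlier Compressive Sensing slide; the function names are hypothetical.

```python
def best_d_term(x, d):
    """Keep the d largest-magnitude entries of x and zero out the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:d]
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def lq_norm(v, q):
    """The l_q norm of v, for q >= 1."""
    return sum(abs(a) ** q for a in v) ** (1.0 / q)

x = [0.1, 0.2, 0.1, 5.8, 0.1, 0.3, 0.1, 0.2, 0.1, 7.3, 0.1, 0.2]
x_d = best_d_term(x, 2)
tail_error = lq_norm([a - b for a, b in zip(x, x_d)], 1)
```

Here the two large entries 5.8 and 7.3 are kept, and the l_1 tail error is the sum of the ten small entries. A recovery algorithm meeting the guarantee with p=q=1 must return an x* within C times this tail error of x.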
![Page 31: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/31.jpg)
How Compressive Sensing Helps Massive Recommender Systems
• Consider designing a recommender system for web pages
– Time a user examines a page is an implicit rating
– Millions of users
– Each user examines thousands of pages throughout the year
– Hard to store and process the information
![Page 32: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/32.jpg)
Fingerprint Based Approach
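(Figure: each user a_i has a collection C_i, summarized by a fingerprint F_i, for i = 1, 2, ..., n; Similarity(a_i, a_j) is estimated from the fingerprints.)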
![Page 33: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/33.jpg)
Sampling Approach
a1: C1 = {a,c,d,f,h,l,m,n,p,r,s,t}, sample: {c,l,t}
a2: C2 = {a,b,c,f,h,l,m,n,o,p,r,s}, sample: {f,m,s}
Regular sampling doesn't work
![Page 34: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/34.jpg)
Minwise hashing approach
a1: {a,c,d,f,h,l,m,n,p,r,s,t}, h(x): 5, 3, 7, 9, 2, 8
a2: {a,b,c,f,h,l,m,n,o,p,r,s}, h(x): 5, 4, 3, 7, 2, 8
[BHP09,BPR09,BP10,FPS11,FPS12,T13]
![Page 35: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/35.jpg)
Min wise hash function
A B
$\arg\min_{x \in A \cup B} h(x) = \arg\min_{x \in A \cap B} h(x)$
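The minwise idea can be sketched as follows: for a random ordering of the universe, the minimum element of A and the minimum element of B coincide with probability equal to the Jaccard similarity. This sketch assumes fully random rank tables as an idealized stand-in for a min-wise independent hash family; the sets are the two collections from the sampling slide.

```python
import random

def minhash_signature(items, rank_tables):
    """For each rank table, record the item of `items` with the smallest rank."""
    return [min(items, key=ranks.__getitem__) for ranks in rank_tables]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two minima agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A = set("acdfhlmnprst")
B = set("abcfhlmnoprs")
rng = random.Random(0)
universe = sorted(A | B)
# Idealized random orderings (an assumption of this sketch).
rank_tables = [{x: rng.random() for x in universe} for _ in range(2000)]

estimate = estimate_jaccard(
    minhash_signature(A, rank_tables), minhash_signature(B, rank_tables)
)
true_jaccard = len(A & B) / len(A | B)
```

Only the signatures need to be stored per user, not the full collections.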
![Page 36: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/36.jpg)
Min wise hash function
A B
![Page 37: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/37.jpg)
Similarity
A B
We get a ±ε approximation with probability 1-δ
Min wise independent
![Page 38: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/38.jpg)
Reducing sketching space [BP10]
Instead of
Additional pairwise independent hash
It was discovered independently by Ping Li and Christian Konig
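A minimal sketch of how an additional hash can shrink each stored value, assuming the simplest variant in which every minwise value is hashed down to a single bit. Under that assumption two sketches agree with probability p = J + (1-J)/2, so J is estimated as 2p - 1. The random rank and bit tables are idealized stand-ins for the hash families, and the sets are the two collections from the sampling slide.

```python
import random

def one_bit_sketch(items, rank_tables, bit_tables):
    """Per hash function, store one bit of the minimum-rank item."""
    return [
        bits[min(items, key=ranks.__getitem__)]
        for ranks, bits in zip(rank_tables, bit_tables)
    ]

def estimate_jaccard_from_bits(bits_a, bits_b):
    """Invert p = J + (1 - J) / 2, where p is the fraction of equal bits."""
    p = sum(a == b for a, b in zip(bits_a, bits_b)) / len(bits_a)
    return 2 * p - 1

A = set("acdfhlmnprst")
B = set("abcfhlmnoprs")
rng = random.Random(1)
universe = sorted(A | B)
k = 4000  # number of hash functions (illustrative choice)
rank_tables = [{x: rng.random() for x in universe} for _ in range(k)]
bit_tables = [{x: rng.randrange(2) for x in universe} for _ in range(k)]

estimate = estimate_jaccard_from_bits(
    one_bit_sketch(A, rank_tables, bit_tables),
    one_bit_sketch(B, rank_tables, bit_tables),
)
```

Each sketch entry costs one bit instead of a full hash value, at the price of needing more entries for the same accuracy.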
![Page 39: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/39.jpg)
Reducing sketching space [BP10]
Our algorithm estimates
![Page 40: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/40.jpg)
Reducing sketching space even further [BP10]
We are usually interested in the case where the sets are very similar.
Assume J > 1-t => p > 1-0.5t
A:   0110100101
B:   0100101101
A-B: 0 0 1 0 0 0 -1 0 0 0
CS:  2 0 -2
![Page 41: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/41.jpg)
Reducing sketching space even further [BP10]
We are usually interested in the case where the sets are very similar.
Assume J > 1-t => p > 1-0.5t
A:       0110100101
B:       0100101101
A xor B: 0010001000
CS:      1 0 1
This give an improvement of2
2log2
tt
![Page 42: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/42.jpg)
Removing the min wise independent requirement [BP11]
• [KNW10] gave bits sketch for distinct count (F0)
• Their sketch is not linear. However, given S(A) and S(B) one can calculate S(A+B), which gives the size of the union.
1log1
2O
![Page 43: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/43.jpg)
Removing the min wise independent requirement [BP11]
$J = \frac{|A \cap B|}{|A \cup B|} = \frac{|A| + |B| - |A \cup B|}{|A \cup B|}$
)(~
OJ
BABABA
J
Using F2 instead of F0 we managed to reduce the sketch size to
tt
O 1log1log)(
12
Using more randomness we manage to remove a factor of log(1/t)
![Page 44: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/44.jpg)
File sharing
The naïve way
Supported by
![Page 45: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/45.jpg)
File sharing
Torrent/Emule/Kazaa
![Page 46: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/46.jpg)
File sharing
Source:
Clients:
Coupon collector: O(n log n). In practice it could be 7Gb instead of 1Gb.
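The coupon collector bound can be made concrete with a short calculation: if every received piece is a uniformly random one of n pieces, the expected number of pieces received before all n are seen is n·H_n, where H_n is the n-th harmonic number. The uniform-random model and the choice n = 1000 are assumptions of this sketch.

```python
def expected_downloads(n):
    """Expected number of uniformly random pieces needed to see all n pieces."""
    return n * sum(1.0 / k for k in range(1, n + 1))

# Overhead factor relative to the file size, for a file cut into 1000 pieces.
overhead = expected_downloads(1000) / 1000
```

For n = 1000 the overhead factor is about 7.5, the same order as the 7Gb versus 1Gb figure quoted above.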
![Page 47: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/47.jpg)
Network coding
![Page 48: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/48.jpg)
Network coding
Source: 1 2 i n
Client 1: 3X_7+2X_17, 5X_2+X_5+4X_10, ....
Client 2: 2X_1+3X_3+X_17, ....
Client 3:
Client 4:
In a big field, n linear combinations will suffice.
We require 1Gb upload for a 1Gb file.
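A minimal sketch of random linear network coding, under simplifying assumptions: each block is a single element of the prime field GF(P) with P = 2^31 - 1, and the client decodes by Gauss-Jordan elimination once it holds n combinations. The field, block format, and function names are illustrative choices, not taken from the talk.

```python
import random

P = 2**31 - 1  # a large prime; the choice of field is an assumption

def encode(blocks, rng):
    """Return (coefficients, payload) for one random linear combination."""
    coeffs = [rng.randrange(P) for _ in blocks]
    payload = sum(c * b for c, b in zip(coeffs, blocks)) % P
    return coeffs, payload

def decode(packets, n):
    """Recover the n source blocks by Gauss-Jordan elimination mod P.

    Returns None if the received combinations do not have full rank.
    Uses pow(a, -1, P) for modular inverses (Python 3.8 or later).
    """
    rows = [list(c) + [y] for c, y in packets]
    for col in range(n):
        pivot = next(
            (r for r in range(col, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

blocks = [5, 17, 42, 7]
rng = random.Random(1)
received = [encode(blocks, rng) for _ in range(len(blocks))]
recovered = decode(received, len(blocks))
```

Because the field is large, n random combinations are almost always linearly independent, so nothing is wasted on duplicate pieces.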
![Page 49: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/49.jpg)
Poison
Torrent/Emule/Kazaa
![Page 50: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/50.jpg)
Signatures against poison
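(Figure: the file is split into packets 1, 2, ..., i, ..., n; S_i is the MD5 signature of packet i; the .torrent file stores S_1, S_2, ..., S_n.)
We might receive a poisoned packet, but we won't forward it.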
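The per-packet check can be sketched with Python's standard `hashlib`: the signature list plays the role of the .torrent file, and a client forwards a packet only if its digest matches. The function names and sample packets are hypothetical.

```python
import hashlib

def sign_packets(packets):
    """Compute the MD5 digest of every packet (the list kept in the .torrent file)."""
    return [hashlib.md5(p).hexdigest() for p in packets]

def accept_packet(index, packet, signatures):
    """Accept (and later forward) a packet only if its digest matches S_index."""
    return hashlib.md5(packet).hexdigest() == signatures[index]

packets = [b"alpha", b"beta", b"gamma"]
signatures = sign_packets(packets)
```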
![Page 51: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/51.jpg)
Signatures in network coding
MD5
Si
.torrent file: S1, S2, ..., Sn, S(X1+X2), S(X1+X3), .......
1 2 i n
There is an exponential number of options.
![Page 52: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/52.jpg)
Zhao - Homomorphic signature
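(Figure: M is the matrix whose rows are packets 1, 2, ..., n, each extended by the matching row of the identity matrix: 1 0 ... 0, 0 1 ... 0, ..., 0 0 ... 1.)
We can find a vector u s.t. Mu=0
A correct packet v will be orthogonal to u: <v,u>=0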
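The linear-algebra core of this check can be sketched as below, working over GF(P) with P = 2^31 - 1. This shows only the orthogonality test: the actual scheme hides u (the later slides mention power operations), so this plain version is not secure and all names here are hypothetical. The sketch assumes each row of M is a packet followed by its identity row.

```python
import random

P = 2**31 - 1  # prime field size; an assumption of this sketch

def augment(packets):
    """Build M: each packet followed by its row of the identity matrix."""
    n = len(packets)
    return [
        list(p) + [1 if j == i else 0 for j in range(n)]
        for i, p in enumerate(packets)
    ]

def null_vector(packets, rng):
    """Pick u with M u = 0: random head, tail chosen to cancel each row."""
    head = [rng.randrange(1, P) for _ in range(len(packets[0]))]
    tail = [(-sum(a * b for a, b in zip(p, head))) % P for p in packets]
    return head + tail

def dot(v, u):
    return sum(a * b for a, b in zip(v, u)) % P

def combine(rows, coeffs):
    """Linear combination of the rows with the given coefficients, mod P."""
    return [
        sum(c * row[j] for c, row in zip(coeffs, rows)) % P
        for j in range(len(rows[0]))
    ]

packets = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
M = augment(packets)
u = null_vector(packets, random.Random(7))
good = combine(M, [2, 7, 11])
bad = list(good)
bad[0] = (bad[0] + 1) % P  # tamper with one symbol
```

Any honest linear combination of the rows of M stays orthogonal to u, while the tampered packet does not.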
![Page 53: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/53.jpg)
Zhao - Homomorphic signature
We can find a vector u s.t. Mu=0
A correct packet v will be orthogonal to u: <v,u>=0
But if Eve knows u then she can find a v which is orthogonal to u.
Solution: Instead of sending u to everyone, send vector
![Page 54: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/54.jpg)
Zhao - Homomorphic signature
Given v which is a linear combination of the file's packets
It requires n+m power operations. In practice it takes more time than downloading.
![Page 55: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/55.jpg)
Selective verification [PW12]
S'i
Packeti
S''i
If we have both signatures we can choose randomly which to check
![Page 56: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/56.jpg)
Problem
Eve can combine signatures
![Page 57: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/57.jpg)
Solution
Use a linear error correcting code.
(Figure: packets 1, 2, ..., n, each with its row of the identity matrix 1 0 ... 0, 0 1 ... 0, ..., 0 0 ... 1.)
We perform the Zhao signature on each block.
![Page 58: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/58.jpg)
Analysis
q^n: true combinations
(Figure: packets 1, 2, ..., n with the identity matrix 1 0 ... 0, 0 1 ... 0, ..., 0 0 ... 1; a legend marks the defective items for our GT.)
![Page 59: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/59.jpg)
Analysis
Pr[one block passes the test] < q^n / q^(dn) = q^(-(d-1)n)
Pr[r/2 out of r pass the test] < 2^r q^(-(d-1)nr/2)
(Figure labels: dn, n+m, and blocks 1, 2, ..., r.)
![Page 60: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/60.jpg)
Analysis
(Figure labels: dn, n+m, and blocks 1, 2, ..., r.)
Pr[one block passes the test] < q^n / q^(dn) = q^(-(d-1)n)
Pr[r/2 out of r pass the test] < 2^r q^(-(d-1)nr/2)
Using the union bound: the probability that a bad packet exists is bounded by q^((n+m) + r/log q - (d-1)nr/2)
In practice we improve the Zhao signature by a factor of 60.
![Page 61: Group Testing and New Algorithmic Applications](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816945550346895de0cebf/html5/thumbnails/61.jpg)
Conclusion
• Group testing/Compressive sensing is a very effective tool.
• We improved both constructions and achieved sublinear decoding time.
• Surprisingly important applications.