![Page 1: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/1.jpg)
Improve sketching of Hamming Distance with Error Correcting
Ely Porat
Bar-Ilan University
Google Inc
Ohad Lipsky
Bar-Ilan University
Check Point Inc
December 2003
![Page 2: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/2.jpg)
Problem Definition (1)
Alice holds a text $T_A$ and Bob holds a text $T_B$, each of length $n$.
Goal: compute hamm($T_A$, $T_B$).
Given: $k$, a bound on the number of mismatches.
![Page 3: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/3.jpg)
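Problem Definition (2)
Alice compresses $T_A$ (length $n$) into a sketch $S_A$; Bob compresses $T_B$ (length $n$) into a sketch $S_B$.
• Calculate hamm($T_A$, $T_B$) given only $S_A$, $S_B$
• Finding the mistakes
Given: $k$, a bound on the number of mismatches.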
![Page 4: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/4.jpg)
Motivations
• Data Bases
• Internet
• Error Correcting
(Diagram: Router A, Router B, Router C, Router D)
![Page 5: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/5.jpg)
Outline:
• Simple Solution
• Error Correcting
• Improved Solution
• Improve more
• Recursion
• File sharing
![Page 6: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/6.jpg)
Simplest Solution - $O(k^2 \log(1/\delta))$
• Binary alphabet.
• Allocate $k^2$ cells.
• Take the input array and hash each bit to one of the cells.
• In each cell, keep the XOR of all the values hashed to it.
(Example cell contents: 0 1 1 0)
![Page 7: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/7.jpg)
Simplest Solution - $O(k^2 \log(1/\delta))$
(Example cell contents: 1 1 0 0 and 0 1 0 0)
![Page 8: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/8.jpg)
Simplest Solution - $O(k^2 \log(1/\delta))$
• By the birthday paradox, the probability that two errors fall into the same cell is < 1/2.
• Repeat $\log(1/\delta)$ times to get failure probability $\delta$.
(Example cell contents: 0 1 1 0)
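The XOR-cell idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of a seeded `random.Random` as a stand-in for a shared hash function, and the default of 8 repetitions are all assumptions made here.

```python
import random

def xor_sketch(bits, num_cells, seed):
    """XOR every input bit into the cell chosen for its position."""
    rng = random.Random(seed)  # shared seed stands in for a shared hash function (assumption)
    cell_of = [rng.randrange(num_cells) for _ in bits]
    cells = [0] * num_cells
    for pos, bit in enumerate(bits):
        cells[cell_of[pos]] ^= bit
    return cells

def estimate_hamming(bits_a, bits_b, k, repetitions=8):
    """Largest count of differing cells over several independent hash functions."""
    best = 0
    for seed in range(repetitions):
        sketch_a = xor_sketch(bits_a, k * k, seed)
        sketch_b = xor_sketch(bits_b, k * k, seed)
        differing = sum(x != y for x, y in zip(sketch_a, sketch_b))
        best = max(best, differing)
    return best
```

A differing cell always contains at least one mismatch, so the estimate never exceeds the true distance; it falls short only when two mismatches collide in the same cell in every repetition.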
![Page 9: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/9.jpg)
Alphabet
• Denote by $S$ the size of the alphabet.
• We can encode each letter by its unary representation:
  0 → 1000000…0
  1 → 0100000…0
  ⋮
  S-1 → 0000000…1
• The only effect is that each mistake is counted twice. For example, 0 → 1000000…0 and 5 → 0000010…0 differ in exactly two bits.
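A small Python check of the "counted twice" claim, under the assumption that letters are represented as integers in the range 0 to S-1. The helper names are illustrative only.

```python
def unary(symbol, alphabet_size):
    """Bit vector with a single 1 at index `symbol`."""
    return [1 if j == symbol else 0 for j in range(alphabet_size)]

def encode(text, alphabet_size):
    """Concatenate the unary codes of all letters in the text."""
    bits = []
    for symbol in text:
        bits.extend(unary(symbol, alphabet_size))
    return bits

def hamming(x, y):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))
```

For two texts that differ in one letter, `hamming` on the encoded bit strings returns 2, so the binary sketch can be reused with the bound 2k.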
![Page 10: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/10.jpg)
Error correcting - $O(k^2 \log NS)$
• Here we allocate two kinds of $k^2$ cells: $k^2$ cells of $\log S$ bits and $k^2$ cells of $\log NS$ bits.
Example cells of the first kind: 5 8 3 2
Example cells of the second kind: 15 6 7 8
C1[h(A[i])] += A[i]
C2[h(A[i])] += i*A[i]
![Page 11: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/11.jpg)
Error correcting - $O(k^2 \log NS)$
• As before, with probability > 1/2 no two errors fall into the same cell.
Example cells of the first kind: 5 8 3 2
Example cells of the second kind: 15 6 7 8
C1[h(A[i])] += A[i]
C2[h(A[i])] += i*A[i]
![Page 12: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/12.jpg)
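Error correcting - $O(k^2 \log NS)$
• We get from the red cells (C1[h(A[i])] += A[i]):
One sketch: 5 8 3 2
Other sketch: 5 6 3 2
The second cell differs: 8 - 6 = 5 - 3, the difference between the two mismatched characters (5 and 3).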
![Page 13: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/13.jpg)
Error correcting - $O(k^2 \log NS)$
• We get from the blue cells (C2[h(A[i])] += i*A[i]):
One sketch: 15 11 7 5
Other sketch: 15 9 7 5
The second cell differs: 11 - 9 = 2*(5 - 3) => i = 2 (positions are indexed 0, 1, 2, …).
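The two kinds of cells can be sketched as follows. This is an illustration under stated assumptions: the hash is applied to the position i (one reading of h on the slides), a seeded `random.Random` stands in for the shared hash function, positions are 0-based, and the function names are hypothetical.

```python
import random

def two_table_sketch(text, num_cells, seed):
    """C1 accumulates values, C2 accumulates position * value, per hashed cell."""
    rng = random.Random(seed)  # stand-in for a shared hash function (assumption)
    cell_of = [rng.randrange(num_cells) for _ in text]
    c1 = [0] * num_cells
    c2 = [0] * num_cells
    for i, value in enumerate(text):
        c1[cell_of[i]] += value
        c2[cell_of[i]] += i * value
    return c1, c2

def recover(sketch_a, sketch_b, n):
    """Return (position, difference) for every cell that looks like a single mismatch."""
    (a1, a2), (b1, b2) = sketch_a, sketch_b
    found = []
    for cell in range(len(a1)):
        d1 = a1[cell] - b1[cell]  # difference of the mismatched characters
        d2 = a2[cell] - b2[cell]  # position times that difference
        if d1 == 0:
            continue
        if d2 % d1 == 0 and 0 <= d2 // d1 < n:
            found.append((d2 // d1, d1))
    return found
```

With the slide's numbers (characters 5 and 3 at position 2), the differing cell gives d1 = 2 and d2 = 4, so the recovered position is 2. Cells that received more than one mismatch can produce wrong candidates, which is why the result is only trusted after repetition.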
![Page 14: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/14.jpg)
Error correcting - $O(k^2 \log NS)$
• The probability of success is about 1/2.
• To lower the failure probability, we run it 3 times.
• Each run gives a list of possible mistakes.
• Output all the mistakes that appear in at least 2 of the 3 runs.
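The 2-out-of-3 vote can be written as a short helper. The representation of a mistake as a `(position, difference)` tuple and the function name are assumptions for illustration.

```python
from collections import Counter

def majority_of_runs(runs, threshold=2):
    """Keep the candidate mistakes reported by at least `threshold` runs."""
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count each candidate once per run
    return sorted(m for m, c in counts.items() if c >= threshold)

# Three runs, each a list of (position, difference) candidates.
runs = [
    [(2, 2), (7, -1)],
    [(2, 2), (9, 4)],
    [(7, -1), (2, 2)],
]
confirmed = majority_of_runs(runs)
```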
![Page 15: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/15.jpg)
$O(k \log^2 k)$ - Solution
• The idea is two-stage hashing: first hash the input into $k/\log k$ buckets; w.h.p. each bucket receives $O(\log k)$ mismatches.
Bar-Yossef, Jayram, Kumar, Sivakumar 03
![Page 16: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/16.jpg)
$O(k \log^2 k)$ - Solution
• Within a bucket, hash the $O(\log k)$ mismatches into $O(\log^2 k)$ cells and keep the accumulated XOR in each cell.
• The probability of failure is less than 1/2.
• Run it $2\log k$ times and take the max => failure probability less than $1/k^2$.
• Space = $O(\log^3 k)$
Bar-Yossef, Jayram, Kumar, Sivakumar 03
![Page 17: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/17.jpg)
$O(k \log^2 k)$ - Solution
• $k/\log k$ buckets, each using $O(\log^3 k)$ space: $O(k \log^2 k)$ in total.
• $P(\text{failure}) \le (k/\log k) \cdot (1/k^2) < 1/k$
Bar-Yossef, Jayram, Kumar, Sivakumar 03
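The accounting behind these bounds can be written out as a short derivation. It assumes logarithms are base 2 and $k > 2$, and that the runs inside a bucket fail independently.

```latex
\begin{align*}
\text{space per bucket} &= \underbrace{2\log k}_{\text{runs}} \cdot \underbrace{O(\log^2 k)}_{\text{cells per run}} = O(\log^3 k),\\
\Pr[\text{one bucket fails}] &< \left(\tfrac{1}{2}\right)^{2\log k} = \frac{1}{k^2},\\
\text{total space} &= \frac{k}{\log k}\cdot O(\log^3 k) = O(k\log^2 k),\\
\Pr[\text{some bucket fails}] &\le \frac{k}{\log k}\cdot\frac{1}{k^2} = \frac{1}{k\log k} < \frac{1}{k}.
\end{align*}
```

The last line is a union bound over the $k/\log k$ buckets.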
![Page 18: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/18.jpg)
$O(k\, 2^{\log^* k} \log k)$ - Idea (recursion)
• First level: $k/\log k$ buckets.
• Second level: $\log k/\log\log k$ buckets.
• $\Pr(F) < 1/\log^c k$
• $\log k/\log\log k$ runs, take the max.
![Page 19: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/19.jpg)
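Error Correcting $O(k \log NS)$
Alice holds $T_A = a_0 a_1 \dots a_{n-1}$ and Bob holds $T_B = b_0 b_1 \dots b_{n-1}$.
• Shared values $r_0, r_1, r_2, \dots$ and a prime $p$ of order $N^3 S$.
• $r_i$ is random with probability $1/k$ and 0 otherwise.

$$\phi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p, \qquad \phi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$$

$$\phi_1(T_A) - \phi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake} \\ \text{random} & \text{more than one} \end{cases}$$

• The one-mistake case occurs with constant probability.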
![Page 20: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/20.jpg)
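Error Correcting $O(k \log NS)$
Alice holds $T_A$ and Bob holds $T_B$, each of length $n$.

$$\phi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p, \qquad \phi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$$

$$\phi_1(T_A) - \phi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake} \\ \text{random} & \text{more than one} \end{cases}$$

$$\phi_1'(T_A) = \sum_{i=0}^{n-1} i\, r_i a_i \bmod p, \qquad \phi_1'(T_B) = \sum_{i=0}^{n-1} i\, r_i b_i \bmod p$$

$$\frac{\phi_1'(T_A) - \phi_1'(T_B)}{\phi_1(T_A) - \phi_1(T_B)} = \frac{j\, r_j (a_j - b_j)}{r_j (a_j - b_j)} = j$$

• If we are wrong (more than one mistake was sampled), then w.h.p. $j > n$.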
![Page 21: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/21.jpg)
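Error Correcting $O(k \log NS)$
Alice holds $T_A$ and Bob holds $T_B$, each of length $n$.

$$\frac{\phi_1'(T_A) - \phi_1'(T_B)}{\phi_1(T_A) - \phi_1(T_B)} = j$$

• Knowing $j$ gives $r_j$, and hence $a_j - b_j$.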
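The recovery of the position j and the difference a_j - b_j can be sketched in Python. Everything named here is an assumption for illustration: the fixed prime `P` (the slides pick p according to N and S), the function names, and 0-based positions. The modular inverse `pow(x, -1, P)` needs Python 3.8 or later.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumed large enough for a toy example

def sample_weights(n, k, seed):
    """r_i is a random nonzero residue with probability 1/k, and 0 otherwise."""
    rng = random.Random(seed)
    return [rng.randrange(1, P) if rng.random() < 1.0 / k else 0 for _ in range(n)]

def fingerprints(text, weights):
    """Return (sum r_i * a_i, sum i * r_i * a_i), both modulo P."""
    plain = sum(r * a for r, a in zip(weights, text)) % P
    indexed = sum(i * r * a for i, (r, a) in enumerate(zip(weights, text))) % P
    return plain, indexed

def locate_single_mistake(fp_a, fp_b, weights):
    """Return (j, a_j - b_j) if the fingerprints look like exactly one sampled mistake."""
    d_plain = (fp_a[0] - fp_b[0]) % P
    d_indexed = (fp_a[1] - fp_b[1]) % P
    if d_plain == 0:
        return None  # no sampled position differs
    j = d_indexed * pow(d_plain, -1, P) % P
    if j >= len(weights) or weights[j] == 0:
        return None  # out-of-range or unsampled position: more than one mistake was sampled
    diff = d_plain * pow(weights[j], -1, P) % P
    if diff > P // 2:
        diff -= P  # map back to a signed difference
    return j, diff
```

When two or more sampled positions differ, the quotient is an essentially arbitrary residue modulo P, so the range check rejects it with high probability.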
![Page 22: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/22.jpg)
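Error Correcting $O(k \log NS)$
Alice holds $T_A$ and Bob holds $T_B$, each of length $n$.
• The sketch of $T_A$ consists of the pairs $\phi_1(T_A), \phi_1'(T_A)$; $\phi_2(T_A), \phi_2'(T_A)$; …; $\phi_{ck\ln k}(T_A), \phi_{ck\ln k}'(T_A)$.
• $O(k \ln k)$ pairs in total.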
![Page 23: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/23.jpg)
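Recursion
Alice holds $T_A$ and Bob holds $T_B$, each of length $n$.
• Round 1: $ck$ pairs $\phi_1(T_A), \phi_1'(T_A)$; $\phi_2(T_A), \phi_2'(T_A)$; …; $\phi_{ck}(T_A), \phi_{ck}'(T_A)$, where $r_i$ is random with probability $1/k$ and 0 otherwise.
• Round 2: $ck/2$ pairs $\phi_1(T_A), \phi_1'(T_A)$; $\phi_2(T_A), \phi_2'(T_A)$; …; $\phi_{ck/2}(T_A), \phi_{ck/2}'(T_A)$, where $r_i$ is random with probability $2/k$ and 0 otherwise.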
![Page 24: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/24.jpg)
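Recursion
Alice holds $T_A$ and Bob holds $T_B$, each of length $n$.
• $ck$ pairs with $r_i$ random with probability $1/k$, 0 otherwise.
• $ck/2$ pairs with $r_i$ random with probability $2/k$, 0 otherwise.
• $ck/4$ pairs with $r_i$ random with probability $4/k$, 0 otherwise.
• …

$$ck + \frac{ck}{2} + \frac{ck}{4} + \dots \le 2ck$$

• Total sketch size: $O(k \log NS)$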
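The halving schedule can be tabulated with a few lines of Python. The constant `c = 4`, the stopping rule (stop once the sampling probability would exceed 1), and the function name are assumptions made for this sketch.

```python
def round_schedule(k, c=4):
    """List (number_of_pairs, sampling_probability) per round, halving and doubling."""
    rounds = []
    pairs, prob = c * k, 1.0 / k
    while pairs >= 1 and prob <= 1.0:
        rounds.append((pairs, prob))
        pairs //= 2   # half as many pairs in the next round
        prob *= 2     # each position is sampled twice as often
    return rounds

schedule = round_schedule(16)
total_pairs = sum(pairs for pairs, _ in schedule)
```

Each pair is two residues modulo p, so multiplying `total_pairs` by the bit length of p gives the sketch size in bits.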
![Page 25: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/25.jpg)
Complexity
Sketches $S_A$ and $S_B$ are computed from $T_A$ and $T_B$ (each of length $n$).
• Size: $O(k \log NS)$
• Computing sketch: $O(n \log k)$
• Comparing sketches: $O(k \log k)$
![Page 26: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/26.jpg)
$O(k \log k)$ - Solution
• We can just encode in unary, hash the input to $k^3$ cells, and then run the $O(k \log NS) = O(k \log k)$ algorithm.
![Page 27: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/27.jpg)
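Reed-Solomon Codes

$$p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$$

$$\begin{pmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & 2k \\ 1 & 2^2 & 3^2 & \cdots & (2k)^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2^{n-1} & 3^{n-1} & \cdots & (2k)^{n-1} \end{pmatrix} = \begin{pmatrix} p(1) & p(2) & \cdots & p(2k) \end{pmatrix}$$

We managed to develop a deterministic algorithm based on that, but the encoding and the decoding are slower.
References:
• Amir, Farach 95
• Feigenbaum, Ishai, Malkin, Nissim, Strauss, Wright 01
• Bar-Yossef, Jayram, Kumar, Sivakumar 03
• Efremenko, Porat, Rothschild 06
• Efremenko, Porat 07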
![Page 28: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/28.jpg)
File Sharing
Napster: a source holds a file of $n$ blocks.
• The source needs to stay until someone else has the whole file (and is willing to stay).
• There is a bottleneck at the end.
![Page 29: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/29.jpg)
File Sharing
emule/kazaa/torrent: a source holds a file of $n$ blocks.
• The source has to send $n \ln n$ blocks before disconnecting.
• Sometimes there are some bottlenecks.
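One way to read the n ln n figure is as a coupon-collector bound: if each transmitted block is chosen uniformly at random (an assumption made here, not stated on the slide), the expected number of transmissions until every block has been sent at least once is n times the n-th harmonic number, which is close to n ln n. The function name below is illustrative.

```python
import math
from fractions import Fraction

def expected_random_blocks(n):
    """Expected uniform random draws until all n blocks have appeared: n * H_n."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return float(n * harmonic)

# Ratio of the exact expectation to n ln n for a file of 1000 blocks.
ratio = expected_random_blocks(1000) / (1000 * math.log(1000))
```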
![Page 30: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/30.jpg)
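Improved File Sharing - Ver 1
The source holds the file as $n$ blocks $a_0 a_1 a_2 \dots a_{n-1}$, with $a_i \in F_{2^b}$.

$$p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$$

The source sends points of $p$: $(0, p(0)), (1, p(1)), (2, p(2)), \dots, (n^6, p(n^6))$, i.e. $n^6$ possible points.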
![Page 31: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/31.jpg)
Improved File Sharing - Ver 1
($n^6$ possible points)
• Each client that has received $n$ points can recreate the file.
• The $n \ln n$ overhead is gone.
• Almost no bottlenecks.
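Recreating the file from any n points is polynomial interpolation. The sketch below uses Lagrange interpolation over a prime field rather than the field $F_{2^b}$ named on the slides; that swap, the modulus `P`, and the function names are assumptions chosen to keep the example short. `pow(x, -1, P)` needs Python 3.8 or later.

```python
P = 2_147_483_647  # prime modulus; stands in for the slides' field F_{2^b} (assumption)

def evaluate(blocks, x):
    """Evaluate p(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} mod P by Horner's rule."""
    acc = 0
    for coeff in reversed(blocks):
        acc = (acc * x + coeff) % P
    return acc

def reconstruct(points):
    """Recover the n coefficients from n points with distinct x by Lagrange interpolation."""
    n = len(points)
    coeffs = [0] * n
    for j, (xj, yj) in enumerate(points):
        basis = [1]  # coefficients of prod_{m != j} (x - x_m), lowest degree first
        denom = 1
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):  # multiply basis by (x - x_m)
                nxt[d] = (nxt[d] - xm * c) % P
                nxt[d + 1] = (nxt[d + 1] + c) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % P
    return coeffs

blocks = [5, 0, 7, 42]
received = [(x, evaluate(blocks, x)) for x in (10, 3, 99, 1000)]
recovered = reconstruct(received)
```

Which n points a client holds does not matter, only that their x values are distinct; this is what removes the need to wait for specific blocks.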
![Page 32: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/32.jpg)
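Improved File Sharing - Ver 2
The source holds the file as $n$ blocks $a_0 a_1 a_2 \dots a_{n-1}$, with $a_i \in F_{2^b}$.
Send linear equations on the file, with random coefficients:

$$\begin{pmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,n-1} \\ r_{1,0} & r_{1,1} & \cdots & r_{1,n-1} \\ \vdots & \vdots & & \vdots \\ r_{n-1,0} & r_{n-1,1} & \cdots & r_{n-1,n-1} \end{pmatrix}$$

$$\Pr[\text{success}] = \left(1 - \frac{2^{b(n-1)}}{2^{bn}}\right)\left(1 - \frac{2^{b(n-2)}}{2^{bn}}\right) \cdots \left(1 - \frac{2^{b(n-i)}}{2^{bn}}\right) \cdots \left(1 - \frac{1}{2^{bn}}\right) \ge 1 - 2^{-(b-1)}$$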
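The success probability is the chance that n random equations over a field with 2^b elements are linearly independent. It can be evaluated exactly with rational arithmetic and compared with the bound 1 - 2^{-(b-1)}. The function name is illustrative.

```python
from fractions import Fraction

def full_rank_probability(n, b):
    """Probability that n uniformly random vectors in F_q^n, q = 2^b, are independent."""
    q = 2 ** b
    prob = Fraction(1)
    for i in range(1, n + 1):
        prob *= 1 - Fraction(1, q ** i)  # the factor 1 - q^(n-i) / q^n, re-indexed
    return prob

exact = full_rank_probability(20, 8)
bound = 1 - Fraction(2, 2 ** 8)
```

With one-byte blocks (b = 8), the exact value stays above the bound of 127/128, so a client rarely needs more than a few extra equations.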
![Page 33: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/33.jpg)
Improved File Sharing - Ver 2
The source holds the file as $n$ blocks $a_0 a_1 a_2 \dots a_{n-1}$.
Problems:
1. Heavy to encode: for each packet we need to go over the whole file.
2. Very heavy to decode: $O(n^2)$ block operations + $O(n^3)$ field operations.
Facts:
1. If you get $n^{1/2-\varepsilon}$ random combinations of two blocks, you won't have dependencies w.h.p.
2. If you have $d$ pair combinations, you can easily reduce your system to $n-d$ variables.
Solution: use sparse functionals.
![Page 34: Improve sketching of Hamming Distance with Error Correcting](https://reader035.vdocuments.us/reader035/viewer/2022062222/568157f0550346895dc568ae/html5/thumbnails/34.jpg)
Improved File Sharing - Ver 2
The source holds the file as $n$ blocks $a_0 a_1 a_2 \dots a_{n-1}$.
Features:
1. Backward compatibility.
2. Even if you don't have the whole file, you can mix functionals.