Page 1: Recovering Data in Presence of Malicious Errors

Atri Rudra
University at Buffalo, SUNY

Page 2: The setup

[Diagram: message x is encoded as C(x); the channel delivers y = C(x) + error; the decoder outputs x or gives up]

- Mapping C: an error-correcting code, or just "code"
- Encoding: x → C(x)
- Decoding: y → x
- C(x) is a codeword

Page 3: Codes are useful!

- Cellphones
- Satellite broadcast
- Deep-space communication
- Internet
- CDs/DVDs
- RAID
- ECC memory
- Paper bar-codes

Page 4: Redundancy vs. error-correction

- Repetition code: repeat every bit, say, 100 times
  - Good error-correcting properties
  - Too much redundancy
- Parity code: add a parity bit
  - Minimum amount of redundancy
  - Bad error-correcting properties
  - Two errors go completely undetected
- Neither of these codes is satisfactory

Example: 1 1 1 0 0 1 and 1 0 0 0 0 1 differ in two bits, yet both have even parity.
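
The two codes on this slide can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and not from the talk. It uses the slide's own bit pattern to show a double error slipping past the parity check, and a short repetition code surviving the same number of flips.

```python
def parity_encode(bits):
    """Append one bit so that the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_check(word):
    """Return True when the word has even parity (no error detected)."""
    return sum(word) % 2 == 0

def repetition_encode(bits, times):
    """Repeat every bit `times` times."""
    return [b for b in bits for _ in range(times)]

def repetition_decode(word, times):
    """Decode each block of `times` bits by majority vote."""
    return [int(2 * sum(word[i:i + times]) > times)
            for i in range(0, len(word), times)]

sent = parity_encode([1, 1, 1, 0, 0])      # the slide's word 1 1 1 0 0 1
received = [1, 0, 0, 0, 0, 1]              # two bits flipped in transit
undetected = parity_check(received)        # True: the parity check still passes

# Two flips inside one block of a 5-fold repetition code are corrected.
recovered = repetition_decode([1, 0, 1, 1, 0, 0, 0, 0, 1, 0], 5)
```

The repetition code pays for this with a rate of 1/5, which is the redundancy problem the slide points out.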

Page 5: Two main challenges in coding theory

- Problem with the parity example
  - Messages are mapped to codewords that do not differ in many places
  - Need to pick a lot of codewords that differ a lot from each other
- Efficient decoding
  - Naive algorithm: check the received word against all codewords
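
The naive algorithm can be written down directly. The sketch below assumes binary messages of length k and takes the encoder as a parameter; `naive_decode` and `rep3` are hypothetical names chosen for illustration. It tries all 2^k messages, which is exactly why it is not efficient.

```python
from itertools import product

def hamming(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def naive_decode(received, encode, k):
    """Try all 2^k binary messages; return one whose codeword is closest."""
    return min(product([0, 1], repeat=k),
               key=lambda m: hamming(encode(m), received))

def rep3(msg):
    """Toy encoder for the example: repeat every bit three times."""
    return tuple(b for b in msg for _ in range(3))

decoded = naive_decode((1, 0, 1, 0, 0, 0), rep3, 2)
```

The running time grows as 2^k, so this only works for toy parameters.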

Page 6: The fundamental tradeoff

- Correct as many errors as possible with as little redundancy as possible
- Can one achieve the “optimal” tradeoff with efficient encoding and decoding?
- This talk: the answer is yes

Page 7: Overview of the talk

- Specify the setup
  - The model
  - What is the optimal tradeoff?
- Previous work
- Construction of a “good” code
- High-level idea of why it works
- Future directions
  - Some recent progress

Page 8: Error-correcting codes

[Diagram: x is encoded as C(x); y is received; the decoder outputs x or gives up]

- Mapping C: Σ^k → Σ^n
  - Message length k, code length n
  - n ≥ k
- Rate R = k/n ≤ 1
- Decoding complexity
  - Efficient means polynomial in n

Page 9: Shannon’s world

- Noise is probabilistic
- Binary Symmetric Channel
  - Every bit is flipped with probability p
- Benign noise model
  - For example, does not capture bursty errors

[Photo: Claude E. Shannon]

Page 10: Hamming’s world

- Errors are worst case
  - Error locations
  - Arbitrary symbol changes
- Limit on total number of errors
- Much more powerful than Shannon's model
  - Captures bursty errors
- We will consider this channel model

[Photo: Richard W. Hamming]

Page 11: A “low level” view

- Think of each symbol in Σ as being a packet
- The setup
  - Sender wants to send k packets
  - After encoding, sends n packets
  - Some packets get corrupted
  - Receiver needs to recover the original k packets
- Packet size
  - Ideally constant, but can grow with n

Page 12: Decoding

- C(x) sent, y received
  - x ∈ Σ^k, y ∈ Σ^n
- How much of y must be correct to recover x?
  - At least k packets must be correct
  - At most (n-k)/n = 1-R fraction of errors
  - 1-R is the information-theoretic limit
- ρ: the fraction of errors the decoder can handle
  - The information-theoretic limit implies ρ ≤ 1-R

[Diagram: x → C(x) → y, with R = k/n]

Page 13: Can we get to the limit ρ = 1-R?

- Not if we always want to uniquely recover the original message
- Limit for unique decoding: ρ ≤ (1-R)/2

[Diagram: two codewords c1 and c2 at relative distance 1-R, with a received word y at distance (1-R)/2 from each]

Page 14: List decoding [Elias 57, Wozencraft 58]

- Always insisting on a unique codeword is restrictive
  - The “pathological” cases are rare
  - A “typical” received word can be decoded beyond ρ = (1-R)/2
- Better error-recovery model
  - Output a list of answers: list decoding
  - Example: spell checker

[Diagram: balls of radius (1-R)/2 around codewords. In higher dimension the typical cases make up almost all the space: all but an exponentially small (in n) fraction]

Page 15: Advantages of list decoding

- Typical received words have a unique closest codeword
  - List decoding will return a list of size one for such received words
- Still deals with worst-case errors
- How to deal with list size greater than one?
  - Declare an error; or
  - Use some side information (as a spell checker does)

[Diagram: ball of radius (1-R)/2]

Page 16: The list decoding problem

- Given a code and an error parameter ρ
- For any received word y
  - Output all codewords c such that c and y disagree in at most a ρ fraction of places
- Fundamental question
  - What is the best possible tradeoff between R and ρ, with “small” lists?
  - Can it approach the information-theoretic limit ρ ≤ 1-R?
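
As a concrete reading of the problem statement, here is a brute-force list decoder. It is a sketch for intuition only: it scans an explicit list of codewords, which is hopeless at real sizes, and the name `list_decode` and the toy code are assumptions made for the example.

```python
def list_decode(received, codewords, rho):
    """Return every codeword disagreeing with `received` in at most rho*n places."""
    n = len(received)
    return [c for c in codewords
            if sum(a != b for a, b in zip(c, received)) <= rho * n]

# Toy code: the length-4 binary repetition code.
code = [(0, 0, 0, 0), (1, 1, 1, 1)]

one_answer = list_decode((0, 0, 0, 1), code, 0.25)   # unique closest codeword
two_answers = list_decode((0, 0, 1, 1), code, 0.5)   # a tie: both are returned
```

The second call is one of the "pathological" received words: it sits midway between two codewords, so the decoder reports both instead of giving up.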

Page 17: Other applications of list decoding

- Cryptography
  - Cryptanalysis of certain block ciphers [Jakobsen 98]
  - Efficient traitor tracing scheme [Silverberg, Staddon, Walker 03]
- Complexity theory
  - Hardcore predicates from one-way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]
  - Worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]
- Other algorithmic applications
  - IP traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]
  - Guessing secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]

[Slide footer: May 25, 2007, Ph.D. Final Exam]

Page 18: Overview of the talk

- Specify the setup
  - The model
  - The optimal tradeoff between rate and fraction of errors
- Previous work
- Construction of a “good” code
- High-level idea of why it works
- Future directions
  - Some recent progress

Page 19: Information theoretic limit

- ρ < 1-R: the information-theoretic limit
- Can handle twice as many errors as unique decoding

[Plot: fraction of errors (ρ) versus rate (R); curves: unique decoding, information-theoretic limit]

Page 20: Achieving information theoretic limit

- There exist codes that achieve the information-theoretic limit
  - ρ ≥ 1-R-o(1)
  - Random coding argument
- Not a useful result
  - Codes are not explicit
  - No efficient list decoding algorithms
- Need explicit construction of such codes
- We also need polynomial-time (list) decodability
  - Requires list size to be polynomial

Page 21: The challenge

- Explicit construction of code(s)
- Efficient list decoding algorithms up to the information-theoretic limit
  - For rate R, correct a 1-R fraction of errors
- Shannon’s work raised a similar challenge
  - Explicit codes achieving the information-theoretic limit for stochastic models
  - The challenge has been met [Forney 66; Luby, Mitzenmacher, Shokrollahi, Spielman 01; Richardson, Urbanke 01]
  - Now for the stronger adversarial model

Page 22: The best until 1998

- ρ = 1 - R^(1/2)
  - Reed-Solomon codes
  - Sudan 95, Guruswami-Sudan 98
- Better than unique decoding
  - At R = 0.8: unique decoding 10%, information-theoretic limit 20%, GS 10.56%
- Motivating question: close the gap between the blue and green lines with explicit, efficiently decodable codes

[Plot: fraction of errors (ρ) versus rate (R); curves: unique decoding, information-theoretic limit, Guruswami-Sudan]
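
The three numbers quoted at R = 0.8 follow directly from the formulas on the earlier slides. A small sketch to reproduce them (the function names are hypothetical):

```python
import math

def unique_radius(R):
    """Unique decoding limit: (1 - R) / 2."""
    return (1 - R) / 2

def gs_radius(R):
    """Guruswami-Sudan list decoding radius: 1 - sqrt(R)."""
    return 1 - math.sqrt(R)

def limit_radius(R):
    """Information-theoretic limit: 1 - R."""
    return 1 - R

R = 0.8
percentages = [round(100 * f(R), 2)
               for f in (unique_radius, gs_radius, limit_radius)]
# percentages is [10.0, 10.56, 20.0]
```

At high rates the Guruswami-Sudan gain over unique decoding is small, which is the gap the motivating question asks to close.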

Page 23: The best until 2005

- ρ = 1 - (sR)^(s/(s+1))
  - s ≥ 1, Parvaresh-Vardy
  - s = 2 in the plot
- Based on Reed-Solomon codes
- Improves GS for R < 1/16

[Plot: fraction of errors (ρ) versus rate (R); curves: unique decoding, information-theoretic limit, Guruswami-Sudan, Parvaresh-Vardy]
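
The threshold R < 1/16 can be checked by hand for s = 2. The short derivation below compares the Parvaresh-Vardy radius 1 - (2R)^(2/3) with the Guruswami-Sudan radius 1 - R^(1/2) for 0 < R < 1; it is a worked check added here, not a step shown on the slide.

```latex
\begin{align*}
1 - (2R)^{2/3} > 1 - R^{1/2}
  &\iff 2^{2/3} R^{2/3} < R^{1/2} \\
  &\iff R^{2/3 - 1/2} = R^{1/6} < 2^{-2/3} \\
  &\iff R < 2^{-4} = \tfrac{1}{16}.
\end{align*}
```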

Page 24: Our Result

- ρ = 1 - R - ε, for any ε > 0
- Folded RS codes [Guruswami, R. 06]

[Plot: fraction of errors (ρ) versus rate (R); curves: unique decoding, information-theoretic limit, Guruswami-Sudan, Parvaresh-Vardy, our work]

Page 25: Overview of the talk

- Specify the setup
  - The model
  - The optimal tradeoff between rate and fraction of errors
- Previous work
- Our construction
- High-level idea of why it works
- Future directions
  - Recent progress

Page 26: The main result

- Construction of an algebraic family of codes
- For every rate R > 0 and ε > 0
  - List decoding algorithm that can correct a 1 - R - ε fraction of errors
- Based on Reed-Solomon codes

Page 27: Algebra terminology

- F will denote a finite field
  - Think of it as the integers mod some prime
- Polynomials
  - Coefficients come from F
  - Polynomial of degree 3 over Z_7: f(X) = X^3 + 4X + 5
  - Evaluate polynomials at points in F: f(2) = (8 + 8 + 5) mod 7 = 21 mod 7 = 0
- Irreducible polynomials
  - No non-trivial polynomial factors
  - X^2 + 1 is irreducible over Z_7, while X^2 - 1 is not
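
The slide's examples can be checked mechanically. The sketch below stores a polynomial as a list of coefficients from the constant term upward (a representation chosen for this example) and relies on the fact that a polynomial of degree 2 or 3 is irreducible exactly when it has no root in the field; that shortcut does not extend to higher degrees.

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial (coefficients low to high) at x mod p, by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def roots(coeffs, p):
    """All roots of the polynomial in Z_p, found by trying every element."""
    return [x for x in range(p) if poly_eval(coeffs, x, p) == 0]

f = [5, 4, 0, 1]                      # f(X) = X^3 + 4X + 5
value = poly_eval(f, 2, 7)            # f(2) mod 7

roots_plus = roots([1, 0, 1], 7)      # X^2 + 1: no roots, so irreducible over Z_7
roots_minus = roots([-1, 0, 1], 7)    # X^2 - 1 = (X - 1)(X + 1): roots 1 and 6
```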

Page 28: Reed-Solomon codes

- Message: (m_0, m_1, …, m_(k-1)) ∈ F^k
- View as a polynomial: f(X) = m_0 + m_1 X + … + m_(k-1) X^(k-1)
- Encoding: RS(f) = (f(1), f(2), …, f(n)), with F = {1, 2, …, n}
- [Guruswami-Sudan] Can correct up to a 1 - (k/n)^(1/2) fraction of errors in polynomial time

[Diagram: codeword f(1) f(2) f(3) f(4) … f(n)]
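
Encoding is just polynomial evaluation. A minimal sketch, assuming the field is the integers mod a prime p and the evaluation points are 1, …, n with n < p (the name `rs_encode` and this choice of field are assumptions made for the example):

```python
def rs_encode(message, n, p):
    """Reed-Solomon encoding: evaluate f(X) = m_0 + m_1 X + ... at 1..n mod p."""
    assert len(message) <= n < p, "need k <= n < p"

    def f(x):
        acc = 0
        for c in reversed(message):   # Horner's rule
            acc = (acc * x + c) % p
        return acc

    return [f(x) for x in range(1, n + 1)]

# Message (5, 4, 0, 1) is the polynomial X^3 + 4X + 5 over Z_7.
codeword = rs_encode([5, 4, 0, 1], 6, 7)
```

Two distinct polynomials of degree below k agree on at most k-1 points, so any two codewords differ in at least n-k+1 positions.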

Page 29: Parvaresh-Vardy codes (of order 2)

[Diagram: each codeword position carries a pair: (f(1), g(1)), (f(2), g(2)), (f(3), g(3)), (f(4), g(4)), …, (f(n), g(n))]

- g(X) = f(X)^q mod E(X)
- Extra information from g(X) helps in decoding
- Rate: R_PV = k/2n
- [PV05] PV codes can correct a 1 - (k/n)^(2/3) fraction of errors in polynomial time
  - That is, 1 - (2 R_PV)^(2/3)

Page 30: Towards our solution

- Suppose g(X) = f(X)^q mod E(X) = f(γX)
- Let us look again at the PV codeword

[Diagram: the PV codeword with rows f(1), f(γ), … and g(1) = f(γ), g(γ) = f(γ^2), …; partly garbled in extraction]

Page 31: Folded Reed-Solomon codes

- Suppose g(X) = f(X)^q mod E(X) = f(γX)
- Don’t send the redundant symbols
- Reduces the length to n/2
  - R = (k/2)/(n/2) = k/n
- Using the PV result, fraction of errors: 1 - (k/n)^(2/3) = 1 - R^(2/3)

[Diagram: the folded codeword, with the repeated f symbols sent only once; garbled in extraction]

Page 32: Getting to 1-R-ε

- Started with PV code with s = 2 to get 1 - R^(2/3)
- Start with PV code with general s: 1 - R^(s/(s+1))
- Pick s to be “large” enough to approach 1-R-ε
- Decoding complexity increases from that of Parvaresh-Vardy but is still polynomial
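
To see how large s has to be, one can search for the smallest s with R^(s/(s+1)) ≤ R + ε, which makes the radius 1 - R^(s/(s+1)) at least 1 - R - ε. This is a sketch of the simplified bound on this slide only; the helper name is hypothetical, and the actual construction has further parameters.

```python
def smallest_s(R, eps):
    """Smallest s >= 1 with 1 - R**(s/(s+1)) >= 1 - R - eps, for 0 < R < 1 and eps > 0."""
    s = 1
    while R ** (s / (s + 1)) > R + eps:
        s += 1
    return s

s_needed = smallest_s(0.5, 0.05)   # 7 for R = 0.5, eps = 0.05
```

The loop terminates because R^(s/(s+1)) decreases towards R as s grows.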

Page 33: What we actually do

- We show that for any generator γ of F \ {0}: g(X) = f(X)^q mod E(X) = f(γX)
- Can achieve similar compression by grouping elements in orbits of γ
  - m’ ≈ n/m, R ≈ (k/m)/(n/m) = k/n

[Diagram: folded codeword with m’ columns of m symbols each:
  f(1)        f(γ^m)      …  f(γ^((m’-1)m))
  f(γ)        f(γ^(m+1))  …  f(γ^((m’-1)m+1))
  …
  f(γ^(m-1))  f(γ^(2m-1)) …  f(γ^(mm’-1))]
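
The grouping in the diagram can be sketched as follows, assuming the field is Z_p for a prime p and n = p - 1 evaluation points (all nonzero elements, ordered as powers of γ). The function names and the brute-force generator search are illustrative assumptions, not the talk's implementation.

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial (coefficients low to high) at x mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def find_generator(p):
    """Smallest element whose powers cover all of 1..p-1 (p prime, brute force)."""
    for g in range(2, p):
        seen, x = set(), 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no generator found")

def folded_rs_encode(message, m, p):
    """Evaluate f at 1, γ, γ^2, ... and bundle m consecutive values per symbol."""
    gamma = find_generator(p)
    n = p - 1
    assert n % m == 0, "folding parameter m must divide n"
    evals, x = [], 1
    for _ in range(n):
        evals.append(poly_eval(message, x, p))
        x = x * gamma % p
    return [tuple(evals[j:j + m]) for j in range(0, n, m)]

# f(X) = X^3 + 4X + 5 over Z_7, folded with m = 2 (γ = 3 is found by the search).
folded = folded_rs_encode([5, 4, 0, 1], 2, 7)
```

The folded code has n/m symbols, each from an alphabet of size p^m, and the same rate k/n as the unfolded Reed-Solomon code.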

Page 34: Proving f(X)^q mod E(X) = f(γX)

- First use the fact that f(X)^q = f(X^q) over F
- Need to show f(X^q) mod E(X) = f(γX)
  - Proving X^q mod E(X) = γX suffices
  - Or, E(X) divides X^(q-1) - γ
- E(X) = X^(q-1) - γ is irreducible
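
The steps on this slide can be chained into one short derivation. It assumes F is the field with q elements, so that a^q = a for every a in F and raising to the q-th power is additive; the coefficient names m_i follow the Reed-Solomon slide.

```latex
\begin{align*}
f(X)^{q} &= \Bigl(\sum_{i} m_i X^{i}\Bigr)^{q}
          = \sum_{i} m_i^{q} X^{iq}
          = \sum_{i} m_i \,(X^{q})^{i}
          = f(X^{q}), \\
X^{q} &= X \cdot X^{q-1} \equiv \gamma X \pmod{E(X)}
          \quad\text{since } E(X) = X^{q-1} - \gamma, \\
f(X)^{q} &= f(X^{q}) \equiv f(\gamma X) \pmod{E(X)}.
\end{align*}
```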

Page 35: Our Result

- ρ = 1 - R - ε, for any ε > 0
- Folded RS codes [Guruswami, R. 06]

[Plot: fraction of errors (ρ) versus rate (R); curves: unique decoding, information-theoretic limit, Guruswami-Sudan, Parvaresh-Vardy, our work]

Page 36: “Welcome” to the dark side…

Page 37: Limitations of our work

- To get to 1 - R - ε, need s > 1/ε
- Alphabet size = n^s > n^(1/ε)
  - Fortunately, can be reduced to 2^(poly(1/ε))
  - Concatenation + expanders [Guruswami-Indyk 02]
  - Lower bound is 2^(1/ε)
- List size (running time) > n^(1/ε)
  - Open question to bring this down

Page 38: Time to wake up

Page 39: Overview of the talk

- List decoding primer
- Previous work on list decoding
- Codes over large alphabets
  - Construction of a “good” code
  - High-level idea of why it works
- Codes over small alphabets
  - The current best codes
- Future directions
  - Some (very) modest recent progress

Page 40: Optimal Tradeoff for List Decoding

- Best possible ρ is H_q^(-1)(1-R)
  - H(ρ) = -ρ log ρ - (1-ρ) log(1-ρ)
- There exists an (H_q^(-1)(1-R-ε), O(1/ε)) list-decodable code
  - A random code of rate R has the property with high probability
- ρ > H_q^(-1)(1-R+ε) implies super-polynomial list size
  - For any code
- For large q, H_q^(-1)(1-R) → 1-R
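
For the binary case (q = 2) the entropy function and its inverse are easy to compute numerically. The sketch below is an illustration under that assumption: it uses base-2 logarithms and inverts H on [0, 1/2], where it is increasing, by bisection. The function names are hypothetical.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy_inverse(y, tol=1e-12):
    """The rho in [0, 1/2] with H(rho) = y, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Optimal list decoding radius for a binary code of rate R = 0.5.
rho_at_half = entropy_inverse(1 - 0.5)
```

For R = 0.5 this gives a radius of roughly 0.11, well below the large-alphabet limit 1 - R = 0.5.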

Page 41: Our Results (q=2)

- Optimal tradeoff: H^(-1)(1-R)
- [Guruswami, R. 06]: “Zyablov” bound
- [Guruswami, R. 07]: Blokh-Zyablov bound

[Plot: number of errors versus rate; curves: Zyablov bound, Blokh-Zyablov bound, previous best, optimal tradeoff]

Page 42: How do we get binary codes?

- Concatenation of codes [Forney 66]
  - C1: (GF(2^k))^K → (GF(2^k))^N (“outer” code)
  - C2: (GF(2))^k → (GF(2))^n (“inner” code)
  - C1 ∘ C2: (GF(2))^(kK) → (GF(2))^(nN)
- Typically k = O(log N)
  - Brute-force decoding for the inner code

[Diagram: m = (m_1, m_2, …, m_K) is encoded by C1 into C1(m) = (w_1, w_2, …, w_N); each w_i is then encoded by C2, giving C1 ∘ C2(m) = (C2(w_1), C2(w_2), …, C2(w_N))]
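
A toy instance makes the composition concrete. The two component codes below are stand-ins chosen for brevity, not the codes used in the talk: the outer code appends the XOR of two 2-bit symbols (K = 2, N = 3, with k = 2), and the inner code appends a parity bit to each symbol (n = 3), so 4 message bits become 9 codeword bits.

```python
def outer_encode(message):
    """Toy outer code over 2-bit symbols: (a, b) -> (a, b, a XOR b)."""
    a, b = message
    return [a, b, a ^ b]

def inner_encode(symbol):
    """Toy inner code: write the symbol as 2 bits and append their parity."""
    hi, lo = (symbol >> 1) & 1, symbol & 1
    return [hi, lo, hi ^ lo]

def concatenated_encode(message):
    """C1 ∘ C2: outer-encode the message, then inner-encode every outer symbol."""
    return [bit for symbol in outer_encode(message)
            for bit in inner_encode(symbol)]

bits = concatenated_encode((1, 2))
```

The overall rate is the product of the two rates, here (2/3) · (2/3) = 4/9.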

Page 43: List decoding a concatenated code

- C1 = folded RS code
- C2 = “suitably chosen” binary code
- Natural decoding algorithm
  - Divide up the received word into blocks of length n
  - Find the closest C2 codeword for each block
  - Run the list decoding algorithm for C1
- Loses information!

Page 44: List decoding C2

[Diagram: received blocks y_1, y_2, …, y_N ∈ GF(2)^n are each list decoded into lists S_1, S_2, …, S_N ⊆ GF(2)^k]

- How do we “list decode” from lists?

Page 45: The list recovery problem

- Given a code and an error parameter ρ
- For any set of lists S_1, …, S_N such that |S_i| ≤ s for every i
  - Output all codewords c such that c_i ∈ S_i for at least a 1-ρ fraction of i’s
- List decoding is the special case with s = 1
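
Read literally, the definition gives the following brute-force procedure. It is a sketch for intuition over an explicit codeword list; the name `list_recover` and the toy code are assumptions made for the example.

```python
def list_recover(lists, codewords, rho):
    """Return codewords c with c_i in S_i for at least a (1 - rho) fraction of i."""
    N = len(lists)
    return [c for c in codewords
            if sum(ci in Si for ci, Si in zip(c, lists)) >= (1 - rho) * N]

codewords = [(0, 0, 0), (1, 1, 1), (0, 1, 2)]
lists = [{0, 1}, {1}, {1, 2}]          # one candidate set per position, s = 2

exact = list_recover(lists, codewords, 0.0)
```

With singleton sets S_i = {y_i} this reduces to list decoding the received word y.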

Page 46: List decoding C1 ∘ C2

[Diagram: received blocks y_1, y_2, …, y_N → list decode C2 → lists S_1, S_2, …, S_N → list recovering algorithm for C1]

Page 47: Putting it together [Guruswami, R. 06]

- If C1 can be list recovered from ρ_1 errors and C2 can be list decoded from ρ_2 errors, then C1 ∘ C2 can be list decoded from ρ_1 ρ_2 errors
- Folded RS of rate R is list recoverable from 1-R errors
- There exist inner codes of rate r that are list decodable from H^(-1)(1-r) errors
  - Can find one by “exhaustive” search
- C1 ∘ C2 is list decodable from (1-R) H^(-1)(1-r) errors
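
For a fixed overall rate rR, the product (1-R) H^(-1)(1-r) can be maximized over the split between inner and outer rate. The sketch below does this by a simple grid search for the binary entropy function; the grid size and function names are arbitrary choices for illustration, not part of the talk.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy_inverse(y, tol=1e-12):
    """The rho in [0, 1/2] with H(rho) = y, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def concatenated_radius(total_rate, steps=1000):
    """Maximize (1 - R) * H^-1(1 - r) over inner rates r, with outer rate R = total_rate / r."""
    best = 0.0
    for i in range(1, steps):
        r = total_rate + (1 - total_rate) * i / steps   # total_rate < r < 1
        R = total_rate / r
        best = max(best, (1 - R) * entropy_inverse(1 - r))
    return best

achieved = concatenated_radius(0.1)
optimal = entropy_inverse(1 - 0.1)     # the optimal tradeoff at the same rate
```

The achieved radius stays strictly below the optimal one, which is the gap the multilevel construction narrows.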

Page 48: Multilevel Concatenated Codes

- C1: (GF(2^k))^K → (GF(2^k))^N (“outer” code 1)
- C2: (GF(2^k))^L → (GF(2^k))^N (“outer” code 2)
- Cin: (GF(2))^(2k) → (GF(2))^n (“inner” code)
- C1 and C2 are FRS

[Diagram: m = (m_1, m_2, …, m_K) → C1(m) = (v_1, v_2, …, v_N); M = (M_1, M_2, …, M_L) → C2(M) = (w_1, w_2, …, w_N); final codeword (Cin(v_1, w_1), Cin(v_2, w_2), …, Cin(v_N, w_N))]

Page 49: Advantage over rate rR concatenated codes

- C1, C2, Cin have rates R1, R2 and r
  - Final rate r(R1+R2)/2; choose R1 < R
- Step 1: just recover m
  - List decode Cin up to H^(-1)(1-r) errors
  - List recover C1 up to 1-R1 errors
- Can handle (1-R1) H^(-1)(1-r) > (1-R) H^(-1)(1-r) errors

[Diagram: the multilevel encoding of m and M, as on the previous slide]

Page 50: Advantage over concatenated codes

- Step 2: just recover M, given m
  - A subcode of Cin of rate r/2 acts on M
  - List decode the subcode up to H^(-1)(1-r/2) errors
  - List recover C2 up to 1-R2 errors
- Can handle (1-R2) H^(-1)(1-r/2) errors

[Diagram: the multilevel encoding of m and M, as on the previous slides]

Page 51: Wrapping it up

- Total errors that can be handled: min{(1-R1) H^(-1)(1-r), (1-R2) H^(-1)(1-r/2)}
- Better than (1-R) H^(-1)(1-r)
  - (R1+R2)/2 = R (recall that R1 < R)
  - H^(-1)(1-r/2) > H^(-1)(1-r), so choose R2 a bit > R
- Optimize over choices of r, R1 and R2
- Needs nested list decodability of the inner code
- Blokh-Zyablov follows from multiple outer codes

Page 52: Our Results (q=2)

- Optimal tradeoff: H^(-1)(1-R)
- [Guruswami, R. 06]: “Zyablov” bound
- [Guruswami, R. 07]: Blokh-Zyablov bound

[Plot: number of errors versus rate; curves: Zyablov bound, Blokh-Zyablov bound, previous best, optimal tradeoff]

Page 53: How far can concatenated codes go?

- Outer code: folded RS
- Random and independent inner codes
  - Different inner codes for each outer symbol
- Can get to the information-theoretic limit ρ = H^(-1)(1-R) [Guruswami, R. 08]

Page 54: To summarize

- List decoding: a central coding theory notion
  - Permits decoding up to the optimal fraction of adversarial errors
  - Bridges adversarial and probabilistic approaches to information theory
    - Shannon’s information-theoretic limit: p = H^(-1)(1-R)
    - List decoding information-theoretic limit: ρ = H^(-1)(1-R)
- Efficient list decoding is possible for algebraic codes

Page 55: Our Contributions

- Folded RS codes are explicit codes that achieve the information-theoretic limit for list decoding
- Better list decoding for binary codes
- Concatenated codes can get us to list decoding capacity

Page 56: Open Questions

- Reduce the decoding complexity of our algorithm
- List decoding for binary codes
  - Explicitly achieve error bound ρ = H^(-1)(1-R)
  - Erasures: decode when ρ = 1-R
- Non-algebraic codes? Graph-based codes?
- Other applications of these new codes
  - Extractors [Guruswami, Umans, Vadhan 07]
  - Approximating NP-witnesses [Guruswami, R. 08]

Page 57: Thank You

Questions?