How to generate the Sbox of Luffa

ESC2010@Remich (Jan.11.2010)

Dai Watanabe
SDL, Hitachi

© 2010, Hitachi, Ltd. All rights are reserved.
Luffa is a registered trademark of Hitachi, Ltd.




Outline

• Topic
  – How to find a 4-bit Sbox optimized for bit slice implementation
• Known approaches
  – Serpent and Noekeon
• Luffa
  – v1: Strategic approach
  – v2: Non-strategic approach
• Summary


Chaining of Luffa

[Figure: chaining of Luffa. Labels: chaining values V0, V1, ..., Vw-1; message blocks M(1), M(2), ..., M(N) followed by a block 0..0; linear map MI before each set of permutations Q0, Q1, ..., Qw-1; output Z0 (hash value); width label 256 bits.]

• Luffa is a variant of sponge
  – But, fixed length permutations for all hash lengths
    • The number of Qj increases as the hash length gets longer (w=3, 4, 5 for hash_len=256, 384, 512)
  – Insert message and mix the state by the linear map MI


Non-linear permutation Q

• Input/Output
  – 256 bits (8 32-bit words)
• Functions
  – tweak
    • Applied before step functions
  – Step functions
    • 8 steps

[Figure: words a0 ... a7 pass through the tweak and then 8 step functions, giving a0 ... a7.]


Step function

[Figure: step function on the 32-bit words a0 ... a7. Labels: 4-bit Sbox (bit slice), drawn twice; Feistel ladder of 4 rounds with rotations <<<14, <<<2, <<<1, <<<10; constant addition (1-bit / Sbox) with constants c0(r), c4(r).]


Known approaches


Bit slice ciphers

• Serpent (Anderson et al.)
  – Step 1: Choose an Sbox with good cryptographic properties
  – Step 2: Decompose to a set of instructions for the bit slice implementation
• Noekeon (Daemen et al.)
  – Step 1: Construct a set of instructions with some properties
  – Step 2: Check if the Sbox has desirable properties
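What "bit slice implementation" means for a 4-bit Sbox can be sketched as a slow reference routine: four words hold the four input bits of 32 Sbox instances, one instance per bit position. The function name `bitslice_apply`, the choice of word 0 as the least significant bit, and the use of a lookup table are assumptions made for this illustration; a real bit slice implementation replaces the per-column lookup with a handful of AND/OR/XOR/NOT instructions on whole words. The table `S` is the one obtained by evaluating the Luffa v2 ANFs listed at the end of these slides, with x0 and y0 taken as least significant bits.

```python
def bitslice_apply(table, words, width=32):
    """Apply a 4-bit Sbox table to `width` parallel inputs.

    words[i] holds bit i of every input; bit position j of each word
    belongs to the j-th Sbox instance (the bit slice layout).
    """
    out = [0, 0, 0, 0]
    for j in range(width):
        # Gather the 4-bit input of column j (words[0] is the least significant bit).
        x = sum(((words[i] >> j) & 1) << i for i in range(4))
        y = table[x]
        # Scatter the 4-bit output back into column j of the output words.
        for i in range(4):
            out[i] |= ((y >> i) & 1) << j
    return out


# Table derived from the Luffa v2 ANFs (assumption: x0 and y0 are the low bits).
S = [13, 14, 0, 1, 5, 10, 7, 6, 11, 3, 9, 12, 15, 8, 2, 4]
```

With every column set to the input 1, `bitslice_apply(S, [0xFFFFFFFF, 0, 0, 0])` yields `[0, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF]`, since the table maps 1 to 14.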


Serpent S2

• Decomposed by Osvik
  – Search strategy
    • Optimized for 2 ALUs (2 ops. per cycle)
    • 1 temporary register
    • Complicated tree search
  – Result
    • 8 cycles
    • 16 instructions
  – Optimized for 3 ALUs
    • Used in Hamsi
    • 7 cycles
    • 16 instructions

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1 (rotation ignored).]


Sbox of Noekeon

• Optimized for hardware
  – Using NOR ops. as well as AND ops.
  – Parallelizable in hardware
• Special property
  – S^-1 = S, in order to unify Enc() and Dec()

[Figure: instruction diagram on registers r0, r1, r2, r3 (rotation ignored).]


Sbox of Noekeon (cont.)

• Not optimized for software
  – A lot of MOVs are required
  – NOR is not a good choice in software
  – 1 or 2 ops. per cycle
  – 10 cycles in total
  – (There may be a better decomposition)

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with two constant 1 inputs (rotation ignored).]


Sbox of Luffa v1


Approach

• Instruction based Sbox design
  – Similar approach to Noekeon
  – Optimized for Intel Core2
    • 3 ALUs (3 instructions per cycle)
    • Allows 1 temporary register
• Check cryptographic properties later
  – Optimal maximum differential probability (MDP)
  – Optimal maximum linear probability (MLP)
  – No fixed point
  – High algebraic degree

Intel is a registered trademark and Core is the name of products of Intel Corporation in the U.S. and other countries.


Basic functions preserving injection

[Figure: three diagrams on registers a0, a1, a2, a3, one per instruction sequence below; the third one adds the constant 1 to a3.]

Instructions: MOV tmp,a2; OP tmp,a1; /* OP=AND/OR */ XOR a3,tmp;

Instructions: XOR a3,a0;

Instructions: NOT a3;

Intel syntax: “OP target, source”
ex.) “AND a, b” => “a &= b;”
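A small interpreter for this instruction notation makes the injectivity claim easy to check. The sketch below is an illustration only: the names `run`, `to_table` and `BASIC` are hypothetical, and the register a0 is assumed to carry the least significant bit. It loads the registers with truth-table words, so that bit position x of register ai holds bit i of the input x; one pass over the instructions then evaluates all 16 inputs at once.

```python
MASK = 0xFFFF  # 16 columns: one per possible 4-bit input


def run(program, regs):
    """Execute 'OP target,source' instructions (Intel syntax) on a dict of words."""
    regs = dict(regs)
    for ins in program.split(";"):
        ins = ins.strip()
        if not ins:
            continue
        op, _, args = ins.partition(" ")
        ops = [a.strip() for a in args.split(",")]
        t = ops[0]
        if op == "NOT":
            regs[t] = ~regs[t] & MASK
        elif op == "MOV":
            regs[t] = regs[ops[1]]
        elif op == "AND":
            regs[t] &= regs[ops[1]]
        elif op == "OR":
            regs[t] |= regs[ops[1]]
        elif op == "XOR":
            regs[t] ^= regs[ops[1]]
        else:
            raise ValueError("unknown instruction: " + ins)
    return regs


def to_table(program):
    """Recover the 4-bit lookup table computed by `program`."""
    # Column x of register ai holds bit i of x, so all 16 inputs run at once.
    regs = {"a0": 0xAAAA, "a1": 0xCCCC, "a2": 0xF0F0, "a3": 0xFF00, "tmp": 0}
    out = run(program, regs)
    return [sum(((out["a%d" % i] >> x) & 1) << i for i in range(4))
            for x in range(16)]


# The three basic functions, with OP instantiated as AND and as OR.
BASIC = [
    "MOV tmp,a2; AND tmp,a1; XOR a3,tmp;",
    "MOV tmp,a2; OR tmp,a1; XOR a3,tmp;",
    "XOR a3,a0;",
    "NOT a3;",
]
ALL_BIJECTIVE = all(sorted(to_table(p)) == list(range(16)) for p in BASIC)
```

Each of these sequences only XORs something into a3 that does not depend on a3 itself, which is why `ALL_BIJECTIVE` comes out true: applying the same sequence twice restores the input.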


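Implementation and Iteration

[Figure: iterated rounds on registers a0, a1, a2, a3 with temporary register tmp, each annotated with one of the instruction sequences below.]

MOV tmp,a3; OP a3,a2; XOR a3,tmp;

MOV tmp,a3; OP a3,a2; XOR a3,tmp;

MOV tmp,a3; OP tmp,a2; XOR a1,tmp;

MOV tmp,a3; OP tmp,a2; XOR a1,tmp;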


Experimental result

• No good Sbox is found in 3 rounds
• A lot of “good” Sboxes are found in 4 rounds
  – The smallest number of instructions is 13, for an Sbox which satisfies…
    • MDP = 1/4
    • MLP = 1/2+1/4
    • No fixed point
• Is this construction optimal?
  – No, 4 MOV ops. do not help mixing
  – 4 round functions take 8 cycles
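The three criteria above can be evaluated directly on a lookup table. The following is a minimal sketch with hypothetical function names; it reads "MLP = 1/2+1/4" as a probability of 1/2 plus a bias of 1/4 and therefore reports the bias. The 13-instruction Sbox of this slide is not given, so the example table `S` is instead the one obtained by evaluating the Luffa v2 ANFs listed at the end of these slides (assuming x0 and y0 are the least significant bits).

```python
from fractions import Fraction


def mdp(table):
    """Maximum differential probability over nonzero input differences."""
    n = len(table)
    best = 0
    for dx in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[table[x] ^ table[x ^ dx]] += 1
        best = max(best, max(counts))
    return Fraction(best, n)


def max_linear_bias(table):
    """Largest |Pr[a.x = b.y] - 1/2| over input masks a and nonzero output masks b."""
    n = len(table)

    def parity(v):
        return bin(v).count("1") & 1

    best = 0
    for a in range(n):
        for b in range(1, n):
            hits = sum(parity(a & x) == parity(b & table[x]) for x in range(n))
            best = max(best, abs(2 * hits - n))
    return Fraction(best, 2 * n)


def fixed_points(table):
    """Inputs x with table[x] == x."""
    return [x for x, y in enumerate(table) if x == y]


# Table derived from the Luffa v2 ANFs (assumption: x0 and y0 are the low bits).
S = [13, 14, 0, 1, 5, 10, 7, 6, 11, 3, 9, 12, 15, 8, 2, 4]
```

For this table `mdp(S)` is `Fraction(1, 4)` and `fixed_points(S)` is empty; the identity table, by contrast, gives an MDP of 1 and a linear bias of 1/2.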


Modification approach: MOV to XOR

type-I (registers a0, a1, a2, a3)
Instructions: MOV tmp,a2; OP tmp,a1; XOR a3,tmp;

type-II (registers a0, a1, a2, a3, tmp)
Instructions: MOV tmp,a2; OP tmp,a1; XOR tmp,a3; OP a3,a0; XOR a3,a2; OP a2,tmp; …

[Figure label: 1 cycle]


Parallelization at the first round

[Figure: two diagrams on registers a0, a1, a2, a3, tmp, one per instruction sequence below.]

Instructions: MOV tmp,a2; OP tmp,a1; XOR tmp,a3;

Instructions: MOV tmp,a3; OP a3,a1; XOR a3,a2;

least requirement: 5 cycles (Type-II)

least requirement: 8 cycles (Type-I)


How to be surjective?

• Type-II round function is not always surjective
• How to fix it?
  – No idea
  – But some 4 round iterations still generate permutations
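Why an in-place OP can lose surjectivity is easy to see on a single register update. The sketch below compares the type-I update with a hypothetical fragment modelled on the "OP a3,a0; XOR a3,a2" part of type-II, taking OP=AND and ignoring the temporary register; it is not the full type-II round, whose continuation is elided on the slide. The function names are illustrative.

```python
from itertools import product


def is_permutation(step):
    """True if `step` maps the 16 inputs (a0, a1, a2, a3) to 16 distinct outputs."""
    images = {step(*bits) for bits in product((0, 1), repeat=4)}
    return len(images) == 16


def type1(a0, a1, a2, a3):
    # MOV tmp,a2; AND tmp,a1; XOR a3,tmp
    # a3 is only XORed with a value independent of a3, so the step is invertible.
    return a0, a1, a2, a3 ^ (a2 & a1)


def in_place(a0, a1, a2, a3):
    # AND a3,a0; XOR a3,a2  (hypothetical fragment, tmp ignored)
    # When a0 = 0 the AND erases a3, so two inputs collide.
    return a0, a1, a2, (a3 & a0) ^ a2
```

Here `is_permutation(type1)` holds while `is_permutation(in_place)` does not: the inputs (0, a1, a2, 0) and (0, a1, a2, 1) produce the same output. A full round may still be bijective if the erased information survives elsewhere, which is consistent with some 4 round iterations being permutations.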


Sbox with Type-II round function

• Smallest Sbox ever
  – 6 cycles
  – 10 instructions
• Good properties
  – MDP = 1/4
  – MLP = 1/2 + 1/4
  – No fixed point
• A lot of free slots

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1.]


Serpent S2 revisited

• 1 type-I (without MOV)
• 3 type-II
• Additional XORs and a NOT

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1.]


Sbox of Luffa v1

• Search strategy
  – Depth-first tree search
  – 6 cycles
  – Mixture of type-I, II
  – XOR or NOT for free slots
• Result
  – 1 type-I + 4 type-II
  – 16 instructions

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with two constant 1 inputs.]


Sbox of Luffa v2


Motivation for the change

• Higher order differential attack on Luffa v1
  – 7 out of 8 step functions have “non-randomness”
  – The terms of high degree in ANFs reduce the calculation complexity
  – (The details will be presented at FSE2010)


[FYI] ANFs of Luffa v1

• y0 = 1 + x2 + x0x1 + x1x3 + x2x3 + x0x1x3
• y1 = 1 + x0 + x2 + x0x1 + x0x2 + x3 + x1x3 + x2x3 + x0x1x3
• y2 = 1 + x1 + x1x3 + x2x3 + x0x1x3
• y3 = x0 + x1 + x2 + x0x1 + x1x2 + x0x1x2 + x1x3
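These four polynomials determine the Sbox completely, so the lookup table can be recovered by evaluating them on all 16 inputs. The sketch below does this over GF(2), where + is XOR and a product is AND; the function names are hypothetical and the packing of x0 and y0 into the least significant bit is an assumption about the bit order.

```python
def luffa_v1_anf(x0, x1, x2, x3):
    """Evaluate the four ANFs above over GF(2): + is XOR, product is AND."""
    y0 = 1 ^ x2 ^ (x0 & x1) ^ (x1 & x3) ^ (x2 & x3) ^ (x0 & x1 & x3)
    y1 = (1 ^ x0 ^ x2 ^ (x0 & x1) ^ (x0 & x2) ^ x3
          ^ (x1 & x3) ^ (x2 & x3) ^ (x0 & x1 & x3))
    y2 = 1 ^ x1 ^ (x1 & x3) ^ (x2 & x3) ^ (x0 & x1 & x3)
    y3 = x0 ^ x1 ^ x2 ^ (x0 & x1) ^ (x1 & x2) ^ (x0 & x1 & x2) ^ (x1 & x3)
    return y0, y1, y2, y3


def anf_to_table(anf):
    """Tabulate a 4-bit map given as a function of its input bits."""
    table = []
    for x in range(16):
        bits = [(x >> i) & 1 for i in range(4)]      # x0 is the low bit
        y = anf(*bits)
        table.append(sum(b << i for i, b in enumerate(y)))  # y0 is the low bit
    return table


V1_TABLE = anf_to_table(luffa_v1_anf)
```

Under this bit order the result is `[7, 13, 11, 10, 12, 4, 8, 3, 5, 15, 6, 0, 9, 1, 2, 14]`, which is a permutation of 0..15.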


Approach for Luffa v2

• Approach for Luffa v1 covers small area
  – ANFs of generated sboxes are very similar
• How to make it larger?
  – Generate instructions randomly
  – Can a set of random instructions have good properties?
    • No, in general
    • Most of them are not permutations
    • Even if it is a permutation, it tends to be linear (experimentally)


Sbox of Luffa v2

• Search strategy
  – Random generation of instructions
    • At least 1 AND/OR op. per cycle
  – Evaluation at each cycle
    • 1) degrees
    • 2) # of terms
• Result
  – Search took days
  – It generates various ANFs

[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with two constant 1 inputs.]
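The random search can be sketched in a few lines. This is a toy illustration, not the actual tool: the instruction set, the 3 instructions per cycle, the default of 6 cycles, the sequential execution within a cycle, and all names are assumptions, and the per-cycle evaluation of degrees and number of terms is left out. It only enforces the "at least 1 AND/OR per cycle" rule and filters for permutations.

```python
import random

MASK = 0xFFFF
REGS = ["a0", "a1", "a2", "a3", "tmp"]
OPS = ["MOV", "AND", "OR", "XOR", "NOT"]
# Truth-table words: bit position x of ai holds bit i of the input x.
TRUTH = {"a0": 0xAAAA, "a1": 0xCCCC, "a2": 0xF0F0, "a3": 0xFF00, "tmp": 0}


def random_cycle(rng, width=3):
    """One cycle of `width` random instructions with at least one AND/OR."""
    while True:
        cycle = [(rng.choice(OPS), rng.choice(REGS), rng.choice(REGS))
                 for _ in range(width)]
        if any(op in ("AND", "OR") for op, _, _ in cycle):
            return cycle


def execute(program):
    """Run (op, target, source) triples on all 16 inputs; return the table."""
    r = dict(TRUTH)
    for op, t, s in program:
        if op == "MOV":
            r[t] = r[s]
        elif op == "AND":
            r[t] &= r[s]
        elif op == "OR":
            r[t] |= r[s]
        elif op == "XOR":
            r[t] ^= r[s]
        else:  # NOT ignores the source operand
            r[t] = ~r[t] & MASK
    return [sum(((r["a%d" % i] >> x) & 1) << i for i in range(4))
            for x in range(16)]


def search(seed, cycles=6, tries=20000):
    """Collect random programs whose table is a permutation of 0..15."""
    rng = random.Random(seed)
    found = []
    for _ in range(tries):
        program = [ins for _ in range(cycles) for ins in random_cycle(rng)]
        table = execute(program)
        if sorted(table) == list(range(16)):
            found.append((program, table))
    return found
```

Every candidate returned by `search` would still have to pass the cryptographic checks (MDP, MLP, fixed points, degree), which matches the observation that most random instruction sets are not permutations and the surviving ones tend to be weak.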


Summary

          Approach           Target  Time                                Memory      # of instructions
Osvik     Table first        Any     a few minutes per “selected” sbox   Large       maybe optimal
Luffa v1  Instruction first  Small   negligible                          negligible  optimal
Luffa v2  Instruction first  Large   a few minutes for a “good” sbox     negligible  optimal

• Note that
  – Their destinations are different
  – Optimization levels of the tools are different
  – Security (algebraic property) is hard to evaluate


Open problems?

• How large an area do our approaches cover?
• Complexity of ANFs
  – How many instructions (and cycles) are necessary for a “random” polynomial?
  – How to decide whether the ANFs are acceptably good?
  – A trade-off between # of instructions and the “complexity” of ANFs may exist


Open problems?

• More theory for Osvik’s decomposition
  – Why does his approach work for all sboxes with only one temporary register?
    • Groebner basis theory says that it is possible if there are a lot of temporary registers
• A method to decide if the given decomposition is optimal


Open problems?

• Other approaches?
  – Ex. Affine equivalence of sboxes [EUROCRYPT’03]
    • But affine transformations are not free
    • How to find the “good” representative


References

• R. Anderson, E. Biham and L. Knudsen, “Serpent: A Proposal for the Advanced Encryption Standard.”
• E. Biham, R. Anderson and L. Knudsen, “Serpent: A New Block Cipher Proposal,” FSE’97.
• A. Biryukov, C. De Canniere, A. Braeken and B. Preneel, “A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms,” Eurocrypt’03.
• J. Daemen, M. Peeters, G. Van Assche and V. Rijmen, “Nessie Proposal: NOEKEON.”
• D. A. Osvik, “Speeding up Serpent,” The 3rd AES Conference, 2000.


[FYI] ANFs of Luffa v2

• y0 = 1 + x0 + x1 + x1x2 + x0x3 + x1x3 + x0x1x3 + x0x2x3,

• y1 = x0 + x0x1 + x1x2 + x3 + x0x3 + x1x3 + x0x1x3 + x0x2x3,

• y2 = 1 + x1 + x0x2 + x1x2 + x0x1x2 + x3 + x1x3 + x0x1x3 + x2x3,

• y3 = 1 + x1 + x2 + x0x2 + x1x2 + x0x1x2 + x0x3 + x1x3 + x0x1x3 + x2x3.
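The "degrees" and "# of terms" used to evaluate candidates can be read off a lookup table with a binary Moebius transform, which turns the truth table of one output bit into its ANF coefficients. The sketch below is an illustration with hypothetical names; the table `S` is the result of evaluating the four polynomials above with x0 and y0 taken as the least significant bits (an assumption about bit order), and a monomial is encoded as a bit mask over x0..x3.

```python
def anf_of_bit(table, j):
    """Monomials (as bit masks over x0..x3) in the ANF of output bit j."""
    coeff = [(y >> j) & 1 for y in table]
    # Binary Moebius transform: one butterfly pass per input variable.
    for i in range(4):
        for x in range(16):
            if x & (1 << i):
                coeff[x] ^= coeff[x ^ (1 << i)]
    return [m for m in range(16) if coeff[m]]


def degree(monomials):
    """Algebraic degree: the largest number of variables in any monomial."""
    return max((bin(m).count("1") for m in monomials), default=0)


def show(monomials):
    """Render masks in the notation of the slides, e.g. 11 -> 'x0x1x3'."""
    def name(m):
        return "".join("x%d" % i for i in range(4) if m >> i & 1) or "1"
    return " + ".join(name(m) for m in monomials)


# Table obtained from the Luffa v2 ANFs above (x0 and y0 as low bits).
S = [13, 14, 0, 1, 5, 10, 7, 6, 11, 3, 9, 12, 15, 8, 2, 4]
```

For this table the four output bits have 8, 8, 9 and 10 terms and all have degree 3, in agreement with the polynomials listed above; the order in which `show` prints the monomials follows the mask value rather than the order on the slide.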