© 2010, Hitachi, Ltd. All rights are reserved.
How to generate the Sbox of Luffa
ESC2010@Remich (Jan.11.2010)
Dai Watanabe
SDL, Hitachi
Luffa is a registered trademark of Hitachi, Ltd.
Outline
• Topic
– How to find a 4-bit Sbox optimized for bit slice implementation
• Known approaches
– Serpent and Noekeon
• Luffa
– v1: Strategic approach
– v2: Non-strategic approach
• Summary
Chaining of Luffa
[Figure: chaining of Luffa. Labels: state words V0, V1, ..., Vw-1; message blocks M(1), M(2), ..., M(N) followed by a block 0..0; linear map MI; permutations Q0, Q1, ..., Qw-1; output Z0 (hash value, 256 bits).]
• Luffa is a variant of sponge
– But fixed-length permutations are used for all hash lengths
• The number of Qj increases as the hash length gets longer (w = 3, 4, 5 for hash_len = 256, 384, 512)
– Insert the message and mix the state by the linear map MI
Non-linear permutation Q
• Input/Output
– 256 bits (8 32-bit words)
• Functions
– tweak
• Applied before step functions
– Step functions
• 8 steps
[Figure: permutation Q. The words a0, ..., a7 pass through the tweak and then 8 step functions, giving the output words a0, ..., a7.]
Step function
[Figure: step function on the words a0, ..., a7 (32 bits each). Labels: two 4-bit Sbox (bit slice) blocks; Feistel ladder of 4 rounds with rotations <<<14, <<<2, <<<1, <<<10; constant addition of c0(r) and c4(r) (1 bit per Sbox).]
Known approaches
Bit slice ciphers
• Serpent (Anderson et al.)
– Step 1: Choose an Sbox with good cryptographic properties
– Step 2: Decompose to a set of instructions for the bit slice implementation
• Noekeon (Daemen et al.)
– Step 1: Construct a set of instructions with some properties
– Step 2: Check if the Sbox has desirable properties
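Both flows rely on moving between an instruction sequence and a lookup table. The sketch below shows one way to do that: it runs a bit-sliced sequence on all 16 inputs at once (one input per bit position of a 16-bit word) and reads the table back. The sequence in `toy_bitslice` is hypothetical and is not the Sbox of Serpent, Noekeon, or Luffa; the helper names and the choice of a0 as the least significant bit are assumptions made for this example.

```python
MASK = 0xFFFF  # 16 lanes, one per possible 4-bit input value

def input_columns():
    """Four 16-bit words; bit x of word i equals bit i of the input x."""
    return [sum(((x >> i) & 1) << x for x in range(16)) for i in range(4)]

def columns_to_table(cols):
    """Rebuild the lookup table: entry x gathers bit x of every output word."""
    return [sum(((cols[i] >> x) & 1) << i for i in range(4)) for x in range(16)]

def toy_bitslice(a0, a1, a2, a3):
    """Hypothetical instruction sequence, used only for illustration."""
    a3 ^= a1 & a2   # MOV tmp,a2; AND tmp,a1; XOR a3,tmp
    a0 ^= a3        # XOR a0,a3
    a1 ^= MASK      # NOT a1
    return [a0, a1, a2, a3]

table = columns_to_table(toy_bitslice(*input_columns()))
is_permutation = sorted(table) == list(range(16))
```

In the Noekeon-style flow the table produced this way is what gets checked for cryptographic properties; in the Serpent-style flow the table comes first and the instruction sequence has to be searched for.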
Serpent S2
• Decomposed by Osvik
– Search strategy
• Optimized for 2 ALUs (2 ops. per cycle)
• 1 temporary register
• Complicated tree search
– Result
• 8 cycles
• 16 instructions
– Optimized for 3 ALUs
• Used in Hamsi
• 7 cycles
• 16 instructions
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1 (rotation ignored).]
Sbox of Noekeon
• Optimized for hardware
– Using NOR ops. as well as AND ops.
– Parallelizable in hardware
• Special property
– S^-1 = S, in order to unify Enc() and Dec()
[Figure: instruction diagram on registers r0, r1, r2, r3 (rotation ignored).]
Sbox of Noekeon (cont.)
• Not optimized for software
– A lot of MOVs are required
– NOR is not a good choice in software
– 1 or 2 ops. per cycle
– 10 cycles in total
– (There may be a better decomposition)
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with constants 1 (rotation ignored).]
Sbox of Luffa v1
Approach
• Instruction-based Sbox design
– Similar approach to Noekeon
– Optimized for Intel Core2
• 3 ALUs (3 instructions per cycle)
• Allows 1 temporary register
• Check cryptographic properties later
– Optimal maximum differential probability (MDP)
– Optimal maximum linear probability (MLP)
– No fixed point
– High algebraic degree
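The property checks listed above can be sketched for a 4-bit table as follows. This is a minimal illustration, not the tool used for Luffa: the function names are invented for this example, MLP is expressed as 1/2 plus the largest absolute bias (the convention suggested by "MLP = 1/2 + 1/4" elsewhere in the talk), and the example table is the one obtained by evaluating the Luffa v1 ANFs from the "[FYI] ANFs of Luffa v1" slide under the assumption that x0 and y0 are the least significant bits.

```python
from fractions import Fraction

def parity(v):
    return bin(v).count("1") & 1

def mdp(sbox):
    """Maximum differential probability over non-zero input differences."""
    n = len(sbox)
    best = 0
    for dx in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        best = max(best, max(counts))
    return Fraction(best, n)

def mlp(sbox):
    """1/2 plus the largest absolute bias over all masks (a, b) with b != 0."""
    n = len(sbox)
    worst = 0
    for b in range(1, n):
        for a in range(n):
            agree = sum(parity(a & x) == parity(b & sbox[x]) for x in range(n))
            worst = max(worst, abs(2 * agree - n))
    return Fraction(1, 2) + Fraction(worst, 2 * n)

def fixed_points(sbox):
    return [x for x, y in enumerate(sbox) if x == y]

# Assumed table: Luffa v1 ANFs evaluated with x0/y0 as least significant bits.
SBOX_V1 = [7, 13, 11, 10, 12, 4, 8, 3, 5, 15, 6, 0, 9, 1, 2, 14]
report = (mdp(SBOX_V1), mlp(SBOX_V1), fixed_points(SBOX_V1))
```

For a 4-bit permutation the best achievable values are MDP = 1/4 and MLP = 1/2 + 1/4, which is what "optimal" refers to on this slide.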
Intel is a registered trademark and Core is the name of products of Intel Corporation in the U.S. and other countries.
Basic functions preserving injection
[Figure: three basic functions on registers a0, a1, a2, a3; the third uses a constant 1.]
• Instructions: MOV tmp,a2; OP tmp,a1; /* OP=AND/OR */ XOR a3,tmp;
• Instructions: XOR a3,a0;
• Instructions: NOT a3;
Intel syntax: “OP target, source”, e.g. “AND a, b” => “a &= b;”
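A quick way to see why these three functions preserve injectivity is to enumerate all 16 register states. The sketch below models each register as a single bit (one bit-slice lane); the function names are made up for this example. Each function only XORs into a3 a value that does not depend on a3, so applying it twice restores the input.

```python
from itertools import product

def nonlinear(state, op):
    a0, a1, a2, a3 = state
    tmp = a2                        # MOV tmp,a2
    tmp = op(tmp, a1)               # OP  tmp,a1  (OP = AND/OR)
    return (a0, a1, a2, a3 ^ tmp)   # XOR a3,tmp

def linear(state):
    a0, a1, a2, a3 = state
    return (a0, a1, a2, a3 ^ a0)    # XOR a3,a0

def negate(state):
    a0, a1, a2, a3 = state
    return (a0, a1, a2, a3 ^ 1)     # NOT a3 (on a single bit)

STATES = list(product((0, 1), repeat=4))

def is_injective(f):
    return len({f(s) for s in STATES}) == len(STATES)

def is_involution(f):
    return all(f(f(s)) == s for s in STATES)

basic = [
    lambda s: nonlinear(s, lambda p, q: p & q),
    lambda s: nonlinear(s, lambda p, q: p | q),
    linear,
    negate,
]
all_injective = all(is_injective(f) for f in basic)
all_involutions = all(is_involution(f) for f in basic)
```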
Implementation and Iteration
[Figure: round functions on registers a0, a1, a2, a3 with temporary register tmp.]
MOV tmp,a3; OP a3,a2; XOR a3,tmp;
MOV tmp,a3; OP a3,a2; XOR a3,tmp;
MOV tmp,a3; OP tmp,a2; XOR a1,tmp;
MOV tmp,a3; OP tmp,a2; XOR a1,tmp;
Experimental result
• No good Sbox is found in 3 rounds
• A lot of “good” Sboxes are found in 4 rounds
– Smallest # of instructions is 13, which satisfies…
• MDP = 1/4
• MLP = 1/2 + 1/4
• No fixed point
• Is this construction optimal?
– No, 4 MOV ops. do not help mixing
– 4 round functions take 8 cycles
Modification approach: MOV to XOR
• type-I
[Figure: registers a0, a1, a2, a3.]
Instructions: MOV tmp,a2; OP tmp,a1; XOR a3,tmp;
• type-II
[Figure: registers a0, a1, a2, a3, tmp; label “1 cycle”.]
Instructions: MOV tmp,a2; OP tmp,a1; XOR tmp,a3; OP a3,a0; XOR a3,a2; OP a2,tmp; …
Parallelization at the first round
[Figure: registers a0, a1, a2, a3, tmp.]
Instructions: MOV tmp,a2; OP tmp,a1; XOR tmp,a3;
[Figure: registers a0, a1, a2, a3, tmp.]
Instructions: MOV tmp,a3; OP a3,a1; XOR a3,a2;
least requirement: 5 cycles (Type-II)
least requirement: 8 cycles (Type-I)
How to be surjective?
• Type-II round function is not always surjective
• How to fix it?
– No idea
– But some 4-round iterations still generate permutations
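The loss of surjectivity can be observed on a small scale. The sketch below simulates only the six type-II instructions that are written out in the talk (the sequence is elided after that point and is not completed here), with every OP set to AND as an assumption, and counts how many distinct 5-register states are reachable from the 16 inputs. Registers are modelled as single bits; the function names are hypothetical.

```python
from itertools import product

AND = lambda p, q: p & q
OR = lambda p, q: p | q

def type2_prefix(a0, a1, a2, a3, op1=AND, op2=AND, op3=AND):
    """Run only the six listed type-II instructions on one bit-slice lane."""
    tmp = a2              # MOV tmp,a2
    tmp = op1(tmp, a1)    # OP  tmp,a1
    tmp ^= a3             # XOR tmp,a3
    a3 = op2(a3, a0)      # OP  a3,a0
    a3 ^= a2              # XOR a3,a2
    a2 = op3(a2, tmp)     # OP  a2,tmp
    return (a0, a1, a2, a3, tmp)

def image_size(**ops):
    """Number of distinct register states reached from the 16 inputs."""
    inputs = product((0, 1), repeat=4)
    return len({type2_prefix(*bits, **ops) for bits in inputs})

size_all_and = image_size()  # fewer than 16 means two inputs have merged
```

With all three OPs set to AND, the inputs (a0, a1, a2, a3) = (1, 1, 0, 0) and (1, 1, 1, 1) end in the same state even when tmp is counted, so information is already lost after these six instructions. Other OP choices can be tried by passing `op1`, `op2`, `op3`.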
Sbox with Type-II round function
• Smallest Sbox ever
– 6 cycles
– 10 instructions
• Good properties
– MDP = 1/4
– MLP = 1/2 + 1/4
– No fixed point
• A lot of free slots
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1.]
Serpent S2 revisited
• 1 type-I (without MOV)
• 3 type-II
• Additional XORs and a NOT
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with a constant 1.]
Sbox of Luffa v1
• Search strategy
– Depth-first tree search
– 6 cycles
– Mixture of type-I, II
– XOR or NOT for free slots
• Result
– 1 type-I + 4 type-II
– 16 instructions
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with constants 1.]
Sbox of Luffa v2
Motivation for the change
• Higher order differential attack on Luffa v1– 7 out of 8 step functions has “non-
randomness”– The terms of high degree in ANFs reduce the
calculation complexity– (The detail will be presented at FSE2010)
[FYI] ANFs of Luffa v1
• y0 = 1 + x2 + x0x1 + x1x3 + x2x3 + x0x1x3
• y1 = 1 + x0 + x2 + x0x1 + x0x2 + x3 + x1x3 + x2x3 + x0x1x3
• y2 = 1 + x1 + x1x3 + x2x3 + x0x1x3
• y3 = x0 + x1 + x2 + x0x1 + x1x2 + x0x1x2 + x1x3
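The ANFs above can be turned back into a lookup table by evaluating them on all 16 inputs. The sketch below does this over GF(2); the data layout (monomials as tuples of variable indices) and the bit ordering (x0 and y0 as least significant bits) are assumptions of this example, since the slide does not state an ordering.

```python
# Each monomial is a tuple of variable indices; () stands for the constant 1.
ANF_V1 = [
    [(), (2,), (0, 1), (1, 3), (2, 3), (0, 1, 3)],                      # y0
    [(), (0,), (2,), (0, 1), (0, 2), (3,), (1, 3), (2, 3), (0, 1, 3)],  # y1
    [(), (1,), (1, 3), (2, 3), (0, 1, 3)],                              # y2
    [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2), (1, 3)],              # y3
]

def eval_anf(anf, x):
    """Evaluate four ANFs on the 4-bit input x and pack y0..y3 into an int."""
    bits = [(x >> i) & 1 for i in range(4)]
    y = 0
    for j, poly in enumerate(anf):
        bit = 0
        for mono in poly:
            bit ^= int(all(bits[i] for i in mono))  # product of the variables
        y |= bit << j
    return y

table_v1 = [eval_anf(ANF_V1, x) for x in range(16)]
is_permutation = sorted(table_v1) == list(range(16))
has_fixed_point = any(x == y for x, y in enumerate(table_v1))
```

Under the assumed bit ordering the resulting table is a permutation without fixed points, consistent with the design criteria stated for v1.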
Approach for Luffa v2
• Approach for Luffa v1 covers a small area
– ANFs of generated Sboxes are very similar
• How to make it larger?
– Generate instructions randomly
– Can a set of random instructions have good properties?
• No, in general
• Most of them are not permutations
• Even if it is a permutation, it tends to be linear (experimentally)
Sbox of Luffa v2
• Search strategy
– Random generation of instructions
• At least 1 AND/OR op. per cycle
– Evaluation at each cycle
• 1) degrees
• 2) # of terms
• Result
– Search took days
– It generates various ANFs
[Figure: instruction diagram on registers r0, r1, r2, r3, r4 with constants 1.]
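The two evaluation measures, degree and number of terms, can be read off the ANF of each output bit. The sketch below computes the ANF with the binary Moebius transform; it is an illustration of the measures and not the search tool itself. The function names are invented here, and the example table is the one obtained by evaluating the "[FYI] ANFs of Luffa v2" slide with x0 and y0 assumed to be the least significant bits.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients.

    Index m of the result is the coefficient of the monomial whose
    variables are the set bits of m (index 0 is the constant term).
    """
    coeffs = list(truth_table)
    step = 1
    while step < len(coeffs):
        for i in range(len(coeffs)):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs

def degree_and_terms(sbox):
    """Return (algebraic degree, number of terms) for output bits y0..y3."""
    result = []
    for j in range(4):
        truth_table = [(y >> j) & 1 for y in sbox]
        monomials = [m for m, c in enumerate(anf_coefficients(truth_table)) if c]
        degree = max((bin(m).count("1") for m in monomials), default=0)
        result.append((degree, len(monomials)))
    return result

# Assumed table: Luffa v2 ANFs evaluated with x0/y0 as least significant bits.
SBOX_V2 = [13, 14, 0, 1, 5, 10, 7, 6, 11, 3, 9, 12, 15, 8, 2, 4]
profile_v2 = degree_and_terms(SBOX_V2)
```

In a random-instruction search, the same two numbers could be recomputed on the intermediate state after each cycle to discard candidates that stay close to linear.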
Summary
| | Approach | Target | Time | Memory | # of instructions |
| Osvik | Table first | Any | a few minutes per “selected” sbox | Large | maybe optimal |
| Luffa v1 | Instruction first | Small | negligible | negligible | optimal |
| Luffa v2 | Instruction first | Large | a few minutes for a “good” sbox | negligible | optimal |
• Note that
– Their destinations are different
– Optimization levels of the tools are different
– Security (algebraic property) is hard to evaluate
Open problems?
• How large an area do our approaches cover?
• Complexity of ANFs
– How many instructions (and cycles) are necessary for a “random” polynomial?
– How to decide whether the ANFs are acceptably good?
– A trade-off between # of instructions and the “complexity” of ANFs may exist
Open problems?
• More theory for Osvik’s decomposition
– Why does his approach work for all Sboxes with only one temporary register?
• Groebner basis theory says that it is possible if there are a lot of temporary registers
• A method to decide if the given decomposition is optimal
Open problems?
• Other approaches?
– Ex. Affine equivalence of Sboxes [EUROCRYPT’03]
• But affine transformations are not free
• How to find the “good” representative
References
• R. Anderson, E. Biham and L. Knudsen, “Serpent: A Proposal for the Advanced Encryption Standard.”
• E. Biham, R. Anderson and L. Knudsen, “Serpent: A New Block Cipher Proposal,” FSE’97.
• A. Biryukov, C. De Canniere, A. Braeken and B. Preneel, “A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms,” Eurocrypt’03.
• J. Daemen, M. Peeters, G. Van Assche and V. Rijmen, “Nessie Proposal: NOEKEON.”
• D. A. Osvik, “Speeding up Serpent,” The 3rd AES Conference, 2000.
[FYI] ANFs of Luffa v2
• y0 = 1 + x0 + x1 + x1x2 + x0x3 + x1x3 + x0x1x3 + x0x2x3,
• y1 = x0 + x0x1 + x1x2 + x3 + x0x3 + x1x3 + x0x1x3 + x0x2x3,
• y2 = 1 + x1 + x0x2 + x1x2 + x0x1x2 + x3 + x1x3 + x0x1x3 + x2x3,
• y3 = 1 + x1 + x2 + x0x2 + x1x2 + x0x1x2 + x0x3 + x1x3 + x0x1x3 + x2x3.