
Algebraic methods in analyzing lightweight cryptographic symmetric primitives
Master's thesis by Michael Walter

Department of Computer Science
Theoretische Informatik – Kryptographie und Computeralgebra (CDC)


Examiner: Prof. Johannes A. Buchmann
Supervisor: Dr. Stanislav Bulygin

Date of submission:

Acknowledgement

I would like to thank my supervisor, Dr. Stanislav Bulygin, and Prof. Johannes Buchmann for giving me the opportunity to write my thesis on this very interesting topic in the field of algebraic cryptanalysis. I especially want to thank Dr. Bulygin for always giving guidance and support while leaving enough room for my own creativity. It was a great pleasure to work with him.

Furthermore, I would like to express thanks to all SAGE developers for their contributions to a very useful software, but in particular to Nathann Cohen and Martin R. Albrecht, who were very helpful in overcoming technical problems.

I also thank the two proofreaders, Franziskus Kiefer and Artjom Kochtchi, for taking the time to read and give detailed comments on what can only be described as very rough versions of this work.

Last, but certainly not least, I want to thank my whole family, and my parents in particular, for their support not only during this thesis, but especially during all the years of my studies.

Declaration for the Master's Thesis

I hereby declare that I have written the present master's thesis without the help of third parties and only with the sources and aids indicated. All passages taken from sources are marked as such. This work has not been submitted in the same or a similar form to any examination authority before.

Darmstadt, 18 April 2012

(Michael Walter)


Abstract

In this work we analyze two lightweight cryptographic primitives, the hash function SPONGENT and the block cipher EPCBC, using algebraic methods. Regarding SPONGENT, we are able to improve on previously known results by finding two semi-free-start collisions for round-reduced SPONGENT-88 with 6 rounds, and prove that no semi-free-start collisions exist for SPONGENT-128 with 6 rounds. For EPCBC we are able to demonstrate practical attacks for both versions, EPCBC-48 and EPCBC-96, for up to 3 rounds. For EPCBC-48 we demonstrate weaknesses and find a theoretical attack up to round 8, which is 25% of the full cipher. We obtain similar results for EPCBC-96 up to round 5 and identify a significant class of weak keys for 6 rounds. Furthermore, we introduce a novel method of optimizing guessing strategies using Mixed Integer Linear Programming and demonstrate its application to the two primitives under investigation, which leads to our results.


Contents

List of Figures
List of Tables

1. Introduction

2. Preliminaries
  2.1. Algebraic Cryptanalysis
  2.2. SAT solving
    2.2.1. SAT solving in Algebraic Cryptanalysis
  2.3. Hash functions
    2.3.1. Sponge Functions
  2.4. Block ciphers
  2.5. Mixed Integer Linear Programming in Cryptanalysis
  2.6. Tools and Hardware

3. Optimization of Guessing Strategies using Mixed Integer Linear Programming
  3.1. Guessing Strategies in Algebraic Cryptanalysis
    3.1.1. Motivation
    3.1.2. Information Propagation
  3.2. Guessing Strategy Optimization as MILP
    3.2.1. Simple Propagation Model
    3.2.2. S-Box adjusted Propagation Model

4. SPONGENT
  4.1. Description
  4.2. Analysis
    4.2.1. Algebraic Representation
    4.2.2. Preimage Attack
    4.2.3. Semi-free-start Collisions

5. EPCBC
  5.1. Description
    5.1.1. Key schedule
    5.1.2. Encryption
  5.2. Analysis
    5.2.1. Algebraic Representation
    5.2.2. Known Plaintext Attack
    5.2.3. Weak Keys

6. Conclusion
  6.1. Summary and Discussion
  6.2. Future Work

A. Algebraic Representation of S-Boxes with Quadratic Equations
  A.1. SPONGENT
  A.2. EPCBC


List of Figures

2.1. Illustration of a sponge (obtained from http://sponge.noekeon.org/)
3.1. One round of the key schedule of EPCBC-48
3.2. Example of state variables
3.3. Example of EPCBC S-Box relation
3.4. Example of conflict in EPCBC network
4.1. Guessing 16 bits of $\pi_{88}$
5.1. Guessing 32 bits of EPCBC-48
5.2. Comparison of $t^{32}_{\text{true}}$ (top) and $t^{32}_{\text{false}}$ (bottom) for EPCBC-48 to brute-force attack
5.3. Guessing 64 bits of EPCBC-96
5.4. Guessing 64 bits of EPCBC-96 considering masks
5.5. Conditions to meet mask requirements


List of Tables

3.1. Semantics of new variables
4.1. Parameters for SPONGENT-88 and SPONGENT-128
4.2. Parameters for $\pi_b$ in SPONGENT-88 and SPONGENT-128
4.3. SPONGENT S-Box (hex)
4.4. Results for preimage attack on SPONGENT-88
4.5. Results for semi-free-start collisions for SPONGENT-88 (left) and SPONGENT-128 (right)
4.6. Comparison of solving times for maximal and random guessing strategies
4.7. Semi-free-start collisions for $\pi_{88}$ with $R = 6$ (hex)
5.1. EPCBC S-Box (hex)
5.2. Standard algebraic known plaintext attack on EPCBC-48 (left) and EPCBC-96 (right)
5.3. Algebraic known plaintext attack on EPCBC-48 with known difference vector
5.4. Algebraic known plaintext attack on EPCBC-48 with guessed difference vector
5.5. Comparison of $t^{32}_{\text{true}}$ (left) and $t^{32}_{\text{false}}$ (right) for EPCBC-96 to brute-force attack
5.6. Information flow in EPCBC S-Box
5.7. Comparison of $t^{32}_{\text{true}}$ (left) and $t^{32}_{\text{false}}$ (right) for weak keys of EPCBC-96 to brute-force attack


1 Introduction

Recent years have seen a drastic increase in the importance of embedded systems. More and more objects in the world around us are being equipped with some circuitry, giving rise to a seemingly endless potential of intelligent and convenient solutions to today’s problems. In consequence, small and cheap hardware, like RFID tags or integrated circuit printing, is being developed. However, for many applications these low-resource devices need to be able to provide security properties, like privacy or authenticity. This is achieved by employing cryptographic algorithms. Many of these algorithms have not been designed to be implemented on small devices and are thus not suitable for this purpose. For this reason, the relatively young field of lightweight cryptography tries to cater to that need by designing cryptographic primitives that can be implemented with a small hardware footprint, while still preserving security.

The main challenge in lightweight cryptography, namely to reduce the hardware footprint as far as possible while maintaining acceptable performance and security levels, requires careful analysis of every new cryptosystem that is being proposed. While the proposals tend to get smaller and smaller in terms of hardware footprint, one needs to make sure that these primitives still meet the claimed security properties. As it turns out, algebraic cryptanalysis seems to be a powerful approach to the analysis of lightweight primitives, since the restrictions on the hardware footprint also pose restrictions on the algebraic complexity of these primitives.

In this work we will analyze two recently proposed lightweight primitives, the hash function SPONGENT [8] and the block cipher EPCBC [47]. Both follow the design principles of the PRESENT [9] block cipher. Since we are not able to break the full primitives, we will concentrate on reduced versions of these cryptosystems. However, we will always assume the full state width of a system and only apply the reductions to the number of rounds the primitive iterates. Our results suggest that SPONGENT is secure against pure algebraic attacks, while EPCBC does exhibit some weaknesses that should be generally avoided in the design of block ciphers.

However, our contribution goes beyond the analysis of two specific primitives. We introduce a novel technique to optimize guessing strategies in algebraic cryptanalysis using Mixed Integer Programming. Guessing strategies play an important role in algebraic cryptanalysis to estimate the complexity of theoretical attacks or demonstrate certain weaknesses. We show how these new methods can be applied to SPONGENT and EPCBC, and how they can even be used to identify classes of weak keys for EPCBC. While we concentrate on PRESENT-like primitives, this approach is applicable, and should be seen, in a more general context.

The next chapter outlines some related work and discusses some preliminaries of our work. In Chapter 3 we explain our approach to optimizing guessing strategies. In the following chapters, 4 and 5, we show how to apply the method to SPONGENT and EPCBC respectively. Finally, in Chapter 6 we sum up our results.


2 Preliminaries

This chapter gives an overview of the preliminaries of this work and puts it in the context of related work. Section 2.1 describes the basics of algebraic cryptanalysis and Section 2.2 briefly introduces SAT solving and its application in algebraic cryptanalysis. The two following sections, 2.3 and 2.4, introduce the two kinds of primitives of which examples are considered in this work. Since a key technique of our approach is the application of Mixed Integer Programming, a short overview of its applications in cryptanalysis is given in Section 2.5. Finally, Section 2.6 lists the tools and hardware used in the context of this work.

2.1 Algebraic Cryptanalysis

The idea of algebraic cryptanalysis is to relate the inputs and outputs of a cryptographic primitive by a set of polynomial equations. It can be argued that this idea dates back as far as 1949 to a frequently cited quote by Claude Shannon [41]. However, only in the past decade has it emerged as a specific cryptanalytic method with its successful application to several cryptographic primitives [17, 18, 24, 25]. The advantage of algebraic cryptanalysis is its broad applicability, since any cryptographic primitive can be described by a set of polynomial equations. Furthermore, algebraic methods seem to be very suitable to be combined with other attacks, such as side-channel attacks [39] or differential cryptanalysis [2]. On the downside, applications to non-lightweight block ciphers, one of the most important kinds of cryptographic primitives, have been of limited success so far, due to their high algebraic complexity.

The polynomial equations are usually constructed over GF(2), but using other fields can also be suitable for certain primitives [11, 16]. By adding sufficient information to the polynomial system, one can deduce more information from it by solving the system. For example, in a typical known plaintext attack on an encryption scheme, the adversary would set the variables corresponding to the inputs and outputs to the values of the known plaintext and ciphertext, respectively, and attempt to find a solution to the system. If the known plain-/ciphertext pair uniquely determines the secret key of the primitive, a solution to the polynomial system immediately recovers the key.
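To make the notion of a polynomial description over GF(2) concrete, the following minimal sketch computes the algebraic normal form (ANF) of each output bit of a 4-bit S-Box via the Möbius transform. It is an illustration only: the S-Box values are assumed to be those of the PRESENT specification, the function names are hypothetical, and the explicit ANF shown here is not the implicit quadratic representation used in Appendix A.

```python
# Assumption: the 4-bit PRESENT S-Box, used purely as an illustrative example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def anf(truth_table):
    """Moebius transform: truth table of a Boolean function -> ANF coefficients.

    coeffs[m] == 1 means that the monomial prod_{i in bits(m)} x_i occurs.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        for x in range(len(coeffs)):
            if x & (1 << i):
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the GF(2) polynomial at input x (bit i of x is the value of x_i)."""
    return sum(c for m, c in enumerate(coeffs) if (m & x) == m) % 2


def output_bit_polynomials(sbox):
    """One ANF coefficient list per output bit of the S-Box."""
    n_out = max(sbox).bit_length()
    return [anf([(y >> j) & 1 for y in sbox]) for j in range(n_out)]


def to_string(coeffs):
    """Human-readable form of a polynomial, e.g. 'x0 + x1*x2'."""
    terms = []
    for m, c in enumerate(coeffs):
        if c:
            names = ["x%d" % i for i in range(m.bit_length()) if m & (1 << i)]
            terms.append("*".join(names) or "1")
    return " + ".join(terms) or "0"


POLYS = output_bit_polynomials(SBOX)
for j, p in enumerate(POLYS):
    print("y%d = %s" % (j, to_string(p)))
```

Each printed line is one polynomial equation relating an output bit to the input bits; substituting known input and output values into such equations is what turns the description into a solvable system.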

The main obstacle in algebraic cryptanalysis is the computationally expensive solving of the polynomial system. Typically, these systems are very large for practical cryptographic primitives and solving them is often not easier than attacking the primitive by brute-force. Still, algebraic cryptanalysis has yielded some very interesting results on some primitives, especially in lightweight cryptography. In this work we will also focus on lightweight cryptography, where the algebraic complexity is constrained by considerations regarding hardware footprint and power consumption.

To solve a large polynomial system, several techniques may be employed, which have different strengths and drawbacks. For example, using Gröbner basis techniques is usually rather slow in comparison to other techniques, but has more potential to contribute to the understanding of the polynomial system and to employ algebraic tricks, as described for example in [45]. In contrast, we will mainly focus on SAT solving in this work.

2.2 SAT solving

The satisfiability problem (SAT) is the classical NP-complete problem and has thus received intensive research for decades. SAT consists of deciding if a given boolean formula is satisfiable, i.e. if there is an assignment for the variables so that the formula evaluates to true. With SAT being NP-hard, every problem in NP can be reduced to it. With this in mind, it is natural that SAT solving has a vast variety of applications, ranging from the industrial area, e.g. circuit design or product configuration, to research, e.g. algebraic cryptanalysis or automatic theorem proving. Not surprisingly, its broad applicability still motivates a range of current research, as can be seen by the annual competitions¹ and conferences, e.g. [40].

The intensive research in the area of SAT solving has resulted in a range of solvers². Most of them assume that an instance of the SAT problem, i.e. a boolean formula, is given in Conjunctive Normal Form (CNF), meaning that it comprises a set of clauses, all related by conjunctions. Each clause consists of a set of literals and each literal is either a variable or a negated variable. The literals are related by disjunctions. A small example of a formula in CNF is given in Equation 2.1.

$$(x_1 \lor x_2 \lor \lnot x_3) \land (\lnot x_2 \lor x_3) \land (x_2 \lor x_3) \tag{2.1}$$

The solver we use in this work, CryptoMiniSat [42], is a so-called conflict-driven solver based on the DPLL algorithm [21, 22], proposed in the early 1960s. This algorithm searches the tree spanning all possible assignments of the variables, where every node corresponds to a variable and every branch to a possible assignment of this variable, as efficiently as possible using depth-first backtracking. The efficiency of the algorithm is essentially based on the repeated application of two techniques: unit propagation and pure literal elimination. The former means that a clause containing only one literal can only be satisfied when the corresponding variable is assigned accordingly. The latter refers to the fact that variables appearing exclusively as positive or exclusively as negative literals in the whole formula can be assigned in such a way that all clauses containing them are satisfied, so that these clauses can be eliminated from the problem. When successively assigning variables to traverse the tree, a clause might be found that is not satisfiable anymore. This is called a conflict. Upon finding such a conflict, conflict-driven solvers “learn” a new clause, a so-called conflict clause, to guide the further search.
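The two techniques named above can be sketched in a few lines. The following is a plain recursive DPLL procedure with unit propagation and pure literal elimination; it is an illustrative assumption of how such a search may look and deliberately omits clause learning and all heuristics, so it is not how CryptoMiniSat is implemented. Clauses are sets of signed integers, where `-v` stands for the negation of variable `v`.

```python
def simplify(clauses, lit):
    """Set literal `lit` to true: drop satisfied clauses, shrink the others."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                    # clause is satisfied, remove it
        out.append(clause - {-lit})     # the opposite literal is now false
    return out


def dpll(clauses, assignment=None):
    """Return a set of true literals satisfying all clauses, or None."""
    assignment = set() if assignment is None else set(assignment)
    clauses = list(clauses)
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                 # conflict: an empty clause remains
        # Unit propagation: a one-literal clause forces that literal.
        forced = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if forced is None:
            # Pure literal elimination: literal occurs with one polarity only.
            lits = {l for c in clauses for l in c}
            forced = next((l for l in sorted(lits) if -l not in lits), None)
        if forced is None:
            break
        assignment.add(forced)
        clauses = simplify(clauses, forced)
    if not clauses:
        return assignment               # every clause is satisfied
    # Branch on some variable, depth first, backtracking on failure.
    var = abs(next(iter(clauses[0])))
    for lit in (var, -var):
        result = dpll(simplify(clauses, lit), assignment | {lit})
        if result is not None:
            return result
    return None


# Equation 2.1: (x1 v x2 v -x3) ^ (-x2 v x3) ^ (x2 v x3)
EQ_2_1 = [frozenset(c) for c in ([1, 2, -3], [-2, 3], [2, 3])]
print(dpll(EQ_2_1))
```

On Equation 2.1 this sketch never has to branch: x1 is a pure literal, and once the first clause is removed x3 becomes pure as well, so x2 remains unconstrained.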

Many modern solvers employ a range of other heuristics that have been empirically shown to improve the search. We deemed CryptoMiniSat to be a good choice, since it is specifically designed for cryptanalytical problems and it has won the SAT-Race 2010³.

¹ http://www.satcompetition.org/
² For a (non-exhaustive) list, see http://www.cril.univ-artois.fr/SAT11/.
³ http://baldur.iti.uka.de/sat-race-2010/results.html


2.2.1 SAT solving in Algebraic Cryptanalysis

The idea behind employing SAT solvers for algebraic cryptanalysis is to convert the set of polynomial equations into an equivalent SAT instance, feed it into a SAT solver and translate a solution for the SAT problem into a solution of the polynomial system. So far, this has been one of the most efficient methods for algebraic cryptanalysis, not least because this approach is able to take advantage of the vast amounts of past and present research on SAT solving.

One crucial part of this approach is the conversion of the algebraic representation to a SAT instance. In principle, there are two different approaches. Both of them consider each polynomial equation individually and translate it into a set of clauses. The final SAT instance is the conjunction of all those clauses. The two methods of converting an equation into clauses of a CNF that have been proposed so far are:

• In [3] a technique is described that converts the polynomials directly using their structure. While this is a very fast approach, most of the time it involves introducing many new variables and often yields a comparatively large representation.

• The second technique, described in [12], converts the polynomials by considering their truth table. This usually results in a more compact description of the SAT instance, but the conversion is not as efficient as with the former.

We want to point out though that the inefficiency in the conversion is not the bottleneck of algebraic cryptanalysis, so the efficiency advantage of the former method is hardly relevant. On the other hand, in [14] both methods were tested and the experiments suggested that the latter method yields better results for solving the final instances. Because of this we used the latter method in this work. For further details on the conversion and SAT solving we refer to [3].
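The truth-table idea behind the second method can be illustrated as follows. This is a naive sketch under our own assumptions, not the converter of [12]: it enumerates all assignments of the variables of one polynomial equation p = 0 over GF(2) and emits one clause for every assignment that violates the equation, without any clause minimization. The function name and the encoding of monomials as tuples of variable indices are hypothetical.

```python
from itertools import product


def poly_to_cnf(monomials, variables):
    """CNF clauses equivalent to the equation sum(monomials) = 0 over GF(2).

    monomials: list of tuples of variable indices; () is the constant 1.
    variables: list of the (positive) variable indices that occur.
    Returns clauses as lists of signed integers (DIMACS style).
    """
    clauses = []
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        result = sum(all(env[v] for v in mono) for mono in monomials) % 2
        if result == 1:
            # This assignment violates p = 0, so one clause forbids it:
            # at least one variable must differ from its value in `env`.
            clauses.append([-v if env[v] else v for v in variables])
    return clauses


# x1*x2 + x3 = 0, i.e. x3 equals the AND of x1 and x2.
AND_GATE = poly_to_cnf([(1, 2), (3,)], [1, 2, 3])
for clause in AND_GATE:
    print(clause)
```

The number of clauses grows exponentially in the number of variables of a single equation, which is unproblematic here because S-Box equations of lightweight primitives involve only a handful of variables each.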

2.3 Hash functions

In general, a hash function $h : \{0,1\}^* \to \{0,1\}^n$ maps strings of arbitrary length to strings of fixed length. A cryptographic hash function should behave like a random oracle. For every query a random oracle returns a truly random output from its output domain, with the exception that it always returns the same output for the same query. The term “a function should behave like a random oracle” means that the function should not have any weaknesses the random oracle does not have. In this context, there are several security properties of interest. If an algorithm is found that demonstrates that any of these properties can be broken more efficiently than for a random oracle, the hash function is considered broken, even if this algorithm is practically infeasible.

Preimage resistance
A hash function $h : \{0,1\}^* \to \{0,1\}^n$ is preimage resistant, if for any given value $t \in \{0,1\}^n$ it is computationally infeasible to find a preimage $m \in \{0,1\}^*$ with $h(m) = t$ with non-negligible probability. A brute-force attack on a random oracle has an expected time complexity of $2^{n-1}$.


Second preimage resistance
A hash function $h : \{0,1\}^* \to \{0,1\}^n$ is second preimage resistant, if for any given value $m_1 \in \{0,1\}^*$ it is computationally infeasible to find a second preimage $m_2 \in \{0,1\}^*$, $m_2 \neq m_1$, with $h(m_1) = h(m_2)$ with non-negligible probability. A brute-force attack on a random oracle also has an expected time complexity of $2^{n-1}$.

Collision resistance
A hash function $h : \{0,1\}^* \to \{0,1\}^n$ is collision resistant, if it is computationally infeasible to find $m_1, m_2 \in \{0,1\}^*$ with $m_1 \neq m_2$ and $h(m_1) = h(m_2)$ with non-negligible probability. A brute-force attack on a random oracle has an expected time complexity of $2^{n/2}$.
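The generic collision bound can be observed experimentally on a deliberately tiny output length. The sketch below is an illustration under stated assumptions: SHA-256 truncated to n bits stands in for a random oracle, the message encoding is arbitrary, and the helper names are hypothetical. It hashes distinct messages until two outputs coincide and reports how many hash calls were needed, which for small n is typically on the order of $2^{n/2}$.

```python
import hashlib
from itertools import count


def h(message: bytes, n_bits: int) -> int:
    """SHA-256 truncated to its first n_bits bits (stand-in for a random oracle)."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int):
    """Birthday search: hash distinct messages until two outputs coincide.

    Returns the two colliding messages and the number of hash calls made.
    """
    seen = {}
    for i in count():
        m = i.to_bytes(8, "big")        # distinct counter values as messages
        t = h(m, n_bits)
        if t in seen:
            return seen[t], m, i + 1
        seen[t] = m


m1, m2, calls = find_collision(16)
print(m1.hex(), m2.hex(), calls)
```

By the pigeonhole principle the search terminates after at most $2^n + 1$ calls; the table of previously seen outputs is what trades memory for the square-root speed-up over a preimage search.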

Most hash functions use a finite internal state to compute the output to a certain input. Usually this state is initialized with a fixed value, IV, given in the specification. For cryptanalytic purposes it is often of interest to explore the possibility of finding a collision for some arbitrary initial value of the internal state that is fixed across the different applications of the hash function. If such a collision is found, it is usually called a semi-free-start collision.

Semi-free-start collision
For a hash function $h_{IV} : \{0,1\}^* \to \{0,1\}^n$ using an inner state initialized with $IV$, the values $IV'$, $m_1$, and $m_2$ constitute a semi-free-start collision if $h_{IV'}(m_1) = h_{IV'}(m_2)$ and $m_1 \neq m_2$.

The availability of hash functions that have the above properties is of significant importance in the field of cryptography. The properties are relied upon in numerous protocol designs and the corresponding security proofs. For example, many signature schemes rely heavily on collision or second preimage resistant hash functions to be able to provide unforgeability. Furthermore, in many authentication schemes hash values are used and the scheme would be rendered useless if the hash function did not provide the above security properties.

As pointed out in [6], most practical hash functions use an internal state that is updated according to the next message block and the current value of the state, often using a compression function $f : \{0,1\}^m \to \{0,1\}^n$ with $m > n$. A popular design pattern is the well-known Merkle-Damgård construction [20], which can be used to turn a collision resistant compression function into a collision resistant hash function. However, due to the finiteness of the internal state, such a hash function will always exhibit weaknesses based on collisions in this state and can thus never achieve the same security as a random oracle.

2.3.1 Sponge Functions

A recent proposal introduced the notion of sponge functions as an alternative, more realistic reference model for hash functions in contrast to the random oracle [6]. The computation of a sponge function runs in two phases, the absorbing phase and the squeezing phase. A sponge function (or sponge) has a state of size b bits, which consists of an outer state of size r (called rate) and an inner state of size c such that b = c + r. The state is updated using a permutation⁴ $f : \{0,1\}^b \to \{0,1\}^b$. During the absorbing phase the input is split into r-bit blocks, denoted here by $p_i$, and successively added to the first r bits of the state, interleaved with applications of f. During the squeezing phase the output is produced in r-bit blocks, denoted here by $z_i$, by outputting the first r bits of the state, also interleaved with applications of f. The computation of a sponge is illustrated in Figure 2.1.

Figure 2.1.: Illustration of a sponge (obtained from http://sponge.noekeon.org/)

For security reasons it is required that the input does not end with the all-zero block. This can be achieved by padding the input accordingly before applying the sponge (cf. Section 4.1).
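The absorbing and squeezing phases can be sketched as follows. This is a toy illustration under explicit assumptions: the permutation `toy_permutation` is hypothetical and has no cryptographic strength, the "first r bits" are taken to be the most significant bits of an integer state, blocks are added by XOR, and padding is left to the caller.

```python
def toy_permutation(state: int, b: int) -> int:
    """Hypothetical b-bit permutation (requires b > 5); NOT cryptographically secure."""
    mask = (1 << b) - 1
    for _ in range(4):
        # Affine map with odd multiplier: a bijection modulo 2^b.
        state = (state * 0x9E3779B1 + 1) & mask
        # Rotate left by 5 bit positions within b bits.
        state = ((state << 5) | (state >> (b - 5))) & mask
    return state


def sponge(blocks, r, c, n_out_blocks, f=toy_permutation):
    """Absorb r-bit input blocks, then squeeze n_out_blocks r-bit output blocks."""
    b = r + c
    state = 0
    # Absorbing phase: add each block to the outer part, then apply f.
    for p in blocks:
        assert 0 <= p < (1 << r)
        state ^= p << c
        state = f(state, b)
    # Squeezing phase: output the outer part, applying f between outputs.
    out = []
    for i in range(n_out_blocks):
        if i > 0:
            state = f(state, b)
        out.append(state >> c)
    return out


# Rate r = 8 bits, capacity c = 16 bits, two output blocks.
print(sponge([0x01, 0x02, 0x03], r=8, c=16, n_out_blocks=2))
```

Note that the input blocks only ever touch the outer r bits directly; the inner c bits are influenced solely through f, which is why collisions in the inner state govern the security of the construction.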

Inner collision
Let $m_1$ and $m_2$ be two messages with $m_1 \neq m_2$. Furthermore, let $S_1$ and $S_2$ be the inner states of size c that are obtained by absorbing $m_1$ and $m_2$ by a sponge respectively. Then, the pair $(m_1, m_2)$ is called an inner collision for the sponge, iff $S_1 = S_2$.

In [7] it was shown that a sponge behaves like a random oracle in the absence of inner collisions. Of course, due to the finiteness of the inner state, inner collisions must exist and, in theory, can be found by brute-force. Consequently, if the output length n is larger than the size of the inner state, c, the security of a sponge is dominated by c as opposed to the output length n, as in the random oracle. On the other hand, the sponge offers a similar security with regards to the parameter c as the random oracle with regards to the output length n.

It is possible to construct specific hash functions using the sponge framework by fixing the parameters b, r, c, and, most importantly, the permutation f, and truncating the output to the desired output length n. Following the logic above, to achieve the maximum of security f should be such that it does not exhibit any properties that enable an adversary to find an inner collision more efficiently than a generic attack with time complexity of $2^{c/2}$. Using this strategy to construct a hash function is denoted as hermetic sponge strategy [5] and we will denote hash functions designed using this approach as sponge hash functions.

⁴ Strictly speaking, f does not need to be a permutation, but can be any transformation. However, in the context of this work, only permutations are considered, so we neglect other transformations here.

The authors of [5] used exactly this strategy to design Keccak, one of the five SHA-3 finalists. In [33] an attempt was made to analyze the preimage resistance of Keccak using SAT-based algebraic methods, similar to the methods in this work, with limited success. Preimages were found only for greatly reduced versions. Similarly, Dinur et al. were able to find collisions and near-collisions on Keccak-224 and Keccak-256 only for greatly round-reduced versions by combining algebraic and differential techniques [23]. We would like to point out, though, that the hardware footprint of Keccak is far from being considered lightweight (which was not a primary design target). To the best of our knowledge, no lightweight sponge hash function has been analyzed using algebraic methods so far.

2.4 Block ciphers

A block cipher $E_k : \{0,1\}^b \to \{0,1\}^b$ takes a plaintext p of a certain block size b and a secret key k as input and maps it to an equally sized ciphertext c. The ciphertext is then decipherable with the secret key. While the main purpose of block ciphers is obviously to encrypt messages for secure transmission, there are several other applications of block ciphers. Specifically, block ciphers can be used to construct other cryptographic primitives. For example, operating a block cipher in OFB or CTR mode will constitute a stream cipher. Furthermore, block ciphers can be used to construct compression functions, which in turn can be used to build cryptographic hash functions by using the Merkle-Damgård construction [20], as already mentioned in the previous section. These are only two of many further applications block ciphers have, so their importance in the field of cryptography is undeniable.

One of the most well-known block ciphers to date is DES [44], developed in the early 1970s by IBM. By now, it is insecure in its original version due to the small key size. However, in a revised version, denoted as TripleDES [4], it is still considered practically secure, even though theoretical attacks have been found [29]. In recent years, DES has been largely replaced by the much more secure AES [37]. There have been several attempts to attack AES algebraically [16, 19], none of which have revealed weaknesses of the full cipher so far. However, there has been some progress in combining algebraic techniques with side-channel analysis to obtain practical attacks on certain implementations [32].

In recent work [11], Bouillaguet et al. proposed an attack on AES which seems to be related to our methods. They also construct a system of polynomial equations for the cipher and substitute a set of known values, e.g. a known plain-/ciphertext pair, into the system. They then attempt to find a minimal number of variables in the system that need to be guessed in order to verify or falsify the guess. For this, the authors also try to take advantage of information flow as heavily as possible. In this respect, they also optimize guessing strategies for guess-and-determine attacks. However, our approach differs mainly in two ways. Firstly, we combine the optimized strategies with algebraic attacks and thus our optimization problem is of a different, even though similar, nature: we restrict the number of guesses and try to maximize the information flow. Secondly, we model the problem as a Mixed Integer Linear Program and solve it using an off-the-shelf solver on comparatively cheap hardware, while Bouillaguet et al. designed their own solving algorithms, which they implemented themselves and tested on platforms with several hundred cores.

Especially in the area of lightweight cryptography new block cipher designs are proposed constantly. A prominent example is the PRESENT block cipher [9], which has inspired a range of other PRESENT-like block ciphers like PRINTcipher [26] or EPCBC [47], and even hash functions like SPONGENT [8]. PRESENT-like block ciphers have in common that they usually iterate a number of similar rounds, each consisting of a key addition layer, a substitution layer and a bit permutation layer. While the bit permutation and key addition layers are in principle the same among all of these block ciphers, they usually differ in the S-Box they employ for the substitution layer and in the key scheduling algorithm. An S-Box is a non-linear operation applied to small blocks of the state individually.
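The three layers of a PRESENT-like round can be sketched on a scaled-down state. Everything specific in this example is an assumption made for illustration: the 16-bit state width, the bit permutation rule (a 16-bit analogue of a PRESENT-style permutation), the use of the PRESENT S-Box values, and the function names. It does not reproduce the round function of any cipher analyzed in this work.

```python
# Assumption: the 4-bit PRESENT S-Box values, used as an example S-Box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
WIDTH = 16  # hypothetical toy state width in bits


def perm(i):
    """Toy bit permutation: bit i moves to position perm(i)."""
    return i if i == WIDTH - 1 else (i * (WIDTH // 4)) % (WIDTH - 1)


def add_round_key(state, round_key):
    """Key addition layer: XOR the round key onto the state."""
    return state ^ round_key


def sbox_layer(state):
    """Substitution layer: apply the S-Box to every 4-bit block of the state."""
    out = 0
    for nibble in range(WIDTH // 4):
        out |= SBOX[(state >> (4 * nibble)) & 0xF] << (4 * nibble)
    return out


def p_layer(state):
    """Bit permutation layer: pure wiring, no non-linearity."""
    out = 0
    for i in range(WIDTH):
        out |= ((state >> i) & 1) << perm(i)
    return out


def round_function(state, round_key):
    """One round: key addition, then substitution, then bit permutation."""
    return p_layer(sbox_layer(add_round_key(state, round_key)))


print(hex(round_function(0x1234, 0xBEEF)))
```

Only the substitution layer is non-linear over GF(2); key addition and bit permutation are linear, which is why the algebraic description of such a round is dominated by the equations of its S-Boxes.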

While striving for ever smaller hardware footprints, security margins are reduced to a bare minimum in lightweight cryptography, and sometimes even pushed too far, as the example of PRINTcipher shows. There have been several more or less successful attacks on PRINTcipher [1, 14, 15, 27], some of them employing algebraic cryptanalysis [14]. On PRESENT itself pure algebraic attacks were not quite as successful, as they were only able to break round-reduced versions [36]. Examples like these demonstrate the necessity to carefully analyze every new proposal of lightweight block ciphers.

In the context of block ciphers there are several interesting attack models that describe how much information a potential adversary is able to obtain. We give informal definitions for two models that are most relevant to and sufficient for this work. More elaborate definitions can be found in any introductory literature.

Known plaintext attack
In this model it is assumed that a number of random plain-/ciphertext pairs are available to the adversary. If he is able to recover the key or part of the key more efficiently than brute-forcing it, the block cipher is considered broken under this attack model.

Chosen plaintext attack
In this model it is assumed that the adversary is able to choose different plaintexts before his attack and obtain the corresponding ciphertexts. If he is able to recover the key or part of the key more efficiently than brute-forcing it, the block cipher is considered broken under this attack model.

2.5 Mixed Integer Linear Programming in Cryptanalysis

A Mixed Integer Linear Program (MILP) maximizes or minimizes a linear objective function under linear constraints, but, as opposed to Linear Programs, some or all of the decision variables do not have to be in a continuous space but may be restricted to integral values. More formally, for a given $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $p \in \{0, \dots, n\}$ an MILP is defined as:

$$\max_x \{ c^T x \mid Ax \le b,\ x \in \mathbb{Z}^p \times \mathbb{R}^{n-p} \} \qquad (2.2)$$

The problem has been proven to be NP-hard, but there are several solvers, like CPLEX⁵, GUROBI⁶ or Coin-OR⁷, that can solve MILPs quite efficiently up to a certain size.

Even though MILPs bear a lot of potential, their use in cryptanalysis has been very limited so far. In [35] Mouha et al. proposed a way to prove security bounds against linear and differential cryptanalysis by modelling the minimization of active S-Boxes as an MILP. Also, MILP was used in [15] to further explore the potential of the invariant coset attack on the PRINTcipher, originally introduced by Leander et al. [27]. In this context, an MILP was formulated describing a 0-1 polyhedron whose vertices each identify a class of weak keys.

In algebraic cryptanalysis MILPs have been considered in two ways so far. In [10] Borghoff et al. used MILPs to solve the polynomial system for the stream cipher BIVIUM in a similar fashion as SAT solvers are used, i.e. converting the polynomial system to an equivalent MILP and solving it with an MILP solver. The reported results are, however, significantly slower than the ones obtained with SAT solvers. On the other hand, Oren et al. used pseudo-Boolean optimization, a certain kind of MILP, to introduce error tolerance to algebraic side-channel attacks [38], which were originally proposed in [39].

In contrast, we employ MILP to optimize guessing strategies with regard to maximal information propagation.

2.6 Tools and Hardware

In the context of this work most implementations, like equation generators, were realized using the open source mathematics software system SAGE [43]. The polynomial systems were converted to SAT instances using the PolyBoRi CNF converter implemented by Michael Brickenstein [13]. Solving the SAT instances was done using Mate Soos' CryptoMiniSat [42]. When applying Mixed Integer Linear Programming, as outlined in the next chapter, the modeling and solving was carried out using the SAGE interface to IBM ILOG CPLEX V12.1, which was run under the academic license. Finally, for the application of the Double Description Method, also outlined in the next chapter, Fukuda's implementation cddlib⁸ was used, also through the SAGE interface.

All results regarding timings stem from experiments run on a test server with 16 virtual processors⁹ of type Quad-Core AMD Opteron(tm) Processor 8356 operating at a frequency of 2.3 GHz. The server has a total of 128 GB of RAM, but only fractions of this were used in the experiments. The server operates using a 64-bit Linux kernel in version 2.6.32.

5 http://www-01.ibm.com/software/integration/optimization/cplex-optimizer
6 http://www.gurobi.com
7 http://www.coin-or.org
8 http://www.ifor.math.ethz.ch/~fukuda/cdd_home/index.html
9 We state the number of processors purely for completeness; multithreading was not actually used for the experiments that involved timings.


3 Optimization of Guessing Strategies using Mixed Integer Linear Programming

In this chapter we introduce a novel technique to optimize guessing strategies in algebraic cryptanalysis employing Mixed Integer Linear Programming. To this end, Section 3.1 briefly introduces the concept of guessing in algebraic cryptanalysis. Subsequently, Section 3.2 illustrates how MILP can be utilized to optimize guessing strategies.

3.1 Guessing Strategies in Algebraic Cryptanalysis

As outlined in the previous chapter, a drawback of algebraic cryptanalysis is the lack of efficiency. Like exhaustively searching the whole search space for a targeted secret, i.e. a brute-force attack, solving systems of algebraic equations over GF(2) has an exponential time complexity in general. Even the corresponding decision problem (to decide if a given set of algebraic equations has a solution) is known to be NP-complete. Consequently, analyzing primitives that yield a fairly large polynomial system is often practically infeasible.

3.1.1 Motivation

Despite the problems stemming from the lack of efficiency of algebraic cryptanalysis, results about many primitives can often still be obtained by employing guessing strategies. For this, a certain number of variables of the polynomial system are fixed to either the correct values or an incorrect guess, thus yielding a practically solvable system, as for example in [14]. To explore the implications of results obtained by following a guessing strategy, consider a primitive with a secret of length n. We denote the time required to find a solution of the corresponding polynomial system as $t^{n}_{true}$. To denote the time required to prove that such a system does not have a solution, we use $t^{n}_{false}$. Finally, we will denote the time needed to execute the application of the primitive by $t_{eval}$. Typically, results from guessing strategies involve one of two scenarios. In the first scenario we assume part of the secret is revealed to the adversary by some oracle. For a good cryptographic primitive there should not be a way to recover the rest of the secret faster than by brute-force. This means, if we can show that for a primitive with a secret of length n in bits and a set of k revealed secret bits

$$t^{n-k}_{true} < 2^{n-k-1} \cdot t_{eval} \qquad (3.1)$$

we have demonstrated a weakness in the primitive. Practical examples of this scenario are side-channel attacks, where additional information is available to the adversary due to the exploitation of physical properties of an implementation.


In the second scenario guessing bits are used to estimate the complexity of an algebraic attack without assuming any knowledge of the secret. Any algorithm that recovers the secret faster than a brute-force attack is considered to be an attack, even if it is practically infeasible, in which case it is of theoretical nature. So, assume that we can show, for example by sufficiently sampling $t^{n-k}_{false}$ and $t^{n-k}_{true}$, that

$$2^{k-1} \cdot t^{n-k}_{false} + t^{n-k}_{true} < 2^{n-1} \cdot t_{eval} \qquad (3.2)$$

or, equivalently,

$$t^{n-k}_{false} + \frac{t^{n-k}_{true}}{2^{k-1}} < 2^{n-k} \cdot t_{eval} \qquad (3.3)$$

holds for a given primitive. Note that the left hand side of Inequality 3.2 is the expected time needed to find a solution to the polynomial system by repeatedly guessing k out of the n secret bits and solving the reduced system. It follows that in this case we have found a theoretical attack for the given primitive employing algebraic techniques, even though we might not be able to estimate $t^{n}_{true}$.

The distinction of these two scenarios is necessary in our case due to the fact that $t^{n}_{true}$ and $t^{n}_{false}$ can differ significantly when employing SAT solvers. If a system has a solution, the SAT solver will terminate when the solution is found. On average this will be the case when half the search space has been explored. If, however, a system does not have a solution, one of two cases might occur. Either the solver is able to find a contradiction in the system and the search will terminate very quickly, or the solver has to explore the whole search space to prove that there is no solution.
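The comparison in Inequality 3.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the timings passed in would have to come from actual sampling of the solver, which is not reproduced here.

```python
def expected_guessing_cost(k, t_false, t_true):
    """Left-hand side of Inequality 3.2.

    On average 2^(k-1) wrong guesses (each costing t_false) are refuted
    before the correct guess is solved (costing t_true).
    """
    return 2 ** (k - 1) * t_false + t_true


def brute_force_cost(n, t_eval):
    """Right-hand side of Inequality 3.2: expected cost of exhaustive search."""
    return 2 ** (n - 1) * t_eval


def is_theoretical_attack(n, k, t_false, t_true, t_eval):
    """True if guessing k of n secret bits beats brute force on average."""
    return expected_guessing_cost(k, t_false, t_true) < brute_force_cost(n, t_eval)
```

With made-up numbers (not measurements from this work), `is_theoretical_attack(48, 32, 0.01, 2.0, 1e-6)` holds, while raising the refutation time to `0.5` makes the same strategy slower than brute force. This shows how strongly the result depends on $t^{n-k}_{false}$.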

3.1.2 Information Propagation

Many cryptographic schemes, for example PRESENT-like block ciphers, are constructed from iterative rounds, for example during the encryption and/or the key schedule, where the new state is computed with certain operations from the old state. It follows that some information about a state can be deduced when information about the previous state is known. In algebraic cryptanalysis the states usually correspond to certain variables in the polynomial system, so information about some variables can propagate to other variables. For example, consider Figure 3.1 which depicts one round of the key schedule in EPCBC-48 (neglecting the constant addition, which is irrelevant here for reasons that will become apparent later). The figure shows a set of S-Boxes at the top and at the bottom, and in between the permutation layer is illustrated. The old state is the input of the S-Boxes at the top and the new state is the input of the S-Boxes at the bottom. If we know the input of one of the S-Boxes at the top, we obviously also know the output of this specific S-Box and thus four of the bits of the new state. On the other hand, if we had guessed four bits at random positions at four different S-Boxes of the input state, we could not have deduced any information about the new state. It follows that revealing the values of different bits yields different information gain. The same holds for guesses: when guessing bits at the input of the round this might impose values for bits in the new state and thus decrease the number of unknowns in the system. Since the difference between actually known bits, i.e. bits revealed to the adversary, and guessed bits is irrelevant for information propagation, we will use these terms interchangeably in the context of information propagation.


Figure 3.1.: One round of the key schedule of EPCBC-48

The difference in the number of inferable bits is even magnified when considering more rounds and a larger number of known or guessed bits. So, guessing k variables of a polynomial system can reduce the size of the system by far more than only k variables. Our experiments, outlined in the next chapters, show that solvers can benefit from this propagated information: the more information is inferable from the guesses, the shorter the solving times. Although we do not claim that the amount of propagated information is the only factor that influences the solving times for different guessing strategies, our experiments suggest that it is an important one. Consequently, our goal is to find guessing strategies that maximize the information gain in order to optimize solving times for the reduced polynomial systems.

3.2 Guessing Strategy Optimization as MILP

In this section we introduce our models of the information propagation problem as MILP that can be used to tackle the problems employing corresponding off-the-shelf solvers. In this work PRESENT-like primitives are considered. As outlined in Section 2.4, those primitives exhibit a structural similarity. The reappearing element is the round based structure which interleaves substitution, permutation, and key or constant addition layer. This structure can be depicted as a network of S-Boxes, similar to Figure 3.1 with multiple rounds. For the purpose of maximizing information propagation, the constant addition layer can be neglected, because constants are usually publicly known and thus do not constrain information flow in the network. Also, in some cases, as for example in EPCBC, the key addition can be circumvented as well (cf. Chapter 5). Thus, in this work, we neglect both kinds of addition layers. Our goal in this section is to model such a reduced PRESENT-like network and the corresponding information propagation maximization problem as a Mixed Integer Linear Program (MILP).

3.2.1 Simple Propagation Model

In this section we introduce a simple model in the sense that we assume that the output bits of a certain S-Box can only be learned if all of its input bits are known. For this, let us assume a network consisting of n rounds, a state size of b bits, 4-bit S-Boxes, and a permutation $P : \{0, \dots, b-1\} \to \{0, \dots, b-1\}$ describing how the bits are permuted during the permutation layer, i.e. the j-th bit is moved to position $P(j)$. For the model we introduce two boolean decision variables for every bit of the state in each round with the semantics shown in Table 3.1.

| Variable | Semantics |
|---|---|
| $x_{i,j}$ | $x_{i,j} = 1 \iff$ the j-th bit is known after round i. |
| $g_{i,j}$ | $g_{i,j} = 1 \iff$ the j-th bit after round i is guessed. |

Table 3.1.: Semantics of new variables

The objective function is now straightforward:

$$\max \sum_{i=0}^{n} \sum_{j=0}^{b-1} x_{i,j} \qquad (3.4)$$

Similarly, we can easily limit the number of bits we want to guess to an arbitrary integer k by introducing the constraint:

$$\sum_{i=0}^{n} \sum_{j=0}^{b-1} g_{i,j} \le k \qquad (3.5)$$

Finally, we have to translate the semantics in Table 3.1 into our model. Note that a bit is known if it is either guessed or learned through propagation. For this consider an arbitrary S-Box in round i and let $x_{i,j_0}, x_{i,j_1}, x_{i,j_2}, x_{i,j_3}$ be the variables corresponding to the input bits of this S-Box. Note that the variables corresponding to the output bits of the S-Box are $x_{i+1,P(j_0)}, x_{i+1,P(j_1)}, x_{i+1,P(j_2)}, x_{i+1,P(j_3)}$. This is illustrated in Figure 3.2. To model the propagation of information through this S-Box while allowing to guess bits of the output, we include the following set of constraints:

$$x_{i+1,P(j_t)} \le x_{i,j_s} + g_{i+1,P(j_t)} \quad \text{for all } t, s \in \{0, \dots, 3\} \qquad (3.6)$$

Figure 3.2.: Example of state variables

This set of constraints ensures that an output bit of the S-Box is only known, i.e. $x_{i+1,P(j_t)} = 1$, if it is guessed or all corresponding input bits are known. Including this set of constraints for every S-Box in every round models the information flow for the whole network. Note that the bits before the first round cannot be learned through information propagation, so the constraints for $x_{0,j}$ reduce to:

$$x_{0,j} \le g_{0,j} \quad \text{for all } j \in \{0, \dots, b-1\} \qquad (3.7)$$

This concludes the model for the information propagation maximization problem, which can now be fed into an MILP solver. However, in many scenarios other constraints might be desirable and can easily be included. For example, in many cases it can make sense to restrict guesses to the input state. In this case one could simply split the Constraint 3.5 into the constraints:

$$\sum_{j=0}^{b-1} g_{0,j} \le k, \qquad \sum_{i=1}^{n} \sum_{j=0}^{b-1} g_{i,j} = 0 \qquad (3.8)$$
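To make the semantics of Constraints 3.6 and 3.7 concrete, the following Python sketch evaluates the objective (3.4) for a fixed set of guesses and, for a toy network only, finds the best input guess by exhaustive enumeration. It is not the MILP formulation itself and does not use an MILP solver. The 16-bit toy permutation, the assumption that each S-Box covers four consecutive bit positions, and all function names are illustrative choices rather than part of this work.

```python
from itertools import combinations


def toy_permutation(b):
    """PRESENT-style bit permutation on b bits (assumed toy layout, b divisible by 4)."""
    return [(j * (b // 4)) % (b - 1) if j < b - 1 else b - 1 for j in range(b)]


def propagate(guesses, perm, rounds):
    """Return known[i], the set of bit positions known after round i.

    guesses maps a round index to the positions guessed in that state.
    An S-Box output bit becomes known only if it is guessed or if all four
    input bits of its S-Box are known (the simple propagation model).
    """
    b = len(perm)
    known = [set(guesses.get(0, ()))]
    for i in range(rounds):
        nxt = set(guesses.get(i + 1, ()))
        for start in range(0, b, 4):
            sbox_inputs = range(start, start + 4)
            if all(j in known[i] for j in sbox_inputs):
                nxt.update(perm[j] for j in sbox_inputs)
        known.append(nxt)
    return known


def objective(known):
    """Objective (3.4): total number of known bits over all states."""
    return sum(len(state) for state in known)


def best_input_guess(perm, rounds, k):
    """Exhaustively search the best k input bits; feasible only for tiny networks."""
    best = max(combinations(range(len(perm)), k),
               key=lambda g: objective(propagate({0: g}, perm, rounds)))
    return set(best), objective(propagate({0: best}, perm, rounds))
```

On the toy network, guessing the four input bits of one S-Box yields eight known bits over two rounds, whereas guessing one bit in each of four different S-Boxes yields only the four guessed bits. This mirrors the observation made for Figure 3.1.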

3.2.2 S-Box adjusted Propagation Model

In the previous subsection only known and unknown bits were distinguished, but their specific values were disregarded. In this section, we want to take them into account. However, the variables in our model will still represent the same semantics (cf. Table 3.1), not their specific values. We will adjust the constraints to account for their values.

Before, S-Boxes were treated as simple black boxes, which swallow all information unless the full input is known. This is, however, not always true. For many S-Boxes some information about the output can be inferred even if the input is only partially known. For example, if the second, third, and fourth bit of the input of the S-Box used in EPCBC are known or assumed to have the value 0, as depicted in Figure 3.3, the second and third bit of the output must have the values 0 and 1, respectively. For many S-Boxes there are many similar relations, which can usually be found very easily by brute-force since S-Boxes usually have a very small size, especially in lightweight cryptography. Taking them into account can be very advantageous in weak key and/or chosen plaintext scenarios, as demonstrated in Section 5.2.3.

Figure 3.3.: Example of EPCBC S-Box relation

To allow for more flexible information flow, we need to replace the Constraints 3.6. For brevity we restrict the guessed bits to the first round. Extending this approach to guesses anywhere remains open for future work. Again, consider an arbitrary S-Box in round i with the input variables $x_{i,j_0}, x_{i,j_1}, x_{i,j_2}, x_{i,j_3}$ and output variables $x_{i+1,P(j_0)}, x_{i+1,P(j_1)}, x_{i+1,P(j_2)}, x_{i+1,P(j_3)}$ (cf. Figure 3.2). The concatenation of these variables can be seen as an 8-dimensional binary vector and the Constraints 3.6 describe a 0/1-polytope in 8-dimensional space that contains all points that represent a valid information flow through an S-Box. For example, this polytope contains the points (1,1,1,1,1,1,1,1) and (1,0,1,0,0,0,0,0), which represent the information flow with fully known input propagated to the output and partial input that is not propagated, respectively. The polytope does not contain the point (0,1,1,1,0,1,1,0), as would be desired for the example of the EPCBC S-Box above. To remedy this we can construct the polytope using its vertex representation, i.e. we construct the polytope as convex hull of the set of points that all describe a valid information flow. Note that this polytope will not contain any other points in $\{0,1\}^8$, because all these points are vertices of an 8-dimensional cube and the convex hull of any subset of the vertices of this cube will not contain an integral point previously not in the subset. This can be proven by observing that for pairwise different $v_i \in \{0,1\}^n$ and an integral point $v = \sum_i \lambda_i v_i$ with $0 \le \lambda_i \le 1$ and $\sum_i \lambda_i = 1$ in the convex hull of the $v_i$, the weights $\lambda_i$ must be integral as well. This is true, because for every component $v^{(j)}$ of $v$ it holds that

$$v^{(j)} = \sum_{i \in V_j} \lambda_i \qquad (3.9)$$

where $V_j = \{ i \mid v_i^{(j)} = 1 \}$. Since the $v_i$ are pairwise different, this means that the $\lambda_i$ must be integral for $v$ to be integral.

Subsequently, the vertex representation can be converted into a set of equations and inequalities describing the same polytope using the Double Description Method [34]. Including this set of constraints into the MILP instead of the Constraints 3.6 for every S-Box yields an MILP that models the information flow for a specific S-Box.
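The vertex set of such a polytope can be enumerated by brute force. The sketch below is a minimal illustration, not the code used in this work: it uses the SPONGENT S-Box from Table 4.3 rather than the EPCBC S-Box, assumes that bit index 0 is the least significant bit, and treats an output bit as inferable if some assignment of the known input bits fixes it. The function names are hypothetical. The resulting points would then be handed to an implementation of the Double Description Method, which is not shown.

```python
from itertools import product

# SPONGENT S-Box as listed in Table 4.3 (index = input, value = output).
SPONGENT_SBOX = [0xE, 0xD, 0xB, 0x0, 0x2, 0x1, 0x4, 0xF,
                 0x7, 0xA, 0x8, 0x5, 0x9, 0xC, 0x3, 0x6]


def bit(value, position):
    """Bit of value at the given position (position 0 = least significant, an assumption)."""
    return (value >> position) & 1


def determined_outputs(sbox, known_mask, known_value):
    """Output bit positions that are constant over all inputs matching the known bits."""
    candidates = [x for x in range(16)
                  if (x & known_mask) == (known_value & known_mask)]
    outputs = [sbox[x] for x in candidates]
    return [t for t in range(4) if len({bit(y, t) for y in outputs}) == 1]


def flow_points(sbox):
    """All 8-dimensional 0/1 points (input mask, output mask) of valid information flows."""
    points = set()
    for known_mask in range(16):
        for known_value in range(16):
            if known_value & ~known_mask:
                continue  # skip values that set bits outside the known positions
            fixed = determined_outputs(sbox, known_mask, known_value)
            inp = tuple(bit(known_mask, s) for s in range(4))
            # Any subset of the inferable output bits is also a valid flow.
            for keep in product([0, 1], repeat=len(fixed)):
                out = [0, 0, 0, 0]
                for t, flag in zip(fixed, keep):
                    out[t] = flag
                points.add(inp + tuple(out))
    return points
```

Because the S-Box is a bijection, no output bit is inferable when no input bit is known, so every point with an all-zero input mask also has an all-zero output mask.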

We are aware that this method neglects the fact that only certain values for partially known inputs of an S-Box actually yield information about certain output bits. This can result in an invalid information propagation being computed if several partial inputs are used in subsequent rounds. A conceivable example using the aforementioned EPCBC S-Box relation is depicted in Figure 3.4. Here the partial input is used to infer that the second and third output bit of an S-Box are 0 and 1. Furthermore, the third output bit is itself part of the partially known input of an S-Box in the subsequent round, but assumed to be 0 in order to allow for further information to be gained. This yields a conflict and thus the information propagation is invalid. This is due to the MILP model using boolean decision variables and only distinguishing known and unknown bits, but not their values. Extending the model to propagate inferred values correctly remains an open problem. In this work, the conflicts were tackled in one of two ways. Either we tried to manually rearrange guesses without losing information propagation, or, if this failed, we ignored the propagation loss due to not meeting the condition of the partial input at the second S-Box and thus rather approximated the information propagation maximization problem.

Figure 3.4.: Example of conflict in EPCBC network


4 SPONGENT

In this chapter we describe our algebraic cryptanalysis of the lightweight hash function SPONGENT [8]. In the first section we give a brief description of the SPONGENT family, before we present our attack and the results of our experiments.

4.1 Description

The SPONGENT family is a family of hash functions designed using the hermetic sponge construction. The parameters b, r, n, and c differ for each instantiation of the family, but they all employ a very similar permutation, which we will denote by $\pi_b$ following the terminology of [8]. In this work we focus on the lightest members of the SPONGENT family, SPONGENT-88 and SPONGENT-128. Table 4.1 lists the values for the four numeric parameters. To accommodate the requirement that the input length must be divisible by r and that the input is not allowed to end with the all-zero block, a single 1 bit is appended and the result is then padded with zeros to the next length divisible by r. However, in this work we will neglect the padding, since it does not have a large influence on the results of algebraic cryptanalysis as it only introduces a set of known values to the polynomial system. The initial value of the state (IV) is the all-zero state.

| Hash function | n | b | c | r |
|---|---|---|---|---|
| SPONGENT-88 | 88 | 88 | 80 | 8 |
| SPONGENT-128 | 128 | 136 | 128 | 8 |

Table 4.1.: Parameters for SPONGENT-88 and SPONGENT-128

The b-bit permutation $\pi_b$ is inspired by the PRESENT block cipher and, again following the terminology of [8], can be described as in Algorithm 1 with b-bit input STATE. R is the number of rounds and is listed for both versions in Table 4.2. Here rev(·) denotes the bit-wise reversal of its argument. Subprocedures are explained below.

Algorithm 1 SPONGENT permutation $\pi_b$

    for i = 1 → R do
        STATE ← rev(lCounter_b(i)) ⊕ STATE ⊕ lCounter_b(i)
        STATE ← sBoxLayer_b(STATE)
        STATE ← pLayer_b(STATE)
    end for

lCounter_b
This function takes an integer i and returns the value of an LFSR after $i - 1$ clocks. The size, the defining primitive polynomial and the initial value (not to be confused with the IV of the sponge state) of the LFSR are listed in Table 4.2. For further details, we refer to [8]. The first line of Algorithm 1 denotes that the value returned by lCounter_b is added to the rightmost bits of the state, reversed, and then added to the leftmost bits. This step can be seen to constitute a constant addition layer.

| Permutation | R | LFSR Size | LFSR Primitive Polynomial | LFSR IV (hex) |
|---|---|---|---|---|
| $\pi_{88}$ | 45 | 6 | $x^6 + x^5 + 1$ | 05 |
| $\pi_{136}$ | 70 | 7 | $x^7 + x + 1$ | 7A |

Table 4.2.: Parameters for $\pi_b$ in SPONGENT-88 and SPONGENT-128

sBoxLayer_b
To execute this function the state is split into 4-bit nibbles and fed into $\frac{b}{4}$ S-Boxes in parallel. The definition of the S-Box is listed in Table 4.3. The results are simply concatenated and returned. This layer constitutes the non-linear part of $\pi_b$.

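| input | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| output | E | D | B | 0 | 2 | 1 | 4 | F | 7 | A | 8 | 5 | 9 | C | 3 | 6 |

Table 4.3.: SPONGENT S-Box (hex)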

pLayer_b
This function simply permutes the bits of the state. For this the j-th bit is moved to bit $P_b(j)$, where

$$P_b(j) = \begin{cases} j \cdot \frac{b}{4} \bmod (b-1) & \text{if } 0 \le j \le b-2 \\ b-1 & \text{if } j = b-1 \end{cases}$$

In this work, we will call $P_b$ a bit permutation, which is not to be confused with the (b-bit) permutation $\pi_b$.
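The definition of $P_b$ translates directly into code. The following Python sketch is only an illustration; the function names and the representation of the state as a list of bits are assumptions made for this example.

```python
def p_layer_index(j, b):
    """Target position P_b(j) of bit j in a b-bit state (b divisible by 4)."""
    if j == b - 1:
        return b - 1
    return (j * (b // 4)) % (b - 1)


def p_layer(state_bits):
    """Apply the bit permutation to a state given as a list of bits."""
    b = len(state_bits)
    out = [0] * b
    for j, value in enumerate(state_bits):
        out[p_layer_index(j, b)] = value
    return out
```

For b = 88 and b = 136 the factor $\frac{b}{4}$ is coprime to $b - 1$, so the mapping hits every position exactly once, which can be checked by sorting the images of all indices.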

As stated in Section 2.3, the security provided by a sponge depends on the size of its inner state. It follows that due to the size of its inner state, SPONGENT-88 aims at 80 bit security against preimage attacks and 40 bit security against collision attacks. Of course, the latter is not practically relevant, since 40 bit security cannot be considered secure. This hash function is designed for purposes that only require preimage resistance. On the other hand, SPONGENT-128 is designed to provide 120 bit security against preimage attacks and 64 bit security against collision attacks.

4.2 Analysis

To the best of our knowledge, SPONGENT has not been analyzed using algebraic techniques so far. Furthermore, no cryptanalysis has been carried out except by the designers themselves in [8]. In the following subsections we present the results of our analysis.


4.2.1 Algebraic Representation

To relate the inputs and outputs by polynomials in the Boolean polynomial ring over GF(2), variables were introduced to represent the bits of the input blocks, the output blocks, and the state as it evolves during the repeated applications of $\pi_b$. Also, variables were introduced that represent the bits of the state during each application of $\pi_b$, i.e. the state after each round. So, when processing a message m with the hash function SPONGENT-n with R-round permutation $\pi_b$, the number of variables $n_v$ of the system is given by

$$n_v = |m| + n + \left( \frac{|m| + n}{r} - 1 \right) \cdot (R + 1) \cdot b \qquad (4.1)$$

where |m| denotes the bit length of m, which we require to be divisible by r for brevity.

The number of equations $n_e$ in our polynomial system can be computed in a similar fashion. For every block during the absorbing phase we include the $n_\pi$ equations necessary to relate the variables during the application of $\pi_b$ and the b linear equations to relate the output state of the last phase to the input state of the current phase, also relating the input variables of the block that is being absorbed. We are aware that these linear equations could be circumvented easily, but since the number of equations is dominated by $n_\pi$ and the contribution of linear equations to the hardness of the resulting problem is negligible, we keep them for the sake of clarity. Analogously, for the squeezing phase we include the $n_\pi$ equations for every application of $\pi_b$ and a set of linear equations relating the state variables and the output variables. Overall, the number of equations is

$$n_e = \frac{|m|}{r} \cdot (b + n_\pi) + \left( \frac{n}{r} - 1 \right) \cdot (r + b + n_\pi) + r \qquad (4.2)$$

where $n_\pi = R \cdot \frac{b}{4} \cdot n_{sbox}$, since in every round we have $\frac{b}{4}$ S-Boxes, for which we need $n_{sbox}$ equations to relate the respective state variables. Here, the linear equations accommodating the constant addition and bit permutation are substituted into the S-Box equations.
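As a quick plausibility check, Equations 4.1 and 4.2 can be evaluated in a few lines of Python. The sketch below is illustrative only: the function and parameter names are made up for this example, and it implements the two formulas exactly as stated above.

```python
def num_variables(m_len, n, r, R, b):
    """Number of variables n_v according to Equation 4.1."""
    assert m_len % r == 0 and n % r == 0, "lengths must be divisible by r"
    applications = (m_len + n) // r - 1  # applications of the permutation
    return m_len + n + applications * (R + 1) * b


def num_equations(m_len, n, r, R, b, n_sbox):
    """Number of equations n_e according to Equation 4.2."""
    assert m_len % r == 0 and n % r == 0 and b % 4 == 0
    n_pi = R * (b // 4) * n_sbox               # equations per permutation call
    absorbing = (m_len // r) * (b + n_pi)
    squeezing = (n // r - 1) * (r + b + n_pi)
    return absorbing + squeezing + r
```

For full-round SPONGENT-88 (n = b = 88, r = 8, R = 45) with a one-block message and the explicit S-Box representation ($n_{sbox} = 4$), these formulas give 44624 variables and 44616 equations. With the 21 quadratic S-Box equations instead, the equation count rises to 229746.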

Since the S-Box operation is the only non-linear operation, the crucial part of the equation generation is the way it is represented. This can be done by 21 polynomial equations of degree 2 in the 4 inputs and 4 outputs, as generated by the SAGE module sage.crypto.mq.sbox and listed in the appendix in Section A.1. These can either be used directly in the polynomial system or processed further. Employing Gröbner basis techniques, we generated 4 explicit equations describing for each output variable $y_i$ its dependence on the input variables $x_i$, i.e. $y_i = f_i(x_0, x_1, x_2, x_3)$. This results in the following equations:

$$y_0 = x_0 + x_1 x_2 + x_1 + x_3$$

$$y_1 = x_0 x_3 + x_0 + x_1 x_2 x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 + 1$$

$$y_2 = x_0 x_3 + x_1 x_2 x_3 + x_1 + x_2 + 1$$

$$y_3 = x_0 x_1 x_3 + x_0 x_1 + x_0 x_2 x_3 + x_0 x_3 + x_1 x_3 + x_2 + x_3 + 1$$


The resulting polynomials have a higher degree (degree 3) than the original ones, but the description is much more compact. Similarly, we generated the inverse explicit equations describing the dependence of every input bit on the output bits. This results in a similar description:

$$x_0 = y_0 y_1 y_3 + y_0 y_2 y_3 + y_0 y_2 + y_1 + y_2 + y_3 + 1$$

$$x_1 = y_0 y_1 y_2 + y_0 y_2 + y_0 + y_1 y_2 + y_1 + y_2 y_3 + 1$$

$$x_2 = y_0 y_1 y_2 + y_0 y_1 + y_0 + y_1 y_2 + y_1 y_3 + y_1 + y_2$$

$$x_3 = y_0 y_1 + y_0 y_2 + y_1 y_2 + y_3$$

We carried out experiments using all three representations.
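Both explicit descriptions can be checked against Table 4.3 by evaluating them on all 16 inputs. The Python sketch below does this with addition over GF(2) written as XOR and multiplication as AND. It assumes that $x_0$ and $y_0$ denote the least significant bits of the nibbles, which is the convention under which the equations reproduce the table. The helper names are illustrative.

```python
# SPONGENT S-Box as listed in Table 4.3 (index = input, value = output).
SPONGENT_SBOX = [0xE, 0xD, 0xB, 0x0, 0x2, 0x1, 0x4, 0xF,
                 0x7, 0xA, 0x8, 0x5, 0x9, 0xC, 0x3, 0x6]


def to_bits(value):
    """Nibble to bit list, index 0 = least significant bit (assumed convention)."""
    return [(value >> t) & 1 for t in range(4)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << t for t, b in enumerate(bits))


def forward(x0, x1, x2, x3):
    """Explicit equations y_i = f_i(x_0, ..., x_3) over GF(2)."""
    y0 = x0 ^ (x1 & x2) ^ x1 ^ x3
    y1 = (x0 & x3) ^ x0 ^ (x1 & x2 & x3) ^ (x1 & x2) ^ (x1 & x3) ^ (x2 & x3) ^ 1
    y2 = (x0 & x3) ^ (x1 & x2 & x3) ^ x1 ^ x2 ^ 1
    y3 = ((x0 & x1 & x3) ^ (x0 & x1) ^ (x0 & x2 & x3) ^ (x0 & x3)
          ^ (x1 & x3) ^ x2 ^ x3 ^ 1)
    return [y0, y1, y2, y3]


def inverse(y0, y1, y2, y3):
    """Inverse explicit equations x_i = g_i(y_0, ..., y_3) over GF(2)."""
    x0 = (y0 & y1 & y3) ^ (y0 & y2 & y3) ^ (y0 & y2) ^ y1 ^ y2 ^ y3 ^ 1
    x1 = (y0 & y1 & y2) ^ (y0 & y2) ^ y0 ^ (y1 & y2) ^ y1 ^ (y2 & y3) ^ 1
    x2 = (y0 & y1 & y2) ^ (y0 & y1) ^ y0 ^ (y1 & y2) ^ (y1 & y3) ^ y1 ^ y2
    x3 = (y0 & y1) ^ (y0 & y2) ^ (y1 & y2) ^ y3
    return [x0, x1, x2, x3]


def equations_match_table():
    """True if both equation sets agree with the table on every input."""
    ok_fwd = all(from_bits(forward(*to_bits(v))) == SPONGENT_SBOX[v]
                 for v in range(16))
    ok_inv = all(from_bits(inverse(*to_bits(SPONGENT_SBOX[v]))) == v
                 for v in range(16))
    return ok_fwd and ok_inv
```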

4.2.2 Preimage Attack

Our first approach was a simple preimage attack following the philosophy of [33]. We generated the equations over GF(2) for SPONGENT-88 relating the input to the output for round-reduced versions. The output length of SPONGENT-88 is 88, implying that producing the full output from the state after absorbing the full input requires 11 applications of $\pi_b$. This already yields a very large polynomial system, as can be seen using the Equations 4.1 and 4.2, so we decided to assume a theoretical scenario with an unpadded one block message. Under this assumption a brute-force attack has an expected complexity of only $2^{r-1}$, i.e. in the case of SPONGENT-88 it is $2^7$. Still, we deemed this assumption reasonable for preliminary testing.

During our experiments we ran into the problem that the resulting SAT instances exhibit a very large variance in hardness to solve. For every S-Box representation just described we successively increased the number of rounds of $\pi_b$ and each time ran 50 experiments with a time limit of 6000s. Table 4.4 shows the results for the maximum number of rounds R with a success rate of 100% for each S-Box representation.

| S-Box repr. | max R | avg solving time |
|---|---|---|
| original | 6 | 14.2s |
| explicit | 12 | 94.6s |
| inverse expl. | 8 | 644.3s |

Table 4.4.: Results for preimage attack on SPONGENT-88

We want to point out that these were just preliminary tests and further testing would be necessary to obtain any specific results, including tests with lower success rates, guessing, and so forth. But considering the fact that a brute-force attack has such a low complexity in this scenario, which, on the other hand, in many cases already poses significant problems for the algebraic attack, we concluded that such a general preimage attack scenario is not suitable for attacking SPONGENT algebraically. SPONGENT seems to be very resistant against this kind of attack.

However, we did conclude from these preliminary tests that the explicit representation for the S-Box operation is superior to the other representations, which coincides with the results found in [14]. As a result, we focused on this representation in the following experiments.


4.2.3 Semi-free-start Collisions

In their preliminary security analysis in [8] the authors point out that a collision can be constructed by finding a differential message block $\Delta m_i$ that can be canceled out with another differential message block $\Delta m_{i+1}$ in the next step of the absorbing phase. Note that the message blocks only influence the outer state, i.e. the r bits that the message blocks are added to, but the adversary has full control over this outer state. Finding such a pair of differential blocks involves finding an inner state $s$ and two outer states $o^{(1)}$ and $o^{(2)}$, such that $\pi_b(s \| o^{(1)})$ and $\pi_b(s \| o^{(2)})$ collide in the inner state. We use the term semi-free-start collision, because this corresponds to finding an inner collision (in the case of sponge hash functions the equivalent of finding a collision) with full control over the IV.

In [8] Bogdanov et al. used techniques from differential cryptanalysis and computed probabilities for the success of these techniques for different round-reduced versions of the members of the SPONGENT family. Furthermore, they applied the Dedicated Rebound Attack [31] to the 6-round version of SPONGENT-88 and, again, computed the probability of success. In contrast, we tried to find actual semi-free-start collisions, not only the probability of their existence. The following two subsections summarize our results.

Remark 4.2.1. Bogdanov et al. used the Dedicated Rebound Attack to construct a differential path for $\pi_b$ that does not involve differences in the outer state of the input and the output state. Actually, the differential path they suggest does involve differences in three bits of the inner state of the output, which the authors do not comment on any further in that work. They also state that from the probabilities they calculated they expect at least one solution to exist. We generated the polynomial system corresponding to that differential path, converted it to CNF and fed it into the SAT solver. The solver needed 149.79s to prove that this instance does not have a solution. Thus, the differential path suggested in [8] does not yield a semi-free-start collision.

Standard approach

Our approach to finding semi-free-start collisions consisted of encoding the above conditions as a polynomial system. This can be done by constructing the polynomial system for $\pi_b$ twice using the same variables for the inner state at the input and the output of $\pi_b$. Furthermore, to circumvent the trivial solution, we encoded that there is a difference in the first r bits of the two inputs. For this, let $o^{(1)}_i$ and $o^{(2)}_i$ be the variables corresponding to the bits that need to include a difference, i.e. we require $o^{(1)}_i \ne o^{(2)}_i$ for at least one i. For each $i \in \{0, \dots, r-1\}$ we introduced a variable $z_i$ and included the equation $z_i = o^{(1)}_i + o^{(2)}_i + 1$, thus $z_i = 0$ iff $o^{(1)}_i$ and $o^{(2)}_i$ are different. Furthermore, we require at least one $z_i$ to be 0 by including the equation

$$\prod_{i=0}^{r-1} z_i = 0 \qquad (4.3)$$
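That Equation 4.3 excludes exactly the trivial solution can be confirmed by brute force for small r. The following Python sketch is purely illustrative (the function names are hypothetical) and evaluates the encoding on concrete bit vectors rather than building a polynomial system.

```python
from itertools import product


def difference_indicators(o1, o2):
    """z_i = o1_i + o2_i + 1 over GF(2); z_i is 0 exactly where the bits differ."""
    return [(a + b + 1) % 2 for a, b in zip(o1, o2)]


def product_constraint_holds(o1, o2):
    """Equation 4.3: the product of all z_i must be 0."""
    result = 1
    for z in difference_indicators(o1, o2):
        result *= z
    return result == 0


def encoding_is_correct(r):
    """Check for all pairs of r-bit outer states that 4.3 holds iff they differ."""
    states = list(product([0, 1], repeat=r))
    return all(product_constraint_holds(a, b) == (a != b)
               for a in states for b in states)
```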

We employed this approach to find semi-free-start collisions for SPONGENT-88 and SPONGENT-128 with successively increasing round numbers, starting with R = 1. The results for 1 ≤ R ≤ 5 are shown in Table 4.5. With this standard approach we found solutions for the rather trivial cases R ∈ {1, 2} for SPONGENT-88 and R = 1 for SPONGENT-128. Furthermore, we were able to show that for both versions of SPONGENT there are no semi-free-start collisions for all other R ≤ 5. For larger R this approach was practically infeasible with the available resources.

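| R | SPONGENT-88: time in s | SPONGENT-88: solution | SPONGENT-128: time in s | SPONGENT-128: solution |
|---|---|---|---|---|
| 1 | 0.01 | yes | 0.02 | yes |
| 2 | 0.02 | yes | 0.04 | no |
| 3 | 0.12 | no | 0.11 | no |
| 4 | 2.49 | no | 1.11 | no |
| 5 | 354.72 | no | 5.92 | no |

Table 4.5.: Results for semi-free-start collisions for SPONGENT-88 and SPONGENT-128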

It is noteworthy that for R = 5 the solver proved that SPONGENT-128 has no solution much faster than for SPONGENT-88, even though the former is a larger system and intuitively should take longer. However, the structure of the system corresponding to SPONGENT-128 could have led the solver to conflicts much sooner than the one for SPONGENT-88 and thus allowed for more efficient branch cutting, so this is a plausible result.

Guessing

In order to explore semi-free-start collisions for the case R = 6 we used guessing strategies. The idea was to either enumerate all possible guesses for a set of variables and thereby search the whole space for a possible solution, or, if this proves to be infeasible, to at least find strategies for which the computation can be estimated to be less than exhaustively brute-forcing all possible values for $o^{(1)}$, $o^{(2)}$, and s.

Intuitively, there are two approaches one can follow when guessing. Either one can try guessing in the outer state of the inputs, i.e. guessing values of $o^{(i)}$, to produce early conflicts and thus rule out incorrect guesses as early as possible. Or one can try to guess in the inner state of the input, i.e. guessing values of s, since propagation poses restrictions on subsequent states for both subsystems simultaneously. Also, the latter approach allows more degrees of freedom, since for the former strategy the maximal number of variables one can guess is 2r, while the latter allows guessing up to b − r variables. Of course, both approaches can be combined.

We applied both approaches to SPONGENT-88. First, we decided to guess 16 bits of the inner state. Naturally, the question arises which values should be guessed. This is where we applied the techniques outlined in Chapter 3. We modeled the information propagation for $\pi_{88}$ using the simple propagation model (cf. Section 3.2.1), since the inner state is not under the control of the attacker and thus he cannot exploit better propagation of certain values. Additionally, we restricted guesses to the variables that both subsystems have in common. It turned out that the MILP for R > 4 is infeasible to solve with the resources available for this work, so we solved the MILP for R = 4. However, the results show that the information is only propagated to the end of round 2, so limiting R to 4 does not diminish optimality in this case. The result is illustrated in Figure 4.1. The rows of rectangles represent S-Boxes and their inputs are the states of the respective rounds. The top row corresponds to the first substitution layer and its input is the input of $\pi_{88}$. Bits that are known are represented as vertical lines in the respective S-Box. The lines between the substitution layers of successive rounds represent the propagation of known bits. So, an optimal way of guessing 16 bits in terms of maximizing information propagation is to guess the bits in the input of the first round as depicted. This results in 40 known bits overall. Note that we did not limit guessing to the input of the first round. Still, in the optimal solution found by the MILP solver, only bits in the input are guessed.

Figure 4.1.: Guessing 16 bits of π88

To test if this indeed yields a good strategy, we compared this strategy with guessing the inputs of four random S-Boxes in the first round. For the experiments, we chose the polynomial system from Remark 4.2.1, since it is not too hard to run the experiments multiple times in practice, but also not so easy that no conclusions can be drawn. To achieve meaningful results the SAT solver was randomized, i.e. CryptoMiniSat was invoked using the randomize option with a random number, since we are repeatedly solving very similar systems. For each strategy, the maximizing and four random strategies, the solver was run 50 times with random values as guesses. The results are listed in Table 4.6, where each random strategy is denoted by the set of S-Boxes for which the inputs were guessed. The results show that the maximizing strategy not only has a much lower average solving time for the system, but the solving times also exhibit a much lower standard deviation. The experiments suggest that maximizing the information propagation indeed has a positive influence on the average solving times.

Strategy      max     [3,9,10,15]   [3,6,17,19]   [2,9,11,19]   [4,15,17,20]
avg in s      35.1    228.6         376.3         1240.1        1743.2
stdev in s    104.7   670.6         1369.0        1561.4        3201.4

Table 4.6.: Comparison of solving times for maximal and random guessing strategies

Consequently, we started experimenting with guessing according to the strategy above. We started out with the first guess being all zero. After about 30h the solver produced a solution, which can be found in the first row of Table 4.7. In the table, we denote the inner state of $\pi_{88}(s \| o^{(i)})$ by $s'$ and its outer states by $o^{(i)\prime}$.

For guessing in the outer state, we fixed values for all $o^{(1)}_i$. Our first experiment, where we set $o^{(1)}_i = 0$, took a bit longer than 5h and produced a solution which is listed in the second row of Table 4.7.

s                      o(1)   o(2)   s′                     o(1)′   o(2)′
a219243e0f7470bd0000   a7     e1     a5aac066208f7a9fbbd1   30      ca
07556f6db9e21c9b396a   00     07     4bdc1b6911dc2a8d2d7e   7b      f7

Table 4.7.: Semi-free-start collisions for $\pi_{88}$ with R = 6 (hex)

In both experiments we found solutions with our first guess. This means that either we were incredibly lucky with our guesses or, more likely, there are many solutions to the system.

However, this did not hold when we applied a similar approach to SPONGENT-128. We also started the experiments with guessing 16 bits in the inner state of the input to $\pi_{136}$. The first 450 guesses did not yield a solution. On average, solving the system for each of these guesses took the solver 1549s. If we had to exhaustively explore the whole search space, which is the case if there is no solution, extrapolating the samples would yield an estimated runtime of over 3 years for the whole search. While this is theoretically feasible, it is out of scope of this work.

Analogously to SPONGENT-88, we tried guessing bits in the outer state of the inputs to $\pi_{136}$. Again, we successively guessed values for $o^{(1)}$. Although the guesses yielded an average solving time of 13591s, there are only 256 guesses to explore and thus running the experiment for every possible guess yields an overall runtime of around 40 days. We carried out this search and did not find a solution. This means that there are no semi-free-start collisions for SPONGENT-128 with R = 6.
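The two runtime extrapolations follow from multiplying the number of guesses by the measured average solving time. The sketch below reproduces the arithmetic; the function name and the 365.25-day year are choices made for this illustration.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def exhaustive_runtime(guessed_bits, avg_seconds):
    """Total time if every assignment of the guessed bits must be tried."""
    return (2 ** guessed_bits) * avg_seconds

# 16 inner-state bits at 1549 s per guess
inner_years = exhaustive_runtime(16, 1549) / SECONDS_PER_YEAR
# 8 outer-state bits (256 guesses) at 13591 s per guess
outer_days = exhaustive_runtime(8, 13591) / SECONDS_PER_DAY

print(f"inner state: {inner_years:.2f} years")
print(f"outer state: {outer_days:.1f} days")
```

Although a single outer-state guess is roughly nine times more expensive, the search space is smaller by a factor of 256, which makes the exhaustive search practical.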

Any attempts to apply the techniques to R = 7 for any of the SPONGENT variants were unsuccessful, since the increase of complexity of the algebraic attack was too large. No guessing strategy, whether in the inner state, the outer state, or a hybrid of both, led to a practical search for semi-free-start collisions. In fact, all strategies even yielded a larger estimated complexity than a brute-force attack. For the latter, we assumed the same computing power that was used for the experiments above and an implementation that uses two processor cycles per round of $\pi_b$ (one for the constant addition and one for the substitution layer).

This chapter demonstrated that we were able to find a semi-free-start collision for the 6-round version of SPONGENT-88 and were able to prove that no such collisions exist for the 6-round version of SPONGENT-128 with purely algebraic methods. However, considering the infeasibility of the analysis on 7 rounds and the high number of rounds in the specification, we conclude that SPONGENT is secure against algebraic attacks.


5 EPCBC

In this chapter we describe our attack on the lightweight block cipher EPCBC [47]. Again, we first give a short description of the primitive before detailing our attack and the results.

5.1 Description

EPCBC is a lightweight block cipher proposed by Yap et al. in 2011 [47]. The design of this cipher is mainly driven by two objectives. Firstly, being a lightweight cipher, EPCBC is supposed to provide a maximum of security with a very low hardware footprint. Secondly, the cipher is designed for a specific purpose, namely to provide secure encryption for Electronic Product Codes (EPC). EPCs are codes with a bit length of 96 in their smallest variant. The authors of [47] observe that so far there are no lightweight block ciphers with a suitable block size for EPCs. To remedy this they propose two ciphers, one with block size n = 48 and one with block size n = 96. Both variants have a key length of 96 bits. We will denote them by EPCBC-48 and EPCBC-96, respectively.

The cipher is, like the SPONGENT permutation $\pi_b$, heavily inspired by PRESENT. Accordingly, the key schedule and the encryption itself exhibit very strong structural similarities to PRESENT. This has the advantage that some security proofs for PRESENT directly carry over to EPCBC.

5.1.1 Key schedule

Both versions of EPCBC run in 32 rounds with an additional key addition at the end, resulting in 33 subkeys. The key schedules are outlined in Algorithms 2 and 3, largely following the terminology of [47]. Note that for EPCBC-48 the key schedule is built of 8 separate, very similar 4-round PRESENT-like permutations, which have a block size of 48 and related initial values, and only differ in the constant addition. This will be useful during our attack, where we will denote each of these permutations as a block.

sBoxLayer

The sBoxLayer of EPCBC and PRESENT are identical. For completeness, the definition of the 4-bit S-Box is given in Table 5.1.

input 0 1 2 3 4 5 6 7 8 9 A B C D E F

output C 5 6 B 9 0 A D 3 E F 8 4 7 1 2

Table 5.1.: EPCBC S-Box (hex)


Algorithm 2 EPCBC-48 key schedule
  (LKeystate, RKeystate) = 96-bit key
  Subkey[0] ← LKeystate
  for i = 0 → 7 do
    temp ← LKeystate ⊕ RKeystate
    for j = 0 → 3 do
      RKeystate ← sBoxLayer(RKeystate)
      RKeystate ← pLayer(RKeystate)
      RKeystate ← RKeystate ⊕ (4i + j)
      Subkey[4i + j + 1] ← RKeystate
    end for
    LKeystate ← RKeystate
    RKeystate ← temp
  end for

Algorithm 3 EPCBC-96 key schedule
  Keystate = 96-bit key
  Subkey[0] ← Keystate
  for i = 0 → r − 1 do
    Keystate ← sBoxLayer(Keystate)
    Keystate ← pLayer(Keystate)
    Keystate ← Keystate ⊕ i
    Subkey[i + 1] ← Keystate
  end for

pLayer

The pLayer also strongly resembles the one of PRESENT (and, not surprisingly, also the one of SPONGENT). Bit j is moved to position P(j) with

$$P(j) = \begin{cases} j \cdot \frac{n}{4} \bmod (n-1) & \text{if } 0 \le j \le n-2 \\ n-1 & \text{if } j = n-1 \end{cases}$$
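The two layers and the key schedules of Algorithms 2 and 3 can be sketched in Python as follows. This is an illustration rather than a reference implementation: the function names are hypothetical, and the bit numbering (bit 0 is the least significant bit), the placement of LKeystate in the upper 48 key bits, and XORing the round constant into the low bits are assumptions not fixed by the text above.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # Table 5.1

def s_box_layer(state, n):
    """Apply the 4-bit S-Box to each of the n/4 nibbles of an n-bit state."""
    out = 0
    for i in range(n // 4):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def p_index(j, n):
    """Target position P(j) of bit j."""
    return n - 1 if j == n - 1 else (j * (n // 4)) % (n - 1)

def p_layer(state, n):
    """Move bit j of the state to position P(j)."""
    out = 0
    for j in range(n):
        out |= ((state >> j) & 1) << p_index(j, n)
    return out

def key_schedule_48(key, rounds=32):
    """Algorithm 2: 96-bit key -> rounds + 1 subkeys of 48 bits."""
    mask = (1 << 48) - 1
    left, right = key >> 48, key & mask  # assumption: left part = high bits
    subkeys = [left]
    for i in range(8):
        temp = left ^ right
        for j in range(4):
            right = p_layer(s_box_layer(right, 48), 48) ^ (4 * i + j)
            subkeys.append(right)
        left, right = right, temp
    return subkeys[:rounds + 1]

def key_schedule_96(key, rounds=32):
    """Algorithm 3: 96-bit key -> rounds + 1 subkeys of 96 bits."""
    state = key
    subkeys = [state]
    for i in range(rounds):
        state = p_layer(s_box_layer(state, 96), 96) ^ i
        subkeys.append(state)
    return subkeys

key = 0x0123456789ABCDEF01234567
print(len(key_schedule_48(key)), len(key_schedule_96(key)))
```

In the EPCBC-48 schedule the second block starts from the sum of the left and right key halves, so Subkey[5] depends on the key only through that sum.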

5.1.2 Encryption

The encryption of the plaintexts is mainly borrowed from PRESENT and is described by Algorithm 4 for both versions of EPCBC. The STATE is initialized with the plaintext and subsequently the round function is applied r times to it. Finally, the last subkey is added, after which STATE contains the ciphertext. For both versions of EPCBC r is 32. The functions sBoxLayer and pLayer are identical to the ones of the key schedule described in the previous section. Note that the encryption process and the key schedule exhibit a strong structural symmetry, which allows for significant information flow.
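A minimal Python sketch of the encryption in Algorithm 4, together with its inverse as a sanity check, could look as follows. The function names are hypothetical, bit 0 is assumed to be the least significant bit, and the subkeys are passed in as a plain list so that the sketch does not depend on a particular key schedule.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

def substitute(state, n, table):
    """Apply a 4-bit lookup table to every nibble of an n-bit state."""
    out = 0
    for i in range(n // 4):
        out |= table[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def p_index(j, n):
    return n - 1 if j == n - 1 else (j * (n // 4)) % (n - 1)

def permute(state, n, inverse=False):
    """Move bit j to P(j), or bit P(j) back to j when inverse is set."""
    out = 0
    for j in range(n):
        src, dst = (p_index(j, n), j) if inverse else (j, p_index(j, n))
        out |= ((state >> src) & 1) << dst
    return out

def encrypt(plaintext, subkeys, n):
    """r rounds of key addition, sBoxLayer, pLayer, then a final key addition."""
    r = len(subkeys) - 1
    state = plaintext
    for i in range(r):
        state ^= subkeys[i]
        state = substitute(state, n, SBOX)
        state = permute(state, n)
    return state ^ subkeys[r]

def decrypt(ciphertext, subkeys, n):
    """Undo encrypt by applying the inverse layers in reverse order."""
    r = len(subkeys) - 1
    state = ciphertext ^ subkeys[r]
    for i in reversed(range(r)):
        state = permute(state, n, inverse=True)
        state = substitute(state, n, INV_SBOX)
        state ^= subkeys[i]
    return state

# Arbitrary illustrative subkeys, not derived from a real key
demo_subkeys = [(0x1234567 * (i + 1)) % (1 << 48) for i in range(33)]
demo_plaintext = 0xDEADBEEF1234
demo_ciphertext = encrypt(demo_plaintext, demo_subkeys, 48)
print(decrypt(demo_ciphertext, demo_subkeys, 48) == demo_plaintext)
```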

5.2 Analysis

Since EPCBC is a recently proposed block cipher, the analytic effort targeting this cipher is very limited so far. As far as we know, the only cryptanalysis to date is the preliminary analysis in the proposal itself [47]. The authors do comment on algebraic cryptanalysis but use a more or less rule-of-thumb estimation to argue that EPCBC is secure against algebraic attacks. This estimation is partially justified, since it is mainly based on results stemming from analysis of PRESENT, but a more thorough examination is necessary, as we will show in the next section, to ensure that the cipher meets the claimed security properties.

Algorithm 4 EPCBC Encryption
  for i = 0 → r − 1 do
    STATE ← STATE ⊕ Subkey[i]
    STATE ← sBoxLayer(STATE)
    STATE ← pLayer(STATE)
  end for
  STATE ← STATE ⊕ Subkey[r]

5.2.1 Algebraic Representation

We generated the polynomial system for EPCBC-n for n ∈ {48, 96} in a similar fashion as the one for SPONGENT in Chapter 4. We introduced variables for every bit in every state during the encryption and the key schedule. Note that the block size of EPCBC-48 is half the key size, so at least two plain-/ciphertext pairs need to be available in a known/chosen plaintext scenario to uniquely determine the key. Accordingly, we generated the system for two encryptions under the same key for this version. We will denote the number of plain-/ciphertext pairs as $n_p$, where usually $n_p = 2$ for EPCBC-48, and $n_p = 1$ for EPCBC-96. The number of variables $n_v$ is given by

$$n_v = n_p \cdot r \cdot n + (r + 1) \cdot n + \left[ \left\lceil \frac{r}{4} \right\rceil \cdot 96 \right] \qquad (5.1)$$

The last summand (put in brackets) corresponds to variables of the key schedule in EPCBC-48 that are linearly dependent on other variables in the key schedule. They could easily be eliminated, but, again for the sake of clarity, we dispense with that.

The number of equations $n_e$ mainly depends on the number of S-Boxes involved, and the number of equations $n_{sbox}$ required to represent the S-Box operation. The number of S-Boxes is $r \cdot \frac{n}{4}$ for the key schedule and just as many for each encryption. The number of equations $n_e$ is then given by

$$n_e = (n_p + 1) \cdot \left( r \cdot \frac{n}{4} \cdot n_{sbox} \right) + \left[ n + \left\lceil \frac{r}{4} \right\rceil \cdot 96 \right] \qquad (5.2)$$

where the term in brackets again accounts for the number of linear equations only in the key schedule of EPCBC-48 that could easily be eliminated along with the corresponding variables.


As pointed out in [47], the S-Box operation can be represented as 21 quadratic equations, which are listed in Section A.2 of Appendix A. As for SPONGENT, we used Gröbner basis techniques to extract a compact explicit representation:

y0 = x0+ x1 · x2+ x2+ x3

y1 = x0 · x1 · x2+ x0 · x1 · x3+ x0 · x2 · x3+ x1 · x3+ x1+ x2 · x3+ x3

y2 = x0 · x1 · x3+ x0 · x1+ x0 · x2 · x3+ x0 · x3+ x1 · x3+ x2+ x3+ 1

y3 = x0 · x1 · x2+ x0 · x1 · x3+ x0 · x2 · x3+ x0+ x1 · x2+ x1+ x3+ 1

We also generated an inverse explicit representation:

x0 = y0+ y1 · y3+ y2+ 1

x1 = y0 · y1 · y2+ y0 · y1 · y3+ y0 · y2 · y3+ y0 · y2+ y0+ y1 · y3+ y1+ y2 · y3+ y3

x2 = y0 · y1 · y2+ y0 · y1 · y3+ y0 · y1+ y0 · y2 · y3+ y0 · y2+ y0 · y3+ y1 · y2

+ y1 · y3+ y3+ 1

x3 = y0 · y1 · y2+ y0 · y1+ y0 · y2 · y3+ y0+ y1+ y2+ y3
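Both explicit representations can be checked against the S-Box of Table 5.1 by evaluating them on all 16 inputs. The sketch below does this under the assumption that $x_0$ and $y_0$ denote the least significant bits of the input and output nibble; the function names are hypothetical.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def to_bits(v):
    """Nibble -> [bit0, bit1, bit2, bit3], least significant bit first."""
    return [(v >> i) & 1 for i in range(4)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def forward(x0, x1, x2, x3):
    """Explicit representation of the S-Box (AND = product, XOR = sum)."""
    y0 = x0 ^ (x1 & x2) ^ x2 ^ x3
    y1 = ((x0 & x1 & x2) ^ (x0 & x1 & x3) ^ (x0 & x2 & x3)
          ^ (x1 & x3) ^ x1 ^ (x2 & x3) ^ x3)
    y2 = ((x0 & x1 & x3) ^ (x0 & x1) ^ (x0 & x2 & x3) ^ (x0 & x3)
          ^ (x1 & x3) ^ x2 ^ x3 ^ 1)
    y3 = ((x0 & x1 & x2) ^ (x0 & x1 & x3) ^ (x0 & x2 & x3) ^ x0
          ^ (x1 & x2) ^ x1 ^ x3 ^ 1)
    return [y0, y1, y2, y3]

def inverse(y0, y1, y2, y3):
    """Explicit representation of the inverse S-Box."""
    x0 = y0 ^ (y1 & y3) ^ y2 ^ 1
    x1 = ((y0 & y1 & y2) ^ (y0 & y1 & y3) ^ (y0 & y2 & y3) ^ (y0 & y2)
          ^ y0 ^ (y1 & y3) ^ y1 ^ (y2 & y3) ^ y3)
    x2 = ((y0 & y1 & y2) ^ (y0 & y1 & y3) ^ (y0 & y1) ^ (y0 & y2 & y3)
          ^ (y0 & y2) ^ (y0 & y3) ^ (y1 & y2) ^ (y1 & y3) ^ y3 ^ 1)
    x3 = ((y0 & y1 & y2) ^ (y0 & y1) ^ (y0 & y2 & y3)
          ^ y0 ^ y1 ^ y2 ^ y3)
    return [x0, x1, x2, x3]

forward_ok = all(from_bits(forward(*to_bits(x))) == SBOX[x] for x in range(16))
inverse_ok = all(from_bits(inverse(*to_bits(SBOX[x]))) == x for x in range(16))
print(forward_ok, inverse_ok)
```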

With these representations in mind, it is clear that $n_{sbox}$ could be as low as 4. So, if neglecting the terms corresponding to the linearly dependent variables and linear equations, for EPCBC-48 $n_v = 4656$ and $n_e = 4608$ are reasonable estimations, as well as $n_v = 6240$ and $n_e = 9216$ for EPCBC-96.
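The estimates can be recomputed from Equations (5.1) and (5.2) with the bracketed terms dropped. The sketch below is only a worked evaluation of those formulas with hypothetical function names. Evaluated this way, (5.2) gives 6144 equations for EPCBC-96 with $n_p = 1$; the value 9216 quoted above is what the same expression yields for $n_p = 2$, so the sketch prints both.

```python
def num_variables(n, rounds, n_p):
    """Equation (5.1) without the bracketed term."""
    return n_p * rounds * n + (rounds + 1) * n

def num_equations(n, rounds, n_p, n_sbox=4):
    """Equation (5.2) without the bracketed term."""
    return (n_p + 1) * (rounds * (n // 4) * n_sbox)

print("EPCBC-48:", num_variables(48, 32, 2), num_equations(48, 32, 2))
print("EPCBC-96:", num_variables(96, 32, 1), num_equations(96, 32, 1))
print("EPCBC-96 with n_p = 2:", num_equations(96, 32, 2))
```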

Remark 5.2.1. In [47] the authors estimate $n_v \approx 6240$ and $n_e \approx 16380$ for only one encryption and key schedule of EPCBC-48, and $n_v \approx 12480$ and $n_e \approx 32760$ for EPCBC-96. This might be a valid estimation if restricting the degree of the equations to 2, which can make sense for certain solving algorithms. However, when employing SAT solving this restriction is not needed and, as we have shown, the estimation is then a gross overestimation, especially regarding the number of equations. The large size of the system is the main argument of the authors to deem EPCBC secure against algebraic attacks.

5.2.2 Known Plaintext Attack

In this section we present our attacks on round-reduced versions of EPCBC-n for n ∈ {48, 96} in a general known plaintext scenario.

Standard Approach

We generated the polynomial system for $n_p$ encryptions of EPCBC-48 and EPCBC-96, respectively, as described above with successively increasing round numbers r starting with r = 1. In theory, substituting the values of the known plain-/ciphertext pairs into the system and solving the system recovers the key. We evaluated the approach for each version and r by running the following experiment 100 times. First, we chose a key and $n_p$ plaintexts at random and computed the corresponding ciphertexts. Then, we set the variables corresponding to the plain-/ciphertext pairs in the polynomial system accordingly and attempted to solve the system using the SAT solver to recover the key. The average solving times and standard deviations are listed in Table 5.2 for 1 ≤ r ≤ 3. The standard attack was practically infeasible for larger r with our resources.

r            1        2       3       |   r            1       2       3
avg in s     0.0077   0.27    458.3   |   avg in s     0.017   0.28    556.7
stdev in s   0.004    0.013   866.4   |   stdev in s   0.005   0.014   2282.7

Table 5.2.: Standard algebraic known plaintext attack on EPCBC-48 (left) and EPCBC-96 (right)

Guessing

To obtain results for EPCBC for r > 3 we employed guessing strategies. The slight differences in the key schedule of EPCBC-48 and EPCBC-96 allow for different approaches regarding the guessing strategies. We first outline our approach and the results for EPCBC-48.

EPCBC-48

Taking into account the structure of EPCBC-48, we identified two potentially successful guessing strategies. We denote these strategies according to the number of bits we guess for each strategy, 64 and 48 bits.

64-bit strategy: Before outlining this approach, we would like to point out a few important observations about the cipher. First, when considering the key schedule, the key is divided into a left and a right part. The left part is used as the first subkey, so when guessing a bit in the left part, this immediately reveals inputs to the first substitution layer in both encryptions, since the plaintexts are both known. Furthermore, assume a set of bits is guessed in the left part of the key and information is propagated to the next round. This information is lost due to the next key addition layer, because we do not have any information about the second subkey, which is computed from the right part of the key. However, assume we are also guessing the same set of bits in the right part of the key that we are guessing in the left part. Due to the strong symmetry between encryption and key schedule this results in information being propagated about the same set of bits in the key schedule and the encryptions. This means that the information is preserved through the key addition layer, since now for all bits of the state either both or no inputs are known to this layer. This is true at least up to round 4. At this point the key schedule is "reinitialized" with the sum of the left and the right part (denoted by "temp" in the key schedule algorithm). In any case, in our experiments propagation did not extend further than to the end of round 4. Moreover, if we assume that we have guessed the same set of bits in the left and right part of the key, we also know the same set of bits of the value in temp and thus of the "initial" value of the next block in the key schedule. This results in the same information propagation in the second block of the key schedule. Nonetheless, without guessing additional bits, no knowledge about new values of the encryption itself is gained.


To put this in numbers, assume that guessing a set of bits of size k in the right part of the key results in an additional information gain of z bits through propagation (so the overall information gain is z + k). By guessing the same bits in the left part of the key, we achieve the same gain in both encryptions and, moreover, obtain information about z bits in the second block of the key schedule. So, with guessing 2k bits, we are able to infer the values of 4z bits, the overall information gain in the whole system being 4z + 2k bits. For this reason, we believe it makes sense to guess the same set of bits in the left and right part of the key.

In our analysis we guessed 32 bits in each part of the key, so 64 bits overall. We modeled the information propagation in one 4-round block of the key schedule using the simple propagation model as described in Chapter 3. We set the number of bits to guess to 32 and restricted the guesses to the input of the network. The result, yielding 96 known values overall, is shown in Figure 5.1. In consequence, with guessing k = 32 bits, z = 64 additional values can be inferred. Accordingly, with guessing 2k = 64 bits, we can reduce the number of unknown variables in the system by at least 4z + 2k = 320. Note that this is a lower bound, since special relations of the S-Boxes can lead to more information being propagated for certain values (cf. Section 3.2.2). In fact, for 100 random keys and plaintexts the average overall information gain of guessing 64 bits according to this strategy was 352.81 bits.
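The count of known values for a given guess set can be illustrated with a small simulation. The sketch assumes that the simple propagation model treats the output of an S-Box as known exactly when all four of its input bits are known, that S-Box s covers bits 4s to 4s + 3, and that constant additions do not affect which bits are known. It only evaluates a fixed guess set and does not perform the MILP optimization; the function names are hypothetical.

```python
def p_index(j, n):
    """Target position P(j) of bit j in the pLayer."""
    return n - 1 if j == n - 1 else (j * (n // 4)) % (n - 1)

def known_values(guessed, n, rounds):
    """Count guessed bits plus all state bits inferred by propagation."""
    known = set(guessed)
    total = len(known)
    for _ in range(rounds):
        propagated = set()
        for s in range(n // 4):
            inputs = range(4 * s, 4 * s + 4)
            if all(b in known for b in inputs):
                # fully known S-Box: its four outputs are known after the pLayer
                propagated.update(p_index(b, n) for b in inputs)
        known = propagated
        total += len(known)
    return total

# Guessing the inputs of S-Boxes 0..7 (32 bits) of a 48-bit, 4-round block
print(known_values(range(32), 48, 4))
```

Under these assumptions, guessing the inputs of the first eight S-Boxes reaches 96 known values, the same total as stated above. Whether this is the guess set depicted in Figure 5.1 is not claimed here.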

Figure 5.1.: Guessing 32 bits of EPCBC-48

To obtain estimations for the complexity of our attack we ran similar experiments as in the previous section. However, this time we ran two variations of the experiment. Firstly, we revealed the 64 bits at the positions described above for each random key and plaintext to estimate $t^{32}_{true}$ (cf. Inequality 3.1). Secondly, we guessed them randomly to estimate $t^{32}_{false}$ (cf. Inequality 3.3). Again we experimented with successively increasing r starting with r = 5. Averaging over all 100 runs for each r yielded our estimations for $t^{32}_{true}$ and $t^{32}_{false}$, respectively. Note that with k = 64 the term involving $t^{n-k}_{true}$ in Inequality 3.3 is negligible. As a lower bound for $t_{eval}$ we accounted a processor cycle for each substitution and each constant/key addition layer. Since a brute-force attacker would also have to test two plaintexts for each potential key, there are 2 · 2 + 2 = 6 cycles per round. We assumed a processor speed equivalent to the one used for our experiments (cf. Section 2.6). The results of our experiments are shown in Figure 5.2, with the estimation of $t^{32}_{true}$ ("est. t_true") compared to $2^{31} \cdot t_{eval}$ ("brute-force") at the top, and the estimation of $t^{32}_{false}$ ("est. t_false") compared to $2^{32} \cdot t_{eval}$ at the bottom, for 5 ≤ r ≤ 8.

[Figure 5.2: two plots of the average solving time in s against r = 5, ..., 8. Top: "brute-force" and "est. t_true". Bottom: "brute-force" and "est. t_false".]

Figure 5.2.: Comparison of $t^{32}_{true}$ (top) and $t^{32}_{false}$ (bottom) for EPCBC-48 to brute-force attack

The results show that up to r = 7, given 64 bits of the key, the missing 32 key bits can be recovered more efficiently with the algebraic attack than by brute-force, thus exhibiting a weakness of the round-reduced cipher. Furthermore, the bottom diagram shows that for up to r = 7 the key recovery is also faster when guessing 64 bits successively and solving the reduced system than when using the brute-force attack. The speed-up factor in both cases for r = 7 is around $2^4$. Hence, we have found a theoretical attack on 7-round EPCBC-48. The diagrams also show that this is not the case anymore for r = 8.


48-bit strategy: This approach exploits a rather obvious weakness in the key schedule of the cipher. Observe that the 4-round permutation is carried out on RKeystate to obtain the next four subkeys. While RKeystate is initialized with the right part of the key, after each block of the key schedule RKeystate is updated using temp. The first value temp takes on is the sum of the left part and the right part of the key, as pointed out in the previous section. In consequence, if the left and right parts of the key are known to have a certain difference, the second block of the key schedule is initialized with this known difference vector and we can easily compute the subkeys number 5 to 8. Attacking 8 rounds of EPCBC-48 under the assumption that the difference vector of the key is known should not be harder than attacking 4 rounds under the same assumption. Thus, we can either assume an oracle to reveal the 48-bit difference vector to the adversary or try to guess the difference vector.

Naturally, a brute-force adversary can also take advantage of this property. If an adversary obtains a plain-/ciphertext pair encrypted under a key of which he knows the difference vector, he only has to brute-force at most $2^{48}$ different keys. Note that even when attacking EPCBC-48 with 4 < r ≤ 8 the adversary only has to encrypt 4 rounds for each potential key, since he can precompute the last subkeys and thus compute the inner state of the encryption at the end of round 4 using the ciphertext. To sum up, assuming the knowledge of the difference vector, the expected time to recover the key for r ≤ 8 is $2^{47} \cdot \min\{r, 4\} \cdot t_{bfr}$, where $t_{bfr}$ denotes the runtime of one round of the cipher. Again, we accounted one processor cycle for each substitution and each addition layer and assumed the same processor speed as for our experiments (cf. Section 2.6). With these assumptions a brute-force attack recovering the remaining 48 bits when the difference vector is revealed takes almost 17 days for 4 ≤ r ≤ 8, providing the upper bound for $t^{48}_{true}$. The upper bound for $t^{48}_{false}$ is twice as large.

We tested this approach assuming knowledge of the difference vector and then guessing the difference vector. The results are listed in Tables 5.3 and 5.4, respectively. As suspected, the hardness of the problem does not increase with increasing r for 4 ≤ r ≤ 8 and the 8-round version of EPCBC-48 exhibits weaknesses here. With our estimations of $t^{48}_{true}$ we achieve a speed-up factor of more than $2^{12}$ in case the difference vector is known and more than $2^{10}$ in case it is guessed, compared to the brute-force adversary. However, the complexity of the algebraic attack increases steeply for r = 9. This is not surprising, since the internal state of the encryptions cannot be computed because the last subkey is not known (or guessed), even though four intermediate subkeys are still known.

r            4       5       6       7       8
avg in s     175.0   156.4   283.0   164.7   142.1
stdev in s   291.3   247.3   605.1   250.0   250.5

Table 5.3.: Algebraic known plaintext attack on EPCBC-48 with known difference vector


r            4        5        6        7        8
avg in s     1133.9   817.8    826.9    1296.3   1236.7
stdev in s   1630.4   1051.8   1318.8   2699.9   1850.7

Table 5.4.: Algebraic known plaintext attack on EPCBC-48 with guessed difference vector
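The brute-force bound and the quoted speed-up factors can be related to the table values with a few lines of arithmetic. In this sketch the clock rate of 2.3 GHz is an assumption chosen because it reproduces the "almost 17 days" figure (the actual machine is described in Section 2.6), and the six cycles per round are carried over from the cost model of the 64-bit strategy. All names are hypothetical.

```python
CLOCK_HZ = 2.3e9        # assumption, not stated in this section
CYCLES_PER_ROUND = 6    # assumption: same cycle count as for the 64-bit strategy

def brute_force_seconds(r, key_bits=48):
    """Expected brute-force time 2^(key_bits - 1) * min(r, 4) * t_bfr."""
    t_round = CYCLES_PER_ROUND / CLOCK_HZ
    return 2 ** (key_bits - 1) * min(r, 4) * t_round

bound_true = brute_force_seconds(8)   # difference vector revealed
bound_false = 2 * bound_true          # difference vector guessed

avg_known = {4: 175.0, 5: 156.4, 6: 283.0, 7: 164.7, 8: 142.1}        # Table 5.3
avg_guessed = {4: 1133.9, 5: 817.8, 6: 826.9, 7: 1296.3, 8: 1236.7}   # Table 5.4

# Smallest speed-up over all r, i.e. measured against the slowest average
speedup_known = bound_true / max(avg_known.values())
speedup_guessed = bound_false / max(avg_guessed.values())

print(f"brute-force bound: {bound_true / 86_400:.2f} days")
print(f"speed-up, known difference:   {speedup_known:.0f}")
print(f"speed-up, guessed difference: {speedup_guessed:.0f}")
```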

Remark 5.2.2. This approach suggests a generalization to guessing the initial value of the last block of the key schedule for arbitrary r. Due to this structural weakness of the key schedule, we can reduce breaking the r-round cipher faster than a brute-force attack on the 96-bit key to breaking the (r − 4)-round cipher faster than a brute-force attack on 48 bits of the key.

EPCBC-96

For EPCBC-96 the 48-bit strategy is not applicable, so we followed an approach similar to the 64-bit strategy. Note that in EPCBC-96 the key schedule is completely symmetric to the encryption. This means that, when guessing bits only in the first state of the key schedule, i.e. in the key itself, every propagated bit corresponds to a known bit in the encryption. That is why we modeled the information flow of the key schedule as described in Section 3.2.1 using the simple propagation model, restricted the guesses to the first state, and solved the system for 4 rounds and k = 64 guesses. The result is depicted in Figure 5.3. By guessing according to this strategy we were able to infer at least z = 160 additional bits. Accounting for the bits propagated in the encryption, this sums up to reducing the polynomial system by at least 384 variables. On average, for 100 random plaintexts and keys the system was reduced by 389.54 variables.

Figure 5.3.: Guessing 64 bits of EPCBC-96

The estimations of $t^{32}_{true}$ and $t^{32}_{false}$ stemming from averaging the results of 100 runs each can be found in Table 5.5. These show that the algebraic attack using this guessing strategy (or having the corresponding bits revealed) is significantly faster than the brute-force attack up to r = 5. For r ≥ 6 this does not hold anymore.

r                    4       5       6        |   r                    4       5       6
2^31 · t_eval in s   14.9    18.7    22.4     |   2^32 · t_eval in s   29.9    37.3    44.8
avg in s             0.12    0.39    1687.5   |   avg in s             0.062   0.42    4324.7
stdev in s           0.006   0.023   2941.1   |   stdev in s           0.040   0.038   5614.7

Table 5.5.: Comparison of $t^{32}_{true}$ (left) and $t^{32}_{false}$ (right) for EPCBC-96 to brute-force attack

5.2.3 Weak Keys

We were able to identify a class of weak keys for round-reduced EPCBC-96 of significant size. This section describes this class and our test results, showing that for these keys improved analytic results can be obtained. However, the attack on keys in this class relies on the stronger assumption of an adaptively chosen plaintext scenario.

To identify the class of weak keys we used the techniques introduced in Section 3.2.2. In order to use this model we first have to identify the set of vertices on the 8-dimensional cube that models the information flow through the EPCBC S-Box. For this, we denote a pair of partial input and output, where the partial output can be inferred if the input bits of an S-Box are known to have the corresponding values, as a mask. In Table 5.6 we list all such masks for the EPCBC S-Box, which we found by exhaustive search. In this table a "*" denotes that the bit is not known. In the right column the vertex corresponding to each mask is listed. To this set of vertices we added the all-one and all-zero vertex to model the propagation of the fully known and completely unknown input, respectively. Also, for every vector in $\{0,1\}^4$ that is not the first part of any vertex, we concatenated the all-zero vector (in $\{0,1\}^4$) to it and added the result to the set of vertices, thus modeling the partial inputs to S-Boxes that do not allow inferring any output values. Note that Table 5.6 contains duplicate vertices. For the optimization this does not matter, but it gives us some degrees of freedom when assigning values to the bits of the optimal solution later. Applying the Double Description Method [34] to the set of vertices to obtain the corresponding constraints yielded 181 inequalities.
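An exhaustive search for such masks can be sketched as follows. The sketch assumes that masks are written in the order $x_0 x_1 x_2 x_3 \to y_0 y_1 y_2 y_3$ with $x_0$ and $y_0$ the least significant bits, a convention under which entries such as 10*1 → *11* of Table 5.6 are reproduced. The function names are hypothetical and this is not the search code used for the thesis.

```python
from itertools import product

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def infer(mask):
    """Output bits that are fixed for every input matching the partial input."""
    outputs = []
    for x in range(16):
        x_bits = [(x >> i) & 1 for i in range(4)]
        if all(m == '*' or int(m) == b for m, b in zip(mask, x_bits)):
            outputs.append(SBOX[x])
    result = ''
    for i in range(4):
        values = {(y >> i) & 1 for y in outputs}
        result += str(values.pop()) if len(values) == 1 else '*'
    return result

def all_masks():
    """All partial inputs (1 to 3 known bits) that fix at least one output bit."""
    table = {}
    for mask in map(''.join, product('01*', repeat=4)):
        if '*' not in mask or mask == '****':
            continue  # skip fully known and fully unknown inputs
        out = infer(mask)
        if out != '****':
            table[mask] = out
    return table

def vertex(mask, out):
    """Indicator vector of known positions in input and output."""
    return tuple(int(c != '*') for c in mask + out)

masks = all_masks()
print(masks['10*1'], vertex('10*1', masks['10*1']))
```

The resulting dictionary can be compared entry by entry against Table 5.6.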

The final optimization problem was formulated as described in Chapter 3 and solved for guessing k = 64 bits. The result, depicted in Figure 5.4, yields 288 known values overall. Accordingly, the number of eliminated variables for the whole system with 64 guessed bits would be 512. However, this obviously only holds if the partial inputs of the S-Boxes correspond to the values required by the masks used in the solution. For example, consider the right-most S-Box at the end of round 2 (so the third row of S-Boxes) in Figure 5.4. In this case the optimizer used the vertex (1,1,0,1,0,1,1,0). Looking at Table 5.6, we see that only one mask corresponds to that vertex: 10*1 → *11*. To take advantage of the information flow proposed by the MILP solver, the partial input at this specific S-Box should be 10*1. So the question is which keys and plaintexts meet the mask requirements of the propagation solution returned by the solver in both the key schedule and the encryption, and how this can be used in an attack.

mask          vertex               |   mask          vertex
*000 → *01*   (0,1,1,1,0,1,1,0)    |   00*0 → *0*1   (1,1,0,1,0,1,0,1)
*001 → *1**   (0,1,1,1,0,1,0,0)    |   00*1 → ***0   (1,1,0,1,0,0,0,1)
*010 → *00*   (0,1,1,1,0,1,1,0)    |   01*0 → 01**   (1,1,0,1,1,1,0,0)
*011 → **10   (0,1,1,1,0,0,1,1)    |   01*1 → 1***   (1,1,0,1,1,0,0,0)
*100 → *1**   (0,1,1,1,0,1,0,0)    |   10*0 → *0*0   (1,1,0,1,0,1,0,1)
*101 → ***1   (0,1,1,1,0,0,0,1)    |   10*1 → *11*   (1,1,0,1,0,1,1,0)
*110 → ***1   (0,1,1,1,0,0,0,1)    |   11*0 → 1**1   (1,1,0,1,1,0,0,1)
*111 → **00   (0,1,1,1,0,0,1,1)    |   11*1 → 0*0*   (1,1,0,1,1,0,1,0)
                                   |
0*00 → 0*1*   (1,0,1,1,1,0,1,0)    |   001* → *0**   (1,1,1,0,0,1,0,0)
0*01 → 11**   (1,0,1,1,1,1,0,0)    |   010* → *11*   (1,1,1,0,0,1,1,0)
0*10 → **01   (1,0,1,1,0,0,1,1)    |   011* → **0*   (1,1,1,0,0,0,1,0)
0*11 → *0*0   (1,0,1,1,0,1,0,1)    |   100* → **1*   (1,1,1,0,0,0,1,0)
1*00 → 1***   (1,0,1,1,1,0,0,0)    |   101* → ***0   (1,1,1,0,0,0,0,1)
1*01 → 0**1   (1,0,1,1,1,0,0,1)    |   110* → **01   (1,1,1,0,0,0,1,1)
1*10 → *0**   (1,0,1,1,0,1,0,0)    |
1*11 → *1*0   (1,0,1,1,0,1,0,1)    |   **11 → ***0   (0,0,1,1,0,0,0,1)
                                   |   *0*0 → *0**   (0,1,0,1,0,1,0,0)

Table 5.6.: Information flow in EPCBC S-Box

Figure 5.4.: Guessing 64 bits of EPCBC-96 considering masks

Key schedule

To identify the necessary conditions for the key we tried to assign values to the masks in use during the key schedule. As it turns out, not all masks can be satisfied due to conflicts (cf. Section 3.2.2). However, the crucial S-Boxes are the ones in row 3, where masks apply at every S-Box. If the information is propagated as depicted, six S-Boxes in row 4 have fully known input and thus propagate further (without any conditions that need to be met). This, in turn, leads to another four S-Boxes having fully known input in row 5. So we are aiming to find keys that meet the mask conditions at row 3. Note that there is a good chance that masks still apply for the S-Boxes in row 4 with partially known input, since many of them still have three known input bits (cf. Table 5.6). However, they will be different from the ones proposed by the optimal solution, so the number of 512 known bits will, in general, not be achieved. Nevertheless, our experiments show that this approach is "good enough" to achieve a very high average information propagation.

We assigned the values required at the inputs of the masks in row 3 as depicted in red in Figure 5.5. The outputs are given only for completeness. A “+” means that either value meets the condition for a mask corresponding to the correct vertex.

Figure 5.5.: Conditions to meet mask requirements

Now it is clear that for the key schedule to take advantage of this S-Box-adjusted information flow, there are 55 bits at the input of round 3 or, equivalently, at the output of round 2, that each need to have a specific value. In this section, a key is called weak if the key schedule applied to it results in inner states, i.e. subkeys, meeting these requirements. There must be $2^{41}$ inputs that result in the 55 bits having the desired values, since two rounds of the key schedule also yield a 96-bit permutation: fixing 55 of the 96 output bits leaves $2^{96-55} = 2^{41}$ outputs, each of which has exactly one preimage.

Encryption

We also want to achieve this information flow for the encryption. Due to the symmetry of the key schedule and the encryption, we already know the conditions that must be met during the encryption: we need the same 55 bits to have the same values as in row 3 of Figure 5.5. In the case of the encryption we have to take the key addition layer into account. Before the state is processed by the substitution layer, the i-th subkey is added to it. For the third subkey, i.e. the subkey produced after round two of the key schedule, we already know the values of the relevant bits from the previous section if we assume a weak key. Since we want the same masks to apply during the encryption, we want, for exactly these bits, the sum of the third encryption state and the third subkey to have the same values as the third subkey. It follows that these bits should be zero in the third state, i.e. after two rounds of the encryption. For the rest of this section, we will also call a plaintext that meets these conditions with regard to a certain weak key “weak”. In contrast to keys, we use double quotes when talking about plaintexts to keep the distinction clear.
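Written bitwise, the condition on the third state follows directly from the key addition over GF(2). The symbols $s^{(3)}_j$ and $k^{(3)}_j$ are notation introduced here for illustration only; they do not appear elsewhere in this work.

```latex
% s^{(3)}_j : bit j of the encryption state after two rounds (assumed notation)
% k^{(3)}_j : bit j of the third subkey (assumed notation)
\[
  s^{(3)}_j \oplus k^{(3)}_j = k^{(3)}_j
  \;\Longleftrightarrow\;
  s^{(3)}_j = 0
  \qquad \text{for each of the 55 relevant positions } j .
\]
```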

The attack

Similar to the attack outlined in Section 5.2.2, there are two possible scenarios for the attack. Firstly, an oracle reveals the set of key bits to the adversary, which in the context of this attack are assumed to be weak, and he tries to recover the rest of the bits. Secondly, the adversary attempts to recover the key under the sole assumption that the key is weak. Again, the latter will include guessing 64 bits. However, in contrast to the attack in Section 5.2.2, only guesses yielding weak keys are considered. Furthermore, we assume an adaptively chosen plaintext scenario. For this, we assume the adversary is able to choose two plaintexts after the oracle reveals the key bits, or each time after the adversary chooses a guess, respectively. In both cases, we need to specify how to choose “weak” plaintexts, given the 64 weak key bits. Also, for the latter attack scenario, we need to specify how to enumerate the weak keys.

Enumeration of weak keys

In the context of the attack it is important to know whether the weak keys can be enumerated efficiently. For this it would be beneficial to turn the conditions on the key schedule identified above into conditions on the key. In theory this is possible using Gröbner bases. However, in this case the system proved to be too complex to apply Gröbner basis techniques in reasonable time. Nevertheless, all weak keys can still be enumerated using the following method. First we fix the 55 relevant bits after the second round to the desired values and enumerate all possible values for the other 9 known bits in this state. These 9 bits correspond to the 9 “+” in Figure 5.5. For each of those values, inverting the 2-round permutation on these 64 bits will produce a set of 64 key bits that constitute a set of weak key bits. The weak key bits correspond to the values of the 16 active S-Boxes in the input of the first round. In this context, active means that bits in the input of the S-Box are known. From Figure 5.4 it is clear that the values of the 55 relevant bits at the end of round two depend only on exactly these weak key bits. Each of those weak key bit sets can be expanded to a weak key by assigning arbitrary values to the remaining 32 bits (the unknown bits in Figure 5.5 at the input of row 1). Consequently, each of those $2^9$ sets of 64 weak bits corresponds to $2^{32}$ weak keys. Note that this method yields an efficient enumeration procedure for the $2^{41}$ weak keys as well as for the $2^9$ weak sets of key bits, i.e. guesses.
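The enumeration procedure can be illustrated on a toy scale. The following is only a sketch under stated assumptions: the 6-bit map `toy_rounds` is a hypothetical stand-in for the two key-schedule rounds restricted to the 64 known bits, and `FIXED` and `FREE` stand in for the 55 conditioned positions and the 9 “+” positions. None of these names or values come from EPCBC itself.

```python
from itertools import product

N_KNOWN = 6                          # toy stand-in for the 64 known bits
FIXED = {0: 1, 1: 0, 3: 1, 4: 1}     # position -> required value (stand-in for the 55 bits)
FREE = [p for p in range(N_KNOWN) if p not in FIXED]   # stand-in for the 9 "+" positions

def toy_rounds(x):
    """Hypothetical invertible map standing in for two key-schedule rounds."""
    return (x * 5 + 3) % (1 << N_KNOWN)

def toy_rounds_inv(y):
    """Inverse of toy_rounds; 13 is the inverse of 5 modulo 64."""
    return ((y - 3) * 13) % (1 << N_KNOWN)

def meets_conditions(state):
    """Check that every conditioned position carries its required value."""
    return all(((state >> pos) & 1) == bit for pos, bit in FIXED.items())

def weak_bit_sets():
    """Fix the conditioned bits, enumerate the free ones, invert the rounds."""
    sets = []
    for values in product((0, 1), repeat=len(FREE)):
        state = 0
        for pos, bit in FIXED.items():
            state |= bit << pos
        for pos, bit in zip(FREE, values):
            state |= bit << pos
        sets.append(toy_rounds_inv(state))
    return sets

guesses = weak_bit_sets()
print(len(guesses), all(meets_conditions(toy_rounds(g)) for g in guesses))
```

In the toy setting there are $2^2$ sets of weak bits; with the parameters of the text the same loop would run over $2^9$ values, and each resulting set extends to $2^{32}$ weak keys.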

Choosing plaintexts

Assume we are given a set of weak key bits. From Figure 5.5 and due to the strong symmetry between key schedule and encryption it is clear that the “weakness” of a plaintext depends only on the 64 weak key bits and the corresponding bits of the plaintext. This means that a plaintext that is “weak” for a certain weak key is also “weak” for all $2^{32}$ weak keys that have the same set of 64 weak bits at the specified positions. To obtain a “weak” plaintext for a given set of weak key bits, we first construct a state that has the desired properties, i.e. we choose a 96-bit state that is zero at the 55 specified positions. Then, we select an arbitrary key from the $2^{32}$ weak keys corresponding to the set of weak key bits and “decrypt” the constructed state with a 2-round version of EPCBC-96 under this key. This will result in a “weak” plaintext for the given set of weak key bits.
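The construction of a “weak” plaintext can be sketched with a toy cipher. Everything below is an illustrative assumption: an 8-bit state, a rotation as linear layer, two hypothetical subkeys and the mask `ZERO_MASK` in place of the 55 positions. It is not an implementation of EPCBC-96, only of the idea "build the desired state, then decrypt two rounds".

```python
# 4-bit S-Box table used for the toy cipher (assumed here for illustration).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

ZERO_MASK = 0b10110101   # toy stand-in for the positions that must be zero

def sub_layer(state, box):
    """Apply a 4-bit S-Box to both nibbles of an 8-bit state."""
    return box[state & 0xF] | (box[state >> 4] << 4)

def rotl(state, r=3):
    return ((state << r) | (state >> (8 - r))) & 0xFF

def rotr(state, r=3):
    return ((state >> r) | (state << (8 - r))) & 0xFF

def encrypt_rounds(state, subkeys):
    """Toy round: key addition, substitution layer, bit rotation."""
    for k in subkeys:
        state = rotl(sub_layer(state ^ k, SBOX))
    return state

def decrypt_rounds(state, subkeys):
    """Exact inverse of encrypt_rounds."""
    for k in reversed(subkeys):
        state = sub_layer(rotr(state), INV_SBOX) ^ k
    return state

def weak_plaintext(free_bits, subkeys):
    """Build a state that is zero at the masked positions and decrypt it."""
    target = free_bits & ~ZERO_MASK & 0xFF
    return decrypt_rounds(target, subkeys)

subkeys = [0x3A, 0xC5]   # hypothetical first two subkeys of some weak key
p = weak_plaintext(0xFF, subkeys)
print(hex(p), encrypt_rounds(p, subkeys) & ZERO_MASK)
```

Any choice of the unmasked bits gives a valid target state, so the adversary has many “weak” plaintexts to choose from for one set of weak key bits.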

Results

We applied the attack to EPCBC-96 for r ∈ {6, 7} with 100 randomly chosen weak keys each, both for correct guesses and for random (but also weak) guesses. In both cases we observed an average information propagation of 497 bits. This means that, on average, we were able to eliminate over 100 more variables from the system than for general keys (cf. Section 5.2.2). The results are listed in Table 5.7. Again, our estimation of $t^{32}_{\mathrm{true}}$ is on the left and the one of $t^{32}_{\mathrm{false}}$ is on the right. The estimations show that, when given a set of 64 weak key bits, for r = 6 recovering the remaining 32 bits in an adaptively chosen plaintext scenario is more efficient using the algebraic attack than using a brute-force attack, exhibiting a speed-up factor of more than $2^5$. Also, recovering the key under the assumption that it is weak by enumerating the sets of weak key bits and solving the reduced system is more efficient than brute-forcing all weak keys by a factor of more than $2^{6.5}$. In fact, since there are only $2^9$ sets of weak key bits, applying this attack in practice on a single PC, like the one we have used, is a matter of minutes. However, we do need to point out that this attack has a fairly high data complexity. In general, we need to choose a new plaintext for each guess. Since we need $2^8$ guesses on average, we need just as many chosen plaintext/ciphertext pairs.
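The two speed-up factors for r = 6 can be re-derived from the timings in Table 5.7. The cost model in the sketch below is an assumption made for illustration: brute force over the weak keys is taken to test half of them on average, and the algebraic attack is taken to reject $2^8$ wrong guesses before solving one correct instance. Variable names are hypothetical.

```python
import math

# Timings for r = 6 as reported in Table 5.7 (seconds).
BRUTE_TRUE_S = 22.4   # 2^31 * t_eval: expected brute-force time for the remaining 32 bits
AVG_TRUE_S = 0.5      # average solving time for a correct weak guess
AVG_FALSE_S = 0.44    # average solving time for a wrong weak guess

# Scenario 1: the 64 weak key bits are revealed by the oracle.
speedup_known = BRUTE_TRUE_S / AVG_TRUE_S

# Scenario 2: only weakness of the key is assumed (assumed cost model, see above).
t_eval = BRUTE_TRUE_S / 2 ** 31
brute_weak_s = 2 ** 40 * t_eval                    # half of the 2^41 weak keys
attack_weak_s = 2 ** 8 * AVG_FALSE_S + AVG_TRUE_S  # 2^8 wrong guesses, one correct
speedup_weak = brute_weak_s / attack_weak_s

print(f"known weak bits: factor 2^{math.log2(speedup_known):.2f}")
print(f"weak key only:   factor 2^{math.log2(speedup_weak):.2f}")
print(f"worst case over all 2^9 guesses: {2 ** 9 * AVG_FALSE_S / 60:.1f} min")
```

Under these assumptions both exponents come out above the bounds stated in the text, and running through all $2^9$ guesses stays within a few minutes.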

The results also show that this attack is not faster than brute-force for r = 7. However, in this attack we only considered one optimal solution returned by the MILP solver to the information flow maximization problem. Preliminary tests showed that there are multiple optimal solutions, which might lead to further classes of weak keys. Some of these might even make attacks on more rounds of EPCBC-48 possible.

r                     6      7          r                     6      7
2^31 · t_eval in s    22.4   26.1       2^32 · t_eval in s    44.8   52.3
avg in s              0.5    71.0       avg in s              0.44   125.1
stdev in s            0.037  86.4       stdev in s            1.78   153.3

Table 5.7.: Comparison of $t^{32}_{\mathrm{true}}$ (left) and $t^{32}_{\mathrm{false}}$ (right) for weak keys of EPCBC-96 to brute-force attack

Finally, we want to remark on the drastic difference in the observed average hardness of the polynomial system when guessing 64 bits of general keys in contrast to the weak keys just discussed. In Table 5.5 (left) for r = 6 the average solving time is reported to be 1687.5 s with a standard deviation of 2941.1 s. Obviously, the different instances that were tested vary widely in their hardness. The very low corresponding average solving time and standard deviation for weak keys in Table 5.7 for r = 6 suggest that the hardness of the instances is at least related to the amount of propagated information. In consequence, apart from identifying a class of weak keys, this section serves as support for the claim made in Section 3.1.2: the more information can be inferred by guessing a set of bits, the easier the problem is to solve.

6 Conclusion

In this chapter we first give a short summary of our results, before outlining potential future work.

6.1 Summary and Discussion

In this work we introduced a novel technique to improve guessing strategies in the algebraic cryptanalysis of PRESENT-like primitives using Mixed Integer Programming. We applied the techniques to the cryptanalysis of SPONGENT and EPCBC. In addition to optimizing the guessing strategies, we also showed how to use these techniques to identify classes of weak keys.

When analyzing SPONGENT we concentrated on finding semi-free-start collisions for the two lightest members of this family of hash functions. We were able to achieve the following results:

• We found two semi-free-start collisions for the 6-round version of $\pi_{88}$, the permutation of SPONGENT-88.

• We showed that there are no semi-free-start collisions for the 6-round version of $\pi_{136}$, the permutation of SPONGENT-128.

However, to obtain actual collisions, the semi-free-start collisions have to be extended by finding a message which, after being absorbed, results in the inner state for which the semi-free-start collision was found. This non-trivial operation, called finding a path, is shown to be hard in general by the authors of [6]. Furthermore, SPONGENT-88 is not designed to provide collision resistance, since a simple brute-force attack only has a complexity of $2^{44}$. Finally, since the 6 rounds that we were able to obtain results on constitute only 13% of the full number of rounds in $\pi_{88}$ and 8.6% in $\pi_{136}$, we conclude that SPONGENT is secure against such algebraic attacks.

We also analyzed both versions of EPCBC using algebraic methods with the following results:

• We found practical attacks for EPCBC-48 and EPCBC-96 with up to 3 out of the 32 rounds.

• We demonstrated weaknesses and theoretical attacks on EPCBC-48 with up to 8 rounds.

• We demonstrated weaknesses and theoretical attacks on EPCBC-96 with up to 5 rounds.

• We identified a class of $2^{41}$ weak keys for EPCBC-96 with 6 rounds.

Our attacks mainly exploit two structural weaknesses of the cipher. In both versions of the cipher the encryption and key schedule exhibit a strong symmetry. This allows for a lot of information propagation, as outlined in Chapter 5. Furthermore, the way the 96-bit key is processed in EPCBC-48 to produce the subkeys allows for a very efficient guessing strategy, where whole blocks of successive subkeys can be computed from known or guessed bits. Even though our attacks do not yet threaten the full cipher, we recommend avoiding such weaknesses in the design of block ciphers. Especially in light of side-channel attacks, these weaknesses can prove fatal to the security of a cipher.

6.2 Future Work

As mentioned in Section 3.2.2, our analysis of weak keys could be improved considerably by extending the MILP approach to take into account the values of the known bits, not only whether they are known or unknown. This could avoid conflicts such as those observed in Section 5.2.3 and thus aid the automation of the process of identifying weak keys. This, in turn, bears the potential of recovering more classes of weak keys of EPCBC-96, which we believe have a relatively high information flow and are therefore linked to solutions of the corresponding MILP.

It would also be interesting to apply the techniques introduced in Chapter 3 to similar primitives, for example LBlock [46].

On a more general note, we believe that (Mixed) Integer Linear Programming has a lot of potential in the field of cryptanalysis, which is far from fully explored. Problems of cryptanalysis tend to be of a discrete nature and often involve the maximization or minimization of a well-defined linear function, or simply the enumeration of feasible solutions. In many cases, MILP solvers can be employed to find solutions or to guide the search towards them. When confronted with a new problem, we would recommend that cryptanalysts first explore the potential of modeling the problem as a MILP and using an off-the-shelf solver, before going through the tedious process of designing a custom “solver” for the problem.

Bibliography

[1] M. A. Abdelraheem, G. Leander, and E. Zenner. Differential cryptanalysis of round-reduced PRINTcipher: Computing roots of permutations. In FSE’11, volume 6733 of Lecture Notes in Computer Science, pages 1–17, Berlin, Heidelberg, 2011. Springer-Verlag.

[2] M. R. Albrecht, C. Cid, T. Dullien, J.-C. Faugère, and L. Perret. Algebraic precomputations in differential and integral cryptanalysis. In X. Lai, M. Yung, and D. Lin, editors, Inscrypt, volume 6584 of Lecture Notes in Computer Science, pages 387–403. Springer, 2010.

[3] G. Bard. Algebraic Cryptanalysis. Springer, 2009.

[4] W. C. Barker. Recommendation for the Triple Data Encryption Algorithm (TDEA) Block Cipher. U.S. Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, Gaithersburg, MD, 2004.

[5] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. KECCAK sponge function family main document. Submission to NIST (Round 2), 2009.

[6] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Sponge Functions. Ecrypt Hash Workshop 2007, May 2007.

[7] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. On the Indifferentiability of the Sponge Construction. In EUROCRYPT’08, volume 4965 of Lecture Notes in Computer Science, pages 181–197, Berlin, Heidelberg, 2008. Springer-Verlag.

[8] A. Bogdanov, M. Knezevic, G. Leander, D. Toz, K. Varici, and I. Verbauwhede. SPONGENT: A Lightweight Hash Function. In B. Preneel and T. Takagi, editors, CHES, volume 6917 of Lecture Notes in Computer Science, pages 312–325. Springer, 2011.

[9] A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Robshaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An Ultra-Lightweight Block Cipher. In CHES, volume 4727 of Lecture Notes in Computer Science, 2007.

[10] J. Borghoff, L. R. Knudsen, and M. Stolpe. Bivium as a Mixed-Integer Linear Programming Problem. In M. G. Parker, editor, IMA Int. Conf., volume 5921 of Lecture Notes in Computer Science, pages 133–152. Springer, 2009.

[11] C. Bouillaguet, P. Derbez, and P.-A. Fouque. Automatic search of attacks on round-reduced AES and applications. IACR Cryptology ePrint Archive, 2012:69, 2012.

[12] M. Brickenstein. Boolean Gröbner bases – Theory, Algorithms and Applications. Logos Berlin, 2010.

[13] M. Brickenstein and A. Dreyer. PolyBoRi: A framework for Gröbner-basis computations with Boolean polynomials. Journal of Symbolic Computation, 44(9):1326–1345, 2009. Effective Methods in Algebraic Geometry.

[14] S. Bulygin and J. Buchmann. Algebraic Cryptanalysis of the Round-Reduced and Side Channel Analysis of the Full PRINTCipher-48. In Lin et al. [28], pages 54–75.

[15] S. Bulygin and M. Walter. Study of the invariant coset attack on PRINTcipher: more weak keys with practical key recovery. Cryptology ePrint Archive, Report 2012/085, 2012.

[16] C. Cid, S. Murphy, and M. J. B. Robshaw. Small Scale Variants of the AES. In H. Gilbert and H. Handschuh, editors, FSE, volume 3557 of Lecture Notes in Computer Science, pages 145–162. Springer, 2005.

[17] N. Courtois. Fast Algebraic Attacks on Stream Ciphers with Linear Feedback. In D. Boneh, editor, CRYPTO, volume 2729 of Lecture Notes in Computer Science, pages 176–194. Springer, 2003.

[18] N. Courtois and G. V. Bard. Algebraic Cryptanalysis of the Data Encryption Standard. In S. D. Galbraith, editor, IMA Int. Conf., volume 4887 of Lecture Notes in Computer Science, pages 152–169. Springer, 2007.

[19] N. Courtois and J. Pieprzyk. Cryptanalysis of Block Ciphers with Overdefined Systems of Equations. In Y. Zheng, editor, ASIACRYPT, volume 2501 of Lecture Notes in Computer Science, pages 267–287. Springer, 2002.

[20] I. Damgård. A Design Principle for Hash Functions. In G. Brassard, editor, CRYPTO, volume 435 of Lecture Notes in Computer Science, pages 416–427. Springer, 1989.

[21] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Commun. ACM, 5(7):394–397, July 1962.

[22] M. Davis and H. Putnam. A Computing Procedure for Quantification Theory. J. ACM, 7(3):201–215, July 1960.

[23] I. Dinur, O. Dunkelman, and A. Shamir. New attacks on Keccak-224 and Keccak-256. Cryptology ePrint Archive, Report 2011/624, 2011.

[24] J.-C. Faugère and A. Joux. Algebraic Cryptanalysis of Hidden Field Equation (HFE) Cryptosystems Using Gröbner Bases. In D. Boneh, editor, CRYPTO’03, volume 2729 of Lecture Notes in Computer Science, pages 44–60. Springer Berlin / Heidelberg, 2003.

[25] J.-C. Faugère, F. Levy-dit-Vehel, and L. Perret. Cryptanalysis of MinRank. In D. Wagner, editor, CRYPTO’08, volume 5157 of Lecture Notes in Computer Science, pages 280–296, Berlin, Heidelberg, August 2008. Springer-Verlag.

[26] L. R. Knudsen, G. Leander, A. Poschmann, and M. J. B. Robshaw. PRINTcipher: A Block Cipher for IC-Printing. In Mangard and Standaert [30], pages 16–32.

[27] G. Leander, M. A. Abdelraheem, H. AlKhzaimi, and E. Zenner. A Cryptanalysis of PRINTcipher: The Invariant Subspace Attack. In CRYPTO’11, volume 6847 of Lecture Notes in Computer Science, pages 206–221, Berlin, Heidelberg, 2011. Springer-Verlag.

[28] D. Lin, G. Tsudik, and X. Wang, editors. Cryptology and Network Security - 10th International Conference, CANS 2011, Sanya, China, December 10-12, 2011. Proceedings, volume 7092 of Lecture Notes in Computer Science. Springer, 2011.

[29] S. Lucks. Attacking Triple Encryption. In S. Vaudenay, editor, FSE, volume 1372 of Lecture Notes in Computer Science, pages 239–253. Springer, 1998.

[30] S. Mangard and F.-X. Standaert, editors. Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings, volume 6225 of Lecture Notes in Computer Science. Springer, 2010.

[31] F. Mendel, C. Rechberger, M. Schläffer, and S. S. Thomsen. The Rebound Attack: Cryptanalysis of Reduced Whirlpool and Grøstl. In O. Dunkelman, editor, FSE, volume 5665 of Lecture Notes in Computer Science, pages 260–276. Springer, 2009.

[32] M. S. E. Mohamed, S. Bulygin, M. Zohner, A. Heuser, and M. Walter. Improved Algebraic Side-Channel Attack on AES. Cryptology ePrint Archive, Report 2012/084, 2012.

[33] P. Morawiecki and M. Srebrny. A SAT-based preimage analysis of reduced KECCAK hash functions. Cryptology ePrint Archive, Report 2010/285, 2010.

[34] T. Motzkin, H. Raiffa, G. L. Thompson, and R. M. Thrall. The Double Description Method. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games II. Princeton University Press, 1953.

[35] N. Mouha, Q. Wang, D. Gu, and B. Preneel. Differential and Linear Cryptanalysis using Mixed-Integer Linear Programming. In Inscrypt’11, Lecture Notes in Computer Science. Springer-Verlag, 2011.

[36] J. Nakahara, Jr., P. Sepehrdad, B. Zhang, and M. Wang. Linear (Hull) and Algebraic Cryptanalysis of the Block Cipher PRESENT. In CANS ’09, volume 5888 of Lecture Notes in Computer Science, pages 58–75, Berlin, Heidelberg, 2009. Springer-Verlag.

[37] NIST. Advanced Encryption Standard (AES) (FIPS PUB 197). National Institute of Standards and Technology, Nov. 2001.

[38] Y. Oren, M. Kirschbaum, T. Popp, and A. Wool. Algebraic Side-Channel Analysis in the Presence of Errors. In Mangard and Standaert [30], pages 428–442.

[39] M. Renauld, F.-X. Standaert, and N. Veyrat-Charvillon. Algebraic side-channel attacks on the AES: Why time also matters in DPA. In C. Clavier and K. Gaj, editors, CHES, volume 5747 of Lecture Notes in Computer Science, pages 97–111. Springer, 2009.

[40] K. A. Sakallah and L. Simon, editors. Theory and Applications of Satisfiability Testing - SAT 2011 - 14th International Conference, SAT 2011, Ann Arbor, MI, USA, June 19-22, 2011. Proceedings, volume 6695 of Lecture Notes in Computer Science. Springer, 2011.

[41] C. E. Shannon. Communication Theory of Secrecy Systems. Bell System Technical Journal, 28(4):656–715, 1949.

[42] M. Soos. CryptoMiniSat 2.5.0. In SAT Race competitive event booklet, July 2010.

[43] W. Stein et al. Sage Mathematics Software (Version 4.7.2). The Sage Development Team, 2011. http://www.sagemath.org.

[44] W. Tuchman. A brief history of the data encryption standard. In Internet Besieged, pages 275–280. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1998.

[45] R.-P. Weinmann. Algebraic Methods in Block Cipher Cryptanalysis. PhD thesis, TU Darmstadt, April 2009.

[46] W. Wu and L. Zhang. LBlock: A lightweight block cipher. In J. Lopez and G. Tsudik, editors, ACNS, volume 6715 of Lecture Notes in Computer Science, pages 327–344, 2011.

[47] H. Yap, K. Khoo, A. Poschmann, and M. Henricksen. EPCBC - A Block Cipher Suitable for Electronic Product Code Encryption. In Lin et al. [28], pages 76–97.

A Algebraic Representation of S-Boxes with Quadratic Equations

A.1 SPONGENT

For a SPONGENT S-Box, let $x_0, x_1, x_2, x_3$ be the variables corresponding to the input and $y_0, y_1, y_2, y_3$ the ones for the output of the S-Box. Then the 21 quadratic equations describing the S-Box are

\begin{align*}
0 &= x_1 x_2 + x_0 + x_1 + x_3 + y_0 \\
0 &= x_0 x_1 + x_0 x_2 + x_1 x_3 + x_0 y_0 + x_0 y_1 + x_0 y_2 + x_2 + x_3 + y_3 + 1 \\
0 &= x_0 x_2 + x_0 y_0 + x_1 y_0 + x_0 y_1 + x_0 y_2 + x_0 + x_2 + y_0 + y_3 + 1 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 x_3 + x_1 y_1 + x_0 y_3 + x_2 + y_0 + y_3 + 1 \\
0 &= x_0 x_1 + x_0 y_0 + x_0 y_1 + x_0 y_2 + x_1 y_2 + x_0 y_3 + x_2 + x_3 + y_0 + y_2 + 1 \\
0 &= x_0 x_2 + x_0 x_3 + x_0 y_2 + x_1 y_3 + x_3 + y_0 \\
0 &= x_0 x_1 + x_0 x_2 + x_2 x_3 + x_0 y_0 + x_0 y_1 + x_0 y_2 + y_0 + y_1 + y_2 + y_3 + 1 \\
0 &= x_0 x_1 + x_0 y_0 + x_2 y_0 + x_0 y_1 + x_0 y_2 + y_0 + y_1 + y_2 + y_3 + 1 \\
0 &= x_0 x_2 + x_0 y_0 + x_0 y_1 + x_2 y_1 + x_0 y_2 + x_0 y_3 + x_1 + x_2 + x_3 + y_1 + y_2 + y_3 + 1 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 x_3 + x_2 y_2 + x_0 y_3 + x_2 + x_3 + y_0 + y_2 + 1 \\
0 &= x_0 x_1 + x_0 x_3 + x_0 y_1 + x_2 y_3 + x_1 + x_2 + y_0 + y_1 + y_3 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 y_0 + x_3 y_0 + x_0 y_1 + x_0 y_2 + x_1 + y_2 + y_3 \\
0 &= x_3 y_1 + x_2 + y_0 + y_1 + y_2 \\
0 &= x_3 y_2 + x_1 + y_0 + y_1 + 1 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 x_3 + x_0 y_3 + x_3 y_3 + x_0 + x_2 + x_3 + y_0 + y_1 + y_2 \\
0 &= x_0 x_1 + x_0 y_0 + x_0 y_1 + y_0 y_1 + x_0 y_2 + x_0 y_3 + x_1 + x_2 + y_1 + y_3 \\
0 &= x_0 x_2 + x_0 y_0 + x_0 y_1 + x_0 y_2 + y_0 y_2 + x_0 y_3 + x_1 + y_0 + y_1 + 1 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 y_0 + y_0 y_3 + x_0 + x_2 + y_1 + y_2 \\
0 &= x_0 x_1 + x_0 x_2 + y_1 y_2 + x_2 + x_3 + y_0 + 1 \\
0 &= x_0 x_1 + x_0 y_0 + x_0 y_3 + y_1 y_3 + x_0 + x_2 + x_3 + y_0 + y_1 + y_2 + y_3 \\
0 &= x_0 x_2 + x_0 y_0 + x_0 y_3 + y_2 y_3 + x_0 + x_1 + x_2 + x_3 + y_1
\end{align*}
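Equations of this kind can be checked exhaustively, since a 4-bit S-Box has only 16 input/output pairs. The sketch below tests three of the equations listed above (the first, the 13th and the 14th). It relies on two assumptions that are not stated in this appendix: the lookup table is the one from the SPONGENT specification, and index 0 denotes the least significant bit. The helper names are hypothetical.

```python
# SPONGENT S-Box lookup table (assumption: taken from the SPONGENT specification).
SPONGENT_SBOX = [0xE, 0xD, 0xB, 0x0, 0x2, 0x1, 0x4, 0xF,
                 0x7, 0xA, 0x8, 0x5, 0x9, 0xC, 0x3, 0x6]

def bits(value):
    """Return [b0, b1, b2, b3], with b0 the least significant bit (assumed convention)."""
    return [(value >> i) & 1 for i in range(4)]

# Right-hand sides of three listed equations over GF(2):
# AND is multiplication, XOR is addition. Each must evaluate to 0.
EQUATIONS = [
    lambda x, y: (x[1] & x[2]) ^ x[0] ^ x[1] ^ x[3] ^ y[0],
    lambda x, y: (x[3] & y[1]) ^ x[2] ^ y[0] ^ y[1] ^ y[2],
    lambda x, y: (x[3] & y[2]) ^ x[1] ^ y[0] ^ y[1] ^ 1,
]

def violations(sbox, equations):
    """List (equation index, input) pairs for which an equation does not vanish."""
    return [(i, v) for i, eq in enumerate(equations)
            for v in range(16) if eq(bits(v), bits(sbox[v])) != 0]

print(violations(SPONGENT_SBOX, EQUATIONS))   # an empty list means all checks pass
```

The remaining equations can be added to `EQUATIONS` in the same way.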

A.2 EPCBC

For an EPCBC S-Box, let $x_0, x_1, x_2, x_3$ be the variables corresponding to the input and $y_0, y_1, y_2, y_3$ the ones for the output of the S-Box. Then the 21 quadratic equations describing the S-Box are

\begin{align*}
0 &= x_1 x_2 + x_0 + x_2 + x_3 + y_0 \\
0 &= x_0 x_1 + x_0 x_2 + x_1 x_3 + x_0 y_0 + x_0 + x_1 + x_3 + y_0 + y_2 + y_3 \\
0 &= x_0 x_2 + x_0 y_0 + x_1 y_0 + x_0 + x_1 + x_3 + y_0 + y_2 + y_3 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 y_0 + x_0 y_1 + x_0 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 y_0 + x_0 y_3 + x_0 + x_1 + x_2 + y_0 + y_3 + 1 \\
0 &= x_0 y_0 + x_0 y_2 + x_1 y_2 + x_1 y_3 + x_1 + x_2 + x_3 + y_2 + 1 \\
0 &= x_0 x_1 + x_0 x_2 + x_2 x_3 + x_0 y_0 + x_0 + x_1 + x_2 + y_1 + y_2 + 1 \\
0 &= x_0 x_1 + x_0 y_0 + x_2 y_0 + x_1 + x_2 + x_3 + y_0 + y_1 + y_2 + 1 \\
0 &= x_0 x_2 + x_0 x_3 + x_0 y_0 + x_1 y_1 + x_2 y_1 + x_3 + y_3 + 1 \\
0 &= x_0 x_2 + x_0 x_3 + x_1 y_1 + x_0 y_2 + x_2 y_2 + x_0 + x_1 + y_0 + y_1 + y_2 + y_3 \\
0 &= x_0 x_1 + x_0 x_3 + x_0 y_2 + x_1 y_2 + x_2 y_3 + x_2 + x_3 + y_0 + y_1 + y_2 + 1 \\
0 &= x_0 x_1 + x_0 x_3 + x_0 y_0 + x_3 y_0 + x_1 y_1 + x_0 y_2 + x_1 y_2 + x_0 + x_2 + y_1 + y_2 + y_3 \\
0 &= x_0 x_3 + x_3 y_1 + x_1 y_2 + x_0 + x_1 + x_2 + y_1 + y_2 + y_3 \\
0 &= x_0 x_1 + x_0 x_2 + x_0 x_3 + x_0 y_2 + x_3 y_2 + x_0 + x_2 + x_3 + y_0 + y_1 + y_3 + 1 \\
0 &= x_0 x_2 + x_1 y_1 + x_0 y_2 + x_3 y_3 + x_0 + x_3 + y_0 + y_2 + y_3 \\
0 &= x_0 x_1 + x_0 x_3 + x_0 y_0 + y_0 y_1 + x_0 y_2 + x_3 + y_0 + y_1 + y_2 + y_3 \\
0 &= x_0 x_2 + x_0 x_3 + x_0 y_0 + x_1 y_1 + x_0 y_2 + x_1 y_2 + y_0 y_2 + y_0 + y_2 + 1 \\
0 &= x_0 x_2 + x_0 x_3 + x_0 y_0 + x_0 y_2 + y_0 y_3 + x_0 + x_2 + y_1 + y_3 + 1 \\
0 &= x_0 x_2 + x_0 x_3 + x_0 y_2 + y_1 y_2 + x_3 + y_3 + 1 \\
0 &= y_1 y_3 + x_0 + y_0 + y_2 + 1 \\
0 &= x_0 x_1 + x_0 x_3 + x_0 y_2 + y_2 y_3 + x_0 + x_1 + y_1 + y_2 + y_3 + 1
\end{align*}
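The count of 21 equations is consistent with a simple linear-algebra argument: there are 1 + 8 + 28 = 37 monomials of degree at most two in the eight variables, and evaluating them on the 16 input/output pairs gives a 16 × 37 matrix over GF(2) whose kernel has dimension at least 37 − 16 = 21. The sketch below computes this dimension. The lookup table is an assumption (EPCBC is taken to use the PRESENT S-Box, which is not restated in this appendix), as are the bit order and all helper names.

```python
from itertools import combinations

# S-Box lookup table (assumption: EPCBC reuses the PRESENT S-Box).
EPCBC_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
              0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(value):
    """Return [b0, b1, b2, b3], with b0 the least significant bit (assumed convention)."""
    return [(value >> i) & 1 for i in range(4)]

def monomial_matrix(sbox):
    """One row per S-Box input; columns are the constant, the 8 variables
    (x0..x3, y0..y3) and all 28 products of two distinct variables."""
    rows = []
    for v in range(16):
        z = bits(v) + bits(sbox[v])
        rows.append([1] + z + [z[i] & z[j] for i, j in combinations(range(8), 2)])
    return rows

def gf2_rank(rows):
    """Rank over GF(2) using an XOR basis on rows packed into integers."""
    basis = []
    for row in rows:
        vec = int("".join(str(b) for b in row), 2)
        for b in basis:
            vec = min(vec, vec ^ b)   # clears the leading bit of b if it is set
        if vec:
            basis.append(vec)
    return len(basis)

matrix = monomial_matrix(EPCBC_SBOX)
num_monomials = len(matrix[0])
num_relations = num_monomials - gf2_rank(matrix)
print(num_monomials, num_relations)
```

Each kernel vector is the coefficient vector of one quadratic relation that holds on every input/output pair, so `num_relations` is the number of linearly independent equations of this form.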
