softwarebenchmarkingofthe2nd round caesarcandidates · 2016. 9. 5. · softwarebenchmarkingofthe2nd...
TRANSCRIPT
Software Benchmarking of the 2nd roundCAESAR Candidates
Ralph Ankele1 and Robin Ankele2
1 Information Security Group, Royal Holloway, University of LondonMcCrea Building, Egham TW20 0EX, UK
[email protected] Department of Computer Science, University of Oxford
Wolfson Building, Oxford OX1 3QD, [email protected]
Abstract. The software performance of cryptographic schemes is an important factorin the decision to include such a scheme in real-world protocols like TLS, SSH or IPsec.In this paper, we develop a benchmarking framework to perform software performancemeasurements on authenticated encryption schemes. In particular, we apply ourframework to independently benchmark the 29 remaining 2nd round candidates of theCAESAR competition. Many of these candidates have multiple parameter choices,or deploy software optimised versions raising our total number of benchmarked im-plementations to 232. We illustrate our results in various diagrams and hope thatour contribution helps developers to find an appropriate cipher in their selection choice.
Keywords: Software benchmarks · CAESAR · 2nd round candidates · real-worldsusecases · tls · ssh · optimisations · aes-ni · sse · avx · neon
1 IntroductionAn authenticated encryption (AE) scheme provides confidentiality, integrity and authentic-ity simultaneously. These goals are typically achieved by combining separate cryptographicprimitives, an encryption scheme to ensure confidentiality and a message authenticationcode for integrity and authenticity. However, the generic composition, the combinationof a confidentiality mode with an authentication mode, may lead to implementationerrors and is often not efficient (e.g. the input stream has to be processed twice, forthe encryption scheme and the message authentication code). Recently, some practicalattacks [CHVV03], [Vau02] have confirmed the vulnerability of these AE schemes. In2013 the CAESAR [Ber14] competition was announced to overcome these attacks andto achieve fast and efficient AE schemes and should be suitable for widespread adoption.In March 2014, 57 first round candidates were submitted to the CAESAR competition.Abed et al. [AFL14] provide a classification and a general overview of the first roundcandidates. In July 2015, the second round candidates were announced. The deadlinefor the announcement of the third round candidates is expected in end July 2016 thatrequires the CAESAR committee to make a selection decision which candidates will beincluded in the third round.
Beside of the security guarantees of the authenticated encryption mode, the softwareand hardware performance of the candidate becomes more and more important. One ofthe requirements imposed on all CAESAR submissions was that they should demonstratethe potential to be superior to AES-GCM in at least one significant aspect. One ofthose aspects that is particularly significant for the usage in widely deployed protocols
2 Software Benchmarking of the 2nd round CAESAR Candidates
such as TLS, SSH and IPsec is software performance. Therefore, we have developed abenchmarking framework for automated and accurate software performance measurementsand applied it to benchmark the remaining 29 second round candidates. Many of thesecandidates have multiple parameter choices, extending the number to 125 implementations.Overall, we have considered several software performance optimised implementations,raising the total number of benchmarked implementations to 232.
During our benchmarks, we perform and illustrate our measurements on a continuousfunction (with message and associated data sizes from 0 to 2048 bytes, in 128 byte steps)and on some fixed length sizes, relating to real-world usecases — 1 byte (one keystroke ,SSH), 16 bytes (small payload), 557 bytes (average IP packet size), 1.5 kB (ethernet MTUsize, TLS), 16 kB (max. TCP packet size) and 1 MB (e.g. file upload).
Related Work. In order to evaluate the software performance of the CAESAR candidates,Bernstein provides the SUPERCOP∗ [Ber16] framework, which is a general purposeframework for the evaluation of the software performance of cryptographic software.Saarinen presented the BRUTUS† [Saa16] framework, in order to test the CAESARcandidates for cryptographic weaknesses. Additionally to the software performance ofthe candidates, the hardware performance is also an important factor in the selection ofthe final portfolio. Homsirikamol et al. [HDF+15] present a hardware API to performhardware benchmarks of the CAESAR candidates.
Contribution and Outline. In this paper, we present a complementary benchmarkingframework to perform automatic and accurate measurements on authenticated encryptionfunctions. We apply this framework to perform independent software benchmarks on the2nd round CAESAR candidates and illustrate the results in various diagrams.
This paper is organised as follows. In Section 2, we introduce the 2nd round candidatesof the CAESAR competition and classify them regarding their underlying cryptographicprimitive. Section 3, presents our benchmarking framework and the measurement settings.In Section 4, we give an overview and discuss the software optimisation implemented inthe 2nd round candidates. Finally, in Section 5 we present the results of the benchmarkingand Section 6 concludes the paper.
2 Classification of the 2nd round CAESAR CandidatesIn the CAESAR [Ber14] call for submissions the choice of underlying primitive and type ofcipher was open for the designers. After 57 submissions for the first round, 29 submissionshave been selected for the second round. Within these remaining candidates there are 15based on block ciphers, 8 based on sponge constructions, 4 based on stream ciphers, 2based permutations and 1 based on a compression function. Table 1 gives an overview ofthe 2nd round candidates of the CAESAR competition sorted by type.
Some of the candidates are online. An online cipher is defined such that the encryptionof message block Mi is only depending on already processed message blocks M1, . . . ,Mi−1.Furthermore, an inverse-free scheme indicates if the inverse of an underlying primitive isneeded (i.e. for the purpose of decryption of the underlying block cipher). An authenticatedencryption scheme is nonce-misuse resistant, if the cipher is still secure under the repetitionof nonces. Moreover, we can distinguish between nonce misuse resistance for online andoffline ciphers. On the one hand, offline ciphers security can be claimed if the repetitionof the ciphers leak only the ability to see a repeated message. On the other hand, foronline ciphers security can be claimed, if nonces are repeated only the common prefix ofthe messages can be observed.∗https://bench.cr.yp.to/supercop.html†https://github.com/mjosaarinen/brutus
Ralph Ankele and Robin Ankele 3
Table 1: Overview of the 2nd round CAESAR candidates ††. Possible types of schemes are: BC: blockcipher based schemes, SC: stream cipher based schemes, Sponge: scheme based on Sponge construction(duplex or otherwise), P: permutation based scheme and CF: scheme based on compression function.Underlying primitives are: AES: when the full AES-128 is used, AES[r]: when a round reduced version ofAES is used, other dedicated primitive or constructions: e.g. SHA2, SPN, LFSR. Entries for nonce misuseresistance are: None: when no security is claimed by the designers if nonces are repeated, Max: when foroffline ciphers the repetition of nonces only leak the ability to see a repeated message, LCP: for onlineciphers, all an adversary can observe is the longest common prefix of messages for repeated nonces.
Name Type Primitive Parallel Online Inverse- Security Nonce mis- ReferenceEnc. / Dec. Enc. / Dec. free proof use resistantAEGIS BC AES[1] X/ - X/X - - None [WP14]
AES-COPA BC AES partly/partly X/X - X LCP [ABL+15]AES-JAMBU BC AES, Simon - / - X/X X - LCP [WH15]AES-OTR BC AES X/X X/X X X None [Min16]
AEZ BC AES[4] X/X - / - X X Max [HKR15]CLOC BC AES, TWINE - / - X/X - X None [IMGM15]Deoxys BC AES X/X X/X - X LCP [JNP15a]ELmD BC AES partly/partly X/X - X LCP [DN15]Joltik BC Joltik-BC X/X X/X - X LCP [JNP15b]OCB BC AES X/X X/X - X None [KR14]POET BC AES partly/partly X/X X X LCP [AFF+15]
SCREAM BC SPN X/X X/X - - None [GLS+15]SHELL BC AES partly/partly X/X - X None [Wan15]SILC BC AES, Present, LED - /- X/X X X None [IMG+15]
Tiaoxin BC AES[1] X/X X/X X - None [Nik15]OMD CF SHA2 - / - X/X - X None [CMN+15]
Minalpher P SPN X/X X/X X [STA+15]PAEQ P AESQ X/X X/X X X [BK14]ACORN SC LFSR X/X X/X X - None [Wu15]HS1-SIV SC ChaCha/Poly1305 - / - - / - X X Max [Kro15]MORUS SC LRX - / - X/X X - None [HW15]TriviA-ck SC Trivium partly/partly - / - X X None [CM15]Ascon Sponge SPN - / - X/X X X [DEMS15]
ICEPOLE Sponge Keccack-like X/ X X/X X X [MGH+15]Ketje Sponge Keccack-f - / - X/X X X None [BDP+14]Keyak Sponge Keccack-f X/ X X/X X X None [BDP+15]NORX Sponge LRX X/X X/X X X None [AJN15]π-cipher Sponge ARX X/X X/X X X Intermediate [GMS+15]
PRIMATEs Sponge SPN - / - X/X X X LCP [ABB+14]STRIBOB Sponge Streebog - / - X/X X X None [OSB15]
†† Data obtained from: https://aezoo.compute.dtu.dk/doku.php
3 Benchmarking FrameworkSimilar to the SUPERCOP [Ber16] and BRUTUS [Saa16] frameworks, we have developeda benchmarking framework to facilitate automatic software performance evaluations forauthenticated encryption functions. Using our toolkit, one can perform independentand accurate software performance measurements (i.e. timing measurements), verify theimplementation of the cipher under test, generate and illustrate statistical informationabout the software performance, and compare different parameters and optimisations ofa cipher among its different optimisations or to other ciphers under test. Nevertheless,our benchmarking suite is currently optimised to perform measurements of authenticatedencryption functions it remains open to be extended to perform benchmarking of otherfunctionality and include other types of software tests.
Generally, our benchmarking framework provides the following benefits compared tothe SUPERCOP and BRUTUS framework.
� Very simple with only focus on comparison of AE schemes� Due to its simplicity open for extension of other software performance tests
4 Software Benchmarking of the 2nd round CAESAR Candidates
� Optimised TSC instruction (i.e. rdtscp) for accurate timing measurements� Reduction of noise in the measurement using single user mode, averaging and median� Benchmarking, verification, illustration and comparison of results
Using our framework, we built and benchmarked the 2nd round candidates (see Table 1)of the ongoing CAESAR competition. By July 2016, this includes the implementations of29 candidates. Many of these candidates have multiple parameter choices, extending thenumber to 125 implementations. Overall, we have considered several software performanceoptimised implementations, raising the total number of benchmarked implementations to232‡.
3.1 Measurement SetupFor our measurements we had two different measurement setups. The first measurementswere taken on a 64-bit Sandy Bridge 2.3GHz dual-core Intel Core i5-2415M processor.The underlying hardware was a Mac Book Pro Early 2011 running OS X 10.11 (ElCapitan) in single user mode. The second measurements were taken on a 64-bit Skylake2.4GHz dual-core Intel Core i5-6300U processor. The underlying hardware was a DELLLatitude E7470 running Ubuntu 16.04 in single user mode. The implementations undertest were built using the clang compiler version 6.1.0 (clang-602.0.53) and the gcc compilerversion 5.4.0 (5.4.0-6ubuntu1-16.04.2). Additionally, we enabled the following compilerflags for compile time optimisation of the underlying source code: -Ofast -fno-stack-protector -march=native. In order to get rid of the noise(e.g. context switches) duringthe measurement and to get a clean and stable result in an uninterrupted environment,we choose to boot in the single user mode, which just starts the core operating systemand provides a sh command interface, without starting other core elements of the OS Xoperating system (e.g. network services, multi-user support, graphical user interface) thatmay interfere with our benchmarking. Furthermore, we used the same method as describedin the paper from Krovetz and Rogaway [KR11]. In their approach, they determine theperformance of the underlying function as the median of 91 averaged timings of 200measurements each.
3.2 High Resolution Methods for CPU Timing InformationIn general, there are several ways to determine the software performance of a functionunder test. Each operating system offers several methods of accurate timing sources to beused to determine the runtime (e.g. latency) or throughput of a function. Using multi-coreprocessors, one must be aware of handling hyper-threading and over-clocking of the CPUduring their benchmarking, to avoid distorted or falsified measurements.
For the performance of high precision timing measurements on Windows platforms, onecould use the HPET (i.e. High Precision Event Timer) or the QueryPerformanceCounter.On Unix based platforms one can use the posix conform time() and clock() functions,as used in the BRUTUS framework. However, a much more accurate timing source onx86 processors is the Timer Stamp Counter (TSC) as used in the SUPERCOP framework.The TSC is a 64 bit register and counts the number of cycles since the last reset of themachine. Using the RDTSC assembler instruction, one can read out the current TSC valueand use it for accurate timing measurements.
In newer processor hardware, Intel processors practice out-of-order execution, whereinstructions are not always performed in the order as they appear in the source code. Thiscould lead to a skewed measurement result. To avoid such errors, the CPUID instruction, aserialisation instruction, can be inserted, which‡Number of implementations supported by our benchmarking hardware.
Ralph Ankele and Robin Ankele 5
Table 2: Real-world use case settings for our benchmarking.
Message Size Associated Data Size Comments1 byte 5 byte one keystroke (e.g. SSH)16 bytes 5 byte small payload557 byte 5 byte average IP packet size**1.5 kB 5 byte ethernet MTU, TLS16 kB 5 byte max TCP packet size1 MB 5 byte file upload
** http://slaptijack.com/networking/average-ip-packet-size
forces every preceding instruction to complete before the execution of the RDTSC instruction.In our framework, we make use of the RDTSCP [Pao10] instruction, which is an optimisedversions of the combination CPUID + RDTSC.
3.3 Authenticated Encryption Benchmarking Settings and Real-WorldUsecases
In our benchmarking, we implemented measurements on a continuous function (i.e. wherewe vary the message size and the associated data size) and over some fixed message andassociated data sizes (i.e. representing some real-world usecases).
Overall, we consider message size from 0 to 1 MB and associated data size from 0 to 2KB. The sizes of the key and nonce are fixed-size values, as determined by the CAESARspecification. In the case of the continuous function we consider message and associateddata size from 0 to 2048 bytes in 128 byte steps. On the other hand, for the timingmeasurements with fixed message and associated data size, we consider the values inTable 2.
The associated data is fixed to 5 bytes§ for all measurements, which represents thetypical size of the header of a TLS packet. Additionally, for better comparability, werepresent the performance results of the authenticated encryption candidates in cycle perbyte (cbp), which is a function of the throughput of the ciphers, instead of releasing thetimings (i.e. latency) of the candidates.
4 Software OptimisationsSeveral ciphers are supported by software-optimised implementations. These optimisationsare achieved by the usage of architecture-specific instructions and intrinsics. An intrinsicfunction is handled in a special manner by the compiler where the intrinsic function mapsto some specific assembler instructions. Table 3 gives an overview of the mainly usedinstruction set extensions in the 2nd round candidates. Furthermore, in the followingwe give a short overview of the most common used instruction set extensions to achievesoftware optimised algorithms.
4.1 AES-NI.As many CAESAR candidates are AES based, they can obtain an increase in softwareperformance by using the AES New Instructions [Gue12]. These set of instructions areavailable beginning with the 2010 Intel processors based on the 32nm microarchitectureWestmere. The instruction set consists of six instructions that offer full hardware support
§http://netsekure.org/2010/03/tls-overhead
6 Software Benchmarking of the 2nd round CAESAR Candidates
Table 3: Most common optimisations for the CAESAR candidatesInstruction Set Vendor Microarchitecture CAESAR Candidate
AES-NI Intel Westmere OCB, OTR, AEGIS, CLOC, SILC, POET,JAMBU, AEZ, Deoxys, PAEQ, Tiaoxin, TriviA-ck
SSE Intel Pentium III OCB, CLOC, Keyak, Morus, OMD,Scream, Stribob, Tiaoxin, TriviA-ck
AVX Intel Sandy Bridge NORX, OMDAVX2 Intel Haswell Keyak, Minalpher, Morus, NORXNEON ARM ARMv7-A OCB, NORX, Scream, Stribob
for AES enabling fast and secure encryption and decryption. The AES New Instructionsincrease the performance by more than an order of magnitude for parallel modes (e.g. CTRmode), and provide roughly 2-3 fold gains for non-parallelisable modes (e.g. CBC mode).On a Westmere microarchitecture the AES-NI instructions can run with approximately1.3 cycles per byte in parallel mode of operation.
4.2 SSE.
The Streaming SIMD Extensions [Sie09] contain around 70 instructions for high perfor-mance optimised implementations. The SSE instructions are available since Pentium IIIand the latest update SSE4.2 is available with the Nehalem microarchitecture. The exten-sions provide up to 16 128-bit registers (xmm0-15) and provide vector-mode operations,enabling parallel execution of one operation on multiple data. The speedup that can beachieved with the SSE instructions depends on whether the data can be parallelised, asoperations are used in a vector-mode where the same operation is executed on multipledata.
4.3 AVX.
The Advanced Vector Extensions [Lom11] were introduced with the Sandy Bridge mi-croarchitecture and extend the SSE instructions with a new set of instructions and a newcoding scheme. The AVX1 instructions extend the previous xmm0-15 128-bit registers to16 256-bit registers (ymm0-15). Furthermore, memory alignments have been relaxed andthree-operand non-destructive operations have been added. Previously, with two-operandcommands the source operand was always overwritten by the instruction (e.g. A = A+ B). With the new three-operand instructions (e.g. A = B + C) an additional movinstruction to secure the source-value is not necessary. Additionally, the AVX2 [Bux11]instructions are an advancement of the AVX instructions supporting integer datatypes andadding vector shift operations. The new instructions were introduced with the Haswellmicroarchitecture.
4.4 NEON.
The implementation of the Advanced SIMD extensions used in ARM processors is calledNEON [neo] and is available with the Cortex-A microarchitecture. ARM processors areused in many mobile devices such as tablets and mobile phones. The NEON instructionsare 128-bit SIMD extensions providing a flexible and powerful acceleration as they performpackaged SIMD processing. Registers are therefore considered as vectors of elements withthe same data type and perform the same operation on all lanes.
Ralph Ankele and Robin Ankele 7
5 Results
We used our framework to benchmark all 2nd round CAESAR candidates. Figure 1 andFigure 2 show a comparison of the best implementations for each CEASAR candidateon SandyBridge and Skylake processors. To demonstrate the impact of a CAESARcandidate in a real-world setting, we also made benchmarks in the following settings.Figure 3 compares the best implementation per CAESAR candidate for the TLS settingon the Skylake architecture, while Figure 4 shows the same setting on the SandyBridgearchitecture. We chose the TLS setting, because we believe it is the most common usecasefor authenticated encryption. Therefore, we set the associated data length to 5 bytes¶which represents the typical length of a TLS header. The message length is set to 1500bytes, which corresponds to the MTU size of an ethernet frame. Moreover, we also madebenchmarks for the SSH setting. In this setting, we fix the associated data length to 5bytes and the message length to 1 byte (i.e. one key stroke). Figure ?? compares the bestimplementation per CAESAR candidate for the SSH setting on the Skylake architecture,and Figure 6 shows the same setting on the SandyBridge architecture. Furthermore, wecreated benchmarks for all candidates with increasing message/associated data lengthsgiven in the appendix. Additionally, 3D plots are provided for the best implementation ofeach cipher, in order to illustrate the evolution in performance when the message lengthand the associated data length increase. We give detailed illustrations of these benchmarksin the appendices.
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.51
0.71
0.92
1.36
1.75
2.473
3.94
5.546.55
910.7613.1416.03
20.5324.92
31.11
47.47
61.14
82.4
116.3
219.33
272.37
1059.18
1590.87
6611.66
Pe
rfo
rma
nce
(cp
b)
acorn128v2
aeadaes256ocbtaglen128v1
aegis128l
aes128n8t8clocv2
aes128n8t8silcv2
aes128otrpv3
aes128otrsv2
aes128poetv2aes4ls0lt0
aescopav2
aesjambuv2
aezv4
ascon128av11
deoxysneq256128v13
elmd600v2
hs1sivlov1
icepole128av2
joltikneq6464v13
ketjesrv1
lakekeyakv2
minalpherv11
morus640128v1
norx6441
omdsha512k512n256tau256v2
paeq64
pi64cipher128v2
primatesv1gibbon80
scream10v3
shellaes128v2d4n80
stribob192r2
tiaoxinv2
trivia0v2
Figure 1: Comparison of the best implementation (on the SandyBridge processor)for each CAESAR candidate. The associated data is fixed to 0 bytes.
¶http://netsekure.org/2010/03/tls-overhead
8 Software Benchmarking of the 2nd round CAESAR Candidates
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.29
0.47
0.63
0.78
1.05
1.37
1.65
2.01
2.55
3.69
5.09
7.07
9.14
11.42
17.61
25.16
40.8
61.14
104.93
128.46
482.85
609.55
1309.31567.98
Pe
rfo
rma
nce
(cp
b)
acorn128v2
aeadaes256ocbtaglen128v1
aegis128l
aes128gcmv1
aes128n8t8clocv2
aes128n8t8silcv2
aes128otrpv3
aescopav2
aesjambuv2
aezv4
ascon128av11
deoxysneq256128v13
elmd600v2
hs1sivlov1
icepole128av2
joltikneq6464v13
ketjesrv1
minalpherv11
morus1280256v1
norx6441
omdsha512k512n256tau256v2
paeq64
pi64cipher256v2
poetv2aes4
primatesv1gibbon80
scream10v3
seakeyakv2
shellaes128v2d4n80
stribob192r2
tiaoxinv2
trivia0v2
Figure 2: Comparison of the best implementation (on the SkyLake processor) foreach CAESAR candidate. The associated data is fixed to 0 bytes.
100
101
102
103
104
aegis128l_aesnic
tiaoxinv2_nim
aezv4_aesni
morus640128v1_sse2
aeadaes128ocbtaglen128v1_opt
aes128poetv2aes4ls0lt0_ni
deoxysneq256128v13_aesni
norx3241_xmm
aes128n8t8clocv2_aesni
aes128n8t8silcv2_aesni
paeq64_aesni
lakekeyakv2_generic64lc
aesjambuv2_aesni
hs1sivlov1_ref
acorn128v2_opt
ascon128av11_opt64
scream10v3_sse
omdsha512k128n128tau128v2_avx1
aes128otrsv2_ref
stribob192r2_ssse3
icepole128av2_ref
pi64cipher128v2_ref
trivia128v2_ref
elmd600v2_ref
shellaes128v2d4n64_ref
ketjejrv1_compact
aescopav2_ref
minalpherv11_ref
joltikneq6464v13_ref
primatesv1gibbon80_ref
Performance (cpb)
0.62
0.74
1.41
1.53
2.45
3.58
3.59
4.93
5.57
5.7
6.69
7.42
10.72
11.79
13.74
14.68
19.67
21.28
23.35
26.56
26.93
27.6
46.83
59.19
79.6
175.51
215.16
1115.12
1185.21
5894.83
Figure 3: Comparison of 2nd round CAESAR candidates in the TLS setting (onthe SandyBridge processor).
Ralph Ankele and Robin Ankele 9
100
101
102
103
tiaoxinv2_nim
aegis128l_aesnic
aezv4_aesni
aes128otrpv3_nip7m2
morus1280256v1_avx2
aeadaes128ocbtaglen128v1_opt
deoxysneq128128v13_aesni
poetv2aes4_ni
aes128gcmv1_openssl
norx6441_ymm
aes128n12t8clocv2_aesni
aes128n8t8silcv2_aesni
lakekeyakv2_generic64
paeq80_aesni
ascon128av11_opt64
aesjambuv2_aesni
hs1sivlov1_ref
pi64cipher128v2_goptv
trivia0v2_sse4
acorn128v2_opt
scream10v3_sse
icepole128av2_ref
omdsha512k128n128tau128v2_sse4
stribob192r2_ssse3
ketjesrv1_reference
shellaes128v2d8n80_ref
aescopav2_ref
minalpherv11_ref
primatesv1gibbon80_ref
joltikeq12864v13_ref
Performance (cpb)
0.38
0.4
0.68
0.82
1.07
1.26
1.37
1.6
1.8
2.43
2.57
2.62
4.12
4.56
6.25
6.31
7.11
7.14
9.01
9.05
9.26
9.45
18.05
22.47
36.76
37.31
105.68
543.38
1309.54
1757.25
Figure 4: Comparison of 2nd round CAESAR candidates in the TLS setting (onthe SkyLake processor).
100
101
102
103
104
aegis128_aesni
tiaoxinv2_nim
aezv4_aesni
aes128n8t8clocv2_aesni
aesjambuv2_aesni
morus640128v1_sse2
aes128n8t8silcv2_aesni
ascon128av11_opt64
paeq128_aesni
norx3241_xmm
aes128poetv2aes4ls0lt0_ni
aeadaes128ocbtaglen128v1_opt
deoxysneq256128v13_aesni
lakekeyakv2_generic64lc
aes128otrsv2_ref
hs1sivlov1_ref
stribob192r2_ssse3
trivia128v2_ref
scream10v3_ref
elmd600v2_ref
acorn128v2_opt
omdsha256k192n104tau128v2_avx1
pi32cipher256v2_ref
ketjejrv1_compact
icepole128av2_ref
shellaes128v2d4n64_ref
aescopav2_ref
joltikneq6464v13_ref
minalpherv11_ref
primatesv1hanuman80_ref
Performance (cpb)
47.98
79.05
89.42
91.44
94.82
96.11
126.73
166.5
175.2
182.46
186.21
320.42
381.65
425.78
433.06
478.97
516.84
596.4
652.64
743.39
812.69
850.87
925.52
1396.83
1575.73
2033.89
2097.23
4717.89
7219.62
18828.45
Figure 5: Comparison of 2nd round CAESAR candidates in the SSH setting (onthe SandyBridge processor).
10 Software Benchmarking of the 2nd round CAESAR Candidates
100
101
102
103
104
aegis128l_aesnic
aezv4_aesni
tiaoxinv2_nim
aes128otrpv3_nip7m2
aesjambuv2_aesni
aes128n12t8clocv2_aesni
ascon128av11_opt64
aes128n8t8silcv2_aesni
morus1280256v1_avx2
deoxysneq128128v13_aesni
poetv2aes4_ni
paeq80_aesni
norx6441_ymm
aeadaes128ocbtaglen128v1_opt
aes128gcmv1_openssl
lakekeyakv2_generic64
hs1sivlov1_ref
trivia0v2_sse4
acorn128v2_opt
stribob192r2_ssse3
ketjesrv1_reference
pi64cipher128v2_goptv
icepole128av2_ref
scream10v3_sse
aescopav2_ref
omdsha512k128n128tau128v2_sse4
shellaes128v2d8n80_ref
minalpherv11_ref
joltikeq12864v13_ref
primatesv1gibbon80_ref
Performance (cpb)
44.82
49.46
50.06
54.47
61.53
67.45
71.4
80.5
100.4
101.41
130
144.74
158.96
191.06
213.41
235.17
256.33
257.01
391.85
440.1
475.58
489.86
646.63
712.97
954.52
1121.2
1267.94
3879.76
6969.07
7487.21
Figure 6: Comparison of 2nd round CAESAR candidates in the SSH setting (onthe SkyLake processor).
The benchmarking framework and our scripts to visualise the results are available online‖for verification purposes and to encourage other designers and researchers to use ourframework for benchmarking their cryptographic schemes.
6 ConclusionIn this paper, we present a new benchmarking framework to perform software performancemeasurements of authenticated encryption schemes. We apply our framework to benchmarkthe 2nd round CAESAR candidates. Software performance is an important factor inreal-world applications. Our benchmarks show that candidates supported by a software-optimised implementation achieve a huge speedup with regards to others, without anoptimised implementation. The aim of the CAESAR competition is to provide a portfolioof authenticated encryption algorithms, including implementations optimised for software(and hardware), as in the eStream portfolio [est08]. Currently, nearly two thirds ofthe candidates are supported by at least one software optimisation. Nevertheless, withour benchmarks we want to encourage designers to increase the number of optimisedimplementations.
AcknowledgementsWe would like to thank the designers of the CAESAR candidates, which supported us withtheir software implementations. The first author is supported by the Marie Sklodowska-Curie ITN ECRYPT-NET grant (Project Reference 643161).
‖https://github.com/TheBananaMan/caesar_benchmarks_secondround
Ralph Ankele and Robin Ankele 11
References[ABB+14] Elena Andreeva, Begül Bilgin, Andrey Bogdanov, Atul Luykx, Florian Mendel,
Bart Mennink, Nicky Mouha, Quingju Wang, and Kan Yasuda. PRIMATEsv1.02. https://competitions.cr.yp.to/round2/primatesv102.pdf, 2014.
[ABL+15] Elena Andreeva, Andrey Bogdanov, Atul Luykx, Bart Mennink, El-mar Tischhauser, and Kan Yasuda. AES-COPA v.2. https://competitions.cr.yp.to/round2/aescopav2.pdf, 2015.
[AFF+15] Farzaneh Abed, Scott R. Fluhrer, John Foley, Christian Forler, Eik List,Stefan Lucks, David McGrew, and Jakob Wenzel. The POET Family of On-Line Authenticated Encryption Schemes. https://competitions.cr.yp.to/round2/poetv20.pdf, 2015.
[AFL14] Farzaneh Abed, Christian Forler, and Stefan Lucks. General Overview of theAuthenticated Schemes for the First Round of the CAESAR Competition.Cryptology ePrint Archive, Report 2014/792, 2014.
[AJN15] Jean-Philippe Aumasson, Philipp Jovanovic, and Samuel Neves. NORX v2.0.https://competitions.cr.yp.to/round2/norxv20.pdf, 2015.
[BDP+14] Guido Bertoni, Joan Daemen, Michaël Peeters, Gilles Van Assche, andRonny Van Keer. CAESAR submission: KETJE v1. https://competitions.cr.yp.to/round1/ketjev1.pdf, 2014.
[BDP+15] Guido Bertoni, Joan Daemen, Michaël Peeters, Gilles Van Assche, andRonny Van Keer. CAESAR submission: KEYAK v2. https://competitions.cr.yp.to/round2/keyakv21.pdf, 2015.
[Ber14] Daniel J. Bernstein. Caesar: Competition for authenticated encryption:Security, applicability, and robustness. https://competitions.cr.yp.to/caesar.html, 2014.
[Ber16] Daniel J. Bernstein. Supercop. https://bench.cr.yp.to/supercop.html,2016.
[BK14] Alex Biryukov and Dimitry Khovratovich. PAEQ v1. https://competitions.cr.yp.to/round1/paeqv1.pdf, 2014.
[Bux11] Mark Buxton. Intel R© advanced vector extensions programming ref-erence. https://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available, 2011.
[CHVV03] Brice Canvel, Alain Hiltgen, Serge Vaudenay, and Martin Vuagnoux. PasswordInterception in a SSL/TLS Channel, pages 583–599. Springer, 2003.
[CM15] Avik Chakraborti and Nandi Mridul. TriviA-ck-v2. https://competitions.cr.yp.to/round2/triviackv2.pdf, 2015.
[CMN+15] Simon Cogliani, Diana Maimut̨, David Naccache, Rodrigo Portella, RezaReyhanitabar, Serge Vaudenay, and Damian Vizár. Offset Merkle-Damgard(OMD) version 2.0. https://competitions.cr.yp.to/round2/omdv20.pdf,2015.
[DEMS15] Christoph Dobraunig, Maria Eichlseder, Florian Mendel, and Martin Schläffer.ASCON v1.1. https://competitions.cr.yp.to/round2/asconv11.pdf, 2015.
12 Software Benchmarking of the 2nd round CAESAR Candidates
[DN15] Nilanjan Datta and Mridul Nandi. ELmD v2.0. https://competitions.cr.yp.to/round2/elmdv20.pdf, 2015.
[est08] estream: the ecrypt stream cipher project. http://www.ecrypt.eu.org/stream/, 2008.
[GLS+15] Vincent Grosso, Gaëtan Leurent, François-Xavier Standaert, Kerem Varici,Anthony Journault, François Durvaux, Lubos Gaspar, and Stephanie Kerckhof.SCREAM: Side-Channel Resistant Authenticated Encryption with Masking.https://competitions.cr.yp.to/round2/screamv3.pdf, 2015.
[GMS+15] Danilo Gligoroski, Hristina Mihajloska, Simona Samardjiska, Håkon Jacobsen,Mohamed El-Hadedy, Rune Erlend Jensen, and Daniel Otte. π-cipher v2.0.https://competitions.cr.yp.to/round2/picipherv20.pdf, 2015.
[Gue12] Shay Gueron. Intel R© advanced encryption standard (aes) new instructions set.Technical report, Intel Corporation, 2012.
[HDF+15] Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand,Malik Umar Sharif, and Kris Gaj. GMU Hardware API for AuthenticatedCiphers. Cryptology ePrint Archive, Report 2015/669, 2015.
[HKR15] Viet Tung Hoang, Ted Krovetz, and Phillip Rogaway. AEZ v4.1: Authenti-cated Encryption by Enciphering. https://competitions.cr.yp.to/round2/aezv41.pdf, 2015.
[HW15] Tao Huang and Hongjun Wu. The Authenticated Cipher MORUS (v1). https://competitions.cr.yp.to/round2/morusv11.pdf, 2015.
[IMG+15] Tetsu Iwata, Kazuhiko Minematsu, Jian Guo, Sumio Morioka, and EitaKobayashi. SILC: SImple Lightweight CFB. https://competitions.cr.yp.to/round2/silcv2.pdf, 2015.
[IMGM15] Tetsu Iwata, Kazuhiko Minematsu, Jian Guo, and Sumio Morioka. CLOC:Compact Low-Overhead CFB. https://competitions.cr.yp.to/round2/clocv2.pdf, 2015.
[JNP15a] Jérémy Jean, Ivica Nikolić, and Thomas Peyrin. Deoxys v1.3. https://competitions.cr.yp.to/round2/deoxysv13.pdf, 2015.
[JNP15b] Jérémy Jean, Ivica Nikolić, and Thomas Peyrin. Joltik v1.3. https://competitions.cr.yp.to/round2/joltikv13.pdf, 2015.
[KR11] Ted Krovetz and Phillip Rogaway. The Software Performance of Authenticated-Encryption Modes, pages 306–327. Springer, 2011.
[KR14] Ted Krovetz and Phillip Rogaway. OCB (v1). https://competitions.cr.yp.to/round1/ocbv1.pdf, 2014.
[Kro15] Ted Krovetz. HS1-SIV (v2). https://competitions.cr.yp.to/round2/hs1sivv2.pdf, 2015.
[Lom11] Chris Lomont. Introduction to intel R© advanced vector exten-sions. https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions, 2011.
[MGH+15] Paweł Morawiecki, Kris Gaj, Ekawat Homsirikamol, Krystian Matusiewicz,Josef Pieprzyk, Marcin Rogawski, Marian Srebrny, and Marcin Wójcik. ICE-POLE v2. https://competitions.cr.yp.to/round2/icepolev2.pdf, 2015.
Ralph Ankele and Robin Ankele 13
[Min16] Kazuhiko Minematsu. AES-OTR v3. https://competitions.cr.yp.to/round2/aesotrv3.pdf, 2016.
[neo] Neon.
[Nik15] Ivica Nikolić. Tiaoxin-346. https://competitions.cr.yp.to/round2/tiaoxinv2.pdf, 2015.
[OSB15] Markku-Juhani O. Sarrinen and Billy B. Brumley. STRIBOBr2: "WHIRLBOB".https://competitions.cr.yp.to/round2/stribobr2.pdf, 2015.
[Pao10] Gabriele Paoloni. How to benchmark code execution times on intel R© ia-32 andia-64 instruction set architectures. Technical report, Intel Corporation, 2010.
[Saa16] Markku-Juhani Saarinen. The BRUTUS automatic cryptanalytic framework.Journal of Cryptographic Engineering, 6(1):75–82, 2016.
[Sie09] Sam Siewert. Using Intel R© Streaming SIMD Extensions andIntel R© Integrated Performance Primitives to Accelerate Algorithms.https://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms, 2009.
[STA+15] Yu Sasaki, Yosuke Todo, Kazumaro Aoki, Yusuke Naito, Takeshi Sugawara,Yumiko Murakami, Mitsuru Matsui, and Shoichi Hirose. Minalpher v1.1.https://competitions.cr.yp.to/round2/minalpherv11.pdf, 2015.
[Vau02] Serge Vaudenay. Security Flaws Induced by CBC Padding — Applications toSSL, IPSEC, WTLS..., pages 534–545. Springer, 2002.
[Wan15] Lei Wang. SHELL v2.0. https://competitions.cr.yp.to/round2/shellv20.pdf, 2015.
[WH15] Hongjun Wu and Tao Huang. The JAMBU Lightweight Authentication Encryp-tion Mode (v2). https://competitions.cr.yp.to/round2/aesjambuv2.pdf,2015.
[WP14] Hongjun Wu and Bart Preneel. AEGIS: A Fast Authenticated Encryption Al-gorithm (v1). https://competitions.cr.yp.to/round1/aegisv1.pdf, 2014.
[Wu15] Hongjun Wu. ACORN: a Lightweight Authenticated Cipher (vs). https://competitions.cr.yp.to/round2/acornv2.pdf, 2015.
14 Software Benchmarking of the 2nd round CAESAR Candidates
A Comparison of all Implementations per Candidate
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
8.6410.48
25.16
400.75489.48
1313.5
Perf
orm
ance (
cpb)
Figure 7: acorn128v2_opt (blue), acorn128v2_ref (orange)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.31
0.410.5
0.64
1.45
5.92
7.68
9.88
17.86
25.84
Perf
orm
ance (
cpb)
Figure 8: aegis128_aesni(blue), aegis128_ref (orange), aegis128l_aesnia (yel-low), aegis128l_aesnib (brown), aegis128l_aesnic (green), aegis128l_ref (lightblue), aegis256_aesni (purple), aegis256_ref (darkblue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.62
0.79
2.01
49.32
66.33
Perf
orm
ance (
cpb)
Figure 9: aezv4_aesni (blue), aezv4_ref (orange)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
6.21
8.61
9.32
10.3710.86
14.04
17.06
20.27
Perf
orm
ance (
cpb)
Figure 10: ascon128av11_opt64 (blue), ascon128av11_ref (orange), as-con128v11_opt64 (yellow), ascon128v11_ref (purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
2.52.8
5.18
14.3
24.92
232.82
Perf
orm
ance (
cpb)
Figure 11: aes128n12t8clocv2_aesni (blue), aes128n12t8clocv2_ref (or-ange), aes128n8t8clocv2_aesni (yellow), aes128n8t8clocv2_ref (purple),twine80n6t4clocv2_ref (green), twine80n6t4clocv2_vperm (light blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
104.48107.35
128.46
Perf
orm
ance (
cpb)
Figure 12: aescopav2_ref(blue)
Ralph Ankele and Robin Ankele 15
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.91
1.31
124.1
191.63
287.61379
444.16
Perf
orm
ance (
cpb)
Figure 13: deoxyseq128128v13_ref (blue), deoxyseq256128v13_ref (orange),deoxysneq128128v13_aesni (yellow), deoxysneq128128v13_ref (purple), de-oxysneq256128v13_aesni (light blue), deoxysneq256128v13_ref (cyan)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
58.159.76
64.02
75.09
82.67
110.38
Perf
orm
ance (
cpb)
Figure 14: elmd1000v2_ref (blue), elmd1001v2_ref (green),elmd101270v2_ref(yellow), elmd101271v2_ref (purple), elmd600v2_ref(green), elmd601v2_ref (cyan doted), elmd61279v2_ref (magneta),elmd61271v2_ref (purple dotted)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
6.95
8.258.86
10.32
12.18
15.01
41.76
Perf
orm
ance (
cpb)
Figure 15: hs1sivhiv1_ref (blue), hs1sivlov1_ref (orange), hs1sivv1_ref (yel-low)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
8.79
9.57
12.31
32.46
37.67
Perf
orm
ance (
cpb)
Figure 16: icepole128av2_ref (blue), icepole128v2_ref (orange), ice-pole256av2_ref (yellow)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
7.21
9.99
17.74
22.95
48.26
78.82
99.3
Perf
orm
ance (
cpb)
Figure 17: aesjambuv2_aesni (blue), aesjambuv2_ref (orange), si-monjambu128v2_ref (yellow), simonjambu64v2_ref (purple), simon-jambu96v2_ref (green)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
482.39
872.78
976.93
1749.99
1904.17
Perf
orm
ance (
cpb)
Figure 18: joltikeq12864v13_ref (light green), joltikeq6464v13_ref(orange), joltikeq80112v13_ref (yellow), joltikeq9696v13_ref (pur-ple), joltikneq12864v13_ref (green), joltikneq6464v13_ref (light blue),joltikneq80112v13_ref (magenta), joltikneq9696v13_ref (blue)
16 Software Benchmarking of the 2nd round CAESAR Candidates
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
36.2238.89
80.12
91.32
114.57
Perf
orm
ance (
cpb)
Figure 19: ketjejrv1_compact (blue), ketjejrv1_reference (orange), ket-jesrv1_compact (yellow), ketjesrv1_reference (purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
4.04
5.055.98
11.49
14.27
29.29
146.8
320.04
Perf
orm
ance (
cpb)
Figure 20: lakekeyakv2_reference32bit (magenta), lakekeyakv2_compact(red), lakekeyakv2_generic32 (yellow), lakekeyakv2_generic32lc (purple),lakekayakv2_generic64 (green), lakekeyakv2_generic64lc (light blue),lakekeyakv2_SandyBridge (dark blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
8.51
9.4310.08
11.14
13.08
14.46
20.9922.86
27.0429.5
42.64
Perf
orm
ance (
cpb)
Figure 21: riverkeyakv2_compact (orange), riverkeyakv2_generic32 (yel-low), riverkeyak_generic32lc (purple), riverkeyakv2_generic64 (purple),riverkeyakv2_generic64lc (light blue), riverkeyakv2_reference (brown),riverkeyakv2_reference32bits (dark blue), riverkeyakv2_SandyBridge (blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
3.58
5.29
8.27
14.54
23.57
86.74
178.1
254.38
836.19
Perf
orm
ance (
cpb)
Figure 22: seakeyakv2_reference32bits (magenta), seakeyakv2_compact(orange), seakeyakv2_generic32 (yellow), seakeyakv2_generic32lc (pur-ple), seakeyakv2_generic64 (green), seakeyakv2_generic64lc (light blue),seakeyakv2_SandyBridge (blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
9.1311.1214.8619.8826.07
38.94
67.43
118.69
450.79
772.37
4371.63
Perf
orm
ance (
cpb)
Figure 23: lunarkeyakv2_reference32bits (magenta), lunarkeyakv2_compact(orange), lunarkeyakv2_generic32 (yellow), lunarkeyakv2_generic32lc (pur-ple), lunarkeyav2_generic64 (green), lunarkeyakv2_generic64lc (light blue),lunarkeyakv2_SandyBridge (blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
4.92
7.599.84
13.22
20.64
34.92
47.25
175.09
243.92
395.88
1682.93
Perf
orm
ance (
cpb)
Figure 24: oceankeyakv2_reference32bits (magenta), oceankeyakv2_compact(orange), oceankeyakv2_generic32 (yellow), oceankeyakv2_generic32lc (pur-ple), oceankeyakv2_generic64 (green), oceankeyakv2_generic64lc (light blue),oceankeyakv2_SandyBridge (blue)
Ralph Ankele and Robin Ankele 17
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
531.83
555.65
609.55
Perf
orm
ance (
cpb)
Figure 25: minalpherv11_ref (blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.93
1.26
1.46
1.68
1.992.24
2.55
3.51
4.04
4.76
7.69
9.02
Perf
orm
ance (
cpb)
Figure 26: morus640128v1_ref (orange), morus1280256v1_ref(light blue), morus1280128v1_avx2 (red), morus1280128v1_ref64(brown), morus1280256v1_ref64 (brown), morus1280128v1_ref(brown), morus1280256v1_avx2 (green dotted), morus1280256v1_sse2(blue),morus6400128v1_sse2 (purple),morus1280128v1_sse2 (yellow)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
2.23
2.87
3.744.28
5.56
6.777.96
9.8511.29
20.79
60.8472
Perf
orm
ance (
cpb)
Figure 27: norx0841_ref (blue), norx1641_ref (orange), norx3241_ref(black), norx3241_xmm (purple), norx3261_ref (green), norx3261_xmm(pink), norx6441_ref (magenta), norx6441_ymm (brown), norx6441_xmm(light green), norx6444_ref (yellow), norx6461_ref (purple), norx6461_xmm(cyan dotted), norx6461_ymm (light blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
1
1.22
1.772.09
7.85
28.4832.2936.15
40.445.8851.82
Perf
orm
ance (
cpb)
Figure 28: aes128ocbtaglen128v1_opt (black), aes128ocbtaglen128v1_ref(purple), aes128ocbtaglen64v1_ref (purple), aes128ocbtaglen96v1_ref(purple), aes192ocbtaglen128v1_opt (yellow), aes192ocbtaglen128v1_ref(blue), aes192ocbtaglen64v1_ref (blue), aes192ocbtaglen96v1_ref (blue),aes256ocbtaglen128v1_opt (red), aes256ocbtaglen128v1_ref (green),aes256ocbtaglen64v1_ref (green), aes256ocbtaglen96v1_ref (green)
128 256 384 512 640 768 896 1024 1152
Message length (bytes)
0.680.79
1
2.56
13.34
20.47
Perf
orm
ance (
cpb)
Figure 29: aes128otrpv3_ref (green), aes128otrsv3_ref (yellow),aes128otrpv3_nip7m1 (blue), aes128otrpv3_nip7m2 (magenta dotted),aes128otrpv3_nip8m1 (purple dotted), aes128otrpv3_nip8m2 (light green),aes128otrsv3_nip7m1 (light red dotted), aes128otrsv3_nip7m2 (magentadotted), aes128otrsv3_nip8m1 (dark blue), aes128otrsv3_nip8m2 (light blue)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.9111.05
3.39
17.46
25.77
Perf
orm
ance (
cpb)
Figure 30: aes128otrpv3_ref (green), aes256otrsv3_ref (yellow),aes256otrpv3_nip7m1 (blue), aes256otrpv3_nip7m2 (magenta dotted),aes256otrpv3_nip8m1 (purple dotted), aes256otrpv3_nip8m2 (light green),aes256otrsv3_nip7m1 (light red dotted), aes256otrsv3_nip7m2 (magentadotted), aes256otrsv3_nip8m1 (dark blue), aes256otrsv3_nip8m2 (light blue)
18 Software Benchmarking of the 2nd round CAESAR Candidates
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
4.254.976.12
9.46
329.12397.07510.58
1556.83
Perf
orm
ance (
cpb)
Figure 31: paeq128_aesni (pink), paeq128_ref (purple), paeq128t_aesni(black), paeq128t_ref (purple), paeq128tnm_aesni(cyan dotted),paeq128tnm_ref (light blue dotted), paeq160_aesni (magenta), paeq160_ref(blue), paeq64_aesni (brown), paeq64_ref (yellow), paeq80_aesni (lightgreen), paeq80_ref (green dotted)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
21.43
23
24.5125.63
31.88
34.65
48.91
59.91
Perf
orm
ance (
cpb)
Figure 32: omdsha256k128n96tau128v2_avx1 (red),omdsha256k128n96tau128v2_ref (green dotted), omd-sha256k128n96tau128v2_sse4 (yellow), omdsha256k128n96tau64v2_avx1(purple), omdsha256k128n96tau64v2_ref (green dotted), omd-sha256k128n96tau64v2_sse4 (red), omdsha256k128n96tau96v2_avx1(purple), omdsha256k128n96tau96v2_ref (blue dotted), omd-sha256k128n96tau96v2_sse4 (red)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
21.45
24.52
31.88
34.62
49.03
59.72
Perf
orm
ance (
cpb)
Figure 33: omdsha256k192n104tau128v2_avx1 (blue), omd-sha256k192n104tau128v2_ref (orange), omdsha256k192n104tau128v2_sse4(yellow dotted)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
21.43
23
24.5125.63
31.88
34.65
48.91
59.91
Perf
orm
ance (
cpb)
Figure 34: omdsha256k256n104tau160v2_avx1 (red dotted), omd-sha256k256n104tau160v2_ref (blue), omdsha256k256n104tau160v2_sse4(red dotted), omdsha256k256n248tau256v2_avx1 (yellow), omd-sha256k256n248tau256v2_ref (green), omdsha256k256n248tau256v2_sse4(purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
21.4522.12
24.5
31.85
34.58
48.98
59.68
Perf
orm
ance (
cpb)
Figure 35: omdsha512k128n128tau128v2_avx1 (light blue dotted), omd-sha512k128n128tau128v2_ref (red), omdsha512k128n128tau128v2_sse4(purple dotted), omdsha512k256n256tau256v2_avx1 (yellow), omd-sha512k256n256tau256v2_ref (green), omdsha512k256n256tau256v2_sse4(purple), omdsha512k512n256tau256v2_avx1 (yellow), omd-sha512k512n256tau256v2_ref (green), omdsha512k512n256tau256v2_sse4(light blue dotted)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
5.846.73
8.39.51
17.1219.89
29.3331.81
56.5365.5
72.47
128.19
Perf
orm
ance (
cpb)
Figure 36: pi16cipher096v2_ref (red), pi16cipher128v2_ref (light greendotted), pi32cipher128v2_ref (pink dotted) pi32cipher256v2_ref (lightgblue), pi64cipher128v2_ref (yellow), pi64cipher256v2_ref (cyan dot-ted), pi16cipher096v2_gotpv (black dotted), pi16cipher128v2_goptv(blue), pi32cipher128v2_goptv (green) pi32cipher256v2_goptv (magenta),pi64cipher128v2_goptv (purple), pi64cipher256v2_goptv (purple)
Ralph Ankele and Robin Ankele 19
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
1.42
2.01
2.63
3.19
8.4
20.26
29.9
46.77
Perf
orm
ance (
cpb)
Figure 37: poetv2aes128_ni (blue), poetv2aes128_ref (orange), poetv2aes4_ni(yellow) poetv2aes4_ref (purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
1306.19
2344.4
2549.63
4628.23
5228.8
Perf
orm
ance (
cpb)
Figure 38: primatesv1ape120_ref (blue), primatesv1ape80_ref (orange),primatesv1gibbon120_ref (yellow), primatesv1gibbon80_ref (purple), pri-matesv1hanuman120_ref (green), primatesv1hanuman80_ref (cyan dotted)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
8
9.61
11.91
38.59
43.6145.8
50.92
Perf
orm
ance (
cpb)
Figure 39: scream10v3_ref (blue), scream10v3_sse (orange), scream12v3_ref(yellow), scream12v3_sse (purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
35.87
40.8
76.69
85.06
94.25
Perf
orm
ance (
cpb)
Figure 40: shellaes128v2d4n64_ref (red), shellaes128v2d4n80_ref(blue), shellaes128v2d5n64_ref (green), shellaes128v2d5n80_ref (pur-ple), shellaes128v2d6n64_ref (yellow), shellaes128v2d6n80_ref (cyan),shellaes128v2d7n64_ref (light green), shellaes128v2d7n80_ref (purple),shellaes128v2d8n64_ref (orange), shellaes128v2d8n80_ref (black)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
2.52.82
5.27
23.8328.56
4980.83
Perf
orm
ance (
cpb)
Figure 41: present80n6t4silcv2_ref(green), aes128n12t8silcv2_aesni(blue), aes128n12t8silcv2_ref (red), aes128n8t8silcv2_aesni (yellow),aes128n8t8silcv2_ref (purple)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
22.14
31.25
65.771.43
428.68
716.9
Perf
orm
ance (
cpb)
Figure 42: stribob192r2_8bit (blue), stribob192r2_bitslice (orange), stri-bob192r2_ref (yellow), stribob192r2_smaller (purple), stribob192r2_ssse3(green)
20 Software Benchmarking of the 2nd round CAESAR Candidates
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
0.29
0.47
1.95
4.73
7.01
27.17
Perf
orm
ance (
cpb)
Figure 43: tiaoxinv2_nim (blue), tiaoxinv2_ref (orange)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
8.74
9.98
30.7133.03
52.66
Perf
orm
ance (
cpb)
Figure 44: trivia0v2_ref (blue), trivia128v2_ref (yellow), trivia128v2_sse4 (or-ange)
0 128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
Message length (bytes)
1.51
2.75
11.22
978311969.25
Perf
orm
ance (
cpb)
Figure 45: aesgcm128_ref (orange), aesgcm128_openssl (blue), aes-gcm256_openssl (yellow)
Ralph Ankele and Robin Ankele 21
B 3D Plots for Increasing Message and Associated DataLengths
Message (bytes)Associated Data (bytes)
15
00
20
128 128
25
256 256
30
384384
Perf
orm
ance (
cycle
s/b
yte
)
35
512 512
40
640 640
45
768 768
50
8968961024 1024
11521152 128012801408 1408
153615361664 1664
179217921920 1920
2048 2048
cpb: 48.77
cpb: 23.51
cpb: 16.22
cpb: 13.35
cpb: 51.01
cpb: 30.88
cpb: 20.88
cpb: 15.82
cpb: 13.32
cpb: 25.73
cpb: 21.97
cpb: 18.28
cpb: 15.31
cpb: 13.29
cpb: 18.47cpb: 17.48
cpb: 16.16
cpb: 14.66
cpb: 13.24
cpb: 15.55cpb: 15.23
cpb: 14.75
cpb: 14.02
cpb: 13.15
Figure 46: acorn128v2_opt
Message (bytes)Associated Data (bytes)
00
20
128128
40
256256 384384
Perf
orm
ance (
cycle
s/b
yte
)
60
512 512640 640
80
768768
100
8968961024 1024
1152 11521280 1280
1408 140815361536 16641664 17921792 19201920 20482048
cpb: 95.01
cpb: 29.27
cpb: 3.92
cpb: 5.59
cpb: 101.9
cpb: 50.29
cpb: 19.7
cpb: 11
cpb: 2.94
cpb: 28.49cpb: 25.11
cpb: 16.34
cpb: 9.63
cpb: 5.06
cpb: 11.46cpb: 12.19
cpb: 8.29
cpb: 6.74
cpb: 2.57
cpb: 6.37cpb: 5.92cpb: 4.79
cpb: 4.12
cpb: 2.83
Figure 47: aeadaes256ocbtaglen128v1_opt
Message (bytes)Associated Data (bytes)
0.6
0 0
0.8
128128
1
256 256
1.2
384384
1.4
Perf
orm
ance (
cycle
s/b
yte
)
512 512
1.6
640640
1.8
768768
2
2.2
896 8961024 1024
115211521280 1280
1408 140815361536
1664 16641792 1792
192019202048 2048
cpb: 2.22
cpb: 1
cpb: 0.66
cpb: 0.52
cpb: 2.12
cpb: 1.27
cpb: 0.83
cpb: 0.61
cpb: 0.5
cpb: 0.92
cpb: 0.79
cpb: 0.65
cpb: 0.55
cpb: 0.49
cpb: 0.58cpb: 0.55
cpb: 0.52
cpb: 0.48
cpb: 0.45
cpb: 0.43cpb: 0.43cpb: 0.42
cpb: 0.42
cpb: 0.42
Figure 48: aegis128l_aesnic
Message (bytes)Associated Data (bytes)
5.5
0 0
6
128128
6.5
256 256
7
384 384
Perf
orm
ance (
cycle
s/b
yte
)
7.5
512 512640640
8
768768
8.5
896 8961024 1024
1152 115212801280 14081408 15361536
1664 166417921792 19201920
2048 2048
cpb: 8.83
cpb: 6.53
cpb: 5.77
cpb: 5.5
cpb: 7.89
cpb: 6.72
cpb: 5.99
cpb: 5.65
cpb: 5.44
cpb: 6.05cpb: 5.92
cpb: 5.7
cpb: 5.52
cpb: 5.41
cpb: 5.52cpb: 5.52cpb: 5.47
cpb: 5.41
cpb: 5.35
cpb: 5.31cpb: 5.32cpb: 5.33
cpb: 5.31
cpb: 5.3
Figure 49: aes128n8t8clocv2_aesni
Message (bytes)Associated Data (bytes)
5.5
00
6
128128
6.5
256 256
7
384 384
7.5
Perf
orm
ance (
cycle
s/b
yte
)
512 512
8
640 640
8.5
768 768
9
896896 10241024 115211521280 1280
140814081536 1536
166416641792 1792
1920 19202048 2048
cpb: 9.16
cpb: 6.56
cpb: 5.81
cpb: 5.53
cpb: 8.78
cpb: 7.15
cpb: 6.2
cpb: 5.73
cpb: 5.51
cpb: 6.34
cpb: 6.14
cpb: 5.84
cpb: 5.61
cpb: 5.47
cpb: 5.65cpb: 5.63
cpb: 5.56
cpb: 5.47
cpb: 5.4
cpb: 5.37cpb: 5.38cpb: 5.36
cpb: 5.34
cpb: 5.33
Figure 50: aes128n8t8silcv2_aesni
Message (bytes)Associated Data (bytes)
120
0 0
140
128128
160
256 256
180
384384
200
Perf
orm
ance (
cycle
s/b
yte
)
512512
220
640 640
240
768768
260
896896 102410241152 1152
128012801408 1408
15361536 166416641792 1792
1920 192020482048
cpb: 272.2
cpb: 234.1
cpb: 223.1
cpb: 218.7
cpb: 191.7cpb: 189.6
cpb: 202.2cpb: 208.6
cpb: 211.7
cpb: 135.6
cpb: 148.7
cpb: 170.8cpb: 188.4
cpb: 200.2
cpb: 119.2cpb: 127.7
cpb: 145.2
cpb: 165.2cpb: 183.2
cpb: 112.5cpb: 117.2cpb: 128.1
cpb: 144cpb: 162.8
Figure 51: aescopav2_ref
22 Software Benchmarking of the 2nd round CAESAR Candidates
Message (bytes)Associated Data (bytes)
11
00
11.5
128128
12
256256
12.5
384 384
13
Perf
orm
ance (
cycle
s/b
yte
)
512512
13.5
640640
14
768768
14.5
896 8961024 1024
1152 115212801280 14081408 15361536 16641664
1792 17921920 1920
20482048
cpb: 14.82
cpb: 11.88
cpb: 11.05
cpb: 10.71
cpb: 14.95
cpb: 12.73
cpb: 11.61
cpb: 11.06
cpb: 10.78
cpb: 12.01
cpb: 11.64
cpb: 11.27
cpb: 10.95
cpb: 10.74
cpb: 11.18cpb: 11.08
cpb: 10.94
cpb: 10.79
cpb: 10.65
cpb: 10.85cpb: 10.82
cpb: 10.77
cpb: 10.71
cpb: 10.63
Figure 52: aesjambuv2_aesni
Associated Data (bytes)Message (bytes)
1
00
1.5
128 128
2
256 256
2.5
384384
Perf
orm
ance (
cycle
s/b
yte
)
3
512512
3.5
640640
4
768 768
4.5
896896 10241024 11521152 128012801408 1408
1536 153616641664 17921792 19201920
2048 2048
cpb: 4.62
cpb: 2.38
cpb: 1.63
cpb: 1.33
cpb: 2.05
cpb: 2.58
cpb: 1.9
cpb: 1.49
cpb: 1.28
cpb: 0.96
cpb: 1.5
cpb: 1.41
cpb: 1.28
cpb: 1.18cpb: 0.66
cpb: 0.97cpb: 1.02
cpb: 1.04
cpb: 1.05 cpb: 0.54cpb: 0.7cpb: 0.76
cpb: 0.82
cpb: 0.89
Figure 53: aezv4_aesni
Message (bytes)Associated Data (bytes)
15
00
16
128128
17
256256
18
384 384
Perf
orm
ance (
cycle
s/b
yte
)
19
512512 640640
20
768 768
21
896 8961024 1024
1152 11521280 1280
140814081536 1536
1664 166417921792 19201920 20482048
cpb: 20.41
cpb: 16.54
cpb: 15.52
cpb: 15.1
cpb: 21.71
cpb: 18.2
cpb: 16.47
cpb: 15.61
cpb: 15.18
cpb: 16.75
cpb: 16.18
cpb: 15.72
cpb: 15.33
cpb: 15.1
cpb: 15.38cpb: 15.26
cpb: 15.17
cpb: 15.08
cpb: 14.9
cpb: 14.81cpb: 14.77
cpb: 14.78
cpb: 14.75
cpb: 14.75
Figure 54: ascon128av11_opt64
Message (bytes)Associated Data (bytes)
4
0 0
6
128128
8
256256
10
384384
Perf
orm
ance (
cycle
s/b
yte
)
12
512 512
14
640640
16
768768
18
8968961024 1024
115211521280 1280
140814081536 1536
1664 166417921792
1920 19202048 2048
cpb: 18.07
cpb: 7.41
cpb: 4.18
cpb: 2.81
cpb: 17.93
cpb: 10.3
cpb: 6.21
cpb: 4.03
cpb: 2.81
cpb: 7.32
cpb: 6.15
cpb: 4.81
cpb: 3.66
cpb: 2.78
cpb: 4.11cpb: 3.94
cpb: 3.55
cpb: 3.17
cpb: 2.65
cpb: 2.77cpb: 2.79cpb: 2.68
cpb: 2.59
cpb: 2.41
Figure 55: deoxysneq256128v13_aesni
Associated Data (bytes) Message (bytes)
40
0 0128128
50
256 256
60
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512 512
70
640640768 768
80
8968961024 1024
11521152 12801280 14081408 153615361664 1664
1792 17921920 1920
20482048
cpb: 85.89
cpb: 66.21
cpb: 60.5
cpb: 58.25
cpb: 64.66
cpb: 60.42
cpb: 58.33
cpb: 57.31
cpb: 56.8cpb: 44.84
cpb: 47.64cpb: 50.61cpb: 52.85
cpb: 54.39
cpb: 39.1cpb: 41.3
cpb: 44.3cpb: 47.71
cpb: 50.86
cpb: 36.97cpb: 38.16cpb: 40.17cpb: 43.11
cpb: 46.61
Figure 56: elmd600v2_ref
Message (bytes)Associated Data (bytes)
5
0 0128 128
10
256 256
15
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512 512
20
640 640
25
768768
30
8968961024 1024
1152 115212801280
1408 14081536 1536
16641664 17921792 192019202048 2048
cpb: 30.39
cpb: 16.81
cpb: 12.88
cpb: 11.25
cpb: 21.93
cpb: 14.92
cpb: 12.83
cpb: 11.32
cpb: 10.74
cpb: 7.13cpb: 7.81cpb: 8.93
cpb: 9.2
cpb: 9.61cpb: 3.53cpb: 4.16cpb: 5.48cpb: 6.79
cpb: 8.01
cpb: 1.97cpb: 2.49cpb: 3.4cpb: 4.58
cpb: 6.1
Figure 57: hs1sivlov1_ref
Ralph Ankele and Robin Ankele 23
Message (bytes)Associated Data (bytes)
00
30
128128
40
256 256
50
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512 512
60
640640
70
768 768
80
896896 102410241152 1152
12801280 140814081536 1536
1664 16641792 1792
19201920 20482048
cpb: 82.96
cpb: 41.84
cpb: 30.05
cpb: 25.31
cpb: 79.13
cpb: 44.2
cpb: 32.77
cpb: 27.01
cpb: 24.08
cpb: 38.05
cpb: 30.85
cpb: 27.67
cpb: 25.11
cpb: 23.35
cpb: 26.22cpb: 24.11
cpb: 23.55
cpb: 22.87
cpb: 22.25
cpb: 21.5cpb: 20.74
cpb: 20.8
cpb: 20.87
cpb: 20.94
Figure 58: icepole128av2_ref
Message (bytes)Associated Data (bytes)
110
00
120
128128 256256
130
384384
140
Perf
orm
ance (
cycle
s/b
yte
)
512512
150
640640
160
768 768
170
896896 102410241152 1152
12801280 14081408 15361536 16641664 17921792 19201920 20482048
cpb: 171.8
cpb: 127.4
cpb: 114.7
cpb: 109.5
cpb: 165.3
cpb: 134.3
cpb: 119.7
cpb: 112.5
cpb: 108.8
cpb: 121.3
cpb: 116.8
cpb: 112.9
cpb: 109.8
cpb: 107.7
cpb: 108.8cpb: 108
cpb: 107.4
cpb: 106.9
cpb: 106.2
cpb: 103.7cpb: 103.6
cpb: 103.8cpb: 104
cpb: 104.4
Figure 59: ketjesrv1_reference
Associated Data (bytes) Message (bytes)
8
0 0128128
10
256 256
12
384 384
14
Perf
orm
ance (
cycle
s/b
yte
)
512512
16
640640
18
768768896 896
20
1024 10241152 1152
1280 128014081408 15361536 16641664 17921792
1920 192020482048
cpb: 20.01
cpb: 12.34
cpb: 8.91
cpb: 7.55
cpb: 19.41
cpb: 14.36
cpb: 11.72
cpb: 8.28
cpb: 7.33
cpb: 9.23cpb: 9.26
cpb: 9.2cpb: 8.26
cpb: 7.16
cpb: 7.4cpb: 7.64
cpb: 7.93cpb: 7.07
cpb: 6.96
cpb: 6.16cpb: 6.37
cpb: 6.67cpb: 6.7
cpb: 6.46
Figure 60: lakekeyakv2_generic64lc
Associated Data (bytes) Message (bytes)
1.5
0 0
2
128128
2.5
256256
3
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512512
3.5
640 640
4
768768
4.5
896 896102410241152 1152
1280 12801408 1408
153615361664 1664
179217921920 1920
20482048
cpb: 4.75
cpb: 2.34
cpb: 1.66
cpb: 1.4
cpb: 4.75
cpb: 2.94
cpb: 2.04
cpb: 1.6
cpb: 1.38
cpb: 2.33
cpb: 2.02
cpb: 1.74
cpb: 1.5
cpb: 1.35
cpb: 1.66cpb: 1.58
cpb: 1.49cpb: 1.39
cpb: 1.31
cpb: 1.37
cpb: 1.36cpb: 1.34
cpb: 1.3
cpb: 1.27
Figure 61: morus640128v1_sse2
Message (bytes)Associated Data (bytes)
4
0 0
5
128128
6
256 256
7
384384
Perf
orm
ance (
cycle
s/b
yte
)
8
512 512
9
640640
10
768768
11
896896 102410241152 1152
12801280 14081408 153615361664 1664
1792 17921920 1920
2048 2048
cpb: 11.19
cpb: 5.7
cpb: 3.91
cpb: 3.31
cpb: 11.15
cpb: 7.78
cpb: 5.4
cpb: 3.98
cpb: 3.39
cpb: 5.77
cpb: 5.43
cpb: 4.62
cpb: 3.79
cpb: 3.35
cpb: 3.92cpb: 4.01
cpb: 3.8cpb: 3.45
cpb: 3.21
cpb: 3.35
cpb: 3.41cpb: 3.37
cpb: 3.22
cpb: 3.11
Figure 62: norx6441_xmm
Associated Data (bytes) Message (bytes)
00
20
128128
30
256256 384384
40
Perf
orm
ance (
cycle
s/b
yte
)
512512
50
640 640768768
60
8968961024 1024
1152 11521280 1280
140814081536 1536
166416641792 1792
1920 19202048 2048
cpb: 65.95
cpb: 33.08
cpb: 23.73
cpb: 20.02
cpb: 57.01
cpb: 32.82
cpb: 24.75
cpb: 20.74
cpb: 20.08
cpb: 24.59
cpb: 20.57
cpb: 19.3
cpb: 18.27
cpb: 17.59
cpb: 15.28cpb: 14.43
cpb: 14.9cpb: 15.41
cpb: 15.89cpb: 11.58cpb: 12.17
cpb: 11.97cpb: 12.84
cpb: 13.88
Figure 63: omdsha512k512n256tau256v2_avx1
24 Software Benchmarking of the 2nd round CAESAR Candidates
Message (bytes)Associated Data (bytes)
4
0 0
6
128 128256 256
8
384 384
Perf
orm
ance (
cycle
s/b
yte
)
10
512 512640640
12
768768896 89610241024
1152 11521280 1280
14081408 153615361664 1664
1792 179219201920
2048 2048
cpb: 13.92
cpb: 8.45
cpb: 7.05
cpb: 6.42
cpb: 8.54
cpb: 10.03
cpb: 7.87
cpb: 6.97
cpb: 6.4
cpb: 5.26
cpb: 6.83
cpb: 6.45
cpb: 6.31
cpb: 6.1 cpb: 4.09
cpb: 5.03
cpb: 5.19cpb: 5.42
cpb: 5.57
cpb: 3.73cpb: 4.23
cpb: 4.39cpb: 4.7
cpb: 5.01
Figure 64: paeq64_aesni
Associated Data (bytes) Message (bytes)
30
00128 128
40
256256384 384
50
Perf
orm
ance (
cycle
s/b
yte
)
512512 640640
60
768 768
70
8968961024 1024
115211521280 1280
140814081536 1536
1664 16641792 1792
19201920 20482048
cpb: 71.13
cpb: 39.16
cpb: 29.96
cpb: 26.27
cpb: 70.16
cpb: 46.72
cpb: 35.02
cpb: 29.02
cpb: 26.03
cpb: 38.18
cpb: 34.46
cpb: 30.75
cpb: 27.68
cpb: 25.61
cpb: 28.95
cpb: 28.24cpb: 27.27
cpb: 26.07
cpb: 24.98
cpb: 25.25cpb: 25.12
cpb: 24.94cpb: 24.6
cpb: 24.19
Figure 65: pi64cipher128v2_ref
Message (bytes)Associated Data (bytes)
15
20
0 0
25
128128
30
256 256
35
384384
40
Perf
orm
ance (
cycle
s/b
yte
)
512512
45
640 640
50
768 768
55
8968961024 1024
1152 115212801280 14081408 15361536
1664 16641792 1792
1920 192020482048
cpb: 57.01
cpb: 27.41
cpb: 18.96
cpb: 15.57
cpb: 59.99
cpb: 43.1
cpb: 27.84
cpb: 20.25
cpb: 16.41
cpb: 28.3
cpb: 27.82
cpb: 22.72
cpb: 18.69
cpb: 15.98
cpb: 19.34cpb: 20.19
cpb: 18.67
cpb: 16.94
cpb: 15.36
cpb: 15.72cpb: 16.36
cpb: 15.94cpb: 15.34
cpb: 14.61
Figure 66: scream10v3_sse
Message (bytes)Associated Data (bytes)
40
00
60
128 128256 256
80
384384
100
Perf
orm
ance (
cycle
s/b
yte
)
512512
120
640640
140
768768
160
8968961024 1024
1152 11521280 1280
1408 14081536 1536
1664 16641792 1792
1920 192020482048
cpb: 172.9
cpb: 108.2
cpb: 89.14
cpb: 81.37
cpb: 109.5
cpb: 91.72
cpb: 83.77
cpb: 79.16
cpb: 76.98 cpb: 45.96cpb: 53.34
cpb: 60.69
cpb: 66.42
cpb: 69.97
cpb: 28.31cpb: 34.02
cpb: 42.28
cpb: 51.6
cpb: 59.96
cpb: 20.96
cpb: 24.33
cpb: 30.05cpb: 38.15
cpb: 47.87
Figure 67: shellaes128v2d4n80_ref
Message (bytes)Associated Data (bytes)
0 0
30
128128 256256
35
384384
Perf
orm
ance (
cycle
s/b
yte
)
512 512640640
40
768 7688968961024 1024
1152 11521280 1280
1408 14081536 1536
16641664 179217921920 1920
20482048
cpb: 44.45
cpb: 31.88
cpb: 28.04
cpb: 26.66
cpb: 43.44
cpb: 31.91
cpb: 28.58
cpb: 27.07
cpb: 26.2
cpb: 30.97
cpb: 28.31
cpb: 27.38
cpb: 26.55
cpb: 26.07
cpb: 27.31
cpb: 26.51
cpb: 26.29cpb: 26.09
cpb: 25.86
cpb: 25.93
cpb: 25.61cpb: 25.58
cpb: 25.55
cpb: 25.51
Figure 68: stribob192r2_ssse3
Associated Data (bytes) Message (bytes)
0.5
0 0
1
128128
1.5
256256
2
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512512
2.5
640 640
3
768768
3.5
896896 10241024 115211521280 1280
1408 140815361536 16641664 17921792 19201920
2048 2048
cpb: 3.67
cpb: 1.47
cpb: 0.84
cpb: 0.59
cpb: 3.45
cpb: 1.98
cpb: 1.21
cpb: 0.79
cpb: 0.58
cpb: 1.34
cpb: 1.12
cpb: 0.87
cpb: 0.67
cpb: 0.54
cpb: 0.74cpb: 0.71
cpb: 0.64cpb: 0.56
cpb: 0.49
cpb: 0.49
cpb: 0.49cpb: 0.48
cpb: 0.46
cpb: 0.44
Figure 69: tiaoxinv2_nim
Ralph Ankele and Robin Ankele 25
Associated Data (bytes) Message (bytes)
4500
50
128128
55
256256
60
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512 512
65
640640
70
768768896 896
1024 102411521152
1280 128014081408
1536 15361664 1664
1792 179219201920 20482048
cpb: 74.49
cpb: 54.99
cpb: 49.41
cpb: 47.18
cpb: 72.48
cpb: 58.97
cpb: 52.08
cpb: 48.67
cpb: 46.93
cpb: 52.89
cpb: 51.02
cpb: 49.11
cpb: 47.55
cpb: 46.5
cpb: 47.28cpb: 47.07
cpb: 46.7cpb: 46.26
cpb: 45.87
cpb: 45.07
cpb: 45.07cpb: 45.1
cpb: 45.11
cpb: 45.08
Figure 70: trivia0v2_ref
Message (bytes)Associated Data (bytes)
00
25
128 128256256
30
384 384
Perf
orm
ance (
cycle
s/b
yte
)
512 512
35
640 640768768
40
896 8961024 1024
11521152 128012801408 1408
1536 15361664 1664
1792 17921920 1920
20482048
cpb: 40.2
cpb: 30.19
cpb: 27.24
cpb: 26.12
cpb: 38.35
cpb: 30.43
cpb: 27.78
cpb: 26.37
cpb: 25.74
cpb: 26.12
cpb: 25.04cpb: 25.06
cpb: 25.18
cpb: 25.12cpb: 22.56cpb: 22.6cpb: 22.98
cpb: 23.7
cpb: 24.19cpb: 21.23
cpb: 21.08cpb: 21.54cpb: 22.19
cpb: 22.99
Figure 71: aes128otrsv2_ref
Message (bytes)Associated Data (bytes)
4
0 0
5
128 128256 256
6
384384
7
Perf
orm
ance (
cycle
s/b
yte
)
512512
8
640 640
9
768 768
10
8968961024 1024
1152 11521280 1280
14081408 153615361664 1664
1792 17921920 1920
2048 2048
cpb: 10.2
cpb: 5.19
cpb: 3.92
cpb: 3.21
cpb: 10.6
cpb: 6.58
cpb: 4.63
cpb: 3.8
cpb: 3.19
cpb: 5.86
cpb: 5.06
cpb: 4.26
cpb: 3.71
cpb: 3.23
cpb: 4.18cpb: 3.97
cpb: 3.76
cpb: 3.46
cpb: 3.17
cpb: 3.41cpb: 3.35
cpb: 3.28
cpb: 3.23
cpb: 3.1
Figure 72: aes128poetv2aes4ls0lt0_ni
26 Software Benchmarking of the 2nd round CAESAR Candidates
C Bar Charts for Real-World Message Sizes for each CAE-SAR Candidate
1 16 557 1500 16000 10000000
50
100
150
200
250
300
350
400
Message Size
Pe
rfo
rma
nce
(cp
b)
391.85
116.13
11.65 9.05 7.67 7.52
Figure 73: acorn128v2_opt
1 16 557 1500 16000 10000000
20
40
60
80
100
120
140
160
180
200
Message Size
Pe
rfo
rma
nce
(cp
b)
191.06
50.89
2.51 1.26 0.6 0.53
Figure 74: aeadaes128ocbtaglen128v1_opt
1 16 557 1500 16000 10000000
5
10
15
20
25
30
35
40
45
Message Size
Pe
rfo
rma
nce
(cp
b)
44.82
12.74
0.7 0.4 0.22 0.21
Figure 75: aegis128l_aesnic
1 16 557 1500 16000 10000000
10
20
30
40
50
60
70
Message Size
Pe
rfo
rma
nce
(cp
b)
67.45
18.72
2.98 2.57 2.34 2.32
Figure 76: aes128n12t8clocv2_aesni
1 16 557 1500 16000 10000000
10
20
30
40
50
60
70
80
90
Message Size
Pe
rfo
rma
nce
(cp
b)
80.5
22.71
3.13 2.62 2.35 2.33
Figure 77: aes128n8t8silcv2_aesni
1 16 557 1500 16000 10000000
10
20
30
40
50
60
Message Size
Pe
rfo
rma
nce
(cp
b)
54.47
13.23
1.21 0.82 0.58 0.57
Figure 78: aes128otrpv3_ref
Ralph Ankele and Robin Ankele 27
1 16 557 1500 16000 10000000
20
40
60
80
100
120
140
Message Size
Pe
rfo
rma
nce
(cp
b)
130
34.77
2.48 1.6 1.12 1.07
Figure 79: poetv2aes4_ni
1 16 557 1500 16000 10000000
100
200
300
400
500
600
700
800
900
1000
Message Size
Pe
rfo
rma
nce
(cp
b)
954.52
275.55
110.15 105.68 103.1 102.95
Figure 80: aescopav2_ref
1 16 557 1500 16000 10000000
10
20
30
40
50
60
70
Message Size
Pe
rfo
rma
nce
(cp
b)
61.53
21.52
6.66 6.31 6.13 6.11
Figure 81: aesjambuv2_aesni
1 16 557 1500 16000 10000000
5
10
15
20
25
30
35
40
45
50
Message Size
Pe
rfo
rma
nce
(cp
b)
49.46
9.13
0.84 0.68 0.53 0.52
Figure 82: aezv4_aesni
1 16 557 1500 16000 10000000
10
20
30
40
50
60
70
80
Message Size
Pe
rfo
rma
nce
(cp
b)
71.4
25.03
6.63 6.25 6.06 6.04
Figure 83: ascon128av11_opt64
1 16 557 1500 16000 10000000
20
40
60
80
100
120
Message Size
Pe
rfo
rma
nce
(cp
b)
101.41
29.02
2.05 1.37 0.77 0.73
Figure 84: deoxysneq128128v13_aesni
1 16 557 1500 16000 10000000
100
200
300
400
500
600
700
800
Message Size
Pe
rfo
rma
nce
(cp
b)
743.39
256.27
63.97 59.19 57.05 56.8
Figure 85: elmd600v2_ref
1 16 557 1500 16000 10000000
50
100
150
200
250
300
Message Size
Pe
rfo
rma
nce
(cp
b)
256.33
73.5
8.99 7.11 6.53 6.71
Figure 86: hs1sivlov1_ref
28 Software Benchmarking of the 2nd round CAESAR Candidates
1 16 557 1500 16000 10000000
100
200
300
400
500
600
700
Message Size
Pe
rfo
rma
nce
(cp
b)
646.63
186.55
13.65 9.45 7.39 7.2
Figure 87: icepole128av2_ref
1 16 557 1500 16000 10000000
50
100
150
200
250
300
350
400
450
500
Message Size
Pe
rfo
rma
nce
(cp
b)
475.58
159.02
39.78 36.76 35.25 35.26
Figure 88: ketjesrv1_reference
1 16 557 1500 16000 10000000
50
100
150
200
250
300
350
400
450
500
Message Size
Pe
rfo
rma
nce
(cp
b)
474.14
135.44
7.08 4.2 2.6 2.55
Figure 89: seakeyakv2_SandyBridge
1 16 557 1500 16000 10000000
20
40
60
80
100
120
Message Size
Pe
rfo
rma
nce
(cp
b)
100.4
28.82
1.73 1.07 0.71 0.68
Figure 90: morus1280256v1_avx2
1 16 557 1500 16000 10000000
20
40
60
80
100
120
140
160
Message Size
Pe
rfo
rma
nce
(cp
b)
158.96
45.56
3.32 2.43 1.92 1.87
Figure 91: norx6441_ymm
1 16 557 1500 16000 10000000
200
400
600
800
1000
1200
Message Size
Pe
rfo
rma
nce
(cp
b)
1121.2
320.74
24.61 18.05 14.21 13.83
Figure 92: omdsha512k512n256tau256v2_avx1
1 16 557 1500 16000 10000000
50
100
150
Message Size
Pe
rfo
rma
nce
(cp
b)
144.74
41.51
5.62 4.56 3.98 3.93
Figure 93: paeq80_aesni
1 16 557 1500 16000 10000000
50
100
150
200
250
300
350
400
450
500
Message Size
Pe
rfo
rma
nce
(cp
b)
489.86
140.14
9.34 7.14 4.75 4.56
Figure 94: pi64cipher128v2_goptv
Ralph Ankele and Robin Ankele 29
1 16 557 1500 16000 10000000
100
200
300
400
500
600
700
800
Message Size
Pe
rfo
rma
nce
(cp
b)
712.97
204.21
14.39 9.26 7.41 7.21
Figure 95: scream10v3_sse
1 16 557 1500 16000 10000000
200
400
600
800
1000
1200
1400
Message Size
Pe
rfo
rma
nce
(cp
b)
1267.94
388.48
45.68 37.31 33.02 35.32
Figure 96: shellaes128v2d8n80_ref
1 16 557 1500 16000 10000000
50
100
150
200
250
300
350
400
450
Message Size
Pe
rfo
rma
nce
(cp
b)
440.1
127.18
25.19 22.47 21.31 21.22
Figure 97: stribob192r2_ssse3
1 16 557 1500 16000 10000000
10
20
30
40
50
60
Message Size
Pe
rfo
rma
nce
(cp
b)
50.06
13.93
0.71 0.38 0.2 0.19
Figure 98: tiaoxinv2_nim
1 16 557 1500 16000 10000000
50
100
150
200
250
300
Message Size
Pe
rfo
rma
nce
(cp
b)
257.01
79.65
10.61 9.01 8.06 7.97
Figure 99: trivia0v2_sse4