Recent Advances in Authenticated Encryption September 19-22, 2016, Indian Statistical Institute, Kolkata
Recent developments in AES-GCM authenticated encryption: optimization and deployment, and its nonce-misuse-resistant version GCM-SIV
Shay Gueron
University of Haifa, Israel
Intel Corporation, Israel Development Center, Haifa, Israel
AES-GCM / AES-GCM-SIV
My learnings
[Figure: plaintext, ciphertext, and authentication tag]
AES-GCM in a nutshell
Efficient Authenticated Encryption is important
– e.g., in client-server communications (TLS)
Today, AES-GCM has become the de facto mode of operation for authenticated encryption
– Part of (current) TLS 1.2
– Planned in TLS 1.3 (as one of two AEAD options)
– Preferred server & client choice on the leading servers / browsers
• (when the CPU is detected to have AES-NI & PCLMULQDQ instructions)
Advantages:
– Security proof
– Excellent performance on modern CPUs with AES-NI & PCLMULQDQ
AES-GCM in a nutshell
Input:
– Key: K
– Nonce (IV)
• assume 96 bits
– A: associated data (a1, a2, …, ar*)
– M: plaintext (m1, m2, …, ms*)
• s ≤ 2^32 − 2
• ar* and ms* are not necessarily full 128-bit blocks
Output:
– Ciphertext: C (c1, c2, …, cs*)
• length(cs*) = length(ms*)
– Authentication tag: TAG
AES-GCM in a nutshell
Derive hash key: H = AES_K(0^128)
Set up initial counter: CTR = IV || 0^31 || 1
Compute MASK = AES_K(CTR)
For j = 1, 2, …:
– CTR = inc32(CTR)
– cj = AES_K(CTR) ⊕ mj
– inc32 increments the 32-bit counter inside the 128-bit block
Set X1 = a1, …, Xr = (ar)’, Xr+1 = c1, …, Xr+s = (cs)’, Xr+s+1 = (bitlen(M) || bitlen(A))
– All Xj’s are 128-bit blocks (possible 0 padding for (ar)’, (cs)’)
– n = r+s+1
GHASH_H = X1 ● H^n ⊕ X2 ● H^(n-1) ⊕ … ⊕ Xn ● H
– “●” = multiplication in GF(2^128)[x] / P(x)
– P(x) = x^128 + x^7 + x^2 + x + 1 (but with reversed order of bits in bytes!)
TAG = GHASH_H ⊕ MASK
C = (c1, c2, …, cs*)
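The GHASH formula above can be sketched in a few lines of Python. This is only an illustration, not the optimized code discussed in the talk: blocks are assumed to be 128-bit integers read big-endian, the bit-serial multiply follows the textbook GCM convention (the leftmost bit is the coefficient of x^0), and the names `gf_mul`, `ghash` and `ghash_power_sum` are made up for this example. It is not constant-time.

```python
# Illustrative sketch only: bit-serial GF(2^128) multiply in the GCM bit order.
R = 0xE1 << 120  # x^128 = x^7 + x^2 + x + 1, written in GCM's reflected bit order


def gf_mul(x, y):
    """Multiply two 128-bit blocks (big-endian integers) in GCM's GF(2^128)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:      # bit i of x, counted from the leftmost bit
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1   # v = v * x, reduced modulo P(x)
    return z


def ghash(h, blocks):
    """Horner evaluation: one field multiplication per block."""
    y = 0
    for x in blocks:
        y = gf_mul(y ^ x, h)
    return y


def ghash_power_sum(h, blocks):
    """Direct evaluation of X1*H^n ^ X2*H^(n-1) ^ ... ^ Xn*H, for comparison."""
    n = len(blocks)
    acc = 0
    for j, x in enumerate(blocks):
        p = h
        for _ in range(n - 1 - j):    # raise H to the power n - j (j is 0-based)
            p = gf_mul(p, h)
        acc ^= gf_mul(x, p)
    return acc
```

Both functions return the same value for any block list; the Horner form is what an implementation would normally start from.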
Authenticated Encryption alternatives
Pre AES-NI / CLMUL (2009)
• RC4 + HMAC SHA-1 ~9.5 C/B
• AES + HMAC SHA-1 ~23 C/B
• AES-GCM ~22 C/B
• RC4 + SHA-1 and AES + SHA-1 dominated the TLS world
• The emerging AES-GCM had no performance advantage
• (lookup tables for GF(2^128) multiplications)
• AES-GCM deployment was marginal until 2012+ (low adoption of TLS 1.2)
AES-GCM across Intel CPU generations (2016)
[Figure: bar chart of AES-GCM performance, in cycles per byte]
Pre AES-NI / PCLMULQDQ    22
Westmere (2010)           3.08
Sandy Bridge (2012)       2.75
Haswell (2013)            1.02
Broadwell (2014)          0.76
Skylake (Sept. 2015)      0.65
(2015) AES-GCM at the cost of CTR!
Westmere, Sandy Bridge, Haswell, Broadwell, Skylake are Intel Architecture Codenames.
Haswell: 4th Generation Intel® Core Processor
Broadwell: 5th Generation Intel® Core Processor
Skylake: 6th Generation Intel® Core Processor
Comparison to other AES modes
Cycles per byte (C/B), measured on 8KB buffers
               CTR    XTS    CBC dec   CBC enc (MB)   AES-GCM   AES-CBC (serial)
Sandy Bridge   0.76   1.21   0.8       0.9            2.75      5.05
Haswell        0.64   0.7    0.65      0.8            1.02      4.41
Broadwell      0.64   0.7    0.65      0.8            0.76      4.41
Skylake        0.63   0.63   0.62      0.64           0.65      2.65
How did AES-GCM become so fast?
CPU instructions
– AES-NI for encryption [Gueron]
– PCLMULQDQ (64-bit polynomial multiplication) for AES-GCM [Gueron-Kounavis]
Algorithms and optimizations for CTR encryption & GHASH computations [Gueron], [Gueron-Krasnov]
Improved performance of AES-NI / PCLMULQDQ across CPU generations
– Shorter latency, better throughput
New optimizations
– Efficient reduction with (fast) PCLMULQDQ [Gueron]
All contributed to OpenSSL and NSS [Gueron-Krasnov]
Intel’s AES-NI / PCLMULQDQ
Intel introduced a new set of instructions (2010)
AES-NI:
– Facilitate high performance AES encryption and decryption
PCLMULQDQ: 64 x 64 → 128 (carry-less)
– Binary polynomial multiplication; speeds up computations in binary fields
Underlying idea for using in GHASH:
1. Compute 128 x 128 → 256 via carry-less multiplication (of 64-bit operands)
2. Reduction: 256 → 128 modulo x^128 + x^7 + x^2 + x + 1 (done efficiently via software)
AES-NI: Throughput vs. Latency
AESENC data, key0
AESENC data, key1
AESENC data, key2
AESENC data0, key0
AESENC data1, key0
AESENC data2, key0
AESENC data3, key0
AESENC data4, key0
AESENC data5, key0
AESENC data6, key0
AESENC data7, key0
AESENC data0, key1
Parallelizable modes (CTR, CBC dec, XTS) can interleave multiple messages to gain full throughput with AES-NI
Carry-less 128 x 128 → 256, but not carelessly
(Gueron-Kounavis, 2009)
Multiply 128 x 128 → 256: [A1:A0] • [B1:B0]
Schoolbook (4 PCLMULQDQ invocations)
A0•B0 = [C1:C0], A1•B1 = [D1:D0], A0•B1 = [E1:E0], A1•B0 = [F1:F0]
[A1:A0] • [B1:B0] = [D1 : D0⊕E1⊕F1 : C1⊕E0⊕F0 : C0]
Carry-less Karatsuba (3 PCLMULQDQ invocations)
A1•B1 = [C1:C0], A0•B0 = [D1:D0]
(A1⊕A0) • (B1⊕B0) = [E1:E0]
[A1:A0] • [B1:B0] = [C1 : C0⊕C1⊕D1⊕E1 : D1⊕C0⊕D0⊕E0 : D0]
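The two 128 x 128 → 256 recipes can be compared with a small Python model. Here `clmul64` is a software stand-in for one PCLMULQDQ invocation; all function names are invented for this sketch, and it models only the arithmetic, not the instruction scheduling.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less 64 x 64 -> 128-bit product (software model of PCLMULQDQ)."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r


def split(v):
    """Split a 128-bit value into (high, low) 64-bit halves."""
    return v >> 64, v & MASK64


def schoolbook(a, b):
    """Four multiplications: A0*B0, A1*B1, A0*B1, A1*B0."""
    a1, a0 = split(a)
    b1, b0 = split(b)
    c = clmul64(a0, b0)
    d = clmul64(a1, b1)
    e = clmul64(a0, b1)
    f = clmul64(a1, b0)
    return (d << 128) ^ ((e ^ f) << 64) ^ c


def karatsuba(a, b):
    """Three multiplications: A1*B1, A0*B0, (A1^A0)*(B1^B0)."""
    a1, a0 = split(a)
    b1, b0 = split(b)
    c = clmul64(a1, b1)
    d = clmul64(a0, b0)
    e = clmul64(a1 ^ a0, b1 ^ b0)
    return (c << 128) ^ ((c ^ d ^ e) << 64) ^ d
```

Both functions return the same 256-bit carry-less product; Karatsuba trades one multiplication for a few extra XORs.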
This is fixed
So this is also fixed
A new interpretation to GHASH operations
• GHASH does not use GF(2^128) computations in the standard way
• Inherent contradiction between the representation of the AES state as 16 bytes and the state as an element in GF(2^128)
• In GHASH, the bits inside the 128-bit operands are reflected
• The GHASH A ● B operation over AES ciphertext blocks is
– T1 = reflect(A)
– T2 = reflect(B)
– T3 = T1 × T2 modulo x^128 + x^7 + x^2 + x + 1 (a GF(2^128) multiplication)
– Result = reflect(T3)
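The reflect / multiply / reflect description can be checked against the usual bit-serial GCM multiply. The sketch below assumes operands are 128-bit integers read big-endian from the blocks, and that "reflect" means reversing all 128 bits; the function names are made up for the illustration.

```python
R = 0xE1 << 120   # GCM reduction constant in reflected bit order
P_LOW = 0x87      # x^7 + x^2 + x + 1


def gcm_mul(x, y):
    """Bit-serial GF(2^128) multiply in the GCM bit order (leftmost bit = x^0)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def reflect(v):
    """Reverse the order of all 128 bits."""
    return int(format(v, "0128b")[::-1], 2)


def std_mul(a, b):
    """Standard-order multiply: bit i is the coefficient of x^i."""
    prod = 0
    for i in range(128):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(254, 127, -1):   # reduce modulo x^128 + x^7 + x^2 + x + 1
        if (prod >> i) & 1:
            prod ^= (1 << i) ^ (P_LOW << (i - 128))
    return prod


def ghash_mul_via_reflect(a, b):
    """T1 = reflect(A), T2 = reflect(B), T3 = T1 x T2 mod P, result = reflect(T3)."""
    return reflect(std_mul(reflect(a), reflect(b)))
```

For any two blocks, `gcm_mul(a, b)` and `ghash_mul_via_reflect(a, b)` agree, which is the equivalence the bullets above describe.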
A new interpretation to GHASH operations
The new interpretation of A ● B:
A × B × x^(-127) mod x^128 + x^127 + x^126 + x^121 + 1
The polynomial is desrever
i.e., a weird Montgomery Multiplication in GF(2^128) modulo the reversed poly
Better written as
A × B × x × x^(-128) mod x^128 + x^127 + x^126 + x^121 + 1
This operation can be computed efficiently!
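One way to convince oneself of this interpretation is a numeric check. The sketch assumes that A and B are the GCM blocks read as big-endian 128-bit integers whose bits are then taken in the natural order (bit i is the coefficient of x^i). Rather than computing x^(-127) explicitly, it multiplies the GHASH result back by x^127 and compares modulo the reversed polynomial. All names are illustrative.

```python
R = 0xE1 << 120
Q = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1   # the reversed polynomial


def gcm_mul(x, y):
    """Reference GHASH multiply (bit-serial, GCM bit order)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def clmul(a, b):
    """Carry-less (polynomial) product of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(v, m=Q):
    """Remainder of polynomial v modulo polynomial m over GF(2)."""
    dm = m.bit_length() - 1
    while v.bit_length() > dm:
        v ^= m << (v.bit_length() - 1 - dm)
    return v


def matches_new_interpretation(a, b):
    """True if (A ● B) * x^127 == A * B modulo Q, i.e. A ● B = A*B*x^(-127) mod Q."""
    c = gcm_mul(a, b)
    return poly_mod(clmul(c, 1 << 127)) == poly_mod(clmul(a, b))
```

The check returns True for any pair of blocks, since both sides are the same residue modulo the reversed polynomial.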
This is fixed
Fast reduction modulo x^128 + x^127 + x^126 + x^121 + 1 (Gueron 2012)
Algorithm: “Montgomery reduction”
Input: 256-bit operand [X3:X2:X1:X0]
[A1:A0] = X0 • 0xc200000000000000
[B1:B0] = [X0⊕A1 : X1⊕A0]
[C1:C0] = B0 • 0xc200000000000000
[D1:D0] = [B0⊕C1 : B1⊕C0]
Output: [D1⊕X3 : D0⊕X2]
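The five steps of the reduction translate directly into Python. This sketch assumes the output is meant to be X × x^(-128) modulo Q(x) = x^128 + x^127 + x^126 + x^121 + 1, which is what a two-step Montgomery reduction by x^64 yields; `clmul64` models PCLMULQDQ and `poly_mod` is a slow reference used only to check the result. Names are made up for the example.

```python
MASK64 = (1 << 64) - 1
POLY = 0xC200000000000000                 # x^63 + x^62 + x^57
Q = (1 << 128) | (POLY << 64) | 1         # x^128 + x^127 + x^126 + x^121 + 1


def clmul64(a, b):
    """Carry-less 64 x 64 -> 128-bit product (software model of PCLMULQDQ)."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r


def mont_reduce(x):
    """Reduce a 256-bit value [X3:X2:X1:X0] to 128 bits, following the slide."""
    x0 = x & MASK64
    x1 = (x >> 64) & MASK64
    x2 = (x >> 128) & MASK64
    x3 = x >> 192
    a = clmul64(x0, POLY)
    a1, a0 = a >> 64, a & MASK64
    b1, b0 = x0 ^ a1, x1 ^ a0
    c = clmul64(b0, POLY)
    c1, c0 = c >> 64, c & MASK64
    d1, d0 = b0 ^ c1, b1 ^ c0
    return ((d1 ^ x3) << 64) | (d0 ^ x2)


def poly_mod(v, m=Q):
    """Slow reference: remainder of polynomial v modulo m over GF(2)."""
    dm = m.bit_length() - 1
    while v.bit_length() > dm:
        v ^= m << (v.bit_length() - 1 - dm)
    return v
```

A quick sanity check is that multiplying the output back by x^128 gives the same residue as the input: `poly_mod(mont_reduce(X) << 128) == poly_mod(X)`.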
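; Input is in T1:T7 (presumably T1 = [X3:X2], T7 = [X1:X0])
vmovdqa T3, [W]               ; load the reduction constant (used from the high qword)
vpclmulqdq T2, T3, T7, 0x01   ; [A1:A0] = X0 • constant
vpshufd T4, T7, 78            ; swap the 64-bit halves: [X0:X1]
vpxor T4, T4, T2              ; [B1:B0] = [X0⊕A1 : X1⊕A0]
vpclmulqdq T2, T3, T4, 0x01   ; [C1:C0] = B0 • constant
vpshufd T4, T4, 78            ; swap the 64-bit halves: [B0:B1]
vpxor T4, T4, T2              ; [D1:D0] = [B0⊕C1 : B1⊕C0]
vpxor T1, T4                  ; result in T1
The cost: 2 x PCLMULQDQ + 3 x (shift + XOR)
Ideal with fast PCLMULQDQ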
Aggregated Reduction
GHASH_H = X1 ● H^n ⊕ X2 ● H^(n-1) ⊕ … ⊕ Xn ● H
• With Horner algorithm: 1 field multiplication per block
Aggregation:
• Pre-compute k powers of H to evaluate the polynomial
• Defer the reduction to once every k polynomial (ring) multiplications
• Operate on x ● H
• Useful choices are k=8 or 6
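The aggregation idea can be illustrated independently of the bit-order details. The sketch below deliberately uses the standard representation (bit i is the coefficient of x^i) modulo x^128 + x^7 + x^2 + x + 1, which is a simplifying assumption and not the reflected GHASH convention; `k` and all function names are chosen for the example. It accumulates k unreduced carry-less products and reduces once per group.

```python
P = (1 << 128) | 0x87   # x^128 + x^7 + x^2 + x + 1, standard bit order


def clmul(a, b):
    """Carry-less product (ring multiplication, no reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(v, m=P):
    """Reduce polynomial v modulo m over GF(2)."""
    dm = m.bit_length() - 1
    while v.bit_length() > dm:
        v ^= m << (v.bit_length() - 1 - dm)
    return v


def fmul(a, b):
    """Field multiplication = ring multiplication followed by a reduction."""
    return poly_mod(clmul(a, b))


def horner(h, blocks):
    """One field multiplication (and one reduction) per block."""
    y = 0
    for x in blocks:
        y = fmul(y ^ x, h)
    return y


def aggregated(h, blocks, k=4):
    """Same value, but with one reduction per group of up to k blocks."""
    powers = [h]
    for _ in range(k - 1):
        powers.append(fmul(powers[-1], h))        # powers[i] = H^(i+1)
    y = 0
    for start in range(0, len(blocks), k):
        chunk = blocks[start:start + k]
        m = len(chunk)
        acc = clmul(y ^ chunk[0], powers[m - 1])   # unreduced 256-bit product
        for j in range(1, m):
            acc ^= clmul(chunk[j], powers[m - 1 - j])
        y = poly_mod(acc)                          # the single deferred reduction
    return y
```

Because reduction is linear over XOR, `aggregated` returns the same value as `horner` for any group size k.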
Interleaving CTR and GHASH
There are two approaches to GCM
– AES-CTR function for encryption + another GHASH function to generate the MAC
• Achieves, at best, the performance of “CTR+GHASH”
– Interleave the calculation of CTR and GHASH in a single function
• Achieves a better performance
• If coded efficiently, can fill the execution pipe to the maximum
Situation today
AES-GCM is a big success
• Ubiquitous (including OpenSSL and NSS)
• Selected for TLS connection by practically all of the major servers
• Some examples: Google, AWS, Dropbox, Cloudflare
• All browsers support AES-GCM, and will offer it at handshake if running on a CPU with AES-NI (all 64-bit CPUs already have it)
• On the latest architecture (Skylake): AES-GCM is as fast as the CTR encryption
Familiarity breeds contempt?
GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte
Appeared at ACM CCS 2015
Shay Gueron, University of Haifa and Intel Corp.
Yehuda Lindell, Bar-Ilan University
AES-GCM in a nutshell (2)
Derive hash key: H = AES_K(0^128)
Set up initial counter: CTR = IV || 0^31 || 1
Compute MASK = AES_K(CTR)
For j = 1, 2, …:
– CTR = inc32(CTR)
– cj = AES_K(CTR) ⊕ mj
– inc32 increments the 32-bit counter inside the 128-bit block
Set X1 = a1, …, Xr = (ar)’, Xr+1 = c1, …, Xr+s = (cs)’, Xr+s+1 = (bitlen(M) || bitlen(A))
– All Xj’s are 128-bit blocks (possible 0 padding for (ar)’, (cs)’)
GHASH_H = X1 ● H^n ⊕ X2 ● H^(n-1) ⊕ … ⊕ Xn ● H
– n = r+s+1
– “●” = multiplication in GF(2^128)[x] / P(x)
– P(x) = x^128 + x^7 + x^2 + x + 1 (with reversed order of bits within the bytes)
TAG = GHASH_H ⊕ MASK
C = (c1, c2, …, cs*)
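The counter handling in this recap depends only on the IV, which is why a repeated IV is so harmful: the same counter blocks, and hence the same keystream and MASK, are produced again under the same key. A minimal sketch of the counter-block sequence, with AES itself left out and all function names invented for the illustration:

```python
def initial_counter(iv):
    """CTR = IV || 0^31 || 1, for a 96-bit IV given as 12 bytes."""
    if len(iv) != 12:
        raise ValueError("this sketch assumes a 96-bit IV")
    return iv + b"\x00\x00\x00\x01"


def inc32(block):
    """Increment the last 32 bits modulo 2^32; the first 96 bits are unchanged."""
    ctr = (int.from_bytes(block[12:], "big") + 1) & 0xFFFFFFFF
    return block[:12] + ctr.to_bytes(4, "big")


def counter_blocks(iv, n):
    """Return (block encrypted to form MASK, [blocks encrypted for c1..cn])."""
    ctr = initial_counter(iv)
    mask_block, out = ctr, []
    for _ in range(n):
        ctr = inc32(ctr)
        out.append(ctr)
    return mask_block, out
```

Calling `counter_blocks` twice with the same IV returns identical blocks, so encrypting two messages under one key and IV XORs both with the same keystream.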
Repeating a nonce (with the same key) has a disastrous effect on both privacy and integrity
Why Should an IV Repeat?
Randomness is much harder than it should be
– Intel has RDRAND and RDSEED on all new processors (from Ivy Bridge 2011)
Not used inside Linux /dev/random
Bad Randomness
In 2008, a bug in Debian Linux was found
– In 2006, code that was crucial for RNG reseeding was commented out
Bad Randomness
PlayStation 3
– In 2010, the ECDSA private key used by Sony to sign software for PlayStation 3 was recovered because Sony failed to generate a new random nonce for each signature
RSA Keys – Lenstra et al. 2012
Collected 6.4 million RSA keys from the web
– 71,052 occurred more than once
• Different owners can decrypt each other’s traffic
• Some of the moduli repeated thousands of times (no entropy)
– 12,934 had a common factor
• Computed GCD(N, N’) where N = pq and N’ = p’q
• Factor both moduli
We use this for entropy estimation
Entropy Estimation via RSA Keys
The expected number of collisions in q samples from a domain of size N is (q choose 2) / N ≈ q^2 / (2N)
We have q = 12,800,000 (number of primes is double)
We have number of collisions = 12,934
So, 12,800,000^2 / (2N) = 12,934, giving N ≈ 2^32.56
Conclusion: an “average” of 33 bits of entropy
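The arithmetic behind the estimate is easy to reproduce. The snippet below simply solves q^2 / (2N) = collisions for N using the figures quoted above; the variable names are chosen for the example.

```python
import math

q = 12_800_000          # number of primes sampled (two per RSA key)
collisions = 12_934     # moduli that shared a prime factor

# Solve q^2 / (2N) = collisions for the effective domain size N.
N = q * q / (2 * collisions)
bits = math.log2(N)

print(f"N is about 2^{bits:.2f}")   # prints: N is about 2^32.56
```

Rounding up gives the "average" of 33 bits of entropy stated in the conclusion.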
And recently…
• Nonce-Disrespecting Adversaries: Practical Forgery Attacks on GCM in TLS
• Böck, Zauner, Devlin, Somorovsky, Jovanovic
• https://eprint.iacr.org/2016/475.pdf (2016)
Randomness can repeat and does repeat. What should we do?
Our goal: an Authenticated Encryption scheme that
– Is nonce-misuse resistant (security)
– Enjoys the performance benefits of AES-GCM (performance)
– Uses only small changes over existing standard (easy deployment)
– Can re-use software (and hardware) components (efficiency)
Can we really have the cake and eat it?
YES!
Nonce Misuse Resistance [Rogaway-Shrimpton]
Denote nonce by N
Security property
– If N is the same and the message is the same, the result is the same ciphertext
• This is inherent
– Otherwise, full security (authenticated encryption):
• Even if N is the same and the message is not
• Even if N is different and the message is the same
This cannot be achieved for online encryption
– If two long messages differ only in the last bit, when same N is used…
Abstract SIV Encryption [Rogaway-Shrimpton]
Input: message M and nonce N
Step 1:
– Apply a PRF F with key K1 to (N, M); denote result by T
Step 2:
– Encrypt M with key K2 using nonce T; denote result by C
Output (N, C, T)
Decryption: M ← Dec_K2(C) with nonce T; check T = F_K1(N, M)
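The two-step structure can be sketched with standard-library stand-ins. To be clear about the assumptions: HMAC-SHA256 is used here as the PRF F and as a counter-mode keystream generator purely so the example runs without an AES library; it is not the GCM-SIV instantiation, and the function names are invented for the illustration.

```python
import hashlib
import hmac


def prf(k1, nonce, msg):
    """Stand-in for F_K1(N, M): HMAC-SHA256 truncated to 16 bytes."""
    data = len(nonce).to_bytes(4, "big") + nonce + msg
    return hmac.new(k1, data, hashlib.sha256).digest()[:16]


def keystream(k2, iv, n):
    """Stand-in nonce-based encryption: HMAC-SHA256 in counter mode."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hmac.new(k2, iv + ctr.to_bytes(4, "big"), hashlib.sha256).digest()
        ctr += 1
    return out[:n]


def siv_encrypt(k1, k2, nonce, msg):
    t = prf(k1, nonce, msg)                                   # Step 1: T = F_K1(N, M)
    c = bytes(m ^ s for m, s in zip(msg, keystream(k2, t, len(msg))))  # Step 2
    return nonce, c, t


def siv_decrypt(k1, k2, nonce, c, t):
    msg = bytes(x ^ s for x, s in zip(c, keystream(k2, t, len(c))))
    if not hmac.compare_digest(t, prf(k1, nonce, msg)):       # check T = F_K1(N, M)
        raise ValueError("authentication failed")
    return msg
```

With a repeated nonce, two different messages still get unrelated synthetic IVs T, so the only thing revealed is whether the same (N, M) pair was encrypted twice.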
SIV Encryption Security
Encryption:
T = F_K1(N, M); C ← Enc_K2(M) with nonce T
Security
– If nonce N is different, then by PRF the value T is pseudorandom
– If nonce N is the same but M is different, then by PRF the value T is pseudorandom
– The value T also serves as a valid MAC, and so we have authenticated encryption
Efficient Instantiations
Option 1 – apply a PRF based on AES
– What PRFs do we have? CBC-MAC
– Very expensive
Option 2 – construct a more efficient PRF using simpler primitives
– Let H be an ε-XOR universal hash function: ∀ x ≠ y, ∀ z: Pr[H_K1(x) ⊕ H_K1(y) = z] ≤ ε(n)
Claim: F_{K1,K2}(N, M) = F_K2(H_K1(M) ⊕ N) is a PRF
Universal-Hash Based PRF
The construction: F_{K1,K2}(N, M) = F_K2(H_K1(M) ⊕ N)
Proof idea:
– By the PRF property of F, an adversary can distinguish only if it queries (N, M), (N′, M′) where H_K1(M) ⊕ N = H_K1(M′) ⊕ N′
– Equivalently: if H_K1(M) ⊕ H_K1(M′) = N ⊕ N′
– By the ε-XOR property, this happens with probability only ε for each pair
– Therefore, secure PRF for negligible ε
The GCM-SIV Instantiation
The GHASH function H in GCM is an ε-XOR universal hash function (for negligible ε) [McGrew-Viega]
– We use an improved construction
The PRF used is AES (only need a single block)
Encryption is AES-CTR
Versions:
– Three different keys (for GHASH, PRF, CTR-ENC)
– Two keys: use same key for PRF and CTR-ENC
– One key: derive the two keys using AES itself
The GCM-SIV Instantiation
A very important property:
all the elements here are identical to the existing AES-GCM
– We only change the order of operations using the Synthetic IV paradigm
– MAC first, mix result with IV, then encrypt
Why is this important?
– Efficiency
– Deployment ease (use existing code bases)
GCM-SIV (context)
Input:
– 2 Keys: K, H
– Nonce (N)
• assume 95 bits
– A: associated data (a1, a2, …, ar*)
– M: plaintext (m1, m2, …, ms*)
• s ≤ 2^32 − 1; ar* and ms* are not necessarily full 128-bit blocks
The single key variant uses input key K0 to derive: H = AES_K0(0^128), K = AES_K0(0^127 || 1)
Output:
– Ciphertext: C (c1, c2, …, cs*)
– Authentication tag: TAG
Definition:
– POLYVAL_H(X1 || X2 || … || Xn) = X1 ● H^n ⊕ X2 ● H^(n-1) ⊕ … ⊕ Xn ● H
• “●” = multiplication in GF(2^128)[x] / P(x); P(x) = x^128 + x^127 + x^126 + x^121 + 1
• Can be the same as GHASH (if bits are reversed) but does not have to
GCM-SIV (encryption)
LENBLK = (bitlen(M) || bitlen(A))
Set X1 = a1, …, Xr = (ar)’, Xr+1 = m1, …, Xr+s = (ms)’, Xr+s+1 = LENBLK
– All Xj’s are 128-bit blocks (possible 0 padding for (ar)’, (ms)’)
– n = r+s+1
• T = POLYVAL_H(X1 || X2 || … || Xn)
• TAG = AES_K(0 || (T ⊕ N)[126:0])
• For i = 1, 2, … (i ≤ 2^32 − 1)
• CTRBLKi = 1 || TAG[126:32] || i32 (i32 = i encoded as 32-bit string)
• ci = mi ⊕ AES_K(CTRBLKi)
C = (c1, c2, …, cs*)
– If length(ms*) != 128, chop lsbits of cs so that length(cs*) = length(ms*)
Output: C, TAG
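The bit layout of the two AES inputs is the part that is easiest to get wrong, so here is a sketch of just that step. AES and POLYVAL are deliberately left out; blocks are modelled as 128-bit Python integers with bit 127 as the leading bit, and the helper names are hypothetical.

```python
MSB = 1 << 127
LOW32 = (1 << 32) - 1


def tag_input(t, n):
    """0 || (T xor N)[126:0]: the block that AES_K encrypts to produce TAG."""
    return (t ^ n) & (MSB - 1)


def ctr_block(tag, i):
    """1 || TAG[126:32] || i32: the i-th counter block for CTR encryption."""
    if not 0 <= i <= LOW32:
        raise ValueError("counter must fit in 32 bits")
    return MSB | (tag & (MSB - 1) & ~LOW32) | i
```

The leading bit is 0 for every tag input and 1 for every counter block, so the two uses of AES_K can never be fed the same block.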
First: compute Hash and TAG over the plaintext
Then: compute TAG from the hash and the nonce
Then: use TAG as IV for the CTR encryption
Important notes
Separation via the 95-bit IV:
• TAG = AES_K(0 || (T ⊕ N)[126:0])
• CTRBLKi = 1 || TAG[126:32] || i32
Nonce misuse resistance achieved by
• T = POLYVAL_H(X1 || X2 || … || Xn) varies with the inputs
• TAG = AES_K(0 || (T ⊕ N)[126:0])
Inherent in SIV construction: Hash+Tag & Encryption are serialized
What optimizations are possible?
• (almost) Everything the AES-GCM does – we can do (better?)
Efficiency of GCM vs GCM-SIV
Encryption
– In GCM, CTR-ENC and GHASH are interleaved and run in parallel
– In GCM-SIV, GHASH must be finished before CTR-ENC can begin (cannot be done in parallel)
Efficiency of GCM vs GCM-SIV
Decryption:
– In GCM, once again CTR-DEC and GHASH interleaved
– In GCM-SIV, can also interleave (decryption cost “should be” the same as the original GCM)
The computational cost of GCM-SIV
Key Derivation + GHASH + Tag Generation + CTR Generation + CTR Encryption
• Key Derivation (required only for the 1-key variant): key expansion + encryption of 2 blocks
• GHASH: one GF(2^128) multiplication per 16-byte block in M and A + one for LENBLK
• ceil((|M|+|A|) / 16) + 1 field multiplications
• Tag Generation: key expansion + encryption of one block
• CTR Generation: incrementing the counter blocks
• CTR Encryption: ceil(|M| / 16) AES encryptions
• (key is already expanded during Tag Generation)
Different from that of AES-GCM, but has the same cost
Proven security statement
The security of GCM-SIV is equivalent to that of AES-GCM (with 96-bit IV)
Note about POLYVAL vs. GHASH
Let Xi and H be 128-bit blocks; M = message of n blocks (M = X1 || X2 || … || Xn)
In AES-GCM
– GHASH_H(M) = X1 ● H^n ⊕ X2 ● H^(n-1) ⊕ … ⊕ Xn ● H
– “●” denotes multiplication in GF(2^128)[x] / P(x)
• P(x) = x^128 + x^7 + x^2 + x + 1 (with reversed order of bits within the bytes)
In GCM-SIV
– No need to reverse the order of bits within the bytes
– “●”: A ● B = A × B × x^(-128) in GF(2^128)[x] / Q(x)
• Q(x) = x^128 + x^127 + x^126 + x^121 + 1
• (× is the field multiplication)
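The relation between the two "●" operations can be made concrete. The sketch below assumes blocks are 128-bit integers read big-endian and interpreted in natural bit order for POLYVAL; under that assumption the GHASH product equals the POLYVAL-style product with one operand pre-multiplied by x, because A × B × x^(-127) = A × (x × B) × x^(-128). The bit-serial Montgomery loop and all names are illustrative only, not the optimized implementation.

```python
Q = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1
R = 0xE1 << 120


def clmul(a, b):
    """Carry-less product of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polyval_dot(a, b):
    """A x B x x^(-128) mod Q, via a bit-serial Montgomery reduction."""
    v = clmul(a, b)
    for _ in range(128):
        if v & 1:
            v ^= Q        # Q has constant term 1, so this clears the low bit
        v >>= 1           # exact division by x
    return v


def polyval(h, blocks):
    """Horner evaluation of X1 ● H^n ^ ... ^ Xn ● H with the POLYVAL '●'."""
    y = 0
    for x in blocks:
        y = polyval_dot(y ^ x, h)
    return y


def mulx(b):
    """Multiply a 128-bit element by x modulo Q."""
    b <<= 1
    return b ^ Q if b >> 128 else b


def ghash_mul(x, y):
    """Reference GHASH multiply (bit-serial, GCM bit order)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z
```

So `ghash_mul(a, h)` and `polyval_dot(a, mulx(h))` agree, and the adjustment to H needs to be done only once per key.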
GCM-SIV performance - highlights
[Figure: bar chart, cycles per byte on Haswell, Broadwell and Skylake, for GCM-SIV encrypt (with init), GCM-SIV decrypt (with init) and AES-GCM (without init). Bar labels in extraction order: 1.18, 1.10, 1.16, 0.92, 0.77, 0.76, 0.94, 0.65, 0.65]
GCM-SIV (2 keys) over an 8KB message
Potpourri
GCM-SIV (Our implementation) is faster than (OpenSSL’s best) AES-GCM for short messages, due to a new software optimization
Summary
• Full nonce misuse-resistant authenticated encryption at an extremely low cost
• almost AES-GCM
• Full proof of security and full implementation
• Easily deployable:
– Utilizes existing hardware
– Utilizes existing code and software (AES-GCM implementations)
• Detailed specifications, reference code and Open Source optimized code implementations coming soon
• Submitting GCM-SIV to IETF’s Crypto Forum Research Group (CFRG) as an RFC
• Unpatented
• We hope to see it adopted
Enhanced AES-GCM-SIV (CFRG submission)
AES-GCM-SIV 128 flow (encryption)
– Input:
• in_AAD, in_MSG
• K, N
– Message / AAD padding:
• AAD = pad in_AAD to d blocks
• MSG = pad in_MSG to n blocks (M1 || M2 || M3 … || Mn)
• Define LENBLK
• Padded AAD/MSG = AAD || MSG || LENBLK (consists of d+n+1 blocks)
– Calculate:
• Record_Hash_key = AES_K(N)
• Record_Enc_key = AES_K(Record_Hash_key)
• T = POLYVAL_{Record_Hash_key}(AAD || MSG || LENBLK)
• TAG = AES_{Record_Enc_key}(0 || T[126:0])
• CTRBLKi = 1 || TAG[126:32] || (TAG[31:0] ⊞ i) (i is 32 bits long; i = 0, 1, …, i < 2^32 − 1)
• CTi = AES_{Record_Enc_key}(CTRBLKi) ⊕ Mi
• Define CT = (CT1, CT2, …, CTn)
• If length(in_MSG) != length(CT), chop lsbits of CT so that length(in_MSG) == length(CT)
– Output: CT = (CT1, CT2, …, CTn), TAG
⊞ denotes addition modulo 2^32
AES-GCM-SIV CFRG Meeting
AES-GCM-SIV 256 flow (encryption)
– Input:
• in_AAD, in_MSG
• K, H, N
– Derive (as described before):
• AAD
• MSG = M1 || M2 || M3 … || Mn
• LENBLK
– Calculate (AES = AES-256 throughout):
• Record_Hash_key[127:0] = AES_K(N)
• Record_Enc_key[255:128] = AES_K(Record_Hash_key)
• Record_Enc_key[127:0] = AES_K(Record_Enc_key[255:128])
• T = POLYVAL_{Record_Hash_key}(AAD || MSG || LENBLK)
• TAG = AES_{Record_Enc_key}(0 || T[126:0])
• CTRBLKi = 1 || TAG[126:32] || (TAG[31:0] ⊞ i) (i is 32 bits long; i = 0, 1, …, i < 2^32 − 1)
• CTi = AES_{Record_Enc_key}(CTRBLKi) ⊕ Mi
• Define CT = (CT1, CT2, …, CTn)
• If length(in_MSG) != length(CT), chop lsbits of CT so that length(in_MSG) == length(CT)
– Output:
• CT = (CT1, CT2, …, CTn)
• TAG
⊞ denotes addition modulo 2^32
AES-GCM-SIV 128 flow (encryption)
[Figure: block diagram, AES = AES128. Input: AAD, MSG, Alen, Mlen, N, K. AES blocks derive Record_Hash_key and Record_Enc_Key. POLYVAL over Padded_AAD, Padded_MSG and LENBLK gives T; T with its MSB zeroed goes through AES to give TAG. CTRBLKi = 1 || TAG[126:32] || (TAG[31:0] ⊞ i), where ⊞ is addition modulo 2^32, goes through AES to produce CTi. Output: CTi, TAG]
AES-GCM-SIV 256 flow (encryption)
[Figure: block diagram, AES = AES256. Input: AAD, MSG, Alen, Mlen, N, K. AES blocks derive Record_Hash_Key, Record_ENC_KEY[255:128] and Record_ENC_KEY[127:0]. POLYVAL over Padded_AAD, Padded_MSG and LENBLK gives T; T with its MSB zeroed goes through AES to give TAG. CTRBLKi = 1 || TAG[126:32] || (TAG[31:0] ⊞ i), where ⊞ is addition modulo 2^32, goes through AES to produce CTi. Output: CTi, TAG]
AES-GCM-SIV 128 Performance (in C/B)
AES_GCM_SIV_Encryption (128 bit)
       1KB    2KB    4KB    8KB    16KB
HSW    1.78   1.50   1.37   1.31   1.27
BDW    1.35   1.12   1.01   0.95   0.92
SKL    1.32   1.12   1.02   0.98   0.95
AES_GCM_SIV_Decryption (128 bit)
       1KB    2KB    4KB    8KB    16KB
HSW    1.88   1.50   1.38   1.29   1.26
BDW    1.30   1.00   0.88   0.80   0.68
SKL    1.09   0.85   0.74   0.68   0.66
GCM-SIV 256 Performance (in C/B)
AES_GCM_SIV_Encryption (256 bit)
       1KB    2KB    4KB    8KB    16KB
HSW    1.90   1.89   1.70   1.61   1.56
BDW    1.83   1.48   1.31   1.23   1.19
SKL    1.75   1.46   1.32   1.25   1.22
AES_GCM_SIV_Decryption (256 bit)
       1KB    2KB    4KB    8KB    16KB
HSW    2.22   1.77   1.70   1.61   1.56
BDW    1.72   1.32   1.31   1.23   1.19
SKL    1.36   1.10   0.32   1.25   1.22
GCM-SIV Short Messages Performance[Cycles]
AES_GCM_SIV 128 bit (encryption)
AES_GCM_SIV 256 bit (encryption)
Input Size 16B 32B 64B
HSW 514 569 658
BDW 476 515 573
SKL 342 356 422
Input Size 16B 32B 64B
HSW 310 348 483
BDW 287 306 419
SKL 213 243 354
References
• S. Gueron, Y. Lindell, GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte, 22nd ACM Conference on Computer and Communications Security (ACM CCS), pages 109-119, 2015.
• AES-GCM-SIV CFRG Spec:
• S. Gueron (University of Haifa and Intel Corporation), A. Langley, Y. Lindell (Bar-Ilan University), August 29, 2016
• https://tools.ietf.org/html/draft-irtf-cfrg-gcmsiv-02
• Shay Gueron, AES-GCM-SIV GitHub:
• https://github.com/Shay-Gueron/AES-GCM-SIV
Thank you.