Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs

Kris Gaj, Ekawat Homsirikamol, and Marcin Rogawski, George Mason University, U.S.A.

Source: ...kgaj/publications/conferences/GMU_CHES_2010_slides.pdf (47 pages)

Post on 05-Jan-2020
TRANSCRIPT

Page 1:

Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs

Kris Gaj, Ekawat Homsirikamol, and Marcin Rogawski
George Mason University
U.S.A.

Page 2:

Co-Authors

Ekawat Homsirikamol, a.k.a. “Ice”
Marcin Rogawski

Developed optimized VHDL implementations of 14 Round 2 SHA-3 candidates + SHA-2 in two variants each (256 & 512-bit output), for some functions using several alternative architectures

Page 3:

Outline

•  Motivation & Goals

•  Methodology

•  Results

•  Comparison with Other Groups

•  Future Work & Conclusions

Page 4:

Motivation & Goals

Page 5:

SHA-3 Contest - NIST Evaluation Criteria

•  Security

•  Software Efficiency

•  Hardware Efficiency: FPGAs, ASICs

•  Simplicity

•  Flexibility

•  Licensing

Page 6:

Results of Security Evaluation So Far

[Figure: SHA-3 Zoo Page]

Page 7:

Lessons from the Past - AES Contest – 1997-2000

Round 2 of AES Contest, 2000

[Figure: Speed in FPGAs; Votes at the AES 3 conference]

Page 8:

Our Goals

•  Fair and comprehensive methodology for evaluation of hardware performance in FPGAs

•  High-speed fully autonomous implementations of all 14 SHA-3 candidates & SHA-2 256-bit variants, optimized for the maximum throughput to area ratio

•  Commonly acceptable recommendations to NIST

Page 9:

Methodology

Page 10:

Comprehensive Evaluation

•  two major vendors: Altera and Xilinx (~90% of the market)

•  multiple high-performance and low-cost families

Technology   Altera                           Xilinx
             Low-cost      High-performance   Low-cost    High-performance
90 nm        Cyclone II    Stratix II         Spartan 3   Virtex 4
65 nm        Cyclone III   Stratix III        -           Virtex 5

Page 11:

Uniform Evaluation

•  Language: VHDL

•  Tools: FPGA vendor tools

•  Interface

•  Performance Metrics

•  Design Methodology

•  Benchmarking

Page 12:

Why the Interface Matters

•  Pin limit
   Total number of I/O ports ≤ Total number of FPGA I/O pins

•  Support for the maximum throughput
   Time to load the next message block ≤ Time to process previous block
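
The two conditions above can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the function name, its parameters, and the example numbers are all hypothetical, and it assumes the input bus delivers one w-bit word per clock cycle on the same clock as the core.

```python
def interface_is_feasible(io_ports, fpga_io_pins, block_bits,
                          bus_width_bits, cycles_per_block):
    """Check the pin-limit and maximum-throughput conditions.

    All arguments are hypothetical illustration parameters:
    io_ports         -- total number of I/O ports of the design
    fpga_io_pins     -- total number of I/O pins of the target FPGA
    block_bits       -- message block size in bits
    bus_width_bits   -- width w of the input data bus
    cycles_per_block -- clock cycles the core needs to process one block
    """
    # Pin limit: the design must fit in the available pins.
    fits_pins = io_ports <= fpga_io_pins

    # Cycles needed to load one block over the w-bit bus (rounded up),
    # assuming one word is transferred per clock cycle.
    load_cycles = -(-block_bits // bus_width_bits)

    # Maximum throughput: loading the next block must not take longer
    # than processing the previous one.
    keeps_up = load_cycles <= cycles_per_block

    return fits_pins and keeps_up


# Hypothetical example: 512-bit blocks, 64-bit bus, 65 cycles per block.
print(interface_is_feasible(140, 400, 512, 64, 65))
```

Under these assumptions a narrow bus is the usual way to violate the second condition: a 1024-bit block over a 16-bit bus takes 64 cycles to load, which is too slow for a core that finishes a block in 24 cycles.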

Page 13:

Interface: Two possible solutions

Length of the message communicated at the beginning
   + easy to implement passive source circuit
   − area overhead for the counter of message bits

Dedicated end of message port
   − more intelligent source circuit required
   + no need for internal message bit counter

[Figure: input formats of the SHA core for the two solutions; labels: msg_bitlen, zero_word, message, end_of_msg]

Page 14:

SHA Core: Interface & Typical Configuration

•  SHA core is an active component; surrounding FIFOs are passive and widely available

•  Input interface is separate from an output interface

•  Processing a current block, reading the next block, and storing a result for the previous message can be all done in parallel

[Figure: Input FIFO, SHA core, and Output FIFO connected in series by w-bit data buses; every block receives clk and rst.
 FIFO ports: din, dout, full, empty, write, read.
 SHA core ports: din, dout, src_ready, src_read, dst_ready, dst_write.
 Input FIFO signals: ext_idata, fifoin_full, fifoin_write (external side); idata, fifoin_empty, fifoin_read (SHA core side).
 Output FIFO signals: odata, fifoout_full, fifoout_write (SHA core side); ext_odata, fifoout_empty, fifoout_read (external side).]

Page 15:

Performance Metrics

Primary:
1. Throughput (single long message)
2. Area
3. Throughput / Area

Secondary:
•  Hash Time for Short Messages (up to 1000 bits)
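
As a rough illustration of how these metrics relate, the sketch below computes them in Python. It assumes the common long-message model in which throughput equals the block size divided by the time needed to process one block; the design-specific formulas are not reproduced in this transcript. All function names, parameters, and example numbers are hypothetical.

```python
def throughput_mbit_s(block_bits, clock_mhz, cycles_per_block):
    """Long-message throughput in Mbit/s (assumed model).

    bits per block * blocks per second, with the clock given in MHz
    so the result comes out directly in Mbit/s.
    """
    return block_bits * clock_mhz / cycles_per_block


def throughput_to_area(throughput, area_slices):
    """Throughput to area ratio, in (Mbit/s) per CLB slice."""
    return throughput / area_slices


def short_message_time_ns(num_blocks, cycles_per_block,
                          overhead_cycles, clock_mhz):
    """Hash time in ns for a short message (assumed model).

    overhead_cycles stands in for any fixed cost such as
    initialization and finalization; it is a hypothetical parameter.
    """
    total_cycles = overhead_cycles + num_blocks * cycles_per_block
    return total_cycles * 1000.0 / clock_mhz


# Hypothetical design: 512-bit block, 200 MHz, 64 cycles per block,
# 800 CLB slices, 10 cycles of fixed overhead.
thr = throughput_mbit_s(512, 200, 64)
print(thr, throughput_to_area(thr, 800), short_message_time_ns(2, 64, 10, 200))
```

The short-message metric matters separately because a fixed overhead that is negligible for a long message can dominate when only one or two blocks are hashed.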

Page 16:

Performance Metrics - Area

We force these vectors to look as follows through the synthesis and implementation options:

[Figure: resource vector with Area as its only non-zero component; the remaining four components are 0]

Page 17:

Choice of Optimization Target

Primary Optimization Target: Throughput to Area Ratio

Features:

•  practical: good balance between speed and cost

•  very reliable guide through the entire design process, facilitating the choice of
   -  high-level architecture
   -  implementation of basic components
   -  choice of tool options

•  leads to high-speed, close-to-maximum-throughput designs

Page 18:

Our Design Flow

[Figure: design flow diagram. Boxes: Specification; Interface; Datapath (block diagram); Controller (ASM chart); VHDL Code; Formulas for Throughput & Hash time; Max. Clock Freq.; Resource Utilization; Throughput, Area, Throughput/Area, Hash Time for Short Messages. Supporting inputs: Controller Template; Library of Basic Components]

Page 19:

Basic Operations of 14 SHA-3 Candidates

NTT – Number Theoretic Transform, GF MUL – Galois Field multiplication, MUL – integer multiplication, mADDn – multioperand addition with n operands

Page 20:

ATHENa – Automated Tool for Hardware EvaluatioN

Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE FPGA platforms

Under development at George Mason University.

http://cryptography.gmu.edu/athena

Page 21:

Generation of Results Facilitated by ATHENa

•  batch mode of FPGA tools

•  ease of extraction and tabulation of results
   -  Excel, CSV (available), LaTeX (coming soon)

•  optimized choice of tool options

Page 22:

Relative Improvement of Results from Using ATHENa
Virtex 5, 256-bit Variants of Hash Functions

[Figure: bar chart of Area, Thr, and Thr/Area ratios; vertical axis from 0 to 2.5]

Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools

Page 23:

Results

Page 24:

Throughput [Mbit/s]
Virtex 5, 256-bit variants of algorithms

[Figure: bar chart; vertical axis from 0 to 16000 Mbit/s]

Page 25:

Area [CLB slices]
Virtex 5, 256-bit variants of algorithms

[Figure: bar chart; vertical axis from 0 to 10000 CLB slices]

Page 26:

Normalization & Compression of Results

•  Absolute result
   e.g., throughput in Mbits/s, area in CLB slices

•  Normalized result

   $$\text{normalized\_result} = \frac{\text{result\_for\_SHA-3\_candidate}}{\text{result\_for\_SHA-2}}$$

•  Overall normalized result
   Geometric mean of normalized results for all investigated FPGA families
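
The normalization and the geometric mean can be sketched in a few lines of Python. This is an illustration only: the function names and the per-family numbers are hypothetical and are not results from the slides.

```python
import math


def normalized_result(candidate_result, sha2_result):
    """Result for a SHA-3 candidate divided by the result for SHA-2."""
    return candidate_result / sha2_result


def overall_normalized_result(candidate_by_family, sha2_by_family):
    """Geometric mean of the normalized results over all FPGA families.

    Both arguments map a family name to an absolute result
    (for example throughput in Mbit/s or area in CLB slices).
    """
    ratios = [
        normalized_result(candidate_by_family[family], sha2_by_family[family])
        for family in candidate_by_family
    ]
    # Geometric mean computed in the log domain.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical throughputs in Mbit/s for two families.
candidate = {"family_a": 2000.0, "family_b": 4000.0}
sha2 = {"family_a": 1000.0, "family_b": 1000.0}
print(overall_normalized_result(candidate, sha2))
```

A geometric mean suits ratios like these because a result that is twice as good on one family and half as good on another averages to exactly 1, which an arithmetic mean would not give.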

Page 27:

Normalized Throughput & Overall Normalized Throughput

Page 28:

Overall Normalized Throughput: 256-bit variants of algorithms
Normalized to SHA-256, Averaged over 7 FPGA families

[Figure: bar chart; vertical axis from 0 to 8]

Page 29:

Overall Normalized Area: 256-bit variants of algorithms
Normalized to SHA-256, Averaged over 7 FPGA families

[Figure: bar chart; vertical axis from 0 to 30]

Page 30:

Overall Normalized Throughput/Area: 256-bit variants
Normalized to SHA-256, Averaged over 7 FPGA families

[Figure: bar chart; vertical axis from 0 to 2]

Page 31:

Throughput vs. Area
Normalized to Results for SHA-256 and Averaged over 7 FPGA Families – 256-bit variants

[Figure: scatter plot of throughput vs. area with regions marked "best" and "worst"]

Page 32:

Execution Time for Short Messages up to 1000 bits
Virtex 5, 256-bit variants of algorithms

Page 33:

Summary for SHA-3-256

[Table: rows BLAKE, BMW, CubeHash, ECHO, Fugue, Groestl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, Skein; columns Thr/Area, Thr, Area, Short msg.; cell contents not preserved in this transcript]

Page 34:

Summary of Results

•  Throughput/Area & Throughput most crucial for high-speed implementations

•  Area cannot be easily traded for Throughput

Best performers so far:
1-2. Keccak & Luffa
3. Groestl

Worst performers so far:
14. SIMD – throughput to area ratio
13. ECHO – area

Page 35:

More About our Designs & Tools

•  SHA-3 papers FPL 2010 paper
•  Results for 512-bit variants ATHENa features ATHENa intro Case studies

•  Cryptology e-Print Archive - 2010/445 (100+ pages)
   -  Detailed hierarchical block diagrams
   -  Corresponding formulas for execution time and throughput

•  ATHENa web site
   -  Most recent results
   -  Comparisons with results from other groups
   -  Optimum options of tools

Page 36:

Comparison with Other Groups

Page 37:

Comparison with Best Results Reported by Other Groups
Virtex 5, 256-bit variants of algorithms

Area in CLB slices; Thr = throughput in Mbit/s.

                   OTHER GROUPS                                     GMU
Algorithm          Area   Thr     Thr/Area   Source                 Area   Thr     Thr/Area
BLAKE              1660   2676    1.61       Kobayashi et al.       1871   2854    1.53
CubeHash           590    2960    5.02       Kobayashi et al.       707    3445    4.87
ECHO               9333   14860   1.59       Lu et al.              5445   13875   2.55
Groestl            1722   10276   5.97       Gauravaram et al.      1884   8677    4.61
Hamsi              718    1680    2.34       Kobayashi et al.       946    2646    2.80
Keccak             1412   6900    4.89       Bertoni et al.         1229   10807   8.79
Luffa              1048   6343    6.05       Kobayashi et al.       1154   8008    6.94
Shabal             153    2051    13.41      Detrey et al.          1266   2624    2.07
Skein (estimated)  1632   3535    2.17       Tillich                1463   2812    1.92

Page 38:

Best Overall Reported Results as of Aug. 6, 2010
Virtex 5, 256-bit variants of algorithms

BEST REPORTED RESULTS (Area in CLB slices; Thr = throughput in Mbit/s)

Algorithm    Area   Thr     Thr/Area   Source
BLAKE        1660   2676    1.61       Kobayashi et al.
BMW          4400   5577    1.27       GMU
CubeHash     590    2960    5.02       Kobayashi et al.
ECHO         5445   13875   2.55       GMU
Fugue        956    3151    3.30       GMU
Groestl      1722   10276   5.97       Gauravaram et al.
Hamsi        946    2646    2.80       GMU
JH           1108   3955    3.57       GMU
Keccak       1229   10807   8.79       GMU
Luffa        1154   8008    6.94       GMU
Shabal       153    2051    13.41      Detrey et al.
SHAvite-3    1130   2887    2.55       GMU
SIMD         9288   2326    0.25       GMU
Skein        1632   3535    2.17       Tillich et al.
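
The Thr/Area column of the best-reported table can be recomputed and ranked with a short Python sketch. The area and throughput values are copied from the table; the variable and function names are hypothetical, and the ranking is only a reading aid, not a procedure described in the slides.

```python
# (area in CLB slices, throughput in Mbit/s), copied from the table of
# best reported results for Virtex 5, 256-bit variants.
BEST_REPORTED = {
    "BLAKE": (1660, 2676),
    "BMW": (4400, 5577),
    "CubeHash": (590, 2960),
    "ECHO": (5445, 13875),
    "Fugue": (956, 3151),
    "Groestl": (1722, 10276),
    "Hamsi": (946, 2646),
    "JH": (1108, 3955),
    "Keccak": (1229, 10807),
    "Luffa": (1154, 8008),
    "Shabal": (153, 2051),
    "SHAvite-3": (1130, 2887),
    "SIMD": (9288, 2326),
    "Skein": (1632, 3535),
}


def rank_by_throughput_to_area(results):
    """Return (name, ratio) pairs sorted from best to worst Thr/Area."""
    ratios = {name: thr / area for name, (area, thr) in results.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


for name, ratio in rank_by_throughput_to_area(BEST_REPORTED):
    print(f"{name:10s} {ratio:6.2f}")
```

With the table's values the ordering starts with Shabal, Keccak, and Luffa and ends with SIMD. It covers a single family (Virtex 5) and mixes results from different groups, so it need not match the overall ranking averaged over seven FPGA families.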

Page 39:

Throughput vs. Area: Best reported results
Virtex 5, 256-bit variants of algorithms

[Figure: scatter plot of throughput vs. area with regions marked "best" and "worst"]

Page 40:

Invitation to Use ATHENa
http://cryptography.gmu.edu/athena

[Figure: numbered workflow among Designer, User, and ATHENa Server. Labels: Interfaces + Testbenches; HDL + scripts + configuration files; FPGA Synthesis and Implementation; Result Summary + Database Entries; Database Entries; ATHENa scripts and configuration files; HDL + FPGA Tools; Database query; Ranking of designs]

Page 41:

Future Work

Page 42:

Analysis of Alternative Architectures - Unrolled

[Figure: block diagrams labeled "r times" and "r/2 times"]

Page 43:

Analysis of Alternative Architectures - Folded

[Figure: block diagrams labeled "r times", "2⋅r times", and "2⋅r times"]

Page 44:

Analysis of Alternative Architectures
CubeHash, Groestl, Keccak, Luffa in Virtex 5

[Figure: plot of Normalized Throughput (vertical axis, 0 to 8) vs. Normalized Area (horizontal axis, 0 to 7) for CubeHash, Groestl, Luffa, and Keccak; point labels include x1, x2, x4, fv2, fv3, and fv4]

Page 45:

Conclusions

Page 46:

Conclusions

•  Fair and comprehensive methodology for evaluation of hardware performance in FPGAs developed and applied to the evaluation of 14 Round 2 SHA-3 candidates

•  Large differences among competing algorithms in terms of performance in FPGAs

•  Three front-runners: Keccak, Luffa, Groestl
   Two candidates trailing behind: SIMD, ECHO

Page 47:

Questions?

Thank you!

CERG: http://cryptography.gmu.edu

ATHENa: http://cryptography.gmu.edu/athena