athena and deliverables
DESCRIPTION
ATHENa and DeliverablesTRANSCRIPT
![Page 1: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/1.jpg)
Comprehensive environment for benchmarking using FPGAs:
ATHENa - Automated Tool for Hardware EvaluatioN
1
![Page 2: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/2.jpg)
Modern Benchmarking: Natural Progression of Tools
2
Software ASICsFPGAs
eBACS
D. Bernstein,T. Lange
? ?
![Page 3: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/3.jpg)
ATHENa – Automated Tool for Hardware EvaluatioN
3
Set of scripts written in Perl aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms
Currently under development at George Mason University.
Version 0.3.1
http://cryptography.gmu.edu/athena
![Page 4: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/4.jpg)
Why Athena?
4
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.”
from "Athena, Greek Goddess of Wisdom and Craftsmanship"
![Page 5: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/5.jpg)
Designers of ATHENa
Venkata“Vinny”MS CpEstudent
Ekawat“Ice”
MS CpEstudent
Marcin
PhD ECEstudent
Rajesh
PhD ECEstudent
Xin
PhD ECEstudent
MichalPhD exchangestudent from
Slovakia
![Page 6: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/6.jpg)
ATHENaServer
FPGA Synthesis and Implementation
Result Summary+ Database Entries
2 3
HDL + scripts + configuration files
1
Database Entries
Download scripts andconfiguration files8
Designer
4
HDL + FPGA Tools
User
Databasequery
Ranking of designs
5
6
Basic Dataflow of ATHENa
0
Interfaces+ Testbenches 6
![Page 7: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/7.jpg)
7
synthesizable source files
synthesizable source files
configuration files
configuration files testbenchtestbench
constraint files
constraint files
result summary (user-friendly)
result summary (user-friendly)
database entries
(machine- friendly)
database entries
(machine- friendly)
![Page 8: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/8.jpg)
synthesizable source files
synthesizable source files
configuration files
configuration files
result summary (user-friendly)
result summary (user-friendly)
![Page 9: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/9.jpg)
ATHENa Major Features (1)• synthesis, implementation, and timing analysis in the batch mode
• support for devices and tools of multiple FPGA vendors:
• generation of results for multiple families of FPGAs of a given vendor
• automated choice of a best-matching device within a given family
9
![Page 10: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/10.jpg)
ATHENa Major Features (2)• automated verification of the design through simulation in the batch
mode
• exhaustive search for optimum options of tools
• heuristic adaptive optimization strategies aimed at maximizing selected performance measures (e.g., speed, area, speed/area ratio, power, cost, etc.)
OR
10
![Page 11: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/11.jpg)
ATHENa Major Features (2)• automated verification of the design through simulation in the batch
mode
• exhaustive search for optimum options of tools
• heuristic adaptive optimization strategies aimed at maximizing selected performance measures (e.g., speed, area, speed/area ratio, power, cost, etc.)
OR
11
![Page 12: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/12.jpg)
12
Multi-Pass Place-and-Route AnalysisGMU SHA-512, Xilinx Virtex 5
100 runs for different placement starting points
The smaller the better
~ 20%
best worst
Minimum clock 12
![Page 13: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/13.jpg)
13
Dependence of Results on Requested Clock Frequency
![Page 14: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/14.jpg)
ATHENa Applications
• single_run: - one set of options
• placement_search - one set of options - multiple starting points for placement
• exhaustive_search - multiple sets of options - multiple starting points for placement - multiple requested clock frequencies
![Page 15: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/15.jpg)
SHA-1 Results
Throughput [Mbit/s]
Architectures
Virtex 5
Virtex 4
Spartan 3
15
![Page 16: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/16.jpg)
spartan3 virtex4 virtex5 cyclone2 cyclone3 stratix2 stratix3
0
200
400
600
800
1000
1200
1400
1600
1800
2000
sha1
sha256
sha512
FPGA family
Mb
/sATHENA Results for SHA-1, SHA-256 & SHA-512
16
![Page 17: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/17.jpg)
Ideas (1)
17
• Select several representative FPGA platforms with significantly different properties
e.g., different vendor – Xilinx vs. Altera
process - 90 nm vs. 65 nm
LUT size - 4-input vs. 6-input
optimization - low-cost vs. high-performance
• Use ATHENa to characterize all SHA-3 candidates and SHA-2 using these platforms in terms of the target performance metrics (e.g. throughput/area ratio)
![Page 18: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/18.jpg)
Ideas (2)
18
• Calculate ratio
SHA-3 candidate performance vs.
SHA-2 performance (for the same security level)
• Calculate geometrical average over multiple
platforms
![Page 19: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/19.jpg)
Technology Low-cost High-performance
120/150 nm Virtex 2, 2 Pro
90 nm Spartan 3 Virtex 4
65 nm Virtex 5
45 nm Spartan 6
40 nm Virtex 6
Xilinx FPGA Devices
![Page 20: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/20.jpg)
Xilinx FPGA Device Support by Tools
Version Low-cost High-performance
Xilinx ISE 10.1 All up to Virtex 5 All up to Virtex 5
Xilinx WebPACK 11.1 Smallest up to Virtex 5 Smallest up to Virtex 5
Xilinx WebPACK 11.3 Smallest up to Virtex 5Smallest Spartan 6,
Virtex 6
Smallest up to Virtex 5Smallest Spartan 6,
Virtex 6
![Page 21: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/21.jpg)
Altera FPGA Devices
Technology Low-cost Mid-range High-performance
130 nm Cyclone Stratix
90 nm Cyclone II Stratix II
65 nm Cyclone III Arria I Stratix III
40 nm Cyclone IV Arria II Stratix IV
![Page 22: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/22.jpg)
Altera FPGA Device Support by Tools
Version Low-cost Mid-range High-performance
Quartus 7.1 Cyclone IV none, Cyclone III all
Arria GX allArria II GX none
Stratix II smallest,Stratix III none
Quartus 8.1 Cyclone IV none, Cyclone III all
Arria GX allArria II GX none
Stratix I, II, IIIsmallest
Quartus 9.0 sp2, Sep. 09
Cyclone IV none, Cyclone III all
Arria GX allArria II GX none
Stratix I, II, IIIsmallest
Quartus 9.1Nov. 09
Cyclone IV smallest,
Cyclone III all
Arria GX allArria II GX smallest
Stratix I, II, IIIall
Stratix IV none
![Page 23: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/23.jpg)
FPGA and ASIC Performance Measures
23
![Page 24: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/24.jpg)
The common ground is vague
• Hardware Performance: cycles per block, cycles per byte, Latency (cycles), Latency (ns), Throughput for long messages, Throughput for short messages, Throughput at 100 KHz, Clock Frequency, Clock Period, Critical Path Delay, Modexp/s, PointMul/s
• Hardware Cost: Slices, Slices Occupied, LUTs, 4-input LUTs, 6-input LUTs, FFs, Gate Equivalent GE, Size on ASIC, DSP Blocks, BRAMS, Number of Cores, CLB, MUL, XOR, NOT, AND
• Hardware efficiency: Hardware performance/Hardware cost
24
![Page 25: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/25.jpg)
25
Our Favorite Hardware Performance Metrics:
Mbit/s for Throughput
ns for Latency
Allows for easy cross-comparison among implementationsin software (microprocessors), FPGAs (various vendors),
ASICs (various libraries)
![Page 26: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/26.jpg)
26
But how to define and measure throughput and latency for hash functions?
Time to hash N blocks of message = Htime(N, TCLK) =
Initialization Time(TCLK) + N * Block Processing Time(TCLK) + Finalization Time(TCLK)
Latency = Time to hash ONE block of message = Htime(1, TCLK) = = Initialization Time + Block Processing Time + Finalization Time
Throughput (for long messages) = Htime(N+1, TCLK) - Htime(N, TCLK)
Block size
= Block size
Block Processing Time (TCLK)
![Page 27: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/27.jpg)
But how to define and measure throughput and latency for hash functions?
Initialization Time(TCLK) = cyclesI T⋅ CLK
Block Processing Time(TCLK) = cyclesP T⋅ CLK
Finalization Time(TCLK) = cyclesF T⋅ CLK
Block size
from specification
from analysis of block diagram
and/or functional simulation
from place & route report
(or experiment)
27
![Page 28: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/28.jpg)
How to compare hardware speed vs. software speed?
EBASH reports (http://bench.cr.yp.to/results-hash.html)
In graphs
Time(n) = Time in clock cycles vs. message size in bytes for n-byte messages, with n=0,1, 2, 3, … 2048, 4096
In tables
Performance in cycles/byte for n=8, 64, 576, 1536, 4096, long msg
Time(4096) – Time(2048)
2048 Performance for long message =
28
![Page 29: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/29.jpg)
How to compare hardware speed vs. software speed?
Throughput [Gbit/s] = Performance for long message [cycles/byte]
8 bits/byte clock frequency [GHz] ⋅
29
![Page 30: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/30.jpg)
30
How to measure hardware cost in FPGAs?
1. Stand-alone cryptographic core on FPGA
2. Part of an FPGA System On-Chip
3. FPGA prototype of an ASIC implementation
Cost of a smallest FPGA that can fit the core. Unit: USD [FPGA vendors would need to publish MSRP (manufacturer’s suggested retail price) of their chips] – not very likelyor size of the chip in mm2 - easy to obtain
Vector: (CLB slices, BRAMs, MULs, DSP units) for Xilinx
(LEs, memory bits, PLLs, MULs, DSP units) for Altera
Force the implementation using only reconfigurable logic(no DSPs or multipliers, distributed memory vs. BRAM): Use CLB slices as a metric. [LEs for Altera]
![Page 31: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/31.jpg)
How to measure hardware cost in ASICs?
1. Stand-alone cryptographic core
2. Part of an ASIC System On-Chip
Cost = f(die area, pin count)
Tables/formulas available from semiconductor foundries
Cost ~ circuit area
Units: μm2
or GE (gate equivalent) = size of a NAND2 cell
31
![Page 32: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/32.jpg)
Deliverables (1)1. Detailed block diagram of the Datapath with names of all signals matching VHDL code [electronic version a bonus]
2. Interface with the division into the Datapath and the Controller [electronic version]
3. ASM charts of the Controller, and a block diagram of connections among FSMs (if more than one used)
[electronic version a bonus]
4. RTL VHDL code of the Datapath, the Controller, and the Top-Level Circuit
5. Updated timing and area analysis formulas for timing confirmed through simulation 32
![Page 33: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/33.jpg)
Deliverables (2)
6. Report on verification
− highest level entity verified for functional correctness• Functional simulation• Post-synthesis simulation• Timing simulation [bonus]
− verification of lower-level entities- Name of entity- Testbench used for verification- Result of verification, incorrect behavior, possible source of
error
33
![Page 34: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/34.jpg)
Deliverables (3)
7. Results of benchmarking using ATHENa
– Entire core or the highest level entity verified for correct functionality– Xilinx Spartan 3, Virtex 4, Virtex 5– Three methods of testing
• Single_run• Placement_search [cost table = 1, 11, 21]• Exhaustive_search
[cost_table = 31, 41, 51; speed or area; two sets of requested frequencies]– Results generated by ATHENa– Your own graphs and charts– Observations and conclusions
34
![Page 35: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/35.jpg)
Bonus Deliverables (4)
8. Pseudo-code [but not a C code]
9. Bugs and suspicious behavior of ATHENa
10. Additional results of benchmarking using ATHENa
– Altera Cyclone II, Stratix II, Cyclone III, Arria I, Stratix III– Three methods of testing
• Single_run• Placement_search [seed = 1, 1000, 2000]• Exhaustive_search
[seed = 3000, 4000, 5000; speed or area; two sets of requested frequencies]– Results generated by ATHENa– Your own graphs and charts– Observations and conclusions
35
![Page 36: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/36.jpg)
Bonus Deliverables (5)
11. Report from the meeting with students working on the same SHA core
– Summary of major differences– Advantages and disadvantages of your design
12. Bugs found in the – Padding script– Testbench– Class examples– Slides– Documentation– SHA-3 Packages– Etc.
36
![Page 37: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/37.jpg)
Bonus Deliverables (6)
13. Extending the design to cover all hash function variants
– Hash value sizes: 512 [highest priority], 384, 224– Other variant/parameter support specific to a given hash function– Support through generics or constants
14. Padding in hardware
Assuming that message size before padding is already a multiple of the– word size– byte size– a single bit
37
![Page 38: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/38.jpg)
38
14 local students
(with 3 former BSCpE graduates)
14 internationalstudents
4 GWU PhD candidates
Composition of Students
![Page 39: ATHENa and Deliverables](https://reader031.vdocuments.us/reader031/viewer/2022020101/545fe159b1af9ffa588b4f68/html5/thumbnails/39.jpg)
After Grading
1. Summary of results published on the course web page
2. Selected students invited to develop articles/reports to be posted on the - ATHENa web page - SHA-3 Zoo Web Page 3. Unification, generalization and optimization of codes by Ice, myself, and other students
4. Presentation to NIST, conference submissions, presentation at the Second SHA-3 Conference in Santa Barbara in August 2010.
39