DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR
A Project
Presented to the faculty of the Department of Electrical and Electronic Engineering
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Electrical and Electronic Engineering
by
Muhammad Haider Pervaiz
SPRING
2016
DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR
A Project
by
Muhammad Haider Pervaiz
Approved by:
__________________________________, Committee Chair
Dr. Behnam Arad
__________________________________, Second Reader
Dr. Ted Krovetz
__________________________
Date
Student: Muhammad Haider Pervaiz
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Graduate Coordinator
Dr. Preetham Kumar
___________________
Date
Department of Electrical and Electronic Engineering
Abstract
of
DESIGN OF A POWER AWARE ENCRYPTION ACCELERATOR
by
Muhammad Haider Pervaiz
Today we live in an age where data security in digital communication has become an
important requirement. The need for privacy and data protection has led major companies to take
action, such as WhatsApp’s recent addition of end-to-end encryption for its more than one billion
users. Messages can be encrypted in software or in hardware; the former is
more flexible but less efficient than the latter. Moreover, hardware solutions are the most advisable for
portable devices [1]. In this project, a hardware accelerator for the AEGIS128L encryption algorithm
is presented for mobile devices. The accelerator was designed with power efficiency as one
of the primary goals, since mobile devices operate on battery supply. Power saving
techniques such as parallel design, clock gating, power gating and multi-threshold voltage cells were
used to achieve this goal. Other important factors considered were speed and area.
The project encompassed a power aware hardware implementation of AEGIS128L: modeling
it in the SystemVerilog hardware description language (HDL), verifying it using a layered test bench,
synthesizing it using a 90nm cell library and finally performing power estimation. For the power
estimation, we used the gate-level netlist generated during synthesis and the switching activity of
the netlist during simulation to get an accurate estimate of power usage. Synopsys Electronic
Design Automation (EDA) tools, namely the VCS simulator, Design Compiler for synthesis, and Power
Compiler, were used in this work.
Power consumption of the proposed design improved considerably throughout the project
phases. The proposed design required 7.6% less power than a non-power aware design in
the normal operating mode. The power saving during the sleep mode was 68%. The design supports
a data rate of 1.6 gigabytes per second.
_______________________, Committee Chair
Dr. Behnam Arad
_______________________
Date
ACKNOWLEDGEMENTS
I begin with thanking Almighty Allah who gave me the strength, courage, ability and
excellent support in the form of family, teachers, friends and colleagues that helped me succeed.
I am thankful to my parents for their endless support and encouragement throughout. I
would like to thank my brother and sister for sharing their thoughts whenever I needed them. I also
would like to thank my uncle for guiding me about the university procedures.
I am grateful to Prof. Arad for giving me an opportunity to work with him and for being an
excellent teacher and mentor. I thank Prof. Krovetz for agreeing to be my second reader and giving
valued input throughout the project. I also thank Prof. Kumar and Prof. Faroughi for guiding me.
I would like to thank my colleagues Tasnif, Hardik, Rashid, Tejas, Dhaval and Zeeshan for
helping me understand the power aware design flows. I would also like to thank my friends who
accompanied me in my leisure time and helped me relax. Special thanks to Muzaffar for being
a good friend and support.
Lastly, I would like to thank CSU, Sacramento, Electrical and Electronic Engineering
Department and Synopsys for providing me the facilities and required tools to complete this work.
TABLE OF CONTENTS
Acknowledgements
List of Tables
List of Figures
Chapter
1. INTRODUCTION
    1.1 Overview
    1.2 Power Aware Design Flow for AEGIS128L
    1.3 Report Structure
2. THE AEGIS128L ENCRYPTION ALGORITHM
    2.1 AEGIS Family
    2.2 AES128 VS AEGIS128L
    2.3 AEGIS128L State Update Function
        2.3.1 AES Round Function
    2.4 AEGIS128L Stages
        2.4.1 Initialization Stage
        2.4.2 Processing the Authenticated Data
        2.4.3 Encryption Stage
        2.4.4 Finalization Stage
    2.5 AEGIS128L Usage Recommendations
    2.6 Decryption
3. ARCHITECTURAL DESIGN OF AEGIS128L
    3.1 Introduction to Hardware Design of AEGIS128L
    3.2 High Level Design of AEGIS128L
        3.2.1 Block Level Design
        3.2.2 Encryption Cycle
    3.3 Parallel Design Over Pipeline
    3.4 Power Domains
    3.5 Clock Gating Map
    3.6 SOC Level Power Saving Awareness
        3.6.1 Power Gating at the SOC Level
        3.6.2 Clock Frequency Division at the SOC Level
4. MODELING OF AEGIS128L IN SYSTEMVERILOG
    4.1 SystemVerilog Features Used
        4.1.1 Package
        4.1.2 Interface
        4.1.3 User Defined Types
        4.1.4 Functions
        4.1.5 Enhanced Blocks
    4.2 ENCRYPTION MODULE
        4.2.1 Overview
        4.2.2 Initialization Sub-Module
        4.2.3 Controller Sub-Module
        4.2.4 Datapath Sub-Module
    4.3 FIFO RAM Module
    4.4 FIFO Controller Module
5. AEGIS128L VERIFICATION
    5.1 Overview
    5.2 AEGIS128L Verification Framework
        5.2.1 Connection with DUT
        5.2.2 Inter-process communication
        5.2.3 Program block
        5.2.4 Validation
        5.2.5 Coverage
    5.3 Layered Testbench
        5.3.1 Class Transactor
        5.3.2 Class Generator
        5.3.3 Class Agent
        5.3.4 Class Driver
        5.3.5 Class Scoreboard
        5.3.6 Class Monitor
        5.3.7 Class Checker
    5.4 Regular Test Bench for SAIF
        5.4.1 Scenarios
        5.4.2 SAIF File Generation
        5.4.3 Gate Level Simulation
    5.5 Multi-Voltage Aware Simulation
6. AEGIS128L SYNTHESIS
    6.1 Synthesis Script
    6.2 UPF Script
    6.3 Checks
    6.4 Clock Rate
7. POWER ESTIMATION AND ANALYSIS
    7.1 Power Estimation
        7.1.1 Power Types
        7.1.2 Calculating Power
        7.1.3 Report_power
        7.1.4 Manual Setting
        7.1.5 Using SAIF
    7.2 Power Analysis
8. CONCLUSION
Appendix A. AEGIS128L HARDWARE MODEL SOURCE FILES
Appendix B. AEGIS128L VERIFICATION ENVIRONMENT
Appendix C. SYNTHESIS, UPF AND PERL SCRIPTS
Appendix D. POWER AND SIMULATION RESULTS
References
LIST OF TABLES
1. Basic Specifications of AEGIS128L
2. Comparison of AEGIS family’s algorithms, based on [4]
3. IO description
4. Clock cycle behavior
5. Total cycles one message takes for encryption
6. Encryption system states
7. Functional Coverage Commands, based on [10]
8. Modes of operation
9. SAIF generation commands [17]
10. Removing timing information from gate-level cell [16]
11. Power Improvements
LIST OF FIGURES
1. Power aware design flow for AEGIS128L, based on [4]
2. AEGIS128L state update function, based on [4]
3. AES round function
4. AEGIS128L stages
5. AEGIS128L Initialization stage summary [4]
6. Black box of AEGIS128L
7. Block level diagram
8. Pipelined design limitations, no buffers shown
9. Parallel architecture, buffers not shown
10. AEGIS128L Power Domains Summarized by Power Compiler
11. Clock gated domains
12. Initialization block
13. Block diagram of datapath module
14. Block diagram of state update function
15. FIFO controller
16. Mealy Finite State Machine for FIFO controller
17. AEGIS128L verification framework, based on [13]
18. Connection of test-bench with DUT, based on [13]
19. Functional Coverage results for AEGIS128L
20. VCS NLP Flow [15]
21. Synthesis Process, based on [18]
22. SAIF file snippet
CHAPTER 1: INTRODUCTION
1.1 Overview:
Fifty years of Moore’s Law have helped the semiconductor industry grow at a phenomenal rate.
In 1971, Intel’s first processor (the Intel 4004) had a transistor count of 2,300. Today, its latest processor,
the Xeon Broadwell-E5, has 7.2 billion transistors [2]. On one hand, such a huge transistor count
provides high computational performance and huge storage capacity; on the other
hand, it presents challenges such as high power consumption and data security. With Moore’s Law
driven devices changing our daily lives through breakthroughs in modern cities, transportation,
healthcare, education [3] and more, it becomes extremely important to protect user data from
falling into the hands of phishing attackers, hackers and other untrusted parties. In addition, it is
worth noting that since 2013 more tablets have been sold than laptops, and well over 4.7 billion
mobile phone users exist worldwide today [5]. Clearly, with the growth of mobile network
devices, providing power efficient security solutions has become equally important.
One of the most effective ways to protect data is encryption. Encryption can be
implemented in software or hardware. A hardware implementation is faster and more secure
than a software implementation. It is also the more suitable option for portable devices [1]
and is thus used in this project. Message protection typically requires protecting both confidentiality
and authenticity. These two requirements can be treated either separately or in an integrated
fashion by an encryption algorithm. The advantage of the latter is that it is more efficient, as it saves
computation cost [4], which makes it an obvious choice for an encryption algorithm to be used in
mobile devices. Furthermore, there are at least three ways in which an integrated encryption algorithm
can be developed: the first is by using a block cipher in a special mode, the second is by using a
stream cipher with the key stream divided into two parts (one for encryption and the other for
authentication), and the third is by designing a dedicated authenticated encryption algorithm [4].
In this project, an integrated algorithm of the third type, AEGIS128L, was used for the
following reasons:
1. It is very fast and can support modern 4G network speed demands. It is 8 times faster
than AES in CBC mode [4].
2. It is suitable for parallel hardware implementation, as parallel AES round functions are
used at each step [4], and so it can be operated at a low clock frequency to save power.
3. It has a lower computation cost and so is a relatively better option than other algorithms for
power aware hardware design.
4. Encryption and decryption share the same algorithm [4] and therefore the same hardware as well.
5. Authentication is achieved almost for free [4].
6. It does not need to encrypt the packet header and so is suitable for network communication
[4].
7. It is a symmetric system and uses a 128-bit key, which offers high security.
8. It is a relatively new algorithm.
9. It provides robust security if the following three conditions are met [4]:
a. The nonce is not reused.
b. A 128-bit authentication tag is used.
c. A forgery attack cannot be made successful by repeating it.
Today network communication is based on either Internet Protocol Version 4 (IPv4) or
Version 6 (IPv6). An IPv6 packet has a fixed 40-byte header and can carry up to 65,535 bytes of
payload (without the jumbo payload option). The implementation of AEGIS128L presented in this project was modeled for mobile
network devices and supports the IPv6 packet size. More details of the algorithm are covered in
the next chapter.
As described above, in hardware design for portable devices an emphasis on
low power is unavoidable. Thus, at every step of this project, decisions were made with
power consumption as the most important factor. From the selection of the algorithm to architectural
design, HDL modeling and synthesis, all the power saving techniques in the front-end stage of the
Application Specific Integrated Circuit (ASIC) design flow were observed. The power saving
methods considered in this project are discussed in detail in [6] and are listed below:
1. Selection of an algorithm that requires fewer transitions
2. Architectural design with parallelism
3. In design modeling:-
a. Controlled counters
b. Gray coded state machines
c. Resource sharing
d. Avoiding glitches
4. Clock gating
5. Frequency division
6. Multi Voltage (considered but not used)
7. High threshold library cells
8. Power gating
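As an illustration of item 3b, the short sketch below compares how many flip-flop bits toggle when a state machine steps through its states in order using plain binary encoding versus Gray encoding. It is not taken from the project's RTL: the state count of eight, the strictly sequential state order and the function names are assumptions chosen for the example.

```python
def to_gray(n):
    """Convert a binary-coded integer to its reflected Gray code."""
    return n ^ (n >> 1)


def total_bit_toggles(codes):
    """Count flipped bits over one full cycle through the state sequence."""
    toggles = 0
    # Pair each code with its successor, wrapping from the last back to the first.
    for current, nxt in zip(codes, codes[1:] + codes[:1]):
        toggles += bin(current ^ nxt).count("1")
    return toggles


NUM_STATES = 8  # assumed state count, for illustration only
binary_codes = list(range(NUM_STATES))
gray_codes = [to_gray(n) for n in binary_codes]

binary_toggles = total_bit_toggles(binary_codes)
gray_toggles = total_bit_toggles(gray_codes)
print("binary encoding toggles:", binary_toggles)
print("gray encoding toggles:  ", gray_toggles)
```

With Gray encoding exactly one state bit changes per transition, so the state register switches less often, which is the motivation for using it in a power aware design.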
1.2 Power Aware Design Flow for AEGIS128L:
Figure 1 shows an overview of the power aware design flow followed in this project. It
was inspired by [4]. The problem statement describes the high-level goal of the project and
defines the specifications to be met. The objective of this project was to design an accelerator for an
encryption algorithm for mobile devices. Since mobile devices are portable, the natural
requirement of power efficiency became a primary specification. Similarly, speed and area
were also critical criteria for portable network devices. Furthermore, the encryption itself had to be
robust. We selected an algorithm to fulfill these requirements.
Figure 1 – Power aware design flow for AEGIS128L, based on [4]
The next step was to design a high-level hardware model for the selected algorithm such that
power consumption would be as low as possible. RTL modeling followed, in which the design
was coded in SystemVerilog to realize the devised architecture for the encryption algorithm in
hardware. The RTL modeling style was kept power aware. Once the power aware RTL source code was
ready, the clock gating enable signals were added to the RTL code. The synthesis tool then inserted
clock gating cells for the clock gating enable signals in the RTL. This type of clock gating is
called fine-grain clock gating.
In the synthesis phase, a power aware technology cell library was provided to the
synthesis tool. The library supported clock gating and multi-threshold voltage (multi-Vt) cells. For
this project, the multi-Vt cell library was provided to the synthesis tool, which chose high-Vt cells
according to the timing constraints of different paths. High-Vt cells have low leakage but are
slower, and so were placed automatically in the non-critical timing paths.
The synthesis tool was also provided with a Unified Power Format (UPF) file. The UPF
file used in this project meets the IEEE 1801-2009 standard specifications. The UPF file describes
the power intent of the design to the synthesis tool. The resulting power gated netlist allows
power to be turned off for portions of the design that are in sleep mode. This saves both the leakage and
dynamic power that would otherwise be spent on idle domains.
Once we had the gate-level netlist, it was important to verify that everything had been synthesized
as intended. For this purpose, either a formal verification tool can be run or a gate-level simulation can
be performed. We opted for the latter, as it also helped in generating the accurate switching activity files
that were used to estimate power.
1.3 Report Structure:
This report is structured as follows:
Chapter 1: This chapter introduces the project and describes its importance and goals. It discusses
the rationale for the selection of the encryption algorithm and the power aware design
methodology.
Chapter 2: This chapter summarizes the AEGIS128L encryption algorithm. It starts by
describing the rationale for choosing AEGIS128L over its earlier versions, AEGIS256 and AEGIS128.
It then gives a basic introduction to the AEGIS128L algorithm and moves on to explain how it
works.
Chapter 3: This chapter describes the architectural design of AEGIS128L. It explains why
a parallel architecture is preferred over a pipelined architecture. It gives a high-level view of the design
of AEGIS128L. All modules, their interconnections and the controls for clock and power gating are
also explained.
Chapter 4: This chapter provides details of the hardware modeling of the algorithm using
SystemVerilog HDL. It gives details of each module used and discusses how power aware design
is incorporated in the RTL code.
Chapter 5: This chapter contains information about the verification done using a program-based
layered test bench and SystemVerilog object oriented programming concepts. Functional
coverage was used to get a sense of how much of the design was exercised. Apart from the functional
verification, this chapter also explains the regular test benches used on the synthesized gate-level netlist
to generate switching activity (SAIF) files for different scenarios, which give the accurate front-end power
estimation discussed later in Chapter 7.
Chapter 6: This chapter explains the synthesis script and the UPF script used in the power
aware synthesis process. Synopsys Design Compiler coupled with Power Compiler was the tool
used for the synthesis process. The regular, clock gating and multi-threshold
voltage cells of a 90nm technology library were used for the synthesis. The power intent was formulated
according to the IEEE 1801-2009 Unified Power Format standard.
Chapter 7: This chapter discusses the use of the SAIF files generated in Chapter 5. It covers
the front-end power estimation methods and presents the analysis of the power results. The step-by-step
improvement in power consumption is shown for different modes of operation: sleep,
normal and overdrive.
Chapter 8: This chapter concludes the report with a summary of the work, the results obtained
and possibilities for future work.
Appendices: Appendix A contains the AEGIS128L model source files. Appendix B
contains the AEGIS128L verification files. Appendix C contains the synthesis, UPF and Perl scripts
used in the project. Appendix D contains the power estimation reports along with other simulation
results.
CHAPTER 2: AEGIS128L ENCRYPTION ALGORITHM
In this chapter, the AEGIS128L encryption algorithm is described. AEGIS128L is a symmetric
cipher designed by Hongjun Wu¹ and Bart Preneel². It is a dedicated authenticated
encryption algorithm, which means that the encryption of the message and the production of the
authentication tag are integrated in the same algorithm. This feature makes AEGIS128L stand
out as an obvious choice for a high-speed hardware implementation, as it requires less computation.
Table 1 shows some basic specifications of the algorithm.
Table 1 – Basic Specifications of AEGIS128L
Parameter                  Value
Key Size                   128 bits
Message Block Size         256 bits
Nonce Size                 128 bits
Authentication Tag Size    128 bits
2.1 AEGIS Family:
The AEGIS family has three versions of the algorithm: AEGIS128L, AEGIS128 and AEGIS256.
From the hardware design perspective, area, power, speed and security are the four main criteria that
were analyzed and led to the selection of AEGIS128L. A brief summary of the analysis is shown
in Table 2.
----------------------------
¹ Division of Mathematical Sciences, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798
² Dept. Elektrotechniek-ESAT/COSIC, KU Leuven and iMinds, Ghent
Table 2 – Comparison of AEGIS family’s algorithms, based on [4]
Criteria    AEGIS128                    AEGIS128L                   AEGIS256
Area        Least                       High                        Middle
Power       Medium                      Best                        Worst
Speed       In middle                   Fastest                     Slowest
Security    Good (key size: 128 bits)   Good (key size: 128 bits)   Most, but more than required (key size: 256 bits)
Leakage power is higher in AEGIS128L because of its higher gate count, but dynamic power is
lower because AEGIS128L can do the same job in the same amount of time as the other two versions
at a lower clock frequency. Moreover, leakage power can be controlled by power gating
when the system is in sleep mode. Overall, AEGIS128L therefore offers better power efficiency.
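The frequency argument above can be made concrete with a small calculation. The sketch below is illustrative only. It assumes one state update per clock cycle, uses the message block sizes from the AEGIS specification [4] (32 bytes per step for AEGIS128L, 16 bytes per step for AEGIS128 and AEGIS256), and borrows the 1.6 gigabytes per second data rate from the abstract as the target. The function name is hypothetical, and the resulting frequencies are not measured values from this project.

```python
def required_clock_hz(throughput_bytes_per_s, bytes_per_cycle):
    """Clock frequency needed to sustain a throughput at a given block size per cycle."""
    return throughput_bytes_per_s / bytes_per_cycle


TARGET_RATE = 1.6e9  # bytes per second, the data rate quoted in the abstract

# Bytes absorbed per state update (assumption: one update per clock cycle).
BLOCK_BYTES = {"AEGIS128": 16, "AEGIS256": 16, "AEGIS128L": 32}

clocks = {
    name: required_clock_hz(TARGET_RATE, size)
    for name, size in BLOCK_BYTES.items()
}

for name, hz in clocks.items():
    print(f"{name}: {hz / 1e6:.0f} MHz")
```

Under these assumptions AEGIS128L needs half the clock frequency of the other two versions for the same throughput, and dynamic power scales roughly linearly with clock frequency.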
2.2 AES128 VS AEGIS128L:
AEGIS128L uses the AES round function block as a primitive. The main difference between
the AES128 and AEGIS128L encryption algorithms is that AEGIS128L provides good security in
fewer rounds per message block because of its overall mathematical scheme. The AES128 encryption
algorithm needs ten rounds per block, while the AEGIS encryption algorithm uses only one per message block
(excluding the initialization and post-encryption rounds per message) [4]. With all the stages
considered, it takes four rounds. Therefore, AEGIS is much faster than AES at the same clock rate
and thus consumes less power.
On the latest Intel Haswell microprocessors, the speed of AEGIS-128L is more than twice
that of AES-GCM [4]. Other features of AEGIS128L were already discussed in Chapter 1.
2.3 AEGIS128L State Update Function:
AEGIS128L uses a 128-byte state. The state stores the randomized data
produced by the algorithm. The state update function updates the 128-byte state S_i with two
16-byte message blocks m_a and m_b to produce S_{i+1} [4]. It consists of eight AES round function blocks
(the regular AES round, not the final round). Figure 2 demonstrates how S_{i+1} (the next state) is generated from S_i (the present state).
In the figure, ‘R’ represents the AES round function and ‘w’ is a temporary 16-byte word.
Figure 2 – AEGIS128L state update function, based on [4]
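The data flow of the state update can be sketched in software as follows. This is a minimal sketch that assumes the wiring described in [4], where each new 128-bit word is the round function applied to the previous word and XORed with the current one, and where the two message blocks enter at words 0 and 4. The parameter round_fn is a hypothetical stand-in for the AES round function R, and the toy function used in the example is not cryptographic.

```python
def state_update_128l(state, m_a, m_b, round_fn):
    """Return S_{i+1} from the eight 128-bit words of S_i and two message words.

    round_fn(a, b) is a caller-supplied stand-in for the AES round function R
    (hypothetical parameter; the real design uses a hardware AES round).
    """
    if len(state) != 8:
        raise ValueError("AEGIS128L state has eight 128-bit words")
    s = state
    return [
        round_fn(s[7], s[0] ^ m_a),   # word 0 absorbs message block m_a
        round_fn(s[0], s[1]),
        round_fn(s[1], s[2]),
        round_fn(s[2], s[3]),
        round_fn(s[3], s[4] ^ m_b),   # word 4 absorbs message block m_b
        round_fn(s[4], s[5]),
        round_fn(s[5], s[6]),
        round_fn(s[6], s[7]),
    ]

# Toy round function used only to show the data flow (NOT cryptographic).
toy_round = lambda a, b: a ^ b
next_state = state_update_128l(list(range(1, 9)), 0x10, 0x20, toy_round)
```

All eight words depend only on the present state, so the eight round functions can be evaluated in the same clock cycle.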
2.3.1 AES Round Function:
The AES round function is a primitive that is used by different encryption algorithms. It is based
on a substitution-permutation network and was designed by Joan Daemen and Vincent Rijmen [7].
Since it is commonly available, we have leveraged its SystemVerilog code from [11]. Figure 3
shows how the AES round function operates on its two 128-bit inputs.
Figure 3 – AES round function
The AES round function for 128 bits consists of four steps. SubBytes is a non-linear
substitution step where each byte is replaced with another according to a lookup table. ShiftRows
is a transposition step where the last three rows of the state are shifted cyclically by a certain number
of positions. MixColumns is a mixing operation that operates on the columns of the state, combining
the four bytes in each column. AddRoundKey performs a bitwise XOR of the result of the
above three steps with the second (key) input of the round function. This summary is taken from [7].
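The four steps can be written as a short software reference model. This is an independent sketch for illustration, not the SystemVerilog code from [11]: the S-box is computed from its mathematical definition instead of being stored as a table, and bytes are assumed to be in the column-major order used by the AES standard.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def sbox(x):
    """AES S-box entry: multiplicative inverse followed by the affine map."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    out = 0x63
    for shift in range(5):            # XOR of five byte rotations of inv
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out

SBOX = [sbox(x) for x in range(256)]

def aes_round(block, key):
    """One regular AES round on 16-byte inputs (column-major byte order)."""
    sub = [SBOX[b] for b in block]                        # SubBytes
    shifted = [sub[4 * ((c + r) % 4) + r]                 # ShiftRows
               for c in range(4) for r in range(4)]
    mixed = []
    for c in range(4):                                    # MixColumns
        col = shifted[4 * c:4 * c + 4]
        for r in range(4):
            mixed.append(gf_mul(col[r], 2) ^ gf_mul(col[(r + 1) % 4], 3)
                         ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return bytes(m ^ k for m, k in zip(mixed, key))       # AddRoundKey

# Round 1 of the cipher example in the AES standard (FIPS-197, Appendix B).
state_in = bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")
round_key = bytes.fromhex("a0fafe1788542cb123a339392a6c7605")
state_out = aes_round(state_in, round_key)
```

A model of this kind is mainly useful for generating expected values against which a hardware round function can be checked.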
2.4 AEGIS128L Stages:
AEGIS128L has four stages, which are depicted in Figure 4. Let us now look at how
each of these stages functions.
Figure 4 – AEGIS128L stages
2.4.1 Initialization Stage:
In this stage, for a new message, the key and nonce are loaded into the state and the cipher runs for
10 steps using the key and nonce as input messages. Figure 5, which is based on [4], shows a summary
of this stage.
Figure 5 – AEGIS128L Initialization stage summary [4]
The abbreviations used in Figure 5 are described below:
K128: 128-bit key
IV128: 128-bit nonce
Const0: First 16 bytes of the constant
Const1: Last 16 bytes of the constant
Constant: A 32-byte constant, represented in hexadecimal as follows:
Constant =
000101020305080d1522375990e97962_db3d18556dc22ff12011314273b528dd
Operation: The operation used in Figure 5 is bitwise XOR.
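The 32 bytes of the constant coincide with the Fibonacci sequence reduced modulo 256, which gives a simple way to regenerate and check the value. The helper name below is our own and is used only for illustration.

```python
def fibonacci_bytes(count):
    """Return `count` Fibonacci numbers reduced modulo 256, as bytes."""
    a, b = 0, 1
    out = bytearray()
    for _ in range(count):
        out.append(a)
        a, b = b, (a + b) & 0xFF
    return bytes(out)

constant = fibonacci_bytes(32)
const0, const1 = constant[:16], constant[16:]   # first and last 16 bytes
```

Here const0 and const1 correspond to Const0 and Const1 in the list of abbreviations.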
2.4.2 Processing the Authenticated Data:
Once initialization is done, the associated data updates the state. The following equation is applied
‘x’ times, where ‘x’ is the length of the associated data in bits divided by 256 [4].
$S_{i+1} = \mathrm{Stateupdate128L}(S_i, \mathrm{Associated\_data}_i, \mathrm{Associated\_data}_{i+1})$ [4]
It is important to note here that the associated data is not encrypted; it is only used to update
the state. Associated data can be sent without undergoing encryption.
2.4.3 Encryption Stage:
Once initialization for a message is completed, the ciphertext is generated in one step for
each block of that message. The message block is mixed with the state using the state update function
as follows:
$C_i = P_i \oplus S_{i,1} \oplus S_{i,6} \oplus (S_{i,2} \,\&\, S_{i,3})$ [4]
$C_{i+1} = P_{i+1} \oplus S_{i,2} \oplus S_{i,5} \oplus (S_{i,6} \,\&\, S_{i,7})$ [4]
$S_{i+1} = \mathrm{Stateupdate128L}(S_i, P_i, P_{i+1})$ [4]
where $P_i$ and $P_{i+1}$ are 128-bit message blocks, $C_i$ and $C_{i+1}$ are the respective ciphertext blocks,
and $S_{i,j}$ is the j-th 128-bit word of the state $S_i$.
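The two ciphertext equations can be sketched in software as follows, reading the operators between the terms as bitwise XOR, as in [4]. The state is modelled as a list of eight 128-bit integers; the example state values are arbitrary and are chosen only to exercise the equations.

```python
MASK128 = (1 << 128) - 1

def encrypt_block_pair(state, p0, p1):
    """Apply the two ciphertext equations to one pair of 128-bit blocks.

    `state` is a list of eight 128-bit integers (S_{i,0} .. S_{i,7}).
    """
    c0 = p0 ^ state[1] ^ state[6] ^ (state[2] & state[3])
    c1 = p1 ^ state[2] ^ state[5] ^ (state[6] & state[7])
    return c0 & MASK128, c1 & MASK128

def decrypt_block_pair(state, c0, c1):
    """XOR is its own inverse, so the same keystream words recover P."""
    return encrypt_block_pair(state, c0, c1)

# Arbitrary example values, chosen only to exercise the equations.
example_state = [(0x0123456789ABCDEF * (j + 1)) & MASK128 for j in range(8)]
c0, c1 = encrypt_block_pair(example_state, 0xAAAA, 0x5555)
```

Because the plaintext enters only through XOR, applying the same operation to the ciphertext with the same state returns the plaintext.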
2.4.4 Finalization Stage:
In the finalization stage, seven steps are used to generate the authentication tag. The following
equations from [4] summarize this stage.
$tmp = S_i \oplus (adlen \,\|\, msglen)$, where adlen is the 64-bit associated data length, msglen is the 64-bit message length and $\|$ denotes concatenation
For i = 0 to 6, the state is updated as follows:
$S_{i+1} = \mathrm{Stateupdate128L}(S_i, tmp, tmp)$
Once the seven steps are completed, the authentication tag is generated using the following equation:
$Tag = \bigoplus_{i=0}^{6} S_{6,i}$ (bitwise XOR of the 128-bit state words indexed 0 through 6)
2.5 AEGIS128L Usage Recommendations:
For robust security, AEGIS128L must be used such that the nonce is never re-used.
Moreover, a 128-bit authentication tag should be used. It is also worth mentioning here that
AEGIS128L can support up to 2^64 message blocks. However, since IPv6 requires only 2^16
octets of payload, the implementation in this project only supports up to 2^16 bytes of data payload
per message. For a fast data rate, it is recommended that the System on Chip (SOC) only
send messages of 64 KB. More detail on this is covered in Chapter 4.
2.6 Decryption:
In this project, we did not implement the decryption process for AEGIS128L. In order to
perform the decryption of AEGIS128L, the exact values of the key size, the nonce size and the tag size
should be known [4]. The process of decryption is similar to encryption. First, the
initialization and associated data stages are applied. Then the state update function is
used to generate the plaintext from the ciphertext. Finally, the finalization stage is applied.
Therefore, the same hardware that encrypts the plaintext can be used for decrypting the ciphertext.
CHAPTER 3: ARCHITECTURAL DESIGN OF AEGIS128L
In this chapter, a hardware accelerator for the AEGIS128L encryption algorithm is proposed.
The accelerator is designed to support the speed and the power efficiency required by today's
mobile network devices.
3.1 Introduction to Hardware Design of AEGIS128L:
The black box diagram of AEGIS128L is shown in Figure 6. The description of each port
can be found in Table 3. The two 128-bit input ports are re-used for different data at different stages of
the encryption process cycle. This re-use saves more than 255 extra pins without affecting the system’s
performance. The input pin ‘start’ begins the process of encryption and can only be asserted when
the system’s output pin ‘ready’ is asserted. More detail on this is provided in later sections.
Figure 6 – Black box of AEGIS128L
Table 3 – IO description
Port Name      Input or Output       Description
start          Input                 This signal starts the encryption process. It should only become
                                     active high when the ready output is active high. It stays on until
                                     all of the message blocks are received and then turns off.
FINon          Input                 This signal indicates that the message has ended and that the
                                     finalization stage should start now.
inbus128_1     Input, 128-bit bus    This bus carries the KEY in the first two cycles after the start.
                                     It then carries associated data for two cycles. After that it
                                     carries message blocks, and in the finalization stage it is zero.
inbus128_2     Input, 128-bit bus    This bus carries the NONCE in the first two cycles after the start.
                                     It then carries associated data for two cycles. After that it
                                     carries message blocks, and in the finalization stage it is zero.
Rst            Input                 This is the asynchronous reset input of the system.
Clk            Input                 This is the clock input of the system. The outer network layer of
                                     the SOC drives this input, so different clock rates can be used
                                     depending on whether regular or overdrive mode drives the clock.
Full           Output                This output indicates that the FIFO is full and cannot store any
                                     more data.
ready          Output                This output tells the outer system whether the system can take
                                     another message to encrypt.
tagout         Output                This output indicates that the authentication tag is present on
                                     the data output port.
done           Output                This output indicates that the encryption of the message is
                                     completed.
dataout128_1   Output, 128-bit bus   This output bus gives the encrypted data and the authentication
                                     tag.
dataout128_2   Output, 128-bit bus   This output bus gives the encrypted data.
3.2 High Level Design of AEGIS128L:
The AEGIS128L hardware design consists of three main blocks: the FIFO, the FIFO controller
and the encryption block. FIFO stands for first in, first out. The encryption block is further
divided into three sub-blocks, which are discussed in the next chapter.
3.2.1 Block Level Design:
In the AEGIS128L hardware design, the FIFO controller manages the flow of data into
the system. It generates the control signals for both the FIFO_RAM and the encryption block.
Some of the clock and power gating control signals are also generated by the FIFO controller.
The details of the high-level architecture can be found in the next section. Figure 7 shows the block
level diagram for AEGIS128L.
Figure 7 – Block level diagram
3.2.2 Encryption Cycle:
Let us now look at how the encryption cycle works in this implementation. For that
purpose, we use a clock cycle table that captures the highlights of the design's behavior
in each clock cycle. For message sizes from 256 to 524288 bits, the clock
cycle behavior of the design can be found in Table 4.
Table 4 – Clock cycle behavior
Cycle                   Input               Output     Behavior of the design
0                       Start=0             Ready=1    System is ready for encryption of a new message
1-2                     Start=1             -          System starts; Key and Nonce enter the system
3-4                     -                   -          Associated data enter the system
5-x                     -                   -          Message blocks enter the system; ‘x’ depends on
                                                       message size
x+1                     FINon=1             -          Message ended; begin finalization step
x+2                     Start=0, FINon=0    -          Cycle ends
x+3 until ready=1       -                   -          Next message cannot arrive until ready becomes one
When ready=1            -                   Ready=1    Next message can come now; process repeats
3.3 Parallel Design over Pipeline:
The AEGIS128L algorithm is designed in such a way that it lends itself to parallel data processing
more than to structural or pipelined data processing. AEGIS128L uses eight parallel AES round
function blocks to update the state, which makes pipelining very costly. Figure 8 illustrates
the problem. Note that the buffers for the pipeline are not shown in the diagram.
As the figure shows, adding three pipeline stages does not
increase the throughput because a non-linear pipeline is formed. A non-linear pipeline has a
feedback path, and in this case a new output can only arrive every third clock cycle. This is because
the present output is required to generate the next output. Hence, this pipelined design only increases
the clock rate; it does not improve the execution time or the throughput. A concept similar to a
carry-lookahead adder could be used here to speed up the process, but that would take a lot of logic.
For a mobile device especially, silicon real estate is a big concern.
A more efficient solution is a parallel architecture, as it can improve the execution time
without the need to increase the clock rate. Thus, this kind of architecture should be more power
efficient, able to meet the speed requirement, and moderate in its use of real estate. Figure 9 shows the
parallel architecture concept.
Figure 8 – Pipelined design limitations, no buffers shown
Figure 9 – Parallel architecture, buffers not shown
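The throughput argument against pipelining can be checked with a small calculation. The sketch assumes an ideal three-stage pipeline whose clock is exactly three times faster than the single-cycle design; the base clock frequency is an arbitrary illustrative value.

```python
def throughput_blocks_per_sec(clock_hz, cycles_between_outputs):
    """Blocks produced per second when one block completes every N cycles."""
    return clock_hz / cycles_between_outputs

BASE_CLOCK = 50e6   # assumed single-cycle clock (illustrative value)

# Single-cycle state update: one new state every cycle.
single_cycle = throughput_blocks_per_sec(BASE_CLOCK, 1)

# Ideal 3-stage pipeline: the clock can be up to 3x faster, but the feedback
# path means a new state is only available every third cycle.
pipelined = throughput_blocks_per_sec(3 * BASE_CLOCK, 3)
```

Even in this ideal case the two throughputs are equal, so the pipeline registers and the faster clock buy nothing for a single message stream.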
3.4 Power Domains:
Power gating turns off the power supply to the unused portions of the system, reducing
both leakage and dynamic power. It lowers the overall power dissipation considerably.
CMOS switches are used to turn off the power when the power gating enable signal arrives. This
project uses three power domains: an always-on domain, a domain that turns the initialization block
on and off, and a domain that turns the whole encryption block on and off. It is important to note that
the whole design can be powered off by the System on Chip (SOC), but that is outside the scope of this
project. Figure 10 shows the power domains for AEGIS128L. PTOP is a wrapper and is added to take care
of the glue logic in the form of buffers that can be added in the backend stage of the ASIC design
process. The always-on domain (PAon) covers the instances z1 and z2 (FIFO and FIFO controller). The
power domain for turning the whole encryption block on or off is labeled POF1, while POF2 is the label
of the power domain that turns the initialization block on or off. The initialization block is used only
in the first clock cycle of every message encryption cycle.
Figure 10 – AEGIS128L Power Domains Summarized by Power Compiler
3.5 Clock Gating Map:
Clock gating is one of the most effective ways to control wasted dynamic power. It
uses a control signal to gate the clock to the registers when no new data has to be captured. This
control signal has to be coded in the RTL and later in the synthesis script. The clock gating option
should also be enabled in the synthesis tool.
In this implementation, two blocks from Figure 7 contain most of the registers; hence, we
enabled clock gating for them to save dynamic power. The amount of power saved is significant
and is discussed in Chapter 7. In Figure 11, we have highlighted the two clock-gated blocks and their
clock gating enable signals.
Figure 11 – Clock gated domains
3.6 SOC Level Power Saving Awareness:
In this project, an encryption block was designed for the SOC used in mobile devices.
Although SOC level discussion is outside the scope of this project, it should be noted that the SOC
must be able to save power on this encryption block by the following two methods.
3.6.1 Power Gating at the SOC Level:
The SOC must be able to power gate the whole encryption block when it knows that
encryption is not needed for a relatively long period. This saves power by putting the block into
deep sleep mode.
3.6.2 Clock Frequency Division at the SOC Level:
Another way that the SOC can save power on the encryption block is by using clock frequency
division. In such an approach, the SOC can run the encryption block at a slower frequency when
the lower speed can be tolerated. The SOC can choose to run encryption at the maximum possible speed
whenever required.
CHAPTER 4: MODELING OF AEGIS128L IN SYSTEMVERILOG
4.1 SystemVerilog Features Used:
SystemVerilog is a combined hardware description language and hardware verification
language based on extensions to Verilog [8]. The enhanced features in the language help make
RTL modeling of the design very systematic and efficient. The features that were used in the design
phase of this project are briefly described below.
4.1.1 Package:
Packages provide a means to keep common code shared by different modules in one place
[9]. We have used a package to share user-defined data types and synthesizable functions. We used
conditional compilation so that the package is imported only the first time it is encountered [10].
4.1.2 Interface:
An interface is used to connect the top RTL module with the layered test bench. This gives
us the ability to modify any IO in one place instead of going into two different files, which is
error prone.
4.1.3 User defined types:
User-defined data types help in creating meaningful data types that suit the context of the
signal. In this project, we have used user-defined types to define 128-bit and 1024-bit buses in a
way that lets us access each byte separately as well. Multi-dimensional arrays were also utilized
in the RTL.
4.1.4 Functions:
SystemVerilog enhances Verilog functions considerably. These enhancements can be found in
[10]. We have defined functions inside the package and call them from the RTL where needed. It is
important to mention here that functions should be declared automatic for synthesizable RTL.
4.1.5 Enhanced blocks:
Verilog had only the ‘always’ block, which was used for both combinatorial and sequential code.
The differentiation between combinatorial, latch-based and flip-flop-based design was made by
how the always block was used, which was error prone. SystemVerilog introduced dedicated
blocks: ‘always_comb’ for combinatorial, ‘always_ff’ for flip-flop-based and ‘always_latch’
for latch-based designs. The RTL code in this project utilizes this feature.
4.2 Encryption Module:
4.2.1 Overview:
The encryption module consists of three sub-modules shown in Figure 7. This module takes
a 256-bit data block (via two 128-bit input ports) in each cycle and, depending upon the control
signals, performs initialization for the new message, takes in associated data to further update the
internal state, encrypts the data and produces the authentication tag. The total number of
clock cycles that the encryption module takes to complete the encryption of the input packet (header
and payload) is calculated below.
Table 5 – Total cycles one message takes for encryption
Stage                                        Total cycles
Initialization                               10
Header/associated data processing            2
Message encryption                           x = (size of payload in bits, y)/256
Finalization                                 7
Delay before new message                     1
Total for a y-bit message                    20 + x
As the length of the message increases, the overall data rate increases, because the fixed overhead
of 20 cycles is spread over more message blocks. This implementation of AEGIS128L
supports message lengths from 256 bits to 524288 bits. This factor limits
the data rate. To explain this, let us assume that AEGIS128L can run with a 20 ns clock period and then
find the slowest and fastest data rates possible.
For the worst case, 256 bits take 21 cycles to be encrypted, which gives a data rate of 76 MB/s.
The calculations are shown below.
Total cycles = 21, Time period = 20 ns, Execution time = Total cycles x Time period = 420 ns
Total bits = 256, Total bytes = 256/8 = 32
Data rate = Total bytes of data / Total execution time = 32/(420 x 10^-9) = 76 MB/s
For the best case, 524288 bits take 2068 cycles to be encrypted, which gives a data rate of 1.58
GB/s. The calculations are shown below.
Total cycles = 2068, Time period = 20 ns, Execution time = Total cycles x Time period = 41360 ns
Total bits = 524288, Total bytes = 524288/8 = 65536
Data rate = Total bytes of data / Total execution time = 65536/(41360 x 10^-9) = 1.58 GB/s
Therefore, the most efficient way for the SOC to utilize the encryption block is to
send only 64 KB (65536-byte) messages. This way the encryption block achieves its highest data rate.
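The cycle count and data rate figures can be reproduced with a few lines of code. The sketch uses the cycle breakdown of Table 5 and the 20 ns clock period assumed in the text; the function names are our own, and payloads are assumed to be a multiple of 256 bits.

```python
CLOCK_PERIOD_NS = 20     # assumed clock period from the text
OVERHEAD_CYCLES = 20     # 10 init + 2 associated data + 7 finalization + 1 gap

def total_cycles(payload_bits):
    """Cycles per message: fixed overhead plus one cycle per 256-bit block."""
    if payload_bits % 256:
        raise ValueError("payload is assumed to be a multiple of 256 bits")
    return OVERHEAD_CYCLES + payload_bits // 256

def data_rate_bytes_per_sec(payload_bits):
    """Payload bytes divided by the total execution time in seconds."""
    exec_time_s = total_cycles(payload_bits) * CLOCK_PERIOD_NS * 1e-9
    return (payload_bits // 8) / exec_time_s

worst = data_rate_bytes_per_sec(256)       # about 76 MB/s
best = data_rate_bytes_per_sec(524288)     # about 1.58 GB/s
```

The same functions can be used to see how quickly the data rate approaches its upper limit of 32 bytes per clock cycle (1.6 GB/s at 20 ns) as the message grows.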
4.2.2 Initialization sub-module:
This module represents combinatorial logic that is only used in the first cycle of each
message encryption. This means that for the remaining cycles (up to 2067), power would be dissipated
in the form of leakage power and dynamic power. Dynamic power is dissipated because the signals at the
input ports still go through transitions. This is why we placed this logic in a power-gated
domain.
This block takes the Key and the Nonce to initialize the internal 1024-bit state using the
constant described in Chapter 2. This step is very important and is one of the key steps for the
security of AEGIS128L. The Nonce can be public but should always be a new value; the network
layer (which drives the encryption block and is under SOC control) should not repeat a Nonce, as that
could potentially weaken the security of the system. Figure 12 shows the block diagram of this module.
Figure 12 – Initialization block
From a power-aware modeling perspective, this module did not offer much. Initially, a
portion of this module was designed as sequential logic to save dynamic power by clock gating.
When it was identified as a power-gated domain, it was converted to purely combinatorial logic to
reduce the area. The source code for the initialization module can be found in Appendix A.
4.2.3 Controller sub-module:
This module generates two multiplexer control signals for the datapath module and the power gate
enable signal for the initialization block, counts the length of the message, generates a factor that
is used as an input by the datapath block for calculating the authentication tag, and also generates
the two-bit system state outputs. The ‘Tagout’ output indicates that the authentication tag can now be
captured, and the ‘done’ output indicates that the system has finished the encryption. This block
uses four control inputs, namely start, ADon, MSGon and FINon. These control signals identify
which of the four states the system is in. Table 6 summarizes these states.
Table 6 – Encryption system states
start   ADon   MSGon   FINon   SYSTEM STATE
0       0      0       0       System is not used. A long wait can lead to power gate mode.
1       0      0       0       System starts, initialization mode
1       1      0       0       System moves to associated data processing mode
1       1      1       0       System is now encrypting the message
1       1      1       1       System is now generating the authentication tag
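The decoding of the four control inputs in Table 6 can be sketched as a simple lookup. The state names and the "undefined" result for combinations that Table 6 does not list are our own labels for illustration; the RTL may treat such combinations differently.

```python
def system_state(start, adon, msgon, finon):
    """Map the four control inputs to the system state of Table 6."""
    table = {
        (0, 0, 0, 0): "idle",
        (1, 0, 0, 0): "initialization",
        (1, 1, 0, 0): "associated data",
        (1, 1, 1, 0): "encryption",
        (1, 1, 1, 1): "finalization",
    }
    # Any other combination is not listed in Table 6.
    return table.get((start, adon, msgon, finon), "undefined")

sequence = [system_state(*bits) for bits in
            [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]]
```

Note that each step through the table asserts exactly one additional control input, so only one control signal changes per stage transition.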
In the RTL code, at least two power-aware coding features were used in this module. The
first was the use of controlled counters instead of free-running counters. Controlled counters
help reduce switching activity and so minimize the dynamic power. The second feature
was the removal of extra flip-flops. As discussed earlier, AEGIS128L supports messages of up to 2^64
bits, but we limit the length to 2^19 bits for practical usage. Therefore, we did
not use the extra flip-flops and instead tied those bits to logic zero. Moreover, clock gating could be
applied to the control signals to save power, but we avoided it to prevent timing issues, as the
control signals enable multiplexers and power domains.
4.2.4 Datapath sub-module:
This module is the heart of the system. It contains the datapath that includes the state update
function. Moreover, it also has multiplexers that make sure that the right inputs and the right outputs
are used. The state update block is followed by a 1024-bit register bank, which is clock gated and
saves a lot of dynamic power. Figure 13 shows the block diagram of this module.
Figure 13 – Block diagram of datapath module
Note that the buffer at the output of the state update block is clock gated. The state update
function was discussed in Chapter 2; here we present the block diagram of its hardware
implementation. It uses eight AES round function blocks. It introduces randomness into the messages
and needs just one cycle per message block for encryption. This feature makes AEGIS128L
stand out from other encryption algorithms.
Figure 14 – Block diagram of state update function
4.3 FIFO RAM Module:
This module models a simple 16 by 257 bit RAM. The replacement policy of the RAM
is FIFO. As mentioned earlier, the encryption module needs the Key and Nonce inputs for the first ten
cycles, but the system input only delivers them for the first two cycles. Therefore, we store both of
them in the RAM to meet the encryption module’s requirements. In addition, while the encryption module
is busy with the initialization stage during those ten cycles, the system continues to store the
associated data and the plaintext message in the RAM.
The rationale behind the size of the RAM is that, for any length of message, 16 entries should
be enough. This is because the incoming data can start replacing the previous FIFO entries on every
17th cycle without causing any data conflicts. For any new incoming message, the existing
message’s encryption has to be completed first. Therefore, the flow of the data is as follows:
i. First intake of the data block, encryption starts in parallel.
ii. Data is output in parallel as well.
iii. New data can only be taken in when the last data has finished processing and the
authentication tag has been generated.
It should also be noted that when there is no write in progress, the FIFO RAM is clock gated,
saving a lot of dynamic power. We can still read the RAM when it is clock gated.
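The wrap-around behaviour behind the 16-entry sizing can be illustrated with a small behavioural model. This is a generic software sketch, not the RTL of this project: the class name is our own, the 257-bit entry width is not modelled, and the full and empty flags are derived from a simple entry counter.

```python
class Fifo:
    """Behavioural model of a small FIFO with read and write pointers."""

    def __init__(self, depth=16):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0      # next slot to write
        self.rd_ptr = 0      # next slot to read
        self.count = 0       # number of valid entries

    @property
    def full(self):
        return self.count == self.depth

    @property
    def empty(self):
        return self.count == 0

    def write(self, word):
        if self.full:
            raise OverflowError("write while full")
        self.mem[self.wr_ptr] = word
        self.wr_ptr = (self.wr_ptr + 1) % self.depth   # wrap around
        self.count += 1

    def read(self):
        if self.empty:
            raise IndexError("read while empty")
        word = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return word

fifo = Fifo()
for i in range(16):
    fifo.write(i)
first = fifo.read()      # oldest entry comes out first
fifo.write(16)           # the 17th write reuses slot 0
```

The 17th write lands in the slot freed by the first read, which mirrors the statement that incoming data can start replacing previous entries on every 17th cycle.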
4.4 FIFO Controller Module:
This module is the main controller unit of the project. Conceptually, it has four main parts:
the power-gating control unit, the clock-gating control unit, the FIFO control unit and the
encryption control unit. The power-gating control unit is responsible for turning the power
supply of the encryption block on or off. The clock-gating control unit generates two clock-gating
enable signals, which are used by the FIFO and encryption blocks. The FIFO control unit controls the
flow of the FIFO RAM. It generates signals such as write enable, read enable, read pointer, write
pointer, full and empty, which support the FIFO RAM operation. The encryption control unit is
responsible for managing the four stages of the encryption block. It provides four control signals to
the encryption block that identify which mode the encryption block should be running in. Figure 15
shows the block level diagram for this module.
There is a finite state machine (FSM) in this module, which was purposely coded in Gray code
style. This style helps save dynamic power because fewer bits toggle on each state
change. Figure 16 shows the FSM and its related code.
Figure 15 - FIFO controller
Figure 16 - Mealy Finite State Machine for FIFO controller
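The benefit of Gray-coded states can be illustrated by counting how many state bits toggle when an FSM steps through its states in order. The four-state loop below is an assumed example for illustration; it is not the state diagram of Figure 16.

```python
def binary_to_gray(n):
    """Convert a binary index to its reflected Gray code."""
    return n ^ (n >> 1)

def bit_flips(a, b):
    """Number of bit positions that differ between two codes."""
    return bin(a ^ b).count("1")

NUM_STATES = 4   # assumed state count, for illustration only
binary_codes = list(range(NUM_STATES))
gray_codes = [binary_to_gray(i) for i in binary_codes]

def flips_around_loop(codes):
    """Total flip-flop toggles for one pass through the states in order."""
    return sum(bit_flips(codes[i], codes[(i + 1) % len(codes)])
               for i in range(len(codes)))

binary_toggles = flips_around_loop(binary_codes)   # 00 -> 01 -> 10 -> 11 -> 00
gray_toggles = flips_around_loop(gray_codes)       # 00 -> 01 -> 11 -> 10 -> 00
```

For this loop, the Gray encoding toggles exactly one state bit per transition, while the binary encoding toggles two bits on some transitions.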
CHAPTER 5: AEGIS128L VERIFICATION
5.1 Overview:
SystemVerilog has many features that make the verification phase of a project more efficient.
These features are utilized by methodologies such as the Verification Methodology Manual
(VMM), the Open Verification Methodology (OVM) and the Universal Verification Methodology
(UVM). These methodologies can handle the verification of huge designs very efficiently
and offer scope for reuse.
For this project, we used a layered test bench for the functional verification. VMM
follows the layered test bench architecture to take full advantage of automation [12]. The
constructs used for the project include classes, functional coverage, threads and inter-process
communication.
We also used regular test benches to generate different scenarios for producing the Switching
Activity Interchange Format (SAIF) file. This file was used to generate accurate dynamic power
estimates. The process of generating the SAIF file is discussed in detail in Section 5.4.2.
5.2 AEGIS128L Verification Framework:
The AEGIS128L verification framework is based on a layered test bench model. This style of
modeling was derived from [13]. Figure 17 summarizes this framework. The thick arrows in the figure
represent mailboxes.
Figure 17 – AEGIS128L verification framework, based on [13]
5.2.1 Connection with DUT:
The layered test bench was connected to the DUT by using a top-level module. This
top-level module used SystemVerilog’s interface construct for the connections. A clocking block was
also used to avoid timing issues by providing synchronization between the DUT and the test bench [13].
The code for the interface can be found in Appendix A.
Figure 18 – Connection of test-bench with DUT, based on [13]
5.2.2 Inter-process Communication:
SystemVerilog has new constructs such as fork...join_any and fork...join_none that can start
parallel processes. This feature was used in the environment class of the layered test bench to run
methods from different classes in parallel. We utilized the ‘mailbox’ feature of SystemVerilog as
well. A mailbox is like a FIFO, which can source and sink data and helps in passing information
between two threads [13]. We also used dynamic arrays for the communication between the
agent and the scoreboard. The size of a dynamic array is variable, so its use gave us the flexibility to
transfer data of varying length easily.
5.2.3 Program Block:
SystemVerilog introduces the program block to hold the test-bench and to reduce race
conditions between the design under test (DUT) and the test-bench. This project used a program
block, which included all the classes and the functional coverage, and helped in providing a race-free
model for testing. Details of each class used in the program block are included in Section
5.3.
5.2.4 Validation:
For the validation, two main tasks were completed. The first was to generate the
expected results and the second was to match the DUT outputs with the expected results at the
right time. Generating the expected results can be accomplished either by reading an external file
using system tasks or by producing them in the scoreboard class. The former is done by using a
programming language such as C++ or Perl and dumping the expected results into a file. The advantage
of this approach is that we have a pure software design to compare with the hardware design. On the
other hand, it requires the extra steps of exporting the test vectors to an external file where the
C++/Perl program can read them and then reading the expected results back in, which is inefficient.
For this project, we generated the expected results in the scoreboard class and passed them
to the checker class. The checker class compared the actual results with the expected results and
flagged possible failures in the design.
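The division of work between the scoreboard and the checker can be sketched in software as follows. This is only an analogy to the SystemVerilog classes: a Python queue plays the role of the mailbox, and reference_model is a hypothetical stand-in for the expected-result computation, not the AEGIS128L model used in the project.

```python
from queue import Queue

def reference_model(vector):
    """Stand-in for the expected-result computation (hypothetical)."""
    return vector ^ 0xFF

def scoreboard(vectors, expected_box):
    """Push the expected result for every input vector into a mailbox."""
    for v in vectors:
        expected_box.put(reference_model(v))

def checker(expected_box, actual_results):
    """Compare DUT outputs with expected results; return failing indices."""
    failures = []
    for index, actual in enumerate(actual_results):
        if expected_box.get() != actual:
            failures.append(index)
    return failures

vectors = [0x00, 0x0F, 0xA5]
mailbox = Queue()                       # plays the role of an SV mailbox
scoreboard(vectors, mailbox)
dut_outputs = [0xFF, 0xF0, 0x00]        # last value is a deliberate mismatch
failed = checker(mailbox, dut_outputs)
```

Only the third output is flagged, because it is the only one that differs from the value produced by the reference model.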
5.2.5 Coverage:
Coverage is the extent to which something is tested. In the context of ASIC verification,
coverage can be of different types, such as code coverage and functional coverage. In this project, we
used functional coverage through SystemVerilog constructs. Table 7 shows the commands used with their
descriptions.
One purpose of coverage is to gauge how well the design is tested. Another
is power measurement. The DUT should be exercised for long enough with different scenarios so
that we can capture accurate overall switching activity. This helps in getting an accurate
front-end power estimate.
Table 7 – Functional Coverage Commands, based on [16]
Commands        Description
covergroup      A user-defined construct that holds cover points
coverpoint      Used to represent a variable from the DUT
bins            Associates a variable name and a count with a set of values or a
                sequence of value transitions
options         Built-in feature that helps in defining the weightage of a covergroup
sample()        A built-in method that helps in calculating coverage on the fly
$get_coverage   A built-in method that calculates the total coverage percentage achieved
Figure 19 shows the results of the functional coverage. By running twenty messages of
65536 bytes each, we achieved a functional coverage of 100%. To get this result,
we ran urg -dir simv.vdb and firefox urgReport/grp0.html after the compile and simv commands.
Figure 19 – Functional Coverage results for AEGIS128L
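The idea behind cover points and bins can be sketched in software as follows. This is a simplified analogy, not the SystemVerilog coverage model of this project: the class, the bin names and the choice of message length as the covered variable are assumptions for illustration.

```python
class CoverPoint:
    """Counts hits per named bin; each bin is an inclusive value range."""

    def __init__(self, bins):
        self.bins = bins                        # name -> (low, high)
        self.hits = {name: 0 for name in bins}

    def sample(self, value):
        for name, (low, high) in self.bins.items():
            if low <= value <= high:
                self.hits[name] += 1

    def coverage(self):
        """Percentage of bins that were hit at least once."""
        covered = sum(1 for n in self.hits.values() if n > 0)
        return 100.0 * covered / len(self.bins)

# Hypothetical bins over the message length in bytes.
msg_len = CoverPoint({"short": (32, 1023),
                      "medium": (1024, 65535),
                      "max": (65536, 65536)})
for length in (32, 64, 65536):
    msg_len.sample(length)
partial = msg_len.coverage()     # "medium" has not been hit yet
msg_len.sample(4096)
complete = msg_len.coverage()    # all three bins hit
```

Coverage reaches 100% only once every bin has been hit at least once, regardless of how many samples fall into the bins that were already covered.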
5.3 Layered Testbench:
Figure 17 shows how the verification infrastructure is placed. It uses many ‘classes’ for
different sections of the framework. Let us explore these classes to understand their functionality.
5.3.1 Class Transactor:
This class is not shown in Figure 17 but was used to carry data from one class
to another through mailboxes. It holds the random and regular variables that can be manipulated
to hold the test vectors.
5.3.2 Class Generator:
This class, as the name suggests, generated the test vectors using the transactor class. It
defined certain constraints, and each variable of the transactor object was assigned appropriate
constraints. By choosing the constraints carefully, we achieved high coverage in a short
amount of time.
5.3.3 Class Agent:
The main goal of this class was to get data from the generator object and push it into the
dynamic array so that the scoreboard could read from it. It also passed the data down to the driver
class using a mailbox. We used this dynamic array to measure the functional coverage, as it held all
the input test vectors.
5.3.4 Class Driver:
The driver class extracted the data from the agent-to-driver mailbox and drove the DUT input
ports. If the driver drives a synchronous signal at the active edge of the clock, the value propagates
immediately to the design [13]. If the test-bench drives the output just after the active edge, the
value is not seen in the design until the next active edge of the clock [13].
5.3.5 Class Scoreboard:
In this class, expected results were formulated by using the test vectors delivered by the
dynamic array. The scoreboard can also input the expected results from an external file.
5.3.6 Class Monitor:
The monitor class was used to sample the outputs of the DUT and transfer them to the checker
class via a mailbox.
5.3.7 Class Checker:
The checker received the expected results from the scoreboard and the actual results from the DUT.
It then compared the two and flagged any errors.
5.4 Regular Test Bench for SAIF:
To generate SAIF files, we used regular test-benches as well. This was because we had to
run these simulations on the gate level netlist, and the netlist loses some of the hierarchy because
of boundary optimization during synthesis.
Different scenarios such as sleep, normal and overdrive were modeled. This was done so that
Power Compiler could generate accurate power reports by using the generated SAIF files.
5.4.1 Scenarios:
Table 8 shows the scenarios used and their descriptions. Note that at the SOC level, the
whole encryption block can be powered off, saving a lot of leakage power, but that is outside the
scope of this project.
5.4.2 SAIF File Generation:
There are two methods to generate a SAIF file. The first is to convert a vcd file to a SAIF file
using the vcd2saif command. To generate the vcd file, we place the system task $vcdpluson in an
initial block and use the “-PP” option while compiling. VCS generates a vpd format file, which we
can use as input to generate the SAIF file as follows.
vcd2saif -input vcdpluson.vpd -o power.saif
Table 8 – Modes of operation
Mode        Description
Sleep       No message is being encrypted, but the SOC has kept the power to the system ON.
Normal      A message is being encrypted at a scaled-down frequency.
Overdrive   A message is being encrypted at the maximum possible frequency.
The second method is to generate the SAIF file directly. In this case we do not need to
generate the vcd file, so the $vcdpluson command can be commented out. Table 9 shows the
commands used in the test-bench to generate the SAIF file. Please note that these commands should
be used in the order shown.
It is important to note that the time unit should be the same in the test-bench and the SAIF file.
Since we used 10^-9 s (1 ns) as the time unit in the SAIF file generation command shown in Table 9,
our test-bench must also use this time unit. Otherwise, the dynamic power estimate will not be
accurate. This was achieved by using the following command.
`timescale 1ns/1ns
Table 9 – SAIF generation commands [17]
Command                               Description
$set_gate_level_monitoring("ON");     Turns ON the registering of all internal nets for simulation.
$set_toggle_region("test.top_rtl");   Specifies the toggle region. We used the instantiation of the
                                      DUT in the test-bench.
$toggle_start;                        Instructs the simulator to start capturing toggle activity.
$toggle_stop;                         Instructs the simulator to stop capturing toggle activity.
$toggle_report("power.saif",          Dumps the switching activity of nets and ports into a file with
    1.0e-9, "test.dut");              a user-given name. Specifying the timescale/time unit is
                                      important, and it should match the test-bench. The third field
                                      describes the hierarchy for the switching activity. The SAIF
                                      file should be annotated onto the netlist at the same hierarchy
                                      in the power estimation step.
5.4.3 Gate Level Simulation:
Gate-level simulation is needed to confirm that the synthesized design still matches the RTL
and the overall design intent. The gate-level netlist obtained from synthesis is simulated and the
results are compared with the RTL simulation. Gate-level simulation can also be performed to
obtain accurate switching activity for power estimation. We ran gate-level simulations mainly for
the second reason.
The synthesis tool optimizes the design and applies boundary optimization, which sometimes
causes the netlist to lose its RTL hierarchy. In this case, if we apply the SAIF file generated by RTL
simulation to the netlist, many signals cannot be annotated and the power estimation will not be
accurate. If we instead run gate-level simulation to generate the SAIF file, the annotation problem
is solved.
For RTL simulation, we usually use the following command. However, this command is not
sufficient for gate-level simulation, because the netlist contains many gates that are not present
in the RTL. Instead, they are taken from the library file.
vcs -sverilog testbench.sv
Fortunately, VCS provides a switch for the vcs command that loads the library file in .v
format. This makes gate-level simulation possible, and the command becomes:
vcs -sverilog testbench.sv -v lib.v
However, there is one more challenge: in addition to the functional information, the gates in
the library file carry a lot of timing information. We can either model the test-bench to accommodate
these delays or remove the timing specification from the library file. We chose the second method
and used a PERL script to clear all the timing information. The PERL script is given in Appendix
C. The result of the script is shown in Table 10 for one of the gates. The code shown in the table
was taken from the Synopsys 90nm digital standard cell library.
Table 10 – Removing timing information from gate-level cell, code taken from [14]
Pre script
`celldefine
`suppress_faults
`enable_portfaults
`ifdef functional
`timescale 1ns / 1ns
`delay_mode_distributed
`delay_mode_unit
`else
`timescale 1ps / 1ps
`delay_mode_path
`endif
module AND2X1_HVT (IN1,IN2,Q);
output Q;
input IN1,IN2;
and #1 (Q,IN2,IN1);
`ifdef functional
`else
specify
specparam
in1_lh_q_lh=52,in1_hl_q_hl=50,in2_lh_q_lh=59,in2_hl_q_hl=56;
( IN1 +=> Q) = (in1_lh_q_lh,in1_hl_q_hl);
( IN2 +=> Q) = (in2_lh_q_lh,in2_hl_q_hl);
endspecify
`endif
endmodule
`endcelldefine
`disable_portfaults
Post script
module AND2X1_HVT (IN1,IN2,Q);
output Q;
input IN1,IN2;
and (Q,IN2,IN1);
endmodule
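The transformation in Table 10 can be illustrated with a short sketch. This is not the PERL script from Appendix C; it is a minimal Python approximation, and the function name strip_timing and the three clean-up rules (drop specify blocks, drop compiler-directive lines, drop "#n" primitive delays) are assumptions inferred from the pre/post listing above.

```python
import re

def strip_timing(verilog_text: str) -> str:
    """Remove timing detail from a simple library cell (illustrative sketch only)."""
    # Rule 1: drop specify ... endspecify blocks (specparams and path delays).
    text = re.sub(r"\bspecify\b.*?\bendspecify\b", "", verilog_text, flags=re.DOTALL)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Rule 2: drop blank lines and compiler directives (`timescale, `ifdef, ...).
        if not stripped or stripped.startswith("`"):
            continue
        # Rule 3: drop primitive delays such as "#1" in "and #1 (Q,IN2,IN1);".
        stripped = re.sub(r"\s*#\s*\d+(\.\d+)?", "", stripped)
        kept.append(stripped)
    return "\n".join(kept)

# Abbreviated version of the "Pre script" cell from Table 10.
PRE = """`celldefine
`ifdef functional
`timescale 1ns / 1ns
`else
`timescale 1ps / 1ps
`endif
module AND2X1_HVT (IN1,IN2,Q);
output Q;
input IN1,IN2;
and #1 (Q,IN2,IN1);
`ifdef functional
`else
specify
specparam
in1_lh_q_lh=52,in1_hl_q_hl=50,in2_lh_q_lh=59,in2_hl_q_hl=56;
( IN1 +=> Q) = (in1_lh_q_lh,in1_hl_q_hl);
( IN2 +=> Q) = (in2_lh_q_lh,in2_hl_q_hl);
endspecify
`endif
endmodule
`endcelldefine
"""

print(strip_timing(PRE))
```

Dropping every directive line is only safe for cells shaped like this one; a library that hides functional code behind conditional directives would need a more careful script.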
5.5 Multi-Voltage Aware Simulation:
VCS simulates RTL or a netlist under the assumption that the supply voltage is always on. To
capture the effect of power gating in the SAIF file, the simulations must be run with the MV Sim
version of the VCS tool. This version is multi-voltage aware: it can turn power domains off or on
as directed by the power-gating enable signals, and thus provides accurate switching activity
detail. The flow for VCS with MV Sim is called the VCS Native Low Power flow (VCS NLP).
Figure 20 shows the flow diagram. Unfortunately, we did not have access to this version of VCS and
so were unable to test the power intent implementation. Consequently, for power gating we used
manual workarounds to obtain the power consumption estimates, as the SAIF file did not include
the power-gating effect.
Figure 20 – VCS NLP Flow [15]
CHAPTER 6: AEGIS128L SYNTHESIS
In the synthesis process, the RTL code is translated and optimized into a gate-level netlist
using technology library cells. The optimized netlist is the product that the Front-end delivers to
the Back-end in the ASIC design flow. Synopsys Design Compiler is the synthesis tool used for this
project. Figure 21 shows the inputs it needs to generate the gate-level netlist for the power aware
design.
Figure 21 – Synthesis Process, based on [18]
6.1 Synthesis Script:
The synthesis script contained the commands to be executed in sequence. First, we read the
RTL files and defined constraints such as input and output delays. Then we provided the synthesis
tool with the libraries to be used. Clock-gating-aware libraries were used for clock gating, and
multi-threshold libraries for the multi-threshold voltage cells. For clock gating, we used the
set_clock_gating_style command, which defines the type of clock gating to be used, the minimum
and maximum fan-out, etc. To be power efficient, the minimum number of flip-flops in a gated
register was set to three. The fan-out of each clock-gating enable signal can be set to infinity in
the Front-end to get lower power and area. We chose a fan-out of 64 to get a realistic power
estimate. In addition, we used the latch-based clock-gating style, as it helps to reduce glitches.
A glitch-free design saves dynamic power [6]. Finally, we loaded the UPF script and started the
compile process.
The “compile” command is older than the “compile_ultra” command; we used the latter.
Unlike “compile”, “compile_ultra” has boundary optimization ON by default [18]. In addition, it
has a switch, “-gate_clock”, to implement clock gating. If a UPF script has been read, it is
automatically incorporated in the design.
6.2 UPF Script:
In this script, we defined the power intent of the system. First, the power domains were created
and the supply ports were connected via nets. Then we created power switches with power-gate
enable signals. The commands that populate isolation and retention cells usually come next, but we
did not use them. Retention cells were not required because we did not need to retain any state when
we powered off the system. We also left out isolation cells to save area and power. It is safe to do
so because the system was designed with multiplexers following the power-gated domain, and the
timing always ensured that when the power domain was turned off, the mux selected its other input
and not the output of the powered-off logic. Both the synthesis and UPF scripts can be found in
Appendix C.
6.3 Checks:
We can use two commands to check the design. One is the check_design command and
other is the check_mv_design command. Check_design checks the design itself and reports any
potential issues. Check_mv_design checks the multi-voltage power intent and reports any issue in
it.
6.4 Clock Rate:
Our design can run with a 20 ns clock period (50 MHz). The SOC can drive the encryption
block at a lower frequency to save power, or at the maximum frequency in overdrive mode for
performance. Ideally, this design can deliver up to 1.6 GB/s.
Clock cycles per byte (cpb) at the maximum data rate = 1 / ((data rate)(time period)) = 1/32 = 0.03125 cpb
Clock cycles per byte (cpb) = 0.036 cpb for a 4096-byte message
From the cpb comparison for a 4096-byte message, we conclude that our implementation is
more than 15 times faster than AEGIS128L on an Intel Sandy Bridge Core i5 processor [4].
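The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the 256-bit block per clock cycle and the 20 ns period come from this report, while the variable names and the derived cycle count for the 4096-byte message are illustrative.

```python
CLOCK_PERIOD_NS = 20          # fastest clock period reported for the design
BLOCK_BITS = 256              # message bits processed per clock cycle

bytes_per_cycle = BLOCK_BITS // 8                 # 32 bytes per cycle
clock_mhz = 1000 / CLOCK_PERIOD_NS                # 50 MHz
data_rate_bytes_per_s = bytes_per_cycle * clock_mhz * 1e6
ideal_cpb = 1 / bytes_per_cycle                   # cycles per byte with no overhead

# The reported 0.036 cpb for a 4096-byte message implies this many cycles in total.
MESSAGE_BYTES = 4096
REPORTED_CPB = 0.036
implied_cycles = round(REPORTED_CPB * MESSAGE_BYTES)
ideal_cycles = MESSAGE_BYTES // bytes_per_cycle

print(f"data rate  : {data_rate_bytes_per_s / 1e9:.1f} GB/s")
print(f"ideal cpb  : {ideal_cpb}")
print(f"cycles     : {implied_cycles} implied vs {ideal_cycles} ideal")
```

The gap between the implied and the ideal cycle counts is the per-message overhead in cycles, which is why the measured cpb is slightly above 1/32.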
CHAPTER 7: POWER ESTIMATION AND ANALYSIS
In this chapter, we cover two main topics. The first is the power measurement method used
for the project and the second is the analysis of the power results. The incremental improvement
from applying each power-saving method is also part of this chapter.
7.1 Power Estimation:
In this section, we discuss how Synopsys Power Compiler measures power. In addition,
SAIF file structuring is discussed.
7.1.1 Power Types:
Power consumption in digital circuits mainly falls into two categories, static and dynamic
power. Dynamic power is further divided into internal (short-circuit) and switching power. Let us
look at what each one means.
Static Power is the power dissipated by the gates when they are not switching i.e. they are
inactive. It is mainly due to source-to-drain subthreshold leakage, which is caused by reduced
threshold voltages that prevent the gate from completely turning off [17].
Internal Power is any power dissipated within the boundary of a cell [17]. It is the power
dissipated during switching because of the charging and discharging of internal capacitances of the
cell. It also includes the power dissipated during momentary short-circuit between P and N
transistor [17].
Switching power is due to the charging and discharging of the load capacitances of the cell
[17]. Therefore, to save power, the system should be modeled in a way that minimizes transitions
from zero to one and vice versa.
7.1.2 Calculating Power:
In this section, we briefly explain how Power Compiler measures the power of the circuit.
Power Compiler uses an equation for each type of power and gives an estimate. Synopsys uses a
model based on the Non-Linear Delay Model (NLDM). Let us go through the equations for each
power type.
For leakage power, Power Compiler calculates the total leakage power of the design by
summing the leakage power of each library cell used in the system [17]. This is summarized in
the equation below:
$$P_{leakage} = \sum_{k \,\in\, cells} P_{cell\_leakage,k}$$
For internal power, the short-circuit time, the voltage and current used by the cell, and the
frequency of transitions are required. Power Compiler calculates the internal power using the
following equation [17]:
$$P_{internal} = E_{output\,pin} \times PathWeight \times ToggleRate$$
where the toggle rate is in transitions per second. E represents the internal energy for the output
pin of the cell as a function of input transitions, output load and voltage. PathWeight for the input
pins depends on the input toggle rate, the transition times and the functionality of the cell. The
toggle rates of the output and input pins are also required [17]. For switching power, the following
equation is used [17]:
$$P_{sw} = V_{dd}^{2} \sum_{i} \left( C_{load,i} \times TR_{i} \right)$$
where TR_i represents the toggle rate of net i in transitions per second, V_dd represents the
supply voltage, and C_load,i is the total capacitive load of net i [17].
The dynamic power is therefore the sum of the internal and the switching power. Note that
the clock rate, the toggle rate and the probability of logic one are the main factors that are under
our control, and they are big contributors to dynamic power.
7.1.3 Report_power:
Report_power is the command that is used to calculate power on the current design by the
design compiler. If no switching activity is annotated, the Power Compiler uses following defaults
for the primary inputs [17]:
Probability = 0.1 (10% chance of signal being in one state)
Toggle_rate = 0.1 * fclk (signal switches once every 10th clock cycle)
The defaults that Power Compiler uses clearly cannot represent the true dynamic power.
This is why annotating the actual switching activity is very important. The SAIF file gives the
actual toggle rate and the probability of the logic-one state for each signal.
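To make the default toggle rate concrete, the sketch below converts it to transitions per second for the two clock periods used in this project (32 ns and 20 ns). The factor 0.1 is the default quoted above; the function and constant names are illustrative only.

```python
DEFAULT_TOGGLE_FACTOR = 0.1   # default toggles per clock cycle, as quoted above

def default_toggle_rate(clock_period_ns: float) -> float:
    """Default toggle rate in transitions per second for a given clock period."""
    clock_hz = 1.0e9 / clock_period_ns
    return DEFAULT_TOGGLE_FACTOR * clock_hz

for period_ns in (32, 20):    # normal and overdrive clock periods
    rate = default_toggle_rate(period_ns)
    print(f"T = {period_ns} ns -> {rate / 1e6:.3f} million transitions/s")
```

Every un-annotated primary input is assumed to toggle at this one rate regardless of what the signal actually does, which is why a SAIF file from simulation gives a better dynamic power estimate.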
The report_power_calculation command can be applied to a specific net. It generates a report
showing how the power is calculated for that net. The following command produces the power
calculation report for the port ‘clk’.
report_power_calculation clk > report_power_calculation
7.1.4 Manual Setting:
We can also set the switching activity of the nets manually by using the set_switching_activity
command. This, of course, can become a tedious, verbose and error-prone method. We can also set
a new global toggle rate and static probability by using the following two settings:
power_default_static_probability
power_default_toggle_rate
7.1.5 Using SAIF:
Figure 22 shows a snippet of the SAIF file. It shows a port signal and its statistics after
gate-level simulation. Using this file, we can calculate the true probability and toggle rate as
follows:
Probability of logic one = total time for one / (total time for one + total time for zero +
total time for x)
Toggle rate of signal = total toggle count / total simulation time
Figure 22 – SAIF file snippet
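The two formulas above can be applied to one SAIF record as in the following sketch. The argument names follow the usual SAIF fields (T0, T1 and TX for time spent at zero, one and unknown, TC for the toggle count); the numeric values are hypothetical and are not taken from Figure 22.

```python
def saif_statistics(t0: float, t1: float, tx: float, tc: int, duration: float):
    """Return (probability of logic one, toggle rate) for one net.

    t0, t1, tx and duration are in SAIF timescale units; tc is a transition count.
    """
    observed = t0 + t1 + tx
    probability_one = t1 / observed
    toggle_rate = tc / duration           # transitions per timescale unit
    return probability_one, toggle_rate

# Hypothetical record with a 1 ns timescale and a 10000 ns simulation.
p_one, rate_per_ns = saif_statistics(t0=6000, t1=4000, tx=0, tc=500, duration=10000)
rate_per_s = rate_per_ns * 1e9            # convert to transitions per second

print(f"P(1) = {p_one}, toggle rate = {rate_per_s:.3e} transitions/s")
```

The conversion to transitions per second depends on the SAIF timescale, which is one reason the time unit of the test-bench and the SAIF file must match.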
7.2 Power Analysis:
For power analysis, three scenarios were considered, namely normal, sleep and overdrive.
The normal scenario covers the continuous encryption of messages, one after the other. The sleep
scenario covers the case where there is no message to be encrypted, but the system is ready.
Overdrive is the same as normal but uses the maximum clock rate. We do not have a mixed scenario
because the VCS+MV-Sim tool was unavailable, as explained in Section 5.5.
For each step of the power improvement, we calculated power for the three scenarios
mentioned above. There were a total of four power improvement steps based on power aware RTL
code, the addition of clock gating, the addition of multi-threshold Vt cells, and the addition of
power gating. Table 11 shows the summary of power improvements. Detailed reports can be found
in Appendix D.
Table 11 – Power Improvements
Scenario                       Power Aware RTL   Clock Gating   Multi-threshold Vt cells   Power gating
Normal (clock T = 32 ns)       12.98 mW          12.64 mW       12.18 mW                   12.16 mW
Sleep                          3.39 mW           3.07 mW        2.80 mW                    1.1 mW
Overdrive (clock T = 20 ns)    18.72 mW          18.34 mW       17.78 mW                   17.75 mW
If the clock rate is slower, the dynamic power decreases. Therefore, the SOC has to be
carefully designed in how it uses frequency scaling for the encryption block. In the Normal and
Overdrive modes, dynamic power combines with static power to give the total power consumption.
For the Sleep mode, only leakage power is accounted for.
In the sleep mode, 68% of the power is saved. In the active modes, up to 6-7% of the power
is saved: we save 820 uW in the normal mode and up to 970 uW in the overdrive mode.
AEGIS128L was designed so that it has a low execution time, completes the encryption relatively
quickly, and enters the sleep mode. In the sleep mode, the SOC can also turn off the whole
power supply. Either way, the power saving is significant.
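The savings quoted here follow directly from the first and last columns of Table 11. The sketch below repeats that arithmetic; the dictionary and function names are illustrative.

```python
# Total power in mW from Table 11: before (power aware RTL only) and after
# all four improvement steps (power gating column).
baseline_mw = {"normal": 12.98, "sleep": 3.39, "overdrive": 18.72}
final_mw = {"normal": 12.16, "sleep": 1.1, "overdrive": 17.75}

def saving(mode: str):
    """Return (saved power in uW, saved power as a percentage of the baseline)."""
    saved_mw = baseline_mw[mode] - final_mw[mode]
    return saved_mw * 1000.0, 100.0 * saved_mw / baseline_mw[mode]

for mode in baseline_mw:
    saved_uw, saved_pct = saving(mode)
    print(f"{mode:9s}: {saved_uw:6.0f} uW saved ({saved_pct:.1f}%)")
```

This gives roughly 820 uW (6.3%) in normal mode, 2290 uW (67.6%) in sleep mode and 970 uW (5.2%) in overdrive mode.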
Mobile devices operate on battery. Battery life can be summarized by the following equation.
Battery life = battery capacity / power consumption
From this, we can estimate the impact of AEGIS128L on a mobile phone battery with a
simple analysis. Let us assume a regular battery with a capacity of 1500 mAh and a rating of 3.7 V.
Capacity, power and time are related by:
Amp-hours = (Watts / Volts) x Hours
For 20 hours of battery life, the average power consumption is:
Power (Watts) = (Amp-hours x Voltage) / Time (hrs)
Power (Watts) = (1500 mAh x 3.7 V) / 20 hrs
Power = 277.5 mW
The AEGIS128L implementation uses 12 mW in the active mode; it therefore accounts for 4.3%
of this power budget in the active mode. In the sleep mode, it uses 0.4%. Reference [19] shows how
a smartphone battery is typically used; this depends heavily on which application is running. In
addition, units other than the SOC consume smartphone battery as well.
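The battery estimate can be reproduced as follows. The battery capacity, voltage and 20-hour target are the assumptions stated in the text; the 12 mW and 1.1 mW figures are the active and sleep power of the accelerator, and the variable names are illustrative.

```python
CAPACITY_MAH = 1500.0     # assumed battery capacity
VOLTAGE_V = 3.7           # assumed battery voltage rating
TARGET_HOURS = 20.0       # assumed battery life

energy_mwh = CAPACITY_MAH * VOLTAGE_V          # stored energy in mWh
budget_mw = energy_mwh / TARGET_HOURS          # average power for the target life

def share_of_budget(power_mw: float) -> float:
    """Percentage of the average power budget used by a block drawing power_mw."""
    return 100.0 * power_mw / budget_mw

active_share = share_of_budget(12.0)   # accelerator in active mode
sleep_share = share_of_budget(1.1)     # accelerator in sleep mode

print(f"budget = {budget_mw} mW, active = {active_share:.1f}%, sleep = {sleep_share:.1f}%")
```

The percentages are shares of the average power that a 20-hour battery life allows, so they change if a different capacity or target life is assumed.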
CHAPTER 8: CONCLUSION
In this project, a power-efficient model of the AEGIS128L encryption algorithm was
developed in SystemVerilog HDL. Validation was completed in SystemVerilog using the VCS
simulator, whereas synthesis was done with the Synopsys Design Compiler tool using a 90nm
technology library. For power estimation, the Power Compiler tool was used. Synopsys Inc.
provided all the EDA tools used in this work.
The proposed low power encryption solution was accomplished by using a parallel
architecture and an algorithm that supports it. The designed encryption accelerator can be used
along with the SOC to speed up the encryption process and consume low power.
The model was thoroughly validated in SystemVerilog. A program-based layered test
bench was used for this purpose. Functional coverage was used to assess the validation. Test
vectors were generated to check the design thoroughly, and the expected results were created by
the Scoreboard section of the verification environment.
The power aware gate level netlist can run at 50 MHz frequency. Since 256-bits message
block can be processed in one clock cycle, this gives us maximum of 1.6GB/s data rate. We expect
the design to run faster using latest technology libraries.
Cycles per byte (cpb) is a clock rate independent performance metric. From 4096-byte data
message’s cpb comparison, we conclude that our implementation is more than 15 times faster than
the AEGIS128L on Intel Sandy Bridge Core i5 processor [4].
The power aware implementation saved approximately 7% of the dynamic power and 68% of
the power in idle mode, at the cost of a 2% increase in area. The 7% saving was achieved when the
device was in full operation. A more significant power saving (37.5%) was obtained when the device
ran 50% of the time in the normal mode. The methods used included power aware RTL coding,
clock gating, power gating, multi-threshold Vt cells and frequency scaling.
In future work, multi-VDD power domains can be created, which will save dynamic power as
well. A better and faster technology library can be used to see its effect on power and speed. The
clock rate also has scope for improvement: by adding one or two pipeline stages in the
timing-critical datapath, we can speed up the encryption process. In addition, protection against
side-channel attacks can be incorporated in the hardware implementation.
APPENDIX A: AEGIS128L HARDWARE MODEL SOURCE FILES
pckg.sv
`ifndef DEFS_DONE
`define DEFS_DONE
package pckg;
typedef struct packed { logic [127:0] bus0;
logic [127:0] bus1;
logic [127:0] bus2;
logic [127:0] bus3;
logic [127:0] bus4;
logic [127:0] bus5;
logic [127:0] bus6;
logic [127:0] bus7;
}bus1024;
typedef struct packed { logic [127:0] MS128; //most significant 128 bits
logic [127:0] LS128; //least significant 128 bits
}store256;
typedef struct packed { logic [15:0] word0;
logic [15:0] word1;
logic [15:0] word2;
logic [15:0] word3;
logic [15:0] word4;
logic [15:0] word5;
logic [15:0] word6;
logic [15:0] word7;
}bus128;
//following functions taken from [11]
function automatic [7:0] xtime (input [7:0] b);
return {b[6:0],1'b0}^(8'h1b&{8{b[7]}});
endfunction
function automatic [31:0] mix_col (input [7:0] s0,s1,s2,s3);
mix_col={xtime(s0)^xtime(s1)^s1^s2^s3,s0^xtime(s1)^xtime(s2)^s2^s3,
s0^s1^xtime(s2)^xtime(s3)^s3,xtime(s0)^s0^s1^s2^xtime(s3)};
endfunction
function automatic [7:0] sbox(input [7:0] a);
case (a)
8'h00: return 8'h63;
8'h01: return 8'h7c;
8'h02: return 8'h77;
8'h03: return 8'h7b;
8'h04: return 8'hf2;
8'h05: return 8'h6b;
8'h06: return 8'h6f;
8'h07: return 8'hc5;
8'h08: return 8'h30;
8'h09: return 8'h01;
8'h0a: return 8'h67;
8'h0b: return 8'h2b;
8'h0c: return 8'hfe;
8'h0d: return 8'hd7;
8'h0e: return 8'hab;
8'h0f: return 8'h76;
8'h10: return 8'hca;
8'h11: return 8'h82;
8'h12: return 8'hc9;
8'h13: return 8'h7d;
8'h14: return 8'hfa;
8'h15: return 8'h59;
8'h16: return 8'h47;
8'h17: return 8'hf0;
8'h18: return 8'had;
8'h19: return 8'hd4;
8'h1a: return 8'ha2;
8'h1b: return 8'haf;
8'h1c: return 8'h9c;
8'h1d: return 8'ha4;
8'h1e: return 8'h72;
8'h1f: return 8'hc0;
8'h20: return 8'hb7;
8'h21: return 8'hfd;
8'h22: return 8'h93;
8'h23: return 8'h26;
8'h24: return 8'h36;
8'h25: return 8'h3f;
8'h26: return 8'hf7;
8'h27: return 8'hcc;
8'h28: return 8'h34;
8'h29: return 8'ha5;
8'h2a: return 8'he5;
8'h2b: return 8'hf1;
8'h2c: return 8'h71;
8'h2d: return 8'hd8;
8'h2e: return 8'h31;
8'h2f: return 8'h15;
8'h30: return 8'h04;
8'h31: return 8'hc7;
8'h32: return 8'h23;
8'h33: return 8'hc3;
8'h34: return 8'h18;
8'h35: return 8'h96;
8'h36: return 8'h05;
8'h37: return 8'h9a;
8'h38: return 8'h07;
8'h39: return 8'h12;
8'h3a: return 8'h80;
8'h3b: return 8'he2;
8'h3c: return 8'heb;
8'h3d: return 8'h27;
8'h3e: return 8'hb2;
8'h3f: return 8'h75;
8'h40: return 8'h09;
8'h41: return 8'h83;
8'h42: return 8'h2c;
8'h43: return 8'h1a;
8'h44: return 8'h1b;
8'h45: return 8'h6e;
8'h46: return 8'h5a;
8'h47: return 8'ha0;
8'h48: return 8'h52;
8'h49: return 8'h3b;
8'h4a: return 8'hd6;
8'h4b: return 8'hb3;
8'h4c: return 8'h29;
8'h4d: return 8'he3;
8'h4e: return 8'h2f;
8'h4f: return 8'h84;
8'h50: return 8'h53;
8'h51: return 8'hd1;
8'h52: return 8'h00;
8'h53: return 8'hed;
8'h54: return 8'h20;
8'h55: return 8'hfc;
8'h56: return 8'hb1;
8'h57: return 8'h5b;
8'h58: return 8'h6a;
8'h59: return 8'hcb;
8'h5a: return 8'hbe;
8'h5b: return 8'h39;
8'h5c: return 8'h4a;
8'h5d: return 8'h4c;
8'h5e: return 8'h58;
8'h5f: return 8'hcf;
8'h60: return 8'hd0;
8'h61: return 8'hef;
8'h62: return 8'haa;
8'h63: return 8'hfb;
8'h64: return 8'h43;
8'h65: return 8'h4d;
8'h66: return 8'h33;
8'h67: return 8'h85;
8'h68: return 8'h45;
8'h69: return 8'hf9;
8'h6a: return 8'h02;
8'h6b: return 8'h7f;
8'h6c: return 8'h50;
8'h6d: return 8'h3c;
8'h6e: return 8'h9f;
8'h6f: return 8'ha8;
8'h70: return 8'h51;
8'h71: return 8'ha3;
8'h72: return 8'h40;
8'h73: return 8'h8f;
8'h74: return 8'h92;
8'h75: return 8'h9d;
8'h76: return 8'h38;
8'h77: return 8'hf5;
8'h78: return 8'hbc;
8'h79: return 8'hb6;
8'h7a: return 8'hda;
8'h7b: return 8'h21;
8'h7c: return 8'h10;
8'h7d: return 8'hff;
8'h7e: return 8'hf3;
8'h7f: return 8'hd2;
8'h80: return 8'hcd;
8'h81: return 8'h0c;
8'h82: return 8'h13;
8'h83: return 8'hec;
8'h84: return 8'h5f;
8'h85: return 8'h97;
8'h86: return 8'h44;
8'h87: return 8'h17;
8'h88: return 8'hc4;
8'h89: return 8'ha7;
8'h8a: return 8'h7e;
8'h8b: return 8'h3d;
8'h8c: return 8'h64;
8'h8d: return 8'h5d;
8'h8e: return 8'h19;
8'h8f: return 8'h73;
8'h90: return 8'h60;
8'h91: return 8'h81;
8'h92: return 8'h4f;
8'h93: return 8'hdc;
8'h94: return 8'h22;
8'h95: return 8'h2a;
8'h96: return 8'h90;
8'h97: return 8'h88;
8'h98: return 8'h46;
8'h99: return 8'hee;
8'h9a: return 8'hb8;
8'h9b: return 8'h14;
8'h9c: return 8'hde;
8'h9d: return 8'h5e;
8'h9e: return 8'h0b;
8'h9f: return 8'hdb;
8'ha0: return 8'he0;
8'ha1: return 8'h32;
8'ha2: return 8'h3a;
8'ha3: return 8'h0a;
8'ha4: return 8'h49;
8'ha5: return 8'h06;
8'ha6: return 8'h24;
8'ha7: return 8'h5c;
8'ha8: return 8'hc2;
8'ha9: return 8'hd3;
8'haa: return 8'hac;
8'hab: return 8'h62;
8'hac: return 8'h91;
8'had: return 8'h95;
8'hae: return 8'he4;
8'haf: return 8'h79;
8'hb0: return 8'he7;
8'hb1: return 8'hc8;
8'hb2: return 8'h37;
8'hb3: return 8'h6d;
8'hb4: return 8'h8d;
8'hb5: return 8'hd5;
8'hb6: return 8'h4e;
8'hb7: return 8'ha9;
8'hb8: return 8'h6c;
8'hb9: return 8'h56;
8'hba: return 8'hf4;
8'hbb: return 8'hea;
8'hbc: return 8'h65;
8'hbd: return 8'h7a;
8'hbe: return 8'hae;
8'hbf: return 8'h08;
8'hc0: return 8'hba;
8'hc1: return 8'h78;
8'hc2: return 8'h25;
8'hc3: return 8'h2e;
8'hc4: return 8'h1c;
8'hc5: return 8'ha6;
8'hc6: return 8'hb4;
8'hc7: return 8'hc6;
8'hc8: return 8'he8;
8'hc9: return 8'hdd;
8'hca: return 8'h74;
8'hcb: return 8'h1f;
8'hcc: return 8'h4b;
8'hcd: return 8'hbd;
8'hce: return 8'h8b;
8'hcf: return 8'h8a;
8'hd0: return 8'h70;
8'hd1: return 8'h3e;
8'hd2: return 8'hb5;
8'hd3: return 8'h66;
8'hd4: return 8'h48;
8'hd5: return 8'h03;
8'hd6: return 8'hf6;
8'hd7: return 8'h0e;
8'hd8: return 8'h61;
8'hd9: return 8'h35;
8'hda: return 8'h57;
8'hdb: return 8'hb9;
8'hdc: return 8'h86;
8'hdd: return 8'hc1;
8'hde: return 8'h1d;
8'hdf: return 8'h9e;
8'he0: return 8'he1;
8'he1: return 8'hf8;
8'he2: return 8'h98;
8'he3: return 8'h11;
8'he4: return 8'h69;
8'he5: return 8'hd9;
8'he6: return 8'h8e;
8'he7: return 8'h94;
8'he8: return 8'h9b;
8'he9: return 8'h1e;
8'hea: return 8'h87;
8'heb: return 8'he9;
8'hec: return 8'hce;
8'hed: return 8'h55;
8'hee: return 8'h28;
8'hef: return 8'hdf;
8'hf0: return 8'h8c;
8'hf1: return 8'ha1;
8'hf2: return 8'h89;
8'hf3: return 8'h0d;
8'hf4: return 8'hbf;
8'hf5: return 8'he6;
8'hf6: return 8'h42;
8'hf7: return 8'h68;
8'hf8: return 8'h41;
8'hf9: return 8'h99;
8'hfa: return 8'h2d;
8'hfb: return 8'h0f;
8'hfc: return 8'hb0;
8'hfd: return 8'h54;
8'hfe: return 8'hbb;
8'hff: return 8'h16;
endcase
endfunction
endpackage
import pckg::*;
`endif
Port.sv
`include "pckg.sv"
//top_rtl(rst,clk,start,data_in,c1,c2,tagout,done)
interface port(input bit clk,rst);
bus128 c2,c1;
bus256 data_in;
logic start,tagout,done; //ready
clocking ck@(posedge clk);
input c2,c1,tagout,done;
output data_in,start;
endclocking
modport top_rtl(input data_in,rst,start,clk,output c2,c1,tagout,done);
modport layrd_test(clocking ck);
endinterface
Top_rtl.sv
`include "fifo.sv"
`include "fifo_cntlr.sv"
`include "encryption.sv"
module top_rtl(rst,clk,start,data_in,c1,c2,tagout,done,ready);
input rst,clk;
input start;
input [256:0] data_in;
wire [256:0] data_out;
wire EMPTY,FULL;
wire [3:0] rd_ptr,wr_ptr;
output [127:0] c1,c2;
output tagout,done,ready;
wire FINon;
wire poff_rtl,poff_init;
wire start_in,ADon,MSGon,FINon_in,rd_en,wr_en,done_clk_gate;
//assign start = data_in[257] ;
assign FINon=data_out[256];
fifo z1(rst,clk,FINon_in,wr_ptr,rd_ptr,wr_en,rd_en,data_in,data_out,done_clk_gate);
fifo_cntlr z2(rst,clk,start,FINon,EMPTY,FULL,rd_en,wr_en,wr_ptr,rd_ptr,done_clk_gate,start_in,ADon,
MSGon,FINon_in,ready,poff_rtl);
encryption z3(data_out[255:128],data_out[127:0],rst,clk,start_in,ADon,MSGon,FINon_in,poff_rtl,c2,c1,
tagout,done,poff_init);
endmodule
fifo.sv
module fifo(rst,clk,FINon_in,wr_ptr,rd_ptr,wr_en,rd_en,data_in,data_out,done_clk_gate);
input rst,clk,wr_en,rd_en,FINon_in;
input [3:0] wr_ptr,rd_ptr;
input [256:0] data_in; //define as structure
output [256:0] data_out; //define as structure
input done_clk_gate;
logic [256:0] data_out;
logic [256:0] ram[0:15];
always_ff@(posedge clk or negedge rst)
begin
if(!rst)
begin
ram <= '{default:257'd0};
end
else if(done_clk_gate)
begin
if(wr_en)
begin
ram[wr_ptr] <= data_in;
end
end
end
always_comb
begin
if(rd_en) //when empty,rd_en=0
begin
data_out = ram[rd_ptr]; //might add isolation cells if power gated
end
else
begin
data_out = 257'd0; //might add isolation cells if power gated
end
end
endmodule
fifo_cntlr.sv
//`include "fifo.sv"
module fifo_cntlr(rst,clk,start,FINon,EMPTY,FULL,rd_en,wr_en,wr_ptr,rd_ptr,done_clk_gate,start_in,
ADon,MSGon,FINon_in,ready,poff_rtl);
input rst,clk,start,FINon;
output EMPTY,FULL,rd_en,wr_en,done_clk_gate;
output logic start_in,ADon,MSGon,FINon_in,ready,poff_rtl;
output [3:0] wr_ptr,rd_ptr;
logic [3:0] wr_ptr,rd_ptr;
logic signed [4:0]diff;
logic clear;
//roll_over define
logic roll_over,n_roll_over;
assign diff = wr_ptr - rd_ptr ;
assign done_clk_gate = start;
assign poff_rtl = ready & (!start);
always_ff@(posedge clk or negedge rst)
begin
if(!rst)
roll_over <= 0;
else
roll_over <= n_roll_over;
end
always_comb
begin
//if((wr_ptr - rd_ptr < 0) && (roll_over == 0))
if((diff < 0) && (roll_over == 0))
begin
n_roll_over = 1;
end
//else if((wr_ptr - rd_ptr > 0) && (roll_over == 1))
else if((diff >= 0) && (roll_over == 1))
begin
n_roll_over = 0;
end
else
begin
n_roll_over = roll_over;
end
end
//EMPTY and FULL asynchr defines
logic EMPTY,FULL;
always_comb
begin
if(roll_over == 0)
begin
if(wr_ptr == rd_ptr)
EMPTY = 1;
else
EMPTY = 0;
if((wr_ptr - rd_ptr) == 15)
FULL = 1;
else
FULL = 0;
end
//roll_over==1
else
begin
EMPTY = 0;
if(wr_ptr == rd_ptr)
FULL = 1;
else
FULL = 0;
end
end
//write and read enable defines
logic wr_en,n_wr_en,rd_en,n_rd_en;
always_ff@(posedge clk or negedge rst)
begin
if(!rst)
begin
wr_en <= 0;
rd_en <= 0;
end
else
begin
wr_en <= n_wr_en;
rd_en <= n_rd_en;
end
end
always_comb
if(start == 0)
begin
n_wr_en = 0;
if(FULL == 1)
n_rd_en = 1;
else if(EMPTY == 1)
n_rd_en = 0;
else
n_rd_en = 1;
end
else
begin
if(FULL == 1)
begin
n_rd_en = 1;
n_wr_en = 0;
end
else if (EMPTY == 1)
begin
n_rd_en = 0;
n_wr_en = 1;
end
else
begin
n_rd_en = 1;
n_wr_en = 1;
end
end
//write pointer
always_ff@(posedge clk or negedge rst)
begin
if(!rst)
begin
wr_ptr <= 0;
end
else if(wr_en)
begin
wr_ptr <= wr_ptr + 1;
end
else if(clear == 1)
wr_ptr <= 0;
else
wr_ptr <= wr_ptr;
end
//read pointer
logic [1:0] state,n_state;
logic [3:0] cnt,n_cnt;
logic [3:0] n_rd_ptr;
logic flag_st,n_flag_st;
always_ff@(posedge clk,negedge rst)
begin
if(!rst)
begin
cnt <= 0;
state <= 0;
rd_ptr <= 0;
flag_st <= 0;
end
else
begin
cnt <= n_cnt;
state <= n_state;
rd_ptr <= n_rd_ptr;
flag_st <= n_flag_st;
end
end
always_comb
begin
n_flag_st = 0;
clear = 0;
ready = 0;
case(state)
2'b00:
begin
if(flag_st == 1)
begin
{start_in,ADon,MSGon,FINon_in} = 4'b0000;
n_cnt = cnt;
n_state = state;
n_rd_ptr = rd_ptr;
//done_clk_gate = 0;
end
else if(cnt <= 8 && rd_en == 1)
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1000;
n_cnt = cnt + 1;
n_state = state;
n_rd_ptr = rd_ptr;
//if(cnt <= 3)
// done_clk_gate = 0;
//else
// done_clk_gate = 1;
end
else if(rd_en == 1)
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1000;
n_cnt = 0;
n_state = state + 1;
n_rd_ptr = rd_ptr + 1;
//done_clk_gate = 1;
end
else
begin
{start_in,ADon,MSGon,FINon_in} = 4'b0000;
clear = 1;
n_cnt = 0;
n_state = state;
//n_rd_ptr = rd_ptr;
n_rd_ptr = 0;
//done_clk_gate = 0;
ready = 1;
end
end
2'b01:
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1100;
//done_clk_gate = 1;
if(cnt <= 0 && rd_en == 1)
begin
n_cnt = cnt + 1;
n_state = state;
n_rd_ptr = rd_ptr + 1;
end
else if(rd_en == 1)
begin
n_cnt = 0;
n_state = state + 2;
n_rd_ptr = rd_ptr + 1;
end
else
begin
n_cnt = cnt;
n_state = state;
n_rd_ptr = rd_ptr;
end
end
2'b11:
begin
//done_clk_gate = 1;
n_cnt = 0;
if((FINon==0) && (rd_en == 1))
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1110;
n_state = state;
n_rd_ptr = rd_ptr + 1;
end
else if(rd_en == 1)
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1111;
n_state = state - 1;
n_rd_ptr = rd_ptr + 1;
end
else
begin
{start_in,ADon,MSGon,FINon_in} = 4'b1110;
n_state = state;
n_rd_ptr = rd_ptr;
end
end
2'b10:
begin
//done_clk_gate = 1;
{start_in,ADon,MSGon,FINon_in} = 4'b1111;
if(cnt <= 5 && rd_en == 1)
begin
n_cnt = cnt + 1;
n_state = state;
n_rd_ptr = rd_ptr;
end
else if(rd_en == 1)
begin
n_cnt = 0;
n_state = state + 2;
// if(EMPTY == 1)
n_rd_ptr = rd_ptr + 1;
// else
// n_rd_ptr = rd_ptr;
end
else
begin
n_cnt = cnt;
n_state = state;
n_rd_ptr = rd_ptr;
end
if(cnt)
n_flag_st = 1;
else
n_flag_st = 0;
end
endcase
end
endmodule
encryption.sv
`include "pckg.sv"
//`include "port.sv"
//`include "top_rtl.sv"
//`include "layrd_test.sv"
`include "controller.sv"
`include "initialization.sv"
`include "aes128.sv"
`include "stateupdate.sv"
`include "datapath.sv"
module encryption(
input [127:0] inbus1, inbus2,
input rst,clk,start,ADon,MSGon,FINon,poff_rtl,
output [127:0] c2,c1 , //c2 for crypted/0 and c1 for crypted/T
output tagout,done,poff_init
);
//module top_rtl(port prt);
wire [1023:0] state_in;
wire FINon_tmp;
wire [127:0] ADMSGlen_tmp;
wire start_tmp,start_tmp2;
controller y0( .rst(rst), .clk(clk), .start(start), .ADon(ADon), .MSGon(MSGon), .FINon(FINon),
.start2(start_tmp), .start3(start_tmp2),.tagout(tagout), .done(done), .FINon2(FINon_tmp),
.ADMSG_len(ADMSGlen_tmp) );
initialization y1(.rst(rst), .clk(clk), .Key(inbus1), .Nonce(inbus2), .state_init(state_in) );
datapath y2( .rst(rst), .clk(clk), .FINon(FINon), .FINon2(FINon_tmp), .start2(start_tmp),
.tagout(tagout), .Msg1(inbus1), .Msg2(inbus2), .ADMSG_len(ADMSGlen_tmp),
.state_init(state_in), .cg_start(start), .enc_msg1(c1), .enc_msg2(c2) );
assign poff_init = poff_rtl | start_tmp2 ;
//assign FINon2 = FINon_tmp;
//assign prt.tagout = tagout_tmp;
//assign prt.done = done_tmp;
endmodule
initialization.sv
module initialization(
input rst,clk,
input [127:0] Key, Nonce,//bus128_in1 for top module
output bus1024 state_init
);
//store256 cnst;
parameter cnst_MS128 =
128'h00_01_01_02_03_05_08_0d_15_22_37_59_90_e9_79_62;
parameter cnst_LS128 = 128'hdb_3d_18_55_6d_c2_2f_f1_20_11_31_42_73_b5_28_dd;
always_comb
begin
state_init.bus0 = Key^Nonce;
state_init.bus1 = cnst_LS128;
state_init.bus2 = cnst_MS128;
state_init.bus3 = cnst_LS128;
state_init.bus4 = Key^Nonce;
state_init.bus5 = Key^cnst_MS128;
state_init.bus6 = Key^cnst_LS128;
state_init.bus7 = Key^cnst_MS128;
end
endmodule
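The initial-state wiring above can be cross-checked with a small software model. The following is a minimal sketch, assuming the fields bus0 to bus7 of the bus1024 type (defined in pckg.sv, which is not reproduced here) correspond to list indices 0 to 7; the function name initial_state is hypothetical and the code mirrors only this listing, not the AEGIS specification text.

```python
from typing import List

MASK128 = (1 << 128) - 1

# Constants copied from the listing (cnst_MS128 and cnst_LS128).
CNST_MS128 = 0x000101020305080D1522375990E97962
CNST_LS128 = 0xDB3D18556DC22FF12011314273B528DD


def initial_state(key: int, nonce: int) -> List[int]:
    """Return [bus0, ..., bus7] as 128-bit integers, mirroring the always_comb block."""
    key &= MASK128
    nonce &= MASK128
    return [
        key ^ nonce,       # bus0
        CNST_LS128,        # bus1
        CNST_MS128,        # bus2
        CNST_LS128,        # bus3
        key ^ nonce,       # bus4
        key ^ CNST_MS128,  # bus5
        key ^ CNST_LS128,  # bus6
        key ^ CNST_MS128,  # bus7
    ]
```

Such a model is only useful as a reference for the scoreboard; the RTL remains the authoritative description.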
controller.sv
module controller(
input rst,clk,start,ADon,MSGon,FINon,
output logic start2,start3,tagout,done,FINon2,
output logic [127:0] ADMSG_len
);
logic n_start2,n_start3,n_FINon2;
logic [63:0] adlen,n_adlen,msglen,n_msglen;
//reusing unused bits of adlen to avoid extra flops, msglen is also 64 bits
logic [2:0] counter,n_counter; // to count 7 cycles to set 'done' on receiving FINon on signal
assign ADMSG_len = {adlen,msglen}; //concatenating adlen and msglen to give 128 bits
assign tagout = (counter == 5 | counter == 6 ); //for output mux select
assign done = (counter == 6); //for completion signal
assign adlen[63:2] = 0;
assign adlen[0] = 0;
assign msglen[63:11] = 0;
//sequential block
always_ff@(posedge clk,negedge rst)
begin
if(!rst)
begin
start2 <= 0;
start3 <= 0;
FINon2 <= 0;
adlen[1] <= 0;
msglen[10:0] <= 0;
counter <= 0;
end
else
begin
start2 <= n_start2;
start3 <= n_start3;
//start2 <= start;
//start3 <= start2;
//FINon2 <= FINon;
FINon2 <= n_FINon2;
adlen[1] <= n_adlen[1];
msglen[10:0] <= n_msglen[10:0];
counter <= n_counter;
end
end
//combinatorial block
always_comb
begin
if(!start)
begin
n_start2 = 0;
n_start3 = 0;
n_FINon2 = 0;
n_adlen[1] = 0;
n_msglen[10:0] = 0;
n_counter = 0;
end
else
begin
n_start2 = start;
n_start3 = start2;
n_FINon2 = FINon;
n_adlen[1] = (adlen[1] | ADon) && start;
if(MSGon & !FINon)
n_msglen[10:0] = msglen[10:0] + 1;
else
n_msglen[10:0] = msglen[10:0];
if(FINon)
n_counter = counter+1;
else
n_counter = counter;
end
end
endmodule
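The way the controller builds ADMSG_len can be illustrated in software. This is a minimal sketch under stated assumptions: the function name pack_admsg_len and its arguments are hypothetical, and it models only what the listing shows, namely that adlen[1] is the single stored bit of adlen and that msglen keeps 11 bits.

```python
def pack_admsg_len(ad_seen: bool, msg_blocks: int) -> int:
    """Build the 128-bit {adlen, msglen} word the way the controller wires it."""
    # Only adlen[1] is a flop; every other adlen bit is tied to 0.
    adlen = 0b10 if ad_seen else 0
    # Only msglen[10:0] is stored, so the counter wraps at 2048.
    msglen = msg_blocks & 0x7FF
    # adlen occupies the upper 64 bits, msglen the lower 64 bits.
    return (adlen << 64) | msglen
```

Note that a count of exactly 2048 wraps to 0 in an 11-bit field, which is the count value used in the layered testbench.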
datapath.sv
module datapath(
input rst,clk,FINon,FINon2,start2,tagout,
input [127:0] Msg1,Msg2,ADMSG_len,
input bus1024 state_init,
input cg_start,
output logic [127:0] enc_msg1,enc_msg2
);
logic [127:0] tmp,n_tmp,A,B;
bus1024 state,n_state,state_tmp;
//assign tmp = state.bus2 ^ ADMSG_len;
//comb logic/xor
stateupdate ins(state_tmp,A,B,n_state);
always_ff@(posedge clk, negedge rst)
begin
if(!rst)
state <= 0;
else //if(cg_start)
state <= n_state;
end
always_ff@(posedge clk, negedge rst)
begin
if(!rst)
tmp <= 0;
else
tmp <= n_tmp;
end
always_comb
begin
if(!FINon2)
n_tmp = state.bus2 ^ ADMSG_len;
else
n_tmp = tmp;
//mux1
if(FINon2)
begin
A = tmp;
B = tmp;
end
else
begin
A = Msg1;
B = Msg2;
end
//mux2
if(start2 & cg_start)
begin
state_tmp = state;
end
else
begin
state_tmp = state_init;
end
//output muxes
if(tagout)
begin
enc_msg2 = 0;
enc_msg1 = state.bus0 ^ state.bus1 ^ state.bus2 ^ state.bus3 ^ state.bus4 ^
state.bus5 ^ state.bus6;
end
else
begin
enc_msg2 = Msg2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
enc_msg1 = Msg1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
end
end
endmodule
stateupdate.sv
module stateupdate
(
input bus1024 S,
input [127:0] A,B,
output bus1024 S_updted
);
logic [127:0] temp1,temp2;
assign temp1 = S.bus0 ^ A;
assign temp2 = S.bus4 ^ B;
aes128 x0(S.bus7,temp1,S_updted.bus0);
aes128 x1(S.bus0,S.bus1,S_updted.bus1);
aes128 x2(S.bus1,S.bus2,S_updted.bus2);
aes128 x3(S.bus2,S.bus3,S_updted.bus3);
aes128 x4(S.bus3,temp2,S_updted.bus4);
aes128 x5(S.bus4,S.bus5,S_updted.bus5);
aes128 x6(S.bus5,S.bus6,S_updted.bus6);
aes128 x7(S.bus6,S.bus7,S_updted.bus7);
endmodule
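The wiring of the eight aes128 instances follows one pattern: each output word is the AES round of the previous state word XORed with the current one, with A and B folded into words 0 and 4. A minimal sketch of that pattern follows, assuming list indices 0 to 7 stand for bus0 to bus7 and that the AES round function is supplied by the caller (aes_round is a placeholder argument; the sbox and mix_col functions live in pckg.sv and are not reproduced here).

```python
from typing import Callable, List


def state_update(s: List[int], a: int, b: int,
                 aes_round: Callable[[int], int]) -> List[int]:
    """Mirror stateupdate.sv: out[i] = aes_round(s[i-1]) ^ s[i], with a and b mixed in."""
    assert len(s) == 8
    out = []
    for i in range(8):
        operand = s[i]
        if i == 0:
            operand ^= a      # temp1 = S.bus0 ^ A
        elif i == 4:
            operand ^= b      # temp2 = S.bus4 ^ B
        # For i == 0, s[i - 1] is s[7], matching instance x0.
        out.append(aes_round(s[i - 1]) ^ operand)
    return out
```

Passing an identity function as aes_round is enough to check the wiring independently of the AES round itself.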
aes128.sv //[11]
module aes128(
input [127:0] A,B,
output [127:0] C
);
wire [7:0] A_matrix[0:3][0:3];
wire [7:0] A_sub[0:3][0:3];
wire [7:0] A_srow[0:3][0:3];
wire [7:0] A_mcol[0:3][0:3];
wire [127:0] A_out;
assign A_matrix[0][0] = A[127:120];
assign A_matrix[1][0] = A[119:112];
assign A_matrix[2][0] = A[111:104];
assign A_matrix[3][0] = A[103:96];
assign A_matrix[0][1] = A[95:88];
assign A_matrix[1][1] = A[87:80];
assign A_matrix[2][1] = A[79:72];
assign A_matrix[3][1] = A[71:64];
assign A_matrix[0][2] = A[63:56];
assign A_matrix[1][2] = A[55:48];
assign A_matrix[2][2] = A[47:40];
assign A_matrix[3][2] = A[39:32];
assign A_matrix[0][3] = A[31:24];
assign A_matrix[1][3] = A[23:16];
assign A_matrix[2][3] = A[15:8];
assign A_matrix[3][3] = A[7:0];
assign A_sub[0][0] = sbox(A_matrix[0][0]);
assign A_sub[0][1] = sbox(A_matrix[0][1]);
assign A_sub[0][2] = sbox(A_matrix[0][2]);
assign A_sub[0][3] = sbox(A_matrix[0][3]);
assign A_sub[1][0] = sbox(A_matrix[1][0]);
assign A_sub[1][1] = sbox(A_matrix[1][1]);
assign A_sub[1][2] = sbox(A_matrix[1][2]);
assign A_sub[1][3] = sbox(A_matrix[1][3]);
assign A_sub[2][0] = sbox(A_matrix[2][0]);
assign A_sub[2][1] = sbox(A_matrix[2][1]);
assign A_sub[2][2] = sbox(A_matrix[2][2]);
assign A_sub[2][3] = sbox(A_matrix[2][3]);
assign A_sub[3][0] = sbox(A_matrix[3][0]);
assign A_sub[3][1] = sbox(A_matrix[3][1]);
assign A_sub[3][2] = sbox(A_matrix[3][2]);
assign A_sub[3][3] = sbox(A_matrix[3][3]);
assign A_srow[0][0] = A_sub[0][0];
assign A_srow[0][1] = A_sub[0][1];
assign A_srow[0][2] = A_sub[0][2];
assign A_srow[0][3] = A_sub[0][3];
assign A_srow[1][0] = A_sub[1][1];
assign A_srow[1][1] = A_sub[1][2];
assign A_srow[1][2] = A_sub[1][3];
assign A_srow[1][3] = A_sub[1][0];
assign A_srow[2][0] = A_sub[2][2];
assign A_srow[2][1] = A_sub[2][3];
assign A_srow[2][2] = A_sub[2][0];
assign A_srow[2][3] = A_sub[2][1];
assign A_srow[3][0] = A_sub[3][3];
assign A_srow[3][1] = A_sub[3][0];
assign A_srow[3][2] = A_sub[3][1];
assign A_srow[3][3] = A_sub[3][2];
assign {A_mcol[0][0],A_mcol[1][0],A_mcol[2][0],A_mcol[3][0]} =
mix_col(A_srow[0][0],A_srow[1][0],A_srow[2][0],A_srow[3][0]);
assign {A_mcol[0][1],A_mcol[1][1],A_mcol[2][1],A_mcol[3][1]} =
mix_col(A_srow[0][1],A_srow[1][1],A_srow[2][1],A_srow[3][1]);
assign {A_mcol[0][2],A_mcol[1][2],A_mcol[2][2],A_mcol[3][2]} =
mix_col(A_srow[0][2],A_srow[1][2],A_srow[2][2],A_srow[3][2]);
assign {A_mcol[0][3],A_mcol[1][3],A_mcol[2][3],A_mcol[3][3]} =
mix_col(A_srow[0][3],A_srow[1][3],A_srow[2][3],A_srow[3][3]);
assign A_out = {A_mcol[0][0],A_mcol[1][0],A_mcol[2][0],A_mcol[3][0],
A_mcol[0][1],A_mcol[1][1],A_mcol[2][1],A_mcol[3][1],
A_mcol[0][2],A_mcol[1][2],A_mcol[2][2],A_mcol[3][2],
A_mcol[0][3],A_mcol[1][3],A_mcol[2][3],A_mcol[3][3]} ;
assign C = A_out ^ B;
endmodule
APPENDIX B: AEGIS128L VERIFICATION ENVIRONMENT
Layered Testbench:
program automatic layrd_test(port prt);
//initial $monitor("state=%h\n",rtl_inst.y2.state);
class environment;
class transaction;
logic tagout,done,start;
bus128 c1,c2;
rand logic[255:0] gen_data_in;
logic FINon;
constraint rndm
{
gen_data_in[255:128] > 0;
gen_data_in[127:0] > 0;
}
constraint assoc_data
{
gen_data_in[255:128] > 0;
gen_data_in[127:0] inside {0};
}
constraint ending
{
gen_data_in[255:128] inside {0};
gen_data_in[127:0] inside {0};
}
endclass:transaction
class generator;
transaction t;
mailbox #(transaction) gen2agt;
//static bus128 nonce=0;
bus128 nonce=1;
function new(input mailbox #(transaction) gen2agt);
this.gen2agt = gen2agt;
//nonce = nonce+1;
endfunction:new
task run(input int count,msgs);
//static bus128 nonce;
//nonce=0;
bus128 x,y;
repeat(msgs)
begin
t=new();
t.constraint_mode(0);
t.rndm.constraint_mode(1);
assert(t.randomize());
t.start = 1;
t.FINon=0;
t.gen_data_in[127:0]=nonce;
nonce=nonce+1;
repeat(2)
//repeat(8)
begin
gen2agt.put(t);
end
t=new();
t.constraint_mode(0);
t.rndm.constraint_mode(1);
assert(t.randomize());
t.start = 1;
t.FINon=0;
gen2agt.put(t);
t=new();
t.constraint_mode(0);
t.assoc_data.constraint_mode(1);
assert(t.randomize());
t.start = 1;
t.FINon=0;
gen2agt.put(t);
repeat(count)
begin
t=new();
t.constraint_mode(0);
t.rndm.constraint_mode(1);
assert(t.randomize());
t.start = 1;
t.FINon=0;
gen2agt.put(t);
end
t=new();
t.constraint_mode(0);
t.ending.constraint_mode(1);
assert(t.randomize());
t.start = 1;
t.FINon=1;
gen2agt.put(t);
repeat(21)
begin
t=new();
t.constraint_mode(0);
t.rndm.constraint_mode(1);
assert(t.randomize());
t.start = 0;
t.FINon=0;
t.gen_data_in=0;
gen2agt.put(t);
end
end
endtask:run
endclass:generator
class agent;
bus128 inbus1_scr[*],inbus2_scr[*];
logic start_scr[*],FINon_scr[*];
mailbox #(transaction) gen2agt,agt2drv;
transaction t;
function new(input mailbox # (transaction) gen2agt,agt2drv);
// function new(input mailbox # (transaction) gen2agt);
this.gen2agt = gen2agt;
this.agt2drv = agt2drv;
endfunction:new
task run(input int count,msgs,output bus128 inbus1_scr[*],inbus2_scr[*],output
logic start_scr[*],FINon_scr[*]);
int i,fg=0;
i = (count + 26)*msgs;
for (int k=0; k < i; k=k+1)
begin
gen2agt.get(t);
//$display("inbus1=%d,,inbus2=%d,,start=%d,,FINon=%d",t.gen_data_in[255:128],t.gen_data_in[127:0],t.start,t.FINon);
{FINon_scr[k],inbus1_scr[k],inbus2_scr[k]} =
{t.FINon,t.gen_data_in};
start_scr[k] = t.start;
//FINon[k] = t.FINon;
agt2drv.put(t);
end
endtask:run
endclass:agent
class driver;
mailbox #(transaction) agt2drv;
transaction t;
function new(input mailbox # (transaction) agt2drv);
this.agt2drv = agt2drv;
endfunction:new
task run (input int count,msgs);
int i;
i = (count + 26)*msgs;
//agt2drv.get(t);
repeat(i)
begin
agt2drv.get(t);
//$display("inbus1=%d,,inbus2=%d\n",t.gen_inbus1,t.gen_inbus2);
prt.ck.data_in <= {t.FINon,t.gen_data_in};
prt.ck.start <= t.start;
@prt.ck;
//$display($time,"state_init=%h\nstate=%h\nin1=%d\nin2=%d",rtl_inst.z3.y2.state_init,rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.Msg1,rtl_inst.z3.y2.Msg2);
end
endtask:run
endclass:driver
class monitor;
transaction t;
mailbox #(transaction) mon2chk;
function new(ref mailbox #(transaction) mon2chk);
this.mon2chk = mon2chk;
endfunction:new
task run (input int count,msgs);
int i;
i = (count + 26)*msgs; //can be 20 or 21
t=new();
//$display("state=%h\nin1=%d\nin2=%d",rtl_inst.y2.state,rtl_inst.y2.Msg1,rtl_inst.y2.Msg2);
//@prt.ck i removed
repeat(i)
begin
// $display("state=%h\nin1=%d\nin2=%d",rtl_inst.y2.state,rtl_inst.y2.Msg1,rtl_inst.y2.Msg2);
@prt.ck
t.c2 <= prt.ck.c2;
t.c1 <= prt.ck.c1;
t.tagout <= prt.ck.tagout;
t.done <= prt.ck.done;
mon2chk.put(t);
//$display($time," start2=%d Msg1=%h\nMsg2=%h\nst_n=%h\nst_init=%h\ncg_start=%d\nstate=%h\nstate_tmp=%h\nenc_m1=%h\nenc_m2=%h\n\n", rtl_inst.z3.y2.start2,rtl_inst.z3.y2.Msg1,rtl_inst.z3.y2.Msg2,rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.state_init,rtl_inst.z3.y2.cg_start,rtl_inst.z3.y2.state,rtl_inst.z3.y2.state_tmp,rtl_inst.z3.y2.enc_msg1,rtl_inst.z3.y2.enc_msg2);
//$display($time," start=%d,ADon=%d,MSGon=%d,FINon=%d,tagout=%d,done=%d\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\n",rtl_inst.z3.start,rtl_inst.z3.ADon,rtl_inst.z3.MSGon,rtl_inst.z3.FINon,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2);
//$display($time," start=%d,FINon=%d,tagout=%d,done=%d\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\nn_state=%h\n\n",rtl_inst.z3.start,rtl_inst.z3.FINon,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2,rtl_inst.z3.y2.n_state);
//$display($time," start=%d,FINon=%d,tagout=%d,done=%d\nn_state=%h\n\nstate=%h\ninbus1=%h\ninbus2=%h\nc1=%h\nc2=%h\n\n",rtl_inst.z2.start,rtl_inst.z1.infin,rtl_inst.z3.tagout,rtl_inst.z3.done,rtl_inst.z3.y2.n_state,rtl_inst.z3.y2.state,rtl_inst.z3.inbus1,rtl_inst.z3.inbus2,rtl_inst.z3.c1,rtl_inst.z3.c2);
end
endtask:run
endclass:monitor
class scoreboard;
transaction t;
task run(input int count,msgs,input bus128 inbus1_scr[*],inbus2_scr[*],input
logic start[*],FINon_scr[*],output bus128 c1_scr[*],c2_scr[*],output logic
tagout_scr[*],done_scr[*],output bus1024 state1[*]);
int i,k,m=0,n,p,l,o,q,t,u,num,FINon2;
logic Fin2=0;
for(int d=0;d<=msgs-1;d++)
begin
// p=i/msgs;
for(k=1; k<=count+25; k++)
begin
if(k==1) num = (msgs*(count+26)) +26+1;
l = (k==2);
m = ((k>=3) && (k<=11));
n = ((k>=12) && (k<=13));
o = ((k>=14) && (k<14+count));
q = ((k>=14+count) && (k<14+count+7));
FINon2 = ((k>14+count) && (k<14+count+7));
t = ((k>=14+count+5) && (k<=14+count+6));
u = ((k == 14+count+6));
if(m) num=1+(count+26)*d;
else if(n|o) num=(k+(count+26)*d-10);
else if(q) num=4+count+(count+26)*d;
if(l) num=1+(count+26)*d;
i = k + (count + 26)*d;
// $display("i=%d count=%d msgs=%d k=%d m=%d n=%d",i,count,msgs,k,m,n);
expected(FINon2,l,p,d,k,m,n,o,q,t,u,inbus1_scr[num],inbus2_scr[num],start_scr[i],FINon_scr[i],Fin2,state1[i],c1_scr[i],c2_scr[i],tagout_scr[i],done_scr[i]);
Fin2 = FINon_scr[k];
//$display("c1=%h\nc2=%h\nin1=%h\nin2=%h\nstart=%d done=%d d=%d k=%d\nstate=%h\n-----\n",c1_scr[i],c2_scr[i],inbus1_scr[num],inbus2_scr[num],start_scr[i],done_scr[i],d,k,state1[i]);
end
end
endtask
task static expected(input int FINon2,l,p,d,k,m,n,o,q,t,u,input bus128
inbus1,inbus2,input logic start,FINon,Fin2,output bus1024 state, output bus128 c1,c2,output logic
tagout,done);
bus1024 state1;
logic [10:0] msglenn = 0;
bus128 tmp;
static logic cnt;
logic FINon2;
store256 cnst=256'h00_01_01_02_03_05_08_0d_15_22_37_59_90_e9_79_62_db_3d_18_55_6d_c2_2f_f1_20_11_31_42_73_b5_28_dd;
//$display("e_state=%h",state);
if(l==1)
begin
done=0;
state.bus0 = inbus1^inbus2;
state.bus1 = cnst.LS128;
state.bus2 = cnst.MS128;
state.bus3 = cnst.LS128;
state.bus4 = inbus1^inbus2;
state.bus5 = inbus1^cnst.MS128;
state.bus6 = inbus1^cnst.LS128;
state.bus7 = inbus1^cnst.MS128;
c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
stateupdate(state,inbus1,inbus2,state1);
state = state1;
end
else if(m==1 | n==1)
begin
c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
stateupdate(state,inbus1,inbus2,state1);
state = state1;
done=0;
end
else if(o==1)
begin
c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
stateupdate(state,inbus1,inbus2,state1);
state = state1;
tagout = 1'b0;
done = 1'b0;
end
else if(q==1)
begin
msglenn = count;
tmp = state.bus2 ^ {62'd0,1'b1,1'b0,53'd0,msglenn};
c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
if(u==1)
begin
done = 1'b1;
c2 = 0;
c1 = state.bus0 ^ state.bus1 ^ state.bus2 ^ state.bus3 ^ state.bus4 ^ state.bus5 ^ state.bus6;
end
else
done = 1'b0;
if(FINon2)
begin
stateupdate(state,tmp,tmp,state1);
state = state1;
end
else
begin
stateupdate(state,inbus1,inbus2,state1);
state = state1;
end
if(t==1) begin
tagout = 1'b1;
end
end
else
begin
c2 = inbus2 ^ state.bus1 ^ state.bus6 ^ (state.bus2 & state.bus3);
c1 = inbus1 ^ state.bus2 ^ state.bus5 ^ (state.bus6 & state.bus7);
end
//$display("l=%d m=%d n=%d o=%d q=%d t=%d u=%d done=%d\n\n",l,m,n,o,q,t,u,done);
//$display("c1=%h\nc2=%h\nin1=%h\nin2=%h\nstart=%d done=%d d=%d k=%d\nstate=%h\n-----\n",c1,c2,inbus1,inbus2,start,done,d,k,state);
//$display("e_state=%h",state);
endtask:expected
endclass:scoreboard
class checker;
int counter,error;
transaction t;
mailbox # (transaction) mon2chk;
function new (input mailbox #(transaction) mon2chk);
this.mon2chk = mon2chk;
endfunction
task run(input int count,msgs,input bus128 inbus1_scr[*],inbus2_scr[*],input
logic start_scr[*],FINon_scr[*],input bus128 c1_scr[*],c2_scr[*],input logic
tagout_scr[*],done_scr[*],input bus1024 state1[*]);
int i,k,p;
i = (count + 26)*msgs;
p=i/msgs;
for(k=0;k<i;k++)
begin
@prt.ck; //???ok
mon2chk.get(t);
if(k%p==2)
$display("key=%h nonce=%h\n",rtl_inst.z3.inbus1,rtl_inst.z3.inbus2);
if(k%p == (14+count+6))
begin
if(rtl_inst.z3.c1 === c1_scr[k] && rtl_inst.z3.c2 === c2_scr[k])
begin
counter = counter+1;
$display("authentication_tag=%h\ne_authentication_tag=%h\n exp_done=%d done=%d \n",rtl_inst.z3.c1,c1_scr[k],done_scr[k],rtl_inst.z3.done);
end
else
begin
counter = counter+1;
error = error+1;
$display("authentication_tag=%h\ne_authentication_tag=%h\n exp_done=%d done=%d error \n",rtl_inst.z3.c1,c1_scr[k],done_scr[k],rtl_inst.z3.done);
end
end
if((k%p > 13 && k%p < 14+count)) // | k%p == 22)
begin
if(rtl_inst.z3.c1 === c1_scr[k] && rtl_inst.z3.c2 === c2_scr[k])
begin
counter = counter+1;
$display("in2=%h in1=%h\nc2=%h c1=%h\ne_c2=%h e_c1=%h\n exp_done=%d done=%d \n",rtl_inst.z3.inbus2,rtl_inst.z3.inbus1,rtl_inst.z3.c2,rtl_inst.z3.c1,c2_scr[k],c1_scr[k],done_scr[k],rtl_inst.z3.done);
end
else
begin
counter = counter+1;
error = error+1;
$display("in2=%h in1=%h\nc2=%h c1=%h\ne_c2=%h e_c1=%h\n exp_done=%d done=%d error \n",rtl_inst.z3.inbus2,rtl_inst.z3.inbus1,rtl_inst.z3.c2,rtl_inst.z3.c1,c2_scr[k],c1_scr[k],done_scr[k],rtl_inst.z3.done);
end
end
if(k%(count+26) == 0) $display("----------\nStarting next msg to encrypt\n");
end
$display("Total Errors = %5d \n",error);
//@prt.ck; ///?????
endtask:run
endclass:checker
generator gen;
agent agt;
driver drv;
monitor mon;
checker chk;
scoreboard scb;
mailbox #(transaction) gen2agt,agt2drv,mon2chk;
// mailbox #(transaction) gen2agt,agt2drv;
function void build();
gen2agt = new;
agt2drv = new;
mon2chk = new;
gen = new(gen2agt);
agt = new(gen2agt,agt2drv);
//agt = new(gen2agt);
drv = new(agt2drv);
//mon = new(); //mhp
mon = new(mon2chk);
chk = new(mon2chk);
scb = new();
endfunction:build
task run();
fork
gen.run(count,msgs);
agt.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr);
drv.run(count,msgs);
mon.run(count,msgs);
scb.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr,c1_scr,c2_scr,tagout_scr,done_scr,state1);
chk.run(count,msgs,inbus1_scr,inbus2_scr,start_scr,FINon_scr,c1_scr,c2_scr,tagout_scr,done_scr,state1);
join
endtask:run
endclass:environment
//coverage
covergroup Cov;
coverpoint prt.data_in[255:128] {
bins lowest = {[0:2**32]};
bins medlow = {[2**32:2**64]};
bins medhigh = {[2**64:2**96]};
bins highest = {[2**96:2**128]};
option.at_least = 2000; }
coverpoint prt.data_in[127:0] {
bins lowest = {[0:2**32]};
bins medlow = {[2**32:2**64]};
bins medhigh = {[2**64:2**96]};
bins highest = {[2**96:2**128]};
option.at_least = 2000; }
endgroup
//
environment env;
real cov_num;
Cov cv;
int count;
int msgs;
bus128 inbus1_scr[*],inbus2_scr[*];
bus128 inbus1_temp,inbus2_temp;
bus128 c1_scr[*],c2_scr[*];
bus1024 state1[*];
logic start_scr[*],FINon_scr[*],tagout_scr[*],done_scr[*];
initial
begin
env = new();
//count=1;
//count= 1 + $unsigned($random)%15;
count= 2048;
//msgs = $unsigned($random)%55;
msgs = 20;
$display("each message has total =%d 256 bits , total msgs = %d",count,msgs);
fork
begin
env.build();
env.run();
end
join_any
cv=new();
for(int iter=0;iter<(msgs*(count+26)); iter++)
begin
inbus1_temp = inbus1_scr[iter];
inbus2_temp = inbus2_scr[iter];
cv.sample();
end
repeat(msgs*(count+26)+12) @prt.ck;
cov_num = $get_coverage;
$display("\nfunctional coverage = %f\n",cov_num);
end
endprogram
//endmodule
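The agent, driver, monitor and checker in the layered testbench all iterate (count + 26)*msgs times. That figure follows from the generator's run task, which emits a fixed number of transactions around the count payload transactions of each message. A minimal sketch of the bookkeeping, with hypothetical function names:

```python
def transactions_per_message(count: int) -> int:
    """Count the transactions the generator puts in the mailbox for one message."""
    key_nonce = 2      # repeat(2) of the key/nonce transaction
    extra_random = 1   # one further unconstrained random transaction
    assoc_data = 1     # one transaction under the assoc_data constraint
    payload = count    # repeat(count) random message transactions
    ending = 1         # one transaction with FINon = 1
    idle = 21          # repeat(21) transactions with start = 0
    return key_nonce + extra_random + assoc_data + payload + ending + idle


def total_transactions(count: int, msgs: int) -> int:
    """Loop bound used by the agent, driver, monitor and checker."""
    return transactions_per_message(count) * msgs
```

With the values set in the initial block (count = 2048, msgs = 20), each message takes 2074 transactions and the whole run takes 41480.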
Top.sv
`include "pckg.sv"
`include "port.sv"
`include "top_rtl.sv"
`include "layrd_test.sv"
`include "fifo.sv"
`include "fifo_cntlr.sv"
`include "controller.sv"
`include "initialization.sv"
`include "aes128.sv"
`include "stateupdate.sv"
`include "datapath.sv"
`include "encryption.sv"
//`include "controller.sv"
//`include "initialization.sv"
//`include "aes128.sv"
//`include "stateupdate.sv"
//`include "datapath.sv"
module top;
bit clk=0,rst=1;
initial
begin
#1 rst = 0;
#1 rst = 1;
end
always
#5 clk = ~clk;
port prt(clk,rst);
top_rtl rtl_inst(prt.top_rtl1);
layrd_test tb_inst(prt.layrd_test1);
endmodule
test.sv //regular testbench used to generate the SAIF files
//`include "top_rtl.sv"
`include "top_rtl_synthesized.v"
//`include "saed90final.v"
module test;
logic clk,rst,start;
logic [256:0] data_in;
wire [127:0] c1,c2;
wire tagout,done;
logic [127:0] inb1,inb2;
top_rtl w1(rst,clk,start,data_in,c1,c2,tagout,done);
//initial
//$vcdpluson;
always
#16 clk = ~clk;
initial
begin
//$set_gate_level_monitoring("ON");
$set_toggle_region(test.w1);
$toggle_start();
clk=0;
//$monitor($time,"st=%d AD=%d MSG=%d FI=%d roll=%d start=%d data_out=%d EMPTY=%d FULL=%d wr_ptr=%d rd_ptr=%d rd_en=%d wr_en=%d cnt=%d state=%d fin=%d done_clk_gate=%d rdy=%d",start_in,ADon,MSGon,FINon_in,x2.roll_over,start,data_out[9:0],EMPTY,FULL,x2.wr_ptr,x2.rd_ptr,x2.rd_en,x2.wr_en,x2.cnt,x2.state,x2.FINon,x2.done_clk_gate,ready);
//$monitor($time,"st=%d AD=%d MSG=%d FI=%d start=%d data_out=%d EMPTY=%d FULL=%d state=%d cnt=%d done_clk_gate=%d rdy=%d",start_in,ADon,MSGon,FINon_in,start,data_out[9:0],EMPTY,FULL,x2.state,x2.cnt,x2.done_clk_gate,ready);
$monitor($time,"in1=%d in2=%d\nst=%d c1=%d c2=%d\n tg=%d done=%din_in1=%d in_in2=%d\nin_st=%d in_ad=%d in_msg=%d in_fin=%d\n",data_in[255:128],data_in[127:0],start,c1,c2,tagout,done,w1.z1.data_out[255:128],w1.z1.data_out[127:0],w1.z2.start_in,w1.z2.ADon,w1.z2.MSGon,w1.z2.FINon_in);
rst=0;
#1 rst=0;
#1 rst=1;// data_in={256'd3333,1'b1,1'b0,1'b0,1'b0};
#1 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};
#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};
#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};
#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};
#32 {start,data_in}={1'b1,1'b0,128'd23305380004405776739468075089964426475,128'd82674347397392953226878434787120253260};
#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};
#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};
#640
for (int i=1; i<256; i++)
begin
#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};
#32 {start,data_in}={1'b1,1'b0,128'd12630567003293860855109576201165734308,128'd2};
#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};
#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};
for(int j=1;j<=i;j++)
begin
inb1 = 1*j;
inb2 = 2*j;
#32 {start,data_in}={1'b1,1'b0,inb1,inb2};
end
#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};
#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};
#640;
end
for (int i=1; i<525; i++)
begin
#32 {start,data_in}={1'b1,1'b0,inb1,{i,i,i,i}};
#32 {start,data_in}={1'b1,1'b0,inb1,{i,i,i,i}};
#32 {start,data_in}={1'b1,1'b0,128'd130482792299249650780958843169121197881,128'd151199630763829394216724218405483397577};
#32 {start,data_in}={1'b1,1'b0,128'd172715115316271502967306500513779031842,128'd0};
for(int j=1;j<=i;j++)
begin
inb1 = 3*j;
inb2 = 4*j;
#32 {start,data_in}={1'b1,1'b0,inb1,inb2};
end
#32 {start,data_in}={1'b1,1'b1,128'd0,128'd0};
#32 {start,data_in}={1'b0,1'b0,128'd0,128'd0};
repeat(i)
#64;
end
$toggle_stop();
$toggle_report("p7.saif",1.0e-9,"test.w1");
#1760; //$vcdplusoff;
#1$finish;
end
endmodule
APPENDIX C: SYNTHESIS, UPF AND PERL SCRIPTS
Synthesis script:
set hdlin_auto_save_templates "true"
set hdlin_sv_packages "enable"
set hdlin_infer_function_local_latches "true"
#Read the design in
read_file -format sverilog {"top_rtl.sv"}
#set the current design
set current_design top_rtl
#Link the design
link
#create clock and constrain the design
create_clock "clk" -period 20 -name "clk"
set_input_delay 0.5 -clock clk [all_input]
set_output_delay 0.2 -clock clk [all_output]
set_dont_touch_network "clk"
set_max_area 0
#Set operating conditions
set_operating_conditions -library "saed90nm_typ" "TYPICAL";
set_operating_conditions -library "saed90nm_typ_hvt" "TYPICAL";
set_operating_conditions -library "saed90nm_typ_cg_hvt" "TYPICAL";
set_operating_conditions -library "saed90nm_typ" "TYPICAL"
set_operating_conditions -library "saed90nm_typ_cg" "TYPICAL"
uniquify
set_clock_gating_style -sequential_cell latch \
-positive_edge_logic integrated:CGLPPRX2 \
-negative_edge_logic integrated:CGLNPRX2 \
-control_point before \
-minimum_bitwidth 3 \
-max_fanout 64
#insert_clock_gating
#powergating setup
set upf_create_implicit_supply_sets false
load_upf /gaia/class/student/pervaizm/Project/4_rtl_clk_hvt_pg/top.upf
set_voltage 1.20 -obj {VDD_POF1_VIRTUAL VDD_POF2_VIRTUAL VDD}
set_voltage 0.00 -obj {VSS}
#Synthesize and generate report
#compile_ultra
#compile_ultra -gate_clock -no_boundary_optimization
compile_ultra -gate_clock
report_clock_gating > report0
report_attribute > report1
report_area > report2
report_constraints -all_violators > report3
report_timing -path full -delay max -max_paths 1 -nworst 1 > report4
report_power > report5
report_power -hier > report6
write_file -format verilog -hierarchy -output top_rtl_synthesized.v
write_file -format ddc -hierarchy -output top_rtl_synthesized.ddc
Perl Script-1:
#!/usr/bin/perl
$path1 = "/gaia/class/student/pervaizm/design1_final/untitledfolder/saed90nm_hvt.v";
open(READ,$path1);
@lines = <READ>;
open(WRITE,">saed90nm_hvt_clean.v");
foreach my $lne(@lines)
{
if($lne =~ /^\s*`/)
{
}
else
{
print WRITE $lne;
}
}
close(READ);
close(WRITE);
Perl Script-2:
#!/usr/bin/perl
$path1 = "saed90nm_hvt_clean.v";
open(READ,$path1);
@lines = <READ>;
open(WRITE,">saed90final.v");
$k=0;
foreach my $lne(@lines)
{print "k=$k\n\n";
#if($lne =~ /^\s*`/)
if($lne =~ /^\s*specify/)
{
$k=1; print"1\n";
}
elsif($lne =~ /^\s*endspecify/)
{
$k=0; print"2\n";
}
elsif($k==1)
{
print"3\n";
}
elsif($k==0)
{
print WRITE $lne; print"4\n";
}
}
close(READ);
close(WRITE);
UPF Script:
create_power_domain PTOP
create_power_domain PAon -elements {z1 z2}
create_power_domain POF1 -elements {z3/y0 z3/y2}
create_power_domain POF2 -elements {z3/y1}
create_supply_port VDD
create_supply_net VDD -domain PTOP
create_supply_net VDD -domain PAon -reuse
create_supply_net VDD -domain POF1 -reuse
create_supply_net VDD -domain POF2 -reuse
connect_supply_net VDD -ports VDD
create_supply_port VSS
create_supply_net VSS -domain PTOP
create_supply_net VSS -domain PAon -reuse
create_supply_net VSS -domain POF1 -reuse
create_supply_net VSS -domain POF2 -reuse
connect_supply_net VSS -ports VSS
create_supply_net VDD_POF1_VIRTUAL -domain POF1
create_supply_net VDD_POF2_VIRTUAL -domain POF2
set_domain_supply_net PTOP -primary_power_net VDD -primary_ground_net VSS
set_domain_supply_net PAon -primary_power_net VDD -primary_ground_net VSS
set_domain_supply_net POF1 -primary_power_net VDD_POF1_VIRTUAL -primary_ground_net VSS
set_domain_supply_net POF2 -primary_power_net VDD_POF2_VIRTUAL -primary_ground_net VSS
create_power_switch POF1_sw -domain POF1 -input_supply_port {in VDD} \
-output_supply_port {out1 VDD_POF1_VIRTUAL} \
-control_port {POF1_sd z2/poff_rtl} \
-on_state {stateon1 in {!POF1_sd}}
create_power_switch POF2_sw -domain POF2 -input_supply_port {in VDD} \
-output_supply_port {out2 VDD_POF2_VIRTUAL} \
-control_port {POF2_sd z3/poff_init} \
-on_state {stateon2 in {!POF2_sd}}
#set_isolation POF_iso_out -domain POF -isolation_power_net VDD -isolation_ground_net VSS
# -clamp_value 0 -applies_to outputs //no need as output already muxed
#set_isolation_control POF_iso_out -domain POF -isolation_signal y0/start3 -isolation_sense low
# -location parent
add_port_state VDD -state {voltage 1.20 }
add_port_state POF1_sw/out1 -state {voltage 1.20} -state {POF_OFF1 off}
add_port_state POF2_sw/out2 -state {voltage 1.20} -state {POF_OFF2 off}
create_pst top_pst -supplies {VDD VDD_POF1_VIRTUAL VDD_POF2_VIRTUAL}
#create_pst top_pst1 -supplies {VDD VDD_POF1_VIRTUAL}
#create_pst top_pst2 -supplies {VDD VDD_POF2_VIRTUAL}
add_pst_state ALL_ON -pst top_pst -state {voltage voltage voltage}
add_pst_state RTL_OFF -pst top_pst -state {voltage POF_OFF1 POF_OFF2}
add_pst_state INIT_OFF -pst top_pst -state {voltage voltage POF_OFF2}
APPENDIX D: POWER AND SIMULATION RESULTS
Report power for normal mode of third stepping:
Information: Updating design information... (UID-85)
Warning: Design 'top_rtl' contains 1 high-fanout nets. A fanout number of 1000 will be used for
delay calculations involving these nets. (TIM-134)
Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)
Warning: The derived toggle rate value (0.100000) for the clock net 'clk' conflicts with the
annotated value (0.062500). Using the annotated value. (PWR-12)
****************************************
Report : power
-analysis_effort low
Design : top_rtl
Version: I-2013.12-SP5-4
Date : Sat Apr 16 13:23:57 2016
****************************************
Library(s) Used:
saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)
saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)
saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)
Operating Conditions: TYPICAL Library: saed90nm_typ_cg
Wire Load Model Mode: enclosed
Global Operating Voltage = 1.2
Power-specific unit information :
Voltage Units = 1V
Capacitance Units = 1.000000ff
Time Units = 1ns
Dynamic Power Units = 1uW (derived from V,C,T units)
Leakage Power Units = 1pW
Cell Internal Power = 6.0516 mW (65%)
Net Switching Power = 3.2597 mW (35%)
---------
Total Dynamic Power = 9.3113 mW (100%)
Cell Leakage Power = 2.8695 mW
                     Internal         Switching           Leakage             Total
Power Group             Power             Power             Power             Power  (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad                 0.0000            0.0000            0.0000            0.0000  (  0.00%)
memory                 0.0000            0.0000            0.0000            0.0000  (  0.00%)
black_box              0.0000            0.0000            0.0000            0.0000  (  0.00%)
clock_network         72.5890           74.9767        1.4733e+07          162.2987  (  1.33%)
register              54.9246           87.1252        7.1198e+08          854.0305  (  7.01%)
sequential             0.0000            0.0000            0.0000            0.0000  (  0.00%)
combinational      5.9241e+03        3.0976e+03        2.1428e+09        1.1165e+04  ( 91.66%)
--------------------------------------------------------------------------------------------------
Total              6.0516e+03 uW     3.2597e+03 uW     2.8695e+09 pW     1.2181e+04 uW
1
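Two figures in the normal-mode report can be checked by hand. The PWR-12 warning quotes a derived clock toggle rate of 0.100000 and an annotated rate of 0.062500; these appear consistent with two toggles per clock period, taking the 20 ns period from create_clock in the synthesis script and a 32 ns period from the "#16 clk = ~clk" statement in test.sv. The sketch below assumes the simulation time unit is 1 ns (no timescale is shown in the listing) and uses hypothetical names throughout.

```python
def clock_toggle_rate(period_ns: float) -> float:
    """Toggles per ns for an ideal clock: one rising and one falling edge per period."""
    return 2.0 / period_ns


# Derived rate: clock period from 'create_clock "clk" -period 20'.
derived = clock_toggle_rate(20.0)
# Annotated rate: half-period of 16 time units in test.sv, assumed to be ns.
annotated = clock_toggle_rate(32.0)

# Normal-mode figures copied from the report, converted to uW.
dynamic_uw = 9.3113e3   # Total Dynamic Power = 9.3113 mW
leakage_uw = 2.8695e3   # Cell Leakage Power  = 2.8695 mW
total_uw = dynamic_uw + leakage_uw
```

The sum of dynamic and leakage power agrees with the Total column of the table (1.2181e+04 uW) to the precision printed.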
Report power for sleep mode of third stepping:
Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)
Warning: The derived toggle rate value (0.100000) for the clock net 'clk' conflicts with the
annotated value (0.062500). Using the annotated value. (PWR-12)
****************************************
Report : power
-analysis_effort low
Design : top_rtl
Version: I-2013.12-SP5-4
Date : Sat Apr 16 13:25:06 2016
****************************************
Library(s) Used:
saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)
saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)
saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)
Operating Conditions: TYPICAL Library: saed90nm_typ_cg
Wire Load Model Mode: enclosed
Global Operating Voltage = 1.2
Power-specific unit information :
Voltage Units = 1V
Capacitance Units = 1.000000ff
Time Units = 1ns
Dynamic Power Units = 1uW (derived from V,C,T units)
Leakage Power Units = 1pW
Cell Internal Power = 52.8639 uW (97%)
Net Switching Power = 1.6058 uW (3%)
---------
Total Dynamic Power = 54.4697 uW (100%)
Cell Leakage Power = 2.7468 mW
                   Internal       Switching         Leakage           Total
Power Group           Power           Power           Power           Power    (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad               0.0000          0.0000          0.0000          0.0000    (  0.00%)
memory               0.0000          0.0000          0.0000          0.0000    (  0.00%)
black_box            0.0000          0.0000          0.0000          0.0000    (  0.00%)
clock_network       52.8635          1.6017      1.4788e+07         69.2534    (  2.47%)
register             0.0000          0.0000      6.1376e+08        613.7578    ( 21.91%)
sequential           0.0000          0.0000          0.0000          0.0000    (  0.00%)
combinational    3.9718e-04      4.1204e-03      2.1182e+09      2.1182e+03    ( 75.62%)
--------------------------------------------------------------------------------------------------
Total               52.8639 uW       1.6058 uW   2.7468e+09 pW   2.8012e+03 uW
1
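The totals in the sleep-mode report mix units: dynamic power is given in uW and leakage in pW. The following is a minimal Python sketch, not part of the original flow, that redoes the addition with the figures copied from the report. The helper name `total_power_uw` is hypothetical.

```python
PW_PER_UW = 1e6  # 1 uW = 1e6 pW

def total_power_uw(internal_uw, switching_uw, leakage_pw):
    """Add dynamic power (given in uW) to leakage (given in pW); result in uW."""
    return internal_uw + switching_uw + leakage_pw / PW_PER_UW

# Values from the "Total" row of the sleep-mode report.
sleep_total = total_power_uw(52.8639, 1.6058, 2.7468e9)
leakage_share = (2.7468e9 / PW_PER_UW) / sleep_total

print(f"sleep total = {sleep_total:.1f} uW, leakage share = {leakage_share:.1%}")
```

The sum agrees with the reported total of 2.8012e+03 uW to within rounding, and leakage makes up roughly 98% of it.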
Report power for overdrive mode of third stepping:
Information: Propagating switching activity (low effort zero delay simulation). (PWR-6)
****************************************
Report : power
-analysis_effort low
Design : top_rtl
Version: I-2013.12-SP5-4
Date : Sat Apr 16 13:25:47 2016
****************************************
Library(s) Used:
saed90nm_typ (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ.db)
saed90nm_typ_hvt (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/saed90nm_typ_hvt.db)
saed90nm_typ_cg (File: /netdisk/tmp/saed/SAED90_EDK/SAED_EDK90nm/Digital_Standard_cell_Library/synopsys/models/clock_gating/saed90nm_typ_cg.db)
Operating Conditions: TYPICAL Library: saed90nm_typ_cg
Wire Load Model Mode: enclosed
Global Operating Voltage = 1.2
Power-specific unit information :
Voltage Units = 1V
Capacitance Units = 1.000000ff
Time Units = 1ns
Dynamic Power Units = 1uW (derived from V,C,T units)
Leakage Power Units = 1pW
Cell Internal Power = 9.6767 mW (65%)
Net Switching Power = 5.2122 mW (35%)
---------
Total Dynamic Power = 14.8889 mW (100%)
Cell Leakage Power = 2.8694 mW
                   Internal       Switching         Leakage           Total
Power Group           Power           Power           Power           Power    (   %   )  Attrs
--------------------------------------------------------------------------------------------------
io_pad               0.0000          0.0000          0.0000          0.0000    (  0.00%)
memory               0.0000          0.0000          0.0000          0.0000    (  0.00%)
black_box            0.0000          0.0000          0.0000          0.0000    (  0.00%)
clock_network      116.1220        119.8884      1.4733e+07        250.7435    (  1.41%)
register            87.8240        139.3119      7.1192e+08        939.0542    (  5.29%)
sequential           0.0000          0.0000          0.0000          0.0000    (  0.00%)
combinational    9.4728e+03      4.9530e+03      2.1428e+09      1.6568e+04    ( 93.30%)
--------------------------------------------------------------------------------------------------
Total            9.6767e+03 uW   5.2122e+03 uW   2.8694e+09 pW   1.7758e+04 uW
1
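To put the sleep-mode and overdrive-mode reports side by side, the sketch below computes the overdrive-to-sleep ratio for dynamic and leakage power. It is an illustrative calculation only. The figures are copied from the "Total Dynamic Power" and "Cell Leakage Power" lines of the two reports and converted to uW, and the dictionary layout and the `ratio` helper are hypothetical.

```python
# Figures copied from the two reports, all converted to uW.
reports = {
    "sleep":     {"dynamic": 54.4697, "leakage": 2746.8},
    "overdrive": {"dynamic": 14888.9, "leakage": 2869.4},
}

def ratio(component):
    """Overdrive-to-sleep ratio for one power component."""
    return reports["overdrive"][component] / reports["sleep"][component]

dynamic_ratio = ratio("dynamic")
leakage_ratio = ratio("leakage")

print(f"dynamic: {dynamic_ratio:.0f}x, leakage: {leakage_ratio:.2f}x")
```

With these figures, dynamic power differs by a factor of roughly 270 between the two modes, while leakage changes by less than 5%. This is consistent with sleep mode suppressing switching activity while leaving leakage largely in place.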
Simulation Results:
The following is the simulation result for one 256-byte message.
Inputs: in1 and in2, each a 128-bit plaintext block
Outputs: c1 and c2, each a 128-bit ciphertext block
Expected outputs: e_c1 and e_c2, the expected ciphertext values
Starting next msg to encrypt
key=fedebc47eba8c34934f63213f55f665e nonce=00000000000000000000000000000001
in2=dccf5c98c942bc144ab23017e8223c14 in1=194cd297a7cb8eb11824da31d694436c
c2=e5fa5915753ce84b8274e274fb72e459 c1=d2cf53634c36956ae7409ca1acda2a7f
e_c2=e5fa5915753ce84b8274e274fb72e459 e_c1=d2cf53634c36956ae7409ca1acda2a7f
exp_done=0 done=0
in2=96c54d1ee424ff03da07cea515a30005 in1=8f31edb7fbc2dee5d1176ba288f87128
c2=49067378e52e15d1beb0b02d291f74c1 c1=e935f450d747f52883aab5533e369186
e_c2=49067378e52e15d1beb0b02d291f74c1 e_c1=e935f450d747f52883aab5533e369186
exp_done=0 done=0
in2=1fcd2a32bc0d2a571563864345c4a549 in1=acc73be8b800b15c86e9a5bfa1fa273a
c2=a779f396c180a4a58ae874a949fa9656 c1=a56bbad78815caae46a4185c79fa0edb
e_c2=a779f396c180a4a58ae874a949fa9656 e_c1=a56bbad78815caae46a4185c79fa0edb
exp_done=0 done=0
in2=15a1082de9e381630c7ca19d1a100b49 in1=642d3fd91131930f7b2a0a6ec2b59a4b
c2=3f2b9dcfde02d3489ad6b514539cc48f c1=4fd0d6627816acaaebdcc865b784de25
e_c2=3f2b9dcfde02d3489ad6b514539cc48f e_c1=4fd0d6627816acaaebdcc865b784de25
exp_done=0 done=0
in2=9023c810a7b2856575554dee1a6d25a1 in1=3e685ea4dda10ccba117810c488f50cc
c2=b10f23cc47ffd6e027c0eb606b489d3e c1=d7324924ae86a6bba89b95282fa70838
e_c2=b10f23cc47ffd6e027c0eb606b489d3e e_c1=d7324924ae86a6bba89b95282fa70838
exp_done=0 done=0
in2=98f6c785a02b9a3b7bbfd154e919fd05 in1=9836b6684ff79b7971c49fe375a8b4b5
c2=2435f6739e5389f5f19d56dd34f7ee0d c1=34cfec3d7b0ff90e3fbd9b6ff3ed968e
e_c2=2435f6739e5389f5f19d56dd34f7ee0d e_c1=34cfec3d7b0ff90e3fbd9b6ff3ed968e
exp_done=0 done=0
in2=ca9693db09bb7dcfc4ac7ddeda8431b1 in1=ab28acb8bd9231e806d78702c845d0e6
c2=67a3cea9eead1e5ffe3dea82da47cc83 c1=3c94d3e6ea375a5782d51a705216434d
e_c2=67a3cea9eead1e5ffe3dea82da47cc83 e_c1=3c94d3e6ea375a5782d51a705216434d
exp_done=0 done=0
in2=8817c0f98865c268ae563d56a5f3cd20 in1=13959954d4e074fd7b169f99484d2c8d
c2=ff6c16bd08405993963eb2a9f83cdebb c1=b781a9b7ba042f9c51d4964521d0a36c
e_c2=ff6c16bd08405993963eb2a9f83cdebb e_c1=b781a9b7ba042f9c51d4964521d0a36c
exp_done=0 done=0
authentication_tag=d478a12b39e91135cca71a6cd830a8a8
e_authentication_tag=d478a12b39e91135cca71a6cd830a8a8
exp_done=1 done=1
----------
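The log above pairs every observed value with an expected value carrying an `e_` prefix. The following is a minimal Python sketch of how such a log could be checked automatically. It is not the project's testbench checker. The `mismatches` function and the `FIELD` pattern are hypothetical, and only the first data row and the authentication tag are copied from the log to keep the example short.

```python
import re

# Lines copied from the log above; the full log contains eight data rows.
LOG = """\
c2=e5fa5915753ce84b8274e274fb72e459 c1=d2cf53634c36956ae7409ca1acda2a7f
e_c2=e5fa5915753ce84b8274e274fb72e459 e_c1=d2cf53634c36956ae7409ca1acda2a7f
exp_done=0 done=0
authentication_tag=d478a12b39e91135cca71a6cd830a8a8
e_authentication_tag=d478a12b39e91135cca71a6cd830a8a8
exp_done=1 done=1
"""

FIELD = re.compile(r"(\w+)=([0-9a-f]+)")  # name=hexvalue pairs

def mismatches(log):
    """Return (line_no, name, got, want) for each e_<name> that differs from <name>."""
    errors = []
    actual = {}
    for line_no, line in enumerate(log.splitlines(), start=1):
        fields = dict(FIELD.findall(line))
        expected = {name[2:]: value for name, value in fields.items()
                    if name.startswith("e_")}
        if not expected:
            actual = fields  # remember the most recent line of observed values
            continue
        for name, want in expected.items():
            got = actual.get(name)
            if got != want:
                errors.append((line_no, name, got, want))
    return errors

print(mismatches(LOG))
```

An empty list means every observed value matched its expected counterpart, as is the case for the rows shown here.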
REFERENCES
[1] “Hardware and Software Encryption”, [Online]. Available:
http://www.infosecurity-magazine.com/magazine-features/tales-crypt-hardware-software.
[Accessed 1 April 2016].
[2] “Transistor Count”, [Online]. Available:
https://en.wikipedia.org/wiki/Transistor_count [Accessed 10 April 2016].
[3] “Moore’s Law Technology”, [Online]. Available:
http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html.
[Accessed 4 April 2016].
[4] Hongjun Wu and Bart Preneel, “AEGIS: A Fast Authenticated Encryption Algorithm
(v1)”, in CAESAR, 2014.
[5] “Mobile devices sold by year”, [Online]. Available:
http://www.statista.com/statistics. [Accessed 4 April 2016].
[6] Tejas Hadke and Behnam Arad, “Low-Power Chip Design Technique”, in CATA, 2015.
[7] “AES Encryption”, [Online]. Available:
https://en.wikipedia.org/wiki/Advanced_Encryption_Standard. [Accessed 6 April 2016].
[8] “SystemVerilog”, [Online]. Available:
https://en.wikipedia.org/wiki/SystemVerilog. [Accessed 6 April 2016].
[9] “Hierarchy in SystemVerilog”, [Online]. Available:
http://www.asic-world.com/systemverilog/hierarchy1.html. [Accessed 4 April 2016].
[10] S. Sutherland, SystemVerilog for Design, Springer, 2006.
[11] Bahram Hakhamaneshi and Behnam Arad, “A Hardware implementation of the advanced
encryption standard (AES) algorithm using SystemVerilog”, ISCA, 2010.
[12] “VMM Introduction”, [Online]. Available:
http://www.testbench.in/VM_01_INTRODUCTION.html. [Accessed 28 March 2016]
[13] C. Spear, SystemVerilog for Verification, Springer, 2008.
[14] Synopsys 90 nm technology library, 10 February 2014,
http://www.synopsys.com/Community/UniversityProgram/Pages/Library.aspx
[15] “VMM Introduction”, [Online]. Available:
http://www.synopsys.com/Tools/Verification/LowPowerVerification/Pages/MVSIM.aspx.
[Accessed 28 March 2016]
[16] “Functional Coverage”, [Online]. Available:
http://www.testbench.in. [Accessed 30 March 2016]
[17] Synopsys Power Compiler User Guide - Version E-2010.12-SP2, March 2011
[18] Synopsys Design Compiler User Guide – Version G-2012.06-SP3, October 2012
[19] Aaron Carroll and Gernot Heiser, “An Analysis of Power Consumption in a Smartphone”,
USENIX ATC, 2010.