Smart Card Power Analysis: From Theory To Practice
Joao Fernando Coelho Lopes
Thesis to obtain the Master of Science Degree in
Information Systems and Computer Engineering
Supervisor: Prof. Ricardo Jorge Fernandes Chaves
Examination Committee
Chairperson: Prof. Luís Manuel Antunes Veiga
Supervisor: Prof. Ricardo Jorge Fernandes Chaves
Member of the Committee: Prof. Renato Jorge Caleira Nunes
May 2017
Acknowledgments
First of all, I would like to thank my coordinator, Professor Ricardo Chaves, since his advice helped
me keep the work in the right direction and overcome some of the difficulties encountered.
I would like to thank my colleague Ricardo Macas for reviewing my thesis and providing feedback,
and Jaganath Mohanty for introducing me to the setup components and explaining them.
I would like to acknowledge WB electronics for providing the source code example for the smart
card.
I would like to thank my family, especially my parents, for the constant support. All of this, from
start to end, would have not been possible without them.
Last but not least, I would like to thank my girlfriend Joana Neno for her support and patience,
especially on the tough days.
Abstract
Smart cards are ubiquitous devices used in many critical areas. They offer mechanisms against
unauthorized access that protect the secret data they hold. They can also offer cryptographic
operations such as data protection and authentication. However, power analysis offers non-intrusive
techniques to extract sensitive information. The most common attack used is Differential Power
Analysis (DPA), an attack class most appropriate for symmetric ciphers such as AES. The signal-to-noise
ratio was used as a complementary analysis to assess the noise on the recorded power traces.
This work presents the fundamentals of CPA, the state of the art, and other statistical
analysis algorithms. Using this, an experimental setup is proposed to perform this type of analysis on
smart cards and FPGAs. Finally, an experimental evaluation is performed to assess whether the setup can
be improved with the use of external amplification and different power supplies. The results show that
this type of setup benefits from external amplification, but did not benefit as much when different
power supplies were used. In addition, the secret key of an unprotected smart card, using the best setup
configuration, was fully recovered using 25 power traces.
Keywords
Side-channel, Power analysis, Smart cards
Contents
1 Introduction xiii
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1.2 Thesis Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.4 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
2 Background xvii
2.1 Smart Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.1.1 Physical Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.1.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
2.1.3 Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
2.1.4 Smart card physical interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
2.1.5 Smart card communication protocol . . . . . . . . . . . . . . . . . . . . . . . . xx
2.1.6 Smart card data transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
2.1.7 Security Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
2.1.8 Operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
2.2 Cryptographic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.2.1 AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.2.2 RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
2.3 Side-Channel Analysis Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvi
2.3.1 Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
2.3.2 Simple Power Analysis - Visual Inspection . . . . . . . . . . . . . . . . . . . . . . xxvii
2.3.3 Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii
2.4 Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.4.1 Analog to Digital . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.4.2 Power traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
2.5 Welch’s T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
3 State of the Art xxxv
3.1 Side-channel Analysis Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.1.1 SAKURA-G/W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.1.2 ChipWhisperer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvii
3.2 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
3.2.1 Template Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
3.2.2 Collisions Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xli
3.3 Defence Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii
4 Proposed Solution and Implementation xliii
4.1 Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlv
4.1.1 SAKURA-G/W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlv
4.1.2 Smart Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlvi
4.2 Trace Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
4.2.1 PicoScope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
4.2.2 Programming the traces collecting program . . . . . . . . . . . . . . . . . . . . . l
4.2.3 PC to SAKURA-G communicator programming . . . . . . . . . . . . . . . . . . . li
4.2.4 PC to SAKURA-W communicator programming . . . . . . . . . . . . . . . . . . . li
4.3 Signal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liii
4.3.1 Signal-to-noise ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liii
4.3.2 T-test Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liv
4.3.3 Intermediary value generator for AES . . . . . . . . . . . . . . . . . . . . . . . . liv
4.4 The overall setup usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liv
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lvi
5 Experimental Results and Evaluations lvii
5.1 Different Setup Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lviii
5.1.1 Signal-to-noise Ratio Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . lviii
5.1.2 CPA Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lx
5.1.3 Power Supply Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxiii
5.2 T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxv
5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxvii
6 Conclusions and Future Work lxix
6.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxxi
Bibliography lxxiii
List of Figures
2.3 Smart card contact pads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
2.4 Smart card Application Protocol Data Units (APDU) format. . . . . . . . . . . . . . . . . xxi
2.5 Smart card communication protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
2.6 AES AddRoundKey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.7 AES SubBytes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.8 AES ShiftRows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.9 AES MixColumns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.10 Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii
2.15 Sampling and quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
3.1 SAKURA-G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.2 SAKURA-W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvii
3.3 Chipwisperer Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii
3.4 Chipwisperer Software Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii
4.1 Power analysis setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xliv
4.2 PicoScope 6000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l
4.3 Implemented power analysis setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lv
5.5 Partial CPA with setup 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxii
5.6 Partial CPA with setup 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxii
5.7 Partial Correlation Power Analysis (CPA) with setup 3. . . . . . . . . . . . . . . . . . lxii
5.9 Power trace of the laboratory power supply in void. . . . . . . . . . . . . . . . . . . . . . lxiv
5.10 Power trace of the wall charger in void. . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxiv
5.11 Power trace of the USB power supply in void. . . . . . . . . . . . . . . . . . . . . . . . . lxiv
5.12 Power supply incremental CPA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxv
Abbreviations
AC Alternating Current
AES Advanced Encryption Standard
APDU Application Protocol Data Units
API Application programming interface
ASIC Application Specific Integrated Circuits
ATR Answer To Reset
BPS Bits Per Second
CD Carrier Detect
CMOS Complementary Metal-Oxide Semiconductor
COS Card Operating System
CPA Correlation Power Analysis
CPU Central Processing Unit
CRC Cyclical Redundancy Check
CTS Clear to Send
DCE Data Communication Equipment
DC Direct Current
DDR Data Direction Register
DES Data Encryption Standard
DLL dynamic link library
DPA Differential Power Analysis
DSR Data Set Ready
DTE Data Terminal Equipment
EEPROM Electrically-Erasable Programmable Read-Only Memory
FIA Fault Injection Attacks
FPGA Field-Programmable Gate Array
HAL Hardware Abstraction Layer
HD Hamming-Distance
HW Hamming-Weight
I/O Input/Output
ISO/IEC International Organization for Standardization and International Electrotechnical Commission
ISO International Organization for Standardization
MIPS Millions of Instructions Per Second
MMU Memory Management Unit
OS Operating System
PAA Power analysis attacks
POI Points-Of-Interest
RAM Random-access memory
RI Ring Indicator
ROM Read Only Memory
RSA Rivest Shamir Adleman
SCA Side-Channel Attacks
SIM Subscriber Identity Module
SNR signal-to-noise ratio
SPA Simple Power Analysis
SRAM Static Random Access Memory
XOR Exclusive-or
1 Introduction
Contents
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1.2 Thesis Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.4 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Smart cards have become very popular over the years, being used in many security systems.
They can safely store sensitive information, such as secret keys, thanks to built-in security mechanisms
that make them tamper-resistant. Smart cards can also perform cryptographic operations using the
secret key they hold, meaning the secret key is never exposed.
Side-channel attacks [1] have proven to be a very effective means of attacking cryptographic
algorithms. A side-channel attack exploits sensitive information leaked by a device during operation,
ultimately compromising the secret information of the cryptographic system.
Paul Kocher proved in his pioneering work [2] that a smart card can easily be compromised if
adequate protection mechanisms against power analysis attacks are not deployed. Power analysis is
a type of side-channel attack in which the instantaneous power consumption of a device can be used
to compromise the secret it holds. This is possible because the consumption of a device depends on the
data and operations being processed.
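This data dependence can be illustrated with the Hamming-weight model, a common first-order approximation in the power analysis literature in which the power drawn is taken as proportional to the number of set bits in the value being processed. A minimal sketch (the model is an illustrative assumption, not a measurement of any device):

```python
def hamming_weight(value: int) -> int:
    """Number of set bits in a byte; a common proxy for the power drawn
    when the value is placed on a CMOS bus."""
    return bin(value & 0xFF).count("1")

# Two different data bytes processed by the same instruction draw
# noticeably different power under this model:
print(hamming_weight(0x00))  # 0 set bits -> lowest consumption
print(hamming_weight(0xFF))  # 8 set bits -> highest consumption
print(hamming_weight(0xA5))  # 4 set bits -> intermediate
```

Attacks such as DPA and CPA exploit exactly this kind of dependence by correlating hypothesised values with measured traces.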
The two main power analysis attacks are Simple Power Analysis (SPA) and Differential Power
Analysis (DPA). SPA relies on the direct interpretation of a power trace and is used mainly to retrieve
information about the operations being performed, but it can also be used to retrieve sensitive information
in particular cases. DPA is a more elaborate attack, relying on the statistical analysis of multiple power
traces. In general, the attacker needs little to no implementation detail about the device under attack
[3].
1.1 Motivation
Smart cards are distributed worldwide and used in many of today's industries. Because they are
so widespread and carry sensitive information, they are a very desirable target for attackers. Smart
cards have multiple mechanisms that detect when the card is working under abnormal conditions or
when attempts are made to probe or tamper with its components [4]. Side-channel attacks can retrieve
information from smart cards by measuring power consumption, electromagnetic fields, timing or even
sound [5]. These unintended leaks of sensitive information might look harmless, but can ultimately
compromise a device's secret information.
Power analysis attacks (PAA) are of interest because they do not need to tamper with the normal
functionality of the device. They measure the power consumption during the device's operation and
perform statistical analysis over the collected data, defeating smart card security mechanisms that did
not contemplate this type of attack and focus instead, for example, on protecting the device against
physical access. Power analysis attacks have proven over the years to be very effective against
cryptographic devices, according to the existing state-of-the-art research [1, 6–9].
There is therefore a strong need to understand and measure the effectiveness of these attacks on
devices that hold sensitive information, such as smart cards. Improving these attacks while, at the same
time, implementing countermeasures is the way to stay one step ahead of attackers.
1.2 Thesis Goals
To understand how cryptographic devices can be exploited with power analysis, it is necessary to
study how this type of attack works, why it succeeds and which devices might be vulnerable.
The goal of this work is to build a setup that allows one to perform this type of analysis. The
analysis should allow the recovery of the secret key from an unprotected smart card implementation
and assess the quality of the gathered power traces.
This setup focuses mainly on supporting smart cards, but can also be extended to other devices
such as Field-Programmable Gate Arrays (FPGAs). Different setup configurations are then tested to
find out how they affect power trace collection and analysis.
Finally, this document is intended to serve as a stepping-stone for those who wish to understand
the concepts of power analysis and need to perform their own experiments and evaluate their own
systems.
1.3 Requirements
Based on these goals, the requirements are:
• From this document, the user must be able to understand how power analysis and Correlation
Power Analysis (CPA) attacks work.
• The user must be able to understand how the setup works and how to use it.
• The user must be able to collect traces from smart cards using a power analysis platform.
• The user must be able to perform correlation analyses and recover the secret key from the
unprotected smart card provided.
• The user must be able to use the setup by only changing the configuration parameters and
selecting the type of analysis to be performed.
1.4 Document Structure
This document describes how power analysis is able to retrieve a secret key from a cryptographic
device and how this can be done in practice.
Chapter 2 provides the base information needed to understand why power analysis is possible and
what type of statistical analysis can be performed in order to retrieve information from the recorded
power consumption.
Chapter 3 presents the state of the art on power analysis platforms, attacks that improve on the
basic key-recovery attack and some protection mechanisms that can increase the difficulty of recovering
the key.
Chapter 4 describes the proposed setup and discusses its implementation. The trace collection
setup components are presented, as well as all the software developed to support the overall
setup.
Chapter 5 presents the evaluation of the proposed solution. The several setup configurations
are evaluated by comparing the number of required traces. Several evaluation methods and related
statistical algorithms are also compared to assess their effectiveness.
Chapter 6 concludes the dissertation by summarizing the developed work and presenting possible
future work directions.
2 Background
Contents
2.1 Smart Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.2 Cryptographic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.3 Side-Channel Analysis Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvi
2.4 Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.5 Welch's T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
This chapter presents the background needed for a better understanding of later chapters. Section 2.1
presents an introduction to smart card characteristics and components. Section 2.2 presents two
types of ciphering algorithms: the Advanced Encryption Standard (AES) and Rivest Shamir Adleman
(RSA). Section 2.3 explains how power analysis works, followed by two techniques to perform such
analysis. Section 2.4 presents the characteristics to take into account when gathering signals and
explains the components of power traces. Finally, section 2.5 presents a complementary analysis
technique named the t-test.
2.1 Smart Cards
Plastic cards have been in use since the 1950s. The first security mechanisms relied on visual
features of the card, such as security printing and a signature panel. The successor was the magnetic
stripe card, which allowed the storage of digital data, enabling the card to be read by machines. The
problem with this technology is that, with the right equipment, the data stored in the magnetic stripe
can be read, deleted and re-written.
With the creation of the integrated circuit and its subsequent inclusion on a plastic card, the term
smart card was born. A smart card is a card-shaped device with an integrated circuit that provides a
way to store data securely. There are two main smart card types: memory cards and microcontroller
cards [10].
Memory cards include dedicated logic for security, providing access control to data, for example
a write/erase protection mechanism. This type of card is designed for one specific purpose, which
restricts its flexibility. However, this makes them inexpensive to manufacture. They are used in
applications that need storage and minimal data protection, such as health insurance cards.
Microcontroller cards can be seen as miniaturized computers with an Operating System (OS),
storage, memory and an Input/Output (I/O) port. They also have the ability to create, delete and
manipulate files and to process data. This gives them the ability to execute applications and perform
functionality dynamically, while offering secure application transactions and data protection. A
great security advantage is their ability to perform cryptographic operations inside the card,
meaning the secret data is never exposed.
Smart cards communicate with the external world using physical contacts or electromagnetic
fields. Hybrid smart cards, which possess both means of communication, also exist [10].
2.1.1 Physical Characteristics
ISO/IEC 7816-1 specifies the physical characteristics of a smart card [10]. The most common
format is the ID-1 card, with dimensions of 85.60 × 54.00 × 0.76 mm. These are the usual credit-card-shaped
cards used in industries such as finance and healthcare. Figure 2.1 shows this
common format.
Another common format is the ID-000 card, known as the Subscriber Identity Module (SIM) card format.
Figure 2.1: All three smart card formats. [11]
These cards have dimensions of 25 × 15 × 0.76 mm and are used mainly in cell phones to identify and
authenticate subscribers towards the cell phone operator. There is also a smaller version of the ID-000
card, named mini-UICC, the smallest card format produced. A third format, named ID-00
or 'mini-card', has a size between the ID-1 and ID-000. This format has not yet been established
internationally.
2.1.2 Architecture
A microcontroller smart card is composed of the following internal components [10]: the Central
Processing Unit (CPU), responsible for executing the device instructions; the Random-Access Memory
(RAM), the working memory of the CPU; the Read-Only Memory (ROM),
which contains the Card Operating System (COS); the Electrically-Erasable Programmable Read-Only
Memory (EEPROM), the non-volatile memory used to store data, programs and OS routines;
the I/O port, used to communicate with the external world; and finally the sensors, which monitor
whether the card is running within specified parameters. Some smart cards also have a co-processor
used for heavy numeric calculations.
Figure 2.2 illustrates the components of a smart card microprocessor laid out in a simplified way.
Figure 2.2: Microprocessor card components.
2.1.3 Standards
Several companies worldwide produce smart cards and the infrastructure to communicate with
them. To guarantee compatibility and interoperability between different manufacturers, the
International Organization for Standardization and International Electrotechnical Commission (ISO/IEC)
created the 7816 family of smart card standards.
Contact cards that are International Organization for Standardization (ISO) compliant must follow
parts one, two and three of ISO 7816 [12]: part one describes the physical characteristics, part two
the location and dimensions of the contacts, and part three the transmission protocols and
electronic signals.
Contactless cards must additionally comply with ISO/IEC 14443, divided into four parts: part
one describes the physical characteristics, part two the radio-frequency power and signal interface,
part three the initialization and anti-collision mechanisms, and part four the transmission protocol.
2.1.4 Smart card physical interface
ISO/IEC 7816-3 specifies the physical characteristics, electronic signals and transmission protocols
of a contact smart card [12]. The contact pad is shown in figure 2.3 and each pin function is
described as follows:
VCC supplies the smart card with power, at a voltage of 5 V, 3 V or 1.8 V; RST resets the smart
card microcontroller, to which the card responds with an Answer To Reset (ATR); CLK supplies the
microcontroller with a clock signal, usually at 3.5 MHz; SPU means standard or proprietary use (the
former programming voltage Vpp) and is optional; I/O transfers data between the smart card and a
terminal. RFU stands for "reserved for future use" and is usually denoted as the AUX pin, used, for
example, as an operation trigger or as a second I/O communication channel.
Figure 2.3: The smart card contact pad.
2.1.5 Smart card communication protocol
Smart cards use Application Protocol Data Units (APDU) to exchange data with a terminal. The
communication is specified by ISO/IEC 7816-4 [12].
The two main components of an APDU are the header and the body. The header has a fixed size and is
always present, while the body can vary in size and is not required for some instructions. The header
is divided into four elements: the class byte (CLA), the instruction byte (INS) and two parameter
bytes (P1, P2). The body has three components: the command data length Lc, the data field
and the expected response length Le. Figure 2.4(a) shows a visual representation of
the APDU structure.
Figure 2.4: Smart card APDU format.
The response APDU is composed of a response body and a trailer (SW1, SW2), with the body being
optional and the trailer of fixed size. The body contains the data produced by the previous
command, which can be empty if the previous operation did not return anything. SW1 and SW2
contain the response code indicating whether processing was successful. Figure 2.4(b) shows a
visual representation of the returned APDU format.
For example, a verify command, checking whether the user's inserted PIN (0000) matches the smart
card's internal one, is represented as the APDU: 0x94 0x20 0x80 0x00 0x04 0x30 0x30 0x30 0x30,
where 0x94 identifies the instruction class, 0x20 the instruction, 0x80 and 0x00 are parameters 1 and
2 respectively, 0x04 is the data length and the four 0x30 bytes are the PIN digits in ASCII. If
the PIN is correct, the returned APDU is 0x90 0x00, where 0x90 and 0x00 are the trailer bytes
SW1 and SW2 respectively.
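The verify example above can also be assembled programmatically. The sketch below is an illustrative helper, not part of any smart card library; it builds only the header, Lc and data field, omitting the optional Le byte, exactly as in the example (the class byte 0x94 is specific to the example card):

```python
def build_apdu(cla: int, ins: int, p1: int, p2: int, data: bytes = b"") -> bytes:
    """Assemble a command APDU: 4-byte header (CLA, INS, P1, P2),
    then an optional Lc length byte followed by the data field."""
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + data  # Lc is the data field length
    return apdu

# VERIFY command checking the PIN "0000" (each digit is ASCII byte 0x30):
pin = "0000".encode("ascii")
cmd = build_apdu(0x94, 0x20, 0x80, 0x00, pin)
print(cmd.hex())  # 942080000430303030
```

Sending this byte string over the card's I/O line would, on success, yield the two-byte response 0x90 0x00 described above.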
2.1.6 Smart card data transmission
Smart card data communication is performed asynchronously over a half-duplex channel. This
means the smart card reader and the smart card share the same communication line, the data pin
seen in a previous section, so only one device can communicate at a time. The communication can
use two protocols, named T=0 and T=1, where T=0 defines the transmission of one byte at a time
whereas T=1 defines a data block transmission.
When transmitting one byte, a few extra bits ensure the sender and receiver stay synchronized.
At the start, the line drops from a high state to a low state, informing the receiver that a byte is
being transferred. The data bits are then transferred, followed by two more bit fields: the parity
bit and the guard time. The parity bit is added to make the number of set data bits either even or
odd, depending on the configuration, and the guard-time bits ensure the receiver sees the high
state before a new byte transfer is performed. The most common configuration for data transmission
is 9600 bits/second, 8 data bits, even parity and 2 stop bits. Figure 2.5 illustrates this
transmission structure.
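The framing just described can be sketched as follows. This is a simplified model of the character frame, assuming the common even-parity configuration and least-significant-bit-first ordering (the direct convention); the two trailing high bits stand in for the guard time:

```python
def frame_byte(value: int) -> list:
    """Bits on the line for one byte: start bit (0), 8 data bits
    (least-significant bit first), even parity bit, and two
    guard-time/stop bits held high (1)."""
    data_bits = [(value >> i) & 1 for i in range(8)]
    parity = sum(data_bits) % 2  # even parity: total number of 1s is even
    return [0] + data_bits + [parity] + [1, 1]

# 0x35 = 0b00110101 has four set bits, so its even-parity bit is 0:
print(frame_byte(0x35))
```

At 9600 bits/second, each of these 12 bit periods lasts roughly 104 microseconds, which is the timing granularity an oscilloscope probe on the I/O line would observe.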
Figure 2.5: Smart card communication protocol structure.
2.1.7 Security Mechanisms
One of the main features that makes smart cards so desirable is their ability to protect the data they
hold. To achieve that, tamper-resistance mechanisms are implemented to withstand physical and logical
attacks. Some of the countermeasures against physical attacks [13] are: the programmable active shield,
a protective layer that covers the smart card microcontroller, preventing the chip components
from being analysed or probed; the Memory Management Unit (MMU), which acts like a firewall, preventing
smart card applications from accessing privileged resources that should only be accessed by the OS;
data bus encryption, which ciphers data passed over the bus, preventing an attacker from
knowing what values are being transmitted; the sensors integrated with the microcontroller, which
prevent abnormal operation of the smart card by monitoring the internal and external clock
frequencies, voltage, temperature and other parameters; the Cyclical Redundancy Check (CRC), which
checks for data errors during transmission, reading or writing; and finally the current masking device,
a security mechanism against power analysis that performs random dummy memory accesses,
changing the power consumption of the device during operation.
2.1.8 Operating system
Operations on smart cards are controlled and monitored by the COS, a small OS specific
to each type of card [10]. They are divided into two groups: the general-purpose COS, which has
generic commands that work with many applications, like the Java Card; and the dedicated COS, which has
instructions for specific applications and can also contain the application within itself. A COS
can also be classified as an open or a proprietary platform. An open platform means third
parties can load programs onto the smart card, while a proprietary platform means the opposite: only
the producer of the COS can install programs on the card.
Java Card is one of the most used COS worldwide. It is an open-platform, multi-application operating
system based on Java, intended to abstract the hardware from the programming language, i.e.
programmers do not have to worry about the hardware specifications when developing a program. This is
accomplished by translating the programs into bytecode, which is then interpreted by a virtual machine.
The virtual machine is responsible for running the program while handling the hardware specifics and
resources. Java Card allows a smart card to contain multiple Java Card applications (applets),
while granting isolation between them through the use of firewalls. This has the advantage of
allowing different vendors to have their applets on the same smart card, independently of the level of
security and testing each applet has.
MULTOS is another popular smart card operating system, known for use when there are needs
for high security and performance. Applications are typically written in the C language and then
compiled into the MULTOS Executable Language (MEL). Like Java Card, it also allows multiple
applications on a single smart card while providing isolation between them. The main difference
comes at production, where MULTOS manufacturers must comply with a licence that obliges
them to perform rigorous security and interoperability tests on these smart cards.
2.2 Cryptographic Algorithms
Most of the today’s systems security relies on cryptographic operations to provide desirable prop-
erties such as authentication, integrity and confidentiality. The main cryptographic operations are of
two types, asymmetric and symmetric encryption.
In symmetric ciphering, a single key is used to cipher and decipher data. Some examples of this
type of algorithm are AES, the Data Encryption Standard (DES), 3DES, Blowfish and Twofish.
AES is described next, since it is widely used and secure, to the point of being used by the U.S.
government to protect classified data.
Asymmetric ciphers use two keys, named the private and public keys, to exchange messages
securely. The sender uses the public key, known to everyone, to encrypt the message content, granting
confidentiality. The public key can also be used to verify the authenticity of a message. The receiver
uses the private key, known only to him, to decrypt the message content. The private key can also
be used to cipher data to grant message authenticity. Some examples of asymmetric cryptographic
algorithms are RSA, Diffie-Hellman and Elliptic Curve Cryptography (ECC). RSA is described next, since
it is the most commonly used and is still considered secure, despite having first been published in 1977.
When choosing between asymmetric and symmetric ciphers there are some factors to take into account. Asymmetric algorithms rely on computationally expensive mathematical operations, making them slower than symmetric operations. On the other hand, symmetric algorithms such as AES scale less easily, since they need a new secret key for each new pair of entities that want to communicate.
There is also a class of algorithms named hash functions, which produce a fixed-size output that depends on the given input. Hash functions transform data of arbitrary size into a fixed-size value, usually smaller than the original data. They are used to validate data integrity, meaning it is possible to detect changes in the original data by comparing the output computed over the received data with the expected one. The most popular hash functions are SHA-1 [14] and MD5 [15], although MD5 has been deemed insecure due to several mathematical vulnerabilities.
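The integrity-check use just described can be illustrated with a short Python sketch. It uses SHA-256 from the standard library (chosen here instead of the MD5/SHA-1 mentioned above, since both are now considered broken); the messages are purely illustrative:

```python
import hashlib

def digest(data: bytes) -> str:
    # Fixed-size output regardless of the input length
    return hashlib.sha256(data).hexdigest()

original = b"transfer 100 EUR to account 42"
tampered = b"transfer 900 EUR to account 42"

# A single changed byte yields a completely different digest,
# so the receiver can detect the modification
assert digest(original) != digest(tampered)
assert len(digest(original)) == 64  # 256 bits as 64 hex characters
```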
2.2.1 AES
The AES [16] is a symmetric cryptographic algorithm used to protect electronic data. Ciphering is performed in blocks of 128 bits using keys of 128, 192 or 256 bits, the algorithm being slightly different depending on the key size. The algorithm was adopted by the U.S. government to protect secret data.
AES performs 10, 12 or 14 transformation rounds, depending on whether the key is 128, 192 or 256 bits, over a 4x4 byte data matrix called the state. At the beginning, the state holds the original data block to be ciphered; then 4 operations are performed in each round. The operations are addRoundKey, subBytes, shiftRows and mixColumns, where each one either replaces the values of the state with new ones or changes their positions. The result, after all the operations, is the ciphered data. The deciphering operation is the same process, with the same number of rounds and operations, but in reverse order. Next, the 4 operations mentioned above are described in more detail:
AddRoundKey performs an exclusive-or (XOR) between the state and a round key. A round key is generated from the secret key via a key scheduler and can also be represented as a 4x4 matrix. Figure 2.6 illustrates the addRoundKey operation.
SubBytes performs a non-linear substitution on each byte, meaning every position of the state is replaced by another value. The substitution function is named the S-box and works by replacing the state bytes either by using a formula or by using a pre-computed table with all the possible values. Figure 2.7 illustrates the subBytes operation.
ShiftRows performs a number of row rotations that depends on the row position: each row is shifted n − 1 times, where n is the row position. This means that the first row stays the same, the second is shifted one position, the third two positions and the fourth three positions. Figure 2.8 illustrates the shiftRows operation.
MixColumns mixes the data of each column independently of the other columns. To achieve that, each state column is multiplied by a fixed polynomial. Figure 2.9 illustrates the mixColumns operation.
The only difference to this sequence is in the final round, where mixColumns is not performed and an extra addRoundKey is computed. The pseudocode of this algorithm is shown below:

aes_ciphering(byte plaintext, word round_key) {
    byte state;
    state = plaintext;
    AddRoundKey(state, round_key[0]);
    for (i = 1; i < num_rounds; i++) {
        SubBytes(state);
        ShiftRows(state);
        MixColumns(state);
        AddRoundKey(state, round_key[i]);
    }
    SubBytes(state);
    ShiftRows(state);
    AddRoundKey(state, round_key[num_rounds]);
    return state;
}
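To make the S-box concrete, the Python sketch below derives it from its mathematical definition (inversion in GF(2^8) followed by the affine transformation) instead of the pre-computed table, and computes the first-round intermediate value SubBytes(plaintext XOR key), which is the value power analysis attacks typically target. The function names are illustrative, not from any particular implementation; the checked values come from the AES standard (FIPS-197):

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_inv(a):
    # Multiplicative inverse; AES defines inv(0) = 0
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def sbox(byte):
    # SubBytes: GF(2^8) inversion followed by the affine transformation
    inv = gf_inv(byte)
    rotl = lambda v, r: ((v << r) | (v >> (8 - r))) & 0xFF
    return inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63

def first_round_intermediate(plaintext_byte, key_byte):
    # AddRoundKey then SubBytes on one byte of the state: the classic
    # key-dependent intermediate value targeted by power analysis
    return sbox(plaintext_byte ^ key_byte)

assert sbox(0x00) == 0x63 and sbox(0x53) == 0xED  # values from FIPS-197
```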
2.2.2 RSA
RSA [17] is a asymmetric cryptographic algorithm based on the modular arithmetic over large
positive numbers. The security of the algorithm relies on the difficulty in factoring large numbers into
its base primes, since no algorithm capable of doing that efficiently was found.
The basic idea of this algorithm is to raise the message value to the public key when encrypting
or private key when decrypting, ending with a modular operation that uses a known value. This
Figure 2.6: AES AddRoundKey.
Figure 2.7: AES SubBytes.
Figure 2.8: AES ShiftRows. Figure 2.9: AES MixColumns.
mathematical operation is called modular exponentiation.
The public key is the pair Ppu = (e, n), where e is the value to which the message value M is raised and n is the modulus used in the modular operation. The cipher operation is the following:

M^e mod n = C (ciphertext)    (2.1)

The private key is the pair Ppr = (d, n), where d is the value to which the ciphertext C is raised and n is the modulus used in the modular operation. The decipher operation is the following:

C^d mod n = M (message)    (2.2)
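Equations 2.1 and 2.2 can be exercised with a toy Python example. The primes below are the small textbook values often used for illustration; real keys use moduli of 2048 bits or more:

```python
# Toy RSA with tiny primes (illustration only; real keys use 2048+ bit moduli)
p, q = 61, 53
n = p * q                 # 3233, the public modulus
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

M = 65                    # message value, must be smaller than n
C = pow(M, e, n)          # cipher:   M^e mod n  (Eq. 2.1)
assert pow(C, d, n) == M  # decipher: C^d mod n  (Eq. 2.2)
```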
RSA and other asymmetric cryptographic algorithms are based on modular exponentiation. The most used technique to perform a modular exponentiation is known as the square-and-multiply algorithm. The algorithm works by checking whether the last bit of the exponent is one or zero. If it is zero, the algorithm performs a square operation with the current result and the exponent is shifted one position to the right. If the last bit is one, the previous operations are applied plus a multiplication between the current result and the base. The following presents a possible implementation of the algorithm:
squareAndMultiply(x, n) {
    if (n < 0)
        return squareAndMultiply(1 / x, -n);
    else if (n == 0)
        return 1;
    else if (n == 1)
        return x;
    else if (isEven(n))
        return squareAndMultiply(x * x, n / 2);
    else // isOdd
        return x * squareAndMultiply(x * x, (n - 1) / 2);
}
This implementation is vulnerable to SPA, since the execution of the multiply operation depends on the secret key bits [3]. This issue will be addressed in detail in Section 2.3.2.
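The key dependence can be made visible by instrumenting the algorithm. The Python sketch below uses the left-to-right variant of square-and-multiply with a modulus, logging its operation sequence; the log stands in for what an SPA attacker reads directly off the power trace:

```python
def square_and_multiply(base, exponent, modulus, trace):
    # Left-to-right binary modular exponentiation; `trace` records the
    # operation sequence, standing in for the recorded power trace
    result = 1
    for bit in bin(exponent)[2:]:          # scan exponent bits MSB first
        result = (result * result) % modulus
        trace.append("S")                  # square: done for every bit
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")              # multiply: only for 1-bits
    return result

ops = []
assert square_and_multiply(5, 0b1011, 97, ops) == pow(5, 0b1011, 97)
# The sequence "SM S SM SM" directly reveals the exponent bits 1,0,1,1
assert "".join(ops) == "SMSSMSM"
```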
2.3 Side-Channel Analysis Attacks
Side-Channel Attacks (SCA) take advantage of the information leaked by the physical implementation of a cryptographic device. They can be divided into two major groups [18], passive/active and invasive/non-invasive attacks, with the following characteristics:
Passive and Active attacks: Passive attacks rely on observing the device's behaviour under normal operating conditions, meaning no attempts are made to tamper with it. In contrast, active attacks tamper with the device's functionality, so that it operates under abnormal conditions.
Invasive and non-invasive attacks: Invasive attacks start by depackaging the chip to get access to the components, so they can be probed directly. In contrast, non-invasive attacks try to gather information using only the external components.
These attack types are not mutually exclusive, meaning it is possible to have a passive attack using invasive or non-invasive techniques, and vice-versa.
The attacks can be grouped into classes depending on the physical characteristic they exploit, such as timing, power consumption, fault induction/analysis, temperature, electromagnetic emanations, acoustic emissions and others. Their main goal is the same: retrieve information that can help reveal the device's secret keys. They mostly differ in the time, cost, expertise and equipment needed to perform them.
Power analysis is typically a non-invasive attack [3] that can be performed with inexpensive equipment when compared to other active attacks. It has been proven effective in recovering secrets from cryptographic devices while keeping the process fairly simple: record the power consumption of a device while it performs cryptographic operations, and in the end perform statistical analysis over the recorded power consumptions. This was the main reason this attack was chosen: it is effective and fairly simple.
2.3.1 Power Analysis
Most common modern digital circuits are built using Complementary Metal-Oxide-Semiconductor (CMOS) cells. This technology has the particularity of having significant power consumption when the internal logic cell values change. This type of power consumption is dynamic and occurs when logic cells change their internal values from 0 → 1 or 1 → 0. Logic cells that maintain their internal state have only a static consumption, leading to very little power being drawn. This happens because a state transition usually requires charging or discharging a capacitor used to maintain the logical state.
The state of the logic cells depends on the input and on the operations a device is performing, so an attacker can acquire knowledge about the operations and data being processed at a given moment. This information leak can ultimately lead to the discovery of sensitive data, such as the secret keys of a cryptographic device. This type of analysis is called a PAA, where a correlation between the power consumption and the key-dependent operations performed on the device is evaluated.
An attacker has, in most cases, limited to no knowledge about a device's implementation, so to simulate the power consumption of a device they make use of simple power models. The most commonly used are the Hamming-Distance (HD) and Hamming-Weight (HW) models [5].
The Hamming-Weight power model is a simple model that considers the number of bits set to one at a given moment to describe the power consumption of a device. This model requires almost no knowledge about the device's structure, only the value being processed at a given time.
The Hamming-Distance model counts how many bits have changed, from 1 → 0 and 0 → 1, over a transition. This model requires the attacker to know the predecessor and successor values at a given time. It also requires some knowledge about the device's implementation, which in most cases is not available.
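Both models reduce to simple bit counting, as the following Python sketch shows (the 0x3A → 0xAC transition is just an illustrative register update):

```python
def hamming_weight(value):
    # Number of bits set to one: the HW power model
    return bin(value).count("1")

def hamming_distance(before, after):
    # Number of bits that flipped in the transition: the HD power model
    return hamming_weight(before ^ after)

assert hamming_weight(0b1011) == 3
# A register going from 0x3A to 0xAC flips 4 of its 8 bits
assert hamming_distance(0x3A, 0xAC) == 4
```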
These power models are then used in PAAs. PAAs are divided into three types: SPA, DPA and High-Order DPA. The following sections address the first two.
These sections will refer only to software-implemented algorithms on an 8-bit smart card, since they are simpler to understand. For a power analysis attack on an AES Application-Specific Integrated Circuit (ASIC) implementation refer to [19].
2.3.2 Simple Power Analysis - Visual Inspection
Simple power analysis can be described as "a technique that involves directly interpreting power consumption measurements collected during cryptographic operations" [5] to gather sensitive information that can lead to key discovery. This technique uses one or only a few power traces of the device under attack, which usually leads to the use of complex statistical analysis in order to retrieve useful information from the power trace. Also, detailed knowledge about the algorithm implementation running on the device is almost always necessary in order to retrieve useful information [5].
Visual inspection is one technique that can be used to retrieve information about the operations a device is performing. This is possible because devices execute the algorithm's instructions in sequence, i.e. the algorithm implementation is translated into a set of instructions that run in sequence on the device.
The instruction set of a device can be divided into four subsets: the arithmetic, logical, data transfer and branching sets. These sets work with different components, such as the arithmetic-logic unit, the co-processor, the RAM, the ROM and the peripheral components. Each component has its own implementation and purpose, giving it a distinct power consumption pattern when running. This means the different components can be observed in the power traces, allowing the identification of which instructions are running at a given time.
Instructions have their own power characteristics, leading to a potentially severe risk if their execution depends on the secret key. For a device running an algorithm with such characteristics, the key can be compromised without much effort. The best example can be seen in some public-key cryptosystem implementations using the square-and-multiply modular exponentiation [3]. The squaring operation is performed for every bit of the secret key, while the multiply operation is performed only for key bits equal to one. Since the square and the multiplication can be distinguished in the power trace, retrieving the secret key is as simple as checking when only a squaring, or a squaring and a multiply, are performed. Figure 2.10 depicts an example of a power trace during a square-and-multiply operation.
Even if the sequence of operations does not depend on the secret key, visual inspection can still
be useful in cases where the algorithm running on a device is not known. It can also give some hints
about how an algorithm is implemented.
Figure 2.10: An illustrative example of how Simple Power Analysis can be used to identify the bits being processed by visually inspecting the instantaneous power consumption.
2.3.3 Differential Power Analysis
Among power analysis attacks, DPA is more effective than SPA in most cases. One of the reasons is that it does not require detailed knowledge about the attacked device; in most cases it is sufficient to know the algorithm running on the device [5].
DPA requires many traces in order to be successful, but its effectiveness in recovering the secret key is superior to SPA's, being able to recover keys even from noisy power traces. The attack looks for data dependencies at specific points of the power trace, instead of patterns across a complete trace.
This type of analysis is performed in five steps [5], detailed below:
Step 1: Choosing an Intermediate Result of the Executed Algorithm. In the first step, an intermediate stage of the algorithm running on the attacked device is chosen. This stage must be a function that depends on a known value, the algorithm's input or output data, and a portion of the secret key.
Step 2: Measuring the Power Consumption. In the second step, the power traces of the device are obtained while it performs several cryptographic operations with known input data. The data used in the operations is stored in a vector d and each produced power trace is stored in a vector t. Finally, a matrix T is built using all the recorded t vectors, where every position of vector d produces one line of the matrix T: the first position of vector d produces the power trace on the first line of matrix T, and the same is done for the rest of the elements. Note that the input of the intermediary operation must be known to the attacker. Figure 2.11 illustrates the process.
Step 3: Calculating Hypothetical Intermediate Values. In the third step, all possible keys used in the intermediary operation are generated and stored in a vector k. The elements of vector k are typically called key hypotheses. Then a matrix V is built containing all the possible intermediary values, using vector k and the data vector d. Each line of the matrix will have all the possible intermediary values for one input data value. Since every column depends on one hypothetical key and the input data, one of the columns will have the same values as those computed by the attacked device. Figure 2.12 illustrates the process.
Step 4: Mapping Intermediate Values to Power Consumption Values. In the fourth step, each element of matrix V is mapped to a hypothetical power consumption. The calculated values are then stored in a matrix H. The function used to perform the mapping between the intermediary values and the hypothetical power consumption is usually one of the power models referred to before, namely the Hamming-weight or the Hamming-distance. Figure 2.13 illustrates the process.
Step 5: Comparing the Hypothetical Power Consumption Values with the Power Traces. In the final step, each column of the hypothetical power consumption matrix H is compared with all the columns of matrix T. This operation compares the hypothetical power consumption of every key hypothesis with every position of the recorded traces. All results are then stored in a matrix R. These comparisons between matrices are performed using statistical analysis, such as the correlation coefficient or the distance-of-means method [3]. Figure 2.14 illustrates the process.
In the end, if the power model and the number of traces used are adequate, the key is found by searching for the row of the resulting matrix R that has the highest value.
It is worth mentioning that when the correlation coefficient is used in step 5, this attack is named Correlation Power Analysis.
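The five steps can be sketched end-to-end in Python. This is a deliberately scaled-down illustration, not the thesis setup: it assumes a toy 4-bit S-box, a single 4-bit key, one noiseless leaking sample per trace (so matrix T collapses to one column), and uses the Pearson correlation coefficient as the step 5 comparison, i.e. the CPA variant:

```python
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box

def hw(v):
    # Hamming-weight power model (step 4 mapping)
    return bin(v).count("1")

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length vectors
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

SECRET_KEY = 0xB

# Steps 1-2: the chosen intermediate is SBOX4[d ^ k]; the "recorded trace"
# is idealized as a single noiseless HW sample per encryption
d = list(range(16))                              # vector d of known inputs
t = [hw(SBOX4[x ^ SECRET_KEY]) for x in d]       # one column of matrix T

# Steps 3-5: hypothetical intermediates (V), power model (H), comparison (R)
scores = []
for k in range(16):                              # vector k of key hypotheses
    h = [hw(SBOX4[x ^ k]) for x in d]            # one column of matrix H
    scores.append(abs(pearson(h, t)))            # one entry of matrix R

best = max(range(16), key=lambda k: scores[k])
assert best == SECRET_KEY and scores[best] > 0.99
```

Only the correct hypothesis reproduces the device's intermediates exactly, so its column of R peaks; wrong keys decorrelate because the S-box is non-linear.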
Figure 2.11: Illustration of the power consumption measurement (step 2).
Figure 2.12: Illustration of the hypothetical intermediate values matrix construction (step 3).
Figure 2.13: Illustration of mapping intermediate values to power consumption values (step 4).
Figure 2.14: Illustration of the comparison between the hypothetical power consumption values and the power traces (step 5).
2.4 Signal Characteristics
This section describes the conversion of an analog signal to digital, explaining how the conversion is done and how to minimize the loss of information during the process. Then, the components that constitute each sample of a power trace are presented.
2.4.1 Analog to Digital
The power consumption of a device can be seen as an analog signal, i.e. continuous in the time domain. To be processed and analysed by a digital device, like a computer, the analog signal must be converted from continuous to discrete. This conversion is done using sampling and quantization. Sampling converts the time axis to a finite number of sampled values. Quantization is performed on the y axis, where the amplitude of the signal is mapped to a finite set of values, as illustrated in Figure 2.15.
Figure 2.15: Sampling and quantization of a continuous signal.
When performing analog-to-digital conversions, there is a risk of losing information in the process. This can happen if the sampling frequency is low or the converter used has inadequate quantization resolution, where the sampling frequency is the number of points gathered per second on the time axis, and the signal resolution, usually presented in bits, determines the number of values that can be mapped from the analog to the digital y axis.
When sampling, one has to be careful with the frequency chosen, because a low frequency, depending on the signal, might not record all the signal information and lead to what is called aliasing. One naive solution could be to always use the maximum sampling frequency, but this has the disadvantage of requiring more space and increasing the cost of processing the signal. A better way is to find the minimal sampling rate needed to reconstruct the analog signal, and then adjust the sampling frequency from there. This minimal sampling rate is given by the Nyquist-Shannon sampling theorem, which states that in order to capture a signal without losing information, the sample rate has to be at least twice the maximum frequency of the signal being sampled.
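Aliasing can be demonstrated numerically with a minimal, hypothetical example: a 7 Hz tone sampled at only 10 Hz (Nyquist limit 5 Hz) produces sample values identical to those of a 3 Hz tone, so the two become indistinguishable after sampling:

```python
import math

fs = 10.0                 # sampling frequency in Hz
f_signal = 7.0            # above the Nyquist limit fs/2 = 5 Hz
f_alias = fs - f_signal   # the 7 Hz tone folds down to 3 Hz

for k in range(50):
    t = k / fs
    s7 = math.sin(2 * math.pi * f_signal * t)
    s3 = -math.sin(2 * math.pi * f_alias * t)   # sign-flipped 3 Hz tone
    # Sample by sample, the undersampled 7 Hz signal is indistinguishable
    # from a 3 Hz one: the information above fs/2 has been lost
    assert math.isclose(s7, s3, abs_tol=1e-9)
```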
Another thing to take into account is that a low bit resolution leads to a bigger quantization error. The quantization error is essentially how different the recorded value is, in terms of amplitude, from the original signal. To minimize the quantization error, the signal must occupy the entire input range of the acquisition device. This way the full quantization range is used to map the signal, maximizing its resolution.
2.4.2 Power traces
The total power consumption at each point is the sum of four components. The operation-dependent component is represented by Pop and the data-dependent component by Pdata. The other two, electronic noise and constant power consumption, are represented by Pel.noise and Pconst respectively.
Electronic noise occurs when one or more undesired signals exist and are recorded, something that is present in every practical measurement. The way to characterize it is by performing the same operations with constant input several times and averaging the result. Since the noise is more or less random, it will be partially averaged out.
Constant power consumption is usually ignored, since it does not carry any useful information. This is because the presence of Pconst in a measurement is normally caused by current leakage. Internal value changes in transistors that are independent of the data processed and the operations being performed are called switching noise. The total power of each sample of a recorded power trace can thus be formulated as:

Ptotal = Pop + Pdata + Pel.noise + Pconst    (2.3)
If an attacker wants to retrieve useful information from the power traces, he can use different types of power analysis on Pop and Pdata. Each type of analysis targets different properties of those components, meaning he can target a component completely or only a small part of it. The part used in the attack is called the exploitable component, denoted by Pexp. The part not used is called switching noise, denoted by Psw.noise. The relation between these components can be defined as:

Pop + Pdata = Pexp + Psw.noise    (2.4)

Considering this new relation, the total power composition can be formulated as:

Ptotal = Pexp + Psw.noise + Pel.noise + Pconst    (2.5)
The analysis of the signal containing useful information, Pexp, becomes more difficult the higher the values of Psw.noise and Pel.noise are. One way to quantify the leakage of information from one point is to use a signal-to-noise ratio (SNR) metric. In the context of power analysis, the SNR can be written as follows:

SNR = Var(Pexp) / Var(Psw.noise + Pel.noise)    (2.6)

The number of samples required to extract useful information from a point is inversely proportional to the SNR value, so if the traces have more noise, more traces are needed to retrieve useful information from them.
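Equation 2.6 can be evaluated on simulated components. The sketch below assumes a simple model: the exploitable signal is the Hamming weight of a random data byte (variance 2) and the noise is Gaussian with variance 4, so the expected SNR is about 0.5:

```python
import random
import statistics

random.seed(1)
n = 10_000

# Simulated per-sample components (Eq. 2.5): the exploitable signal is the
# Hamming weight of a random byte; the rest is modelled as Gaussian noise
p_exp = [bin(random.randrange(256)).count("1") for _ in range(n)]
p_noise = [random.gauss(0, 2.0) for _ in range(n)]
p_const = 5.0  # constant component: no variance, hence no information

snr = statistics.pvariance(p_exp) / statistics.pvariance(p_noise)  # Eq. 2.6

# HW of a uniform byte has variance 8 * 0.25 = 2; the noise variance is 4,
# so the estimate should land near 0.5
assert 0.3 < snr < 0.7
```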
2.5 Welch’s T-Test
Properly acquired power traces are essentially a signal that needs to be processed in order to retrieve useful information and ultimately discover the secret key of an electronic device. Until now, CPA analysis was used to retrieve this secret key, but this method has one disadvantage: it is computationally expensive to correlate thousands of large traces when, in the end, usually only a portion of each trace is relevant to the recovery of the secret key. This section presents a statistical tool named the t-test to try to find these relevant points in a power trace, without heavy computation or, depending on the attack, thousands of traces.
The t-test is a statistical hypothesis test used to determine whether two sets of data are significantly different from one another. The test statistic follows a Student's t-distribution, used when the data being analysed are normally distributed, the number of samples is small and the standard deviation is unknown. Welch's t-test is an adaptation of the regular t-test for sets that have unequal variances and sample sizes.
This method by itself does not reveal the secret key of a device, but it can show potential Points-Of-Interest (POI) by distinguishing trace samples that have different power consumptions, revealing possible differences in the signal and thus possible leakages. This is of special importance when dealing with a large number of traces with many samples, since it reduces the correlation to those points presenting potential interest. The t-test formula is defined as follows:
t = (X1 − X2) / sqrt(s1^2/N1 + s2^2/N2)    (2.7)
where X1, X2 are the sample means of datasets 1 and 2 respectively, s1^2, s2^2 are the sample variances and N1, N2 are the numbers of samples.
The degrees of freedom v associated with the variance can be calculated using the following
formula:
v ≈ (s1^2/N1 + s2^2/N2)^2 / [ (s1^2/N1)^2/(N1 − 1) + (s2^2/N2)^2/(N2 − 1) ]    (2.8)
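Both statistics can be computed directly. The Python sketch below implements equations 2.7 and 2.8 and applies them to two hypothetical trace-sample sets whose means clearly differ; an absolute t above the 4.5 threshold commonly used in leakage assessment flags that sample index as a potential point of interest:

```python
import math

def welch_t(a, b):
    # Welch's t statistic and degrees of freedom (Eqs. 2.7 and 2.8)
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se)
    dof = se ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, dof

# Two sets of power samples at one trace index (illustrative values): a
# clear mean difference yields |t| well above the usual 4.5 threshold
fixed = [5.0, 5.2, 4.9, 5.1, 5.0, 5.3, 4.8, 5.1]
rand = [6.0, 6.3, 5.9, 6.1, 6.2, 5.8, 6.0, 6.4]
t, dof = welch_t(fixed, rand)
assert abs(t) > 4.5
```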
This method works in the power analysis context because different input data cause different intermediary values. These intermediate values cause a distinct, noticeable power consumption, whereas instructions that are independent of the input data have the same power consumption. The points where the consumption differs are potential places where sensitive data is handled, such as the XOR operation between the secret key and the plain text. For this to work, two distinct power trace groups need to be created using two pre-selected plain text groups. The categories are presented below:
• Fixed vs. Fixed - The two sets of input data differ in only one data value. This method has the advantage of being completely independent of the algorithm being used, but usually at the cost of an increase in false negatives.
• Fixed vs. Random - One set has a fixed value and the other has random values. This one is very simple and has the advantage of being completely independent of the algorithm being analysed.
• Semi-fixed vs. Random - One set has data that produces some fixed intermediary values and the other set only has random data. Taking AES as an example, the semi-fixed data set can be one that sets half of the state bytes after the first round to zero and the other half to random values. This method requires some knowledge of the algorithm being analysed.
The advantage of using this method, compared with the correlation attack, is that it requires far fewer traces to give meaningful results. The main disadvantage is the loss of intuition about where the main leakage points are located. This happens because the t-test tends to flag a range of points around the point of interest, while CPA is more precise. There are also possible false negatives, where the t-test does not report POIs in places where sensitive information is being handled.
Some techniques can be used to improve the results, like the "paired t-test" [20], where the values from the two sets are given to the crypto device in alternation, so that its internal state changes between ciphering operations.
3 State of the Art

Contents
3.1 Side-channel Analysis Platforms
3.2 Attacks
3.3 Defence Mechanisms
3.4 Conclusion
This chapter introduces some of the signal processing methods and procedures that can be used to obtain better traces, and discusses some platforms currently used by the industry to assess the security of electronic devices; this work will focus on the smart card. Section 3.1 presents some of the side-channel analysis platforms. Section 3.2 presents two state-of-the-art attacks that can increase the effectiveness of the CPA attack. Section 3.3 gives an overview of two defence mechanisms that allow the reduction of possible signal leakage. Finally, Section 3.4 concludes this chapter.
3.1 Side-channel Analysis Platforms
SCA platforms, in the context of this work, refers to the software and hardware used to assess the security of cryptographic devices against power analysis attacks. There are two platform categories:
• Ad-hoc/home-made platforms, which can consist of an oscilloscope to measure the power consumption, a resistor in series with the device's power supply and some scripts to analyse the obtained data.
• Commercial tools and software, developed by companies or the academic community with the purpose of testing the security of electronic devices.
The following presents some of the most used platforms, both software and hardware, with a brief introduction describing their functionality as well as some of their advantages and disadvantages: for hardware, the SAKURA-G/W and the ChipWhisperer; for software, the ChipWhisperer's software module.
3.1.1 SAKURA-G/W
Figure 3.1: Top view of the SAKURA-G. [21]
SAKURA-G (Figure 3.1) and SAKURA-W (Figure 3.2) are both testing platforms to evaluate the leakage and related security of implemented cryptographic modules. They are designed for
Figure 3.2: Top view of the SAKURA-W.
research and evaluation of side-channel leakage, covering attacks such as SCA and Fault Injection Attacks (FIA). SAKURA-W is designed to serve as an adapter that sits on top of the SAKURA-G to enable smart card security tests. SAKURA stands for Side-channel Attack User Reference Architecture.
This platform is well known in the SCA community, being the one chosen for the DPA contest [22],
a website that hosts a competition where people submit their power analysis algorithms, and the best
one is chosen.
SAKURA-G is a 140 mm x 120 mm board composed of two programmable FPGAs: a Xilinx Spartan-6 (XC6SLX75) for the cryptographic circuit (the main FPGA) and a Spartan-6 (XC6SLX9) for the control circuit (the controller FPGA). The main clock oscillator for these FPGAs runs at 48 MHz, but this value can be scaled up or down when programming the FPGAs.
The board is designed to be low-noise in terms of power analysis and comes with an on-board amplifier to facilitate power analysis. The amplifier has a 360 MHz bandwidth with a +20 dB gain. The board has two power sources, one via USB and the other from an external power supply. It also comes with two sets of 40 user I/O pins, where one set is controlled by the main FPGA and the other by the controller FPGA.
SAKURA-W is an expansion board; it uses the SAKURA-G's controller FPGA to deliver commands to the smart card. It has a smart card reader, no amplifier and only one set of 40 I/O pins.
The main disadvantage of this platform is its high cost, which can be a barrier for those wanting to start studying the field of power analysis.
3.1.2 ChipWhisperer
ChipWhisperer is an open-source toolchain that provides both hardware and software for power analysis and glitch attacks. The project started as a crowdfunding campaign on Kickstarter and belongs to NewAE Technology Inc [23].
Figure 3.3: A ChipWhisperer starter kit that comes with a multi-target board, an oscilloscope, differential probeand a low noise amplifier.
Figure 3.4: ChipWhisperer software interface.
The project provides a board that the user can configure with the desired algorithm, plug into the computer and use to start experimenting in the field of side-channel attacks with the provided software (depicted in Figure 3.4), while maintaining a low cost when compared to other commercial products.
One of the first products, still in production, is the ChipWhisperer-Lite (depicted in Figure 3.3), which is formed by a measuring board and a target board. The two boards come connected to each other, needing to be broken apart, and the connectors need to be soldered. There is also a more expensive version that comes with the components already soldered.
As for the board components, the measuring board has a 10-bit analog-to-digital converter, an Atmel SAM3U high-speed USB chip, a Xilinx S6LX9 FPGA, a +55 dB low-noise amplifier and one MOSFET for glitch generation; the board is powered by a micro-USB port, which also serves as the communication interface. The target board comes with an 8/16-bit XMEGA128 microcontroller that guarantees real-time performance, i.e. that the system responds within specific time constraints, and has a low power consumption. At the time, the price of the ChipWhisperer-Lite was $250 USD, and other options, like the board with the connectors already soldered or a more robust version, are also available to buy on the website [24].
The product is designed to be low cost while performing at the level of other, more expensive
products, by providing synchronous capture. The concept of synchronous capture is to synchronize the
trace collection with the target device's clock, multiplying or dividing the base frequency. This keeps
the traces aligned while allowing the device to use cheaper parts when compared to high-end
oscilloscopes.
The software is written in Python 2.7 and designed to be cross-platform, coming in two modules: the
capture module, used to configure the oscilloscope, communicate with the device and gather the power
traces; and the analyser module, used to visualize and process the captured data. This software is
also compatible with regular oscilloscopes such as the PicoScope.
3.2 Attacks
The following presents two attacks that can be used to improve the CPA analysis. First, template
attacks are presented, where the device's consumption is characterized to reduce the number of
traces required to discover the correct key. Then, collision attacks are presented, which reduce the number
of key guesses by finding equal intermediary values in different plain texts.
3.2.1 Template Attacks
Template attacks [5, 25] are based on the characterization of power traces using multivariate
normal distributions. This technique depends on the data being processed and can be divided into two
parts: template building and template matching.
The main idea of this attack is to record the power trace of a set of sequenced instructions on
a device identical to the one under attack [5], using different data and secret keys. From the recorded
power traces, the multivariate normal distribution is computed. The result is
a template for each pair of data and key used on the profiling device. The final step is to
record the power trace on the device under attack and compare it with the templates built. The
result of this operation is a probability, measuring how well a given template suits the power trace.
Template Building Phase
In the template building phase, a single instruction or a set of sequenced instructions is charac-
terized. The targeted instructions must manipulate the secret value while the power consumption is
recorded. For the instruction characterization to work, different pairs of data and keys (di, kj) must be
used. Then, from the traces gathered, a multivariate normal distribution composed of a mean vector
and a covariance matrix is computed. There are three typical strategies used to build templates: the
usage of pairs of data and keys, where multiple data and keys are used to build a template; the usage
of intermediary values, where a function that uses (di, kj) is characterized for all its possible values;
and the usage of power models like the Hamming-weight or Hamming-distance.
Template Matching Phase
In the template matching phase, the probability density function and the power trace are evaluated
using the Gaussian/Normal distribution seen in the following formula:
p(t; (m,C)di,kj) = exp(−(1/2) · (t−m)′ · C⁻¹ · (t−m)) / √((2·π)^T · det(C)) (3.1)
where m is the mean vector, C is the covariance matrix, t is the power trace and (di, kj) are the data and
key used to build the template. From the formula, a probability is obtained indicating how well the power
trace suits the template, with the highest value corresponding to the best match. During the computation,
numerical problems can arise when computing the covariance matrix inversion and the exponentiation.
Performing the exponentiation on small numbers can lead to numerical problems; one solution is
to apply the logarithm to the equation. This has the consequence that the smallest absolute
value of the logarithm, and no longer the highest value, indicates the correct key. To avoid the inversion of
the matrix C, it is set to the identity matrix; doing so discards the covariance between points, considering
only the mean vector. Finally, the resulting equation still has one exponentiation and, to remove it,
the logarithm is again applied, avoiding possible numerical problems when using small exponential
values. The final expression is as follows:
ln p(t; (m,C)di,kj) = −(1/2) · (ln((2·π)^NIP) + (t−m)′ · (t−m)) (3.2)
With these additional operations, the template that produces the smallest absolute value is the one
that indicates the correct key.
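The simplified expression of Equation (3.2) can be sketched in plain Python. The trace and templates below are hypothetical values chosen only to show that the closest template scores best:

```python
import math

def log_template_score(t, m):
    """Log-probability of Equation (3.2): the covariance matrix is replaced
    by the identity, so only the mean vector m is used."""
    n = len(m)
    diff_sq = sum((ti - mi) ** 2 for ti, mi in zip(t, m))
    return -0.5 * (n * math.log(2 * math.pi) + diff_sq)

# Two hypothetical templates: the trace t lies closest to m1, so m1 yields
# the log value with the smallest absolute value, i.e. the best match.
t  = [1.0, 1.1, 0.9]
m1 = [1.0, 1.0, 1.0]
m2 = [3.0, 3.0, 3.0]
print(log_template_score(t, m1) > log_template_score(t, m2))  # True
```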
Example using the Hamming-weight
An example using the Hamming-weight can be given for the MOV instruction, which moves part of
the secret key to a register. An attacker can characterize the Hamming-weight on a device under his
control by checking the power consumption for each value (from 0 to 8 on an 8-bit microcontroller).
If each of the different weights has a distinct power consumption, a template can be built for each
Hamming-weight value. The attacker then gathers the power trace on the device under attack, at the approxi-
mate moment where the MOV operation is being performed, and compares it with the expected values.
This yields the Hamming-weight of the key portion being processed by the MOV instruction. If this
is repeated for every key portion, it reduces the number of guesses an attacker has to make to retrieve
the secret key.
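The reduction in key guesses can be quantified with a short sketch. The recovered weight below is a hypothetical value standing in for the result of the template match:

```python
# For an 8-bit key byte there are 256 candidates; knowing the Hamming
# weight of the byte handled by the MOV instruction shrinks that set.
def hamming_weight(x):
    return bin(x).count("1")

recovered_hw = 3  # hypothetical weight read off the best-matching template
candidates = [k for k in range(256) if hamming_weight(k) == recovered_hw]
print(len(candidates))  # 56 candidates (C(8,3)) instead of 256
```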
Template-Based Attack for DPA
Template-based attacks improve regular DPA attacks by reducing the probability of wrong key
guesses. They can be seen as an extension of the template attacks for SPA, where the device's con-
sumption is characterized.
The basis of this attack is the question: given a trace ti, what is the probability of the key being kj,
written as p(kj |ti)? Consider ti as an element of the power trace vector t and kj as an element of the
vector k of possible key guesses. Using Bayes' theorem [3], the following formula is deduced:

p(kj |ti) = (p(ti|kj) · p(kj)) / (∑_{l=1}^{n} p(ti|kl) · p(kl)) (3.3)
Bayes' theorem can be seen as an update function: it receives as input the key probabilities
p(kl), which do not take ti into account, and produces an output that does. The input key probabilities p(kj) are
known as prior probabilities and the output p(kj |ti) as posterior probabilities.
The previous formula works for one trace, but in practice multiple traces are used to gather more
information about the secret key. Now, using a matrix T, where each line can be seen as a power trace
vector, the conditional probability is written as p(kj |T). Since every trace is statistically independent,
applying Bayes' theorem leads to the following formula:

p(kj |T) = ((∏_{i=1}^{m} p(ti|kj)) · p(kj)) / (∑_{l=1}^{n} (∏_{i=1}^{m} p(ti|kl)) · p(kl)) (3.4)
Finally, the probabilities p(kj) and p(ti|kj) need to be determined in order to calculate p(kj |T).
Since all key values are equally likely, the value of p(kj) is 1/K, where K is the number of possible
keys. As for the values of p(ti|kj), they are calculated after phase 3 of the DPA attack, where the
probabilities are based on the power trace matrix M and the matrix V of all possible intermediary
values.
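The Bayesian update of Equation (3.4) can be sketched as an iterative loop: start from the uniform prior 1/K, multiply in each trace's likelihood, and renormalise. The per-trace likelihoods p(ti|kj) below are illustrative placeholders, not values derived from real traces:

```python
# Sketch of the posterior computation for K = 4 key candidates and 2 traces.
K = 4
priors = [1.0 / K] * K  # p(kj) = 1/K: all keys equally likely a priori

# Hypothetical likelihoods p(ti|kj), one row per trace.
likelihoods = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
]

post = priors[:]
for row in likelihoods:
    post = [p * l for p, l in zip(post, row)]  # multiply in p(ti|kj)
    total = sum(post)
    post = [p / total for p in post]           # normalise (denominator of 3.4)

best = max(range(K), key=lambda j: post[j])
print(best)  # key index 1 accumulates the highest posterior here
```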
3.2.2 Collision Attacks
During encryptions using different plain texts and an unknown key, it is possible that intermediary
functions produce the same values. When this event occurs, it is said that a collision happened. The
importance of a collision is that it can only occur for a certain subset of all the possible key
values. For example, considering an intermediary function f(di, kj), where di and kj are the input data
and the unknown key respectively, if f(d1, k1) = f(d2, k1) and d1 ≠ d2, then k1 can only assume a reduced
number of values for the two function outputs to be equal. This reduces the number of key guesses
needed to find the correct key. This type of attack uses side-channel analysis, in this case power
analysis, to detect the internal collisions.
On AES, a collision attack can be performed on the MixColumns transformation. The MixColumns
operation can be represented as a matrix multiplication:

    [b0]   [02 03 01 01]   [a0]
    [b1] = [01 02 03 01] × [a1]   (3.5)
    [b2]   [01 01 02 03]   [a2]
    [b3]   [03 01 01 02]   [a3]

where an is the result of sbox(dn ⊕ kn) and b0 is given by

b0 = 02 × sbox(d0 ⊕ k0) + 03 × sbox(d1 ⊕ k1) + 01 × sbox(d2 ⊕ k2) + 01 × sbox(d3 ⊕ k3) (3.6)
For the attack to work, one must consider only plain text values that make both d0 = d1 = 0
and d2 = d3. When two plain texts with d2 ≠ d′2 and d3 ≠ d′3 produce the same output,
information can be deduced about k2 and k3.
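The structure of Equation (3.6) can be sketched in Python using the standard xtime routine for multiplication by 02 in GF(2^8). Note a loud simplification: the AES S-box is replaced by an identity stand-in, under which every pair of plain texts of this form collides; with the real, non-linear S-box a collision only occurs for a restricted set of (k2, k3), which is exactly what leaks information:

```python
def xtime(a):
    """Multiplication by 02 in GF(2^8) with the AES polynomial 0x1B."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gf_mul(a, c):
    """Multiply a by the MixColumns constants 01, 02 or 03."""
    return {1: a, 2: xtime(a), 3: xtime(a) ^ a}[c]

def sbox(x):
    return x  # identity stand-in for the real AES S-box (simplification)

def b0(d, k):
    a = [sbox(d[i] ^ k[i]) for i in range(4)]
    return gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ gf_mul(a[2], 1) ^ gf_mul(a[3], 1)

# Two plain texts with d0 = d1 = 0 and d2 = d3, differing between them.
k  = [0x00, 0x00, 0x12, 0x34]   # hypothetical round-key bytes
d  = [0, 0, 0x10, 0x10]
dp = [0, 0, 0x20, 0x20]
print(b0(d, k) == b0(dp, k))    # collision on b0
```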
3.3 Defence Mechanisms
Until now, attacks against cryptographic devices were covered, in which the power consumption of
a device is correlated with the values it processes. To reduce the correlation between these values
and the power consumption there are two main methods, namely hiding and masking.
Hiding tries to make the power consumption independent from the operation being performed and
from the intermediary values being processed, either in software or in hardware. At the software level, to
make the power consumption appear random, instruction delays, dummy operations and instruction
shuffling are inserted into the program. These operations are controlled by randomly generated values
that decide how long the delays are, how many dummy operations are per-
formed and where the instructions are shuffled. This method has the disadvantage of
increasing the power consumption and the processing time.
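A minimal sketch of software hiding, assuming a simple byte-wise operation as the protected work: the processing order is shuffled and a random number of dummy operations is inserted, so the power profile shifts in time from run to run while the result is unchanged:

```python
import random

def process_state(state, rng):
    """Apply a toy operation (XOR with 0xFF) to each byte, with instruction
    shuffling and random dummy operations controlled by rng."""
    order = list(range(len(state)))
    rng.shuffle(order)                     # instruction shuffling
    out = [0] * len(state)
    for i in order:
        for _ in range(rng.randrange(4)):  # 0..3 dummy operations
            _dummy = (i * 0x5A) & 0xFF
        out[i] = state[i] ^ 0xFF           # the real work on byte i
    return out

state = list(range(16))
out = process_state(state, random.Random(1))
print(out == [b ^ 0xFF for b in state])  # order varies, the result does not
```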
At the hardware level, the device can be built to consume an equal amount of power for every
operation and data value processed. One way to do this is by using dual-rail logic, where logic cells
receive and output both a value and its complement. Combined with precharge logic, which forces the
output of a logic gate to a specific value of either 1 or 0 between evaluations, this always produces the
same sequence of bit transitions. Besides increasing the manufacturing cost, this method has the
disadvantage that it is not possible to make a device's power consumption 100% independent of the
operations and data processed.
Masking has the same objective as hiding, making the device's consumption independent of its
intermediary values, but it uses random values to mask the intermediary values instead of trying to
change the device's power consumption. This requires only changes to the algorithm, so that it applies
and later removes the random mask from its intermediary values. As an example, consider v to be the in-
termediary value and m a random value generated internally by the algorithm. The masked value is
the result of XORing the two values: vm = v ⊕ m. This type of defence mechanism works because
the consumption of a device depends on the values being processed. If something random is added
before the intermediary value is processed, the consumption of the masked value will appear random,
making the consumption independent from the processed value.
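The Boolean masking described above can be sketched in a few lines; the value 0x3C is an arbitrary example of an intermediary value:

```python
import secrets

v = 0x3C                  # intermediary value to protect
m = secrets.randbits(8)   # fresh random mask, generated internally
vm = v ^ m                # masked value: the only one actually processed

# Unmasking with the same mask recovers the original intermediary value.
print(vm ^ m == v)  # True
```

Because m is fresh on every execution, vm is uniformly distributed regardless of v, so the power consumed while handling vm carries no first-order information about v.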
3.4 Conclusion
This chapter presented some of the state-of-the-art SCA platforms, attack methods and defence mech-
anisms. It started by presenting two types of platforms for side-channel analysis, where one is
an ad-hoc/home-made platform and the other a commercial CPA product. Next, some more
advanced attack methods were presented, starting with template attacks, which rely on characterising
the device's power consumption, and collision attacks, which try to identify when collisions happen to
reduce the number of key guesses. Finally, the hiding and masking defence mechanisms were presented
as ways to increase CPA resistance, where hiding tries to directly change the power consumption of the
device, and masking tries to conceal the intermediary values before processing them. For more infor-
mation on this topic refer to [26].
4 Proposed Solution and Implementation

Contents
4.1 Processing Units
4.2 Trace Acquisition
4.3 Signal Analysis
4.4 The overall setup usage
4.5 Conclusion
This chapter presents the proposed setup, describing its components and the analysis
scripts developed.
This work intends to provide a setup that allows the assessment of an electronic device's security
using power analysis. The setup can be divided into three main parts: the device under test,
the trace acquisition and the trace analysis. The device under test is the equipment from which the
power traces are retrieved while it performs cryptographic operations. In this work the devices used
were a smart card and an FPGA, both configured with an AES algorithm.
The component that gathers traces is the oscilloscope, measuring the power consumption of the
device being tested, together with a Python program running on a PC that configures and coordinates the trace
collection both on the oscilloscope and on the device under test. The component that analyses the
traces is composed of the analysis scripts, executed on a PC, which perform various statistical
analyses on the power traces, not only to retrieve the secret key but also to assess the signal quality or
to find potential points that can be exploited. Figure 4.1 illustrates the components and their relationship.
Figure 4.1: An illustration of a power analysis setup.
It is important to understand how the components work together, so the user can make the most
of the setup. The following sections provide a deeper explanation of the setup, presenting
some of the challenges encountered while configuring and programming it and the decisions
made to overcome them. If a user wishes to use or understand the setup, improve it, or is
developing their own setup, the following sections can help save time, since much of the discovery and
problem solving was done while developing this one.
This chapter is divided into 4 sections. Section 4.1 presents the work developed in regard to the
target devices, from their characteristics to the software developed. Section 4.2 presents the work
developed on the trace collection side: it presents the characteristics of the oscilloscope used and the
development of the trace collection Python program and of the communication interface with the targeted
devices. Section 4.3 presents the scripts developed for trace analysis, explaining their purpose and
structure. Section 4.4 presents an overview of how the setup can be assembled and configured, both
for the SAKURA-G and for the SAKURA-W. Section 4.5 presents some concluding remarks.
4.1 Processing Units
This section presents and explains all the work related to the two target devices used, the FPGA
and the smart card. Although both are physical devices, this section is divided between software and
hardware, because the nature of the work was mostly software programming in the case of the smart
card, and mostly hardware setup and testing in the case of the FPGA.
It is worth mentioning that the smart card programming has a larger section than the
SAKURA-G one, since the SAKURA-G already provided the source code and did not need any changes
to its behaviour. On the other hand, the provided smart card software required licensing, meaning
that its source code was not publicly available, which prevented the software behaviour from being
changed. To remove this restriction, the smart card software was developed from scratch. Next, the
work developed and the decisions made on both devices are presented.
4.1.1 SAKURA-G/W
As mentioned in the previous chapter, the SAKURA platform is designed to allow research and
development on SCA and FIA. This platform was chosen since, besides being the
most used by the side-channel community to perform power analysis attacks, it was the only one that
provided a smart card reader, the smart card being the main target device of this work.
The two main components of this board are the controller, a Xilinx Spartan-6 XC6SLX9, and the
main security circuit, a Spartan-6 XC6SLX75. The controller instructs the main security circuit. The
main circuit contains an implementation of the desired algorithm to be tested.
To program the SAKURA-G, two programs are used, one for the controller FPGA and another
for the main FPGA. For the SAKURA-W, only the controller FPGA is programmed, redirecting the
received commands from the main FPGA to the SAKURA-W (which sits on top of the SAKURA-G). The
programming of the FPGAs is done using Xilinx's Platform Cable USB II, providing a user-friendly
configuration of Xilinx FPGAs and programming of Xilinx programmable read-only memory (PROM).
The software used to load the files and to communicate with the Platform Cable USB II is the iMPACT
tool from Xilinx.
There are two types of programming: one that is non-persistent, meaning that if the power is turned
off the FPGA configuration is lost, and another that is persistent. Non-persistent programming
should be preferred in cases where the FPGA configuration is changed constantly, since persistent
programming is stored on a flash memory that can only be written a limited number of times.
The device configuration provided on the platform's website for the SAKURA-G/W consisted of the Verilog-
HDL code and a program to check whether the boards were working properly. A program to be written
to the smart card, compatible with the SAKURA-W software, was also provided, but without its source
code. The SAKURA-G main FPGA is set to work at 24 MHz, while the SAKURA-W is
clocked at 3.571 MHz. If the user desires to change these values, it is possible to do so by editing
the Verilog-HDL.
In terms of triggering, on the SAKURA-G it is done by sending the signal through one of the top
external pins. The first four pins are set at the start of the key scheduling, the first AES round, the last
round and all rounds, respectively. On the SAKURA-W, the 8th pin of the second row of the bottom pins is
mapped to the smart card's AUX1 pin. This pin signals the trigger, and the configuration of when it
goes ON is controlled by the smart card software.
4.1.2 Smart Card
A smart card can be seen as a miniaturized computer, having a CPU, RAM, ROM and storage
memory. The smart cards used in this project were acquired from a company named wb electronics
[27], the same entity that supplied the smart cards that came with the SAKURA board. This was
important in order to ensure compatibility between the smart cards that came with the boards and
the ones acquired.
These smart cards use an 8-bit ATmega8515 microcontroller and have 512 bytes of Static Random
Access Memory (SRAM) and of EEPROM, and support up to 64K bytes of external memory. They are able
to perform up to 16 Million Instructions Per Second (MIPS) at 16 MHz, operate at 4.5 V - 5.5 V and endure
10,000 write/erase cycles. To explain the software components and the challenges encountered
while developing the setup, the next sections detail the issues encountered.
Source code development
The smart cards that came with the SAKURA board only provided the compiled code and required
a licence for the source code. This prevented modifications to their behaviour and implementation,
such as trigger positioning/duration, changes to the AES algorithm, the introduction of delays, modifying the
communication protocol and other changes. Because of this, the decision was to implement the smart
card software from scratch, to have more control and flexibility over what could be done with it.
Before starting the implementation, the wb electronics website, named Infinity USB, was checked for
software examples for the smart card. On the website, source code examples for the PC side were
available, but for the smart card only the compiled code was provided. We reached out to the company
and asked if it could provide the source code for the smart card, since it was for academic use, and after
some time they agreed and sent it. The source code was of great help, since it provided
the program structure and the communication functions to transfer data from/to the smart card.
While waiting for the answer, the datasheet of the smart card's ATmega8515 microcontroller [28] was
studied to understand how the microcontroller ports were organized and labelled and which ports could
be mapped to the smart card pins. On the software side, the 8515 I/O Application Programming
Interface (API) was studied to understand what software ports were available and how they could be
used. This later helped to better understand the source code and its functionality.
After having the source code example, the next step was to program the new smart card software,
using the data transfer and receive functions from the provided source code. In the end,
the goal was to have smart cards identical in functionality to the ones provided with the SAKURA,
to ensure compatibility with the SAKURA-provided software, but whose behaviour could be changed.
Programming
The smart card program's source code needed to be tested to check whether it was working properly,
since sometimes the code provided is not the final version and has bugs. The source code was compiled
using Atmel Studio 7.0 and the program was written to the smart card using the smart card writer
named Infinity USB. After that, the smart card was tested using the PC-side software provided on the
Infinity USB website and, after confirming everything worked, the smart card programming started.
The program was adapted to have the same smart card command structure (the APDUs) as the SAKURA smart
cards, to ensure compatibility with the software provided on the SAKURA website. After
this, a C implementation of the AES algorithm named tiny-AES128-C [29] was used because of its
small size and simplicity, since the smart cards have limited resources. Once the program was
finished, the code did not fit in the internal memory of the smart card. The problem was that, on
microcontrollers, every variable and vector is stored in RAM, and 512 bytes were not enough to store
everything. The solution was to use the PROGMEM attribute to store the constant variables and
constant vectors in the flash memory. The only difference is that, to access the values stored in
these variables and vectors, the functions pgm_read_byte_near and pgm_read_byte need to be called.
After loading the compiled program into the smart card, the software was checked to assess whether
everything was working properly. To do this, the smart card writer was set to reader mode and the
smart card was tested. After checking that the developed smart card software was working with a modified
program from the Infinity USB website, the smart card was tested on the SAKURA-W.
On the SAKURA-W, a modified program provided on the SAKURA website was used. The func-
tionality of this program was to check whether the SAKURA-W smart cards were working properly, and the
adaptation consisted of stripping the program of its interface, leaving only the communication part.
However, when tested, the modified smart card did not answer the commands that were sent.
The problem was found to be the frequency used by the SAKURA-W's smart card reader, which was
lower than the one the smart card software was configured for.
Frequency Mismatch
After a careful inspection of the data transmission functions and the values being used, it was con-
cluded that the smart card software was configured to work at 6 MHz. The SAKURA-W's smart card reader
was clocked at 3.571 MHz, but at first this should not make any difference, since the microcontroller
could work at a lower frequency as well. The problem was in the receiving and sending data functions,
which were dependent on the CPU frequency. To explain the issue, a brief explanation of these functions
is given below:
Data on the smart card is transmitted in serial mode, i.e. one bit at a time. The smart card
reads/writes from/to the data pin, where the values can be either low or high (0 or 1), and there is
a time window for both reading and writing each state. In order for the smart card to know when
to read, a delay function is used, which receives a value, decrements it until it reaches zero and then
returns. The idea is that the function will spend a certain amount of time decrementing
the value, and this duration depends on the value passed to the function and on the CPU frequency.
A higher CPU frequency means faster instruction processing and smaller delay times for the same
function value. The program came with a predefined value for the 6 MHz CPU and for the bit rate of
9600 Bits Per Second (BPS). Note that the bit rate must also be considered, because the more
bits transmitted per second, the smaller the bit intervals will be. To calculate the new value for the
SAKURA-W frequency, the following formula was inferred from the values already calculated in the
code:
SAKURA-W frequency, the following formula was inferred from the values already calculated in the
code:
delay_value = cpu_frequency / (3 × bps) (4.1)
The CPU frequency divided by three times the BPS gives the number of decrement loops the function
should perform. The multiplication by 3 is there because the delay function spends 3 CPU instructions, or
clock cycles, to perform each value decrement.
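Equation (4.1) can be evaluated for both clock frequencies with a short sketch; the integer (floor) division is an assumption, since the delay counter must be a whole number of loop iterations:

```python
# Number of 3-cycle decrement loops for one bit period at 9600 bps.
def delay_value(cpu_frequency, bps=9600):
    return cpu_frequency // (3 * bps)  # floor to a whole loop count

print(delay_value(6_000_000))   # original 6 MHz configuration -> 208
print(delay_value(3_571_000))   # SAKURA-W's 3.571 MHz reader  -> 123
```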
After the new delay values had been calculated, the smart card was tested again on the SAKURA-
W, and this time it responded to the commands but did not deliver the correct result, having an extra
value in the middle of the smart card's answer. To try to solve the problem, the communication was in-
spected at the bit level using the oscilloscope, as described next.
Communication Debugging
To confirm whether the new delay values were compatible with the SAKURA-W's frequency, and to check
whether the extra value in the result could be observed in the communication bits, the communication of the
SAKURA smart card and of the programmed one was compared. In order to do that, the developed
smart card software was programmed to have the same command structure and to receive/send the
same data as the card that came with the SAKURA board. This comparison was done at the bit level,
comparing the time between each bit transmission, using the PicoScope and its capturing software.
The transmission protocol consists of one start bit that begins at a low value, eight bits of data,
one parity bit and two stop bits that stay at a high value. After comparing the two, it was observed
that they had a small difference, but not enough to cause problems in the transmission, except in one
case: when a call to the receive function was followed by sending a byte, the implementation was not waiting
the time specified by the protocol for the stop bits.
After fixing the problem and checking that the extra value was not present in the low-level communi-
cation, the smart card was tested again, confirming the extra value was still present in the result. The
conclusion was that the error did not come from the data transmission, i.e. the problem was not on the smart
card, and the extra value was being inserted by the PC-side software. To confirm this hypothesis,
the PC-side code was rewritten from scratch, also in Python, to check if the problem persisted. With
the new Python program, the extra value was gone and the only thing left to do was the
trigger. In a later section, the development of the Python program is described in more detail.
Trigger Implementation
With the smart card programmed and communicating with the main Python program, the only
thing left to do was to program a trigger signal to identify when the AES operation started.
This ensures that the acquired traces are aligned, resulting in a more effective CPA analysis. The
trigger also allows measuring how long an AES round takes, since it is difficult to identify the
beginning and the end of each round from the power traces alone. This measured time can then be used to
configure the PicoScope's capture period.
The trigger was implemented by setting one of the auxiliary pins of the card to serve as a trigger.
First, the desired port was identified with the help of the ATmega8515 datasheet; then the pin was set
to output by setting the 5th bit of Data Direction Register (DDR) B to 1. Setting the pin high
or low was done by setting the 5th bit of the PORTB register to 1 or 0, respectively.
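The register writes described above can be simulated with plain bit operations (the real code is AVR C; the Python below only mirrors the bit manipulation, with the registers as plain variables):

```python
# Simulation of the AVR register bits used for the trigger pin.
DDRB, PORTB = 0x00, 0x00
TRIGGER_BIT = 5

DDRB |= (1 << TRIGGER_BIT)     # configure the pin as an output
PORTB |= (1 << TRIGGER_BIT)    # trigger ON: drive the pin high
assert PORTB & (1 << TRIGGER_BIT)
PORTB &= ~(1 << TRIGGER_BIT)   # trigger OFF: drive the pin low
assert not PORTB & (1 << TRIGGER_BIT)
print(hex(DDRB))  # 0x20: only bit 5 set as output
```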
Having the trigger implemented and set to go ON during the first AES round, a successful corre-
lation analysis was possible, confirming it was working properly.
4.2 Trace Acquisition
This section presents and explains the components used for the trace acquisition and the chal-
lenges encountered during its development. The components addressed are the
PicoScope, the oscilloscope used to measure the power consumption, and the main collecting soft-
ware, used not only to configure and gather the power traces but also to send and receive data
to and from the device being measured. It is worth mentioning that the main gathering program is essentially
the same for the smart card and for the FPGA, with two differences: i) the parameters for trace
acquisition, so that it adjusts to the signal being measured; ii) the interface used to communicate with the
target device. Next, a brief description of the PicoScope is presented.
4.2.1 PicoScope
The PicoScope 6000 series [30] (Figure 4.2), in particular the 6404D, is a high-performance USB
oscilloscope that, together with its software, turns a computer into an oscilloscope and spectrum
analyser. This type of oscilloscope offers portability, performance, flexibility and programmability. It
is configured and accessed via PC, using the software that comes along with the device or by making
use of the PicoScope device driver.
The oscilloscope comes with 4 input channels, 8-bit signal resolution, 500 MHz bandwidth and up
to 5 gigasamples per second (GS/s) in real time, shared among the 4 channels. It comes with what is
called deep memory, able to store acquisitions of up to 2 gigasamples. To handle that amount of memory
and at the same time display the traces without compromising performance, the PicoScope comes with
a Hardware Acceleration Engine (HAL) 4 that guarantees both trace gathering and trace visualization
without slowdowns. The allowed voltage ranges are: ±50 mV, ±100 mV, ±200 mV, ±500 mV,
±1 V, ±2 V, ±5 V, ±10 V, ±20 V. It has an integrated wave generator output, capable of generating
Figure 4.2: PicoScope 6000 series.
repeated waveforms, whose characteristics can be set by the user. It has an auxiliary trigger input that can
also serve as a reference clock signal.
The software that comes along with the oscilloscope allows the user to observe and measure
the signal(s) being captured. Some of the options provided allow the adjustment
of the vertical and horizontal scales, where the vertical is the voltage range and the horizontal is the
time base, measured in units of time. It is also possible to adjust the sampling frequency, config-
ure triggers, zoom in/out on sections of the signal and configure the capturing channels to
Alternating Current (AC)/Direct Current (DC)/DC 50 Ω coupling.
Before starting to capture traces, the desired signal was observed in the PicoScope software
and then adjusted vertically and horizontally to be close to the axis limits. Then, after having the
optimal parameters, they were passed to the trace gathering program. To communicate with the driver,
the program uses a 32-bit Windows dynamic link library (DLL), ps6000.dll, that provides
access to the PicoScope functions.
4.2.2 Programming the traces collecting program
The main trace gathering program was based on an open-source program by Colin O'Flynn
named pico-python [31]. This software works as a wrapper for the PicoScope's API, providing
improved functions to access the PicoScope's functionality. The program is divided into 3 parts:
PicoScope configuration, target device instrumentation and storage.
In the PicoScope configuration, first the communication driver is located and loaded. Then the
parameters of the PicoScope, such as the trace length, signal amplitude and sampling frequency, are
sent to the PicoScope. Also, the input channels that are not going to be used are turned off (by
default the PicoScope activates all channels) and the trigger is configured. In the device instrumentation,
the plain texts can be loaded from a file or be generated randomly by the program. Then the trigger
is armed and the program sends the plain text to the device, using an interface that differs between the
SAKURA-G and the SAKURA-W. After the trace is gathered, the program receives the cipher text.
On the trace storage, the gathered trace is converted from a raw type to a 32-float type, and then
is stored in a Matlab file format.
It is worth mentioning that because 32-bit Python is used, there is a limit on the amount of memory that can be allocated, meaning the program cannot keep large trace sets in memory. To overcome this problem, the program gathers a specified number of traces, stores them in a file, clears the memory and continues to gather new traces.
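The gather-store-clear loop can be sketched as follows. This is a minimal sketch, not the actual program: `capture_one` stands in for the PicoScope block capture, and the function and file names are hypothetical.

```python
import numpy as np
from scipy.io import savemat

def gather_in_chunks(capture_one, n_total, chunk_size, out_prefix):
    """Collect n_total traces in chunks of at most chunk_size, writing
    each chunk to its own Matlab file so that a 32-bit process never
    holds more than one chunk in memory."""
    files, done = [], 0
    while done < n_total:
        n = min(chunk_size, n_total - done)
        # raw ADC samples -> 32-bit floats, as in the storage step above
        chunk = np.stack([capture_one().astype(np.float32) for _ in range(n)])
        fname = "%s_%03d.mat" % (out_prefix, len(files))
        savemat(fname, {"traces": chunk})
        files.append(fname)
        done += n  # chunk goes out of scope on the next iteration, freeing memory
    return files
```

Each output file holds one `traces` matrix, so the Matlab analysis scripts can load the chunks one at a time.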
4.2.3 PC to SAKURA-G communicator programming
The interface to communicate with the SAKURA-G, used by the trace gathering program, was based on the source code available on the SAKURA website [32]. This program, named SAKURA-G checker, communicates with the board to verify that it is working properly, by sending plain texts and checking the output.
After inspecting the SAKURA-G checker, it was noticed that the code used a DLL written in C# to communicate with the SAKURA-G, and that it contained a considerable amount of code to perform the sending and receiving operations. The use of a C# DLL is relevant because the gathering script was written in Python, and the implementation being used, the standard Python, does not support loading C# DLLs or programs. Also, even if a C/C++ DLL had been found, the time that would have been spent rewriting and debugging the code was not worth it when a working example was available that only needed some adaptations.
Taking these two points into account, the decision was to strip the code of its graphical interface and adapt it to receive instructions via the console, leaving the communication with the SAKURA board as it was. The result was a console program that accepts three types of commands: one to change the secret key and two to cipher data, with the difference that one returns the result and the other does not.
After having the communicator program working, the final step was to integrate it with the main gathering script. This was achieved by calling the executable with parameters such as the command, the number of the communication channel the SAKURA was on, and either the key or the plain text. The result of this call is the cipher text.
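The integration can be sketched with Python's subprocess module; the command names and the argument order below are hypothetical stand-ins for the console interface actually implemented.

```python
import subprocess

def cipher_via_communicator(exe, com_port, data_hex, want_result=True):
    """Call the console communicator and return the cipher text it prints
    on stdout. exe may be a path or an argv prefix; 'enc'/'enc_noresult'
    are placeholder command names, not the real ones."""
    argv = [exe] if isinstance(exe, str) else list(exe)
    argv += ["enc" if want_result else "enc_noresult", str(com_port), data_hex]
    out = subprocess.run(argv, capture_output=True, text=True, check=True)
    return out.stdout.strip() if want_result else None
```

The gathering script can then treat the communicator as a black box: arm the trigger, call the function with the plain text, and store the returned cipher text alongside the trace.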
4.2.4 PC to SAKURA-W communicator programming
The interface to communicate with the SAKURA-W, used by the trace gathering program, was also based on a program from the SAKURA website, named SAKURA-W checker [33]. At first, a strategy similar to the one used for the SAKURA-G was adopted, i.e. transforming the existing code into an executable that receives commands. When the program was tested it did not work and, with the help of an oscilloscope, the executable was found to reset the smart card after sending it a plain text. This behaviour caused the trigger to be ON at startup, making the gathering program record the smart card initialization instead of the AES rounds. The solution was to implement all the handling and sending of smart card APDUs in Python.
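Handling the APDUs in Python amounts to framing each command in the ISO 7816-4 format before writing it to the serial line. A minimal sketch of that framing follows; the instruction bytes in the usage comment are illustrative, not the card's actual command set.

```python
def build_apdu(cla, ins, p1, p2, data=b""):
    """Build a short ISO 7816-4 command APDU: CLA INS P1 P2 header,
    followed by Lc and the data field when data is present."""
    if len(data) > 255:
        raise ValueError("short APDU data field is limited to 255 bytes")
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + bytes(data)
    return apdu

# Hypothetical use over the virtual COM port with pyserial:
#   import serial
#   ser = serial.Serial("COM3", 9600, timeout=1)
#   ser.write(build_apdu(0x80, 0x10, 0x00, 0x00, plaintext_block))
```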
In contrast with the SAKURA-G checker, which used a C# DLL to communicate with the board, the SAKURA-W did not require any DLL, since the communication was performed via a virtual COM port. Also, the communication code was simpler than that of the SAKURA-G checker. After finishing the Python implementation of the SAKURA-W communicator, the first tests were performed, but the smart card did not seem to answer back. After confirming with the oscilloscope that the commands were being delivered correctly, by observing the bit transmission to the smart card, one of the smart cards that came with the board was tested instead and the result was the same. The behaviour of keeping the trigger pin high was observed at the smart card initialization stage, indicating that the smart card might be stuck at initialization.
Further analysis of the configuration code, the one that sets the communication parameters of the virtual COM port, revealed that the smart card received a reset (RST) signal for 200 ms and then waited 500 ms before requesting the Answer To Reset (ATR). This hinted that the Python module used to communicate with the COM port, named pyserial, was somehow stuck asserting the RST signal even after it was explicitly coded not to. After trying all the commands that could possibly turn the signal off, without any change in the smart card state, the idea arose of inspecting the COM lines to check whether the RST line was indeed still active when it should not be.
To analyse the virtual COM port, it is important to understand its communication protocol, in this case RS-232. RS-232 is used to transfer digital data between a Data Terminal Equipment (DTE) and a Data Communication Equipment (DCE) one bit at a time, i.e. it performs a serial data transmission. The next step was to create two virtual COM ports, one connected to the communicator program and the other to a terminal. Two communicator programs were tested: a modified SAKURA-W checker and the Python program used by the smart card's trace gathering program. The idea was to compare, on the terminal, the data and the state flags being passed during the communication.
Comparing the two programs revealed that the data being passed was the same, but the state of the flags was different. The flags displayed, representing the status of the pins, were Carrier Detect (CD), Clear to Send (CTS), Data Set Ready (DSR) and Ring Indicator (RI). In the case of the SAKURA-W checker, written in C#, only the CTS flag was ON after the initialization stage. With the smart card communicator, written in Python, the CTS, CD and DSR flags were ON. This clearly indicated that something was not right, since the flags should be the same in both cases.
The possible fix was to try the latest version of pyserial, 3.2.1, instead of 2.7, but that version was only available for Python 3 and the one being used was Python 2. Fortunately, the new version raised only small incompatibilities, which were easily corrected; the flag state problem was solved and the program started to communicate normally with the smart card.
4.3 Signal Analysis
After the traces are collected and stored, they need to be processed so they can provide meaningful information. The scripts developed during this work were: the correlation power analysis, to extract the secret key from the traces; the t-test analysis, to find points of interest with a low number of traces; and the signal-to-noise ratio, to assess and compare which setups provide better signal quality. The software used to develop these scripts was Matlab, since it offers programming flexibility and provides many of the functions needed by the scripts, as well as tools to present the data, such as graphs. The following presents and explains in more detail what the scripts do and how they were developed.
The correlation script correlates the hypothetical power consumption of all possible keys with the measured power traces, using two power models: the Hamming-Weight and the Hamming-Distance. In terms of attackable AES rounds, the script can attack the first and last rounds, which are the most common targets, but it can easily be extended to attack other rounds. This script was based on exercises from the rozvoj website [34].
In terms of structure, the script can be divided into four parts. The first is the data input, where the power traces are loaded into memory and the script is adjusted to the trace characteristics, such as the number of samples, the number of traces, the sample averaging and others. The second part is the construction of the hypothetical power consumption, where a matrix of all hypothetical values for one byte of the key is built from the plain text/cipher text pairs, using the Hamming-Weight or the Hamming-Distance. Next is the correlation attack, where the collected traces are compared with the hypothesis matrix, producing the correlation matrix. Finally, the correlation matrix is analysed and the data is output, such as the key and the correlation coefficients of the top 5 keys.
This script was later extended to also perform incremental CPA, where the number of traces being correlated is progressively increased, generating a plot that shows the evolution of the correlation with the number of traces. The other extension was the partial correlation, where only a certain number of traces is correlated at a time, making it possible to check whether there is some variance among the traces collected.
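The core of the correlation attack can be sketched as below. This is a simplified stand-in for the Matlab script: it uses a plain HW(input XOR key) leakage model in place of the S-box output, and the function names are hypothetical.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights of all bytes

def cpa_byte(traces, state_bytes):
    """Correlate every key-byte hypothesis against the measured traces.
    traces: (N, S) array of N traces with S samples each; state_bytes:
    (N,) known input bytes for the targeted state byte."""
    hyp = HW[state_bytes[:, None] ^ np.arange(256)[None, :]].astype(float)  # (N, 256)
    h = hyp - hyp.mean(axis=0)
    t = traces - traces.mean(axis=0)
    # Pearson correlation of every hypothesis column with every sample column
    corr = (h.T @ t) / np.sqrt((h ** 2).sum(axis=0)[:, None] *
                               (t ** 2).sum(axis=0)[None, :])
    return int(np.abs(corr).max(axis=1).argmax()), corr
```

The incremental and partial variants described above simply call this routine on growing prefixes, or on fixed-size windows, of the trace set.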
4.3.1 Signal-to-noise ratio
The SNR script was developed to efficiently measure signal quality and output a value that can be used to compare, for example, different trace gathering setups. The script was based on the one used in the book "Power Analysis Attacks: Revealing the Secrets of Smart Cards" [5], which explains how to calculate the signal-to-noise ratio of one sample point in a trace.
SNR = signal / noise        SNR_dB = 10 × log10(SNR)        (4.2)
For the script to work, the user needs to know which values are being processed at a given time. To do that, some pre-generated plain texts must be used to produce the desired Hamming-Weights at a determined AES round operation, in this case the S-box output. Then, this operation needs to be located on the trace or during the device operation, with the use of a trigger for example.
After having the plain texts and the operations located, the traces are grouped into nine sets, according to their Hamming-Weight values, from 0 to 8, and each group is processed individually. The operations performed on each trace group are the average, to retrieve the signal, and the standard deviation, to characterize the electronic noise. Finally, the signal is divided by the noise, yielding the SNR. If the user desires the value in decibels (dB), the base-10 logarithm is applied and the result multiplied by 10.
4.3.2 T-test Analysis
The t-test script was implemented to give an intuition of the points-of-interest, starting with a low trace count and, depending on the method used, maintaining an abstraction of the algorithm being attacked. For this test there are two sets: one set can have a single value, or a group of values that produce a specific intermediate value, while the other set is random. Depending on the type of set used, the t-test is called fixed vs. random, when a fixed data set is used, or semi-fixed vs. random, when the set produces specific intermediary values. The two sets are then compared using Welch's t-test, explained in the previous chapter. It essentially performs a hypothesis test assessing whether the two sets have an equal mean value or not. After obtaining the t-test results, a graph is generated showing the points of interest.
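The per-sample Welch statistic underlying this comparison can be sketched as follows (a minimal illustration, not the Matlab script itself):

```python
import numpy as np

def welch_t(set_a, set_b):
    """Welch's t statistic per sample point for two trace sets of shape
    (n_traces, n_samples). |t| above roughly 4.5 is the threshold
    commonly used to flag a point of interest."""
    ma, mb = set_a.mean(axis=0), set_b.mean(axis=0)
    va, vb = set_a.var(axis=0, ddof=1), set_b.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / len(set_a) + vb / len(set_b))
```

Unlike the pooled-variance t-test, Welch's form does not assume the two sets have equal variance, which suits the fixed vs. random comparison.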
4.3.3 Intermediary value generator for AES
Both the SNR script and the semi-fixed vs. random t-test use a set of plain texts that produce a specified intermediary value, in this case a Hamming-Weight, at a specific round operation. A script to generate these plain texts was built, based on a Matlab implementation of AES from [35].
There were two possible ways to implement this. The first was to generate random inputs, perform an AES ciphering and check the value generated at the specified location. The second was to define the intermediary value and then perform the deciphering operation from the targeted AES round, i.e. performing only a number of deciphering rounds equal to the targeted round. In the end, the second method was chosen because it gives more control over the intermediary values being generated on a specific round.
The intermediary values are generated after an S-box, and can be generated for any AES round. When a plain text is generated, it is stored in one of 9 files, each one representing a Hamming-Weight from 0 to 8.
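The 9-way bucketing of candidate byte values by Hamming-Weight can be sketched as below; the function name is hypothetical.

```python
def group_bytes_by_hamming_weight():
    """Bucket every byte value 0..255 into one of 9 lists by Hamming
    weight, mirroring the 9 output files (weights 0 through 8)."""
    groups = [[] for _ in range(9)]
    for v in range(256):
        groups[bin(v).count("1")].append(v)
    return groups
```

The group sizes follow the binomial coefficients C(8, k): 1, 8, 28, 56, 70, 56, 28, 8 and 1 values, which is why the extreme weights 0 and 8 each correspond to a single byte value (0x00 and 0xFF).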
4.4 The overall setup usage
A user wanting to use this setup has to perform a few steps before starting to collect traces. First, depending on whether the SAKURA-G or the SAKURA-W is used, the programming of the FPGAs differs: for the SAKURA-G, the two FPGAs need to be programmed using the Xilinx platform cable USB; for the SAKURA-W, only the controller is required. Note that the main circuit FPGA only needs to be programmed once, even when changing between SAKURA-G and SAKURA-W.
After that, one SMA cable needs to be connected between the board and the PicoScope, to measure the power consumption, and a probing cable needs to go to the trigger pin. On the SAKURA-G there are two options for power measurement: to use the integrated amplifier, the cable is plugged into SMA J3; to use no amplifier, or an external one, SMA J2 is used. For the SAKURA-W the measuring point is SMA J2. For the trigger on the SAKURA-G, a probe cable is connected to pin 3 to gather all AES rounds; on the SAKURA-W, the 8th pin, counting from left to right on the second row of the 40-pin header, is the one used, because it connects to the smart card's AUX1 pad.
Next, the user configures the main gathering script with parameters such as the number of traces, the signal amplitude, the recording duration and the sampling frequency. One way to obtain these parameters is to use the software that comes with the PicoScope, which shows the signal being gathered in real time and allows the parameters to be adjusted for optimal results. Then, after the main program is configured, the trace gathering starts, and when it finishes the user chooses one of the Matlab scripts to perform the signal analysis or the attack. Figure 4.3 illustrates the setup using the SAKURA-W.
Figure 4.3: The complete setup using the laboratory power supply, SAKURA-W on top of the SAKURA-G, and PicoScope.
4.5 Conclusion
This chapter illustrated how the overall trace gathering setup is organized. First, an overview of the setup's three main components was presented, as well as their functionality. Then each of those components was described in more detail, explaining some of the development challenges and decisions. The first component was the processing unit, where the two types of devices used were explained, the SAKURA-G for the FPGA and the SAKURA-W for the smart card, along with the work developed on them. The second component was the trace acquisition, where the PicoScope was introduced as well as the trace gathering program. The third component was the signal analysis, for which 3 scripts were developed, explaining their functionality and structure.
5 Experimental Results and Evaluations

Contents
5.1 Different Setup Configurations
5.2 T-Test
5.3 Conclusion
This section evaluates the proposed setup, described in the previous section. This evaluation
will show how different setup configurations affect the gathered smart card power traces. It first
analyses the traces' SNR and the respective Hamming-Weight leakage. Following this, a CPA attack
is performed, first increasing the number of traces progressively and evaluating how the results evolve
with the increase of traces, and then measuring the results using groups of 50 traces.
5.1 Different Setup Configurations
The setup has two main fixed physical components used to gather traces, the target device and the oscilloscope, which in this case were the SAKURA-W (smart card) and the PicoScope. Other components that could improve the overall signal quality were also considered, namely an external amplifier (Minicircuits ZFL-1000LN+) and a DC blocker (API 8037), a device used to filter out the DC component. To assess the impact on signal quality, three setups were considered using these components:
components:
• Setup 1 - Amplifier, DC blocker and oscilloscope in AC mode.
• Setup 2 - DC blocker and oscilloscope in AC mode.
• Setup 3 - Only using oscilloscope in AC mode.
The oscilloscope also has a DC mode, but it was not considered, since the DC component was being filtered out by the DC blocker, making DC mode and AC mode equivalent.
For the SNR measurement, the oscilloscope was set to take samples at a rate of 1.25 GS/s (Giga-samples per second) for a duration of 5×10−6 s, which led to 6250 samples, each power trace occupying 25 KB on file. For the other measurements, the oscilloscope was set to take samples at a rate of 1.25 GS/s for a duration of 3.38×10−4 s, which led to 422500 samples, each power trace occupying 1.6 MB on file.
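As a sanity check, both the sample counts and the per-trace file sizes follow directly from the rate and capture window, assuming 4 bytes per sample (matching the 32-bit float storage described in the previous chapter):

```python
def trace_size(rate_hz, duration_s, bytes_per_sample=4):
    """Samples and on-file bytes implied by a sampling rate and a
    capture window, at 4 bytes per sample for 32-bit floats."""
    samples = round(rate_hz * duration_s)
    return samples, samples * bytes_per_sample
```

1.25 GS/s over 5×10−6 s gives 6250 samples and 25000 bytes (25 KB); over 3.38×10−4 s it gives 422500 samples and 1690000 bytes, roughly the 1.6 MB stated above.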
5.1.1 Signal-to-noise Ratio Comparison
To compare the mentioned setups, the SNR is used as a quantitative measure of the signal quality and noise level of each setup. The proposed way to measure the SNR, as mentioned in previous chapters, consists of measuring the power consumption of the 9 different Hamming-Weights at a given moment, from 0 to 8, since one byte can take values from 0 to 255. As a simplification, and for better result accuracy, only one value per Hamming-Weight is considered. However, one thing to keep in mind is that each bit position might consume a different amount of power, i.e. the value 0x01 may leak differently than 0x10.
The byte chosen for this measurement was the first S-box output byte of the first AES round. This was found to have a considerable amount of leakage, mainly because the S-box value is fetched from the EEPROM. This power consumption contributes to a noticeable distinction between the Hamming-Weights, improving the SNR values, since the algorithm measures how distinguishable the different signals are from the noise. Each of the 9 Hamming-Weights was measured 200 times, to ensure the noise could be removed by averaging and characterized with the standard deviation.
Two evaluation metrics are used: the SNR average over 2 clock ticks, where the Hamming-Weights have more stable values, and the signal distinction, measured as the distance between the highest and lowest Hamming-Weights (8 and 0, respectively). This distance is measured in quantization values, since the captured signals have different amplitudes in terms of voltage, so that an equal comparison can be made in terms of signal distance. The averages of the highest and lowest Hamming-Weight traces (0x00 vs 0xFF) are calculated and their difference used to measure the distance between the two signals.
The results obtained for the 3 setups, depicted in Figures 5.1, 5.2 and 5.3, show that: setup 1 had an SNR average of 29.41 dB and a distance of 2.64 (between Hamming-Weights 0 and 8) on clock 1 (from sample 20 to 180), and an SNR average of 25.41 dB and a distance of 2.43 on clock 2 (from sample 200 to 360); setup 2 had an SNR average of 16.99 dB and a distance of 1.54 on clock 1 and an SNR average of 13.55 dB and a distance of 0.07 on clock 2; setup 3 had an SNR average of 18.95 dB and a distance of 1.21 on clock 1 and an SNR average of 13.97 dB and a distance of 0.06 on clock 2.
Figure 5.1: Setup 1: Left image shows the SNR of the power traces shown on the right.
From these results, it can be concluded that setup 1 provided both the highest SNR and the greatest distance between Hamming-Weights 0 and 8. Setups 2 and 3 had similar results, with setup 2 having the lowest SNR but a better distance than setup 3. Considering the proposed metrics, the overall setup can benefit from an external amplifier with a DC blocker when capturing smart card traces. It is worth mentioning that setups 1 and 2 might be further improved if another DC blocker were used: the DC blocker's minimum pass frequency of 10 MHz possibly causes some information loss, since the smart card works at 3.571 MHz. Also, when the SNR analysis was performed on a larger trace portion, it was noticed that some places, besides the ones analysed before, showed spikes in the SNR. These spikes might contain useful information, but at the moment it is not known what that information is.
Figure 5.2: Setup 2: Left image shows the SNR of the power traces shown on the right.
Figure 5.3: Setup 3: Left image shows the SNR of the power traces shown on the right.
5.1.2 CPA Attack
This section evaluates the same 3 setup configurations, but now using CPA, to assess if the results
obtained corroborate the analysis of the previous section.
To test the setups, 2 tests are performed. First, an incremental CPA, where the number of traces is incremented 10 at a time up to 500, and the distance between the correct key and the average of the other key guesses is measured. Secondly, as a complementary test, the quality of the gathered traces is measured by correlating 50 traces at a time and checking how much they vary from start to end. The traces used for this test cover the first AES round and use the same smart card tested in the previous section.
The results of the incremental CPA test, assessing the correlation value with 50, 100, 200 and 500 traces, show that setup 1 had an average correlation distance of 0.52, setup 2 a correlation distance of 0.40 and setup 3 a correlation distance of 0.14, as depicted in Figure 5.4.
These results suggest that the best proposed setup is indeed the one that presented the best
distinction between the correct key byte and the other guesses, namely setup 1. A more interesting
Figure 5.4: Difference between the correct key and the average of the other key hypotheses.
result was between setups 2 and 3 because, from the previous section, setup 3 had a better SNR but setup 2 a greater distance, and the incremental CPA was much better on setup 2. One possible explanation is that the distance between Hamming-Weights is more relevant than the SNR for CPA attacks. Another is that the other spikes seen in the SNR might be leaking information.
One relevant observation in Figure 5.4 is that there was a decrease in the correlation values at the start of setups 2 and 3. This may mean that the setups had a period of adaptation, where the first traces had worse quality than the following ones. This leads to the second test, with CPA performed over groups of 50 traces at a time, to assess whether the trace quality changes with more time spent capturing them. The results obtained are depicted in Figures 5.5, 5.6 and 5.7.
As can be observed, the assumption that the first traces have worse quality than the following ones holds for both setups 2 and 3. One reason for this may be an adaptation period of the oscilloscope, since with setup 2 the oscilloscope has to use its ±50 mV voltage range, which might bring more noise that is then filtered out. Setup 3 might have the same problem as setup 2, plus the filtering of the DC component by the oscilloscope's AC mode. By inspecting the first traces gathered using setup 3, one can observe a period where the DC is progressively removed. Figure 5.8 shows the first 10 traces gathered from setup 3.
Figure 5.5: Partial CPA with setup 1. Figure 5.6: Partial CPA with setup 2.
Figure 5.7: Partial CPA with setup 3.
As can be observed, the first traces measured by the oscilloscope still have some of the DC component, which is then gradually removed until the traces stabilize closer to 0 mV.
After analysing and selecting the best setup configuration, a full CPA analysis was performed and all key bytes were found correctly with 25 traces and a correlation distance, between the correct key bytes and the average of the other key guesses, of 0.15. The conclusion of this section is that the setup benefits from an external amplifier with a DC blocker, leading to a greater correlation distance between the correct key and the other key guesses.
Figure 5.8: First 10 traces gathered from setup 3, showing the progressive adaptation of the oscilloscope's AC mode.
5.1.3 Power Supply Comparison
The SAKURA-(G/W) platform can be powered using the communication USB or an external power
supply. Depending on the power source used, the acquired power traces can contain more or less
noise. An interesting test is to use different power sources and see the impact they have on the trace
quality. Intuitively, a noisy power supply should impact negatively the power traces.
For this test, 3 power sources were used: an adjustable laboratory power supply, a wall charger and the USB cable. Given the results obtained in the previous sections, setup 1 is used to gather the traces. The first test consists in measuring the distance between the maximum and minimum voltage peaks of each power supply under no load, using an oscilloscope; then each power supply is compared in an incremental CPA using 500 traces, to assess the real impact on the CPA.
In the first test, measuring the distance between minimum and maximum voltage peaks, the laboratory supply showed a distance of 0.11 V, the wall charger 0.063 V and the USB power supply 0.551 V, as depicted in Figures 5.9, 5.10 and 5.11.
This shows that the USB power supply has the highest noise of the 3 power sources and is thus expected to perform worse than the other two. Surprisingly, the wall charger had the lowest ripple noise, even compared with the more expensive adjustable laboratory power supply. One thing to keep in mind is that these results only give an idea of how much fluctuation the three power supplies have, since they were measured under no load. A more accurate test would measure the amount of noise the power supplies induce when current is drawn from them. The second test consists in the incremental CPA evaluation, as depicted in Figure 5.12.
Figure 5.9: Power trace of the laboratory power supply under no load. Figure 5.10: Power trace of the wall charger under no load.
Figure 5.11: Power trace of the USB power supply under no load.
The obtained results suggest that the best power supply was the adjustable laboratory power supply, with a distance about 0.05 above the other 2 power supplies from trace 200 onward.
Despite having the lowest ripple noise, the wall charger performed identically to the USB power supply. As mentioned before, the power supplies were tested under no load, so it could be that the noise increases once the device starts drawing power. It can also be noticed that the power supplies do not significantly affect the performed analysis.
Figure 5.12: Difference between the correct key and the average of other keys.
5.2 T-Test
In the previous section, a CPA was performed showing that it was possible to recover the secret key with a correlation distance of about 0.15 between the correct key byte guesses and the average of the other key guesses. Using only 25 traces to recover the secret is a low number; however, the smart card used has no protection mechanisms and has its S-box stored in EEPROM, further increasing the Hamming-Weight leakage.
With a more protected smart card, thousands of traces could be necessary to compromise it. Since performing the CPA computation on a large trace can be computationally demanding, a better approach is to discover the portions of the trace that could potentially leak information, using a statistical tool that requires neither many traces nor much processing time, thereby reducing the number of samples that need to be correlated. Since setup 1 was previously shown to produce the best traces, it was the one chosen to perform this test.
This section looks into Welch's t-test, a statistical tool that can be used to find these points of interest. The test was performed by creating two data sets: one that produced a Hamming-Weight value of 0 at the first S-box of the first round, and another that produced random values. Then 20 traces of the first AES round were gathered for each data set and the t-test was performed. To assess the t-test's accuracy, a CPA attack was also performed using the random data set. This makes it possible to infer whether the t-test distinguishes the points-of-interest faster than the CPA and whether they match.
For this measurement, the oscilloscope was set to take samples at a rate of 1.25 GS/s for a duration of 3.38×10−4 s, which led to 422500 samples, each power trace occupying 1.6 MB on file.
The results using 10, 15, 20 and 25 traces are shown in Figures 5.13, 5.14, 5.15 and 5.16.
Figure 5.13: Left image shows the correlation result and right image the t-test using 10 traces.
Figure 5.14: Left image shows the correlation result and right image the t-test using 15 traces.
Figure 5.15: Left image shows the correlation result and right image the t-test using 20 traces.
lxvi
Figure 5.16: Left image shows the correlation result and right image the t-test using 25 traces.
From the results obtained, the t-test seems to be a useful tool to find points-of-interest much faster than the CPA, but it is worth mentioning that further analysis needs to be done using a more protected smart card.
5.3 Conclusion
This section presented 3 analysis setup configurations, in order to assess which one could provide the best CPA results. First, the SNR and the distance between the maximum and minimum Hamming-Weights, on a trace portion with strong and noticeable leakage, were proposed as the metrics to choose the best setup. Then, to confirm these metrics, the CPA was performed using the same setups, and the choice of the best setup was confirmed. Finally, to reduce the number of samples processed by a CPA attack, the t-test can be used as a faster way to find points of interest, since the POI showed up much faster and matched the ones found by the CPA attack.
6 Conclusions and Future Work

Contents
6.1 Future Work
Smart cards are a common asset in our daily lives. Their applicability ranges from transportation, health, payments and telecommunications to identification, among other areas. These smart cards provide several tamper-proofing and unauthorized-access protection mechanisms, making them appropriate for storing sensitive information, such as secret and private keys.
Power analysis is an effective way to retrieve sensitive information from a smart card. SPA attacks are based on the visual inspection of one or a few power traces. They can provide information about which operations were performed by the smart card and, in some particular cases, even expose the secret key. In most cases, the attacker needs to know some smart card implementation details to be successful. DPA, depending on the trace quality and the protection mechanisms of the smart card under attack, might require a considerable amount of traces. This type of attack has the advantage of being easy to automate, since it is based on finding correlations over specific points of the power traces. It also has the advantage of not requiring detailed knowledge of the smart card's implementation to be successful.
The proposed and implemented solution was a setup that allows the user to perform side-channel
analysis on both smart cards and FPGAs. The main hardware used was the SAKURA-G/W board and a
PicoScope PC oscilloscope. To make the setup operational, four software components were also
developed:
• The software used to program the smart card, containing the AES algorithm, the communication
routines, and a customizable trigger.
• The trace-gathering program, responsible for configuring the PicoScope, communicating with
the target device, and gathering and storing the traces.
• The SAKURA-G and SAKURA-W communication program, used by the trace-gathering program
to deliver data to and receive data from the devices.
• The scripts used for the trace analysis (CPA, SNR, t-test), responsible for analysing the
traces and presenting the data in a form understandable to the user, such as graphs and data
comparisons.
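The SNR metric computed by those analysis scripts can be illustrated with a short sketch. This follows the usual definition from the power-analysis literature (variance of the group means over the mean of the group variances, grouping traces by a leakage label such as the Hamming weight of the processed value); it is an assumed illustration in Python/NumPy, not the thesis's script.

```python
import numpy as np

def snr(traces, labels):
    """Signal-to-noise ratio per sample point.

    traces: (n_traces x n_samples) array of power measurements
    labels: (n_traces,) leakage label per trace, e.g. the Hamming
            weight of the intermediate value being processed
    SNR = Var(group means) / Mean(group variances): the numerator is
    the exploitable signal, the denominator the noise.
    """
    groups = [traces[labels == v] for v in np.unique(labels)]
    means = np.array([g.mean(axis=0) for g in groups])
    vars_ = np.array([g.var(axis=0) for g in groups])
    return means.var(axis=0) / vars_.mean(axis=0)
```

Sample points with a high SNR are exactly those where a CPA attack correlates well, which is why the SNR served as a proxy metric for comparing the setup configurations.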
The implemented solution was then analysed to assess which configuration gives the best CPA
results, since the quality of the traces influences the attack's success. The experimental results
suggest that the setup with external amplification and an adjustable laboratory power supply
produced the best results. This conclusion was drawn from the metrics defined in the evaluation,
namely the SNR and the correlation value from the CPA. Using the best setup configuration, it was
possible to recover the secret key from an unprotected smart card using only 25 power traces.
Finally, a t-test analysis was performed and compared with a CPA analysis; as expected, the t-test
revealed information leaks faster than the CPA, but it also seemed to produce some false-positive
points of interest.
6.1 Future Work
The work developed in this thesis used as its core the CPA algorithm applied to an unprotected
cryptographic device, in this case a smart card. If cryptographic devices that come with protection
mechanisms are used, these methods might not be enough to compromise the device.
As future work, it would be interesting to see how the CPA attack would perform against the
defence mechanisms mentioned in Chapter 3. Also, more advanced techniques, such as template and
collision attacks, could be tested to assess their effectiveness against protected and unprotected
devices.
The methodology used to select the best setup relied on the smart card; another interesting
test would be to do the same for the FPGA and see if the results match. Other types of setup could
also be tested to see if the overall CPA results improve. For example, changing the DC blocker to
one with a lower frequency range might improve the results.
Finally, it would be interesting to use smart cards available on the market and assess their
security against this type of attack. To do this, the communication protocol used by each card
would have to be discovered so that the trace-collecting program could communicate with it.