Smart Card Power Analysis: From Theory To Practice
Joao Fernando Coelho Lopes
Thesis to obtain the Master of Science Degree in
Information Systems and Computer Engineering
Supervisor: Prof. Ricardo Jorge Fernandes Chaves
Examination Committee
Chairperson: Prof. Luís Manuel Antunes Veiga
Supervisor: Prof. Ricardo Jorge Fernandes Chaves
Member of the Committee: Prof. Renato Jorge Caleira Nunes
May 2017
Acknowledgments
First of all, I would like to thank my coordinator, Professor Ricardo Chaves, since his advice helped
me keep the work in the right direction and overcome some of the difficulties encountered.
I would like to thank my colleague Ricardo Macas for reviewing my thesis and providing feedback,
and Jaganath Mohanty for introducing me to the setup components and explaining them.
I would like to acknowledge WB electronics for providing the source code example for the smart
card.
I would like to thank my family, especially my parents, for the constant support. All of this, from
start to end, would have not been possible without them.
Last but not least, I would like to thank my girlfriend Joana Neno for her support and patience,
especially on the tough days.
Abstract
Smart cards are ubiquitous devices used in many critical areas. They offer mechanisms against
unauthorized access that protect the secret data they hold. They can also offer cryptographic
operations such as data protection and authentication. However, power analysis offers non-intrusive
techniques to extract sensitive information. The most common attack used is Differential Power
Analysis (DPA), an attack class most appropriate for symmetric ciphers such as AES. The signal-to-noise
ratio was used as a complementary analysis to assess the noise on the recorded power traces.
This work presents the fundamentals of CPA, the state of the art, and other statistical
analysis algorithms. Using this, an experimental setup is proposed to perform this type of analysis on
smart cards and FPGAs. Finally, an experimental evaluation is performed to assess whether the setup can
be improved with the use of external amplification and different power supplies. The results show that
this type of setup benefits from external amplification, but did not benefit as much when different
power supplies were used. In addition, the secret key of an unprotected smart card, using the best setup
configuration, was fully recovered using 25 power traces.
Keywords
Side-channel, Power analysis, Smart cards
Contents
1 Introduction xiii
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1.2 Thesis Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.4 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
2 Background xvii
2.1 Smart Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.1.1 Physical Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.1.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
2.1.3 Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
2.1.4 Smart card physical interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
2.1.5 Smart card communication protocol . . . . . . . . . . . . . . . . . . . . . . . . xx
2.1.6 Smart card data transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
2.1.7 Security Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
2.1.8 Operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
2.2 Cryptographic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.2.1 AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.2.2 RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
2.3 Side-Channel Analysis Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvi
2.3.1 Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
2.3.2 Simple Power Analysis - Visual Inspection . . . . . . . . . . . . . . . . . . . . . . xxvii
2.3.3 Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii
2.4 Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.4.1 Analog to Digital . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.4.2 Power traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
2.5 Welch’s T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
3 State of the Art xxxv
3.1 Side-channel Analysis Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.1.1 SAKURA-G/W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.1.2 ChipWhisperer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvii
3.2 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
3.2.1 Template Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
3.2.2 Collisions Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xli
3.3 Defence Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii
4 Proposed Solution and Implementation xliii
4.1 Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlv
4.1.1 SAKURA-G/W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlv
4.1.2 Smart Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlvi
4.2 Trace Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
4.2.1 PicoScope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
4.2.2 Programming the traces collecting program . . . . . . . . . . . . . . . . . . . . . l
4.2.3 PC to SAKURA-G communicator programming . . . . . . . . . . . . . . . . . . . li
4.2.4 PC to SAKURA-W communicator programming . . . . . . . . . . . . . . . . . . . li
4.3 Signal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liii
4.3.1 Signal-to-noise ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liii
4.3.2 T-test Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liv
4.3.3 Intermediary value generator for AES . . . . . . . . . . . . . . . . . . . . . . . . liv
4.4 The overall setup usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . liv
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lvi
5 Experimental Results and Evaluations lvii
5.1 Different Setup Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lviii
5.1.1 Signal-to-noise Ratio Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . lviii
5.1.2 CPA Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lx
5.1.3 Power Supply Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxiii
5.2 T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxv
5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxvii
6 Conclusions and Future Work lxix
6.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxxi
Bibliography lxxiii
List of Figures
2.3 Smart card contact pads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
2.4 Smart card Application Protocol Data Units (APDU) format. . . . . . . . . . . . . . . . . xxi
2.5 Smart card communication protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
2.6 AES AddRoundKey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.7 AES SubBytes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.8 AES ShiftRows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.9 AES MixColumns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
2.10 Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii
2.15 Sampling and quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
3.1 SAKURA-G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
3.2 SAKURA-W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvii
3.3 Chipwisperer Starter Kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii
3.4 Chipwisperer Software Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii
4.1 Power analysis setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xliv
4.2 PicoScope 6000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l
4.3 Implemented power analysis setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lv
5.5 Partial CPA with setup 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxii
5.6 Partial CPA with setup 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxii
5.7 Partial Correlation Power Analysis (CPA) with setup 3. . . . . . . . . . . . . . . . . . lxii
5.9 Power trace of the laboratory power supply in void. . . . . . . . . . . . . . . . . . . . . . lxiv
5.10 Power trace of the wall charger in void. . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxiv
5.11 Power trace of the USB power supply in void. . . . . . . . . . . . . . . . . . . . . . . . . lxiv
5.12 Power supply incremental CPA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lxv
Abbreviations
AC Alternating Current
AES Advanced Encryption Standard
APDU Application Protocol Data Units
API Application programming interface
ASIC Application Specific Integrated Circuits
ATR Answer To Reset
BPS Bits Per Second
CD Carrier Detect
CMOS Complementary Metal-Oxide Semiconductor
COS Card Operating System
CPA Correlation Power Analysis
CPU Central Processing Unit
CRC Cyclical Redundancy Check
CTS Clear to Send
DCE Data Communication Equipment
DC Direct Current
DDR Data Direction Register
DES Data Encryption Standard
DLL dynamic link library
DPA Differential Power Analysis
DSR Data Set Ready
DTE Data Terminal Equipment
EEPROM Electrically-Erasable Programmable Read-Only Memory
FIA Fault Injection Attacks
FPGA Field-Programmable Gate Array
HAL Hardware Abstraction Layer
HD Hamming-Distance
HW Hamming-Weight
I/O Input/Output
ISO/IEC International Organization for Standardization and International Electrotechnical Commission
ISO International Organization for Standardization
MIPS Millions of Instructions Per Second
MMU Memory Management Unit
OS Operating System
PAA Power analysis attacks
POI Points-Of-Interest
RAM Random-access memory
RI Ring Indicator
ROM Read Only Memory
RSA Rivest Shamir Adleman
SCA Side-Channel Attacks
SIM Subscriber Identity Module
SNR signal-to-noise ratio
SPA Simple Power Analysis
SRAM Static Random Access Memory
XOR Exclusive-or
1 Introduction
Contents
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
1.2 Thesis Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.4 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Smart cards have become very popular over the years, being used in many security systems.
They can safely store sensitive information, such as secret keys, thanks to built-in security mechanisms
that make them tamper-resistant. Smart cards can also perform cryptographic operations using the
secret key they hold, meaning the secret key is never exposed.
Side-channel attacks [1] have proven to be a very effective means of attacking cryptographic
algorithms. A side-channel attack exploits sensitive information leaked by a device during operation,
ultimately compromising the secret information of the cryptographic system.
Paul Kocher proved in his pioneering work [2] that a smart card can easily be compromised if
adequate protection mechanisms against power analysis attacks are not deployed. Power analysis is
a type of side-channel attack in which the instantaneous power consumption of a device can be used
to compromise the secret it holds. This is possible because the consumption of a device depends on the
data and operations being processed.
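This data dependence can be illustrated with the Hamming-weight model, a common first-order approximation in the power analysis literature in which the power drawn is taken as proportional to the number of set bits in the value being processed. A minimal sketch (the model is an illustrative assumption, not a measurement of any device):

```python
def hamming_weight(value: int) -> int:
    """Number of set bits in a byte; a common proxy for the power drawn
    when the value is placed on a CMOS bus."""
    return bin(value & 0xFF).count("1")

# Two different data bytes processed by the same instruction draw
# noticeably different power under this model:
print(hamming_weight(0x00))  # 0 set bits -> lowest consumption
print(hamming_weight(0xFF))  # 8 set bits -> highest consumption
print(hamming_weight(0xA5))  # 4 set bits -> intermediate
```

Attacks such as DPA and CPA exploit exactly this kind of dependence by correlating hypothesised values with measured traces.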
The two main power analysis attacks are Simple Power Analysis (SPA) and Differential Power
Analysis (DPA). SPA relies on the direct interpretation of a power trace and is used mainly to retrieve
information about the operations being performed, but it can also be used to retrieve sensitive information
in particular cases. DPA is a more elaborate attack, relying on the statistical analysis of multiple power
traces. In general, the attacker needs little to no implementation detail about the device under attack
[3].
1.1 Motivation
Smart cards are distributed worldwide and used in many of today's industries. Because they are
so widespread and carry sensitive information, they are a very desirable target for attackers. Smart
cards have multiple mechanisms that detect when the card is working under abnormal conditions or
when attempts are made to probe or tamper with its components [4]. Side-channel attacks can retrieve
information from smart cards by measuring power consumption, electromagnetic fields, timing or even
sound [5]. These unintended leaks of sensitive information might look harmless, but can ultimately
compromise a device's secret information.
Power analysis attacks (PAA) are of interest because they do not need to tamper with the normal
functionality of the device. They measure the power consumption during the device's operation and
perform statistical analysis over the collected data, defeating smart card security mechanisms that did
not contemplate this type of attack and focus instead, for example, on protecting the device against
physical access. Power analysis attacks have proven over the years to be very effective against
cryptographic devices, according to the existing state-of-the-art research [1, 6–9].
There is therefore a strong need to understand and measure the effectiveness of these attacks on
devices that hold sensitive information, such as smart cards. Improving these attacks while, at the same
time, implementing countermeasures is the way to stay one step ahead of attackers.
1.2 Thesis Goals
To understand how cryptographic devices can be exploited with power analysis, it is necessary to
study how this type of attack works, why it succeeds and which devices might be vulnerable.
The goal of this work is to build a setup that allows one to perform this type of analysis. The
analysis should allow the recovery of the secret key from an unprotected smart card implementation
and assess the quality of the gathered power traces.
This setup focuses mainly on supporting smart cards, but can also be extended to other devices
such as Field-Programmable Gate Arrays (FPGAs). Different setup configurations are then tested to
find out how they affect power trace collection and analysis.
Finally, this document is intended to serve as a stepping-stone for those who wish to understand
the concepts of power analysis and need to perform their own experiments and evaluate their own
systems.
1.3 Requirements
Based on these goals, the requirements are:
• From this document, the user must be able to understand how power analysis and Correlation
Power Analysis (CPA) attacks work.
• The user must be able to understand how the setup works and how to use it.
• The user must be able to collect traces from smart cards using a power analysis platform.
• The user must be able to perform correlation analyses and recover the secret key from the
unprotected smart card provided.
• The user must be able to use the setup by only changing the configuration parameters and
selecting the type of analysis to be performed.
1.4 Document Structure
This document describes how power analysis is able to retrieve a secret key from a cryptographic
device and how this can be done in practice.
Chapter 2 provides the base information needed to understand why power analysis is possible and
what type of statistical analysis can be performed in order to retrieve information from the recorded
power consumption.
Chapter 3 presents the state of the art on power analysis platforms, attacks that improve on the
basic key-recovery attack and some protection mechanisms that can increase the difficulty of recovering
the key.
Chapter 4 describes the proposed setup and discusses its implementation. The trace collection
setup components are presented, as well as all the software developed to support the overall
setup.
Chapter 5 presents the evaluation of the proposed solution. The several setup configurations
are evaluated by comparing the number of required traces. Several evaluation methods and related
statistical algorithms are also compared to assess their effectiveness.
Chapter 6 concludes the dissertation by summarizing the developed work and presenting possible
future work directions.
2 Background
Contents
2.1 Smart Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
2.2 Cryptographic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
2.3 Side-Channel Analysis Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvi
2.4 Signal Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
2.5 Welch's T-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii
This chapter presents the background needed for a better understanding of later chapters. Section 2.1
presents an introduction to smart card characteristics and components. Section 2.2 presents two
types of ciphering algorithms: the Advanced Encryption Standard (AES) and Rivest Shamir Adleman
(RSA). Section 2.3 explains how power analysis works, followed by two techniques to perform such
analysis. Section 2.4 presents the characteristics to take into account when gathering signals and
explains the components of power traces. Finally, section 2.5 presents a complementary analysis
technique named the t-test.
2.1 Smart Cards
Plastic cards have been in use since the 1950s. The first security mechanisms relied on visual
features of the card, such as security printing and a signature panel. The successor was the magnetic
stripe card, which allowed the storage of digital data, enabling the card to be read by machines. The
problem with this technology is that, with the right equipment, the data stored in the magnetic stripe
can be read, deleted and re-written.
With the creation of the integrated circuit and its subsequent inclusion on a plastic card, the term
smart card was born. A smart card is a card-shaped device with an integrated circuit that provides a
way to store data securely. There are two main smart card types: memory cards and microcontroller
cards [10].
Memory cards include dedicated logic for security, providing access control to data, for example
a write/erase protection mechanism. This type of card is designed for one specific purpose, which
restricts its flexibility. However, this makes them inexpensive to manufacture. They are used in
applications that need storage and minimal data protection, such as health insurance cards.
Microcontroller cards can be seen as miniaturized computers with an Operating System (OS),
storage, memory and an Input/Output (I/O) port. They also have the ability to create, delete and
manipulate files and to process data. This gives them the ability to execute applications and perform
functionality dynamically, while offering secure application transactions and data protection. A
great security advantage is their ability to perform cryptographic operations inside the card,
meaning the secret data is never exposed.
Smart cards communicate with the external world using physical contacts or electromagnetic
fields. Hybrid smart cards, which possess both means of communication, also exist [10].
2.1.1 Physical Characteristics
ISO/IEC 7816-1 specifies the physical characteristics of a smart card [10]. The most common
format is the ID-1 card, with dimensions of 85.60 × 54.00 × 0.76 mm. These are the usual credit-card-shaped
cards used in industries such as finance and healthcare. Figure 2.1 shows this
common format.
Another common format is the ID-000 card, known as the Subscriber Identity Module (SIM) card format.
Figure 2.1: All three smart card formats. [11]
These cards have dimensions of 25 × 15 × 0.76 mm and are used mainly in cell phones to identify and
authenticate subscribers towards the cell phone operator. There is also a smaller version of the ID-000
card, named mini-UICC, the smallest card format produced. A third format, named ID-00
or 'mini-card', has a size between the ID-1 and ID-000. This format has not yet been established
internationally.
2.1.2 Architecture
A microcontroller smart card is composed of the following internal components [10]: the Central
Processing Unit (CPU), responsible for executing the device instructions; the Random-Access Memory
(RAM), the working memory of the CPU; the Read-Only Memory (ROM),
which contains the Card Operating System (COS); the Electrically-Erasable Programmable Read-Only
Memory (EEPROM), the non-volatile memory used to store data, programs and OS routines;
the I/O port, used to communicate with the external world; and finally the sensors, which monitor
whether the card is running within specified parameters. Some smart cards also have a co-processor
used for heavy numeric calculations.
Figure 2.2 illustrates the components of a smart card microprocessor laid out in a simplified way.
Figure 2.2: Microprocessor card components.
2.1.3 Standards
Several companies worldwide produce smart cards and the infrastructure to communicate with
them. To guarantee compatibility and interoperability between different manufacturers, the
International Organization for Standardization and International Electrotechnical Commission (ISO/IEC)
created the 7816 family of smart card standards.
Contact cards that are International Organization for Standardization (ISO) compliant must follow
parts one, two and three of ISO 7816 [12]: part one describes the physical characteristics, part two
the location and dimensions of the contacts, and part three the transmission protocols and
electronic signals.
Contactless cards must additionally comply with ISO/IEC 14443, divided into four parts: part
one describes the physical characteristics, part two the radio-frequency power and signal interface,
part three the initialization and anti-collision mechanisms, and part four the transmission protocol.
2.1.4 Smart card physical interface
ISO/IEC 7816-3 specifies the physical characteristics, electronic signals and transmission protocols
of a contact smart card [12]. The contact pad is shown in figure 2.3 and each pin function is
described as follows:
VCC supplies the smart card with power, at a voltage of 5 V, 3 V or 1.8 V; RST resets the smart
card microcontroller, to which the card responds with an Answer To Reset (ATR); CLK supplies the
microcontroller with a clock signal, usually at 3.5 MHz; SPU means standard or proprietary use (the
former programming voltage Vpp) and is optional; I/O transfers data between the smart card and a
terminal. RFU stands for "reserved for future use" and is usually denoted as the AUX pin, used, for
example, as an operation trigger or as a second I/O communication channel.
Figure 2.3: The smart card contact pad.
2.1.5 Smart card communication protocol
Smart cards use Application Protocol Data Units (APDU) to exchange data with a terminal. The
communication is specified by ISO/IEC 7816-4 [12].
The two main components of an APDU are the header and the body. The header has a fixed size and is
always present, while the body can vary in size and is not required for some instructions. The header
is divided into four elements: the class byte (CLA), the instruction byte (INS) and two parameter
bytes (P1, P2). The body has three components: the command data length Lc, the data field
and the expected response length Le. Figure 2.4(a) shows a visual representation of
the APDU structure.
Figure 2.4: Smart card APDU format.
The response APDU is composed of a response body and a trailer (SW1, SW2), with the body being
optional and the trailer of fixed size. The body contains the data produced by the previous
command, which can be empty if the previous operation did not return anything. SW1 and SW2
contain the response code indicating whether processing was successful. Figure 2.4(b) shows a
visual representation of the returned APDU format.
For example, a verify command, checking whether the user's inserted PIN (0000) matches the smart
card's internal one, is represented as the APDU: 0x94 0x20 0x80 0x00 0x04 0x30 0x30 0x30 0x30,
where 0x94 identifies the instruction class, 0x20 the instruction, 0x80 and 0x00 are parameters 1 and
2 respectively, 0x04 is the data length and the four 0x30 bytes are the PIN digits in ASCII. If
the PIN is correct, the returned APDU is 0x90 0x00, where 0x90 and 0x00 are the trailer bytes
SW1 and SW2 respectively.
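The verify example above can also be assembled programmatically. The sketch below is an illustrative helper, not part of any smart card library; it builds only the header, Lc and data field, omitting the optional Le byte, exactly as in the example (the class byte 0x94 is specific to the example card):

```python
def build_apdu(cla: int, ins: int, p1: int, p2: int, data: bytes = b"") -> bytes:
    """Assemble a command APDU: 4-byte header (CLA, INS, P1, P2),
    then an optional Lc length byte followed by the data field."""
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + data  # Lc is the data field length
    return apdu

# VERIFY command checking the PIN "0000" (each digit is ASCII byte 0x30):
pin = "0000".encode("ascii")
cmd = build_apdu(0x94, 0x20, 0x80, 0x00, pin)
print(cmd.hex())  # 942080000430303030
```

Sending this byte string over the card's I/O line would, on success, yield the two-byte response 0x90 0x00 described above.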
2.1.6 Smart card data transmission
Smart card data communication is performed asynchronously over a half-duplex channel. This
means the smart card reader and the smart card share the same communication line, the data pin
seen in a previous section, so only one device can communicate at a time. The communication can
use two protocols, named T=0 and T=1, where T=0 defines the transmission of one byte at a time
whereas T=1 defines a data block transmission.
When transmitting one byte, a few extra bits ensure the sender and receiver stay synchronized.
At the start, the line drops from a high state to a low state, informing the receiver that a byte is
being transferred. The data bits are then transferred, followed by two more bit fields: the parity
bit and the guard time. The parity bit is added to make the number of set data bits either even or
odd, depending on the configuration, and the guard-time bits ensure the receiver sees the high
state before a new byte transfer is performed. The most common configuration for data transmission
is 9600 bits/second, 8 data bits, even parity and 2 stop bits. Figure 2.5 illustrates this
transmission structure.
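The framing just described can be sketched as follows. This is a simplified model of the character frame, assuming the common even-parity configuration and least-significant-bit-first ordering (the direct convention); the two trailing high bits stand in for the guard time:

```python
def frame_byte(value: int) -> list:
    """Bits on the line for one byte: start bit (0), 8 data bits
    (least-significant bit first), even parity bit, and two
    guard-time/stop bits held high (1)."""
    data_bits = [(value >> i) & 1 for i in range(8)]
    parity = sum(data_bits) % 2  # even parity: total number of 1s is even
    return [0] + data_bits + [parity] + [1, 1]

# 0x35 = 0b00110101 has four set bits, so its even-parity bit is 0:
print(frame_byte(0x35))
```

At 9600 bits/second, each of these 12 bit periods lasts roughly 104 microseconds, which is the timing granularity an oscilloscope probe on the I/O line would observe.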
Figure 2.5: Smart card communication protocol structure.
2.1.7 Security Mechanisms
One of the main features that makes smart cards so desirable is their ability to protect the data they
hold. To achieve that, tamper-resistance mechanisms are implemented to withstand physical and logical
attacks. Some of the countermeasures against physical attacks [13] are: the programmable active shield,
a protective layer that covers the smart card microcontroller, preventing the chip components
from being analysed or probed; the Memory Management Unit (MMU), which acts like a firewall, preventing
smart card applications from accessing privileged resources that should only be accessed by the OS;
data bus encryption, which ciphers data passed over the bus, preventing an attacker from
knowing what values are being transmitted; the sensors integrated with the microcontroller, which
prevent abnormal operation of the smart card by monitoring the internal and external clock
frequencies, voltage, temperature and other parameters; the Cyclical Redundancy Check (CRC), which
checks for data errors during transmission, reading or writing; and finally the current masking device,
a security mechanism against power analysis that performs random dummy memory accesses,
changing the power consumption of the device during operation.
2.1.8 Operating system
Operations on smart cards are controlled and monitored by the COS, a small OS specific
to each type of card [10]. They are divided into two groups: the general-purpose COS, which has
generic commands that work with many applications, like the Java Card; and the dedicated COS, which has
instructions for specific applications and can also contain the application within itself. A COS
can also be classified as an open or a proprietary platform. An open platform means third
parties can load programs onto the smart card, while a proprietary platform means the opposite: only
the producer of the COS can install programs on the card.
Java Card is one of the most used COS worldwide. It is an open-platform, multi-application operating
system based on Java, intended to abstract the hardware from the programming language, i.e.
programmers do not have to worry about the hardware specifications when developing a program. This is
accomplished by translating the programs into bytecode, which is then interpreted by a virtual machine.
The virtual machine is responsible for running the program while handling the hardware specifics and
resources. Java Card allows a smart card to contain multiple Java Card applications (applets),
while granting isolation between them through the use of firewalls. This has the advantage of
allowing different vendors to have their applets on the same smart card, independently of the level of
security and testing each applet has.
MULTOS is another popular smart card operating system, known for use when there are needs
for high security and performance. Applications are typically written in the C language and then
compiled into the MULTOS Executable Language (MEL). Like Java Card, it also allows multiple
applications on a single smart card while providing isolation between them. The main difference
comes at production, where MULTOS manufacturers must comply with a licence that obliges
them to perform rigorous security and interoperability tests on these smart cards.
2.2 Cryptographic Algorithms
Most of the today’s systems security relies on cryptographic operations to provide desirable prop-
erties such as authentication, integrity and confidentiality. The main cryptographic operations are of
two types, asymmetric and symmetric encryption.
In symmetric ciphering, a single key is used to cipher and decipher data. Some examples of this
type of algorithm are AES, the Data Encryption Standard (DES), 3DES, Blowfish and Twofish.
AES is described next, since it is widely used and secure, to the point of being used by the U.S.
government to protect classified data.
Asymmetric ciphers use two keys, named the private and public keys, to exchange messages
securely. The sender uses the public key, known to everyone, to encrypt the message content, granting
confidentiality. The public key can also be used to verify the authenticity of a message. The receiver
uses the private key, known only to him, to decrypt the message content. The private key can also
be used to cipher data to grant message authenticity. Some examples of asymmetric cryptographic
algorithms are RSA, Diffie-Hellman and Elliptic Curve Cryptography (ECC). RSA is described next, since
it is the most commonly used and is still considered secure, despite having first been published in 1977.
When choosing between asymmetric and symmetric ciphers there are some factors to take into account. Asymmetric algorithms rely on computationally expensive mathematical operations, making them slower than symmetric operations. On the other hand, symmetric algorithms such as AES scale less easily, since they need a new secret key for each new pair of entities that want to communicate.
There is also a class of algorithms named hash functions, which produce a fixed-size output that depends on the given input. Hash functions transform data of arbitrary size into a fixed-size value, usually smaller than the original data. They are used to validate data integrity, meaning it is possible to detect changes in the original data by comparing the output computed over the received data with the expected one. The most popular hash functions are SHA-1 [14] and MD5 [15], although MD5 has been deemed insecure due to several mathematical vulnerabilities.
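The integrity-check use just described can be illustrated with a short Python sketch. It uses SHA-256 from the standard library (chosen here instead of the MD5/SHA-1 mentioned above, since both are now considered broken); the messages are purely illustrative:

```python
import hashlib

def digest(data: bytes) -> str:
    # Fixed-size output regardless of the input length
    return hashlib.sha256(data).hexdigest()

original = b"transfer 100 EUR to account 42"
tampered = b"transfer 900 EUR to account 42"

# A single changed byte yields a completely different digest,
# so the receiver can detect the modification
assert digest(original) != digest(tampered)
assert len(digest(original)) == 64  # 256 bits as 64 hex characters
```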
2.2.1 AES
The AES [16] is a symmetric cryptographic algorithm used to protect electronic data. Ciphering is performed in blocks of 128 bits using keys of 128, 192 or 256 bits, the algorithm being slightly different depending on the key size. The algorithm was adopted by the U.S. government to protect secret data.
AES performs 10, 12 or 14 transformation rounds, depending on whether the key is 128, 192 or 256 bits, over a 4x4 byte data matrix called the state. At the beginning, the state holds the original data block to be ciphered; then 4 operations are performed in each round. The operations are addRoundKey, subBytes, shiftRows and mixColumns, where each one either replaces the values of the state with new ones or changes their positions. The result, after all the operations, is the ciphered data. The deciphering operation is the same process, with the same number of rounds and operations, but in reverse order. Next, the 4 operations mentioned above are described in more detail:
AddRoundKey performs an exclusive-or (XOR) between the state and a round key. A round key is generated from the secret key via a key scheduler and can also be represented as a 4x4 matrix. Figure 2.6 illustrates the addRoundKey operation.
SubBytes performs a non-linear substitution on each byte, meaning every position of the state is replaced by another value. The substitution function is named the S-box and works by replacing the state bytes either by using a formula or by using a pre-computed table with all the possible values. Figure 2.7 illustrates the subBytes operation.
ShiftRows performs a number of row rotations that depends on the row position: each row is shifted n − 1 times, where n is the row position. This means that the first row stays the same, the second is shifted one position, the third two positions and the fourth three positions. Figure 2.8 illustrates the shiftRows operation.
MixColumns mixes the data of each column independently of the other columns. To achieve that, each state column is multiplied by a fixed polynomial. Figure 2.9 illustrates the mixColumns operation.
The only difference to this sequence is in the final round, where mixColumns is not performed and an extra addRoundKey is computed. The pseudocode of this algorithm is shown below:

aes_ciphering(byte plaintext, word round_key) {
    byte state;
    state = plaintext;
    AddRoundKey(state, round_key[0]);
    for (i = 1; i < num_rounds; i++) {
        SubBytes(state);
        ShiftRows(state);
        MixColumns(state);
        AddRoundKey(state, round_key[i]);
    }
    SubBytes(state);
    ShiftRows(state);
    AddRoundKey(state, round_key[num_rounds]);
    return state;
}
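To make the S-box concrete, the Python sketch below derives it from its mathematical definition (inversion in GF(2^8) followed by the affine transformation) instead of the pre-computed table, and computes the first-round intermediate value SubBytes(plaintext XOR key), which is the value power analysis attacks typically target. The function names are illustrative, not from any particular implementation; the checked values come from the AES standard (FIPS-197):

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_inv(a):
    # Multiplicative inverse; AES defines inv(0) = 0
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def sbox(byte):
    # SubBytes: GF(2^8) inversion followed by the affine transformation
    inv = gf_inv(byte)
    rotl = lambda v, r: ((v << r) | (v >> (8 - r))) & 0xFF
    return inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63

def first_round_intermediate(plaintext_byte, key_byte):
    # AddRoundKey then SubBytes on one byte of the state: the classic
    # key-dependent intermediate value targeted by power analysis
    return sbox(plaintext_byte ^ key_byte)

assert sbox(0x00) == 0x63 and sbox(0x53) == 0xED  # values from FIPS-197
```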
2.2.2 RSA
RSA [17] is a asymmetric cryptographic algorithm based on the modular arithmetic over large
positive numbers. The security of the algorithm relies on the difficulty in factoring large numbers into
its base primes, since no algorithm capable of doing that efficiently was found.
The basic idea of this algorithm is to raise the message value to the public key when encrypting
or private key when decrypting, ending with a modular operation that uses a known value. This
Figure 2.6: AES AddRoundKey.
Figure 2.7: AES SubBytes.
Figure 2.8: AES ShiftRows. Figure 2.9: AES MixColumns.
mathematical operation is called modular exponentiation.
The public key is the pair Ppu = (e, n), where e is the value to which the message value M is raised and n is the modulus used in the modular operation. The cipher operation is the following:

M^e mod n = C (ciphertext)    (2.1)

The private key is the pair Ppr = (d, n), where d is the value to which the ciphertext C is raised and n is the modulus used in the modular operation. The decipher operation is the following:

C^d mod n = M (message)    (2.2)
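Equations 2.1 and 2.2 can be exercised with a toy Python example. The primes below are the small textbook values often used for illustration; real keys use moduli of 2048 bits or more:

```python
# Toy RSA with tiny primes (illustration only; real keys use 2048+ bit moduli)
p, q = 61, 53
n = p * q                 # 3233, the public modulus
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

M = 65                    # message value, must be smaller than n
C = pow(M, e, n)          # cipher:   M^e mod n  (Eq. 2.1)
assert pow(C, d, n) == M  # decipher: C^d mod n  (Eq. 2.2)
```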
RSA and other asymmetric cryptographic algorithms are based on modular exponentiation. The most used technique to perform a modular exponentiation is known as the square-and-multiply algorithm. The algorithm works by checking whether the last bit of the exponent is one or zero. If it is zero, the algorithm performs a square operation with the current result and the exponent is shifted one position to the right. If the last bit is one, the previous operations are applied plus a multiplication between the current result and the base. The following presents a possible implementation of the algorithm:
squareAndMultiply(x, n) {
    if (n < 0)
        return squareAndMultiply(1 / x, -n);
    else if (n == 0)
        return 1;
    else if (n == 1)
        return x;
    else if (isEven(n))
        return squareAndMultiply(x * x, n / 2);
    else // isOdd
        return x * squareAndMultiply(x * x, (n - 1) / 2);
}
This implementation is vulnerable to SPA, since the execution of the multiply operation depends on the secret key bits [3]. This issue will be addressed in detail in Section 2.3.2.
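The key dependence can be made visible by instrumenting the algorithm. The Python sketch below uses the left-to-right variant of square-and-multiply with a modulus, logging its operation sequence; the log stands in for what an SPA attacker reads directly off the power trace:

```python
def square_and_multiply(base, exponent, modulus, trace):
    # Left-to-right binary modular exponentiation; `trace` records the
    # operation sequence, standing in for the recorded power trace
    result = 1
    for bit in bin(exponent)[2:]:          # scan exponent bits MSB first
        result = (result * result) % modulus
        trace.append("S")                  # square: done for every bit
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")              # multiply: only for 1-bits
    return result

ops = []
assert square_and_multiply(5, 0b1011, 97, ops) == pow(5, 0b1011, 97)
# The sequence "SM S SM SM" directly reveals the exponent bits 1,0,1,1
assert "".join(ops) == "SMSSMSM"
```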
2.3 Side-Channel Analysis Attacks
Side-Channel Attacks (SCA) take advantage of the information leaked by the physical implementation of a cryptographic device. They can be divided into two major groups [18], passive/active and invasive/non-invasive attacks, with the following characteristics:
Passive and Active attacks: Passive attacks rely on observing the device's behaviour under normal operating conditions, meaning no attempts are made to tamper with it. In contrast, active attacks tamper with the device's functionality, so that it operates under abnormal conditions.
Invasive and non-invasive attacks: Invasive attacks start by depackaging the chip to get access to the components, so they can be probed directly. In contrast, non-invasive attacks try to gather information using only the external components.
These attack types are not mutually exclusive, meaning it is possible to have a passive attack using invasive or non-invasive techniques, and vice-versa.
The attacks can be grouped into classes depending on the physical characteristic they exploit, such as timing, power consumption, fault induction/analysis, temperature, electromagnetic emanations, acoustic emissions and others. Their main goal is the same: retrieve information that can help reveal the device's secret keys. They mostly differ in the time, cost, expertise and equipment needed to perform them.
Power analysis is typically a non-invasive attack [3] that can be performed with inexpensive equipment when compared to other active attacks. It has been proven effective in recovering secrets from cryptographic devices while keeping the process fairly simple: record the power consumption of a device while it performs cryptographic operations, and in the end perform statistical analysis over the recorded power consumptions. This was the main reason this attack was chosen: it is effective and fairly simple.
2.3.1 Power Analysis
Most common modern digital circuits are built using Complementary Metal-Oxide-Semiconductor (CMOS) cells. This technology has the particularity of having significant power consumption when the internal logic cell values change. This type of power consumption is dynamic and occurs when logic cells change their internal values from 0 → 1 or 1 → 0. Logic cells that maintain their internal state have only a static consumption, leading to very little power being drawn. This happens because a state transition usually requires charging or discharging a capacitor used to maintain the logical state.
The state of the logic cells depends on the input and on the operations a device is performing, so an attacker can acquire knowledge about the operations and data being processed at a given moment. This information leak can ultimately lead to the discovery of sensitive data, such as the secret keys of a cryptographic device. This type of analysis is called a PAA, where a correlation between the power consumption and the key-dependent operations performed on the device is evaluated.
An attacker has, in most cases, limited to no knowledge about a device's implementation, so to simulate the power consumption of a device they make use of simple power models. The most commonly used are the Hamming-Distance (HD) and Hamming-Weight (HW) models [5].
The Hamming-Weight power model is a simple model that considers the number of bits set to one at a given moment to describe the power consumption of a device. This model requires almost no knowledge about the device's structure, only the value being processed at a given time.
The Hamming-Distance model counts how many bits have changed, from 1 → 0 and 0 → 1, over a transition. This model requires the attacker to know the predecessor and successor values at a given time. It also requires some knowledge about the device's implementation, which in most cases is not available.
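Both models reduce to simple bit counting, as the following Python sketch shows (the 0x3A → 0xAC transition is just an illustrative register update):

```python
def hamming_weight(value):
    # Number of bits set to one: the HW power model
    return bin(value).count("1")

def hamming_distance(before, after):
    # Number of bits that flipped in the transition: the HD power model
    return hamming_weight(before ^ after)

assert hamming_weight(0b1011) == 3
# A register going from 0x3A to 0xAC flips 4 of its 8 bits
assert hamming_distance(0x3A, 0xAC) == 4
```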
These power models are then used in PAAs. PAAs are divided into three types: SPA, DPA and High-Order DPA. The following sections address the first two.
These sections will refer only to software-implemented algorithms on an 8-bit smart card, since they are simpler to understand. For a power analysis attack on an AES Application-Specific Integrated Circuit (ASIC) implementation refer to [19].
2.3.2 Simple Power Analysis - Visual Inspection
Simple power analysis can be described as "a technique that involves directly interpreting power consumption measurements collected during cryptographic operations" [5] to gather sensitive information that can lead to key discovery. This technique uses one or only a few power traces of the device under attack, which usually leads to the use of complex statistical analysis in order to retrieve useful information from the power trace. Also, detailed knowledge about the algorithm implementation running on the device is almost always necessary in order to retrieve useful information [5].
Visual inspection is one technique that can be used to retrieve information about the operations a device is performing. This is possible because devices execute the algorithm's instructions in sequence, i.e. the algorithm implementation is translated into a set of instructions that run in sequence on the device.
The instruction set of a device can be divided into four subsets: the arithmetic, logical, data transfer and branching sets. These sets work with different components, such as the arithmetic-logic unit, the co-processor, the RAM, the ROM and the peripheral components. Each component has its own implementation and purpose, giving it a distinct power consumption pattern when running. This means the different components can be observed in the power traces, allowing the identification of which instructions are running at a given time.
Instructions have their own power characteristics, leading to a potentially severe risk if their execution depends on the secret key. For a device running an algorithm with such characteristics, the key can be compromised without much effort. The best example can be seen in some public-key cryptosystem implementations using the square-and-multiply modular exponentiation [3]. The squaring operation is performed for every bit of the secret key, while the multiply operation is performed only for key bits equal to one. Since the square and the multiplication can be distinguished in the power trace, retrieving the secret key is as simple as checking when only a squaring, or a squaring and a multiply, are performed. Figure 2.10 depicts an example of a power trace during a square-and-multiply operation.
Even if the sequence of operations does not depend on the secret key, visual inspection can still
be useful in cases where the algorithm running on a device is not known. It can also give some hints
about how an algorithm is implemented.
Figure 2.10: An illustrative example of how Simple Power Analysis can be used to identify the bits being processed by visually inspecting the instantaneous power consumption.
2.3.3 Differential Power Analysis
Among power analysis attacks, DPA is more effective than SPA in most cases. One of the reasons is that it does not require detailed knowledge about the attacked device; in most cases it is sufficient to know the algorithm running on the device [5].
DPA requires many traces in order to be successful, but its effectiveness in recovering the secret key is superior to SPA's, being able to recover keys even from noisy power traces. The attack looks for data dependencies at specific points of the power trace, instead of patterns across a complete trace.
This type of analysis is performed in five steps [5], detailed below:
Step 1: Choosing an Intermediate Result of the Executed Algorithm. In the first step, an intermediate stage of the algorithm running on the attacked device is chosen. This stage must be a function that depends on a known value, the algorithm's input or output data, and a portion of the secret key.
Step 2: Measuring the Power Consumption. In the second step, the power traces of the device are obtained while it performs several cryptographic operations with known input data. The data used in the operations is stored in a vector d and each produced power trace is stored in a vector t. Finally, a matrix T is built using all the recorded t vectors, where every position of vector d produces one line of the matrix T: the first position of vector d produces the power trace on the first line of matrix T, and the same is done for the rest of the elements. Note that the input of the intermediary operation must be known to the attacker. Figure 2.11 illustrates the process.
Step 3: Calculating Hypothetical Intermediate Values. In the third step, all possible keys used in the intermediary operation are generated and stored in a vector k. The elements of vector k are typically called key hypotheses. Then a matrix V is built containing all the possible intermediary values, using vector k and the data vector d. Each line of the matrix will have all the possible intermediary values for one input data value. Since every column depends on one hypothetical key and the input data, one of the columns will have the same values as those computed by the attacked device. Figure 2.12 illustrates the process.
Step 4: Mapping Intermediate Values to Power Consumption Values. In the fourth step, each element of matrix V is mapped to a hypothetical power consumption. The calculated values are then stored in a matrix H. The function used to perform the mapping between the intermediary values and the hypothetical power consumption is usually one of the power models referred to before, namely the Hamming-weight or the Hamming-distance. Figure 2.13 illustrates the process.
Step 5: Comparing the Hypothetical Power Consumption Values with the Power Traces. In the final step, each column of the hypothetical power consumption matrix H is compared with all the columns of matrix T. This operation compares the hypothetical power consumption of every key hypothesis with every position of the recorded traces. All results are then stored in a matrix R. These comparisons between matrices are performed using statistical analysis, such as the correlation coefficient or the distance-of-means method [3]. Figure 2.14 illustrates the process.
In the end, if the power model and the number of traces used are adequate, the key is found by searching for the row of the resulting matrix R that has the highest value.
It is worth mentioning that when the correlation coefficient is used in step 5, this attack is named Correlation Power Analysis.
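The five steps can be sketched end-to-end in Python. This is a deliberately scaled-down illustration, not the thesis setup: it assumes a toy 4-bit S-box, a single 4-bit key, one noiseless leaking sample per trace (so matrix T collapses to one column), and uses the Pearson correlation coefficient as the step 5 comparison, i.e. the CPA variant:

```python
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box

def hw(v):
    # Hamming-weight power model (step 4 mapping)
    return bin(v).count("1")

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length vectors
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

SECRET_KEY = 0xB

# Steps 1-2: the chosen intermediate is SBOX4[d ^ k]; the "recorded trace"
# is idealized as a single noiseless HW sample per encryption
d = list(range(16))                              # vector d of known inputs
t = [hw(SBOX4[x ^ SECRET_KEY]) for x in d]       # one column of matrix T

# Steps 3-5: hypothetical intermediates (V), power model (H), comparison (R)
scores = []
for k in range(16):                              # vector k of key hypotheses
    h = [hw(SBOX4[x ^ k]) for x in d]            # one column of matrix H
    scores.append(abs(pearson(h, t)))            # one entry of matrix R

best = max(range(16), key=lambda k: scores[k])
assert best == SECRET_KEY and scores[best] > 0.99
```

Only the correct hypothesis reproduces the device's intermediates exactly, so its column of R peaks; wrong keys decorrelate because the S-box is non-linear.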
Figure 2.11: Illustration of the power consumption measurement (step 2).
Figure 2.12: Illustration of the hypothetical intermediate values matrix construction (step 3).
Figure 2.13: Illustration of mapping intermediate values to power consumption values (step 4).
Figure 2.14: Illustration of the comparison between the hypothetical power consumption values and the power traces (step 5).
2.4 Signal Characteristics
This section describes the conversion of an analog signal to digital, explaining how the conversion is done and how to minimize the loss of information during the process. Then, the components that constitute each sample of a power trace are presented.
2.4.1 Analog to Digital
The power consumption of a device can be seen as an analog signal, i.e. continuous in the time domain. To be processed and analysed by a digital device, like a computer, the analog signal must be converted from continuous to discrete. This conversion is done using sampling and quantization. Sampling converts the time axis to a finite number of sampled values. Quantization is performed on the y axis, where the amplitude of the signal is mapped to a finite set of values, as illustrated in Figure 2.15.
Figure 2.15: Sampling and quantization of a continuous signal.
When performing analog-to-digital conversions, there is a risk of losing information in the process. This can happen if the sampling frequency is low or the converter used has inadequate quantization resolution, where the sampling frequency is the number of points gathered per second on the time axis, and the signal resolution, usually presented in bits, determines the number of values that can be mapped from the analog to the digital y axis.
When sampling, one has to be careful with the frequency chosen, because a low frequency, depending on the signal, might not record all the signal information and lead to what is called aliasing. One naive solution could be to always use the maximum sampling frequency, but this has the disadvantage of requiring more space and increasing the cost of processing the signal. A better way is to find the minimal sampling rate needed to reconstruct the analog signal, and then adjust the sampling frequency from there. This minimal sampling rate is given by the Nyquist-Shannon sampling theorem, which states that in order to capture a signal without losing information, the sample rate has to be at least twice the maximum frequency of the signal being sampled.
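Aliasing can be demonstrated numerically with a minimal, hypothetical example: a 7 Hz tone sampled at only 10 Hz (Nyquist limit 5 Hz) produces sample values identical to those of a 3 Hz tone, so the two become indistinguishable after sampling:

```python
import math

fs = 10.0                 # sampling frequency in Hz
f_signal = 7.0            # above the Nyquist limit fs/2 = 5 Hz
f_alias = fs - f_signal   # the 7 Hz tone folds down to 3 Hz

for k in range(50):
    t = k / fs
    s7 = math.sin(2 * math.pi * f_signal * t)
    s3 = -math.sin(2 * math.pi * f_alias * t)   # sign-flipped 3 Hz tone
    # Sample by sample, the undersampled 7 Hz signal is indistinguishable
    # from a 3 Hz one: the information above fs/2 has been lost
    assert math.isclose(s7, s3, abs_tol=1e-9)
```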
Another thing to take into account is that a low bit resolution leads to a bigger quantization error. The quantization error is essentially how different the recorded value is, in terms of amplitude, from the original signal. To minimize the quantization error, the signal must occupy the entire input range of the acquisition device. This way the full quantization range is used to map the signal, maximizing its resolution.
2.4.2 Power traces
The total power consumption at each point is the sum of four components. The operation-dependent component is represented by Pop and the data-dependent component by Pdata. The other two, electronic noise and constant power consumption, are represented by Pel.noise and Pconst respectively.
Electronic noise occurs when one or more undesired signals exist and are recorded, something that is present in every practical measurement. The way to characterize it is by performing the same operations with constant input several times and averaging the result. Since the noise is more or less random, it will be partially averaged out.
Constant power consumption is usually ignored, since it does not carry any useful information. This is because the presence of Pconst in a measurement is normally caused by current leakage. Internal value changes in transistors that are independent of the data processed and the operations being performed are called switching noise. The total power of each sample of a recorded power trace can thus be formulated as:

Ptotal = Pop + Pdata + Pel.noise + Pconst    (2.3)
If an attacker wants to retrieve useful information from the power traces, he can use different types of power analysis on Pop and Pdata. Each type of analysis targets different properties of those components, meaning he can target a component completely or only a small part of it. The part used in the attack is called the exploitable component, denoted by Pexp. The part not used is called switching noise, denoted by Psw.noise. The relation between these components can be defined as:

Pop + Pdata = Pexp + Psw.noise    (2.4)

Considering this new relation, the total power composition can be formulated as:

Ptotal = Pexp + Psw.noise + Pel.noise + Pconst    (2.5)
The analysis of the signal containing useful information, Pexp, becomes more difficult the higher the values of Psw.noise and Pel.noise are. One way to quantify the leakage of information from one point is to use a signal-to-noise ratio (SNR) metric. In the context of power analysis, the SNR can be written as follows:

SNR = Var(Pexp) / Var(Psw.noise + Pel.noise)    (2.6)

The number of samples required to extract useful information from a point is inversely proportional to the SNR value, so if the traces have more noise, more traces are needed to retrieve useful information from them.
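Equation 2.6 can be evaluated on simulated components. The sketch below assumes a simple model: the exploitable signal is the Hamming weight of a random data byte (variance 2) and the noise is Gaussian with variance 4, so the expected SNR is about 0.5:

```python
import random
import statistics

random.seed(1)
n = 10_000

# Simulated per-sample components (Eq. 2.5): the exploitable signal is the
# Hamming weight of a random byte; the rest is modelled as Gaussian noise
p_exp = [bin(random.randrange(256)).count("1") for _ in range(n)]
p_noise = [random.gauss(0, 2.0) for _ in range(n)]
p_const = 5.0  # constant component: no variance, hence no information

snr = statistics.pvariance(p_exp) / statistics.pvariance(p_noise)  # Eq. 2.6

# HW of a uniform byte has variance 8 * 0.25 = 2; the noise variance is 4,
# so the estimate should land near 0.5
assert 0.3 < snr < 0.7
```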
2.5 Welch’s T-Test
Properly acquired power traces are essentially a signal that needs to be processed in order to retrieve useful information and ultimately discover the secret key of an electronic device. Until now, CPA analysis was used to retrieve this secret key, but this method has one disadvantage: it is computationally expensive to correlate thousands of large traces when, in the end, usually only a portion of each trace is relevant to the recovery of the secret key. This section presents a statistical tool named the t-test to try to find these relevant points in a power trace, without heavy computation or, depending on the attack, thousands of traces.
The t-test is a statistical hypothesis test used to determine whether two sets of data are significantly different from one another. The test statistic follows a Student's t-distribution, used when the data being analysed are normally distributed, the number of samples is small and the standard deviation is unknown. Welch's t-test is an adaptation of the regular t-test for sets that have unequal variances and sample sizes.
This method by itself does not reveal the secret key of a device, but it can show potential Points-Of-Interest (POI) by distinguishing trace samples that have different power consumptions, revealing possible differences in the signal and thus possible leakages. This is of special importance when dealing with a large number of traces with many samples, since it reduces the correlation to those points presenting potential interest. The t-test formula is defined as follows:
t = (X1 − X2) / sqrt(s1^2/N1 + s2^2/N2)    (2.7)
where X1, X2 are the sample means of datasets 1 and 2 respectively, s1^2, s2^2 are the sample variances and N1, N2 are the numbers of samples.
The degrees of freedom v associated with the variance can be calculated using the following
formula:
v ≈ (s1^2/N1 + s2^2/N2)^2 / [ (s1^2/N1)^2/(N1 − 1) + (s2^2/N2)^2/(N2 − 1) ]    (2.8)
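Both statistics can be computed directly. The Python sketch below implements equations 2.7 and 2.8 and applies them to two hypothetical trace-sample sets whose means clearly differ; an absolute t above the 4.5 threshold commonly used in leakage assessment flags that sample index as a potential point of interest:

```python
import math

def welch_t(a, b):
    # Welch's t statistic and degrees of freedom (Eqs. 2.7 and 2.8)
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)  # sample variances
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se)
    dof = se ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, dof

# Two sets of power samples at one trace index (illustrative values): a
# clear mean difference yields |t| well above the usual 4.5 threshold
fixed = [5.0, 5.2, 4.9, 5.1, 5.0, 5.3, 4.8, 5.1]
rand = [6.0, 6.3, 5.9, 6.1, 6.2, 5.8, 6.0, 6.4]
t, dof = welch_t(fixed, rand)
assert abs(t) > 4.5
```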
This method works in the power analysis context because different input data cause different intermediary values. These intermediate values cause a distinct, noticeable power consumption, whereas instructions that are independent of the input data have the same power consumption. The points where the consumption differs are potential places where sensitive data is handled, such as the XOR operation between the secret key and the plain text. For this to work, two distinct power trace groups need to be created using two pre-selected plain text groups. The categories are presented below:
• Fixed vs. Fixed - The two sets of input data differ in only one data value. This method has the advantage of being completely independent of the algorithm being used, but usually at the cost of an increase in false negatives.
• Fixed vs. Random - One set has a fixed value and the other has random values. This one is very simple and has the advantage of being completely independent of the algorithm being analysed.
• Semi-fixed vs. Random - One set has data that produces some fixed intermediary values and the other set only has random data. Taking AES as an example, the semi-fixed data set can be one that sets half of the state bytes after the first round to zero and the other half to random values. This method requires some knowledge of the algorithm being analysed.
The advantage of using this method, compared with the correlation attack, is that it requires far fewer traces to give meaningful results. The main disadvantage is the loss of intuition about where the main leakage points are located. This happens because the t-test tends to flag a range of points around the point of interest, while CPA is more precise. There are also possible false negatives, where the t-test does not report POIs in places where sensitive information is being handled.
Some techniques can be used to improve the results, like the "paired t-test" [20], where the values from the two sets are given to the crypto device in alternation, so that its internal state changes between ciphering operations.
3 State of the Art

Contents
3.1 Side-channel Analysis Platforms
3.2 Attacks
3.3 Defence Mechanisms
3.4 Conclusion
This chapter introduces some of the signal processing methods and procedures that can be used to obtain better traces, and discusses some platforms currently used by the industry to assess the security of electronic devices; this work will focus on the smart card. Section 3.1 presents some of the side-channel analysis platforms. Section 3.2 presents two state-of-the-art attacks that can increase the effectiveness of the CPA attack. Section 3.3 gives an overview of two defence mechanisms that allow the reduction of possible signal leakage. Finally, Section 3.4 concludes this chapter.
3.1 Side-channel Analysis Platforms
SCA platforms, in the context of this work, refers to the software and hardware used to assess the security of cryptographic devices against power analysis attacks. There are two platform categories:
• Ad-hoc/home-made platforms, which can consist of an oscilloscope to measure the power consumption, a resistor in series with the device's power supply and some scripts to analyse the obtained data.
• Commercial tools and software, developed by companies or the academic community with the purpose of testing the security of electronic devices.
The following presents some of the most used platforms, both software and hardware, with a brief introduction describing their functionality as well as some of their advantages and disadvantages: for hardware, the SAKURA-G/W and the ChipWhisperer; for software, the ChipWhisperer's software module.
3.1.1 SAKURA-G/W
Figure 3.1: Top view of the SAKURA-G. [21]
SAKURA-G (Figure 3.1) and SAKURA-W (Figure 3.2) are both testing platforms to evaluate the leakage and related security of implemented cryptographic modules. They are designed for
Figure 3.2: Top view of the SAKURA-W.
research and evaluation of side-channel leakage, covering attacks such as SCA and Fault Injection Attacks (FIA). SAKURA-W is designed to serve as an adapter that sits on top of the SAKURA-G to enable smart card security tests. SAKURA stands for Side-channel Attack User Reference Architecture.
This platform is well known in the SCA community, being the one chosen for the DPA contest [22],
a website that hosts a competition where people submit their power analysis algorithms, and the best
one is chosen.
SAKURA-G is a 140 mm x 120 mm board composed of two programmable FPGAs: a Xilinx Spartan-6 (XC6SLX75) for the cryptographic circuit (the main FPGA) and a Spartan-6 (XC6SLX9) for the control circuit (the controller FPGA). The main clock oscillator for these FPGAs runs at 48 MHz, but this value can be scaled up or down when programming the FPGAs.
The board is designed to be low-noise in terms of power analysis and comes with an on-board amplifier to facilitate power analysis. The amplifier has a 360 MHz bandwidth with a +20 dB gain. The board has two power sources, one via USB and the other from an external power supply. It also comes with two sets of 40 user I/O pins, where one set is controlled by the main FPGA and the other by the controller FPGA.
SAKURA-W is an expansion board; it uses the SAKURA-G's controller FPGA to deliver commands to the smart card. It has a smart card reader, no amplifier and only one set of 40 I/O pins.
The main disadvantage of this platform is its high cost, which can be a barrier for those wanting to start studying the field of power analysis.
3.1.2 ChipWhisperer
ChipWhisperer is an open-source toolchain that provides both hardware and software for power analysis and glitch attacks. The project started as a crowdfunding campaign on Kickstarter and belongs to NewAE Technology Inc [23].
Figure 3.3: A ChipWhisperer starter kit that comes with a multi-target board, an oscilloscope, differential probeand a low noise amplifier.
Figure 3.4: ChipWhisperer software interface.
The project provides a board that the user can configure with the desired algorithm, plug into the computer and use to start experimenting in the field of side-channel attacks with the provided software (depicted in Figure 3.4), while maintaining a low cost when compared to other commercial products.
One of the first products, still in production, is the ChipWhisperer-Lite (depicted in Figure 3.3), which is formed by a measuring board and a target board. The two boards come connected to each other, needing to be broken apart, and the connectors need to be soldered. There is also a more expensive version that comes with the components already soldered.
As for the board components, the measuring board has a 10-bit analog-to-digital converter, an Atmel SAM3U high-speed USB chip, a Xilinx S6LX9 FPGA, a +55 dB low-noise amplifier and one MOSFET for glitch generation; the board is powered by a micro-USB port, which also serves as the communication interface. The target board comes with an 8/16-bit XMEGA128 microcontroller that guarantees real-time performance, i.e. that the system responds within specific time constraints, and has a low power consumption. At the time, the price of the ChipWhisperer-Lite was $250 USD, and other options, like the board with the connectors already soldered or a more robust version, are also available to buy on the website [24].
The product is designed to be low cost while performing at the level of other, more expensive
products, by providing synchronous capture. The concept of synchronous capture is to synchronize the
trace collection with the target device's clock, multiplying or dividing the base frequency. This keeps
the traces aligned while allowing the device to use cheaper parts when compared to high-end
oscilloscopes.
The software is written in Python 2.7 and designed to be cross-platform, coming in two modules: the
capture module, used to configure the oscilloscope, communicate with the device and gather the power
traces; and the analyser module, used to visualize and process the captured data. This software is
also compatible with regular oscilloscopes such as the PicoScope.
3.2 Attacks
The following presents two attacks that can be used to improve the CPA analysis. First, template
attacks are presented, where the device's consumption is characterized to reduce the number of
traces required to discover the correct key. Then, collision attacks are presented, which reduce the number
of key guesses by finding equal intermediary values in different plain texts.
3.2.1 Template Attacks
Template attacks [5, 25] are based on the characterization of power traces using multivariate
normal distributions. This technique depends on the data being processed and can be divided into two
parts: template building and template matching.
The main idea of this attack is to record the power trace of a set of sequenced instructions on
a device identical to the one under attack [5], using different data and secret keys. From the recorded
power traces, the multivariate normal distribution is computed. The result is
a template for each pair of data and key used on the profiling device. The final step is to
record the power trace on the device under attack and compare it with the templates built. The
result of this operation is a probability, measuring how well a given template suits the power trace.
Template Building Phase
In the template building phase, a single instruction or a set of sequenced instructions is charac-
terized. The targeted instructions must manipulate the secret value while the power consumption is
recorded. For the instruction characterization to work, different pairs of data and keys (di, kj) must be
used. Then, from the traces gathered, a multivariate normal distribution composed of a mean vector
and a covariance matrix is computed. There are three typical strategies used to build templates: the
usage of pairs of data and keys, where multiple data and keys are used to build a template; the usage
of intermediary values, where a function that uses (di, kj) is characterized for all its possible values;
and the usage of power models like the Hamming-weight or Hamming-distance.
Template Matching Phase
In the template matching phase, the probability density function and the power trace are evaluated
using the Gaussian/Normal distribution seen in the following formula:
p(t; (m,C)di,kj) = exp(−(1/2) · (t−m)′ · C⁻¹ · (t−m)) / √((2·π)^T · det(C)) (3.1)
where m is the mean vector, C is the covariance matrix, t is the power trace and (di, kj) are the data and
key used to build the template. From the formula, a probability is obtained indicating how well the power
trace suits the template, with the highest value corresponding to the best match. During the computation,
numerical problems can arise when computing the covariance matrix inversion and the exponentiation.
Performing the exponentiation on small numbers can lead to numerical problems; one solution is
to apply the logarithm to the equation. This has the consequence that the smallest absolute
value of the logarithm, and no longer the highest value, indicates the correct key. To avoid the inversion of
the matrix C, it is set to the identity matrix; doing so discards the covariance between points, considering
only the mean vector. Finally, the resulting equation still has one exponentiation and, to remove it,
the logarithm is again applied, avoiding possible numerical problems when using small exponential
values. The final expression is as follows:
ln p(t; (m,C)di,kj) = −(1/2) · (ln((2·π)^NIP) + (t−m)′ · (t−m)) (3.2)
With these additional operations, the template that produces the smallest absolute value is the one
that indicates the correct key.
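The simplified expression of Equation (3.2) can be sketched in plain Python. The trace and templates below are hypothetical values chosen only to show that the closest template scores best:

```python
import math

def log_template_score(t, m):
    """Log-probability of Equation (3.2): the covariance matrix is replaced
    by the identity, so only the mean vector m is used."""
    n = len(m)
    diff_sq = sum((ti - mi) ** 2 for ti, mi in zip(t, m))
    return -0.5 * (n * math.log(2 * math.pi) + diff_sq)

# Two hypothetical templates: the trace t lies closest to m1, so m1 yields
# the log value with the smallest absolute value, i.e. the best match.
t  = [1.0, 1.1, 0.9]
m1 = [1.0, 1.0, 1.0]
m2 = [3.0, 3.0, 3.0]
print(log_template_score(t, m1) > log_template_score(t, m2))  # True
```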
Example using the Hamming-weight
An example using the Hamming-weight can be given for the MOV instruction, which moves part of
the secret key to a register. An attacker can characterize the Hamming-weight on a device under his
control by checking the power consumption for each value (from 0 to 8 on an 8-bit microcontroller).
If each of the different weights has a distinct power consumption, a template can be built for each
Hamming-weight value. The attacker then gathers the power trace on the device under attack, at the approxi-
mate moment where the MOV operation is being performed, and compares it with the expected values.
This yields the Hamming-weight of the key portion being processed by the MOV instruction. If this
is repeated for every key portion, it reduces the number of guesses an attacker has to make to retrieve
the secret key.
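The reduction in key guesses can be quantified with a short sketch. The recovered weight below is a hypothetical value standing in for the result of the template match:

```python
# For an 8-bit key byte there are 256 candidates; knowing the Hamming
# weight of the byte handled by the MOV instruction shrinks that set.
def hamming_weight(x):
    return bin(x).count("1")

recovered_hw = 3  # hypothetical weight read off the best-matching template
candidates = [k for k in range(256) if hamming_weight(k) == recovered_hw]
print(len(candidates))  # 56 candidates (C(8,3)) instead of 256
```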
Template-Based Attack for DPA
Template-based attacks improve regular DPA attacks by reducing the probability of wrong key
guesses. They can be seen as an extension of the template attacks for SPA, where the device's con-
sumption is characterized.
The basis of this attack is the question: given a trace ti, what is the probability of the key being kj,
written as p(kj |ti)? Consider ti as an element of the power trace vector t and kj as an element of the
vector k of possible key guesses. Using Bayes' theorem [3], the following formula is deduced:

p(kj |ti) = (p(ti|kj) · p(kj)) / (∑_{l=1}^{n} p(ti|kl) · p(kl)) (3.3)
Bayes' theorem can be seen as an update function: it receives as input the key probabilities
p(kl), which do not take ti into account, and produces an output that does. The input key probabilities p(kj) are
known as prior probabilities and the output p(kj |ti) as posterior probabilities.
The previous formula works for one trace, but in practice multiple traces are used to gather more
information about the secret key. Now, using a matrix T, where each line can be seen as a power trace
vector, the conditional probability is written as p(kj |T). Since every trace is statistically independent,
applying Bayes' theorem leads to the following formula:

p(kj |T) = ((∏_{i=1}^{m} p(ti|kj)) · p(kj)) / (∑_{l=1}^{n} (∏_{i=1}^{m} p(ti|kl)) · p(kl)) (3.4)
Finally, the probabilities p(kj) and p(ti|kj) need to be determined in order to calculate p(kj |T).
Since all key values are equally likely, the value of p(kj) is 1/K, where K is the number of possible
keys. As for the values of p(ti|kj), they are calculated after phase 3 of the DPA attack, where the
probabilities are based on the power trace matrix M and the matrix V of all possible intermediary
values.
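The Bayesian update of Equation (3.4) can be sketched as an iterative loop: start from the uniform prior 1/K, multiply in each trace's likelihood, and renormalise. The per-trace likelihoods p(ti|kj) below are illustrative placeholders, not values derived from real traces:

```python
# Sketch of the posterior computation for K = 4 key candidates and 2 traces.
K = 4
priors = [1.0 / K] * K  # p(kj) = 1/K: all keys equally likely a priori

# Hypothetical likelihoods p(ti|kj), one row per trace.
likelihoods = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
]

post = priors[:]
for row in likelihoods:
    post = [p * l for p, l in zip(post, row)]  # multiply in p(ti|kj)
    total = sum(post)
    post = [p / total for p in post]           # normalise (denominator of 3.4)

best = max(range(K), key=lambda j: post[j])
print(best)  # key index 1 accumulates the highest posterior here
```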
3.2.2 Collision Attacks
During encryptions using different plain texts and an unknown key, it is possible that intermediary
functions produce the same values. When this event occurs, it is said that a collision happened. The
importance of a collision is that it can only occur for a certain subset of all the possible key
values. For example, considering an intermediary function f(di, kj), where di and kj are the input data
and the unknown key respectively, if f(d1, k1) = f(d2, k1) and d1 ≠ d2, then k1 can only assume a reduced
number of values for the two function outputs to be equal. This reduces the number of key guesses
needed to find the correct key. This type of attack uses side-channel analysis, in this case power
analysis, to detect the internal collisions.
On AES, a collision attack can be performed on the MixColumns transformation. The MixColumns
operation can be represented as a matrix multiplication:

    [b0]   [02 03 01 01]   [a0]
    [b1] = [01 02 03 01] × [a1]   (3.5)
    [b2]   [01 01 02 03]   [a2]
    [b3]   [03 01 01 02]   [a3]

where an is the result of sbox(dn ⊕ kn) and b0 is given by

b0 = 02 × sbox(d0 ⊕ k0) + 03 × sbox(d1 ⊕ k1) + 01 × sbox(d2 ⊕ k2) + 01 × sbox(d3 ⊕ k3) (3.6)
For the attack to work, one must consider only plain text values that make both d0 = d1 = 0
and d2 = d3. When two plain texts with d2 ≠ d′2 and d3 ≠ d′3 produce the same output,
information can be deduced about k2 and k3.
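The structure of Equation (3.6) can be sketched in Python using the standard xtime routine for multiplication by 02 in GF(2^8). Note a loud simplification: the AES S-box is replaced by an identity stand-in, under which every pair of plain texts of this form collides; with the real, non-linear S-box a collision only occurs for a restricted set of (k2, k3), which is exactly what leaks information:

```python
def xtime(a):
    """Multiplication by 02 in GF(2^8) with the AES polynomial 0x1B."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gf_mul(a, c):
    """Multiply a by the MixColumns constants 01, 02 or 03."""
    return {1: a, 2: xtime(a), 3: xtime(a) ^ a}[c]

def sbox(x):
    return x  # identity stand-in for the real AES S-box (simplification)

def b0(d, k):
    a = [sbox(d[i] ^ k[i]) for i in range(4)]
    return gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ gf_mul(a[2], 1) ^ gf_mul(a[3], 1)

# Two plain texts with d0 = d1 = 0 and d2 = d3, differing between them.
k  = [0x00, 0x00, 0x12, 0x34]   # hypothetical round-key bytes
d  = [0, 0, 0x10, 0x10]
dp = [0, 0, 0x20, 0x20]
print(b0(d, k) == b0(dp, k))    # collision on b0
```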
3.3 Defence Mechanisms
Until now, attacks against cryptographic devices were covered, in which the power consumption of
a device is correlated with the values it processes. To reduce the correlation between these values
and the power consumption there are two main methods, namely hiding and masking.
Hiding tries to make the power consumption independent from the operation being performed and
from the intermediary values being processed, either in software or in hardware. At the software level, to
make the power consumption appear random, instruction delays, dummy operations and instruction
shuffling are inserted into the program. These operations are controlled by randomly generated values
that decide how long the delays are, how many dummy operations are per-
formed and where the instructions are shuffled. This method has the disadvantage of
increasing the power consumption and the processing time.
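A minimal sketch of software hiding, assuming a simple byte-wise operation as the protected work: the processing order is shuffled and a random number of dummy operations is inserted, so the power profile shifts in time from run to run while the result is unchanged:

```python
import random

def process_state(state, rng):
    """Apply a toy operation (XOR with 0xFF) to each byte, with instruction
    shuffling and random dummy operations controlled by rng."""
    order = list(range(len(state)))
    rng.shuffle(order)                     # instruction shuffling
    out = [0] * len(state)
    for i in order:
        for _ in range(rng.randrange(4)):  # 0..3 dummy operations
            _dummy = (i * 0x5A) & 0xFF
        out[i] = state[i] ^ 0xFF           # the real work on byte i
    return out

state = list(range(16))
out = process_state(state, random.Random(1))
print(out == [b ^ 0xFF for b in state])  # order varies, the result does not
```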
At the hardware level, the device can be built to consume an equal amount of power for every
operation and data value processed. One way to do this is by using dual-rail logic, where logic cells
receive and output both a value and its complement. Combined with precharge logic, which forces the
output of a logic gate to a specific value of either 1 or 0 between evaluations, this always produces the
same sequence of bit transitions. Besides increasing the manufacturing cost, this method has the
disadvantage that it is not possible to make a device's power consumption 100% independent of the
operations and data processed.
Masking has the same objective as hiding, making the device's consumption independent of its
intermediary values, but it uses random values to mask the intermediary values instead of trying to
change the device's power consumption. This requires only changes to the algorithm, so that it applies
and later removes the random mask from its intermediary values. As an example, consider v to be the in-
termediary value and m a random value generated internally by the algorithm. The masked value is
the result of XORing the two values: vm = v ⊕ m. This type of defence mechanism works because
the consumption of a device depends on the values being processed. If something random is added
before the intermediary value is processed, the consumption of the masked value will appear random,
making the consumption independent from the processed value.
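The Boolean masking described above can be sketched in a few lines; the value 0x3C is an arbitrary example of an intermediary value:

```python
import secrets

v = 0x3C                  # intermediary value to protect
m = secrets.randbits(8)   # fresh random mask, generated internally
vm = v ^ m                # masked value: the only one actually processed

# Unmasking with the same mask recovers the original intermediary value.
print(vm ^ m == v)  # True
```

Because m is fresh on every execution, vm is uniformly distributed regardless of v, so the power consumed while handling vm carries no first-order information about v.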
3.4 Conclusion
This chapter presented some of the state-of-the-art SCA platforms, attack methods and defence mech-
anisms. It started by presenting two types of platforms for side-channel analysis, where one is
an ad-hoc/home-made platform and the other a commercial CPA product. Next, some more
advanced attack methods were presented, starting with template attacks, which rely on characterising
the device's power consumption, and collision attacks, which try to identify when collisions happen to
reduce the number of key guesses. Finally, the hiding and masking defence mechanisms were presented
as ways to increase CPA resistance, where hiding tries to directly change the power consumption of the
device, and masking tries to conceal the intermediary values before processing them. For more infor-
mation on this topic refer to [26].
4 Proposed Solution and Implementation

Contents
4.1 Processing Units
4.2 Trace Acquisition
4.3 Signal Analysis
4.4 The overall setup usage
4.5 Conclusion
This chapter presents the proposed setup, describing its components and the analysis
scripts developed.
This work intends to provide a setup that allows the assessment of an electronic device's security
using power analysis. The setup can be divided into three main parts: the device under test,
the trace acquisition and the trace analysis. The device under test is the equipment from which the
power traces are retrieved while it performs cryptographic operations. In this work the devices used
were a smart card and an FPGA, both configured with an AES algorithm.
The component that gathers traces is the oscilloscope, measuring the power consumption of the
device being tested, together with a Python program running on a PC that configures and coordinates the trace
collection both on the oscilloscope and on the device under test. The component that analyses the
traces is composed of the analysis scripts, executed on a PC, which perform various statistical
analyses on the power traces, not only to retrieve the secret key but also to assess the signal quality or
to find potential points that can be exploited. Figure 4.1 illustrates the components and their relationship.
Figure 4.1: An illustration of a power analysis setup.
It is important to understand how the components work together, so the user can make the most
of the setup. The following sections provide a deeper explanation of the setup, presenting
some of the challenges encountered while configuring and programming it and the decisions
made to overcome them. If a user wishes to use or understand the setup, improve it, or is
developing their own setup, the following sections can help save time, since much of the discovery and
problem solving was done while developing this one.
This chapter is divided into 4 sections. Section 4.1 presents the work developed in regard to the
target devices, from their characteristics to the software developed. Section 4.2 presents the work
developed on the trace collection side: it presents the characteristics of the oscilloscope used and the
development of the trace collection Python program and of the communication interface with the targeted
devices. Section 4.3 presents the scripts developed for trace analysis, explaining their purpose and
structure. Section 4.4 presents an overview of how the setup can be assembled and configured, both
for the SAKURA-G and for the SAKURA-W. Section 4.5 presents some concluding remarks.
4.1 Processing Units
This section presents and explains all the work related to the two target devices used, the FPGA
and the smart card. Although both are physical devices, this section is divided between software and
hardware, because the nature of the work was mostly software programming in the case of the smart
card, and mostly hardware setup and testing in the case of the FPGA.
It is worth mentioning that the smart card programming has a larger section than the
SAKURA-G one, since the SAKURA-G already provided the source code and did not need any changes
to its behaviour. On the other hand, the provided smart card software required licensing, meaning
that its source code was not publicly available, which prevented the software behaviour from being
changed. To remove this restriction, the smart card software was developed from scratch. Next, the
work developed and the decisions made on both devices are presented.
4.1.1 SAKURA-G/W
As mentioned in the previous chapter, the SAKURA platform is designed to allow research and
development on SCA and FIA. This platform was chosen since, besides being the
most used by the side-channel community to perform power analysis attacks, it was the only one that
provided a smart card reader, the smart card being the main target device of this work.
The two main components of this board are the controller, a Xilinx Spartan-6 XC6SLX9, and the
main security circuit, a Spartan-6 XC6SLX75. The controller instructs the main security circuit. The
main circuit contains an implementation of the desired algorithm to be tested.
To program the SAKURA-G, two programs are used, one for the controller FPGA and another
for the main FPGA. For the SAKURA-W, only the controller FPGA is programmed, redirecting the
received commands from the main FPGA to the SAKURA-W (which sits on top of the SAKURA-G). The
programming of the FPGAs is done using Xilinx's Platform Cable USB II, providing a user-friendly
configuration of Xilinx FPGAs and programming of Xilinx programmable read-only memory (PROM).
The software used to load the files and to communicate with the Platform Cable USB II is the iMPACT
tool from Xilinx.
There are two types of programming: one that is non-persistent, meaning that if the power is turned
off the FPGA configuration is lost, and another that is persistent. Non-persistent programming
should be preferred in cases where the FPGA configuration is changed constantly, since persistent
programming is stored on a flash memory that can only be written a limited number of times.
The device configuration provided on the platform's website for the SAKURA-G/W consisted of the Verilog-
HDL code and a program to check whether the boards were working properly. A program to be written
to the smart card, compatible with the SAKURA-W software, was also provided, but without its source
code. The SAKURA-G main FPGA is set to work at 24 MHz, while the SAKURA-W is
clocked at 3.571 MHz. If the user desires to change these values, it is possible to do so by editing
the Verilog-HDL.
In terms of triggering, on the SAKURA-G it is done by sending the signal through one of the top
external pins. The first four pins are set at the start of the key scheduling, the first AES round, the last
round and all rounds, respectively. On the SAKURA-W, the 8th pin of the second row of the bottom pins is
mapped to the smart card's AUX1 pin. This pin signals the trigger, and the configuration of when it
goes ON is controlled by the smart card software.
4.1.2 Smart Card
A smart card can be seen as a miniaturized computer, having a CPU, RAM, ROM and storage
memory. The smart cards used in this project were acquired from a company named wb electronics
[27], the same entity that supplied the smart cards that came with the SAKURA board. This was
important in order to ensure compatibility between the smart cards that came with the boards and
the ones acquired.
These smart cards use an 8-bit ATmega8515 microcontroller and have 512 bytes of Static Random
Access Memory (SRAM) and of EEPROM, and support up to 64K bytes of external memory. They are able
to perform up to 16 Million Instructions Per Second (MIPS) at 16 MHz, operate at 4.5 V - 5.5 V and endure
10,000 write/erase cycles. To explain the software components and the challenges encountered
while developing the setup, the next sections detail the issues encountered.
Source code development
The smart cards that came with the SAKURA board only provided the compiled code and required
a licence for the source code. This prevented modifications to their behaviour and implementation,
such as trigger positioning/duration, changes to the AES algorithm, the introduction of delays, modifying the
communication protocol and other changes. Because of this, the decision was to implement the smart
card software from scratch, to have more control and flexibility over what could be done with it.
Before starting the implementation, the wb electronics website, named Infinity USB, was checked for
software examples for the smart card. On the website, source code examples for the PC side were
available, but for the smart card only the compiled code was provided. We reached out to the company
and asked if it could provide the source code for the smart card, since it was for academic use, and after
some time they agreed and sent it. The source code was of great help, since it provided
the program structure and the communication functions to transfer data from/to the smart card.
While waiting for the answer, the datasheet of the smart card's ATmega8515 microcontroller [28] was
studied to understand how the microcontroller ports were organized and labelled and which ports could
be mapped to the smart card pins. On the software side, the 8515 I/O Application Programming
Interface (API) was studied to understand what software ports were available and how they could be
used. This later helped to better understand the source code and its functionality.
After having the source code example, the next step was to program the new smart card software,
using the data transfer and receive functions from the provided source code. In the end,
the goal was to have smart cards identical in functionality to the ones provided with the SAKURA,
to ensure compatibility with the SAKURA-provided software, but whose behaviour could be changed.
Programming
The smart card program's source code needed to be tested to check whether it was working properly,
since sometimes the code provided is not the final version and has bugs. The source code was compiled
using Atmel Studio 7.0 and the program was written to the smart card using the smart card writer
named Infinity USB. After that, the smart card was tested using the PC-side software provided on the
Infinity USB website and, after confirming everything worked, the smart card programming started.
The program was adapted to have the same smart card command structure (the APDUs) as the SAKURA smart
cards, to ensure compatibility with the software provided on the SAKURA website. After
this, a C implementation of the AES algorithm named tiny-AES128-C [29] was used because of its
small size and simplicity, since the smart cards have limited resources. Once the program was
finished, the code did not fit in the internal memory of the smart card. The problem was that, on
microcontrollers, every variable and vector is stored in RAM, and 512 bytes were not enough to store
everything. The solution was to use the PROGMEM attribute to store the constant variables and
constant vectors in the flash memory. The only difference is that, to access the values stored in
these variables and vectors, the functions pgm_read_byte_near and pgm_read_byte need to be called.
After loading the compiled program into the smart card, the software was checked to assess whether
everything was working properly. To do this, the smart card writer was set to reader mode and the
smart card was tested. After checking that the developed smart card software was working with a modified
program from the Infinity USB website, the smart card was tested on the SAKURA-W.
On the SAKURA-W, a modified program provided on the SAKURA website was used. The func-
tionality of this program was to check whether the SAKURA-W smart cards were working properly, and the
adaptation consisted of stripping the program of its interface, leaving only the communication part.
However, when tested, the modified smart card did not answer the commands that were sent.
The problem was found to be the frequency used by the SAKURA-W's smart card reader, which was
lower than the one the smart card software was configured for.
Frequency Mismatch
After a careful inspection of the data transmission functions and the values being used, it was con-
cluded that the smart card software was configured to work at 6 MHz. The SAKURA-W's smart card reader
was clocked at 3.571 MHz, but at first this should not make any difference, since the microcontroller
could work at a lower frequency as well. The problem was in the receiving and sending data functions,
which were dependent on the CPU frequency. To explain the issue, a brief explanation of these functions
is given below:
Data on the smart card is transmitted in serial mode, i.e. one bit at a time. The smart card
reads/writes from/to the data pin, where the values can be either low or high (0 or 1), and there is
a time window for both reading and writing each state. In order for the smart card to know when
to read, a delay function is used, which receives a value, decrements it until it reaches zero and then
returns. The idea is that the function will spend a certain amount of time decrementing
the value, and this duration depends on the value passed to the function and on the CPU frequency.
A higher CPU frequency means faster instruction processing and smaller delay times for the same
function value. The program came with a predefined value for the 6 MHz CPU and for the bit rate of
9600 Bits Per Second (BPS). Note that the bit rate must also be considered, because the more
bits transmitted per second, the smaller the bit intervals will be. To calculate the new value for the
SAKURA-W frequency, the following formula was inferred from the values already calculated in the
code:
SAKURA-W frequency, the following formula was inferred from the values already calculated in the
code:
delay_value = cpu_frequency / (3 × bps) (4.1)
The CPU frequency divided by three times the BPS gives the number of decrement loops the function
should perform. The multiplication by 3 is there because the delay function spends 3 CPU instructions, or
clock cycles, to perform each value decrement.
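Equation (4.1) can be evaluated for both clock frequencies with a short sketch; the integer (floor) division is an assumption, since the delay counter must be a whole number of loop iterations:

```python
# Number of 3-cycle decrement loops for one bit period at 9600 bps.
def delay_value(cpu_frequency, bps=9600):
    return cpu_frequency // (3 * bps)  # floor to a whole loop count

print(delay_value(6_000_000))   # original 6 MHz configuration -> 208
print(delay_value(3_571_000))   # SAKURA-W's 3.571 MHz reader  -> 123
```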
After the new delay values had been calculated, the smart card was tested again on the SAKURA-
W, and this time it responded to the commands but did not deliver the correct result, having an extra
value in the middle of the smart card's answer. To try to solve the problem, the communication was in-
spected at the bit level using the oscilloscope, as described next.
Communication Debugging
To confirm whether the new delay values were compatible with the SAKURA-W's frequency, and to check
whether the extra value in the result could be observed in the communication bits, the communication of the
SAKURA smart card and of the programmed one was compared. In order to do that, the developed
smart card software was programmed to have the same command structure and to receive/send the
same data as the card that came with the SAKURA board. This comparison was done at the bit level,
comparing the time between each bit transmission, using the PicoScope and its capturing software.
The transmission protocol consists of one start bit that begins at a low value, eight bits of data,
one parity bit and two stop bits that stay at a high value. After comparing the two, it was observed
that they had a small difference, but not enough to cause problems in the transmission, except in one
case: when a call to the receive function was followed by sending a byte, the implementation was not waiting
the time specified by the protocol for the stop bits.
After fixing the problem and checking that the extra value was not present in the low-level communi-
cation, the smart card was tested again, confirming the extra value was still present in the result. The
conclusion was that the error did not come from the data transmission, i.e. the problem was not on the smart
card, and the extra value was being inserted by the PC-side software. To confirm this hypothesis,
the PC-side code was rewritten from scratch, also in Python, to check if the problem persisted. With
the new Python program, the extra value was gone and the only thing left to do was the
trigger. In a later section, the development of the Python program is described in more detail.
Trigger Implementation
With the smart card programmed and communicating with the main Python program, the only
thing left to do was to program a trigger signal to identify when the AES operation started.
This ensures that the acquired traces are aligned, resulting in a more effective CPA analysis. The
trigger also allows measuring how long an AES round takes, since it is difficult to identify the
beginning and the end of each round from the power traces alone. This measured time can then be used to
configure the PicoScope's capture period.
The trigger was implemented by setting one of the auxiliary pins of the card to serve as a trigger.
First, the desired port was identified with the help of the ATmega8515 datasheet; then the pin was set
to output by setting the 5th bit of Data Direction Register (DDR) B to 1. Setting the pin high
or low was done by setting the 5th bit of the PORTB register to 1 or 0, respectively.
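The register writes described above can be simulated with plain bit operations (the real code is AVR C; the Python below only mirrors the bit manipulation, with the registers as plain variables):

```python
# Simulation of the AVR register bits used for the trigger pin.
DDRB, PORTB = 0x00, 0x00
TRIGGER_BIT = 5

DDRB |= (1 << TRIGGER_BIT)     # configure the pin as an output
PORTB |= (1 << TRIGGER_BIT)    # trigger ON: drive the pin high
assert PORTB & (1 << TRIGGER_BIT)
PORTB &= ~(1 << TRIGGER_BIT)   # trigger OFF: drive the pin low
assert not PORTB & (1 << TRIGGER_BIT)
print(hex(DDRB))  # 0x20: only bit 5 set as output
```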
Having the trigger implemented and set to go ON during the first AES round, a successful corre-
lation analysis was possible, confirming it was working properly.
4.2 Trace Acquisition
This section presents and explains the components used for the trace acquisition and the chal-
lenges encountered during its development. The components addressed are the
PicoScope, the oscilloscope used to measure the power consumption, and the main collecting soft-
ware, used not only to configure and gather the power traces but also to send and receive data
to and from the device being measured. It is worth mentioning that the main gathering program is essentially
the same for the smart card and for the FPGA, with two differences: i) the parameters for trace
acquisition, so that it adjusts to the signal being measured; ii) the interface used to communicate with the
target device. Next, a brief description of the PicoScope is presented.
4.2.1 PicoScope
The PicoScope 6000 series [30] (Figure 4.2), in particular the 6404D, is a high-performance USB
oscilloscope that, together with its software, turns a computer into an oscilloscope and spectrum
analyser. This type of oscilloscope offers portability, performance, flexibility and programmability. It
is configured and accessed via PC, using the software that comes along with the device or by making
use of the PicoScope device driver.
The oscilloscope comes with 4 input channels, 8-bit signal resolution, 500 MHz bandwidth and up
to 5 gigasamples per second (GS/s) in real time, shared among the 4 channels. It comes with what is
called deep memory, able to store acquisitions of up to 2 gigasamples. To handle that amount of memory
and at the same time display the traces without compromising performance, the PicoScope comes with
a Hardware Acceleration Engine (HAL) 4 that guarantees both trace gathering and trace visualization
without slowdowns. The allowed voltage ranges are: ±50 mV, ±100 mV, ±200 mV, ±500 mV,
±1 V, ±2 V, ±5 V, ±10 V, ±20 V. It has an integrated wave generator output, capable of generating
Figure 4.2: PicoScope 6000 series.
repeated waveforms, whose characteristics can be set by the user. It has an auxiliary trigger input that can
also serve as a reference clock signal.
The software that comes along with the oscilloscope allows the user to observe and measure
the signal(s) being captured. Some of the options provided allow the adjustment
of the vertical and horizontal scales, where the vertical is the voltage range and the horizontal is the
time base, measured in units of time. It is also possible to adjust the sampling frequency, config-
ure triggers, zoom in/out on sections of the signal and configure the capturing channels to
Alternating Current (AC)/Direct Current (DC)/DC 50 Ω coupling.
Before starting to capture traces, the desired signal was observed in the PicoScope software
and then adjusted vertically and horizontally to be close to the axis limits. Then, after having the
optimal parameters, they were passed to the trace gathering program. To communicate with the driver,
the program uses a 32-bit Windows dynamic link library (DLL), ps6000.dll, that provides
access to the PicoScope functions.
4.2.2 Programming the traces collecting program
The main trace gathering program was based on an open-source program by Colin O'Flynn
named pico-python [31]. This software works as a wrapper for the PicoScope's API, providing
improved functions to access the PicoScope's functionality. The program is divided into 3 parts:
PicoScope configuration, target device instrumentation and storage.
In the PicoScope configuration, first the communication driver is located and loaded. Then the
parameters of the PicoScope, such as the trace length, signal amplitude and sampling frequency, are
sent to the PicoScope. Also, the input channels that are not going to be used are turned off (by
default the PicoScope activates all channels) and the trigger is configured. In the device instrumentation,
the plain texts can be loaded from a file or be generated randomly by the program. Then the trigger
is armed and the program sends the plain text to the device, using an interface that differs between the
SAKURA-G and the SAKURA-W. After the trace is gathered, the program receives the cipher text.
On the trace storage, the gathered trace is converted from a raw type to a 32-float type, and then
is stored in a Matlab file format.
It is worth mentioning that because 32-bit Python is used, there is a limit on the amount of memory that can be allocated, meaning the program cannot keep large trace sets in memory. To overcome this problem, the program gathers a specified number of traces, stores them in a file, clears the memory and continues to gather new traces.
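The gather-store-clear loop can be sketched as follows. This is a minimal sketch, not the actual program: `capture_one` stands in for the PicoScope block capture, and the function and file names are hypothetical.

```python
import numpy as np
from scipy.io import savemat

def gather_in_chunks(capture_one, n_total, chunk_size, out_prefix):
    """Collect n_total traces in chunks of at most chunk_size, writing
    each chunk to its own Matlab file so that a 32-bit process never
    holds more than one chunk in memory."""
    files, done = [], 0
    while done < n_total:
        n = min(chunk_size, n_total - done)
        # raw ADC samples -> 32-bit floats, as in the storage step above
        chunk = np.stack([capture_one().astype(np.float32) for _ in range(n)])
        fname = "%s_%03d.mat" % (out_prefix, len(files))
        savemat(fname, {"traces": chunk})
        files.append(fname)
        done += n  # chunk goes out of scope on the next iteration, freeing memory
    return files
```

Each output file holds one `traces` matrix, so the Matlab analysis scripts can load the chunks one at a time.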
4.2.3 PC to SAKURA-G communicator programming
The interface to communicate with the SAKURA-G, used by the trace gathering program, was based on the source code available on the SAKURA website [32]. This program, named SAKURA-G checker, communicates with the board to verify that it is working properly, by sending plain texts and checking the output.
After inspecting the SAKURA-G checker, it was noticed that the code used a DLL written in C# to communicate with the SAKURA-G, and that it contained a considerable amount of code to perform the sending and receiving operations. The use of a C# DLL is relevant because the gathering script was written in Python, and the implementation being used, the standard Python, does not support loading C# DLLs or programs. Also, even if a C/C++ DLL had been found, the time that would have been spent rewriting and debugging the code was not worth it when a working example was available that only needed some adaptations.
Taking these two points into account, the decision was to strip the code of its graphical interface and adapt it to receive instructions via the console, leaving the communication with the SAKURA board as it was. The result was a console program that accepts three types of commands: one to change the secret key and two to cipher data, with the difference that one returns the result and the other does not.
After having the communicator program working, the final step was to integrate it with the main gathering script. This was achieved by calling the executable with parameters such as the command, the number of the communication channel the SAKURA was on, and either the key or the plain text. The result of this call is the cipher text.
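The integration can be sketched with Python's subprocess module; the command names and the argument order below are hypothetical stand-ins for the console interface actually implemented.

```python
import subprocess

def cipher_via_communicator(exe, com_port, data_hex, want_result=True):
    """Call the console communicator and return the cipher text it prints
    on stdout. exe may be a path or an argv prefix; 'enc'/'enc_noresult'
    are placeholder command names, not the real ones."""
    argv = [exe] if isinstance(exe, str) else list(exe)
    argv += ["enc" if want_result else "enc_noresult", str(com_port), data_hex]
    out = subprocess.run(argv, capture_output=True, text=True, check=True)
    return out.stdout.strip() if want_result else None
```

The gathering script can then treat the communicator as a black box: arm the trigger, call the function with the plain text, and store the returned cipher text alongside the trace.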
4.2.4 PC to SAKURA-W communicator programming
The interface to communicate with the SAKURA-W, used by the trace gathering program, was also based on a program from the SAKURA website, named SAKURA-W checker [33]. At first, a strategy similar to the one used for the SAKURA-G was adopted, i.e. transforming the existing code into an executable that receives commands. When the program was tested it did not work and, with the help of an oscilloscope, the executable was found to reset the smart card after sending it a plain text. This behaviour caused the trigger to be ON at startup, making the gathering program record the smart card initialization instead of the AES rounds. The solution was to implement all the handling and sending of smart card APDUs in Python.
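Handling the APDUs in Python amounts to framing each command in the ISO 7816-4 format before writing it to the serial line. A minimal sketch of that framing follows; the instruction bytes in the usage comment are illustrative, not the card's actual command set.

```python
def build_apdu(cla, ins, p1, p2, data=b""):
    """Build a short ISO 7816-4 command APDU: CLA INS P1 P2 header,
    followed by Lc and the data field when data is present."""
    if len(data) > 255:
        raise ValueError("short APDU data field is limited to 255 bytes")
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + bytes(data)
    return apdu

# Hypothetical use over the virtual COM port with pyserial:
#   import serial
#   ser = serial.Serial("COM3", 9600, timeout=1)
#   ser.write(build_apdu(0x80, 0x10, 0x00, 0x00, plaintext_block))
```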
In contrast with the SAKURA-G checker, which used a C# DLL to communicate with the board, the SAKURA-W did not require any DLL, since the communication was performed via a virtual COM port. Also, the communication code was simpler than that of the SAKURA-G checker. After finishing the Python implementation of the SAKURA-W communicator, the first tests were performed, but the smart card did not seem to answer back. After confirming with the oscilloscope that the commands were being delivered correctly, by observing the bit transmission to the smart card, one of the smart cards that came with the board was tested instead and the result was the same. The behaviour of keeping the trigger pin high was observed at the smart card initialization stage, indicating that the smart card might be stuck at initialization.
Further analysis of the configuration code, the one that sets the communication parameters of the virtual COM port, revealed that the smart card received a reset (RST) signal for 200 ms and then waited 500 ms before requesting the Answer To Reset (ATR). This hinted that the Python module used to communicate with the COM port, named pyserial, was somehow stuck asserting the RST signal even after it was explicitly coded not to. After trying all the commands that could possibly turn the signal off, without any change in the smart card state, the idea arose of inspecting the COM lines to check whether the RST line was indeed still active when it should not be.
To analyse the virtual COM port, it is important to understand its communication protocol, in this case RS-232. RS-232 is used to transfer digital data between a Data Terminal Equipment (DTE) and a Data Communication Equipment (DCE) one bit at a time, i.e. it performs a serial data transmission. The next step was to create two virtual COM ports, one connected to the communicator program and the other to a terminal. Two communicator programs were tested: a modified SAKURA-W checker and the Python program used by the smart card's trace gathering program. The idea was to compare, on the terminal, the data and the state flags being passed during the communication.
Comparing the two programs revealed that the data being passed was the same, but the state of the flags was different. The flags displayed, representing the status of the pins, were Carrier Detect (CD), Clear to Send (CTS), Data Set Ready (DSR) and Ring Indicator (RI). In the case of the SAKURA-W checker, written in C#, only the CTS flag was ON after the initialization stage. With the smart card communicator, written in Python, the CTS, CD and DSR flags were ON. This clearly indicated that something was not right, since the flags should be the same in both cases.
The possible fix was to try the latest version of pyserial, 3.2.1, instead of 2.7, but that version was only available for Python 3 and the one being used was Python 2. Fortunately, the new version raised only small incompatibilities, which were easily corrected; the flag state problem was solved and the program started to communicate normally with the smart card.
4.3 Signal Analysis
After the traces are collected and stored, they need to be processed so they can provide meaningful information. The scripts developed during this work were: the correlation power analysis, to extract the secret key from the traces; the t-test analysis, to find points of interest with a low number of traces; and the signal-to-noise ratio, to assess and compare which setups provide better signal quality. The software used to develop these scripts was Matlab, since it offers programming flexibility and provides many of the functions needed by the scripts, as well as tools to present the data, such as graphs. The following presents and explains in more detail what the scripts do and how they were developed.
The correlation script correlates the hypothetical power consumption of all possible keys with the measured power traces, using two power models: the Hamming-Weight and the Hamming-Distance. In terms of attackable AES rounds, the script can attack the first and last rounds, which are the most common targets, but it can easily be extended to attack other rounds. This script was based on exercises from the rozvoj website [34].
In terms of structure, the script can be divided into four parts. The first is the data input, where the power traces are loaded into memory and the script is adjusted to the trace characteristics, such as the number of samples, the number of traces, the sample averaging and others. The second part is the construction of the hypothetical power consumption, where a matrix of all hypothetical values for one byte of the key is built from the plain text/cipher text pairs, using the Hamming-Weight or the Hamming-Distance. Next is the correlation attack, where the collected traces are compared with the hypothesis matrix, producing the correlation matrix. Finally, the correlation matrix is analysed and the data is output, such as the key and the correlation coefficients of the top 5 keys.
This script was later extended to also perform incremental CPA, where the number of traces being correlated is progressively increased, generating a plot that shows the evolution of the correlation with the number of traces. The other extension was the partial correlation, where only a certain number of traces is correlated at a time, making it possible to check whether there is some variance among the traces collected.
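The core of the correlation attack can be sketched as below. This is a simplified stand-in for the Matlab script: it uses a plain HW(input XOR key) leakage model in place of the S-box output, and the function names are hypothetical.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights of all bytes

def cpa_byte(traces, state_bytes):
    """Correlate every key-byte hypothesis against the measured traces.
    traces: (N, S) array of N traces with S samples each; state_bytes:
    (N,) known input bytes for the targeted state byte."""
    hyp = HW[state_bytes[:, None] ^ np.arange(256)[None, :]].astype(float)  # (N, 256)
    h = hyp - hyp.mean(axis=0)
    t = traces - traces.mean(axis=0)
    # Pearson correlation of every hypothesis column with every sample column
    corr = (h.T @ t) / np.sqrt((h ** 2).sum(axis=0)[:, None] *
                               (t ** 2).sum(axis=0)[None, :])
    return int(np.abs(corr).max(axis=1).argmax()), corr
```

The incremental and partial variants described above simply call this routine on growing prefixes, or on fixed-size windows, of the trace set.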
4.3.1 Signal-to-noise ratio
The SNR script was developed to efficiently measure signal quality and output a value that can be used to compare, for example, different trace gathering setups. The script was based on the one used in the book "Power Analysis Attacks: Revealing the Secrets of Smart Cards" [5], which explains how to calculate the signal-to-noise ratio of one sample point in a trace.
SNR = signal / noise        SNR_dB = 10 × log10(SNR)        (4.2)
For the script to work, the user needs to know which values are being processed at a given time. To do that, some pre-generated plain texts must be used to produce the desired Hamming-Weights at a determined AES round operation, in this case the S-box output. Then, this operation needs to be located on the trace or during the device operation, with the use of a trigger for example.
After having the plain texts and the operations located, the traces are grouped into nine sets, according to their Hamming-Weight values, from 0 to 8, and each group is processed individually. The operations performed on each trace group are the average, to retrieve the signal, and the standard deviation, to characterize the electronic noise. Finally, the signal is divided by the noise, yielding the SNR. If the user desires the value in decibels (dB), the base-10 logarithm is applied and the result multiplied by 10.
4.3.2 T-test Analysis
The t-test script was implemented to give an intuition of the points-of-interest, starting with a low trace count and, depending on the method used, maintaining an abstraction of the algorithm being attacked. For this test there are two sets: one set can have a single value, or a group of values that produce a specific intermediate value, while the other set is random. Depending on the type of set used, the t-test is called fixed vs. random, when a fixed data set is used, or semi-fixed vs. random, when the set produces specific intermediary values. The two sets are then compared using Welch's t-test, explained in the previous chapter. It essentially performs a hypothesis test assessing whether the two sets have an equal mean value or not. After obtaining the t-test results, a graph is generated showing the points of interest.
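The per-sample Welch statistic underlying this comparison can be sketched as follows (a minimal illustration, not the Matlab script itself):

```python
import numpy as np

def welch_t(set_a, set_b):
    """Welch's t statistic per sample point for two trace sets of shape
    (n_traces, n_samples). |t| above roughly 4.5 is the threshold
    commonly used to flag a point of interest."""
    ma, mb = set_a.mean(axis=0), set_b.mean(axis=0)
    va, vb = set_a.var(axis=0, ddof=1), set_b.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / len(set_a) + vb / len(set_b))
```

Unlike the pooled-variance t-test, Welch's form does not assume the two sets have equal variance, which suits the fixed vs. random comparison.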
4.3.3 Intermediary value generator for AES
Both the SNR script and the semi-fixed vs. random t-test use a set of plain texts that produce a specified intermediary value, in this case a Hamming-Weight, at a specific round operation. A script to generate these plain texts was built, based on a Matlab implementation of AES from [35].
There were two possible ways to implement this. The first was to generate random inputs, perform an AES ciphering and check the value generated at the specified location. The second was to define the intermediary value and then perform the deciphering operation from the targeted AES round, i.e. performing only a number of deciphering rounds equal to the targeted round. In the end, the second method was chosen because it gives more control over the intermediary values being generated on a specific round.
The intermediary values are generated after an S-box, and can be generated for any AES round. When a plain text is generated, it is stored in one of 9 files, each one representing a Hamming-Weight from 0 to 8.
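The 9-way bucketing of candidate byte values by Hamming-Weight can be sketched as below; the function name is hypothetical.

```python
def group_bytes_by_hamming_weight():
    """Bucket every byte value 0..255 into one of 9 lists by Hamming
    weight, mirroring the 9 output files (weights 0 through 8)."""
    groups = [[] for _ in range(9)]
    for v in range(256):
        groups[bin(v).count("1")].append(v)
    return groups
```

The group sizes follow the binomial coefficients C(8, k): 1, 8, 28, 56, 70, 56, 28, 8 and 1 values, which is why the extreme weights 0 and 8 each correspond to a single byte value (0x00 and 0xFF).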
4.4 The overall setup usage
A user wanting to use this setup has to perform a few steps before starting to collect traces. First, depending on whether the SAKURA-G or the SAKURA-W is used, the programming of the FPGAs differs: for the SAKURA-G, the two FPGAs need to be programmed using the Xilinx platform cable USB; for the SAKURA-W, only the controller is required. Note that the main circuit FPGA only needs to be programmed once, even when changing between SAKURA-G and SAKURA-W.
After that, one SMA cable needs to be connected between the board and the PicoScope, to measure the power consumption, and a probing cable needs to go to the trigger pin. On the SAKURA-G there are two options for power measurement: to use the integrated amplifier, the cable is plugged into SMA J3; to use no amplifier, or an external one, SMA J2 is used. For the SAKURA-W the measuring point is SMA J2. For the trigger on the SAKURA-G, a probe cable is connected to pin 3 to gather all AES rounds; on the SAKURA-W, the 8th pin, counting from left to right on the second row of the 40-pin header, is the one used, because it connects to the smart card's AUX1 pad.
Next, the user configures the main gathering script with parameters such as the number of traces, the signal amplitude, the recording duration and the sampling frequency. One way to obtain these parameters is to use the software that comes with the PicoScope, which shows the signal being gathered in real time and allows the parameters to be adjusted for optimal results. Then, after the main program is configured, the trace gathering starts, and when it finishes the user chooses one of the Matlab scripts to perform the signal analysis or the attack. Figure 4.3 illustrates the setup using the SAKURA-W.
Figure 4.3: The complete setup using the laboratory power supply, SAKURA-W on top of the SAKURA-G, and PicoScope.
4.5 Conclusion
This chapter illustrated how the overall trace gathering setup is organized. First, an overview of the setup's three main components was presented, as well as their functionality. Then each of those components was described in more detail, explaining some of the development challenges and decisions. The first component was the processing unit, where the two types of devices used were explained, the SAKURA-G for the FPGA and the SAKURA-W for the smart card, along with the work developed on them. The second component was the trace acquisition, where the PicoScope was introduced as well as the trace gathering program. The third component was the signal analysis, for which 3 scripts were developed, explaining their functionality and structure.
5 Experimental Results and Evaluations

Contents
5.1 Different Setup Configurations
5.2 T-Test
5.3 Conclusion
This section evaluates the proposed setup, described in the previous section. This evaluation
will show how different setup configurations affect the gathered smart card power traces. It first
analyses the traces' SNR and the respective Hamming-Weight leakage. Following this, a CPA attack
is performed, first increasing the number of traces progressively and evaluating how the results evolve
with the increase of traces, and then measuring the results using groups of 50 traces.
5.1 Different Setup Configurations
The setup has two main fixed physical components used to gather traces, the target device and the oscilloscope, which in this case were the SAKURA-W (smart card) and the PicoScope. Other components that could improve the overall signal quality were also considered, namely an external amplifier (Minicircuits ZFL-1000LN+) and a DC blocker (API 8037), a device used to filter out the DC component. To assess the impact on signal quality, three setups were considered using these components:
components:
• Setup 1 - Amplifier, DC blocker and oscilloscope in AC mode.
• Setup 2 - DC blocker and oscilloscope in AC mode.
• Setup 3 - Only using oscilloscope in AC mode.
The oscilloscope also has a DC mode, but it was not considered, since the DC component was being filtered out by the DC blocker, making DC mode and AC mode equivalent.
For the SNR measurement, the oscilloscope was set to take samples at a rate of 1.25 GS/s (Giga-samples per second) for a duration of 5×10−6 s, which led to 6250 samples, each power trace occupying 25 KB on file. For the other measurements, the oscilloscope was set to take samples at a rate of 1.25 GS/s for a duration of 3.38×10−4 s, which led to 422500 samples, each power trace occupying 1.6 MB on file.
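As a sanity check, both the sample counts and the per-trace file sizes follow directly from the rate and capture window, assuming 4 bytes per sample (matching the 32-bit float storage described in the previous chapter):

```python
def trace_size(rate_hz, duration_s, bytes_per_sample=4):
    """Samples and on-file bytes implied by a sampling rate and a
    capture window, at 4 bytes per sample for 32-bit floats."""
    samples = round(rate_hz * duration_s)
    return samples, samples * bytes_per_sample
```

1.25 GS/s over 5×10−6 s gives 6250 samples and 25000 bytes (25 KB); over 3.38×10−4 s it gives 422500 samples and 1690000 bytes, roughly the 1.6 MB stated above.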
5.1.1 Signal-to-noise Ratio Comparison
To compare the mentioned setups, the SNR is used as a quantitative measure of the signal quality and noise level of each setup. The proposed way to measure the SNR, as mentioned in previous chapters, consists of measuring the power consumption of the 9 different Hamming-Weights at a given moment, from 0 to 8, since one byte can take values from 0 to 255. As a simplification, and for better result accuracy, only one value per Hamming-Weight is considered. However, one thing to keep in mind is that each bit position might consume a different amount of power, i.e. the value 0x01 may leak differently than 0x10.
The byte chosen for this measurement was the first S-box output byte of the first AES round. This was found to have a considerable amount of leakage, mainly because the S-box value is fetched from the EEPROM. This power consumption contributes to a noticeable distinction between the Hamming-Weights, improving the SNR values, since the algorithm measures how distinguishable the different signals are from the noise. Each of the 9 Hamming-Weights was measured 200 times, to ensure the noise could be removed by averaging and characterized with the standard deviation.
Two evaluation metrics are used: the SNR average over 2 clock ticks, where the Hamming-Weights have more stable values, and the signal distinction, measured as the distance between the highest and lowest Hamming-Weights (8 and 0, respectively). This distance is measured in quantization values, since the captured signals have different amplitudes in terms of voltage, so that an equal comparison can be made in terms of signal distance. The averages of the highest and lowest Hamming-Weight traces (0x00 vs 0xFF) are calculated and their difference used to measure the distance between the two signals.
The results obtained for the 3 setups, depicted in Figures 5.1, 5.2 and 5.3, show that: setup 1 had an SNR average of 29.41 dB and a distance of 2.64 (between Hamming-Weights 0 and 8) on clock 1 (from sample 20 to 180), and an SNR average of 25.41 dB and a distance of 2.43 on clock 2 (from sample 200 to 360); setup 2 had an SNR average of 16.99 dB and a distance of 1.54 on clock 1 and an SNR average of 13.55 dB and a distance of 0.07 on clock 2; setup 3 had an SNR average of 18.95 dB and a distance of 1.21 on clock 1 and an SNR average of 13.97 dB and a distance of 0.06 on clock 2.
Figure 5.1: Setup 1: Left image shows the SNR of the power traces shown on the right.
From these results, it can be concluded that setup 1 provided both the highest SNR and the greatest distance between Hamming-Weights 0 and 8. Setups 2 and 3 had similar results, with setup 2 having the lowest SNR but a better distance than setup 3. Considering the proposed metrics, the overall setup can benefit from an external amplifier with a DC blocker when capturing smart card traces. It is worth mentioning that setups 1 and 2 might be further improved if another DC blocker were used: the DC blocker's minimum pass frequency of 10 MHz possibly causes some information loss, since the smart card works at 3.571 MHz. Also, when the SNR analysis was performed on a larger trace portion, it was noticed that some places, besides the ones analysed before, showed spikes in the SNR. These spikes might contain useful information, but at the moment it is not known what that information is.
Figure 5.2: Setup 2: Left image shows the SNR of the power traces shown on the right.
Figure 5.3: Setup 3: Left image shows the SNR of the power traces shown on the right.
5.1.2 CPA Attack
This section evaluates the same 3 setup configurations, but now using CPA, to assess if the results
obtained corroborate the analysis of the previous section.
To test the setups, 2 tests are performed. First, an incremental CPA, where the number of traces is incremented 10 at a time up to 500, and the distance between the correct key and the average of the other key guesses is measured. Secondly, as a complementary test, the quality of the gathered traces is measured by correlating 50 traces at a time and checking how much they vary from start to end. The traces used for this test cover the first AES round and use the same smart card tested in the previous section.
The results of the incremental CPA test, assessing the correlation value with 50, 100, 200 and 500 traces, show that setup 1 had an average correlation distance of 0.52, setup 2 a correlation distance of 0.40 and setup 3 a correlation distance of 0.14, as depicted in Figure 5.4.
These results suggest that the best proposed setup is indeed the one that presented the best
distinction between the correct key byte and the other guesses, namely setup 1. A more interesting
Figure 5.4: Difference between the correct key and the average of the other key hypotheses.
result was between setups 2 and 3 because, from the previous section, setup 3 had a better SNR but setup 2 a greater distance, and the incremental CPA was much better on setup 2. One possible explanation is that the distance between Hamming-Weights is more relevant than the SNR for CPA attacks. Another is that the other spikes seen in the SNR might be leaking information.
One relevant observation in Figure 5.4 is that there was a decrease in the correlation values at the start of setups 2 and 3. This may mean that the setups had a period of adaptation, where the first traces had worse quality than the following ones. This leads to the second test, with CPA performed over groups of 50 traces at a time, to assess whether the trace quality changes with more time spent capturing them. The results obtained are depicted in Figures 5.5, 5.6 and 5.7.
As can be observed, the assumption that the first traces have worse quality than the following ones holds for both setups 2 and 3. One reason for this may be an adaptation period of the oscilloscope, since with setup 2 the oscilloscope has to use its ±50 mV voltage range, which might bring more noise that is then filtered out. Setup 3 might have the same problem as setup 2, plus the filtering of the DC component by the oscilloscope's AC mode. By inspecting the first traces gathered using setup 3, one can observe a period where the DC is progressively removed. Figure 5.8 shows the first 10 traces gathered from setup 3.
Figure 5.5: Partial CPA with setup 1. Figure 5.6: Partial CPA with setup 2.
Figure 5.7: Partial CPA with setup 3.
As can be observed, the first traces measured by the oscilloscope still have some of the DC component, which is then gradually removed until the traces stabilize closer to 0 mV.
After analysing and selecting the best setup configuration, a full CPA analysis was performed and all key bytes were found correctly with 25 traces and a correlation distance, between the correct key bytes and the average of the other key guesses, of 0.15. The conclusion of this section is that the setup benefits from an external amplifier with a DC blocker, leading to a greater correlation distance between the correct key and the other key guesses.
Figure 5.8: First 10 traces gathered from setup 3, showing the progressive adaptation of the oscilloscope's AC mode.
5.1.3 Power Supply Comparison
The SAKURA-(G/W) platform can be powered using the communication USB or an external power
supply. Depending on the power source used, the acquired power traces can contain more or less
noise. An interesting test is to use different power sources and see the impact they have on the trace
quality. Intuitively, a noisy power supply should impact negatively the power traces.
For this test, 3 power sources were used: an adjustable laboratory power supply, a wall charger and the USB cable. Given the results obtained in the previous sections, setup 1 is used to gather the traces. The first test consists in measuring the distance between the maximum and minimum voltage peaks of each power supply under no load, using an oscilloscope; then each power supply is compared in an incremental CPA using 500 traces, to assess the real impact on the CPA.
In the first test, measuring the distance between minimum and maximum voltage peaks, the laboratory supply showed a distance of 0.11 V, the wall charger 0.063 V and the USB power supply 0.551 V, as depicted in Figures 5.9, 5.10 and 5.11.
This shows that the USB power supply has the highest noise of the 3 power sources and is thus expected to perform worse than the other two. Surprisingly, the wall charger had the lowest ripple noise, even compared with the more expensive adjustable laboratory power supply. One thing to keep in mind is that these results only give an idea of how much fluctuation the three power supplies have, since they were measured under no load. A more accurate test would measure the amount of noise the power supplies induce when current is drawn from them. The second test consists in the incremental CPA evaluation, as depicted in Figure 5.12.
Figure 5.9: Power trace of the laboratory power supply under no load. Figure 5.10: Power trace of the wall charger under no load.
Figure 5.11: Power trace of the USB power supply under no load.
The obtained results suggest that the best power supply was the adjustable laboratory power supply, with a distance about 0.05 above the other 2 power supplies from trace 200 onward.
Despite having the lowest ripple noise, the wall charger performed identically to the USB power supply. As mentioned before, the power supplies were tested under no load, so it could be that the noise increases once the device starts drawing power. It can also be noticed that the power supplies do not significantly affect the performed analysis.
Figure 5.12: Difference between the correct key and the average of other keys.
5.2 T-Test
In the previous section, a CPA was performed showing that it was possible to recover the secret key with a correlation distance of about 0.15 between the correct key byte guesses and the average of the other key guesses. Using only 25 traces to recover the secret is a low number; however, the smart card used has no protection mechanisms and has its S-box stored in EEPROM, further increasing the Hamming-Weight leakage.
With a more protected smart card, thousands of traces could be necessary to compromise it. Since performing the CPA computation on a large trace can be computationally demanding, a better approach is to discover the portions of the trace that could potentially leak information, using a statistical tool that requires neither many traces nor much processing time, thereby reducing the number of samples that need to be correlated. Since setup 1 was previously shown to produce the best traces, it was the one chosen to perform this test.
This section looks into Welch's t-test, a statistical tool that can be used to find these points of interest. The test was performed by creating two data sets: one that produced a Hamming-Weight value of 0 at the first S-box of the first round, and another that produced random values. Then 20 traces of the first AES round were gathered for each data set and the t-test was performed. To assess the t-test's accuracy, a CPA attack was also performed using the random data set. This makes it possible to infer whether the t-test distinguishes the points-of-interest faster than the CPA and whether they match.
For this measurement, the oscilloscope was set to take samples at a rate of 1.25 GS/s for a duration of 3.38×10−4 s, which led to 422500 samples, each power trace occupying 1.6 MB on file.
The results using 10, 15, 20 and 25 traces are shown in Figures 5.13, 5.14, 5.15 and 5.16.
Figure 5.13: Left image shows the correlation result and right image the t-test using 10 traces.
Figure 5.14: Left image shows the correlation result and right image the t-test using 15 traces.
Figure 5.15: Left image shows the correlation result and right image the t-test using 20 traces.
lxvi
Figure 5.16: Left image shows the correlation result and right image the t-test using 25 traces.
From the results obtained, the t-test seems to be a useful tool to find points-of-interest much faster than the CPA, but it is worth mentioning that further analysis needs to be done using a more protected smart card.
5.3 Conclusion
This section presented 3 analysis setup configurations, in order to assess which one could provide the best CPA results. First, the SNR and the distance between the maximum and minimum Hamming-Weights, on a trace portion with strong and noticeable leakage, were proposed as the metrics to choose the best setup. Then, to confirm these metrics, the CPA was performed using the same setups, and the choice of the best setup was confirmed. Finally, to reduce the number of samples processed by a CPA attack, the t-test can be used as a faster way to find points of interest, since the POI showed up much faster and matched the ones found by the CPA attack.
6 Conclusions and Future Work

Contents
6.1 Future Work
Smart cards are a common asset in our daily lives. Their applicability ranges from transportation, health, payments and telecommunications to identification, among other areas. These smart cards provide several tamper-proofing and unauthorized-access protection mechanisms, making them appropriate for storing sensitive information, such as secret and private keys.
Power analysis is an effective way to retrieve sensitive information from a smart card. SPA attacks are based on the visual inspection of one or a few power traces. They can provide information about which operations were performed by the smart card and, in some particular cases, even expose the secret key. In most cases, the attacker needs to know some smart card implementation details to be successful. DPA, depending on the trace quality and the protection mechanisms of the smart card under attack, might require a considerable amount of traces. This type of attack has the advantage of being easy to automate, since it is based on finding correlations over specific points of the power traces. It also has the advantage of not requiring detailed knowledge of the smart card's implementation to be successful.
The proposed and implemented solution was a setup that allows the user to perform side-channel
analysis on both smart cards and FPGAs. The main hardware used was the SAKURA-G/W board and a
PicoScope PC oscilloscope. To make the setup operational, four software components were also
developed:
• The software used to program the smart card, containing the AES algorithm, the communication
routines, and a customizable trigger.
• The trace-gathering program, responsible for configuring the PicoScope, communicating with
the target device, and gathering and storing the traces.
• The SAKURA-G and SAKURA-W communication program, used by the trace-gathering program
to deliver data to and receive data from the devices.
• The scripts used for the trace analysis (CPA, SNR, t-test), responsible for analysing the
traces and presenting the data in a form understandable to the user, such as graphs and data
comparisons.
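The SNR metric computed by those analysis scripts can be illustrated with a short sketch. This follows the usual definition from the power-analysis literature (variance of the group means over the mean of the group variances, grouping traces by a leakage label such as the Hamming weight of the processed value); it is an assumed illustration in Python/NumPy, not the thesis's script.

```python
import numpy as np

def snr(traces, labels):
    """Signal-to-noise ratio per sample point.

    traces: (n_traces x n_samples) array of power measurements
    labels: (n_traces,) leakage label per trace, e.g. the Hamming
            weight of the intermediate value being processed
    SNR = Var(group means) / Mean(group variances): the numerator is
    the exploitable signal, the denominator the noise.
    """
    groups = [traces[labels == v] for v in np.unique(labels)]
    means = np.array([g.mean(axis=0) for g in groups])
    vars_ = np.array([g.var(axis=0) for g in groups])
    return means.var(axis=0) / vars_.mean(axis=0)
```

Sample points with a high SNR are exactly those where a CPA attack correlates well, which is why the SNR served as a proxy metric for comparing the setup configurations.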
The implemented solution was then analysed to assess which configuration gives the best CPA
results, since the quality of the traces influences the attack's success. The experimental results
suggest that the setup with external amplification and an adjustable laboratory power supply
produced the best results. This conclusion was drawn from the metrics defined in the evaluation,
namely the SNR and the correlation value from the CPA. Using the best setup configuration, it was
possible to recover the secret key from an unprotected smart card using only 25 power traces.
Finally, a t-test analysis was performed and compared with a CPA analysis; as expected, the t-test
revealed information leaks faster than the CPA, but it also seemed to produce some false-positive
points of interest.
6.1 Future Work
The work developed in this thesis used as its core the CPA algorithm applied to an unprotected
cryptographic device, in this case a smart card. If cryptographic devices that come with protection
mechanisms are used, these methods might not be enough to compromise the device.
As future work, it would be interesting to see how the CPA attack would perform against the
defence mechanisms mentioned in Chapter 3. Also, more advanced techniques, such as template and
collision attacks, could be tested to assess their effectiveness against protected and unprotected
devices.
The methodology used to select the best setup relied on the smart card; another interesting
test would be to do the same for the FPGA and see if the results match. Other types of setup could
also be tested to see if the overall CPA results improve. For example, changing the DC blocker to
one with a lower frequency range might improve the results.
Finally, it would be interesting to use smart cards available on the market and assess their
security against this type of attack. To do this, the communication protocol used by each card
would have to be discovered so that the trace-collecting program could communicate with it.