

FACULTY OF ENGINEERING – INTEGRATED CIRCUITS LAB

ELECTRONICS AND COMMUNICATIONS DEPARTMENT

B. SC. GRADUATION PROJECT 2000 - 2001

VLSI Design and Implementation of ASICs for the Security Core of BLUETOOTH™ Wireless Communication System Standard

BY AHMAD ABDELHAMEED

& SAMEH ASSEM IBRAHIM

SUPERVISOR:

PROF. HANI F. RAGAAI


In the name of God, the Most Gracious, the Most Merciful


Abstract

Bluetooth is a promising new technology, and expectations for it are huge. Analysts’ market projections place Bluetooth-enabled products in the “next big thing” category, with sales expected to top a billion units by 2005. These facts motivated our graduation project for the year 2000/2001 and the report now in your hands.

The report covers our whole year of work on the project and is written to follow the same road we took. It begins with a short journey into the world of Bluetooth, in which we introduce the technology briefly. After that you will spend some time with us looking in detail at the security measures taken in Bluetooth, since the goal of our project is to implement these measures in hardware. This task could never have been completed without the tools at our disposal, so we introduce these tools as well.

This journey is covered in the first three chapters. The three chapters that follow cover our proposed core, starting with the implementation of authentication and key generation, followed by the implementation of encryption and the random number generator, and finally how we obtained the final core, which occupies 12.04 mm² and runs at a maximum clock frequency of 54 MHz in the AMS 0.6 µm technology, and how we tested some of its aspects on a Xilinx Spartan FPGA.

All of the Bluetooth security procedures were implemented in the core. So get ready and enjoy your journey.


Acknowledgement

We would like to thank our dear Professor Hani F. Ragaai for his support and supervision during the project period.

We would like to thank the guest lecturers as well, for they taught us a lot about Bluetooth, VHDL and finite state machines.

Thanks go to Eng. Aiman Hassan and Eng. Tamer Salah.

Thanks to Eng. Mohammed Bahnas from Mentor Graphics who helped us with their tools.

Thanks to Miss Noha Fathi of Mentor Graphics too. She helped us obtain the FPGA Advantage CDs and resolve the license issue.

Great thanks go to our families and friends for their support and encouragement.


About Us

AHMAD ABDELHAMEED SARHAN
4th year, Electronics and Communications Engineering dept.
Faculty of Engineering, Ain Shams University
[email protected]

SAMEH ASSEM IBRAHIM
4th year, Electronics and Communications Engineering dept.
Faculty of Engineering, Ain Shams University
[email protected]

Graduation Project 2000/2001 Professor Hani F. Ragaai

Bluetooth Security


TABLE OF CONTENTS

CHAPTER 1
INTRODUCTION TO BLUETOOTH

1.1. THE APPEARANCE OF BLUETOOTH TECHNOLOGY
1.2. BLUETOOTH VISION
1.3. BLUETOOTH GOALS
1.4. BLUETOOTH ARCHITECTURE
    1.4.1. NETWORK TOPOLOGIES
    1.4.2. PROTOCOL ARCHITECTURE
    1.4.3. RF
    1.4.4. BASEBAND
    1.4.5. LINK MANAGEMENT
1.5. BLUETOOTH ADVANTAGES
1.6. BLUETOOTH DISADVANTAGES
1.7. BLUETOOTH USAGE MODELS

CHAPTER 2
BLUETOOTH SECURITY

2.1. BLUETOOTH SECURITY ENTITIES
2.2. KEY HANDLING
    2.2.1. KEY TYPES
    2.2.2. INITIALIZATION
    2.2.3. ALGORITHMS USED IN KEY GENERATION
    2.2.4. GENERATION OF THE INITIALIZATION KEY
    2.2.5. GENERATION OF A UNIT KEY
    2.2.6. GENERATION OF A COMBINATION KEY
    2.2.7. GENERATING A MASTER KEY
    2.2.8. POINT-TO-MULTIPOINT CONFIGURATION
    2.2.9. MODIFYING THE LINK KEYS
    2.2.10. GENERATION OF THE ENCRYPTION KEY
2.3. AUTHENTICATION
    2.3.1. REPEATED ATTEMPTS
    2.3.2. THE AUTHENTICATION ALGORITHM E1
2.4. SAFER+ ALGORITHM USED IN THE FUNCTIONS AR AND AR’
    2.4.1. HISTORY OF THE ALGORITHM
    2.4.2. SPECIFICATION OF THE SAFER+ ALGORITHM
    2.4.3. THE USE OF SAFER+ ALGORITHM IN BLUETOOTH SECURITY
2.5. ENCRYPTION
    2.5.1. INTRODUCTION TO CRYPTOGRAPHY
    2.5.2. ENCRYPTION IN BLUETOOTH
        2.5.2.1. ENCRYPTION KEY SIZE NEGOTIATION
        2.5.2.2. ENCRYPTION MODES
        2.5.2.3. ENCRYPTION CONCEPT
        2.5.2.4. ENCRYPTION ALGORITHM
        2.5.2.5. LFSR INITIALIZATION
        2.5.2.6. KEY STREAM SEQUENCE
    2.5.3. ENCRYPTION IN BLUETOOTH FROM THE LMP POINT OF VIEW
2.6. BLUETOOTH RANDOM NUMBER GENERATOR

CHAPTER 3
INTRODUCTION TO VHDL AND DIGITAL DESIGN

3.1. WHAT IS VHDL?
3.2. DIGITAL SYSTEM DESIGN
3.3. THE MARKETPLACE
3.4. THE ROLE OF HARDWARE DESCRIPTION LANGUAGES
3.5. DESIGN FLOW USING MENTOR GRAPHICS TOOLS

CHAPTER 4
IMPLEMENTATION OF AUTHENTICATION AND KEY GENERATION

4.1. THE DESIGN OF AUTHENTICATION AND KEY GENERATION ALGORITHMS
    4.1.1. SAFER+ ENCRYPTION ROUND
    4.1.2. SAFER+ KEY SCHEDULE
    4.1.3. AR/AR’ BLOCK
    4.1.4. E1, E21, E22 AND E3 ALGORITHMS BLOCK
4.2. THE SIMULATION OF THE DESIGN
    4.2.1. THE SAFER+ EXAMPLE
    4.2.2. THE SAMPLE DATA WITH THE MATLAB PROGRAM

CHAPTER 5
IMPLEMENTATION OF ENCRYPTION ENGINE AND PRNG

5.1. KEY REDUCTION
5.2. THE LINEAR FEEDBACK SHIFT REGISTERS (LFSRS)
5.3. THE INPUT SHIFT REGISTERS
5.4. THE SUMMATION COMBINER BLOCKS
5.5. STORING THE OUTPUTS FOR PARALLEL LOADING
5.6. THE FINITE STATE MACHINE
5.7. ENCRYPTION ENGINE SIMULATION
5.8. THE RANDOM NUMBERS GENERATOR
    5.8.1. TRUE RANDOM NUMBER GENERATORS
    5.8.2. LFSR TO BE USED IN SINGLE BIT PRNG
        5.8.2.1. LFSR TERMINOLOGY
        5.8.2.2. LFSR IMPLEMENTATION
        5.8.2.3. MULTIPLE-BIT RNG USING LFSR
        5.8.2.4. MAXIMAL LENGTH SEQUENCES
    5.8.3. IMPLEMENTATION

CHAPTER 6
THE BLUETOOTH SECURITY CORE

6.1. THE DESIGN OF THE BLUETOOTH SECURITY CORE
6.2. THE SIMULATION OF THE CORE
6.3. THE ASIC FLOW
6.4. BACK ANNOTATION AND FPGA TESTS

REFERENCES

APPENDIX A: DESIGN TIME TABLE


LIST OF FIGURES

Figure 1.1: The Bluetooth figure mark
Figure 1.2: Piconet Formation in Bluetooth and the Concept of Master/Slaves
Figure 1.3: Scatternet formation in Bluetooth
Figure 1.4: Bluetooth Protocol Stack (Indicated by shaded blue regions)
Figure 1.5: The Bluetooth protocol stack
Figure 1.6: Standard packet format
Figure 1.7: A suggested Bluetooth chip
Figure 1.8: The 3 in 1 phone.
Figure 1.9: The Internet bridge
Figure 1.10: The interactive conference
Figure 1.11: The ultimate headset
Figure 1.12: The automatic synchronizer
Figure 1.13: Bluetooth enhanced meeting
Figure 1.14: Bluetooth enhanced factory
Figure 1.15: Bluetooth enhanced lights
Figure 1.16: PDA between home and work
Figure 1.17: Electronic tour guide
Figure 1.18: Automatic checking in
Figure 2.1: Different issues covered in chapter 2
Figure 2.2: Different types of keys.
Figure 2.3: Key generation algorithm E2
Figure 2.4: Generation of the encryption key
Figure 2.5: Generation of unit key.
Figure 2.6: Generating a combination key.
Figure 2.7: Master link key distribution and computation of the corresponding encryption key.
Figure 2.8: Challenge-response for the Bluetooth.
Figure 2.9: Challenge-response for symmetric key systems.
Figure 2.10: Flow of data for the computation of E1.
Figure 2.11: Encrypting and Decrypting structures of SAFER+ algorithm
Figure 2.12: The structure of SAFER+ algorithm encryption round i
Figure 2.13: Diffusion and Confusion in SAFER+ encryption
Figure 2.14: The structure for SAFER+ algorithm decryption round i
Figure 2.15: SAFER+ key schedule for 128 bit key
Figure 2.16: The round of the functions Ar and Ar’
Figure 2.17a: Conventional (Secret Key) encryption
Figure 2.17b: Public key encryption
Figure 2.18: A taxonomy of cryptographic primitives
Figure 2.19: E3 generates the encryption key
Figure 2.20: Stream ciphering with E0
Figure 2.21: The Bluetooth Device Address
Figure 2.22: Bluetooth clock
Figure 2.23: Encryption comes in between CRC & FEC
Figure 2.24: Functional description of the encryption procedure
Figure 2.25: Concept of the encryption engine.
Figure 2.26: Between each start of a packet (TX or RX), the LFSRs are re-initialized.
Figure 2.27: Arranging the input to the LFSRs.
Figure 2.28: Distribution of the 128 last generated output symbols within the LFSRs.
Figure 3.1: Typical activity flow in top-down digital system design
Figure 3.2: An Example of a synthesis design flow for FPGAs
Figure 3.3: Design views and corresponding levels of abstraction
Figure 3.4: The complete HDL design flow using Mentor Graphics tools
Figure 4.1: The combinational round as entered in Renoir
Figure 4.2: The round Controller as entered in Renoir
Figure 4.3: The sequential round as entered in Renoir
Figure 4.4: The Combinational SAFER+ Key Schedule as entered in Renoir.
Figure 4.5: The sequential SAFER+ Key Schedule as entered in Renoir.
Figure 4.6: The Controller of Ar/Ar’ block as entered in Renoir.
Figure 4.7: Ar/Ar’ block as entered in Renoir
Figure 4.8: The controller of all algorithms block as entered in Renoir.
Figure 4.9: The block diagram that can be used for E1, E21, E22 and E3 algorithms as entered in Renoir
Figure 4.10: An extract from the SAFER+ nomination paper
Figure 4.11a: ModelSim List output for key schedule simulation
Figure 4.11b: ModelSim Wave output for key schedule simulation
Figure 4.12: ModelSim Wave output for SAFER+ Simulation
Figure 4.13: The GUI MATLAB program written for producing the test vectors and comparing the ModelSim output to MATLAB output.
Figure 5.1: LFSR single bit taken from Renoir version 2000.3
Figure 5.2: The feedback switch implementation
Figure 5.3: The four LFSRs as taken from Renoir
Figure 5.4: Structural adder
Figure 5.5: The second adder with divide by 2
Figure 5.6: The FSM of E0
Figure 5.7: E0 Block Diagram as entered in Renoir
Figure 5.8: Selected waveforms and outputs of the first set of sample data
Figure 5.9: Selected waveforms and outputs of the fourth set of sample data
Figure 5.10: A True 1-bit Random Number Generator
Figure 5.11: Galois Implementation
Figure 5.12: Fibonacci Implementation
Figure 6.1: The Bluetooth Security core as entered in Renoir.
Figure 6.2: Simulation results of initialization key generation
Figure 6.3: Simulation of the authentication process
Figure 6.4: Simulation of encryption key generation
Figure 6.5: The final obtained core.
Figure 6.6: A zoomed section in the core
Figure 6.7: Pads and cells added before routing
Figure 6.8: TimeCloser Physical Design Flow
Figure 6.9: Flow unique to Alliance and Leonardo Spectrum
Figure 6.10: The designer life cycle
Figure 6.11: The Floor Planning and the obtained figure from FPGA Editor tool of Xilinx Alliance
Figure 6.12: Arrangement of components on the XS40 Board.
Figure 6.13: The setup and the obtained waveforms.
Figure 6.14: The block diagram of all the controllers as entered in Renoir


LIST OF TABLES

Table 1.1: Summary of announced Bluetooth products
Table 1.2: A comparison between Bluetooth devices (to the left) and cables (to the right)
Table 1.3: Different packet types
Table 1.4: Some Key Bluetooth Specifications.
Table 2.1: Entities used in authentication and encryption procedures
Table 2.2: Different algorithms used in key generation
Table 2.3: Possible traffic modes for a slave using a semi-permanent link key.
Table 2.4: Possible encryption modes for a slave in possession of a master key.
Table 2.5: The four primitive feedback polynomials
Table 2.6: The mappings of T1 and T2
Table 2.7: Polynomials used when creating Kc. All polynomials are in hexadecimal notation. The LSB is in the rightmost position
Table 5.1: Input shift register
Table 5.2: Taps for Maximum-Length LFSR
Table 6.1: The pins of the core and their functions.


Chapter 1

Introduction to Bluetooth

Goodbye cables, Communications made easy

Bluetooth is a technology that replaces cables when you want to connect two devices to each other, or one device to a network. The devices communicate with each other by microwave radio. Because Bluetooth works with microwaves, it does not need a free line of sight as IR connections do. The signal range is between 10 cm and 10 m, but with an amplifier in the transmitter it can exceed 100 m. Getting rid of all the cables is one benefit, but the main one is that you can connect to another device or network immediately, without being bound to fixed infrastructure. The technology is an open specification for wireless communication of data and voice. It is based on a low-cost short-range radio link, built into a 9 x 9 mm microchip, facilitating protected ad hoc connections for stationary and mobile communication environments.

This chapter is intended as a general overview of the Bluetooth technology. Section 1.1 is about the appearance of Bluetooth. Section 1.2 is about the Bluetooth vision. Section 1.3 presents the Bluetooth goals. The architecture of the Bluetooth technology is discussed in section 1.4. Section 1.5 covers the advantages of Bluetooth and section 1.6 its disadvantages. Finally, section 1.7 describes some usage models of the Bluetooth technology.

1.1. THE APPEARANCE OF BLUETOOTH TECHNOLOGY:

A study was initiated at Ericsson Mobile Communications in 1994 to find a low-power and low-cost radio interface between mobile phones and their accessories. The requirements regarding price, capacity and size were set so that the new technique would have the potential to outdo all cable solutions between mobile devices. Initially a suitable radio interface with a corresponding frequency range had to be specified. A number of criteria for the concept were defined regarding size, capacity and global uniformity. The radio unit should be so small and consume so little power that it could be fitted into portable devices with their limitations. The concept had to handle both speech and data, and finally the technique had to work all around the world.

The study soon showed that a short-range radio link solution was feasible. Once designers at Ericsson had started to work on a transceiver chip, Ericsson soon realised that it needed partners to develop the technique. The associates strove not only to improve the technical solutions but also to gain solid and broad market support in the business areas of PC hardware, portable computers and mobile phones. Fear of a market situation with a multitude of non-standard cable solutions, where one cable is designed specifically for one pair of devices, was one of the motives that made competing companies join the project.

Ericsson Mobile Communications, Intel, IBM, Toshiba and Nokia Mobile Phones formed a Special Interest Group (SIG) in February 1998. This group represented the diverse market support that was needed to generate good support for the new technology. In May of the same year, the Bluetooth consortium announced itself globally. The intention of the Bluetooth SIG is to form a de facto standard for the air interface and the software that controls it. The purpose is to achieve interoperability between different devices from different producers of portable computers, mobile phones and other devices.

The name Bluetooth comes from a Danish Viking and King, Harald Blåtand (Bluetooth in English), who lived in the latter part of the 10th century. Harald Blåtand united and controlled Denmark and Norway the same way Bluetooth Technology would unite the whole world.

At the start, the SIG consisted of the five companies mentioned above. On the 1st of December 1999 the SIG core group was extended when 3Com, Lucent Technologies, Microsoft and Motorola also joined. Microsoft’s participation in the SIG is regarded as a huge stamp of approval for the technology. Although Bluetooth was initially thought of as a telecommunications initiative, it is now considered to be accepted by the PC community, since more and more software companies have joined the SIG, Microsoft among them. Today more than 1300 companies have joined the SIG to work for an open standard for the Bluetooth concept. By signing a zero-cost agreement, companies can join the SIG and qualify for a royalty-free license to build products based on the Bluetooth technology. To avoid different interpretations of the Bluetooth standard regarding how a specific type of application should be mapped to Bluetooth, the SIG has defined a number of user models and protocol profiles. The SIG also runs a Qualification Process. This process defines criteria for Bluetooth product qualification, ensuring that products that pass it meet the Bluetooth specification.

On Wednesday the 17th of May, 2000, the Bluetooth promoter companies announced the availability of a new figure mark (shown in figure 1.1) to be used along with Bluetooth applications. The new figure mark is based on the Bluetooth history, as it is made up of the two runic characters "H" and "B", short for "Harald Bluetooth".

Version 1.0 of the Bluetooth specification was released on the 22nd of July, 1999, and a revision, version 1.0b, was released on the 1st of December, 1999. The latest version, 1.1, was released on the 22nd of February, 2001. Table 1.1 shows some of the Bluetooth products announced so far.

Silicon/modules | Protocol Software | Cell Phones | Headsets | PC cards | Notebooks | LAN access | PDA | Print Server

Atmel Classwave Alcatel Ericsson Ericsson IBM Axis Palm Axis
Brightcom Enea Ericsson Motorola IBM Panasonic Intel Handspring HP
Broadcom Extended Motorola Motorola NEC Red-M Socket
CSR IVT Nokia Psion Toshiba TDK Widcomm
Conexant IBM Panasonic TDK Toshiba
Ericsson Intel 3COM Socket
Infineon Microsoft Toshiba Widcomm
Lucent Telelogic Widcomm
Mitel TTP Communications Xircom
Motorola Widcomm
National
OKI
Philips
Qualcomm
SiliconWave
STMicroelectronics
TI

Table 1.1: Summary of announced Bluetooth products

Figure 1.1: The Bluetooth figure mark


1.2. BLUETOOTH VISION:

A few years ago it was recognized that the vision of a truly low-cost, low-power radio-based cable replacement was feasible. Such a ubiquitous link would provide the basis for portable devices to communicate together in an ad hoc fashion by creating personal area networks which have similar advantages to their office environment counterpart - the local area network (LAN).

In recent years, wireless connectivity has been an active area of research as we have witnessed a large number of government and industry initiatives, research efforts and standard activities that have aimed at enabling wireless and mobile networking technologies. As a result, today we have a diverse set of wireless access technologies from satellite networks, to wide area cellular systems, and from wireless local loop and PCS to wireless LANs. However, most of these solutions target narrow and specific application scenarios. With all such efforts spent on wireless link technologies, we still lack a universal framework that offers a way to access information based on a diverse set of devices (e.g., PDAs, mobile PCs, Phones, Pagers, etc.) in a seamless, user-friendly and efficient manner. Although we are encouraged and excited about the explosion of activities in this area, we would like to see a wireless solution that brings all these technologies in different sectors together and at the same time provides a universal and ubiquitous connectivity solution between computing, communication and supporting devices.

We believe that Bluetooth can revolutionize wireless connectivity for personal and business mobile devices, enabling seamless voice and data communication via short-range radio links and allowing users to connect a wide range of devices easily and quickly, without the need for cables. This expands the communication capabilities of mobile computers, mobile phones and other mobile devices, both inside and outside the office. Considering the wide range of computing and communication devices, such as PDAs, notebook computers, pagers and cellular phones with different capabilities, we envisage Bluetooth providing a solution for access to information and personal communication by enabling collaboration between devices in proximity of each other, where every device provides its inherent function based on its user interface, form factor, cost and power constraints. Furthermore, the Bluetooth technology enables a vast number of new usage models for portable devices. For notebook computer manufacturers, the development of a short-range radio frequency (RF) solution enables the notebook computer to connect to different varieties of cellular phones and other notebook computers. For cellular handset manufacturers, the RF solution removes many of the wires required for audio and data exchange. Wireless hands-free kits operate even while the cellular phone is stored in a purse.

A key characteristic of Bluetooth that differentiates it from other wireless technologies is that it enables combined usability models based on functions provided by different devices. Let us consider a connection between a PDA (computing device) and a cellular phone (communicating device) using Bluetooth, and a second connection between the cellular phone and a cellular base station providing connectivity for both data and voice communication. In this model, the PDA maintains its function as a computing device and the phone maintains its role as a communication device. Each of these devices provides a specific function efficiently, yet their functions are separate and each can be used independently of the other. However, when these devices are near each other they provide a useful combined function. We believe that this functional connectivity model, based on a combination of wireless access technologies each matched to different device capabilities and requirements, is a powerful paradigm that will enable ubiquitous and pervasive wireless communication. Many of these wireless link technologies are available today; however, there is a need to provide a wireless connectivity, networking and application framework to realize the total solution. This is exactly the charter of the Bluetooth SIG. In addition to combining the resources of a personal network, the RF link could also connect the personal network to the wired infrastructure. A data access point in an office, conference room or airport kiosk would act as an information gateway for a notebook computer or cellular handset.

1.3. BLUETOOTH GOALS:

Cable replacement: See table 1.2.

Handle both voice and data: The air protocol must support good-quality real-time voice, where “good” is considered to be wired phone line quality. Voice quality is important both to end users, who are accustomed to it, and to speech recognition engines, whose accuracy depends on it.

Able to establish ad hoc connections: The dynamic nature of mobility makes it impossible to make any assumptions about the operating environment. Bluetooth units must be able to detect other compatible units and establish connections to them. A single unit must be able to establish multiple connections, in addition to accepting new connections while connected. Ignoring a new connection request while connected is confusing to the user and deemed unacceptable, especially if we want to support unconscious computing while retaining the ability to perform interactive operations.

Withstand interference from other sources in an unlicensed band: The Bluetooth radio operates in the unlicensed 2.4 GHz band, where many other RF radiators are expected to exist. The fact that microwave ovens operate at this frequency is one reason why this band is unlicensed in most countries. The challenge is to avoid significant degradation in performance when other RF radiators, including other personal area networks in nearby use, are in operation.

Worldwide use: Not only are “standard” cables equipped with a variety of connectors, but different standards also exist in different geographical locations throughout the world. Experienced mobile travelers are accustomed to carrying around a number of different power, phone and network connectors. The challenge here is largely regulatory in nature, with many governments having their own set of restrictions on RF technology. And while the 2.4 GHz band is unlicensed throughout most parts of the world, it varies in range and offset in a number of different countries.

Similar amount of protection compared to a cable: In addition to the radio’s short-range nature and spread spectrum techniques, Bluetooth link protocols also provide authentication and privacy mechanisms. Users certainly don’t want others listening in on their conversations, snooping on their data transmissions, or using their cellular phones for Internet access.

Small size to accommodate integration into a variety of devices: The Bluetooth radio module must be small enough to permit integration into portable devices. Wearable devices in particular, such as mobile phones, headsets and smart badges, have little space to spare for a radio module.

Negligible power consumption compared with the device in which the radio is used: Many Bluetooth devices will be battery powered. This requirement implies that the integration of the Bluetooth radio should not significantly compromise the battery lifetime of the device.

Encourage ubiquitous deployment of the technology: To achieve this goal, the SIG is designing an open specification defining the radio, physical, link, and higher-level protocols and services.


Table 1.2: A comparison between Bluetooth devices (to the left) and cables (to the right)

1.4. BLUETOOTH ARCHITECTURE:

1.4.1. NETWORK TOPOLOGIES:

Bluetooth devices form so-called piconets, with up to 8 devices and 3 voice channels per piconet. The communication range between the devices is 10 m. As for power, the quoted current consumption is 30 µA in sleep mode, 300 µA in standby mode and 800 µA in transmit mode. Both synchronous and asynchronous services are supported, as well as both circuit and packet switching.

A piconet contains a master and several slave devices. All channel access is controlled by the master, so slaves can talk only to the master and not directly to other slaves. Communication within a piconet is therefore point-to-point between the master and a slave (see figure 1.2).

Figure 1.2: Piconet Formation in Bluetooth and the Concept of Master/Slaves.

A piconet can have up to 8 active members; any further devices become parked members. Parked slaves remain synchronized to the master but are not active on the channel. It is important to note that each piconet has its own hopping channel. A piconet can be formed in two ways: by the Inquiry Scan Protocol or by the Page Scan Protocol. With the Inquiry Scan Protocol a device learns the device addresses of the other devices, while with the Page Scan Protocol a device establishes links with nodes in proximity.

A scatternet is composed of multiple piconets, where a master in one piconet can be a slave in another piconet. To switch between piconets, time multiplexing is used (see figure 1.3).
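The membership rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the limit of 8 active members is taken to include the master, so at most 7 slaves are active and any further slaves are parked.

```python
from dataclasses import dataclass, field

# Assumption: the 8 active members quoted in the text include the master.
MAX_ACTIVE = 8


@dataclass
class Piconet:
    """Hypothetical model of one piconet: a master plus active and parked slaves."""
    master: str
    active_slaves: list = field(default_factory=list)
    parked_slaves: list = field(default_factory=list)

    def join(self, device: str) -> str:
        """Add a slave; park it once the active set is full."""
        if 1 + len(self.active_slaves) < MAX_ACTIVE:
            self.active_slaves.append(device)
            return "active"
        self.parked_slaves.append(device)
        return "parked"

    def can_talk(self, sender: str, receiver: str) -> bool:
        """Traffic is allowed only between the master and one active slave."""
        pair = {sender, receiver}
        return self.master in pair and len(pair & set(self.active_slaves)) == 1


def bridges(piconets):
    """Devices that are master of one piconet and slave in another (scatternet links)."""
    masters = {p.master for p in piconets}
    slaves = {d for p in piconets for d in p.active_slaves + p.parked_slaves}
    return masters & slaves


# Usage: nine devices try to join a piconet led by "phone".
a = Piconet(master="phone")
states = [a.join(f"dev{i}") for i in range(9)]
# dev0 is also the master of a second piconet, which makes the pair a scatternet.
b = Piconet(master="dev0")
b.join("laptop")
```

Here `bridges([a, b])` identifies `dev0` as the device that has to time-multiplex between the two piconets.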

Figure 1.3: Scatternet formation in Bluetooth.

1.4.2. PROTOCOL ARCHITECTURE:

Figure 1.4 shows the protocol architecture for Bluetooth.

Figure 1.4: Bluetooth Protocol Stack (Indicated by shaded blue regions).

To help in understanding the figure, here is the list of abbreviations used in it:

WAE - Wireless Application Environment
vCard, vCalendar, vMessage and vNotes - formats for the electronic business card, electronic calendar and scheduling, messaging and mail
WAP - Wireless Application Protocol
OBEX - Object Exchange Session Protocol
SDP - Service Discovery Protocol
RFCOMM - Emulation of serial ports
L2CAP - Logical Link Control and Adaptation Protocol
LMP - Link Manager Protocol

A comparison of the Bluetooth architecture with the OSI model is shown in figure 1.5.

Figure 1.5: The Bluetooth protocol stack

1.4.3. RF:

The Bluetooth air interface is based on a nominal antenna power of 0 dBm. The air interface complies with the FCC rules for the ISM band at power levels up to 0 dBm. Spectrum spreading has been added to facilitate optional operation at power levels up to 100 mW worldwide. Spectrum spreading is accomplished by frequency hopping in 79 hops displaced by 1 MHz, starting at 2.402 GHz and stopping at 2.480 GHz. Due to local regulations the bandwidth is reduced in Japan, France and Spain. This is handled by an internal software switch. The maximum frequency hopping rate is 1600 hops/s. The nominal link range is 10 centimeters to 10 meters, but can be extended to more than 100 meters by increasing the transmit power.
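The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function and constant names are hypothetical, and it assumes the 79-carrier plan (not the reduced band used in Japan, France and Spain).

```python
# Illustrative helpers for the RF figures quoted above.
# All names here are hypothetical; they are not taken from the specification.

NUM_CHANNELS = 79        # hop carriers, displaced by 1 MHz
BASE_FREQ_MHZ = 2402     # first carrier at 2.402 GHz
HOP_RATE_HZ = 1600       # maximum frequency hopping rate


def channel_frequency_mhz(k: int) -> int:
    """Return the carrier frequency in MHz of hop channel k (0..78)."""
    if not 0 <= k < NUM_CHANNELS:
        raise ValueError(f"channel index must be in 0..{NUM_CHANNELS - 1}")
    return BASE_FREQ_MHZ + k


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def hop_dwell_time_us() -> float:
    """Time spent on one carrier, in microseconds, at the maximum hop rate."""
    return 1e6 / HOP_RATE_HZ


if __name__ == "__main__":
    print(channel_frequency_mhz(0), channel_frequency_mhz(78))
    print(dbm_to_mw(0), dbm_to_mw(20))
    print(hop_dwell_time_us())
```

Under these assumptions the last carrier falls at 2480 MHz, the nominal 0 dBm corresponds to 1 mW, the optional 100 mW level corresponds to 20 dBm, and each hop lasts 625 microseconds.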

In the ISM band, the signal bandwidth of FH systems is limited to 1 MHz. For robustness, a binary modulation scheme was chosen. With the above-mentioned bandwidth restriction, the data rates are limited to about 1 Mb/s. For FH systems and support for bursty data traffic, a noncoherent detection scheme is most appropriate. Bluetooth uses Gaussian-shaped frequency shift keying (FSK) modulation with a nominal modulation index of k = 0.3. Logical ones are sent as positive frequency deviations, logical zeroes as negative frequency deviations. Demodulation can simply be accomplished by a limiting FM discriminator. This modulation scheme allows the implementation of low-cost radio units.

Some specifications are made for low cost, single chip implementation:

Noise floor margin for substrate noise and low current LNA
Linearity set by near-far problem
In-band image allows low-cost low IF
VCO phase noise enables integrated VCO
TX-RX turn around time enables single synthesizer
2.4 GHz ISM band chosen for global use and process capabilities

1.4.4. BASEBAND:

The baseband describes the specifications of the digital signal processing part of the hardware, namely the Bluetooth link controller, which carries out the baseband protocols and other low-level link routines.

Establishing network connections

Before any connections in a piconet are created, all devices are in STANDBY mode. In this mode, an unconnected unit periodically "listens" for messages every 1.28 seconds. Each time a device wakes up, it listens on a set of 32 hop frequencies defined for that unit. The number of hop frequencies varies in different geographic regions; 32 is the number for most countries (except Japan, Spain and France).

The connection procedure is initiated by any of the devices, which then becomes master. A connection is made by a PAGE message if the address is already known, or by an INQUIRY message followed by a subsequent PAGE message if the address is unknown. In the initial PAGE state, the master unit sends a train of 16 identical page messages on 16 different hop frequencies defined for the device to be paged (slave unit). If there is no response, the master transmits a train on the remaining 16 hop frequencies in the wake-up sequence. The maximum delay before the master reaches the slave is twice the wakeup period (2.56 seconds), while the average delay is half the wakeup period (0.64 seconds).

The INQUIRY message is typically used for finding Bluetooth devices, including public printers, fax machines and similar devices with an unknown address. The INQUIRY message is very similar to the page message, but may require one additional train period to collect all the responses.

A power saving mode can be used for connected units in a piconet if no data needs to be transmitted. The master unit can put slave units into HOLD mode, where only an internal timer is running. Slave units can also demand to be put into HOLD mode. Data transfer restarts instantly when units transition out of HOLD mode. The HOLD mode is used when connecting several piconets or managing a low power device such as a temperature sensor.

Two more low power modes are available, the SNIFF mode and the PARK mode. In the SNIFF mode, a slave device listens to the piconet at a reduced rate, thus reducing its duty cycle. The SNIFF interval is programmable and depends on the application. In the PARK mode, a device is still synchronized to the piconet but does not participate in the traffic. Parked devices have given up their MAC address and occasionally listen to the traffic of the master to re-synchronize and check on broadcast messages. If we list the modes in increasing order of power efficiency, the SNIFF mode has the highest duty cycle, followed by the HOLD mode with a lower duty cycle, and finally the PARK mode with the lowest duty cycle.
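The paging figures quoted in this section (two trains of 16 frequencies, a worst case of twice the wakeup period and an average of half of it) can be reproduced with a short sketch. The names below are hypothetical and the hop frequencies are represented simply by indices 0..31; this is an illustration of the arithmetic, not of the actual paging procedure.

```python
# Illustrative arithmetic for the paging procedure described above.
# Names are hypothetical; hop frequencies are modelled as plain indices.

WAKEUP_PERIOD_S = 1.28   # a STANDBY unit listens once per wakeup period
WAKEUP_FREQS = 32        # hop frequencies in the wake-up sequence
TRAIN_SIZE = 16          # page messages per train


def page_trains(freqs):
    """Split the wake-up frequencies into consecutive trains of TRAIN_SIZE."""
    return [freqs[i:i + TRAIN_SIZE] for i in range(0, len(freqs), TRAIN_SIZE)]


def max_page_delay_s() -> float:
    """Worst case quoted in the text: twice the wakeup period."""
    return 2 * WAKEUP_PERIOD_S


def average_page_delay_s() -> float:
    """Average quoted in the text: half the wakeup period."""
    return 0.5 * WAKEUP_PERIOD_S


if __name__ == "__main__":
    trains = page_trains(list(range(WAKEUP_FREQS)))
    print(len(trains), [len(t) for t in trains])
    print(max_page_delay_s(), average_page_delay_s())
```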

The link type defines what type of packets can be used on a particular link. The Bluetooth baseband technology supports two link types:

Synchronous Connection Oriented (SCO) type (used primarily for voice)
Asynchronous Connectionless (ACL) type (used primarily for packet data)

Different master-slave pairs of the same piconet can use different link types, and the link type may change arbitrarily during a session. Each link type supports up to sixteen different packet types. Four of these are control packets and are common to both SCO and ACL links. Both link types use a Time Division Duplex (TDD) scheme for full-duplex transmissions.

The SCO link is symmetric and typically supports time-bounded voice traffic. SCO packets are transmitted over reserved intervals. Once the connection is established, both master and slave units may send SCO packets without being polled. One SCO packet type allows both voice and data transmission, with only the data portion being retransmitted when corrupted.

The ACL link is packet oriented and supports both symmetric and asymmetric traffic. The master unit controls the link bandwidth and decides how much piconet bandwidth is given to each slave, as well as the symmetry of the traffic. Slaves must be polled before they can transmit data. The ACL link also supports broadcast messages from the master to all slaves in the piconet.

Packets are 1, 3 or 5 slots long. The carrier hops to a new frequency for each packet. A frame consists of two packets (transmit and then receive). The general format of the packet is shown in figure 1.6, whereas the different packet types are shown in table 1.3.

Figure 1.6: Standard packet format

Table 1.3: Different packet types

There are three error-correction schemes defined for Bluetooth baseband controllers:

1/3 rate forward error correction code (FEC)
2/3 rate forward error correction code (FEC)
Automatic repeat request (ARQ) scheme for data.

The purpose of the FEC scheme on the data payload is to reduce the number of retransmissions. However, in a reasonably error-free environment, FEC creates unnecessary overhead that reduces the throughput. Therefore, the packet definitions have been kept flexible as to whether or not to use FEC in the payload. The packet header is always protected by a 1/3 rate FEC; it contains valuable link information and should survive bit errors. An unnumbered ARQ scheme is applied in which data transmitted in one slot is directly acknowledged by the recipient in the next slot. For a data transmission to be acknowledged both the header error check and the cyclic redundancy check must be okay; otherwise a negative acknowledge is returned.
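To illustrate how a 1/3 rate code lets the header survive bit errors, the sketch below assumes the 1/3 rate FEC is a simple repetition code in which every bit is sent three times and decoded by majority vote. The function names are hypothetical, and bits are represented as Python integers 0 and 1.

```python
# Minimal sketch of a rate-1/3 repetition code with majority-vote decoding.
# Assumption: each bit is repeated three times; names are hypothetical.


def fec13_encode(bits):
    """Repeat every input bit three times."""
    return [b for b in bits for _ in range(3)]


def fec13_decode(coded):
    """Decode by majority vote over each group of three received bits."""
    if len(coded) % 3 != 0:
        raise ValueError("coded length must be a multiple of 3")
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]


if __name__ == "__main__":
    header = [1, 0, 1, 1]
    sent = fec13_encode(header)
    received = list(sent)
    received[1] ^= 1          # one bit error in the first group
    received[5] ^= 1          # one bit error in the second group
    print(fec13_decode(received) == header)
```

A single error in any group of three is corrected; two errors in the same group are not, which is why the header check and the ARQ scheme are still needed on top of the FEC.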

The Bluetooth baseband provides user protection and information privacy mechanisms at the physical layer. Authentication and encryption are implemented in the same way in each Bluetooth device, appropriate for the ad hoc nature of the network. Connections may require one-way, two-way, or no authentication. Authentication is based on a challenge-response algorithm. Authentication is a key component of any Bluetooth system, allowing the user to develop a domain of trust between personal Bluetooth devices, such as allowing only the owner's notebook computer to communicate through the owner's cellular telephone. Encryption is used to protect the privacy of the connection. Bluetooth uses a stream cipher well suited for a silicon implementation with secret key lengths of 0, 40, or 64 bits. Key management is left to higher layer software. The goal of Bluetooth security mechanisms is to provide an appropriate level of protection for Bluetooth's short-range nature and use in a global environment. Users requiring stalwart protection are encouraged to use stronger security mechanisms available in network transport protocols and application programs.

In the Bluetooth design, special attention has been paid to reduction of current consumption. In the idle mode, the unit only scans a little over 10 ms every T s, where T can range from 1.28 to 3.84 s. Thus, the duty cycle is well below 1 percent. Additionally, a PARK mode has been defined where the duty cycle can be reduced even more. However, the PARK mode can only be applied after the piconet has been established. The slave can then be parked; that is, it only listens to the channel at a very low duty cycle. The slave only has to listen to the access code and the packet header (126 µs excluding guard time to account for drift) to resynchronize its clock and decide whether it can return to sleep. Since there is no uncertainty in time and frequency (the parked slave is locked to the master, similar to how cordless and cellular phones are locked to their base stations), a much lower duty cycle is achievable. Another low-power mode during connection is the SNIFF mode, in which the slave does not scan at every master-to-slave slot, but has a larger interval between scans.

In the connection state, current consumption is minimized and wasteful interference prevented by only transmitting when data is available. If no useful information needs to be exchanged, no transmission takes place. If only link control information needs to be transferred (e.g., ACK/NACK), a NULL packet without payload is sent. Since NACK is implicit, a NULL packet with NACK does not have to be sent. In longer periods of silence, the master once in a while needs to send a packet on the channel so that all slaves can resynchronize their clocks and compensate for drift. The accuracy of the clocks and the scan window length applied in the slave determine the period of this resynchronization.

During continuous TX/RX operations, a unit starts to scan for the access code at the beginning of the RX slot. If this access code is not found within a certain window, the unit returns to sleep until the next TX slot (for the master) or RX slot (for the slave). If the access code is received (which means the received signal matches the expected access code), the header is decoded. If the 3-bit slave address does not match the recipient, further reception is stopped. The header indicates what type of packet it is and how long the packet will last; therefore, the non-addressed recipients can determine how long they can sleep.

The nominal transmit power used by most Bluetooth applications for short-range connectivity is 0 dBm. This both restricts current consumption and keeps interference to other systems to a minimum. However, the Bluetooth radio specifications allow TX power up to 20 dBm. Above 0 dBm, closed-loop received signal strength indication (RSSI)-based power control is mandatory. This power control only compensates for propagation losses and slow fading. In the uncoordinated environment where ad hoc systems operate, interference-based power control is, to say the least, doubtful, especially since different types of systems with different power characteristics share the same band. Since power control cannot be coordinated among different systems, it cannot be prevented that certain systems always try to overpower their contenders, and the strongest transmitter will prevail.
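The claim that the idle-mode duty cycle stays well below 1 percent follows directly from the scan figures given above. The sketch below is only illustrative: the function name is hypothetical, and the scan window is rounded to exactly 10 ms even though the text says "a little over 10 ms".

```python
# Illustrative duty-cycle arithmetic for the idle-mode scan described above.
# Assumption: the scan window is approximated as exactly 10 ms.

SCAN_WINDOW_S = 0.010


def scan_duty_cycle(scan_window_s: float, scan_interval_s: float) -> float:
    """Fraction of time the receiver is on when scanning once per interval."""
    if scan_interval_s <= 0 or scan_window_s < 0:
        raise ValueError("interval must be positive and window non-negative")
    return scan_window_s / scan_interval_s


if __name__ == "__main__":
    for interval_s in (1.28, 3.84):
        duty = scan_duty_cycle(SCAN_WINDOW_S, interval_s)
        print(f"T = {interval_s} s: duty cycle = {duty * 100:.2f} %")
```

With T = 1.28 s the duty cycle is about 0.78 percent, and with T = 3.84 s it drops to about 0.26 percent, consistent with the "well below 1 percent" statement.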

A proposed implementation of both the baseband and RF layers is shown in figure 1.7.

Figure 1.7: A suggested Bluetooth chip

Table 1.4 lists some of the key Bluetooth specifications.

Table 1.4: Some Key Bluetooth Specifications.

1.4.5. LINK MANAGEMENT:

The Link Manager (LM) software entity carries out link setup, authentication, link configuration, and other protocols. The Link Manager discovers other remote LMs and communicates with them via the Link Manager Protocol (LMP). To perform its service provider role, the LM uses the services of the underlying Link Controller (LC). Services provided:

Sending and receiving of data.

Name request. The Link Manager has an efficient means to inquire and report a name or device ID up to 16 characters in length.

Link address inquiries.

Connection set-up.

Authentication.

Link mode negotiation and set-up, e.g. data or data/voice. This may be changed during a connection.

The Link Manager decides the actual frame type on a packet-by-packet basis.

Setting a device in sniff mode. In sniff mode, the duty cycle of the slave is reduced: it listens only every M slots, where M is negotiated by the Link Manager. The master can only start transmission in specified time slots spaced at regular intervals.

Setting a link device on hold. In hold mode, turning off the receiver for longer periods saves power. Any device can wake up the link again, with an average latency of 4 seconds. This is defined by the Link Manager and handled by the Link Controller.

Setting a device in park mode when it does not need to participate on the channel but wants to stay synchronized. It wakes up at regular intervals to listen to the channel in order to re-synchronize with the rest of the piconet, and to check for page messages.
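The sniff-mode rule in the list above ("listens only every M slots") can be pictured with a small sketch. It assumes slots are numbered from 0 and that the slave listens in slot 0 and every M-th slot after it; the function names and the slot numbering are hypothetical and only serve the illustration.

```python
# Illustrative model of sniff mode: the slave listens only every M slots.
# Assumptions: slots are numbered from 0 and listening starts at slot 0.


def sniff_listen_slots(m: int, total_slots: int):
    """Return the slot numbers in which a sniffing slave listens."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return list(range(0, total_slots, m))


def sniff_duty_cycle(m: int) -> float:
    """Fraction of slots in which the slave listens."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return 1 / m


if __name__ == "__main__":
    print(sniff_listen_slots(8, 40))
    print(sniff_duty_cycle(8))
```

A larger negotiated M lowers the slave's duty cycle, at the cost of a longer wait before the master can start a transmission to it.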

1.4.6. SOFTWARE FRAMEWORK:

Bluetooth devices will be required to support baseline interoperability feature requirements to create a positive consumer experience. For some devices, these requirements will extend from radio module compliance and air protocols up to application-level protocols and object exchange formats. For other devices, such as a headset, the feature requirements will be significantly less. Ensuring that any device displaying the Bluetooth "logo" interoperates with other Bluetooth devices is a goal of the Bluetooth program.

Software interoperability begins with the Bluetooth link level protocol responsible for protocol multiplexing, device and service discovery, and segmentation and reassembly. Bluetooth devices must be able to recognize each other and load the appropriate software to discover the higher level abilities each device supports. Interoperability at the application level requires identical protocol stacks. Different classes of Bluetooth devices (PCs, handhelds, headsets, cellular telephones) have different compliance requirements. For example, you would never expect a Bluetooth headset to contain an address book. Headset compliance implies Bluetooth radio compliance, audio capability, and device discovery protocols. More functionality would be expected from cellular phones, handheld and notebook computers.

To obtain this functionality, the Bluetooth software framework will reuse existing specifications such as OBEX, vCard/vCalendar, Human Interface Device (HID), and TCP/IP rather than invent yet another set of new specifications. Device compliance will require conformance to both the Bluetooth Specification and existing protocols. The Software Framework is contemplating the following functions:

Configuration and diagnosis utility
Device discovery
Cable emulation
Peripheral communication
Audio communication and call control
Object exchange for business cards and phone books
Networking protocol

1.5. BLUETOOTH ADVANTAGES:

Bluetooth technology has a lot of advantages compared to its competitors.

First and most important is that it does not require line of sight between its connected devices, but it has a lot of other appealing features too.

The radio-frequency transmitters that are used do not need a lot of power; a small watch battery will keep the device running for months. This means that it will not add much weight to devices that are being carried, like mobile phones.

It has built-in security.

The radio waves are so weak that they do not interfere with other electrical devices and pose no health hazard.

It is easy to integrate TCP/IP for networking.

A big plus is that its radio frequency has been approved worldwide by governments. This means it stands a good chance of becoming a standard technology.

Although not all major players in the computer industry have agreed to participate in the work towards a standard technology, a lot of them have, and this will of course reflect on its availability to customers. More products will drive prices down, so within a couple of years Bluetooth technology should be quite cheap.

1.6. BLUETOOTH DISADVANTAGES:

Bluetooth has several advantages over its competitors; however, it also has some disadvantages that its competitors do not have.

Since it uses a radio frequency that is open to most radio transmitters on the market, it has to cope with interference from common products like garage door openers, cordless phones, microwave ovens and several others that use that frequency. However, the impact of that interference is hard to assess now because the product has not yet hit the market. Bluetooth's frequency hopping protection might be enough to avoid disturbance from the interference.

While IrDA devices simply need to be pointed at each other to discover their match, Bluetooth devices need to perform a time-consuming operation to find any other devices in the same area. If several devices are found, the user will be forced to choose the right one from a presented list. This will of course require that information regarding the other devices is available to the user in order for him to make the right choice.

Initially the prices for Bluetooth products will be high. The chip for mobile devices will cost about $20, which will of course result in high prices for the product. But eventually the chip is expected to cost under one dollar as production rates increase.

1.7. BLUETOOTH USAGE MODELS:

Bluetooth products have a large variety of applications. In this section we present some of them:

The three-in-one phone: At home, your phone functions as a portable phone (fixed line charge). When you're on the move, it functions as a mobile phone (cellular charge). And when your phone comes within range of another mobile phone with built-in Bluetooth wireless technology it functions as a walkie talkie (no telephony charge). (See figure 1.8)

Figure 1.8: The 3 in 1 phone

The Internet bridge: Use your mobile computer to surf the Internet wherever you are, regardless of whether you're cordlessly connected through a mobile phone (cellular) or through a wire-bound connection (e.g. PSTN, ISDN, LAN, xDSL). (See figure 1.9)

Figure 1.9: The Internet bridge

The interactive conference: In meetings and conferences you can share selected documents instantly with selected participants, and exchange electronic business cards automatically, without any wired connections. (See figure 1.10)

Figure 1.10: The interactive conference

The ultimate headset: Connect your wireless headset to your mobile phone, mobile computer or any wired connection to keep your hands free for more important tasks when you're at the office or in your car. (See figure 1.11)

Figure 1.11: The ultimate headset

The automatic synchronizer: Automatic synchronization of your desktop, mobile computer, notebook (PC-PDA and PC-HPC) and your mobile phone. For instance, as soon as you enter your office the address list and calendar in your notebook will automatically be updated to agree with the one in your desktop, or vice versa. (See figure 1.12)

While in a meeting, you access your PDA to send your presentation to the electronic whiteboard. You record meeting minutes on your PDA and wirelessly transfer these to the attendees before they leave the meeting. (See figure 1.13)

You are the factory supervisor. As you walk through the factory, you are able to check the status of every piece of test equipment you encounter because you can instantly download a user interface for every machine. You request product defect rates and piece part failures at selected workstations. (See figure 1.14)

Figure 1.12: The automatic synchronizer

Figure 1.13: Bluetooth enhanced meeting

Figure 1.14: Bluetooth enhanced factory

Upon arriving at your home, the door automatically unlocks for you, the entry way lights come on, and the heat is adjusted to your pre-set preferences. (See figure 1.15)

Your PDA morphs from business to personal as you enter your home. An electronic bulletin board in the home automatically adds your scheduled activities to the family calendar, and alerts you of any conflicts. (See figure 1.16)

As you enter a national park, a map of the park appears on your display. You can view the schedule of activities for the park and your own personal electronic tour guide is downloaded to your vehicle. (See figure 1.17)

You arrive at the hotel. As you enter, you are automatically checked in and your room number and electronic key are transferred to your PDA. As you approach the room, the door automatically opens. (See figure 1.18)

It seems that usage models for Bluetooth products are endless. To illustrate this, here is an extract from an IEEE Spectrum article:

When the last morning speaker during a recent Bluetooth conference wrapped up a half hour earlier than expected, Bryan suggested that, rather than breaking early for lunch, the 120 attendees could make good use of the time by brainstorming new ideas for profiles that could lead to new applications for Bluetooth. “In the spirit of this kind of forum, no idea is too silly,” said Bryan, director of technical marketing for Ericsson’s Standards Business Group. One attendee thought it would be neat to have “a system that will automatically reset all the digital clocks in my house following a power outage.” Another idea, whispered among a few young engineers sitting in the back of the room, was a Bluetooth link between their roller blades and a speedometer in a digital watch.

Figure 1.15: Bluetooth enhanced lights

Figure 1.16: PDA between home and work

Figure 1.17: Electronic tour guide

Figure 1.18: Automatic checking in

These would probably work, but most of the more than 2000 member companies of the Bluetooth Special Interest Group (SIG) are counting initially on big-number applications, such as cell phones and Palm-type personal digital assistants (PDAs). Although originally thought of simply as a replacement for the unseemly nest of wires that connects PCs to keyboards and printers, Bluetooth quickly evolved into a system that will allow people to detect and communicate with each other through a variety of mainly portable devices without their users’ intervention. Bluetooth-enabled devices will be able to “talk” to each other as they come into range, which is about 10 meters, although this figure can be extended to more than 100 meters by increasing the transmit power from a nominal 1 mW to as much as 100 mW.

With Bluetooth technology, you can send e-mail from the computer on your lap to the cellular phone in your briefcase. Your Bluetooth-linked cell phone or similarly equipped PDA can automatically synchronize with your desktop PC whenever you pass it within Bluetooth range. Or, you can have hands-free communications between a Bluetooth-enabled headset and a cell phone, or you can download images from a digital camera to a PC or cell phone.

Critical mass is critical to Bluetooth success. Bluetooth technology is expected to make its debut in cell phones and PDAs, but then will move quickly into notebook and laptop computers, printers, scanners, digital cameras, household appliances, security/remote access, games, toys, and more. “Adoption is the first and foremost concern in the Bluetooth space,” said Randy Giusto, vice president of Desktop and Mobile Research at International Data Corp., Framingham, Mass. “The key for a communications technology is that it has other devices to talk to. Without other devices supporting Bluetooth, the technology is useless.”

Ericsson, which started it all with the development of the Bluetooth concept, has already announced several Bluetooth products, including a headset, a PC Card for laptops and PDAs, and two Bluetooth cell phones. A Bluetooth keyboard and mouse are on the drawing board. Nokia and Fujifilm are working on a mobile imaging technology they believe would enable Nokia to add a Bluetooth chip to its clamshell-shaped 9110 Communicator so that it could receive images taken on a Bluetooth-equipped Fujifilm digital camera. After the addition of a few lines of text, the received photographs can be sent to another Nokia Communicator, or to the Fujifilm Web service, where it can be viewed, printed, or burned into a CD-ROM. Finnish telecom operator Sonera has even demonstrated a Bluetooth-enabled vending machine: consumers buy products out of the machine by simply signaling an account code from a Bluetooth cell phone or PDA. The code would debit the user’s account based on the code. Eventually, cell phones and PDAs are expected to be able to display personal bar codes, which can be read by a vending machine scanner.

The Gartner Group calls it the Supranet: the wireless connection of data and transactions between the hard-wire Internet, wireless devices such as cell phones and PDAs, and the “papernet,” meaning the physical world of business cards and legal documents. Emerging seamless connections will deliver a whole host of new technologies, according to Gartner, with one of the first integral technologies to be tied to the Supranet being Bluetooth. By 2004, according to Gartner, 70 percent of new cell phones and 40 percent of new PDAs will use wireless technology for direct access to Web content and enterprise networks. Gartner believes that Bluetooth is set to become a defining force in portable electronic products.

Bluetooth Security

The need for Bluetooth Security measures emerged from the fact that it is a wireless communication standard working in the ISM band, which is not a licensed band. Many applications use this free band, which makes the environment noisy and means that data can be received from unauthorized users. The data transmitted between different Bluetooth devices can suffer from eavesdropping as well. To avoid this, different measures have been taken into consideration in Bluetooth Security, as will be explained.

Baseband Bluetooth Security has three main procedures:

1- Key generation and handling.
2- Authentication.
3- Encryption.

In this chapter, we take a quick look at the Bluetooth entities used in security measures in section 2.1. Section 2.2 covers the different keys used: how they are generated and when they are used. Authentication is discussed in section 2.3. Section 2.4 covers the SAFER+ algorithm used in both key generation and authentication. Section 2.5 covers encryption in Bluetooth, and finally section 2.6 discusses the need for a random number in Bluetooth Security operations. See figure 2.1.

Figure 2.1: Different issues covered in chapter 2

[Diagram blocks: Bluetooth Security entities (Sec. 2.1), Key Handling (Sec. 2.2), Authentication (Sec. 2.3), SAFER+ (Sec. 2.4), Encryption (Sec. 2.5), Random Number Generation (Sec. 2.6)]

A secret between more than two is not a secret

Chapter 2 Bluetooth Security

2.1. BLUETOOTH SECURITY ENTITIES:

Four different entities are used for maintaining security at the link layer: a public address which is unique for each user, two secret keys, and a random number which is different for each new transaction. The four entities and their sizes as used in Bluetooth are summarized in table 2.1.

Entity                              Size
BD_ADDR                             48 bits
Private user key, authentication    128 bits
Private user key, encryption        8-128 bits
RAND                                128 bits

Table 2.1: Entities used in authentication and encryption procedures

The Bluetooth device address (BD_ADDR) is the 48-bit IEEE address which is unique for each Bluetooth unit. The Bluetooth addresses are publicly known, and can be obtained via MMI interactions or, automatically, via an inquiry routine by a Bluetooth unit.

The secret keys are derived during initialization and are never disclosed thereafter. Normally, the encryption key is derived from the authentication key during the authentication process. For the authentication algorithm, the size of the key used is always 128 bits. For the encryption algorithm, the key size may vary between 1 and 16 octets (8 - 128 bits). The size of the encryption key shall be configurable for two reasons. The first has to do with the many different requirements imposed on cryptographic algorithms in different countries, both with respect to export regulations and official attitudes towards privacy in general. The second reason is to facilitate a future upgrade path for the security without the need for a costly redesign of the algorithms and encryption hardware; increasing the effective key size is the simplest way to combat increased computing power on the opponent's side. Currently it seems that an encryption key size of 64 bits gives satisfactory protection for most applications.

The encryption key is entirely different from the authentication key (even though the latter is used when creating the former). Each time encryption is activated, a new encryption key shall be generated. Thus, the lifetime of the encryption key does not necessarily correspond to the lifetime of the authentication key. It is anticipated that the authentication key will be more static in its nature than the encryption key; once it is established, the particular application running on the Bluetooth device decides when, or if, to change it. To underline the fundamental importance of the authentication key to a specific Bluetooth link, it will often be referred to as the link key.

The RAND is a random number which can be derived from a random or pseudo-random process in the Bluetooth unit. This is not a static parameter; it will change frequently. In the remainder of this chapter, the terms user and application will be used interchangeably to designate the entity that is at the originating or receiving side.


2.2. KEY HANDLING:

Keys are used in Bluetooth to secure data transmission as well as to authenticate users. Many types of keys are used in Bluetooth security, as will be explained in this section, together with how these keys are generated.

2.2.1.KEY TYPES:

The link key is a 128-bit random number which is shared between two or more parties and is the base for all security transactions between these parties. The link key itself is used in the authentication routine. Moreover, the link key is used as one of the parameters when the encryption key is derived.

In the following, a session is defined as the time interval for which the unit is a member of a particular piconet. Thus, the session terminates when the unit disconnects from the piconet.

The link keys are either semi-permanent or temporary. A semi-permanent link key is stored in non-volatile memory and may be used after the current session is terminated. Consequently, once a semi-permanent link key is defined, it may be used in the authentication of several subsequent connections between the Bluetooth units sharing it. The designation semi-permanent is justified by the possibility to change it. The lifetime of a temporary link key is limited by the lifetime of the current session; it cannot be reused in a later session. Typically, in a point-to-multipoint configuration where the same information is to be distributed securely to several recipients, a common encryption key is useful. To achieve this, a special link key (denoted master key) can temporarily replace the current link keys.

In the sequel we sometimes refer to what is denoted as the current link key. This is simply the link key in use at the current moment. It can be semi-permanent or temporary. Thus, the current link key is used for all authentications and all generation of encryption keys in the on-going connection (session).

In order to accommodate different types of applications, four types of link keys have been defined:

1. The combination key KAB.
2. The unit key KA.
3. The temporary key Kmaster.
4. The initialization key Kinit.

In addition to these keys there is an encryption key, denoted Kc. This key is derived from the current link key. Whenever the encryption is activated by a LM command, the encryption key shall be changed automatically. See figure 2.2.

The purpose of separating the authentication key and encryption key is to facilitate the use

of a shorter encryption key without weakening the strength of the authentication procedure. There are no governmental restrictions on the strength of authentication algorithms. However, in some countries, such restrictions exist on the strength of encryption algorithms.

For a Bluetooth unit, the combination key KAB and the unit key KA are functionally

indistinguishable; the difference is in the way they are generated. The unit key KA is generated in, and therefore dependent on, a single unit A. The unit key is generated once at installation of the Bluetooth unit; thereafter, it is very rarely changed. The combination key is derived from information in both units A and B, and is therefore always dependent on two units. The combination key is derived for each new combination of two Bluetooth units. It depends on the application or the device whether a unit key or a combination key is used. Bluetooth units which


have little memory to store keys, or, when installed in equipment that must be accessible to a large group of users, will preferably use their own unit key. In that case, they only have to store a single key. Applications that require a higher security level preferably use the combination keys. These applications will require more memory since a combination key for each link to a different Bluetooth unit has to be stored.

The master key, Kmaster, is a link key only used during the current session. It will replace the

original link key only temporarily. For example, this may be utilized when a master wants to reach more than two Bluetooth units simultaneously using the same encryption key.

The initialization key, Kinit , is used as link key during the initialization process when no

combination or unit keys have been defined and exchanged yet or when a link key has been lost. The initialization key protects the transfer of initialization parameters. The key is derived from a random number, an L-octet PIN code, and a BD_ADDR. This key is only to be used during initialization. The PIN can be a fixed number provided with the Bluetooth unit (for example when there is no MMI as in a PSTN plug). Alternatively, the PIN can be selected arbitrarily by the user, and then entered in both units that have to be matched. The latter procedure is used when both units have an MMI, for example a phone and a laptop. Entering a PIN in both units is more secure than using a fixed PIN in one of the units, and should be used whenever possible. Even if a fixed PIN is used, it shall be possible to change the PIN; this in order to prevent re-initialization by users who once got hold of the PIN. If no PIN is available, a default value of zero is to be used. For many applications the PIN code will be a relatively short string of numbers. Typically, it may consist of only four decimal digits. Even though this gives sufficient security in many cases, there exist countless other, more sensitive, situations where this is not reliable enough. Therefore, the PIN code can be chosen to be any length from 1 to 16 octets. For the longer lengths, we envision the units exchanging PIN codes not through mechanical (i.e. human) interaction, but rather through means supported by software at the application layer. For example, this can be a Diffie-Hellman key agreement, where the exchanged key is passed on to the generation process in both units, just as in the case of a shorter PIN code.

Figure 2.2: Different types of keys.

Key Types
    Encryption Key Kc
    Authentication or Link Key
        Semi-Permanent Keys
            Unit Key KA
            Combination Key KAB
        Temporary Keys
            Master Key Kmaster
            Initialization Key Kinit


2.2.2.INITIALIZATION:

The link keys have to be generated and distributed among the Bluetooth units in order to be used in the authentication procedure. Since the link keys must be secret, they cannot be obtained through an inquiry routine in the same way as the Bluetooth addresses. The exchange of the keys takes place during an initialization phase which has to be carried out separately for each two units that want to implement authentication and encryption. All initialization procedures consist of the following five parts:

1- Generation of an initialization key.
2- Generation of link key.
3- Link key exchange.
4- Authentication.
5- Generating of encryption key in each unit (optional).

After the initialization procedure, the units can proceed to communicate, or the link can be disconnected. If encryption is implemented, the E0 algorithm is used with the proper encryption key derived from the current link key. For any new connection established between units A and B, they will use the common link key for authentication, instead of once more deriving Kinit from the PIN. A new encryption key derived from that particular link key will be created next time encryption is activated. If no link key is available, the LM shall automatically start an initialization procedure.

The most important secret parameter in Bluetooth is the PIN code, since all other keys either are derived from it to some extent (the initialization key) or need it in order to be exchanged securely (all kinds of link key). This dependence on a shared PIN means that Bluetooth is not suited to wide-scale scenarios such as a WAN or the Internet. However, since Bluetooth aims at the small-scale "short range radio link", it should be satisfactorily secure in most applications.

2.2.3. ALGORITHMS USED IN KEY GENERATION:

Three algorithms are used in the generation of keys as shown in table 2.2.

Key Type              Algorithm Used
Unit Key              E21
Combination Key       E21
Master Key            E22
Initialization Key    E22
Encryption Key        E3

Table 2.2: Different algorithms used in key generation

E2 is the algorithm used in the derivation of the key used for authentication. It has two modes of operation, E21 and E22, as shown in figure 2.3. In the first mode, the algorithm should produce, on input of a 128-bit RAND value and a 48-bit address, a 128-bit link key. This mode is utilized when creating unit keys and combination keys. In the second mode the function should produce, on input of a 128-bit RAND value and an L octet user PIN, a 128-bit link key. The second mode is used to create the initialization key, and also whenever a master key is to be generated.

This key generating algorithm exploits the cryptographic function Ar’, which is a modified version of the SAFER+ algorithm to be explained in section 2.4. It can be written as:


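$$
E_{21}: \{0,1\}^{128} \times \{0,1\}^{48} \to \{0,1\}^{128}, \qquad (RAND, address) \mapsto A'_r(X, Y)
$$

where (for mode 1)

$$
\begin{cases}
X = RAND[0 \ldots 14] \cup (RAND[15] \oplus 6) \\
Y = \bigcup_{i=0}^{15} address[i \pmod{6}]
\end{cases}
$$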

Note that: RAND[0] is the octet number 0 of RAND.
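
The preparation of the two Ar’ inputs in mode 1 can be sketched in Python as follows. This is only an illustrative sketch: the function name `e21_inputs` is hypothetical, Ar’ itself is not implemented, and it is assumed that octet 0 is stored at index 0 of each byte string.

```python
def e21_inputs(rand: bytes, address: bytes):
    """Build the two 16-octet inputs (X, Y) that mode 1 passes to Ar'."""
    if len(rand) != 16 or len(address) != 6:
        raise ValueError("RAND must be 16 octets and address 6 octets")
    # X: RAND[0..14] followed by RAND[15] XOR 6
    x = rand[:15] + bytes([rand[15] ^ 6])
    # Y: the 6-octet address repeated cyclically to fill 16 octets
    y = bytes(address[i % 6] for i in range(16))
    return x, y
```

The constant 6 XORed into the last octet of RAND is the length of the address in octets, which mirrors the role that L’ plays in mode 2.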

Figure 2.3 : Key generation algorithm E2

Let L be the number of octets in the user PIN. The augmenting is defined by

$$
PIN' =
\begin{cases}
PIN[0 \ldots L-1] \cup BD\_ADDR_B[0 \ldots \min\{5, 15-L\}], & L < 16 \\
PIN[0 \ldots L-1], & L = 16
\end{cases}
$$

where it is assumed that unit B is the claimant. Then, in mode 2, E2 can be expressed as

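$$
E_{22}: \{0,1\}^{8L'} \times \{0,1\}^{128} \times \{1, 2, \ldots, 16\} \to \{0,1\}^{128}, \qquad (PIN', RAND, L') \mapsto A'_r(X, Y)
$$

where

$$
\begin{cases}
X = \bigcup_{i=0}^{15} PIN'[i \pmod{L'}] \\
Y = RAND[0 \ldots 14] \cup (RAND[15] \oplus L')
\end{cases}
$$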


and L’=min{16,L+6} is the number of octets in PIN’.
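
The PIN augmentation and the mode 2 inputs can be sketched in the same way. The names `augment_pin` and `e22_inputs` are hypothetical, Ar’ is again left out, and octet 0 is assumed to sit at index 0 of each byte string.

```python
def augment_pin(pin: bytes, bd_addr: bytes) -> bytes:
    """Return PIN': the PIN extended with leading octets of BD_ADDR."""
    length = len(pin)
    if not 1 <= length <= 16 or len(bd_addr) != 6:
        raise ValueError("PIN must be 1..16 octets and BD_ADDR 6 octets")
    if length == 16:
        return pin
    # Append BD_ADDR[0 .. min(5, 15 - L)], so that L' = min(16, L + 6)
    return pin + bd_addr[: min(5, 15 - length) + 1]


def e22_inputs(pin: bytes, bd_addr: bytes, rand: bytes):
    """Build (X, Y, L') for mode 2; Ar'(X, Y) would then give the key."""
    if len(rand) != 16:
        raise ValueError("RAND must be 16 octets")
    pin_aug = augment_pin(pin, bd_addr)
    l_aug = len(pin_aug)
    # X: PIN' repeated cyclically to fill 16 octets
    x = bytes(pin_aug[i % l_aug] for i in range(16))
    # Y: RAND[0..14] followed by RAND[15] XOR L'
    y = rand[:15] + bytes([rand[15] ^ l_aug])
    return x, y, l_aug
```

For a four-digit PIN, for example, all six address octets are appended and L’ = 10.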

The encryption key Kc used by E0 is generated by E3. See figure 2.4. The algorithm E3 is constructed as follows:

$$
E_3: \{0,1\}^{128} \times \{0,1\}^{128} \times \{0,1\}^{96} \to \{0,1\}^{128}, \qquad (K, RAND, COF) \mapsto Hash(K, RAND, COF, 12)
$$

where Hash is the hash function to be explained in the authentication procedure in section 2.3. It is formed again by Ar’ together with Ar. Note that the produced key length is 128 bits. However, before use within E0, the encryption key Kc will be shortened to the correct encryption key length.

The value of COF (Ciphering Offset Number) is determined as follows:

$$
COF =
\begin{cases}
BD\_ADDR \cup BD\_ADDR, & \text{if link key is a master key} \\
ACO, & \text{otherwise}
\end{cases}
$$

where ACO is the Authenticated Ciphering Offset produced during authentication using E1 as explained in section 2.3.

Figure 2.4: Generation of the encryption key. (E3 takes the 128-bit EN_RAND, the 128-bit link key, and the 96-bit COF as inputs and outputs the 128-bit Kc.)

2.2.4. GENERATION OF THE INITIALIZATION KEY:

A link key used temporarily during initialization is derived (the initialization key). This key is derived by the algorithm E22 from a BD_ADDR, a PIN code, the length of the PIN (in octets), and a random number IN_RAND as discussed in 2.2.3. The 128-bit output from E22 will be used for key exchange during the generation of a link key. When the units have performed the link key exchange, the initialization key shall be discarded.

When the initialization key is generated, the PIN is augmented with the BD_ADDR. If one unit has a fixed PIN, the BD_ADDR of the other unit is used. If both units have a variable PIN, the BD_ADDR of the device that received IN_RAND is used. If both units have a fixed PIN, they cannot be paired. Since the maximum length of the PIN used in the algorithm cannot exceed 16 octets, it is possible that not all octets of BD_ADDR will be used. This procedure ensures that Kinit depends on the identity of the unit with a variable PIN.

A fraudulent Bluetooth unit may try to test a large number of PINs by each time claiming another BD_ADDR. It is the application’s responsibility to take countermeasures against this threat. If the device address is kept fixed, the waiting interval until the next try is permitted is increased exponentially.

2.2.5. GENERATION OF A UNIT KEY:

A unit key is generated when the Bluetooth unit is for the first time in operation; i.e. not during each initialization. The unit key is generated by the E21 algorithm as described in 2.2.3. Once created, the unit key is stored in non-volatile memory and (almost) never changed. If after initialization the unit key is changed, the previously initialized units will possess a wrong link key. At initialization, the application has to determine which of the two parties will provide the unit key as link key. Typically, this will be the unit with restricted memory capabilities, since this unit only has to remember its own unit key. The unit key is transferred to the other party and then stored as link key for that particular party. So, for example in figure 2.5, the unit key of unit A, KA , is being used as link key for the connection A-B; unit A sends the unit key KA to unit B; unit B will store KA as the link key KBA. For another initialization, for example with unit C, unit A will reuse its unit key KA whereas unit C stores it as KCA .

Figure 2.5: Generation of unit key. When the unit key has been exchanged, the initialization key

shall be discarded in both units.

2.2.6. GENERATION OF A COMBINATION KEY:

If it is desired to use a combination key, this key is first generated during the initialization procedure. The combination key is the combination of two numbers generated in unit A and B, respectively. First, each unit generates a random number, say LK_RANDA and LK_RANDB. Then, utilizing E21 with the random number and the unit's own BD_ADDR, the two numbers

$$
LK\_K_A = E_{21}(LK\_RAND_A, BD\_ADDR_A)
$$

and

$$
LK\_K_B = E_{21}(LK\_RAND_B, BD\_ADDR_B)
$$

are created in unit A and unit B, respectively. These numbers constitute the units’ contribution to the combination key that is to be created. Then, the two random numbers LK_RANDA and LK_RANDB are exchanged securely by XORing with the current link key, say K. Thus, unit A


sends K ⊕ LK_RANDA to unit B, while unit B sends K ⊕ LK_RANDB to unit A. Clearly, if this is done during the initialization phase the link key K = Kinit.

When the random numbers LK_RANDA and LK_RANDB have been mutually exchanged, each unit recalculates the other unit's contribution to the combination key. This is possible since each unit knows the Bluetooth device address of the other unit. Thus, unit A calculates LK_KB and unit B calculates LK_KA. After this, both units combine the two numbers to generate the 128-bit link key. The combining operation is a simple bitwise modulo-2 addition (i.e. XOR). The result is stored in unit A as the link key KAB and in unit B as the link key KBA. When both units have derived the new combination key, a mutual authentication procedure shall be initiated to confirm the success of the transaction. The old link key shall be discarded after a successful exchange of a new combination key. The message flow between master and slave and the principle for creating the combination key is depicted in figure 2.6.

Figure 2.6: Generating a combination key. The old link key (K) shall be discarded after the exchange of a new combination key has succeeded
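
The exchange can be sketched in Python as below. This is a sketch under stated assumptions, not the specified algorithm: `e21_stand_in` is a hypothetical placeholder (truncated SHA-256) used only so that the example runs, and it is NOT the real E21. Only the XOR masking and the final XOR combination follow the description in this section.

```python
import hashlib


def e21_stand_in(rand: bytes, bd_addr: bytes) -> bytes:
    # NOT the real E21: a truncated SHA-256 used purely as a placeholder.
    return hashlib.sha256(rand + bd_addr).digest()[:16]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def combination_key_exchange(k, addr_a, addr_b, lk_rand_a, lk_rand_b):
    """Return (KAB as computed in unit A, KBA as computed in unit B)."""
    # Each unit masks its random number with the current link key K.
    msg_a_to_b = xor(k, lk_rand_a)
    msg_b_to_a = xor(k, lk_rand_b)

    # Unit A: own contribution, then the recomputed contribution of B.
    lk_k_a = e21_stand_in(lk_rand_a, addr_a)
    lk_k_b_at_a = e21_stand_in(xor(msg_b_to_a, k), addr_b)
    k_ab = xor(lk_k_a, lk_k_b_at_a)

    # Unit B: own contribution, then the recomputed contribution of A.
    lk_k_b = e21_stand_in(lk_rand_b, addr_b)
    lk_k_a_at_b = e21_stand_in(xor(msg_a_to_b, k), addr_a)
    k_ba = xor(lk_k_a_at_b, lk_k_b)
    return k_ab, k_ba
```

Because XOR with K is its own inverse, each side recovers the other's random number exactly, and both sides end up with the same 128-bit key.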

2.2.7. GENERATING A MASTER KEY:

To create the master link key, which can replace the current link key during an initiated session, other means are needed. First, the master creates a new link key from two 128-bit random numbers, RAND1 and RAND2. This is done by

$$
K_{master} = E_{22}(RAND1, RAND2, 16)
$$

Clearly, this key is a 128-bit random number. The reason to use the output of E22, and not directly choose a random number as the key, is to avoid possible problems with degraded randomness due to a poor implementation of the random number generator within the Bluetooth unit.

Then, a third random number, say RAND, is transmitted to the slave. Using E22 with the current link key and RAND as inputs, both the master and slave compute a 128-bit overlay. The master sends the bitwise XOR of the overlay and the new link key to the slave. The slave, who


knows the overlay, recalculates Kmaster. To confirm the success of this transaction, the units shall perform a mutual authentication procedure using the new link key. This procedure is then repeated for each slave who shall receive the new link key. The ACO values from the involved authentications should not replace the current existing ACO, as this ACO is needed to (re)compute a ciphering key when the master wants to fall back to the previous (non-temporary) link key.

When so required - and potentially long after the actual distribution of the master key - the master activates encryption by an LM command. Before doing that, the master must ensure that all slaves receive the same random number, say EN_RAND, since the encryption key is derived through the means of E3 individually in all participating units. Then, each slave computes a new encryption key,

$$
K_c = E_3(K_{master}, EN\_RAND, COF)
$$

where the value of COF is derived from the master’s BD_ADDR as specified in 2.2.3. The principle of the message flow between the master and slave when generating the master key is depicted in figure 2.7. Note that in this case the ACO produced during the authentication is not used when computing the ciphering key.

Figure 2.7: Master link key distribution and computation of the corresponding encryption key.
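
The overlay mechanism can be illustrated with a short Python sketch. The function `e22_stand_in` is a hypothetical placeholder (truncated SHA-256), NOT the real E22, and passing 16 as the length argument when computing the overlay is an assumption made here by analogy with the Kmaster formula.

```python
import hashlib


def e22_stand_in(a: bytes, b: bytes, length: int) -> bytes:
    # NOT the real E22: a truncated SHA-256 used purely as a placeholder.
    return hashlib.sha256(a + b + bytes([length])).digest()[:16]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def master_side(current_link_key, rand1, rand2, rand):
    """Return (Kmaster, masked value that is sent to the slave)."""
    k_master = e22_stand_in(rand1, rand2, 16)
    overlay = e22_stand_in(current_link_key, rand, 16)
    return k_master, xor(overlay, k_master)


def slave_side(current_link_key, rand, received):
    """Recompute the overlay and strip it off to recover Kmaster."""
    overlay = e22_stand_in(current_link_key, rand, 16)
    return xor(overlay, received)
```

Only RAND and the masked value travel over the air; a slave that shares the current link key with the master can remove the overlay, while an eavesdropper without that key cannot.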

2.2.8. POINT-TO-MULTIPOINT CONFIGURATION:

It is quite possible for the master to use separate encryption keys for each slave in a point-to-multipoint configuration with ciphering activated. Then, if the application requires more than one slave to listen to the same payload, each slave must be addressed individually. This may cause unwanted capacity loss for the piconet. Moreover, a Bluetooth unit (slave) is not capable of switching between two or more encryption keys in real time (e.g., after looking at the AM_ADDR in the header). Thus, the master cannot use different encryption keys for broadcast messages and individually addressed traffic. Alternatively, the master may tell several slave units to use a common link key (and, hence, indirectly also to use a common encryption key) and broadcast the information encrypted. For many applications, this key is only of temporary interest. In the sequel, this key is denoted Kmaster.

The transfer of necessary parameters is protected by the routine described in 2.2.7. After the

confirmation of successful reception in each slave, the master shall issue a command to the slaves to replace their respective current link key by the new (temporary) master key. Before encryption can


be activated, the master also has to generate and distribute a common EN_RAND to all participating slaves. Using this random number and the newly derived master key, each slave generates a new encryption key. Note that the master must negotiate what encryption key length to use individually with each slave who wants to use the master key. In case the master already has negotiated with some of these slaves, it has knowledge of what sizes can be accepted. Clearly, there might be situations where the permitted key lengths of some units are incompatible. In that case, the master must have the limiting unit excluded from the group.

When all slaves have received the necessary data, the master can communicate information on the piconet securely using the encryption key derived from the new temporary link key. Clearly, each slave in possession of the master key can eavesdrop on all encrypted traffic, not only the traffic intended for itself. If so desired, the master can tell all participants to fall back to their old link keys simultaneously.

2.2.9. MODIFYING THE LINK KEYS:

In certain circumstances, it is desirable to be able to modify the link keys. A link key based on a unit key can be changed, but not very easily. The unit key is created once during the first use. Changing the unit key is a less desirable alternative, as several units may share the same unit key as link key (think of a printer whose unit key is distributed among all users using the printer’s unit key as link key). Changing the unit key will require re-initialization of all units trying to connect. In certain cases, this might be desirable; for example to deny access to previously allowed units.

If the key change concerns combination keys, then the procedure is rather straightforward. The change procedure is identical to the procedure illustrated in figure 2.6, using the current value of the combination key as link key. This procedure can be carried out at any time after the authentication and encryption start. In fact, since the combination key corresponds to a single link, it can be modified each time this link is established. This will improve the security of the system since then old keys lose their validity after each session.

Of course, starting up an entirely new initialization procedure is also a possibility. In that case, user interaction is necessary since a PIN is required in the authentication and encryption procedures.

2.2.10. GENERATION OF THE ENCRYPTION KEY:

The encryption key, Kc , is derived by algorithm E3 from the current link key, a 96-bit Ciphering OFfset number (COF), and a 128-bit random number. The COF is determined in one of two ways. If the current link key is a master key, then COF is derived from the master BD_ADDR. Otherwise the value of COF is set to the value of ACO as computed during the authentication procedure. This was explained in 2.2.3.

There is an explicit call of E3 when the LM activates encryption. Consequently, the encryption key is automatically changed each time the unit enters the encryption mode.
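
The choice of COF can be written as a small Python helper. The name `ciphering_offset` is hypothetical, and the master-key case assumes, as in the COF definition of 2.2.3, that the 96-bit value is the master BD_ADDR concatenated with itself.

```python
def ciphering_offset(link_key_is_master: bool, master_bd_addr: bytes, aco: bytes) -> bytes:
    """Select the 96-bit COF that is fed to E3."""
    if link_key_is_master:
        if len(master_bd_addr) != 6:
            raise ValueError("BD_ADDR must be 6 octets")
        # Two copies of the 48-bit master address give 96 bits.
        return master_bd_addr + master_bd_addr
    if len(aco) != 12:
        raise ValueError("ACO must be 12 octets")
    # Otherwise reuse the ACO retained from the last authentication.
    return aco
```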


2.3.AUTHENTICATION:

The entity authentication used in Bluetooth uses a challenge-response scheme in which a claimant’s knowledge of a secret key is checked through a 2-move protocol using symmetric secret keys. The latter implies that a correct claimant/verifier pair share the same secret key, for example K. In the challenge-response scheme the verifier challenges the claimant to authenticate a random input (the challenge), denoted by AU_RANDA , with an authentication code, denoted by E1 , and return the result SRES to the verifier, see figure 2.8. This figure shows also that in Bluetooth the input to E1 consists of AU_RANDA and the Bluetooth device address (BD_ADDR) of the claimant. The use of this address prevents a simple reflection attack. The secret K shared by units A and B is the current link key.

Figure 2.8: Challenge-response for the Bluetooth.

The challenge-response scheme for symmetric keys used in the Bluetooth is depicted in figure 2.9.

Figure 2.9: Challenge-response for symmetric key systems.

In the Bluetooth, the verifier is not necessarily the master. The application indicates who has to be authenticated by whom. Certain applications only require a one-way authentication. However, in some peer-to-peer communications, one might prefer a mutual authentication in which each unit is subsequently the challenger (verifier) in two authentication procedures. The LM coordinates the


indicated authentication preferences by the application to determine in which direction(s) the authentication(s) have to take place. For mutual authentication with the units of figure 2.8, after unit A has successfully authenticated unit B, unit B could authenticate unit A by sending an AU_RANDB (different from the AU_RANDA that unit A issued) to unit A, and deriving the SRES and SRES’ from the new AU_RANDB, the address of unit A, and the link key KAB.

If an authentication is successful the value of ACO as produced by E1 should be retained.
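
The two-move protocol can be sketched in Python as follows. This is only an illustration of the message flow: `e1_stand_in` is a hypothetical placeholder built on HMAC-SHA-256 and is NOT the real E1 (which is built from Ar and Ar’), and the function names are chosen here for the example.

```python
import hashlib
import hmac


def e1_stand_in(k: bytes, au_rand: bytes, bd_addr: bytes):
    # NOT the real E1: HMAC-SHA-256 truncated to 128 bits as a placeholder.
    digest = hmac.new(k, au_rand + bd_addr, hashlib.sha256).digest()[:16]
    return digest[:4], digest[4:16]  # (32-bit SRES, 96-bit ACO)


def claimant_response(k: bytes, au_rand: bytes, own_addr: bytes) -> bytes:
    """Claimant: answer the challenge with SRES."""
    sres, _aco = e1_stand_in(k, au_rand, own_addr)
    return sres


def verifier_check(k: bytes, au_rand: bytes, claimant_addr: bytes, sres: bytes):
    """Verifier: recompute SRES' and return the ACO on success, else None."""
    expected, aco = e1_stand_in(k, au_rand, claimant_addr)
    return aco if hmac.compare_digest(sres, expected) else None
```

Including the claimant's address in the computation is what blocks the simple reflection attack mentioned above: a response computed for one address does not verify for another.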

2.3.1.REPEATED ATTEMPTS:

When the authentication attempt fails, a certain waiting interval must pass before the verifier will initiate a new authentication attempt to the same claimant, or before it will respond to an authentication attempt initiated by a unit claiming the same identity as the suspicious unit. For each subsequent authentication failure with the same Bluetooth address, the waiting interval shall be increased exponentially. That is, after each failure, the waiting interval before a new attempt can be made could be, for example, twice as long as the waiting interval prior to the previous attempt. The waiting interval shall be limited to a maximum. The maximum waiting interval depends on the implementation. The waiting time shall exponentially decrease to a minimum when no new failed attempts are being made during a certain time period. This procedure prevents an intruder from repeating the authentication procedure with a large number of different keys.

To make the system somewhat less vulnerable to denial-of-service attacks, the Bluetooth units should keep a list of individual waiting intervals for each unit it has established contact with. Clearly, the size of this list must be restricted only to contain the N units with which the most recent contact has been made. The number N can vary for different units depending on available memory size and user environment.
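
A possible bookkeeping structure for these waiting intervals is sketched below in Python. The class name `AuthBackoff`, the default minimum and maximum intervals, the list size N (`capacity`), and the use of halving for the decrease are all assumptions for illustration; the text leaves these choices to the implementation.

```python
from collections import OrderedDict


class AuthBackoff:
    """Per-address waiting intervals, kept only for the N most recent units."""

    def __init__(self, min_wait=1.0, max_wait=64.0, capacity=8):
        self.min_wait = min_wait
        self.max_wait = max_wait
        self.capacity = capacity
        self._wait = OrderedDict()  # BD_ADDR -> current waiting interval

    def on_failure(self, bd_addr) -> float:
        """Double the interval (up to the maximum) after a failed attempt."""
        previous = self._wait.pop(bd_addr, None)
        wait = self.min_wait if previous is None else min(previous * 2, self.max_wait)
        self._wait[bd_addr] = wait  # re-inserted as the most recent contact
        while len(self._wait) > self.capacity:
            self._wait.popitem(last=False)  # forget the oldest contact
        return wait

    def on_quiet_period(self, bd_addr) -> float:
        """Halve the interval (down to the minimum) after a failure-free period."""
        if bd_addr not in self._wait:
            return self.min_wait
        self._wait[bd_addr] = max(self._wait[bd_addr] / 2, self.min_wait)
        return self._wait[bd_addr]
```

Bounding the list keeps memory use fixed, at the cost that an address evicted from the list starts again from the minimum interval.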

2.3.2. THE AUTHENTICATION ALGORITHM E1 :

The authentication algorithm E1 proposed for the Bluetooth is a computationally secure authentication code, often called a MAC. E1 uses the encryption function called SAFER+. The algorithm is an enhanced version of an existing 64-bit block cipher SAFER-SK128, and it is freely available. In the sequel the block cipher will be denoted as the function Ar which maps under a 128-bit key, a 128-bit input to a 128-bit output, i.e.

$$
A_r: \{0,1\}^{128} \times \{0,1\}^{128} \to \{0,1\}^{128}, \qquad (k \times x) \mapsto t.
$$

The details of Ar are given in section 2.4. The algorithm E1 is constructed using Ar as follows:

$$
E_1: \{0,1\}^{128} \times \{0,1\}^{128} \times \{0,1\}^{48} \to \{0,1\}^{32} \times \{0,1\}^{96}, \qquad (K, RAND, address) \mapsto (SRES, ACO)
$$

where SRES = Hash(K, RAND, address, 6)[0, …, 3], where Hash is a keyed hash function defined as

$$
Hash: \{0,1\}^{128} \times \{0,1\}^{128} \times \{0,1\}^{8 \times L} \times \{6, 12\} \to \{0,1\}^{128}, \qquad (K, I_1, I_2, L) \mapsto A'_r([\tilde{K}], [E(I_2, L) +_{16} (A_r(K, I_1) \oplus_{16} I_1)])
$$

where the operator +16 denotes bytewise addition mod 256 of the 16 octets, and the operator ⊕16 denotes bytewise XORing of the 16 octets.


and where

$$
E: \{0,1\}^{8 \times L} \times \{6, 12\} \to \{0,1\}^{8 \times 16}, \qquad (X[0, \ldots, L-1], L) \mapsto (X[i \pmod{L}] \text{ for } i = 0 \ldots 15)
$$

is an expansion of the L octet word X into a 128-bit word. Thus we see that we have to evaluate the function Ar twice for each evaluation of E1. The key K~ for the second use of Ar (actually Ar’) is offset from K as follows:

$$
\begin{aligned}
\tilde{K}[0] &= (K[0] + 233) \bmod 256, & \tilde{K}[1] &= K[1] \oplus 229, \\
\tilde{K}[2] &= (K[2] + 223) \bmod 256, & \tilde{K}[3] &= K[3] \oplus 193, \\
\tilde{K}[4] &= (K[4] + 179) \bmod 256, & \tilde{K}[5] &= K[5] \oplus 167, \\
\tilde{K}[6] &= (K[6] + 149) \bmod 256, & \tilde{K}[7] &= K[7] \oplus 131, \\
\tilde{K}[8] &= K[8] \oplus 233, & \tilde{K}[9] &= (K[9] + 229) \bmod 256, \\
\tilde{K}[10] &= K[10] \oplus 223, & \tilde{K}[11] &= (K[11] + 193) \bmod 256, \\
\tilde{K}[12] &= K[12] \oplus 179, & \tilde{K}[13] &= (K[13] + 167) \bmod 256, \\
\tilde{K}[14] &= K[14] \oplus 149, & \tilde{K}[15] &= (K[15] + 131) \bmod 256.
\end{aligned}
$$

A data flowchart of the computation of E1 is depicted in figure 2.10. E1 is also used to deliver the parameter ACO (Authenticated Ciphering Offset) that is used in the generation of the ciphering key by E3. The value of ACO is formed by the octets 4 through 15 of the output of the hash function, i.e.

$$
ACO = Hash(K, RAND, address, 6)[4, \ldots, 15].
$$

Figure 2.10: Flow of data for the computation of E1.
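
The helper operations around the two cipher calls in E1 (the expansion E, the key offset producing K~, and the split of the hash output into SRES and ACO) can be sketched in Python as below. The function names are hypothetical, Ar and Ar’ themselves are not implemented, and octet 0 is assumed to sit at index 0 of each byte string.

```python
OFFSETS = [233, 229, 223, 193, 179, 167, 149, 131]


def expand(x: bytes, length: int) -> bytes:
    """E: repeat the L-octet word X cyclically to fill 16 octets (L is 6 or 12)."""
    if length not in (6, 12) or len(x) != length:
        raise ValueError("X must be 6 or 12 octets long")
    return bytes(x[i % length] for i in range(16))


def offset_key(k: bytes) -> bytes:
    """Derive the offset key from K for the second cipher call (Ar')."""
    if len(k) != 16:
        raise ValueError("K must be 16 octets")
    out = []
    for i in range(16):
        c = OFFSETS[i % 8]
        # Octets 0, 2, 4, 6, 9, 11, 13, 15 use addition mod 256; the rest use XOR.
        use_add = (i % 2 == 0) if i < 8 else (i % 2 == 1)
        out.append((k[i] + c) % 256 if use_add else k[i] ^ c)
    return bytes(out)


def split_hash_output(h: bytes):
    """Split the 128-bit hash output into SRES (octets 0..3) and ACO (octets 4..15)."""
    if len(h) != 16:
        raise ValueError("hash output must be 16 octets")
    return h[:4], h[4:16]
```

The same eight constants are used for both halves of the key; only the choice between addition and XOR is swapped between octets 0 to 7 and octets 8 to 15.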

2.4. SAFER+ ALGORITHM USED IN THE FUNCTIONS AR AND AR’:

2.4.1. HISTORY OF THE ALGORITHM:

SAFER+ was one of the candidates for the Advanced Encryption Standard (AES). SAFER stands for Secure And Fast Encryption Routine. The inventors of the algorithm are:

Prof. James L. Massey (Prof. Emeritus, ETH Zurich, Switzerland)
Prof. Gurgen H. Khachatrian (Academy of Sciences, Armenia)
Dr. Melsik K. Kuregian (Academy of Sciences, Armenia)

The owner of the algorithm is Cylink Corporation.

SAFER+ is based on the existing SAFER family of ciphers, which comprises the ciphers SAFER K-64, SAFER K-128, SAFER SK-64, SAFER SK-128, and SAFER SK-40. The block size of all the ciphers in the existing SAFER family is 64 bits, while the user-selected-key length is 40 or 64 or 128 bits as indicated in the name of the particular cipher.

The ciphers in the existing SAFER family are non-proprietary ciphers and were designed by Prof. James L. Massey of the ETH Zurich (Swiss Federal Institute of Technology, Zurich) at the request of Cylink Corporation. The first of these ciphers, SAFER K-64, was publicly announced at the Dec. 9-11, 1993, Fast Software Encryption workshop in Cambridge, England. The other ciphers in the SAFER family differ from SAFER K-64 only in the key schedules that they use and in the recommended number of encryption rounds to be used.

SAFER+ offers substantial improvements over the previous ciphers in the SAFER family, which is one of the grounds for choosing the name SAFER+ for this cipher. The name SAFER+ also serves to distinguish the proposed cipher from those in the existing SAFER family. The improvements incorporated in SAFER+ were developed by Massey together with Prof. Gurgen H. Khachatrian and Dr. Melsik K. Kuregian. SAFER+ provides for a block size of 128 bits for the plaintext and ciphertext and accommodates three different user-selected-key lengths, namely 128, 192 and 256 bits.

2.4.2. SPECIFICATION OF THE SAFER+ ALGORITHM:

The general encryption and decryption structure of the SAFER+ algorithm is shown in figure 2.11. As indicated in figure 2.11, the input for encryption is the plaintext block of 16 bytes. The plaintext block then passes through r rounds of encryption, where r is determined by the key length chosen for encryption in the following manner:

• if key length = 128 bits, then r = 8 rounds.
• if key length = 192 bits, then r = 12 rounds.
• if key length = 256 bits, then r = 16 rounds.

Two 16-byte round subkeys are used within each round of encryption. These round subkeys ($K_1, K_2, \ldots, K_{2r}$) are determined from the user-selected key K according to a key schedule described below. Finally, the last round subkey $K_{2r+1}$ is "added" to the block produced by the r rounds of encryption in the manner that bytes 1, 4, 5, 8, 9, 12, 13, and 16 are added together bit-by-bit modulo two (the bitwise "exclusive-or" operation) while bytes 2, 3, 6, 7, 10, 11, 14 and 15 are added together modulo 256 ("byte addition"). This "addition" of round subkey $K_{2r+1}$ constitutes the output transformation for SAFER+ encryption and produces the ciphertext block of 16 bytes.


As indicated in figure 2.11, the input for decryption is the ciphertext block of 16 bytes. Decryption begins with the input transformation that undoes the output transformation in the encryption process. In the input transformation, the round subkey $K_{2r+1}$ is "subtracted" from the ciphertext block in the manner that the round subkey bytes 1, 4, 5, 8, 9, 12, 13, and 16 are added bit-by-bit modulo two (the bitwise "exclusive-or" operation) to the corresponding ciphertext bytes (because modulo-two addition and subtraction coincide), while round subkey bytes 2, 3, 6, 7, 10, 11, 14 and 15 are subtracted modulo 256 ("byte subtraction") from the corresponding ciphertext bytes. The result of this "subtraction" is the same 16-byte block as was produced from the r rounds of encryption before the output transformation was applied. This block then passes through the r rounds of decryption, round 1 of which undoes round r of encryption, round 2 of which undoes round r-1 of encryption, ..., and round r of which undoes round 1 of encryption to produce the original plaintext block. Note that the round subkeys for decryption are the same as those for encryption but are used in reverse order.
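The mixed XOR/byte-addition step and its inverse can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function names and the sample subkey values are our own and do not come from the SAFER+ specification.

```python
# 1-based byte positions that are XORed; all other positions use
# addition (or subtraction) modulo 256.
XOR_POSITIONS = {1, 4, 5, 8, 9, 12, 13, 16}

def add_subkey(block, subkey):
    """Output transformation of encryption: 'add' a 16-byte subkey to a 16-byte block."""
    return [
        b ^ k if pos in XOR_POSITIONS else (b + k) % 256
        for pos, (b, k) in enumerate(zip(block, subkey), start=1)
    ]

def subtract_subkey(block, subkey):
    """Input transformation of decryption: undo add_subkey."""
    return [
        b ^ k if pos in XOR_POSITIONS else (b - k) % 256
        for pos, (b, k) in enumerate(zip(block, subkey), start=1)
    ]

block = list(range(16))                              # arbitrary sample block
subkey = [(17 * n + 200) % 256 for n in range(16)]   # arbitrary sample subkey
mixed = add_subkey(block, subkey)
restored = subtract_subkey(mixed, subkey)            # equals the original block
```

Because XOR is its own inverse, only the byte-addition positions need a distinct subtraction in the decryption direction.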

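The details of one round of encryption with SAFER+ are shown in figure 2.12. The first operation within round i, $1 \le i \le r$, is the "addition" of the round subkey $K_{2i-1}$ to the 16-byte round input in the manner that bytes 1, 4, 5, 8, 9, 12, 13, and 16 are added together bit-by-bit modulo two while bytes 2, 3, 6, 7, 10, 11, 14 and 15 are added together modulo 256. The 16-byte result of this "addition" is then processed by a nonlinear layer in the manner that the value x of byte j is converted to $45^x \bmod 257$ for bytes j = 1, 4, 5, 8, 9, 12, 13, and 16 (with the convention that when x = 128 then $45^{128} \bmod 257 = 256$ is represented by 0), while the value x of byte j is converted to $\log_{45}(x)$ for bytes j = 2, 3, 6, 7, 10, 11, 14 and 15 (with the convention that when x = 0 then the output $\log_{45}(0)$ is represented by 128). The round key $K_{2i}$ is then "added" to the output of the nonlinear layer in the manner that bytes 2, 3, 6, 7, 10, 11, 14 and 15 are added together bit-by-bit modulo two, while bytes 1, 4, 5, 8, 9, 12, 13, and 16 are added together modulo 256. The 16-byte result of this "addition"

$$x = [x_1, x_2, x_3, \ldots, x_{16}]$$

is then postmultiplied by the matrix M modulo 256 to give the 16-byte round output

$$y = [y_1, y_2, y_3, \ldots, y_{16}]$$

in the manner

$$y = xM$$

where M is the 16 x 16 matrix

$$
M = \left(\begin{array}{rrrrrrrrrrrrrrrr}
2 & 2 & 1 & 1 & 16 & 8 & 2 & 1 & 4 & 2 & 4 & 2 & 1 & 1 & 4 & 4 \\
1 & 1 & 1 & 1 & 8 & 4 & 2 & 1 & 2 & 1 & 4 & 2 & 1 & 1 & 2 & 2 \\
1 & 1 & 4 & 4 & 2 & 1 & 4 & 2 & 4 & 2 & 16 & 8 & 2 & 2 & 1 & 1 \\
1 & 1 & 2 & 2 & 2 & 1 & 2 & 1 & 4 & 2 & 8 & 4 & 1 & 1 & 1 & 1 \\
4 & 4 & 2 & 1 & 4 & 2 & 4 & 2 & 16 & 8 & 1 & 1 & 1 & 1 & 2 & 2 \\
2 & 2 & 2 & 1 & 2 & 1 & 4 & 2 & 8 & 4 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 4 & 2 & 4 & 2 & 16 & 8 & 2 & 1 & 2 & 2 & 4 & 4 & 1 & 1 \\
1 & 1 & 2 & 1 & 4 & 2 & 8 & 4 & 2 & 1 & 1 & 1 & 2 & 2 & 1 & 1 \\
2 & 1 & 16 & 8 & 1 & 1 & 2 & 2 & 1 & 1 & 4 & 4 & 4 & 2 & 4 & 2 \\
2 & 1 & 8 & 4 & 1 & 1 & 1 & 1 & 1 & 1 & 2 & 2 & 4 & 2 & 2 & 1 \\
4 & 2 & 4 & 2 & 4 & 4 & 1 & 1 & 2 & 2 & 1 & 1 & 16 & 8 & 2 & 1 \\
2 & 1 & 4 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 8 & 4 & 2 & 1 \\
4 & 2 & 2 & 2 & 1 & 1 & 4 & 4 & 1 & 1 & 4 & 2 & 2 & 1 & 16 & 8 \\
4 & 2 & 1 & 1 & 1 & 1 & 2 & 2 & 1 & 1 & 2 & 1 & 2 & 1 & 8 & 4 \\
16 & 8 & 1 & 1 & 2 & 2 & 1 & 1 & 4 & 4 & 2 & 1 & 4 & 2 & 4 & 2 \\
8 & 4 & 1 & 1 & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 1 & 2 & 1 & 4 & 2
\end{array}\right)
$$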


For instance, this operation gives

$$y_2 = 2x_1 + x_2 + x_3 + x_4 + 4x_5 + 2x_6 + x_7 + x_8 + x_9 + x_{10} + 2x_{11} + x_{12} + 2x_{13} + 2x_{14} + 8x_{15} + 4x_{16}$$

(where the arithmetic is modulo 256, i.e., normal byte arithmetic) as follows from the second column of the matrix M.
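As a minimal sketch of the linear layer, the postmultiplication y = xM modulo 256 can be written directly from the matrix. The function name `multiply_by_m` is our own; the matrix entries are the standard SAFER+ values, with row i holding the weights of input byte x_i.

```python
# Rows of the 16 x 16 SAFER+ matrix M.
M = [
    [2, 2, 1, 1, 16, 8, 2, 1, 4, 2, 4, 2, 1, 1, 4, 4],
    [1, 1, 1, 1, 8, 4, 2, 1, 2, 1, 4, 2, 1, 1, 2, 2],
    [1, 1, 4, 4, 2, 1, 4, 2, 4, 2, 16, 8, 2, 2, 1, 1],
    [1, 1, 2, 2, 2, 1, 2, 1, 4, 2, 8, 4, 1, 1, 1, 1],
    [4, 4, 2, 1, 4, 2, 4, 2, 16, 8, 1, 1, 1, 1, 2, 2],
    [2, 2, 2, 1, 2, 1, 4, 2, 8, 4, 1, 1, 1, 1, 1, 1],
    [1, 1, 4, 2, 4, 2, 16, 8, 2, 1, 2, 2, 4, 4, 1, 1],
    [1, 1, 2, 1, 4, 2, 8, 4, 2, 1, 1, 1, 2, 2, 1, 1],
    [2, 1, 16, 8, 1, 1, 2, 2, 1, 1, 4, 4, 4, 2, 4, 2],
    [2, 1, 8, 4, 1, 1, 1, 1, 1, 1, 2, 2, 4, 2, 2, 1],
    [4, 2, 4, 2, 4, 4, 1, 1, 2, 2, 1, 1, 16, 8, 2, 1],
    [2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 1, 1, 8, 4, 2, 1],
    [4, 2, 2, 2, 1, 1, 4, 4, 1, 1, 4, 2, 2, 1, 16, 8],
    [4, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 8, 4],
    [16, 8, 1, 1, 2, 2, 1, 1, 4, 4, 2, 1, 4, 2, 4, 2],
    [8, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 4, 2],
]

def multiply_by_m(x):
    """Return y = xM for a 16-byte row vector x, with arithmetic modulo 256."""
    return [sum(x[i] * M[i][j] for i in range(16)) % 256 for j in range(16)]

# The weights of y_2 are the second column of M (index 1 in 0-based Python).
y2_weights = [row[1] for row in M]

# A single nonzero input byte already influences all 16 output bytes.
y_from_x1 = multiply_by_m([1] + [0] * 15)
```

Since every entry of M is nonzero, changing any one input byte changes the contribution to every output byte, which is the diffusion property the round relies on.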

Figure 2.11: Encrypting and Decrypting structures of SAFER+ algorithm

Figure 2.12: The structure of SAFER+ algorithm encryption round i

The upper part of the encryption round, made of addition, XOR, exponential and logarithmic operations, aims to provide the "confusion" required in any encryption algorithm, whereas the matrix M aims to provide diffusion. See figure 2.13.


In the method of diffusion, the statistical structure of the plaintext is hidden by spreading out the influence of a single bit in the plaintext over a large number of bits in the ciphertext. In the method of confusion, the data transformations are designed to complicate the determination of the way in which the statistics of the ciphertext depend on the statistics of the plaintext.
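The exponential and logarithmic substitutions that supply the confusion in the SAFER+ round can be tabulated once and then applied byte by byte. The sketch below assumes the conventions stated for the encryption round (256 is stored as 0, and the logarithm of 0 is 128); the names `EXP`, `LOG` and `nonlinear_layer` are our own.

```python
# EXP[x] = 45^x mod 257, with the value 256 (reached at x = 128) stored as 0.
EXP = [pow(45, x, 257) % 256 for x in range(256)]

# LOG is the inverse table, so LOG[0] = 128 by the same convention.
LOG = [0] * 256
for x, value in enumerate(EXP):
    LOG[value] = x

# 1-based byte positions that use the exponential in an encryption round;
# the remaining positions use the logarithm.
EXP_POSITIONS = {1, 4, 5, 8, 9, 12, 13, 16}

def nonlinear_layer(block):
    """Apply the encryption-round nonlinear layer to a 16-byte block."""
    return [
        EXP[b] if pos in EXP_POSITIONS else LOG[b]
        for pos, b in enumerate(block, start=1)
    ]

sample = nonlinear_layer([128, 0] + [1] * 14)   # first two bytes hit the special cases
```

Because 45 generates the multiplicative group modulo 257, `EXP` is a permutation of the byte values, which is what makes the layer invertible in the decryption round.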

Figure 2.13: Diffusion and Confusion in SAFER+ encryption [confusion: xor – exp – add or add – log – xor; diffusion: 16 x 16 matrix M]

The details of one round of decryption with SAFER+ are shown in figure 2.14. The operations in the decryption round simply invert, in reverse order, the operations from the encryption round. Thus, the first operation in the decryption round is to postmultiply the 16-byte round input

$$y = [y_1, y_2, y_3, \ldots, y_{16}]$$

by the matrix $M^{-1}$, which is the modulo 256 inverse of M, to give the 16-byte result

$$x = [x_1, x_2, x_3, \ldots, x_{16}]$$

in the manner

$$x = yM^{-1}$$

The round subkey $K_{2r-2i+2}$ is then "subtracted" from x in the manner that the round subkey bytes 1, 4, 5, 8, 9, 12, 13, and 16 are subtracted modulo 256 from the corresponding bytes of x while round subkey bytes 2, 3, 6, 7, 10, 11, 14 and 15 are added bit-by-bit modulo 2 to the corresponding bytes of x. The 16-byte result of this "subtraction" is then processed nonlinearly in the manner that the value x of byte j is converted to $\log_{45}(x)$ for bytes j = 1, 4, 5, 8, 9, 12, 13, and 16 (again with the convention that when x = 0 then $\log_{45}(0)$ is represented by 128), while the value x of byte j is converted to $45^x \bmod 257$ for bytes j = 2, 3, 6, 7, 10, 11, 14 and 15 (again with the convention that when x = 128 then $45^{128} \bmod 257 = 256$ is represented by 0). The round subkey $K_{2r-2i+1}$ is then "subtracted" from the 16-byte result in the manner that the round subkey bytes 1, 4, 5, 8, 9, 12, 13, and 16 are added bit-by-bit modulo 2 to the corresponding input bytes while round subkey bytes 2, 3, 6, 7, 10, 11, 14 and 15 are subtracted modulo 256 from the corresponding input bytes to produce the 16-byte round output.

Figure 2.14: The structure for SAFER+ algorithm decryption round i

The $2r+1$ 16-byte SAFER+ round subkeys required for the r rounds and for the output transformation of encryption (which are the same as those required for the input transformation and the r rounds of decryption) are produced from the input key according to a key schedule that depends on the key length selected.

The key schedules of SAFER+ make use of 16-byte bias words to "randomize" the round subkeys produced. The required number of bias words is the same as the number $2r+1$ of round subkeys, i.e., this number is 17, 25 or 33 depending on whether the user-selected-key length is 128 bits, 192 bits or 256 bits, respectively. The first bias word, however, is a "dummy" word that is never used but is convenient to have defined for programming purposes.

Let $B_i$ denote the i-th bias word and let $B_{i,j}$ denote the j-th byte of this i-th bias word. For bias words $B_2, B_3, \ldots, B_{17}$, which are used in all the key schedules and are the only bias words needed for a 128-bit user-selected key, the bias bytes are computed in the following manner:

$$B_{i,j} = 45^{\left(45^{17i+j} \bmod 257\right)} \bmod 257$$

(where $B_{i,j}$ is represented as 0 in case this expression gives a value of 256 and where this expression applies for i = 2, 3, ..., 17 and j = 1, 2, ..., 16). The bias words $B_{18}, B_{19}, \ldots, B_{33}$, of which only the first eight are needed for a 192-bit user-selected key but all sixteen of which are needed for a 256-bit user-selected key, are computed in the following manner:

$$B_{i,j} = 45^{17i+j} \bmod 257$$

(where $B_{i,j}$ is represented as 0 in case this expression gives a value of 256 and where this expression applies for i = 18, 19, ..., 33 and j = 1, 2, ..., 16). Computing the bias words from the above formulas gives all the bias words from $B_2$ to $B_{33}$ as follows:

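B2 = 70, 151, 177, 186, 163, 183, 16, 10, 197, 55, 179, 201, 90, 40, 172, 100
B3 = 236, 171, 170, 198, 103, 149, 88, 13, 248, 154, 246, 110, 102, 220, 5, 61
B4 = 138, 195, 216, 137, 106, 233, 54, 73, 67, 191, 235, 212, 150, 155, 104, 160
B5 = 93, 87, 146, 31, 213, 113, 92, 187, 34, 193, 190, 123, 188, 153, 99, 148
B6 = 42, 97, 184, 52, 50, 25, 253, 251, 23, 64, 230, 81, 29, 65, 68, 143
B7 = 221, 4, 128, 222, 231, 49, 214, 127, 1, 162, 247, 57, 218, 111, 35, 202
B8 = 58, 208, 28, 209, 48, 62, 18, 161, 205, 15, 224, 168, 175, 130, 89, 44
B9 = 125, 173, 178, 239, 194, 135, 206, 117, 6, 19, 2, 144, 79, 46, 114, 51
B10 = 192, 141, 207, 169, 129, 226, 196, 39, 47, 108, 122, 159, 82, 225, 21, 56
B11 = 252, 32, 66, 199, 8, 228, 9, 85, 94, 140, 20, 118, 96, 255, 223, 215
B12 = 250, 11, 33, 0, 26, 249, 166, 185, 232, 158, 98, 76, 217, 145, 80, 210
B13 = 24, 180, 7, 132, 234, 91, 164, 200, 14, 203, 72, 105, 75, 78, 156, 53
B14 = 69, 77, 84, 229, 37, 60, 12, 74, 139, 63, 204, 167, 219, 107, 174, 244
B15 = 45, 243, 124, 109, 157, 181, 38, 116, 242, 147, 83, 176, 240, 17, 237, 131
B16 = 182, 3, 22, 115, 59, 30, 142, 112, 189, 134, 27, 71, 126, 36, 86, 241
B17 = 136, 70, 151, 177, 186, 163, 183, 16, 10, 197, 55, 179, 201, 90, 40, 172
B18 = 220, 134, 119, 215, 166, 17, 251, 244, 186, 146, 145, 100, 131, 241, 51, 239
B19 = 44, 181, 178, 43, 136, 209, 153, 203, 140, 132, 29, 20, 129, 151, 113, 202
B20 = 163, 139, 87, 60, 130, 196, 82, 92, 28, 232, 160, 4, 180, 133, 74, 246
B21 = 84, 182, 223, 12, 26, 142, 222, 224, 57, 252, 32, 155, 36, 78, 169, 152
B22 = 171, 242, 96, 208, 108, 234, 250, 199, 217, 0, 212, 31, 110, 67, 188, 236
B23 = 137, 254, 122, 93, 73, 201, 50, 194, 249, 154, 248, 109, 22, 219, 89, 150
B24 = 233, 205, 230, 70, 66, 143, 10, 193, 204, 185, 101, 176, 210, 198, 172, 30
B25 = 98, 41, 46, 14, 116, 80, 2, 90, 195, 37, 123, 138, 42, 91, 240, 6
B26 = 71, 111, 112, 157, 126, 16, 206, 18, 39, 213, 76, 79, 214, 121, 48, 104
B27 = 117, 125, 228, 237, 128, 106, 144, 55, 162, 94, 118, 170, 197, 127, 61, 175
B28 = 229, 25, 97, 253, 77, 124, 183, 11, 238, 173, 75, 34, 245, 231, 115, 35
B29 = 200, 5, 225, 102, 221, 179, 88, 105, 99, 86, 15, 161, 49, 149, 23, 7
B30 = 40, 1, 45, 226, 147, 190, 69, 21, 174, 120, 3, 135, 164, 184, 56, 207
B31 = 8, 103, 9, 148, 235, 38, 168, 107, 189, 24, 52, 27, 187, 191, 114, 247
B32 = 53, 72, 156, 81, 47, 59, 85, 227, 192, 159, 216, 211, 243, 141, 177, 255
B33 = 62, 220, 134, 119, 215, 166, 17, 251, 244, 186, 146, 145, 100, 131, 241, 51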
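The bias words need not be stored; they can be regenerated from the two formulas. The following is a small sketch of that computation, where the function name `bias_word` is our own and Python's three-argument `pow` performs the modular exponentiation.

```python
def bias_word(i):
    """Return the 16 bytes of bias word B_i for 2 <= i <= 33."""
    if not 2 <= i <= 33:
        raise ValueError("bias words are defined here for i = 2..33")
    word = []
    for j in range(1, 17):
        value = pow(45, 17 * i + j, 257)      # 45^(17i+j) mod 257
        if i <= 17:
            value = pow(45, value, 257)       # double exponentiation for B_2..B_17
        word.append(value % 256)              # a result of 256 is represented as 0
    return word

b2 = bias_word(2)
b18 = bias_word(18)
```

The final `% 256` only changes the single value 256, so it implements the "256 is represented as 0" convention without a special case.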

The key schedule for the 128-bit (or 16-byte) input key is diagrammed in figure 2.15. The necessary 17 round subkeys for the 8 rounds and the output transformation of encryption are produced in the following manner. The user-selected key itself is used as the first round subkey $K_1$ and is also loaded into the first 16 byte positions of a 17-byte key register. The last byte position of this register is loaded with the bit-by-bit modulo-two sum of the 16 bytes of the user-selected key. Each byte of the key register is then rotated leftwards by 3 bit positions. The second round subkey $K_2$ is then computed as the modulo 256 sum of the bytes in the 16-byte bias word $B_2$ with the bytes in byte positions 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17, respectively, of the key register. Each byte of the key register is then again rotated leftwards by 3 bit positions. The third round subkey $K_3$ is then computed as the modulo 256 sum of the bytes in the 16-byte bias word $B_3$ with the bytes in byte positions 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 1, respectively, of the key register. This process continues with leftwards rotation by 3 bit positions of the key register followed by addition of the appropriate bias word to the sixteen bytes of the key register located one byte position rightwards (with position 1 understood to be to the right of position 17) of those previously used, until the seventeenth round subkey $K_{17}$ has been produced as the modulo 256 sum of the bytes in the 16-byte bias word $B_{17}$ with the bytes in byte positions 17, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15, respectively, of the key register.
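The 128-bit key schedule described above can be sketched as follows. This is an illustrative software model only, not the hardware design of this project; the helper names (`rotl3`, `bias_word`, `key_schedule_128`) are our own, and the bias words are regenerated from the formula for B_2 to B_17.

```python
from functools import reduce
from operator import xor

def rotl3(byte):
    """Rotate one byte leftwards by 3 bit positions."""
    return ((byte << 3) | (byte >> 5)) & 0xFF

def bias_word(i):
    """Bias word B_i for i = 2..17 (a result of 256 is represented as 0)."""
    return [pow(45, pow(45, 17 * i + j, 257), 257) % 256 for j in range(1, 17)]

def key_schedule_128(key):
    """Return the 17 round subkeys K_1..K_17 (as lists of 16 bytes) for a 16-byte key."""
    if len(key) != 16:
        raise ValueError("expected a 16-byte key")
    register = list(key) + [reduce(xor, key)]      # 17th byte: XOR of all key bytes
    subkeys = [list(key)]                           # K_1 is the key itself
    for i in range(2, 18):
        register = [rotl3(b) for b in register]    # rotate every byte before each subkey
        bias = bias_word(i)
        # K_i uses register positions i, i+1, ... (1-based, wrapping after 17).
        subkeys.append(
            [(register[(i - 1 + j) % 17] + bias[j]) % 256 for j in range(16)]
        )
    return subkeys

zero_key_subkeys = key_schedule_128([0] * 16)
ones_key_subkeys = key_schedule_128([1] * 16)
```

With an all-zero key the register stays zero, so every subkey after the first reduces to the corresponding bias word, which gives a convenient sanity check.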

Figure 2.15: SAFER+ key schedule for 128 bit key

The key schedule for the 192-bit (or 24-byte) input key or the 256-bit (or 32-byte) input key is done in the same way, but more subkeys are produced.


2.4.3. THE USE OF SAFER+ ALGORITHM IN BLUETOOTH SECURITY:

SAFER+ has some weaknesses, such as:

• Poor diffusion of key material through the cipher when SAFER+ is used with 256-bit keys.
• Vulnerability to the meet-in-the-middle attack and the related-key attack.
• Very slow performance across platforms.

That is why it did not pass the first round of the AES selection. Yet it has some advantages, such as:

• Good security margin.
• Well suited to smart cards due to low RAM and ROM requirements.
• Support for on-the-fly subkey generation, with subkeys computable in any order.

Bluetooth uses it only for key generation and authentication, so these problems are of little consequence. Given its very low RAM and ROM requirements, SAFER+ is a good choice.

The Bluetooth specification adopts the SAFER+ algorithm in its Ar function and a slightly modified version of it in the Ar’ function. To meet the specification, SAFER+ is used in the following form:

• Encryption rounds only.
• 128-bit user-selected key.
• 8 encryption rounds.
• The M matrix formed by 4 rows of PHT blocks and permutations, where PHT is the Pseudo-Hadamard Transform:

$$\mathrm{PHT}(x, y) = [(2x + y) \bmod 256,\ (x + y) \bmod 256]$$

A slightly modified version, referred to as Ar’, is used in which the input of round 1 is added to the input of the 3rd round. This makes the modified version non-invertible and prevents the use of Ar’ (especially in E2x) as an encryption function. See figure 2.16.
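A single PHT block and its inverse are simple enough to write out directly. The sketch below follows the PHT formula given above; the inverse is derived by us from that formula and the function names are our own.

```python
def pht(x, y):
    """Pseudo-Hadamard Transform of two bytes: ((2x + y) mod 256, (x + y) mod 256)."""
    return (2 * x + y) % 256, (x + y) % 256

def inverse_pht(a, b):
    """Undo pht: with a = 2x + y and b = x + y, x = a - b and y = 2b - a (mod 256)."""
    return (a - b) % 256, (2 * b - a) % 256

pair = pht(1, 2)                    # (4, 3)
original = inverse_pht(*pair)       # (1, 2)
```

In hardware terms one PHT block costs two byte adders, which is why building the matrix M from four layers of such blocks is cheaper than a general 16 x 16 multiplication.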


Figure 2.16: The round of the functions Ar and Ar’


2.5. ENCRYPTION:

The establishment of a global or even personal electronic business environment is one of the most exciting challenges facing the telecommunications industry today. Confidence in that environment, or the lack of it, will be a crucial factor deciding its success or failure. Possibly, the most important techniques available to us to generate that confidence fall within the purview of cryptography. Cryptography is the science of using mathematics to encrypt and decrypt data.

2.5.1. INTRODUCTION TO CRYPTOGRAPHY:

Cryptography, to most people, is concerned with keeping communications private. Indeed, the protection of sensitive communications has been the emphasis of cryptography throughout much of its history. However, this is only one part of today’s cryptography. Encryption is the transformation of data into some unreadable form. Its purpose is to ensure privacy by keeping the information hidden from anyone for whom it is not intended, even those who can see the encrypted data. Decryption is the reverse of encryption; it is the transformation of encrypted data back into some intelligible form. Encryption and decryption require the use of some secret information, usually referred to as a key. Depending on the encryption mechanism used, the same key might be used for both encryption and decryption, while for other mechanisms, the keys used for encryption and decryption might be different.

Cryptography has a long and fascinating history. The most complete non-technical account of the subject is Kahn’s The Codebreakers. This book traces cryptography from its initial and limited use by the Egyptians some 4000 years ago, to the twentieth century where it played a crucial role in the outcome of both world wars. The predominant practitioners of the art were those associated with the military, the diplomatic service and government in general. Cryptography was used as a tool to protect national secrets and strategies. The proliferation of computers and communications systems in the 1960s brought with it a demand from the private sector for means to protect information in digital form and to provide security services. The Data Encryption Standard (DES) began with the work of Feistel at IBM in the early 1970s and was adopted in 1977 as a U.S. Federal Information Processing Standard for encrypting unclassified information. DES was the most well-known cryptographic mechanism in history and was the standard means for securing electronic commerce for many financial institutions around the world. The AES has replaced it now.

While modern cryptography is growing increasingly diverse, cryptography is fundamentally based on problems that are difficult to solve. A problem may be difficult because its solution requires some secret knowledge, such as decrypting an encrypted message or signing some digital document, or the problem may be hard because it is intrinsically difficult to complete. So as the field of cryptography has advanced, the dividing lines for what is and what is not cryptography have become blurred. A cryptanalyst attempts to compromise cryptographic mechanisms, and cryptology (from the Greek kryptos logos, meaning “hidden word”) is the discipline of cryptography and cryptanalysis combined.

CRYPTOGRAPHIC GOALS:

1. Confidentiality is a service used to keep the content of information from all but those authorized to have it. Secrecy is a term synonymous with confidentiality and privacy. There are numerous approaches to providing confidentiality, ranging from physical protection to mathematical algorithms that render data unintelligible.

2. Data integrity is a service that addresses the unauthorized alteration of data. To assure data integrity, one must have the ability to detect data manipulation by unauthorized parties. Data manipulation includes such things as insertion, deletion, and substitution.


3. Authentication is a service related to identification. This function applies to both entities and information itself. Two parties entering into a communication should identify each other. Information delivered over a channel should be authenticated as to origin, date of origin, data content, time sent, etc. For these reasons this aspect of cryptography is usually subdivided into two major classes: entity authentication and data origin authentication. Data origin authentication implicitly provides data integrity (for if a message is modified, the source has changed).

4. Non-repudiation is a service that prevents an entity from denying previous commitments or actions. When disputes arise due to an entity denying that certain actions were taken, a means to resolve the situation is necessary. For example, one entity may authorize the purchase of property by another entity and later deny such authorization was granted. A procedure involving a trusted third party is needed to resolve the dispute.

A fundamental goal of cryptography is to adequately address these four areas in both theory and practice. Cryptography is about the prevention and detection of cheating and other malicious activities.

PUBLIC KEY/PRIVATE KEY:

The most striking development in the history of cryptography came in 1976, when Whitfield Diffie and Martin Hellman published New Directions in Cryptography and introduced the concept of public key/private key systems. In their system, each person gets a pair of keys, one called the public key and the other called the private key. The public key does not have to be kept secret and can be widely published, whereas the private key is kept secret. The need for the sender and receiver to share secret information is therefore eliminated (since users only exchange their public keys, and the private key is never transmitted or shared). The only requirement is that public keys be associated with their users in a trusted and authentic manner, for example in a third-party database. Anyone can send a confidential message by just using public information, but the message can only be decrypted with a private key, which is in the sole possession of the intended recipient. Furthermore, public-key cryptography can be used not only for privacy (encryption), but also for authentication (digital signatures) and various other techniques.

Because the private key is always linked mathematically to the public key, it is theoretically possible to derive the private key from the public key. In the symmetric key arrangement, this could not happen. To defend against private key detection, the people who design key systems make the problem of deriving the private key from the public key as difficult as possible. Some public key/private key systems are designed such that deriving the private key from the public key requires the attacker to factor a very large number that is the product of two large primes. Since it is very difficult and time consuming to factor such numbers, many public key/private key ciphers use this technique. However, this technique results in public keys that must be very large compared to symmetric keys in order to ensure equal levels of security. Due to their size, public keys slow down the algorithm, making the public key/private key system a less attractive means of exchanging large amounts of data. Nevertheless, it is a good way to exchange small amounts, as would be the case in the exchange of symmetric keys.
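The link between factoring and the private key can be made concrete with a toy example. The text does not name a particular scheme, so RSA is assumed here purely for illustration, with deliberately tiny primes; real keys use primes hundreds of digits long. The sketch needs Python 3.8 or later for the modular inverse `pow(e, -1, phi)`.

```python
# Toy RSA key pair built from two tiny primes (illustrative values only).
p, q = 61, 53
n = p * q                       # public modulus (3233)
phi = (p - 1) * (q - 1)         # computable only if the factors p and q are known
e = 17                          # public exponent
d = pow(e, -1, phi)             # private exponent: inverse of e modulo phi

def encrypt(message):
    """Encrypt an integer message with the public key (e, n)."""
    return pow(message, e, n)

def decrypt(ciphertext):
    """Decrypt with the private key (d, n)."""
    return pow(ciphertext, d, n)

def factor(modulus):
    """Trial division: feasible here only because the modulus is tiny."""
    f = 2
    while modulus % f:
        f += 1
    return f, modulus // f

# An attacker who can factor n can rebuild the private exponent.
small, large = factor(n)
recovered_d = pow(e, -1, (small - 1) * (large - 1))

ciphertext = encrypt(65)
plaintext = decrypt(ciphertext)
```

The trial division loop finishes instantly for a 12-bit modulus; the security of such a system rests on that loop, and every better factoring method known, being impractical for moduli of realistic size.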

Figure 2.17a: Conventional (Secret Key) encryption

Figure 2.17b: Public key encryption


DISCUSSION OF KEY SCHEMES:

It is difficult to evaluate one key system against the other, since each system has its merits and shortcomings. Generally though, it is thought that symmetric key systems are faster and more secure, especially if the ‘man-in-the-middle attack’ can be eliminated, as it could be if symmetric keys were exchanged using a public key/private key system. In some situations, secret-key cryptography alone is sufficient. These include environments where secure distribution of the secret key can take place, for example, by users meeting in private. They also include environments where a single authority knows and manages all the keys (e.g., a closed banking system). The public key/private key system is certainly unnecessary in a single-user environment. For example, if an individual simply wants to encrypt personal files, a single password turned into a symmetric key will suffice. Together, the public key/private key and symmetric key systems and the algorithms they work with enable both individuals and organizations to ensure the secure storage and transfer of data. The algorithms that these key systems work with are integral to the production of the keys themselves, and it is the combination of key length and algorithm that determines the strength of a given encryption method. These algorithms fall under two categories.

CATEGORIES OF CIPHER: BLOCK AND STREAM

Today, all ciphers are either block ciphers or stream ciphers. The way in which block and stream ciphers differ from each other is described below.

1. Block Ciphers: A block cipher is a type of symmetric-key encryption algorithm that transforms a fixed-length block of plain text into a block of cipher text of the same length. This transformation takes place under the action of the symmetric key. Reversing the transformation on the cipher text block using the same key recovers the plain text. The fixed length is called the block size. Currently, the usual block size is 64 bits, but in coming years the less common 128-bit block size will become more prevalent as processors become more powerful. Because different plain text blocks are mapped to different cipher text blocks (to allow unique decryption), a block cipher effectively provides a permutation of the set of all possible messages. The permutation effected during encryption is secret, since it depends on the key. Iterated block ciphers encrypt a plain text block by a process that has several ‘rounds’, in which the data goes through the same transformation several times instead of just once. The number of rounds in an iterated block cipher depends on the desired security level and the consequent tradeoffs in performance. In most cases, an increased number of rounds will improve the security offered by a block cipher, but will reduce the speed at which encryption takes place. Examples include Blowfish and CAST.

2. Stream ciphers: Stream ciphers form an important class of symmetric key encryption schemes. They are, in one sense, very simple block ciphers having block length equal to one. What makes them useful is the fact that the encryption transformation can change for each symbol of plain text being encrypted. In situations where transmission errors are highly probable, stream ciphers are advantageous because they have no error propagation. They can also be used when the data must be processed one symbol at a time (e.g., if the equipment has no memory or buffering of data is limited). Stream ciphers can be designed to be exceptionally fast, much faster than any block cipher. While block ciphers operate on large blocks of data, stream ciphers typically operate on bits. The encryption of any particular plain text with a block cipher will result in the same cipher text when the same key is used. With a stream cipher, the transformation of these smaller plain text units will vary, depending on when they are encountered during the encryption process. A stream cipher generates a sequence of bits (the keystream), and encryption is provided by combining the keystream with the plaintext, usually with the bitwise XOR operation. The generation of the keystream can be independent of the plain text and cipher text (synchronous stream cipher) or it can depend on the data and its encryption (self-synchronizing). Most stream cipher designs are for synchronous stream ciphers. No particular stream cipher has emerged as a standard, but the most widely used stream cipher is RC4.
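The keystream-XOR principle of a synchronous stream cipher can be shown with a toy generator. The generator below is a plain linear congruential sequence chosen by us for brevity; it is not cryptographically secure and is unrelated to RC4 or to the Bluetooth cipher.

```python
def toy_keystream(seed, length):
    """Produce `length` keystream bytes from a seed (insecure, illustration only)."""
    state = seed
    stream = []
    for _ in range(length):
        state = (state * 1103515245 + 12345) % 2**31   # simple LCG step
        stream.append((state >> 16) & 0xFF)
    return bytes(stream)

def xor_with_keystream(data, seed):
    """Encrypt or decrypt: XOR each data byte with the matching keystream byte."""
    stream = toy_keystream(seed, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))

message = b"payload bits"
ciphertext = xor_with_keystream(message, seed=2001)
recovered = xor_with_keystream(ciphertext, seed=2001)   # same operation decrypts
```

Because the keystream does not depend on the data, a flipped ciphertext bit corrupts only the matching plaintext bit, which is the "no error propagation" property mentioned above.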

Bluetooth uses a stream cipher.

Figure 2.18: A taxonomy of cryptographic primitives.

CRYPTANALYSIS:

An expensive and formidable cryptanalytic attack could possibly be mounted by someone with vast supercomputer resources, such as a government intelligence agency. They might crack your public key by using some new secret mathematical breakthrough. But civilian academia has been intensively attacking public key cryptography without success since 1978.

FUTURE CRYPTOGRAPHY - CHANGING STANDARDS:

In 1952, the U.S. government established the National Security Agency (NSA). The NSA provides military and government data security and gathers information about other countries’ communications. The NSA and the National Institute of Standards and Technology (NIST) play major roles in developing cryptography standards. During the 1970s, IBM, NIST and the NSA developed an algorithm called the Data Encryption Standard, DES. This cipher has been a standard since 1977, with reviews leading to renewals every few years. It is a block cipher that encrypts data in 64-bit blocks. DES is a symmetric algorithm and uses a key length of 56 bits. The general consensus is that DES will not be strong enough for future encryption needs. NIST has endorsed a new standard, the Advanced Encryption Standard (AES), which will replace DES. The Advanced Encryption Standard uses key lengths of 128, 192, or 256 bits to encrypt 128-bit blocks, and thus provides increased security. The five candidates in the final round were:

• MARS by Zunic et al. (IBM).
• RC6 by Rivest, Robshaw and Yin (RSA Laboratories).
• Serpent by Anderson, Biham and Knudsen (Cambridge University).
• Twofish by Schneier et al. (Counterpane).
• Rijndael by Joan Daemen and Vincent Rijmen, which was declared the winner.

2.5.2. ENCRYPTION IN BLUETOOTH:

The data on the piconet channel is conveyed in packets. The general packet format is shown in figure 1.6. Each packet consists of 3 entities: the access code, the header, and the payload. In the figure, the number of bits per entity is indicated.

User information can be protected by encryption of the packet payload; the access code and the packet header are never encrypted. The encryption of the payloads is carried out with a stream cipher called E0 that is re-synchronized for every payload. Bluetooth does not use the link key to encrypt the data directly. Instead, it generates a new encryption key each time it enters the encryption mode. Then, a stream cipher algorithm is used. The overall principle is shown in figure 2.19 and figure 2.20. The encryption key is derived by E3 as explained in section 2.2.

Figure 2.19: E3 generates the encryption key

Figure 2.20: stream ciphering with E0

The stream cipher system E0 consists of three parts. One part performs the initialization (generation of the payload key), the second part generates the key stream bits, and the third part performs the encryption and decryption. The payload key generator is very simple: it merely combines the input bits in an appropriate order and shifts them into the four LFSRs used in the key stream generator. The main part of the cipher system is the second, as it will also be used for the initialization. The key stream bits are generated by a method derived from the summation stream cipher generator attributable to Massey and Rueppel. The method has been thoroughly investigated, and there exist good estimates of its strength with respect to presently known methods for cryptanalysis. However, the summation generator has weaknesses that can be used in so-called correlation attacks, the most important class of attacks on LFSR-based stream ciphers. Basically, if one can in some way detect a correlation between the known output sequence and the output of one individual LFSR, this can be used in a “divide-and-conquer” attack on the individual LFSR. The high re-synchronization frequency will disrupt such attacks. Also, Bluetooth uses high-weight LFSRs, which helps in defending against these attacks too. (James Massey was also one of the inventors, together with Xuejia Lai, of the International Data Encryption Algorithm, abbreviated IDEA, in 1990. IDEA is a block cipher that operates on 64-bit plaintext blocks; the key is 128 bits long.)
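For readers unfamiliar with LFSRs, the sketch below steps a single small register. It is a generic 4-bit Fibonacci LFSR with feedback polynomial x^4 + x + 1, chosen by us because its full period of 15 is easy to check by hand; it is not one of the four E0 registers, and it omits the summation combiner entirely.

```python
def lfsr_bits(state, taps, width, count):
    """Generate `count` output bits from a Fibonacci LFSR.

    state: initial register contents as an integer (must be nonzero)
    taps:  bit indices XORed together to form the feedback bit
    width: register length in bits
    """
    bits = []
    for _ in range(count):
        bits.append(state & 1)                       # output the lowest bit
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = (state >> 1) | (feedback << (width - 1))
    return bits

# Taps (0, 1) give the recurrence a[n+4] = a[n] XOR a[n+1], i.e. x^4 + x + 1.
sequence = lfsr_bits(state=0b0001, taps=(0, 1), width=4, count=30)
first_period = sequence[:15]
```

A correlation attack targets exactly this kind of register: if the combined keystream leaks the output of one LFSR, its short state can be searched independently of the others.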

Each Bluetooth transceiver is allocated a unique 48-bit Bluetooth device address (BD_ADDR). This address is derived from the IEEE 802 standard. This 48-bit address is divided into three fields (figure 2.21):

• LAP field: lower address part consisting of 24 bits
• UAP field: upper address part consisting of 8 bits
• NAP field: non-significant address part consisting of 16 bits

The LAP and UAP form the significant part of the BD_ADDR. The total address space obtained is $2^{32}$.
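The field split can be illustrated with a short Python sketch. It assumes the LAP occupies the 24 least significant bits, followed by the UAP and then the NAP; the function name `split_bd_addr` and the sample address are ours, chosen only for illustration.

```python
def split_bd_addr(bd_addr: int) -> dict:
    """Split a 48-bit BD_ADDR into LAP, UAP and NAP.

    Assumed layout: LAP = bits 0..23, UAP = bits 24..31, NAP = bits 32..47.
    """
    if not 0 <= bd_addr < (1 << 48):
        raise ValueError("BD_ADDR must fit in 48 bits")
    return {
        "LAP": bd_addr & 0xFFFFFF,        # lower address part, 24 bits
        "UAP": (bd_addr >> 24) & 0xFF,    # upper address part, 8 bits
        "NAP": (bd_addr >> 32) & 0xFFFF,  # non-significant part, 16 bits
    }


# Hypothetical address used only as an example.
fields = split_bd_addr(0x123456789ABC)
```

The significant part (UAP and LAP together) is 32 bits wide, which is where the address space of $2^{32}$ comes from.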

Figure 2.21: the Bluetooth Device Address

Every Bluetooth unit has an internal system clock, which determines the timing and hopping of the transceiver. The Bluetooth clock is derived from a free-running native clock, which is never adjusted and is never turned off. For synchronization with other units, only offsets are used that, added to the native clock, provide temporary Bluetooth clocks which are mutually synchronized. It should be noted that the Bluetooth clock has no relation to the time of day; it can therefore be initialized at any value. The Bluetooth clock provides the heartbeat of the Bluetooth transceiver. Its resolution is at least half the TX or RX slot length, or 312.5 µs. The clock has a cycle of about a day. If the clock is implemented with a counter, a 28-bit counter is required that wraps around at $2^{28} - 1$. The LSB ticks in units of 312.5 µs, giving a clock rate of 3.2 kHz.

The timing and the frequency hopping on the channel of a piconet are determined by the Bluetooth clock of the master. When the piconet is established, the master clock is communicated to the slaves. Each slave adds an offset to its native clock to be synchronized to the master clock. Since the clocks are free-running, the offsets have to be updated regularly. The clock determines critical periods and triggers the events in the Bluetooth receiver. Four periods are important in the Bluetooth system: 312.5 µs, 625 µs, 1.25 ms, and 1.28 s; these periods correspond to the timer bits CLK0, CLK1, CLK2, and CLK12, respectively, see figure 2.22. Master-to-slave transmission starts at the even-numbered slots when CLK0 and CLK1 are both zero. CLK is the master clock of the piconet. It is used for all timing and scheduling activities in the piconet. All Bluetooth devices use the CLK to schedule their transmission and reception.
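The clock arithmetic can be checked with a few lines of Python. This is only a sketch: the helper names are ours, and integer nanoseconds are used so that the 312.5 µs tick is represented exactly.

```python
TICK_NS = 312_500      # LSB tick of 312.5 µs, expressed in nanoseconds
COUNTER_BITS = 28      # width of the Bluetooth clock counter


def bit_interval_ns(n: int) -> int:
    """Interval associated with clock bit CLKn (the tick doubled n times)."""
    return TICK_NS << n


def wrap_seconds() -> float:
    """Time for the 28-bit counter to run through all of its values."""
    return (TICK_NS << COUNTER_BITS) / 1e9


def is_master_tx_slot_start(clk: int) -> bool:
    """Master-to-slave slots start when CLK0 and CLK1 are both zero."""
    return clk & 0b11 == 0
```

With these definitions, CLK0, CLK1, CLK2 and CLK12 give 312.5 µs, 625 µs, 1.25 ms and 1.28 s, and the counter wraps after 83886.08 s, roughly 23.3 hours, which matches the "about a day" cycle.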


Figure 2.22: Bluetooth clock

2.5.2.1. Encryption key size negotiation:

Each Bluetooth device implementing the baseband specification needs a parameter defining the maximal allowed key length $L_{max}$, $1 \le L_{max} \le 16$ (number of octets in the key). For each application, a number $L_{min}$ is defined indicating the smallest acceptable key size for that particular application. Before generating the encryption key, the involved units must negotiate to decide what key size to actually use. The master sends a suggested value, $L_{sug}^{(M)}$, to the slave. Initially, the suggested value is set to $L_{max}^{(M)}$. If $L_{min}^{(S)} \le L_{sug}^{(M)}$ and the slave supports the suggested length, the slave acknowledges and this value will be the length of the encryption key for this link. However, if the two conditions are not both fulfilled, the slave sends a new proposal, $L_{sug}^{(S)} < L_{sug}^{(M)}$, to the master. This value should be the largest among all supported lengths less than the previous master suggestion. Then, the master performs the corresponding test on the slave suggestion. This procedure is repeated until a key length agreement is reached, or one unit aborts the negotiation. An abortion may be caused by lack of support for $L_{sug}$ and all smaller key lengths, or by $L_{sug} < L_{min}$ in one of the units. In case of abortion, Bluetooth link encryption cannot be employed. The possibility of a failure in setting up a secure link is an unavoidable consequence of letting the application decide whether to accept or reject a suggested key size. However, this is a necessary precaution. Otherwise a fraudulent unit could enforce a weak protection on a link by claiming a small maximum key size.
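The negotiation can be sketched as a small Python loop. This is a simplified model, not the LMP implementation: each unit is described by a hypothetical dictionary holding the set of key lengths it supports (in octets) and its own minimum acceptable length, and the function name is ours.

```python
def negotiate_key_size(master: dict, slave: dict):
    """Return the agreed key length in octets, or None if negotiation aborts.

    Each unit is a dict with 'supported' (set of lengths) and 'l_min'.
    """
    units = [master, slave]
    proposer = 0                          # the master proposes first
    l_sug = max(master["supported"])      # initial suggestion is the master's maximum
    while True:
        responder = units[1 - proposer]
        # Accept if the suggestion is long enough and supported.
        if responder["l_min"] <= l_sug and l_sug in responder["supported"]:
            return l_sug
        # Otherwise counter-propose the largest supported length below l_sug.
        smaller = [l for l in responder["supported"] if l < l_sug]
        if not smaller or max(smaller) < responder["l_min"]:
            return None                   # no acceptable length left: abort
        l_sug = max(smaller)
        proposer = 1 - proposer


full = set(range(1, 17))
agreed = negotiate_key_size({"supported": full, "l_min": 5},
                            {"supported": set(range(1, 9)), "l_min": 7})
```

Because every counter-proposal is strictly smaller than the previous suggestion, the loop always terminates, either with an agreement or with an abort.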

2.5.2.2. Encryption modes:

If a slave has a semi-permanent link key (i.e. a combination key or a unit key), it can only accept encryption on slots individually addressed to itself (and, of course, in the reverse direction to the master). In particular, it will assume that broadcast messages are not encrypted. The possible traffic modes are described in table 2.3. When an entry in the table refers to a link key, it means that the encryption/decryption engine uses the encryption key derived from that link key.

If the slave has received a master key, there are three possible combinations as defined in table 2.4. In this case, all units in the piconet use a common link key, $K_{master}$. Since the master uses encryption keys derived from this link key for all secure traffic on the piconet, it is possible to avoid ambiguity in the participating slaves about which encryption key to use. Also in this case the default mode is that broadcast messages are not encrypted. A specific LM-command is required to activate encryption both for broadcast and for individually addressed traffic. The master can issue an LM-command to the slaves telling them to fall back to their previous semi-permanent link key. Then, regardless of the previous mode they were in, they will end up in the first row of table 2.3, i.e. no encryption.


Broadcast traffic | Individually addressed traffic
No encryption | No encryption
No encryption | Encryption, semi-permanent link key

Table 2.3: Possible traffic modes for a slave using a semi-permanent link key.

Broadcast traffic | Individually addressed traffic
No encryption | No encryption
No encryption | Encryption, $K_{master}$
Encryption, $K_{master}$ | Encryption, $K_{master}$

Table 2.4: Possible encryption modes for a slave in possession of a master key.

Bluetooth also distinguishes between:

• Trusted Device: The device has been previously authenticated, a link key is stored and the device is marked as "trusted" in the Device Database.
• Untrusted Device: The device has been previously authenticated, a link key is stored but the device is not marked as "trusted" in the Device Database.
• Unknown Device: No security information is available for this device. This is also an untrusted device.

2.5.2.3. Encryption concept:

For the encryption routine, a stream cipher algorithm will be used in which ciphering bits are bit-wise modulo-2 added to the data stream to be sent over the air interface. The payload is ciphered after the CRC bits are appended, but prior to the FEC encoding (figure 2.23).

Figure 2.23: Encryption comes in between CRC & FEC

Each packet payload is ciphered separately. The cipher algorithm E0 uses the master Bluetooth address, 26 bits of the master real-time clock ($CLK_{26-1}$) and the encryption key $K_c$ as input, see figure 2.24 (where it is assumed that unit A is the master). The encryption key $K_c$ is derived from the current link key, COF, and a random number, EN_RAND$_A$. The random number is issued by the master before entering encryption mode. Note that EN_RAND$_A$ is publicly known since it is transmitted as plain text over the air. Within the E0 algorithm, the encryption key $K_c$ is modified into another key denoted $K'_c$. The maximum effective size of this key is factory preset and may be set to any multiple of eight bits between 8 and 128 (one to sixteen octets). The procedure for deriving this key will be described later. The real-time clock is incremented for each slot. The E0 algorithm is re-initialized at the start of each new packet (i.e. for master-to-slave as well as for slave-to-master transmission). By using $CLK_{26-1}$, at least one bit is changed between two transmissions. Thus, a new keystream is generated after each re-initialization. For packets covering more than a single slot, the Bluetooth clock as found in the first slot is used for the entire packet. The encryption algorithm E0 generates a binary keystream, $K_{cipher}$, which is modulo-2 added to the data to be encrypted. The cipher is symmetric; decryption is performed in exactly the same way using the same key as used for encryption.
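The symmetry of modulo-2 addition is easy to see in a short Python sketch. The keystream below is an arbitrary stand-in, not real E0 output, and the function name is ours.

```python
def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """Modulo-2 add (XOR) the keystream to the data, byte by byte."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Placeholder keystream for illustration only; a real one would come from E0.
keystream = bytes(range(1, 9))
plaintext = b"payload!"
ciphertext = xor_with_keystream(plaintext, keystream)
recovered = xor_with_keystream(ciphertext, keystream)
```

Applying the same operation twice with the same keystream returns the original payload, which is why the transmitter and the receiver can share one circuit for encryption and decryption.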

Figure 2.24: Functional description of the encryption procedure

2.5.2.4. Encryption algorithm:

The system uses linear feedback shift registers (LFSRs) whose output is combined by a summation combiner. The output of this combiner is the key stream sequence or, during the initialization phase, the randomized initial start value. The algorithm is presented with an encryption key $K_c$, a 48-bit Bluetooth address and the master clock bits $CLK_{26-1}$. Figure 2.25 shows the setup. There are four LFSRs (LFSR1, LFSR2, LFSR3 and LFSR4) of lengths $L_1 = 25$, $L_2 = 31$, $L_3 = 33$ and $L_4 = 39$, with feedback polynomials as specified in table 2.5. The total length of the registers is 128. These polynomials are all primitive. The Hamming weight of all the feedback polynomials is chosen to be five, a reasonable trade-off between reducing the number of required XOR gates in the hardware realization and obtaining good statistical properties of the generated sequences. The Hamming weight is the number of non-zero symbols in a symbol sequence; for binary signaling, it is the number of "1" bits in the binary sequence.

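$i$ | $L_i$ | Feedback $f_i(t)$
1 | 25 | $t^{25} + t^{20} + t^{12} + t^{8} + 1$
2 | 31 | $t^{31} + t^{24} + t^{16} + t^{12} + 1$
3 | 33 | $t^{33} + t^{28} + t^{24} + t^{4} + 1$
4 | 39 | $t^{39} + t^{36} + t^{28} + t^{4} + 1$

Table 2.5: The four primitive feedback polynomials.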
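The properties quoted for table 2.5 (total length 128, Hamming weight five) can be checked with a short Python sketch. The `lfsr_step` function is an illustrative model only: it assumes that the feedback bit is the XOR of the register positions named by the non-zero exponents, counted from 1 at the input side, and that the register shifts one place away from the input on every clock.

```python
# Exponents of the non-zero terms of each feedback polynomial (table 2.5).
FEEDBACK_EXPONENTS = {
    1: (25, 20, 12, 8, 0),
    2: (31, 24, 16, 12, 0),
    3: (33, 28, 24, 4, 0),
    4: (39, 36, 28, 4, 0),
}


def lfsr_length(i: int) -> int:
    """Register length equals the degree of the feedback polynomial."""
    return max(FEEDBACK_EXPONENTS[i])


def hamming_weight(i: int) -> int:
    """Number of non-zero coefficients of the feedback polynomial."""
    return len(FEEDBACK_EXPONENTS[i])


def lfsr_step(state: list, i: int, in_bit: int = 0) -> list:
    """Clock LFSR i once (assumed convention, see the text above)."""
    feedback = in_bit
    for position in FEEDBACK_EXPONENTS[i]:
        if position > 0:
            feedback ^= state[position - 1]
    return [feedback] + state[:-1]


total_length = sum(lfsr_length(i) for i in FEEDBACK_EXPONENTS)
```

Four XOR taps per register is what keeps the gate count low in a hardware realization.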


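Figure 2.25: Concept of the encryption engine.

Let $x_t^i$ denote the $t$-th symbol of LFSR$_i$. From the four-tuple $x_t^1$, $x_t^2$, $x_t^3$ and $x_t^4$ we derive the value $y_t$ as

$$y_t = x_t^1 + x_t^2 + x_t^3 + x_t^4$$

where the sum is over the integers. Thus $y_t$ can take the values 0, 1, 2, 3, or 4. The output of the summation generator is now given by the following equations, as stated in the baseband specification:

$$z_t = x_t^1 \oplus x_t^2 \oplus x_t^3 \oplus x_t^4 \oplus c_t^0 \in \{0, 1\}$$

$$s_{t+1} = (s_{t+1}^1, s_{t+1}^0) = \left\lfloor \frac{y_t + c_t}{2} \right\rfloor \in \{0, 1, 2, 3\}$$

$$c_{t+1} = (c_{t+1}^1, c_{t+1}^0) = s_{t+1} \oplus T_1[c_t] \oplus T_2[c_{t-1}]$$

where $T_1[\cdot]$ and $T_2[\cdot]$ are two different linear bijections over GF(4). Suppose GF(4) is generated by the irreducible polynomial $x^2 + x + 1$, and let $\alpha$ be a zero of this polynomial in GF(4). The mappings $T_1$ and $T_2$ are now defined as

$$T_1: GF(4) \to GF(4), \quad x \mapsto x$$

$$T_2: GF(4) \to GF(4), \quad x \mapsto (\alpha + 1)x$$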


We can write the elements of GF(4) as binary vectors. This is summarized in table 2.6:

$x$ | $T_1[x]$ | $T_2[x]$
00 | 00 | 00
01 | 01 | 11
10 | 10 | 01
11 | 11 | 10

Table 2.6: The mappings of T1 and T2
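Table 2.6 can be reproduced with a small Python sketch that multiplies in GF(4) modulo $x^2 + x + 1$ and compares the result with an XOR-only form of $T_2$. The `combiner_step` function goes one step further and is an assumption on our part: it models one update of the summation combiner as we understand it from the baseband specification, where the output bit is the parity of the four LFSR bits XORed with the low bit of $c_t$, and the next blend value is $\lfloor (y_t + c_t)/2 \rfloor \oplus T_1[c_t] \oplus T_2[c_{t-1}]$. All function names are ours.

```python
def gf4_mul(a: int, b: int) -> int:
    """Multiply two GF(4) elements (2-bit integers) modulo x^2 + x + 1."""
    product = 0
    for shift in range(2):
        if (b >> shift) & 1:
            product ^= a << shift
    if product & 0b100:          # reduce a degree-2 term
        product ^= 0b111
    return product


def t1(x: int) -> int:
    """T1 is the identity mapping."""
    return x


def t2(x: int) -> int:
    """T2 multiplies by (alpha + 1), i.e. by the element 0b11."""
    return gf4_mul(x, 0b11)


def t2_xor(x: int) -> int:
    """XOR-gate form of T2: (x1, x0) -> (x0, x1 XOR x0)."""
    x1, x0 = (x >> 1) & 1, x & 1
    return (x0 << 1) | (x1 ^ x0)


def combiner_step(x_bits, c_t: int, c_prev: int):
    """One assumed combiner update; returns (z_t, c_next)."""
    y = sum(x_bits)                       # integer sum, 0..4
    z = (y & 1) ^ (c_t & 1)               # parity of the bits XOR low bit of c_t
    s_next = (y + c_t) // 2               # value in 0..3
    c_next = s_next ^ t1(c_t) ^ t2(c_prev)
    return z, c_next
```

Both forms of $T_2$ give the column 00, 11, 01, 10 of the table, which confirms that the mapping needs only a wire swap and a single XOR gate.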

Since the mappings are linear, we can realize them using XOR gates; i.e.

$$T_1: (x_1, x_0) \mapsto (x_1, x_0)$$

$$T_2: (x_1, x_0) \mapsto (x_0, x_1 \oplus x_0)$$

The operation of the cipher:

Figure 2.26 gives an overview of the operation in time. The encryption algorithm shall run through the initialization phase before the start of transmitting or receiving a new packet. Thus, for multi-slot packets the cipher is initialized using the clock value of the first slot in the multi-slot sequence.

Figure 2.26: Overview of the operation of the encryption engine. Between each start of a packet (TX or RX), the LFSRs are re-initialized.

2.5.2.5 LFSR initialization:

The key stream generator needs to be loaded with an initial value for the four LFSRs (in total 128 bits) and the 4 bits that specify the values of $c_0$ and $c_{-1}$. The 132-bit initial value is derived from four inputs by using the key stream generator itself. The input parameters are the key $K_c$, a 128-bit random number RAND, a 48-bit Bluetooth address, and the 26 master clock bits $CLK_{26-1}$. The effective length of the encryption key can vary between 8 and 128 bits. Note that the actual key length as obtained from E3 is 128 bits. Then, within E0, the key length is reduced by a modulo operation between $K_c$ and a polynomial of desired degree. After reduction, the result is encoded with a block code in order to distribute the starting states more uniformly.

When the encryption key has been created, the LFSRs are loaded with their initial values. Then, 200 stream cipher bits are created by operating the generator. Of these bits, the last 128 are fed back into the key stream generator as an initial value of the four LFSRs. The values of $c_t$ and $c_{t-1}$ are kept. From this point on, when clocked, the generator produces the encryption (decryption) sequence, which is bitwise XORed to the transmitted (received) payload data.


In the following, we will denote octet i of a binary sequence X by the notation X[i]. We define bit 0 of X to be the LSB. Then, the LSB of X[i] corresponds to bit 8i of the sequence X, and the MSB of X[i] is bit 8i+7 of X. For instance, bit 24 of the Bluetooth address is the LSB of ADR[3].
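The octet and bit numbering can be expressed as two one-line helpers in Python. This is a sketch in which a binary sequence is held in an integer with bit 0 as the LSB; the helper names are ours.

```python
def octet(x: int, i: int) -> int:
    """Octet i of the sequence x: bits 8i (LSB) to 8i+7 (MSB)."""
    return (x >> (8 * i)) & 0xFF


def bit(x: int, n: int) -> int:
    """Bit n of the sequence x, with bit 0 as the LSB."""
    return (x >> n) & 1


# An address with only bit 24 set: that bit is the LSB of octet 3.
adr = 1 << 24
lsb_of_adr3 = octet(adr, 3) & 1
```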

The details of the initialization are as follows:

1. Create the encryption key to use from the 128-bit secret key $K_c$ and the 128-bit publicly known EN_RAND. Let $L$, $1 \le L \le 16$, be the effective key length in number of octets. The resulting encryption key $K'_c$ will be:

   $$K'_c(x) = g_2^{(L)}(x) \left( K_c(x) \bmod g_1^{(L)}(x) \right)$$

   where $\deg(g_1^{(L)}(x)) = 8L$ and $\deg(g_2^{(L)}(x)) \le 128 - 8L$. The polynomials are defined in table 2.7.

2. Shift in the 3 inputs $K'_c$, the Bluetooth address, the clock, and the six-bit constant 111001 into the LFSRs. In total 208 bits are shifted in.

   a. Open all feedback switches.
   b. Arrange input bits as shown in figure 2.27. Set the content of all shift register elements to zero. Set t=0.
   c. Start shifting bits into the LFSRs. The rightmost bit at each level of figure 2.27 is the first bit to enter the corresponding LFSR.
   d. When the first input bit at level i reaches the rightmost position of LFSRi, close the switch of this LFSR.
   e. At t=39 (when the switch of LFSR4 is closed), reset both blend registers, $c_{39} = c_{39-1} = 0$. Up to this point, the content of $c_t$ and $c_{t-1}$ has been of no concern. However, from this moment forward their content will be used in computing the output sequence.
   f. From now on output symbols are generated. The remaining input bits are continuously shifted into their corresponding shift register. When the last bit has been shifted in, the shift register is clocked with input = 0.

   Note: When finished, LFSR1 has effectively clocked 30 times with feedback closed, LFSR2 has clocked 24 times, LFSR3 has clocked 22 times, and LFSR4 has effectively clocked 16 times with feedback closed.

3. To mix initial data, continue to clock until 200 symbols have been produced with all switches closed (t=239).

4. Keep blend registers $c_t$ and $c_{t-1}$, and make a parallel load of the last 128 generated bits into the LFSRs according to figure 2.28 at t=240. After the parallel load, the blend register contents will be updated for each subsequent clock.
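The key reduction in step 1 is ordinary polynomial arithmetic over GF(2), which can be sketched in Python with integers used as bit vectors. The polynomials in the usage lines are illustrative placeholders with the right degrees for L = 1; they are not claimed to be the entries of table 2.7, and the function names are ours.

```python
def gf2_mod(a: int, m: int) -> int:
    """Remainder of polynomial a modulo polynomial m over GF(2)."""
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a


def gf2_mul(a: int, b: int) -> int:
    """Carry-less product of two polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_key(kc: int, g1: int, g2: int) -> int:
    """K'c(x) = g2(x) * (Kc(x) mod g1(x))."""
    return gf2_mul(g2, gf2_mod(kc, g1))


# Placeholder polynomials for L = 1: deg(g1) = 8, deg(g2) = 120 <= 128 - 8.
G1_EXAMPLE = 0x11D
G2_EXAMPLE = (1 << 120) | 1
kc_example = (1 << 128) - 1
k_prime = reduce_key(kc_example, G1_EXAMPLE, G2_EXAMPLE)
```

The remainder modulo $g_1^{(L)}$ has fewer than 8L significant bits, so only $2^{8L}$ distinct values of $K'_c$ exist, and the multiplication by $g_2^{(L)}$ spreads them over the full 128-bit register width. The bit count in step 2 also checks out: 128 + 48 + 26 + 6 = 208.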

In figure 2.27, all bits are shifted into the LFSRs, starting with the least significant bit (LSB). For instance, from the third octet of the address, ADR[2], first ADR$_{16}$ is entered, followed by ADR$_{17}$, etc. Furthermore, CL$_0$ corresponds to CLK$_1$, …, CL$_{25}$ corresponds to CLK$_{26}$.

Note that the output symbols $x_t^1$, $x_t^2$, $x_t^3$ and $x_t^4$ are taken from the positions 24, 24, 32, and 32 for LFSR1, LFSR2, LFSR3, and LFSR4, respectively (counting the leftmost position as number 1).

In figure 2.28, the 128 binary output symbols $Z_0, \ldots, Z_{127}$ are arranged in octets denoted Z[0], ..., Z[15]. The LSB of Z[0] corresponds to the first of these symbols; the MSB of Z[15] is the latest output from the generator. These bits shall be loaded into the LFSRs according to the figure. It is a parallel load and no update of the blend registers is done. The first output symbol is generated at the same time. The octets are written into the registers with the LSB in the leftmost position (i.e. the opposite of before). For example, $Z_{24}$ is loaded into position 1 of LFSR4.


Table 2.7: Polynomials used when creating K’c. All polynomials are in hexadecimal notation. The LSB is in the rightmost position.

Figure 2.27: Arranging the input to the LFSRs.


Figure 2.28: Distribution of the 128 last generated output symbols within the LFSRs.

2.5.2.6 Key stream sequence:

When the initialization is finished, the output from the summation combiner is used for encryption/decryption. The first bit to use is the one produced at the parallel load, i.e. at t=240. The circuit is run for the entire length of the current payload. Then, before the reverse direction is started, the entire initialization process is repeated with updated values on the input parameters.

Sample data of the encryption output sequence can be found in “Appendix IV” of the Bluetooth specification, Encryption Sample Data. A necessary condition for all Bluetooth-compliant implementations is to produce these encryption streams for identical initialization values.

2.5.3 ENCRYPTION IN BLUETOOTH FROM THE LMP POINT OF VIEW:

If at least one authentication has been performed, encryption may be used. If the master wants all slaves in the piconet to use the same encryption parameters, it must issue a temporary key ($K_{master}$) and make this key the current link key for all slaves in the piconet before encryption is started. This is necessary if broadcast packets should be encrypted.

First of all, the master and the slave must agree upon whether to use encryption or not, and whether encryption shall apply only to point-to-point packets or to both point-to-point packets and broadcast packets. If master and slave agree on the encryption mode, the master continues to give more detailed information about the encryption.

The next step is to determine the size of the encryption key. The master sends LMP_encryption_key_size_req including the suggested key size $L_{sug,M}$, which is initially equal to $L_{max,M}$. If $L_{min,S} \le L_{sug,M}$ and the slave supports $L_{sug,M}$, it responds with LMP_accepted and $L_{sug,M}$ will be used as the key size. If the two conditions are not both fulfilled, the slave sends back LMP_encryption_key_size_req including the slave’s suggested key size $L_{sug,S}$. This value is the slave’s largest supported key size that is less than $L_{sug,M}$. Then the master performs the corresponding test on the slave’s suggestion. This procedure is repeated until a key size agreement is reached or it becomes clear that no such agreement can be reached. If an agreement is reached, a unit sends LMP_accepted and the key size in the last LMP_encryption_key_size_req will be used. After this, the encryption is started. If an agreement is not reached, a unit sends LMP_not_accepted with the reason code "Unsupported parameter value" and the units are not allowed to communicate using Bluetooth link encryption.

Finally, encryption is started. The master issues the random number EN_RAND and

calculates the encryption key as Kc = E3 (current link key, EN_RAND, COF). The random number must be the same for all slaves if the piconet should support encrypted broadcast. Then the master sends LMP_start_encryption_req, which includes EN_RAND. The slave calculates Kc when this message is received and acknowledges with LMP_accepted. On both sides, Kc and EN_RAND are used as input to the encryption algorithm E0.

Before starting encryption, higher-layer data traffic must be temporarily stopped to prevent reception of corrupt data. The start of encryption will be done in three steps:

1. Master is configured to transmit unencrypted packets, but to receive encrypted packets.
2. Slave is configured to transmit and receive encrypted packets.
3. Master is configured to transmit and receive encrypted packets.

Between step 1 and step 2, master-to-slave transmission is possible. This is when LMP_start_encryption_req is transmitted. Step 2 is triggered when the slave receives this message. Between step 2 and step 3, slave-to-master transmission is possible. This is when LMP_accepted is transmitted. Step 3 is triggered when the master receives this message.
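The reason for the three-step order can be seen in a small Python model. It is a sketch only: each unit is reduced to two flags, and a message is taken to be readable when the sender's transmit setting matches the receiver's receive setting. The class and function names are ours.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    tx_encrypted: bool = False
    rx_encrypted: bool = False


def readable(sender: Unit, receiver: Unit) -> bool:
    """A packet is readable if both ends agree on whether it is encrypted."""
    return sender.tx_encrypted == receiver.rx_encrypted


def start_encryption(master: Unit, slave: Unit) -> list:
    """Walk through the three steps and record whether each message gets through."""
    log = []
    # Step 1: master still transmits in the clear, but expects encrypted replies.
    master.rx_encrypted = True
    log.append(("LMP_start_encryption_req", readable(master, slave)))
    # Step 2: slave switches both directions on receiving the request.
    slave.tx_encrypted = slave.rx_encrypted = True
    log.append(("LMP_accepted", readable(slave, master)))
    # Step 3: master switches its transmitter on receiving LMP_accepted.
    master.tx_encrypted = True
    log.append(("data master-to-slave", readable(master, slave)))
    return log


trace = start_encryption(Unit(), Unit())
```

Every message in the trace is readable by its receiver, which would not be the case if the master switched both directions at once before sending its request.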

Before stopping encryption, higher-layer data traffic must be temporarily stopped to prevent reception of corrupt data. Stopping of encryption is then done in three steps, similar to the procedure for starting encryption:

1. Master is configured to transmit encrypted packets, but to receive unencrypted packets.
2. Slave is configured to transmit and receive unencrypted packets.
3. Master is configured to transmit and receive unencrypted packets.

Between step 1 and step 2, master-to-slave transmission is possible. This is when LMP_stop_encryption_req is transmitted. Step 2 is triggered when the slave receives this message. Between step 2 and step 3, slave-to-master transmission is possible. This is when LMP_accepted is transmitted. Step 3 is triggered when the master receives this message.

If the encryption mode, encryption key or encryption random number need to be changed,

encryption must first be stopped and then re-started with the new parameters.


2.6. BLUETOOTH RANDOM NUMBER GENERATOR:

Each Bluetooth unit has a random number generator. Random numbers are used for many purposes within the security functions, for instance for the challenge-response scheme and for generating authentication and encryption keys. Ideally, a true random generator based on some physical process with inherent randomness is used. Examples of such processes are thermal noise from a semiconductor or resistor and the frequency instability of a free-running oscillator. For practical reasons, a software-based solution with a pseudo-random generator is probably preferable. In general, it is quite difficult to classify the randomness of a pseudo-random sequence. Within Bluetooth, the requirements placed on the random numbers used are that they be non-repeating and randomly generated.

The expression ‘non-repeating’ means that it shall be highly unlikely that the value should repeat itself within the lifetime of the authentication key. For example, a non-repeating value could be the output of a counter that is unlikely to repeat during the lifetime of the authentication key, or a date/time stamp. The expression ‘randomly generated’ means that it shall not be possible to predict its value with a chance that is significantly larger than 0 (e.g., greater than $1/2^L$ for a key length of $L$ bits).

Clearly, the LM can use such a generator for various purposes; i.e. whenever a random

number is needed (such as the RANDs, the unit keys, Kinit , Kmaster , and random back-off or waiting intervals).

We have found that we need 7 different RAND numbers. They are:

1. IN_RAND to be used in the generation of Kinit with the E22 algorithm.
2. AU_RAND to be used during authentication in E1.
3. LK_RAND for the generation of the combination key with the E21 algorithm.
4. EN_RAND for the generation of the encryption key with E3.
5. RAND1 to be used in the generation of the master key.
6. RAND2 to be used in the generation of the master key.
7. RAND to be transmitted during the generation of the master key.
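In software, one way to obtain values that are both unpredictable and very unlikely to repeat is to draw them from the operating system's cryptographic random source. The Python sketch below uses the standard `secrets` module for this; it is our illustration rather than anything prescribed by the specification, the function name `new_rand` is ours, and the 128-bit default is an assumption based on the 128-bit random numbers mentioned earlier in this chapter.

```python
import secrets


def new_rand(n_bits: int = 128) -> bytes:
    """Return n_bits of cryptographically strong random data."""
    if n_bits <= 0 or n_bits % 8:
        raise ValueError("n_bits must be a positive multiple of 8")
    return secrets.token_bytes(n_bits // 8)


# Example: a fresh value that could play the role of EN_RAND.
en_rand = new_rand()
```

A hardware design would replace this call with a physical noise source or an on-chip pseudo-random generator, but the two requirements stay the same.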

However, we did not find any specification of such a random number generator: no design principle, no method for obtaining the seed, and so on. We think Bluetooth simply intends to increase its security to some extent by keeping it secret.


INTRODUCTION TO VHDL AND DIGITAL DESIGN

VHDL…Easy to Learn Hard to Master

The technology of integrated circuits is rapidly changing; possibilities are growing at an incredible rate. Due to the increase in functionality and the drop in prices, the number of application areas is increasing rapidly. More and more powerful circuits can be constructed because the number of transistors that can be created on a chip is still growing. According to the well-known Moore’s law, the maximum number of transistors that can be created on a memory chip will double every 18 to 24 months. It is expected that this growth rate will continue at least until 2010. By that time we can expect memory chips with up to 64 billion bits of storage capacity and processors with up to 90 million transistors per cm² with chip sizes of 620 mm². Designing such large and complex microprocessors requires a tremendous amount of effort. In 1996 Intel invested 1.8 billion dollars in Research and Development.

Obviously, such large designs cannot be made without the help of automation. Special

computer programs are used to accomplish the tedious tasks. But these programs should become increasingly powerful. Motorola’s M68000 processor contains 68000 transistors and the Intel 8088 contains 29000 transistors. In 1979, the year in which both these processors were marketed, designing a processor containing 5.5 million transistors must have seemed impossible. Currently designing a circuit with 5.5 million transistors is a tremendous task, but what about a circuit with 20 million transistors? Will it take double the amount of time to design it? Perhaps even more? If a design becomes twice as large, employing twice as many engineers does not solve the problem. When more designers are working on the same design the time needed for meetings and discussions increases rapidly. The solution lies in automating the design task. Powerful computer programs are required to help designers.

“Those who can not remember the past are condemned to repeat it.” From: “The Life of Reason”, by George Santayana, 1906

Technology is changing rapidly. It took 21 years to get to a 1 GHz processor. It will take 1 year to get to a 2 GHz processor.


Chapter 3 Introduction to VHDL and Digital Design


3.1. WHAT IS VHDL?

The acronym VHDL stands for the VHSIC Hardware Description Language. The acronym VHSIC, in turn, refers to the Very High Speed Integrated Circuit program. This program was sponsored by the Department of Defense (DoD) with the goal of developing a new generation of high-speed integrated circuits. During the course of this program, the increasing complexity of digital systems made possible by continuous advances in semiconductor and packaging technologies was found to have a fundamental impact on the economics of the design of military and space electronic systems. When the life cycle costs of these systems were examined, the cost of maintenance was becoming significant. Furthermore, it was becoming increasingly difficult to share designs of subsystems across contractors. The need for a standardized representation of digital systems became apparent. A team of DoD contractors was awarded the contract to develop the language, and the first version was released in 1985. The language was subsequently transferred to the IEEE for standardization, after which representatives from industry, government, and academia were involved in its further development. The language was ratified in 1987 and became the IEEE 1076–1987 standard. It was reballoted after five years and, with the addition of new features, forms the 1076–1993 version of the language. Work is still continuing as user demands are increasing. Many working groups exist which focus on different aspects such as synthesis, object-oriented extensions, analog extensions, and math packages.

Ever since VHDL became an IEEE standard it has enjoyed steadily increasing adoption throughout the electronic systems EDA community. The DoD requires that VHDL descriptions be delivered for all application-specific integrated circuits (ASICs). The interoperability between models developed using VHDL environments from different EDA or CAD vendors has been improved by the establishment of the IEEE 1164 standard package. Synthesis has been similarly supported through the establishment of a version of the IEEE 1164 package for synthesis. Practically every major EDA vendor supports VHDL. This status of VHDL as an industry standard provides a number of practical benefits including model interoperability among vendors, third-party vendor support, and design re-use.

Conventional procedural programming languages such as C or Pascal typically describe procedures for computing a mathematical function or manipulating data, for example matrix multiplication or sorting, respectively. A program is a recipe or a sequence of steps for how to perform a computation or manipulate data. The execution of the program results in the computation or rearrangement of data values. VHDL, on the other hand, is a language for describing digital systems. Such descriptions can be used by a simulator for simulating the behavior of the system without having to actually construct the system. Alternatively, synthesis compilers can use such a description to create descriptions of the digital hardware that implements the system. Although VHDL has been investigated for its use in describing and simulating analog systems, the language is predominantly used in the design of digital electronic systems.

Looking at a VHDL description in a top-down way, the following hierarchy of constructs can be found. At the top level a design consists of entities. The entity describes the interface (inputs and outputs) of a design. With each entity one or more architectures can be associated. The architecture describes the functionality of the system, i.e. what it should do. An architecture can also be used to describe how a certain behavior should be realized, if it is described at a sufficiently low abstraction level. VHDL is event driven, to allow for efficient simulation of hardware. Computations are only performed when some data has changed. Another important difference is the fact that VHDL contains the following statements, which can be executed concurrently:

• block statement
• component instantiation statement
• generate statement
• process statement
• concurrent procedure call
• concurrent assertion statement
• concurrent signal assignment statement

The most important of the concurrent statements is the process statement, since all other concurrent statements can be rewritten as processes. A process contains sequential code, similar to Pascal. To allow for concurrent execution of processes a special execution mechanism is required. It must be possible to synchronize the processes and to exchange information between them. But above all, the results must not depend on the order in which the concurrent statements are executed.

In VHDL, signals exist to exchange information between processes. A signal can be considered as a special variable used for communication between concurrent statements. Variables in VHDL are local to a process, and hence are not visible to other concurrent statements.
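The role of signals can be illustrated with a small software model. The sketch below is written in Python purely as a neutral modelling language; the class and function names are hypothetical and it is not how a real VHDL simulator is built. It assumes the usual VHDL rule that a signal assignment is buffered and only becomes visible once every process has suspended, and it shows that two processes that swap the values of two signals give the same result whichever one runs first.

```python
class Simulator:
    """Toy model of VHDL-style signals with deferred updates (illustrative only)."""

    def __init__(self, initial):
        self.current = dict(initial)  # values that every process reads
        self.pending = {}             # values scheduled by signal assignments

    def read(self, name):
        return self.current[name]

    def assign(self, name, value):
        # Models "name <= value": the new value is stored, not yet visible.
        self.pending[name] = value

    def run_cycle(self, processes):
        # Run every process once, in the given order.
        for process in processes:
            process(self)
        # All processes have suspended: commit the buffered values.
        changed = {n for n, v in self.pending.items() if self.current[n] != v}
        self.current.update(self.pending)
        self.pending = {}
        return changed  # signals on which an event occurred


def copy_b_to_a(sim):
    sim.assign("a", sim.read("b"))


def copy_a_to_b(sim):
    sim.assign("b", sim.read("a"))


def simulate(order):
    sim = Simulator({"a": 0, "b": 1})
    sim.run_cycle(order)
    return sim.current


result_1 = simulate([copy_b_to_a, copy_a_to_b])
result_2 = simulate([copy_a_to_b, copy_b_to_a])
```

With ordinary variables the second process would see the value written by the first, so the outcome would depend on the execution order. With buffered signals both orders exchange the two values.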

The synchronization is performed by wait-statements. When a wait-statement is encountered during execution, execution suspends. The wait-statement allows execution to resume when certain conditions have been satisfied. In most cases wait-statements check for events on certain signals.

To guarantee that the results of a computation do not depend on the order in which the concurrent statements are executed, VHDL has a special mechanism. When a new value is assigned to a signal, this new value does not take effect immediately. It is stored temporarily. As a result all processes see the same value for a signal. Once all processes have suspended, the new values are assigned to the signals. When a new value is assigned to a signal this may cause an event. A process that was suspended in a wait-statement waiting for an event on that signal then resumes execution. This again can cause new values to be assigned to signals. When no processes can restart, the computed results are stable and can be used. As a whole, VHDL is a rather complicated language for tool designers.

3.2. DIGITAL SYSTEM DESIGN:

One can view a design of an integrated circuit at many levels of abstraction. The lower the abstraction level, the more detail. When moving to higher levels of abstraction, details disappear and specifications become shorter. Since one can only cope with a certain amount of information, abstraction is important when making large designs. The larger the design, the higher the abstraction level required to make designing a feasible task. The advances in technology over the past years have allowed for increasingly complex designs. As a consequence, designing has to be performed at higher abstraction levels. Lower levels have to be automated, or at least tools should be available to help perform the tedious tasks. The major levels of abstraction are the Transistor level, the Gate level, the Register Transfer Level and the Behavioral level. Currently many designs are performed at the Register Transfer Level (often called RTL). RTL designs are mostly entered using hardware description languages such as VHDL.

The design of digital systems is a process that starts from the specification of requirements and proceeds to produce a functional design that is eventually refined through a sequence of steps to a physical implementation. Both simulation and synthesis are complementary activities employed in the design process, although the specific relationship is a function of the target implementation. Consider the design of an application-specific integrated circuit (ASIC) for processing digital “Security” functions. This is a custom chip designed for a specific task, unlike a microprocessor that may be programmed for a variety of tasks. Custom ASICs are generally the highest performing solution for any computation but often the most expensive. An example of the sequence of activities that typically takes place during ASIC design is shown in figure 3.1.


The first step is the specification of the requirements. Such a specification will typically include the keys, their lengths, the algorithms to derive each and the operations to be performed on them, as well as interface requirements, cost constraints, and other physical requirements such as system size and power dissipation. From these functional requirements, a preliminary high-level functional design can be generated. Simulation is often used at this level to converge to a functional design that can meet the performance requirements. This initial functional design is then refined to produce a more detailed design description at the level of registers, memories, arithmetic units, and state machines. This is the register transfer level (RTL) of the design. Subsequent refinement of this RTL description produces a logic design that implements each of the RTL components. Both RTL and logic level simulation may be used to ensure that the design meets the original specification. Fault simulation can model the effects of expected manufacturing defects as well as faults that may be induced by the environment. For example, if this “Security” chip is to be flown in a satellite, radiation effects in space can cause devices to change state and lead to single bit errors. If the error rate in the intended orbit is relatively high, the design can be modified to accommodate such bit errors using techniques tuned to the particular physical phenomena. Finally, the logic level implementation is transformed into a circuit level implementation and then into a physical chip layout, from which accurate physical properties of the design, such as chip area and power dissipation, can be evaluated. Design rule checks, circuit parameter extraction, and circuit simulation activities can be performed at this level.

At each level of this design hierarchy there are components that are used to describe the design. At the higher or more abstract levels we have a smaller number of more powerful components such as adders and memories. At the lower and less abstract levels we have a larger number of simpler, less powerful components such as gates and transistors.

Figure 3.1: Typical activity flow in top-down digital system design

Each level of the design hierarchy corresponds to a level of abstraction and has an associated set of activities and design tools that support the activities at this level. Some of the activities at each level are shown in figure 3.1. The accuracy with which we can predict the behavior, physical properties, and performance of the circuit increases at the lower levels of the hierarchy, at the price of considerably longer simulation times. Imagine having to simulate the behavior of 100 million transistors on a chip! If design errors are discovered at these finer levels of detail, changes in the design may be expensive to make, particularly if we have to move back several levels in the design process to correct these errors. This can lead to longer development times and consequently increased cost, not to mention the loss of revenues from being late to market. Much of the motivation for the development of hardware description languages in general stems from the evolving economics of the marketplace for electronic systems and the methodologies used to design these systems. With the ability to simulate designs at multiple levels of abstraction, errors can be discovered and corrected early. Moreover, it is important to note that simulation is a commonly used technique throughout this hierarchy. Hardware description languages such as VHDL are targeted for use throughout this design hierarchy and provide some degree of uniformity across the various levels.

A typical design flow that employs automated synthesis depends on the target hardware. Let us consider our chip again, and this time suppose we wish to implement it in a Field Programmable Gate Array (FPGA). For the moment we can think of an FPGA as a hardware device that provides a large number of gates and flip-flops that we can connect as we wish. This is not an accurate image but will do for the moment. A functional design is created in VHDL, and can be simulated with ModelSim® (a Mentor Graphics tool) as before to verify functional correctness. This functional design may be refined to a register transfer level description where data flows and basic hardware functional units are defined. Once we have created the register transfer level design, a synthesis compiler such as Leonardo® from Mentor Graphics creates a gate level implementation of the system described by the VHDL code. At this point a logic level simulation can be performed to obtain preliminary performance estimates. This logic implementation is then placed on the FPGA using the gates and flip-flops provided by the chip. The corresponding gates and flip-flops must then be connected by routing signals between them on the chip. Once these placement and routing processes have been completed, we can derive accurate estimates of the delays through the wires and gates, since these delays are available from the chip specification. This timing information can be extracted from the design and stored in an industry standard format. We can then conduct a realistic timing simulation.

Figure 3.2: An Example of a synthesis design flow for FPGAs


The growth in the FPGA marketplace has been explosive, as FPGAs potentially provide very low cost programmable hardware solutions for many tasks currently performed by embedded controllers. However, they are also making headway in the market for high performance custom solutions. In fact, FPGA solutions have been proposed that can outperform current generation ASIC chips at a significantly reduced cost. It is these potential cost-performance advantages in several niche markets that are garnering significant interest in the research, development, and product communities. In particular, it is the ASIC replacement market that is fueling much of the current development. Thus the synthesis focus in this text is motivated by, and on occasion only relevant to, FPGAs, although the emphasis is on general principles that must be addressed by almost any synthesis compiler.

3.3. THE MARKETPLACE:

The use of computer-aided design environments in general and hardware description languages in particular is driven by the technology marketplace. After all, these tools and languages are intended to enable the cost-effective and profitable development of electronics products. The key question with respect to the state of the art is: where are the costs, both in terms of costs incurred and revenues lost due to design paradigms that are not as efficient or effective as they could be? The costs incurred are directly a function of the available technology. Both the semiconductor industry and the software industry have been in overdrive during the last three decades and show no signs of abatement in the next decade. Memory densities have been quadrupling every 3 years and processor speeds doubling every 18 months. As chip densities increase, new products are becoming available at a faster rate and development cycles are becoming shorter. New EDA tool environments and hardware description languages are an integral part of any solution. Historically, the EDA tool industry has developed in a bottom-up fashion and has been driven by the development of point tools and processes: design rule checking, layout, verification, and so forth.

Digital design was a manual process in that designs needed to be transformed manually between levels of abstraction, and at each level of abstraction we would be assisted by the appropriate tools. This is partly a by-product of the fact that for chip and board designs there has been an understanding of the specific design problems, which include layout, design rule checking, test vector generation and so on. Advances have focused on better tools to address these problems, spurred by the new challenges of faster, denser, more complex technologies. In contrast, very few tools exist at the architectural level for addressing system design issues. The problems are not as well defined, but the impact can be overwhelming. It has been observed that the first 10% to 20% of the design cycle can determine 70% to 80% of the final system cost. Further, it has been reported that only 5% to 10% of the design cycle time is spent in studying and formulating requirements, while 70% of the manufacturing costs are affected by customer requirements. New methodologies are needed to address the issues of requirements capture, system specification, and early analysis via rapid prototyping. Such new design methodologies are emerging, and VHDL is becoming an integral part of such design approaches.

3.4. THE ROLE OF HARDWARE DESCRIPTION LANGUAGES:

Traditional design methodologies have been structured around a hierarchy of representations of the system being designed. Distinct representations at differing levels of detail are necessary for the various tasks encountered during design. One of the best-known representations of the different views and levels of abstraction in a digital system is the Y-chart shown in figure 3.3 and illustrated through the following example.


Figure 3.3: Design views and corresponding levels of abstraction

Imagine a company, SAHM, Inc. (which stands for Sameh and Ahmad), in the business of designing a Bluetooth baseband chip, code-named “Security”. Given the current costs of fabrication and the time window within which the chip must be brought to market to be competitive, we wish to verify prior to fabrication that the chip can support the intended applications. From the design flow shown in figure 3.1, we know that this behavior can be described at multiple levels of abstraction, that is, at the functional level, RTL level, logic level, and so on. Early in the design, behavioral descriptions are necessary so that simulations can ensure that the chip is functionally correct. This functionality may be verified at more than one level of abstraction. The advantage of this approach is that we can make this assessment independent of the many possible physical implementations. Once we have verified the functionality, the design can be translated to a structural description comprising the major components of this chip: memories, registers, arithmetic units, logic units, and so forth. Simulation can be employed again to ensure that the “Security” structural design correctly performs the intended functions using the components that we have selected. As shown in figure 3.3, the structural description may also be provided with varying levels of detail. This design can be refined until we translate this description to a physical description that can be further refined to produce a manufacturable specification.

Historically, hardware description languages have been targeted to a certain level of abstraction such as the gate level or register transfer level. Tools have been developed and targeted for tasks at a specific level of the design. For example, some tools may be optimized for implementing state machines that describe data movement at the register transfer level. Other tools may be developed for verification at the gate level by generating test vectors used for final chip verification. The tools at the different levels of abstraction may employ different descriptions of our “Security” chip. Such point tools are focused on a single aspect of the design and on a single level of abstraction. Often these tools came with their own languages and associated compilers and simulators. However, as chips become more complex and design processes start using an increasingly diverse set of tools, the time taken to move information between tools has become a concern. Recently, the lost productivity due to incompatibility between design tools has been estimated to be as high as $4.5 billion per year. While simulation has been used at all levels of system design designated in figure 3.3, synthesis attempts to move in an automated fashion between the three domains and levels of abstraction. The use of hardware description languages such as VHDL can help address several aspects of this fundamental problem of design refinement.


Interoperability: The VHDL language provides a set of constructs that can be applied at multiple levels of abstraction and to multiple views of the system. This significantly expands the scope of application of the language and promotes a standardized, portable model of electronic systems. Thus, the complexity of moving data and design information between tools is significantly reduced. The net effect is a reduction in the time needed to design the “Security” chip and bring it to market. We may also expect that reducing the number of distinct types of tools and languages lowers the cost of the design infrastructure and therefore the product unit cost.

Technology is a rapidly moving target. We cannot anticipate innovations in technology, design styles or products. Therefore an attractive philosophy is to develop a design environment that is independent of technology. The industry is characterized by a number of distinct technologies and associated design styles that target specific points in the continuum of time to market, cost, and performance. For example, the use of programmable logic devices (PLDs) and FPGAs has the attributes of low cost and quick time to market. ASIC products incur higher nonrecurring engineering development costs and therefore usually higher product costs. However, they can deliver substantially higher performance. These distinct design styles lead to environments with distinct sets of EDA tools and methodologies.

Technology Independence: The VHDL descriptions of the design are not tied to a specific methodology or target technology. The language is rich enough that it can be used to describe a chip at the instruction set level, register transfer level, or switching transistor level. Suppose our company must produce a high performance version for use in a real-time radar processing system for the military. Design tools for custom and ASIC chips utilize VHDL in their suite to simulate and validate their designs prior to detailed physical design. Prior to physical design we would wish to have detailed timing information to ensure that the performance requirements of the application can be met. However, the application software developers may simply want a functionally accurate instruction set simulation of the “Security” chip so that development of application software may begin immediately (hardware/software co-design). Eventually, when a detailed physical design is available, the software may be tested on an associated simulation to determine the performance that can be achieved for the applications of interest. These are two widely differing levels of detail, both easily supported within the same language. Now, 6 months later, we have found a commercial market for the “Security” chip to implement video compression algorithms. The only problem is that the part has to cost a tenth of the cost of the military part, although the processing constraints are much looser. The major EDA vendors have tools that can synthesize designs to FPGA and complex PLD (CPLD) devices. The “Security” chip is re-implemented within an FPGA using synthesis from the VHDL models. The result is a cheaper, slower, higher volume product.

Now let us say that our company develops a model of the “Security” chip using EDA tools from the vendor Mentor Graphics. Our company then wishes to have a subcontractor use this chip in the design of a second product, say a voice recognition board for personal communicators. The subcontractor's design environment is completely different, using archaic design tools from StoneAge CAD Tools, Inc. Normally we would need to send them detailed schematics and operational specifications and educate them on the “Security” design so that they could use this description to design and validate the board level product using their design tools. However, they do support the VHDL language. Since VHDL is a standard, rather than having them reconstruct the design in their environment, the preceding instruction set models of the “Security” chip can be directly used and simulated, and will produce identical results using StoneAge’s simulators.

Design Reuse: We can see libraries of VHDL models of components emerging and being shared across platforms, toolsets, organizations, and technical groups. Engineers working on a large design can independently design subsystems with considerably less concern for design environment or design tool compatibility issues.

Hardware/Software Prototyping: Part of the difficulty with a board design is that the software for processing the data streams cannot be tested until hardware is available. But what if we have detailed hardware descriptions of the components on the board that behave exactly as the “Security” chip does, to the level of detail of a clock cycle? We could simulate the system to a sufficient level of detail so that the simulator could take the place of the hardware for software development purposes. Application programs could be executed on a software simulation. This would permit trade-offs between the hardware and software implementations even before a single chip or board was designed! Such an approach, based on this concept of Virtual Prototyping, is an example of a newly emerging design methodology for systems in general, but is currently being applied to digital signal processing systems in particular.

Hardware description languages are at the core of modern digital systems design methodologies. Proficiency in their use will be necessary for designers, and familiarity a must for electronics and communications engineering students.

In brief we can say:

Digital system design is a process of creating and managing multiple descriptions of systems representing distinct views and varying levels of abstraction. Historically, design environments grew around point tools for solving specific problems at each level of abstraction. Design methodologies governed the application of these tools and the flow of information between them. The evolution of distinct methodologies and modeling styles can hinder interoperability and make it difficult to share models. As electronic systems grew in complexity, the need to integrate these point tools into a cohesive design process came to determine a large component of the economics of product design. The advent of hardware description languages such as VHDL and their acceptance as industry standards has had an enormous impact on the economics of product development.

We see that VHDL can be used to model hardware and software systems at multiple levels of abstraction. The language is independent of technology and design methodologies or styles, and therefore promotes portable descriptions, rapid prototyping, and free exchange of models among organizations and individuals. The result has been the promise of reduced design cycle times, faster time to market and reduced cost. We can expect to see the use and growth of the language continue as a widely used hardware description language for both military and commercial systems for some time to come.

Within the purview of the design of electronic systems, the VHDL language is accompanied by several other hardware description languages. One of the more widely used is Verilog. The goals and motivation for the Verilog language parallel those of VHDL, although Verilog has a distinct developmental heritage. Mentor Graphics tools support both the VHDL and Verilog languages.


3.5. DESIGN FLOW USING MENTOR GRAPHICS TOOLS:

The design flow using Mentor Graphics tools can be divided into two major steps.

a) DESIGN ENTRY:

• The Mentor Graphics FPGA Advantage package is used for design entry (RENOIR), simulation (MODELSIM) and synthesis (LEONARDO SPECTRUM).
• All of the design's blocks are written in the form of synthesizable VHDL code, and Renoir is used for connecting these blocks together and for port mapping.
• A VHDL test-bench is written and ModelSim is used for behavioral simulation.
• MATLAB from MathWorks or VISUAL C++ can be used for verification of simulation results.
• After behavioral verification is done, Leonardo Spectrum is used for synthesis, targeting the design on XILINX ALLIANCE for the FPGA flow or targeting an ASIC technology library for the ASIC flow.
• After synthesis, a VHDL netlist is extracted using Leonardo and re-entered with the VHDL test-bench into ModelSim for post-synthesis functional verification.
• After proper verification, the design is output from the synthesizer in the form of an EDIF netlist.
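As a minimal sketch of the verification step mentioned above, simulation results can be compared against a software reference model. The example is in Python rather than MATLAB or Visual C++, and it assumes, purely for illustration, that both the reference model and the test-bench write one hexadecimal octet per line; the function names and the sample values are hypothetical.

```python
def parse_hex_lines(text):
    """Parse one hexadecimal value per non-empty line into integers."""
    return [int(line.strip(), 16) for line in text.splitlines() if line.strip()]


def compare_vectors(expected, simulated):
    """Return (index, expected, simulated) for every position that differs."""
    if len(expected) != len(simulated):
        raise ValueError("vector lengths differ")
    return [
        (index, e, s)
        for index, (e, s) in enumerate(zip(expected, simulated))
        if e != s
    ]


# Made-up dumps used only to show the comparison; not real simulation data.
reference_dump = "01\n2d\ne2\n93\n"
simulator_dump = "01\n2D\nE2\n94\n"

mismatches = compare_vectors(
    parse_hex_lines(reference_dump), parse_hex_lines(simulator_dump)
)
```

An empty list means the simulated outputs match the reference model for every test vector. Here the fourth value differs, so one mismatch is reported.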

b) DESIGN IMPLEMENTATION:

• For implementation, Mentor Graphics IC STATION is used along with some other tools.
• The EDIF netlist is read by the ENIREAD tool to be converted into an EDDM design viewpoint (the format used by most Mentor Graphics tools). A configuration file is written for ENIREAD to map the instances of the EDIF netlist properly to their corresponding layout cells.
• The SCHEMATICS GENERATOR (SG) tool was used to generate schematics for the core.
• The schematics were opened with DESIGN ARCHITECT and the symbol for the core was generated. In a new sheet, the symbol is imported and pads (input, output and power pads) are added to the core to form a complete chip.
• DVE was used to generate different levels of viewpoint (Apar, Device and Cell).
• The Apar level viewpoint is read into IC STATION. Using Autofloorplan with the specified parameters, routing channels were defined.
• The IC STATION Autoplace feature is used for automatic placing of the standard cells and pads, and Autoroute was used for automatic routing.
• The "Compact" feature can be used to reduce the chip's area by compressing the widths of the routing channels.
• IC Trace LVS (Layout Versus Schematics) is then used for comparing the resulting layout with the original input schematics of the chip.
• IC Extract is used for extraction of parasitics.
• QUICKSIM is used for back-annotation and timing simulation.

Figure 3.4 shows the complete HDL design flow using Mentor Graphics tools, targeting either ASIC or FPGA technologies. We faced some problems in completing the flow, which are explained later.


Figure 3.4: Complete HDL design flow using Mentor Graphics tools (diagram not reproduced). The blocks shown are: Renoir (design entry); ModelSim (VHDL simulation); Leonardo (FPGA or ASIC synthesis); VHDL netlist & VITAL (post-synthesis verification). On the FPGA side: Xilinx Alliance (FPGA implementation); VHDL netlist and SDF file for timing simulation; download design to FPGA chip. On the ASIC side: ENIREAD & SG (format conversion, schematic generation); Design Architect (schematic editing); DVE (viewpoints generation); IC Station (Autoplace, Autoroute, DRC, LVS, netlist and parasitics extraction); SPICE netlist; QuickSim (digital simulation with delays); Mach TA/PA (timing and power analysis); GDSII format to fabrication house.


Implementation of Authentication and Key Generation

By: Sameh Ahmed Assem

The aim of my work was to implement the E1, E21, E22 and E3 algorithms used for both authentication and key generation, as explained in chapter 2. In this chapter I’m going to explain how I achieved this using VHDL and the tools at hand. Section 4.1 is about my design and how these algorithms were implemented, and section 4.2 covers how I simulated my design.

4.1. THE DESIGN OF AUTHENTICATION AND KEY GENERATION ALGORITHMS:

All the algorithms E1, E21, E22 and E3 depend on the functions Ar and Ar’ which in turn depend on the SAFER+ algorithm. So the design started by designing the SAFER+ encryption core. After that an interface was designed to utilize it in different algorithms.

4.1.1. SAFER+ ENCRYPTION ROUND:

The first main block to be designed was the encryption round of SAFER+. The specifications of this block were explained in chapter 2, section 2.4. The round was designed using the block diagram editor in Renoir. The small blocks in it were designed as follows:

• The modulo-256 adder: This was written in VHDL.
• The 8-bit XOR: This was written in VHDL.
• The exponential function: This was a very complicated function, (45^x mod 257) mod 256, where x ranges from 0 to 255, so it was very difficult to implement as an algorithm in VHDL. MATLAB was unable to calculate the function as well. So the values were downloaded from the net and written as a look-up table in VHDL.
• The logarithmic function: Again the values of (log_45(x) mod 257) mod 256 were downloaded from the net and written as a look-up table in VHDL.
• PHT: This was written in VHDL.
• Permute: This was written in VHDL.
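The two look-up tables in the list above were taken from downloaded values. As an illustration only, the same kind of table can be generated with modular exponentiation. The sketch below is in Python, not the tool used in the project. It builds the exponential table from the formula quoted above and builds the logarithmic table as its inverse. The helper `as_vhdl_constant` and the type name `lut_t` are hypothetical and only show how the values could be pasted into a VHDL constant.

```python
def build_safer_tables():
    """Return (exp_table, log_table) for e(x) = (45^x mod 257) mod 256."""
    # Three-argument pow performs modular exponentiation, so no huge
    # intermediate value such as 45**255 is ever formed.
    exp_table = [pow(45, x, 257) % 256 for x in range(256)]
    # The logarithm table is the inverse of the exponential table.
    log_table = [0] * 256
    for x, y in enumerate(exp_table):
        log_table[y] = x
    return exp_table, log_table


def as_vhdl_constant(name, table):
    """Format a table as text for a VHDL array constant (type name assumed)."""
    body = ", ".join(f'x"{value:02X}"' for value in table)
    return f"constant {name} : lut_t := ({body});"


EXP, LOG = build_safer_tables()
exp_constant_text = as_vhdl_constant("EXP_LUT", EXP)
```

Because 45 generates every non-zero residue modulo 257, the exponential table is a permutation of 0 to 255. The entry for x = 128 is 0, since 45^128 mod 257 = 256, and the logarithm table therefore maps 0 back to 128.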

After designing these blocks, the round was designed in a purely combinational way, made up of these blocks just as stated in the specifications. The block diagram of the design looked as shown in figure 4.1.
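For reference, the arithmetic building blocks named above can be modelled in a few lines. The following Python sketch is an illustration only. The definition used for the PHT, (2a + b, a + b) modulo 256, is the usual pseudo-Hadamard transform of the SAFER family and is assumed here rather than quoted from this chapter. The function names are hypothetical.

```python
def add_mod256(a, b):
    """Modulo-256 adder: 8-bit addition with the carry discarded."""
    return (a + b) & 0xFF


def xor8(a, b):
    """Bitwise XOR of two octets."""
    return (a ^ b) & 0xFF


def pht(a, b):
    """Pseudo-Hadamard transform of two octets: (2a + b, a + b) mod 256."""
    return (2 * a + b) & 0xFF, (a + b) & 0xFF


def pht_layer(octets):
    """Apply the PHT to the eight consecutive octet pairs of a 16-octet block."""
    result = []
    for i in range(0, 16, 2):
        result.extend(pht(octets[i], octets[i + 1]))
    return result
```

For example, `pht(200, 100)` gives `(244, 44)`, because 500 and 300 are both reduced modulo 256.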

If you want a thing well done, do it yourself


Chapter 4 Implementation of Authentication and Key Generation


Figure 4.1: The combinational round as entered in Renoir (block diagram not reproduced; it connects round_input and the subkeys k2r_1 and k2r to round_output through modulo-256 adders, XOR blocks, exp and log look-up tables, PHT blocks and permute blocks)


The advantage of the design shown in figure 4.1 is its speed: a single clock is enough to perform all the round calculations. Unfortunately, this came at the cost of a large area. After implementing the overall chip and finding that the core area was about 15.5 mm², I redesigned the round as a sequential one to avoid the repetition of look-up tables, which increases the area considerably. The new design uses a controller entered as a finite state machine (FSM) (as shown in figure 4.2) together with multiplexers and a demultiplexer (as shown in figure 4.3). This reduced the core area to about 12 mm². The price is speed, as 8 clocks are now needed to perform the round calculations. Since each round processes 128 bits, this is not a big problem.

The idea of the controller is to select, on each clock, a different 16 bits of the input and of the two subkeys. Thus 16 bits are processed per clock, and only one log LUT and one exp LUT are needed. The order of the 2 octets within the 16 bits is reversed on each clock to follow the specifications, so the controller drives multiplexers to perform this swap. At the end, an output is asserted to announce the end of the round.
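The slicing idea above can be sketched in software. The Python example below is only a behavioral illustration, not the VHDL of the design: it applies the key mixing and the exp/log substitution to all 16 octets at once, then again one 16-bit slice per step with a single exp unit and a single log unit, and the two give the same result. The assignment of octet positions to the exp and log paths is assumed here from the published SAFER+ description, and all names are hypothetical.

```python
# Behavioral sketch; names and structure are hypothetical.
EXP = [pow(45, x, 257) % 256 for x in range(256)]
LOG = [0] * 256
for x, y in enumerate(EXP):
    LOG[y] = x

# Assumption: 0-based octet positions on the XOR -> exp -> ADD path.
# All other positions use the ADD -> log -> XOR path.
EXP_PATH = {0, 3, 4, 7, 8, 11, 12, 15}


def exp_path(x, k1, k2):
    """XOR with the first subkey octet, exp LUT, add the second subkey octet."""
    return (EXP[x ^ k1] + k2) % 256


def log_path(x, k1, k2):
    """Add the first subkey octet, log LUT, XOR with the second subkey octet."""
    return LOG[(x + k1) % 256] ^ k2


def substitute_parallel(block, k1, k2):
    """All 16 octets at once, as in the combinational round."""
    return [exp_path(x, a, b) if i in EXP_PATH else log_path(x, a, b)
            for i, (x, a, b) in enumerate(zip(block, k1, k2))]


def substitute_sequential(block, k1, k2):
    """One 16-bit slice per step, reusing one exp unit and one log unit."""
    out = [0] * 16
    for sel in range(8):                  # sel mimics the controller output
        first, second = 2 * sel, 2 * sel + 1
        if sel % 2 == 1:                  # odd slices: the two octets swap roles
            first, second = second, first
        out[first] = exp_path(block[first], k1[first], k2[first])
        out[second] = log_path(block[second], k1[second], k2[second])
    return out


if __name__ == "__main__":
    data = list(range(16))
    ka = [(7 * i + 3) % 256 for i in range(16)]
    kb = [(11 * i + 5) % 256 for i in range(16)]
    print(substitute_parallel(data, ka, kb) == substitute_sequential(data, ka, kb))
```

The swap on odd slices is what lets a single exp LUT and a single log LUT serve every pair of octets.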

Figure 4.2: The round Controller as entered in Renoir. [State diagram: states start and s0 to s8. The machine leaves start when go='1'. The output sel steps from "000" in s0 to "111" in s7 with pass='0'; s8 keeps sel="111" and sets pass='1'.]


Figure 4.3: The sequential round as entered in Renoir. [Block diagram with ports round_input(127:0), k2r_1(127:0), k2r(127:0) and round_output(127:0), built from round_control, passloge, reg16oct, +, @, PHT and permute blocks.]


4.1.2. SAFER+ KEY SCHEDULE:

As in the SAFER+ round, the design started with the small components:

Bias words: The bias words were downloaded from the net and then written in VHDL format as a ROM.
Octets summer: This was written in VHDL to XOR, bit by bit, all 16 octets of the key.
Rotator: This was written in VHDL to rotate each octet 3 bits to the left.
Selector: The selection of octets to be added to the bias words was done in the block diagram by fixed connections. See figure 4.4.

Figure 4.4: The Combinational SAFER+ Key Schedule as entered in Renoir. [Block diagram built from octets_sum, repeated octets_rotate stages with adder (+) blocks, a bias_rom with outputs B2 to B17, and a mux16x2.]

After finishing the design of this key schedule, it was clear that it was too big. A bias ROM presenting 17 outputs of 128 bits at the same time was of no use, since only two subkeys are needed per round. The design was therefore changed to a sequential one that delivers the required 2 subkeys per round, in order to decrease the area. No additional controller was added, as the SAFER+ controller (to be explained later) was used instead. See figure 4.5. New blocks were needed, however:

A register: written in VHDL for feedback.
Octets rotator: rotates the octets among themselves for appropriate selection of the octets to be added.
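To make the roles of the bias ROM, the octets summer, the rotator and the selector concrete, the key schedule can be sketched in Python. This is a behavioral illustration under assumptions, not the VHDL of the design: the bias words are computed from the double exponential formula of the published SAFER+ description instead of being read from a ROM, the selection offset follows the same description, and all function names are hypothetical.

```python
# Behavioral sketch of the SAFER+ key schedule for a 128-bit key.
# Names are hypothetical; the bias formula and the selection offset are
# assumptions taken from the published SAFER+ description.

def bias_word(p):
    """Bias word B_p for p = 2..17, as a list of 16 octets."""
    return [pow(45, pow(45, 17 * p + i + 1, 257), 257) % 256 for i in range(16)]


def rotl3(octet):
    """Rotate one octet 3 bits to the left."""
    return ((octet << 3) | (octet >> 5)) & 0xFF


def key_schedule(key):
    """Return the 17 subkeys K1..K17 (index 0 holds K1) for a 16-octet key."""
    checksum = 0
    for octet in key:                   # octets summer: XOR of all 16 octets
        checksum ^= octet
    reg = list(key) + [checksum]        # 17-octet key register
    subkeys = [list(key)]               # K1 is the key itself
    for p in range(2, 18):
        reg = [rotl3(o) for o in reg]   # rotator: every octet 3 bits left
        bias = bias_word(p)
        # selector: 16 consecutive octets of the register, starting at p - 1
        subkeys.append([(reg[(p - 1 + i) % 17] + bias[i]) % 256
                        for i in range(16)])
    return subkeys


if __name__ == "__main__":
    for number, subkey in enumerate(key_schedule(list(range(16))), start=1):
        print(number, subkey)
```

Since every step only needs the previous register contents, a sequential design can keep one register and produce the subkeys in order, which is the idea behind the feedback register listed above.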

Figure 4.5: The sequential SAFER+ Key Schedule as entered in Renoir. [Block diagram with a sel(2:0) input and subkey outputs ki(127:0) and kii(127:0), built from bias_rom, octets_sum, two octets_rotate/octets_circulate pairs, adder (+) blocks and a clocked feedback register reg136.]


4.1.3. AR/AR’ BLOCK:

Since Ar and Ar’ are never used simultaneously, the idea was to combine the round and the key schedule in such a way that the feedback path determines whether the block works as Ar or Ar’. An input called ‘dash’ opens a path that adds the input of the first round to the input of the third round, as required in Ar’. After the 8th round, another path is opened and the feedback path is closed to perform the output transformation and finally produce the ciphertext.

The plaintext enters the first round, undergoes confusion and diffusion, and on leaving the round has to re-enter another round. This is repeated for 8 rounds. After that, an output transform is done with the final subkey. This is the scenario for Ar. For Ar’ the output-transform operation has to be performed twice: once at the end as in Ar, and once after the second round, between the input to round 1 and the output of round 2. This was achieved with the help of multiplexers, demultiplexers, a register and a controller. Again, the controller was entered as a state diagram, shown in figure 4.6. The final Ar/Ar’ block is shown in figure 4.7.

The controller was designed to produce certain outputs at certain clocks to control the multiplexers and demultiplexers and thus open the correct paths for the data. It starts when its ‘go’ pin is set high, and when it finishes it sets its ‘done’ pin high. The ‘dash’ input determines whether the block works as Ar or Ar’: when it is set high, Ar’ is operating.
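The data flow described above can be summarized in a short Python sketch. It is only an illustration of the control flow, not the design itself: the round is passed in as a function (round_fn, a hypothetical parameter), the subkeys are a list of 17 blocks, and the pattern of XOR and modulo-256 addition used by mix is assumed here to be the one of the SAFER+ output transform.

```python
# Control-flow sketch of the shared Ar/Ar' block; all names are hypothetical.

# Assumption: 0-based octet positions combined by XOR in the output transform.
# The remaining positions are combined by addition modulo 256.
XOR_POSITIONS = {0, 3, 4, 7, 8, 11, 12, 15}


def mix(a, b):
    """Combine two 16-octet blocks with the output-transform pattern."""
    return [(x ^ y) if i in XOR_POSITIONS else (x + y) % 256
            for i, (x, y) in enumerate(zip(a, b))]


def ar(plain_text, subkeys, round_fn, dash=False):
    """Run 8 rounds and the output transform; dash=True selects Ar'."""
    state = list(plain_text)
    for r in range(1, 9):
        if dash and r == 3:
            # Ar' only: the input of round 1 is mixed into the input of round 3
            state = mix(state, plain_text)
        # round r uses subkeys K(2r-1) and K(2r); the list index starts at 0
        state = round_fn(state, subkeys[2 * r - 2], subkeys[2 * r - 1])
    return mix(state, subkeys[16])      # output transform with K17


if __name__ == "__main__":
    def identity_round(block, k_first, k_second):
        """Placeholder round that leaves the block unchanged (for the demo)."""
        return list(block)

    keys = [[0] * 16 for _ in range(17)]
    print(ar([1] * 16, keys, identity_round, dash=False))
    print(ar([1] * 16, keys, identity_round, dash=True))
```

Keeping the round as a parameter mirrors the hardware idea of one round block reused 8 times, with ‘dash’ only deciding whether the extra mixing path is opened.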

Figure 4.6: The Controller of Ar/Ar’ block as entered in Renoir. [State diagram: states start, round_1, round_2, round_2dash, round_3, round_3dash and round_4 to round_8. The machine leaves start when go='1' and advances on end_round='1', branching on the value of dash. Each state drives the outputs sel, int, s0, s1 and done.]


Figure 4.7: Ar/Ar’ block as entered in Renoir. [Block diagram with inputs plain_text(127:0), key(127:0), dash, rst, clk and go, outputs done and cipher_text(127:0), built from safer_control, excess_operations, reg128, key_sched and round blocks.]

4.1.4. E1, E21, E22 AND E3 ALGORITHMS BLOCK:

The E1 algorithm is used for authentication, E21 for unit or combination key generation, E22 for master or initialization key generation and E3 for encryption key generation (as was explained in chapter 2), so they never run simultaneously. This means that there is no need to build an Ar/Ar’ block for each of them; one is enough for all of them. This was achieved in the design, but the problem then becomes how to prepare the required data for each algorithm. This was again done with the help of multiplexers and a controller.

For E1: We need a 128-bit random number, a 128-bit key and a 48-bit Bluetooth device address. The inputs to the first Ar function are the random number and the key. Next comes an Ar’ function whose inputs are an offset key and the output of the first Ar, XORed with the random number and then added to the address concatenated with itself to fill 128 bits. To achieve this a feedback is required. The address is concatenated simply by connecting the input pins to a 128-bit bus. The output is 128 bits of data whose first 32 bits are SRES.

For E21: We need a 128-bit random number and a 48-bit Bluetooth address. It is just one Ar’ function whose inputs are the random number, with one of its octets XORed with the number 6, and the address, concatenated as in E1.

For E22: We need a 128-bit random number, a PIN of L octets and a 48-bit Bluetooth address. Since the PIN is device dependent, we decided to give it a fixed length for our device and chose L = 6. This makes L’ = 12, so PIN’ (the concatenation of the PIN with the Bluetooth address) is 12 octets long. This helps because PIN’ is then equal in length to the COF, so the pin connections used to concatenate PIN’ are the same as those for the COF used in E3. The data is now the random number, with its first byte XORed with 12 this time, and PIN’ concatenated with some octets of the PIN to fill 128 bits. Notice that the order of the data is reversed from that used in E21.

For E3: We need a 128-bit key, a 128-bit random number and a 96-bit COF. It is the same as E1 except that the second 128-bit operand is the COF concatenated with some of its own octets, rather than the address as in E1. The output this time is a 128-bit encryption key.

The new blocks needed are:

The controller: entered as a state diagram (see figure 4.8). It starts when its ‘start_algo’ input is set high, and when it finishes it sets its ‘end_algo’ output high. It produces outputs to control the multiplexers and the demultiplexer, and it sets the ‘dash’ input of the Ar/Ar’ block low or high for Ar or Ar’ respectively. For each transition it sets ‘go’ high and waits for ‘done’ to go high. Two inputs, called ‘e_type’, select the algorithm the block is to run.

The offset: the function that produces the offset key. It was written in behavioral VHDL.

The 6/12 multiplexer: a VHDL multiplexer with no data inputs. It selects between two stored constants, 6 and 12.

The final block diagram for the E1, E21, E22 and E3 algorithms is shown in figure 4.9. The pins for this block are:

128 pins for the random number
128 pins for the key
48 pins for a Bluetooth address, a PIN or the upper 48 bits of COF
48 pins for another Bluetooth address or the lower 48 bits of COF
128 pins for the output
2 pins for ‘e_type’
1 pin for clock
1 pin for reset
1 pin for ‘start_algo’
1 pin for ‘end_algo’
1 pin for ‘key_change’, added for changing the value of the key register as shown in chapter 6

For a total of 487 pins.
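The data preparation for E21 and E22 described above can be sketched in Python. This is a hedged illustration, not the multiplexer network of the design: the function names are hypothetical, octets are held in plain lists, and the position of the modified octet (MODIFIED_OCTET) is an assumption that depends on the byte ordering convention in use.

```python
# Sketch of the E21/E22 input preparation; all names are hypothetical.

# Assumption: index of the random-number octet that is XORed with 6 or 12.
# Which end of the list this is depends on the byte ordering convention.
MODIFIED_OCTET = 15


def cyclic_expand(octets, length=16):
    """Repeat a short octet string cyclically until it fills `length` octets."""
    return [octets[i % len(octets)] for i in range(length)]


def prepare_e21(rand, address):
    """Return (x, y): modified random number and 48-bit address expanded to 128 bits."""
    x = list(rand)
    x[MODIFIED_OCTET] ^= 6              # the constant 6 of the 6/12 multiplexer
    y = cyclic_expand(address)
    return x, y


def prepare_e22(rand, pin, address):
    """Return (x, y): expanded PIN' and modified random number (order reversed)."""
    pin_ext = (list(pin) + list(address))[:16]   # PIN' = PIN followed by address
    l_ext = len(pin_ext)                         # L' = 12 when L = 6
    x = cyclic_expand(pin_ext)
    y = list(rand)
    y[MODIFIED_OCTET] ^= l_ext
    return x, y


if __name__ == "__main__":
    demo_rand = [0] * 16
    demo_address = [1, 2, 3, 4, 5, 6]
    demo_pin = [10, 11, 12, 13, 14, 15]
    print(prepare_e21(demo_rand, demo_address))
    print(prepare_e22(demo_rand, demo_pin, demo_address))
```

With L fixed at 6, PIN’ is 12 octets long, so the same cyclic expansion wiring serves both PIN’ and the 96-bit COF.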

Figure 4.8: The controller of all algorithms block as entered in Renoir. [State diagram: states start, ar_start, ar_end, Ardash_start and algorithm_end. The machine leaves start when start_algo='1', branching on e_type(1), and advances on done='1'. Each state drives the outputs y, z, end_algo, dash, go and key_change; in the state that asserts end_algo, key_change is set to '1' only when e_type(1)='0'.]


Figure 4.9: The block diagram that can be used for E1, E21, E22 and E3 algorithms as entered in Renoir. [Block diagram with ports e_type(1:0), start_algo, clk, rst, key(127:0), Rand(127:0), ad_PIN_COFu(47:0), bdadb_COFl(47:0), algo_out(127:0), end_algo and key_change, built from algo_control, SAFER, offset, xor_128, add_128, reg128, a concatenation stage and 128-bit two-input multiplexers.]

4.2. THE SIMULATION OF THE DESIGN:

The simulation of the design depended mainly on 2 things:

An example given in the paper nominating SAFER+ as an AES candidate.
The sample data in the Bluetooth specifications.

The first was used to simulate the Ar/Ar’ block, whereas the second was used to simulate the overall block. Every small part was first simulated by itself, and they all seemed to work properly. Then came the Ar/Ar’ block. It did not give the same output as the example on the first attempt, so changes were made to the design to correct the errors.

All the blocks should have been back annotated, but only the small ones were, and they worked properly. The others fit only on large FPGAs that were not installed in our Xilinx software, so that was left for the ASIC implementation of the whole chip.

Once the Ar/Ar’ block was known to be correct, it was time to simulate the algorithms block against the sample data. Again, corrections were required until all the sample data passed.

To be sure that the block worked well for other inputs too, I wrote a test bench for the design. The test bench reads its inputs from a test vector produced by MATLAB. I studied the file read and write commands of MATLAB and VHDL, wrote the algorithms as MATLAB code, checked the correctness of the MATLAB code against the sample data, studied the MATLAB graphical user interface (GUI) tools and finally made a GUI that creates random inputs, evaluates the outputs of the E1, E21, E22 and E3 algorithms, puts the inputs in a test vector and the outputs in a file, and finally compares this output file with another one produced by ModelSim. The results were amazing.


4.2.1. THE SAFER+ EXAMPLE:

Figure 4.10 shows the example as found in the SAFER+ nomination paper:

Figure 4.10: An extract from the SAFER+ nomination paper

Now, taking the key and plaintext, converting them to bits using MATLAB, entering them into ModelSim and then simulating the key schedule, we get the results shown in figure 4.11.

Figure 4.11a: ModelSim List output for key schedule simulation


Figure 4.11b: ModelSim Wave output for key schedule simulation

Taking these values and converting each 8 bits into decimal, the 17 subkeys are:

And these are the same as in the SAFER+ nomination paper. Now simulating the whole SAFER+: See figure 4.12.

Figure 4.12: ModelSim Wave output for SAFER+ Simulation

The value of the ciphertext when ‘done’=1 is: 224 31 182 10 12 255 84 70 127 13 89 249 9 57 165 220, which again matches the SAFER+ nomination paper.


4.2.2. THE SAMPLE DATA WITH THE MATLAB PROGRAM:

The MATLAB GUI code I wrote can be used to test all the given sample data and check the correctness of the design. So first let’s look at the sample data from the Bluetooth specifications:

Four sets of sample data were given for E1:

rand    :00000000000000000000000000000000
address :000000000000
key     :00000000000000000000000000000000
sres    :056c0fe6
aco     :48afcdd4bd40fef76693b113
------------------------------------------
rand    :bc3f30689647c8d7c5a03ca80a91eceb
address :7ca89b233c2d
key     :159dd9f43fc3d328efba0cd8a861fa57
sres    :8d5205c5
aco     :3ed75df4abd9af638d144e94
------------------------------------------
rand    :0891caee063f5da1809577ff94ccdcfb
address :c62f19f6ce98
key     :45298d06e46bac21421ddfbed94c032b
sres    :00507e5f
aco     :2a5f19fbf60907e69f39ca9f
------------------------------------------
rand    :0ecd61782b4128480c05dc45542b1b8c
address :f428f0e624b3
key     :35949a914225fabad91995d226de1d92
sres    :80e5629c
aco     :a6fe4dcde3924611d3cc6ba1

Then four sets of sample data were given for E21:

rand    :00000000000000000000000000000000
address :000000000000
Ka      :d14ca028545ec262cee700e39b5c39ee
------------------------------------------
rand    :2dd9a550343191304013b2d7e1189d09
address :cac4364303b6
Ka      :e62f8bac609139b3999aedbc9d228042
------------------------------------------
rand    :dab3cffe9d5739d1b7bf4a667ae5ee24
address :02f8fd4cd661
Ka      :b0376d0a9b338c2e133c32b69cb816b3
------------------------------------------
rand    :13ecad08ad63c37f8a54dc56e82f4dc1
address :9846c5ead4d9
Ka      :5b61e83ad04d23e9d1c698851fa30447

Only one set of sample data was given for E22 when the PIN is made of 6 octets:

rand    :3348470a7ea6cc6eb81b40472133262c
PIN     :fcad169d7295
address :430d572f8842
Ka      :98f1543ab4d87bd5ef5296fb5e3d3a21

Finally, four sets of sample data were given for E3:

rand    :00000000000000000000000000000000
aco     :48afcdd4bd40fef76693b113
key     :00000000000000000000000000000000
kc      :cc802aecc7312285912e90af6a1e1154
------------------------------------------
rand    :950e604e655ea3800fe3eb4a28918087
aco     :68f4f472b5586ac5850f5f74
key     :34e86915d20c485090a6977931f96df5
kc      :c1beafea6e747e304cf0bd7734b0a9e2
------------------------------------------
rand    :6a8ebcf5e6e471505be68d5eb8a3200c
aco     :658d791a9554b77c0b2f7b9f
key     :35cf77b333c294671d426fa79993a133
kc      :a3032b4df1cceba8adc1a04427224299
------------------------------------------
rand    :5ecd6d75db322c75b6afbd799cb18668
aco     :63f701c7013238bbf88714ee
key     :b9f90c53206792b1826838b435b87d4d
kc      :ea520cfc546b00eb7c3a6cea3ecb39ed


Now it is time to run the MATLAB program and enter the values of all the inputs in these sample data into it to produce the required test vector.

Figure 4.13 shows the MATLAB GUI program:

It can be used for the 4 algorithms. The selection is done through the ‘Algorithm’ pop-up menu.
Data can be entered in decimal, binary or hexadecimal. The selection is done through ‘Input Number System’.
The ‘Create Random Inputs’ button produces input data randomly.
The ‘Evaluate’ button runs the algorithm.
4 buttons are provided to clear the old test vector, open a new one, add data to it and close it after the end of data entry.
The ‘Compare’ button compares the outputs of MATLAB and ModelSim.
Entry fields at the bottom allow data to be entered rather than created randomly.
The upper left corner shows the output data: first as an NRZ waveform, then in binary, hexadecimal and finally decimal.

Figure 4.13: The GUI MATLAB program written for producing the test vectors and comparing the ModelSim output to MATLAB output.

After entering all the sample data, evaluating them and adding them to the test vector, we close the test vector. Two files have now been written: ‘test_in.vec’, the test vector to be copied to the working directory of ModelSim and read by the test bench, and ‘test_out.vec’, a file containing all the outputs in binary format to be compared with the output of ModelSim. Now it is time to run ModelSim.


Figure 4.14: ModelSim Wave output for the simulation using Bluetooth sample data

After the simulation of the test bench using ModelSim (see figure 4.14), a new file ‘test_out_vhdl.vec’ is written. This is copied to the working directory of MATLAB, the ‘Compare’ button is pressed, and the results appear in the MATLAB command window as follows:

MATLAB output   : 056C0FE648AFCDD4BD40FEF76693B113
ModelSim output : 056C0FE648AFCDD4BD40FEF76693B113
They are the same
MATLAB output   : 8D5205C53ED75DF4ABD9AF638D144E94
ModelSim output : 8D5205C53ED75DF4ABD9AF638D144E94
They are the same
MATLAB output   : 00507E5F2A5F19FBF60907E69F39CA9F
ModelSim output : 00507E5F2A5F19FBF60907E69F39CA9F
They are the same
MATLAB output   : 80E5629CA6FE4DCDE3924611D3CC6BA1
ModelSim output : 80E5629CA6FE4DCDE3924611D3CC6BA1
They are the same
MATLAB output   : D14CA028545EC262CEE700E39B5C39EE
ModelSim output : D14CA028545EC262CEE700E39B5C39EE
They are the same
MATLAB output   : E62F8BAC609139B3999AEDBC9D228042
ModelSim output : E62F8BAC609139B3999AEDBC9D228042
They are the same
MATLAB output   : B0376D0A9B338C2E133C32B69CB816B3
ModelSim output : B0376D0A9B338C2E133C32B69CB816B3
They are the same
MATLAB output   : 5B61E83AD04D23E9D1C698851FA30447
ModelSim output : 5B61E83AD04D23E9D1C698851FA30447
They are the same
MATLAB output   : 98F1543AB4D87BD5EF5296FB5E3D3A21
ModelSim output : 98F1543AB4D87BD5EF5296FB5E3D3A21
They are the same
MATLAB output   : CC802AECC7312285912E90AF6A1E1154
ModelSim output : CC802AECC7312285912E90AF6A1E1154
They are the same
MATLAB output   : C1BEAFEA6E747E304CF0BD7734B0A9E2
ModelSim output : C1BEAFEA6E747E304CF0BD7734B0A9E2
They are the same
MATLAB output   : A3032B4DF1CCEBA8ADC1A04427224299
ModelSim output : A3032B4DF1CCEBA8ADC1A04427224299
They are the same
MATLAB output   : EA520CFC546B00EB7C3A6CEA3ECB39ED
ModelSim output : EA520CFC546B00EB7C3A6CEA3ECB39ED
They are the same

As can be seen, all the values agree with each other and with the sample data as well. This means that, at the simulation level, the block works according to the Bluetooth specification. Other simulations were left for the complete chip.
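The comparison step can be illustrated with a short Python sketch. It is not the MATLAB code of the project: the helper names are hypothetical, and the file layout (one binary string per line) is an assumption made only for this example.

```python
# Illustrative sketch; helper names and the one-vector-per-line layout are assumptions.

def hex_to_bits(hex_string, width=128):
    """Convert a hexadecimal string to a zero-padded binary string."""
    return format(int(hex_string, 16), "0{}b".format(width))


def compare_vectors(expected_lines, actual_lines):
    """Compare two lists of output vectors line by line; return a list of booleans."""
    if len(expected_lines) != len(actual_lines):
        raise ValueError("the two outputs hold a different number of vectors")
    return [e.strip() == a.strip() for e, a in zip(expected_lines, actual_lines)]


def compare_files(expected_path, actual_path):
    """Read two vector files (for example the reference and simulator outputs)."""
    with open(expected_path) as expected_file, open(actual_path) as actual_file:
        return compare_vectors(expected_file.readlines(), actual_file.readlines())


if __name__ == "__main__":
    reference = [hex_to_bits("056C0FE648AFCDD4BD40FEF76693B113")]
    simulated = [hex_to_bits("056c0fe648afcdd4bd40fef76693b113")]
    for same in compare_vectors(reference, simulated):
        print("They are the same" if same else "They differ")
```

Comparing normalized binary strings avoids false mismatches caused by letter case or trailing newlines in the two files.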


IMPLEMENTATION OF ENCRYPTION ENGINE AND PRNG

Good judgement comes from experience

Experience comes from bad judgement

By: Ahmad Abdelhameed

Before this project, I had no previous experience with VHDL, and there was a great need to learn as much about this new world as I could in a very limited time, from the basic design methodologies up to the most appropriate techniques for good synthesis. The Mentor Graphics tools gave me a great opportunity to do this, learning the language syntax in parallel with studying the Bluetooth specifications standard.

In the following sections, I will present some details on the design and implementation of the Bluetooth Encryption Engine and the random number generator. The operation of the encryption algorithm (E0) was presented in chapter 2.

5.1. KEY REDUCTION:

As we have seen in chapter 2, according to the Bluetooth specification, E3 generates a 128-bit encryption key (Kc). This key passes through some mathematical computations that reduce its length to generate K’c according to table 2.7. I found on the Internet two reasons for using a variable encryption key length in Bluetooth rather than using the 128 bits directly. First, U.S. export law restricts the export of devices with an encryption key longer than 64 bits to countries that the U.S. Department of State used to call terrorist countries, such as Iran, North Korea, Iraq and Syria. Second, a 64-bit key alone cannot be relied upon, as the need for strong encryption keys will increase.

We believe that no Bluetooth device supports all the sizes from one to 16 octets, so there is no need to design such a large function on our chip. Actually, I left the design of this part until we had an estimate of the final chip area, and the area was large enough (even after many cycles of area reduction) to eliminate this part completely and take the 128-bit key directly from E3. This part can be designed as a large look-up table with a simple controller. Also, a close look at the last row of table 2.7 shows that using the maximum available key length (128 bits) means using Kc itself as the K’c fed to the Encryption Engine.

5.2. THE LINEAR FEEDBACK SHIFT REGISTERS (LFSRS):

The strength of any encryption algorithm depends on how well it achieves confusion and diffusion against hackers and eavesdroppers. The LFSRs play a major role in this. (I have also used this technique in the design of the pseudorandom-number generator, as will be shown.) An ordinary LFSR consists of a number of single-bit registers combined to form a shift register, and a number of combinational elements that form the feedback path. A single-bit register is an edge-triggered D flip-flop, which passes its input to its output on every rising (or falling) edge of the applied clock. It is a very common building block in digital circuits, and any FPGA chip contains many of these blocks as a predefined black box with a minimum of five ports: input, output, clock, set and reset. The internal structure can be found in any digital circuits textbook. The combinational feedback path consists of XNOR or XOR gates; the Bluetooth standard has chosen XOR. Details will be shown in the PRNG section.

As the feedback path of each LFSR is closed before the LFSR is filled with the input bit stream, I thought that there would be no need for set or reset inputs. The first design was based on this idea, and the simulation gave me perfect results according to the sample data given in the Bluetooth standard. But when I reached the back annotation step, I discovered that this cannot work due to the presence of uninitialized bits. This mistake was serious enough to force me to rebuild the whole design from the bottom up on the 20th of June. Fortunately, I was good enough in VHDL and the Mentor Graphics tools by then, and this didn’t take more than 12 hours.

The encryption process consists of two main steps:

1. A 240-cycle initialization sequence.
2. Key-stream generation and encryption.

After the first 240 cycles, the LFSRs must be loaded with the last 128 generated bits. This means adding a multiplexer (MUX) to each bit of the LFSRs. The final block diagram of a single LFSR bit is shown in figure 5.1a and b.

Figure 5.1: a. LFSR single bit taken from Renoir version 2000.3 b. as a black box

When LOAD is low (logic ‘0’), INPUT passes to OUTPUT. When LOAD is high (logic ‘1’), P passes (P represents the parallel loading of the last 128 generated bits). LOAD is a control output from the finite state machine controller inside E0, to be explained in section 5.6. As long as INIT is high, OUTPUT stays low. INIT acts as a reset input and is controlled by a higher-level controller outside E0, which will be discussed in chapter 6.

There were two choices for which comes first: the MUX or the DFF. Both choices worked successfully for the first 240 cycles, but with the input taken to the MUX first there was a delay of one clock cycle when performing the parallel load of the last 128 generated bits. So I have chosen the configuration shown above.

It is clear that I am using the structural style for implementing the LFSRs. The behavioral style can also work, but synthesis then produces additional gates such as AND and OR gates. I need only a MUX at each bit, and the structural style gives the best results because I am targeting the basic building blocks of the FPGA.


The incoming bits must fill each LFSR before its feedback switch is closed, and here a problem arises: how to design a feedback switch that closes after 25 clock cycles for LFSR1, 31 clock cycles for LFSR2, 33 clock cycles for LFSR3 and 39 clock cycles for LFSR4? Prof. Hani asked me this question at the seminar we gave at the end of April. I have designed the switch as shown in figure 5.2, which shows the input of LFSR1, whose length is 25 bits. An AND gate is added in the feedback path to AND the feedback signal with a control signal FB25 coming from the finite state machine (FSM) controller of the encryption engine. If FB25 is high, the switch is closed and the incoming feedback signal passes. If FB25 is low, the output of the AND gate is low (logic ‘0’); it goes to the XOR gate at the input of the LFSR, where it is XORed with IN25, so IN25 passes unchanged to the first bit of the LFSR.

Figure 5.2: the feedback switch implementation

I need four different FB signals, one for each LFSR, as the feedback switches are closed at different times according to the length of each LFSR. The final design of the four LFSRs is shown in figure 5.3. The feedback signals FB25, FB31, FB33 and FB39 are low by default at t = 0 and go high at t = 25, 31, 33 and 39 respectively.
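The gating described above can be checked with a small behavioural model. The sketch below is in Python, not VHDL, and is only an illustration: the function names are hypothetical and the tap positions in EXAMPLE_TAPS are placeholders chosen for the example, not values taken from our design files.

```python
# Illustrative tap positions only (1-based, counted from the input end).
EXAMPLE_TAPS = (8, 12, 20, 25)

def lfsr_clock(state, in_bit, fb_enable, taps):
    """One clock of an LFSR with a gated feedback path.

    state[0] is the bit nearest the input; the list is shifted by one
    position on every clock.
    """
    feedback = 0
    for tap in taps:
        feedback ^= state[tap - 1]
    gated = feedback & fb_enable          # the AND gate of figure 5.2
    return [in_bit ^ gated] + state[:-1]  # XOR at the input, then shift

def fill_and_run(length, taps, in_bits):
    """Shift in_bits into an LFSR whose switch closes at t = length."""
    state = [0] * length
    for t, bit in enumerate(in_bits):
        fb_enable = 1 if t >= length else 0   # FB is low until t = length
        state = lfsr_clock(state, bit, fb_enable, taps)
    return state
```

While the FB signal is low the register behaves as a plain shift register, so after 25 clocks LFSR1 holds exactly the first 25 input bits; only from the next clock on does the feedback take part.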

Figure 5.3: the four LFSRs as taken from Renoir


5.3. THE INPUT SHIFT REGISTERS:

As said in chapter 2, the encryption engine’s inputs are four streams of bits to the four LFSRs. These bits are the 128-bit encryption key coming from E3, the 48-bit Bluetooth device address and the 26-bit clock, shuffled together with six constant bits (see table 5.1) to form four groups of 49, 55, 49 and 55 bits, for a total of 208 bits. Here a problem arises: how can I perform this shuffling?

Let us suppose that the Key, BD_ADDR and CL26 arrive serially to reduce the number of pins. I need to store them first in a 128-bit, a 48-bit and a 26-bit shift register respectively. After that, two ways can be used:

The first way is to use a finite state machine and an 8-bit counter, with a 3×1 multiplexer at the input of each LFSR. This FSM enables and disables shifting for the three shift registers by enabling or disabling the clock of each of them at specified time instants according to the output of the counter, which counts the 208 bits. It also controls the MUXs so that each input is selected at the appropriate time. This method is complicated and slow.

The other method is to build four shift registers A, B, C and D, with lengths of 49, 55, 49 and 55 bits respectively, and perform a parallel load from the other three shift registers at predefined positions, which does the required shuffling. This method is simpler than the previous one but consumes more area: it needs an additional 208-bit register, and the total is 416 flip-flops!

Both ways are slow and consume area. I have used an alternative design idea, which guarantees fast operation and less area. It can be explained as follows.

E0 gets the Key from E3. E3 produces it as a 128-bit parallel output, and as my design will be integrated with the authentication design by Sameh Assem, I get the key from him directly as a 128-bit parallel input. The Bluetooth device address is supposed to be stored somewhere in the device’s RAM as a word. Different blocks in the security system need it, including E0. We get it as a serial bit stream and store it in a shift register shared between all the blocks; E0 then gets the 48-bit address as a parallel input. CL26 comes out of the Bluetooth counter as a parallel output too, and E0 gets it as it is. I then need two registers of length 49 and two of length 55. A parallel load is performed when a control signal (Load_reg) comes from the FSM of E0, while shifting is disabled. After that, Load_reg goes low and shifting to the inputs of the four LFSRs starts. The parallel load guides the incoming parallel inputs to their positions as specified in the Bluetooth standard; the shuffling is done as shown below:

LFSR  Length  Loaded bits
1     49      ADR[2]  CL[1]   K'C[12]  K'C[8]   K'C[4]  K'C[0]  CL24
2     55      ADR[3]  ADR[0]  K'C[13]  K'C[9]   K'C[5]  K'C[1]  CL[0]L  001
3     49      ADR[4]  CL[2]   K'C[14]  K'C[10]  K'C[6]  K'C[2]  CL25
4     55      ADR[5]  ADR[1]  K'C[15]  K'C[11]  K'C[7]  K'C[3]  CL[0]U  111

Table 5.1: input shift register.
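As a quick consistency check on table 5.1, the register lengths can be recomputed from the widths of the loaded fields. The short Python sketch below assumes that each ADR, CL[1], CL[2] and K'C entry is one octet, that CL24 and CL25 are single bits, that CL[0]L and CL[0]U are the lower and upper four bits of CL[0], and that the constants are three bits each; the dictionary and function names are ours.

```python
# Assumed field widths in bits (see the lead-in for the assumptions).
WIDTH = {"ADR": 8, "CL": 8, "KC": 8, "CL24": 1, "CL25": 1,
         "CL0L": 4, "CL0U": 4, "CONST": 3}

# Field types per row of table 5.1, in the order they are listed there.
TABLE_5_1 = {
    1: ["ADR", "CL", "KC", "KC", "KC", "KC", "CL24"],
    2: ["ADR", "ADR", "KC", "KC", "KC", "KC", "CL0L", "CONST"],
    3: ["ADR", "CL", "KC", "KC", "KC", "KC", "CL25"],
    4: ["ADR", "ADR", "KC", "KC", "KC", "KC", "CL0U", "CONST"],
}

def register_lengths(table=TABLE_5_1):
    """Return the number of bits loaded into each input shift register."""
    return {n: sum(WIDTH[f] for f in fields) for n, fields in table.items()}
```

Under these assumptions the four rows add up to 49, 55, 49 and 55 bits, which matches the Length column and the total of 208 bits.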


After all the bits have been shifted in, I have three choices:

1. Insert a switch to disable the XOR gate at the input of the LFSR. A MUX should be used too.
2. Put the switch before the XOR gate, with the same idea as the feedback switch.
3. Shift zeros (‘0’) into the input.

Both ways 1 and 2 need more gates and extra control signals for the switches and MUXs, and these control signals must be time based. This means additional states in the finite state machine (FSM), which leads to an additional D flip-flop for each extra state (as will be explained later) and thus more area. Here I am using the last idea. Shifting the control signal (Load_reg) in behind the data works as shifting in a ‘0’ after all the bits have been shifted in; it only requires a wire from Load_reg to the last position of each shift register. Shifting does not start until Load_reg goes to ‘0’.
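The zero-fill trick can be illustrated with a few lines of Python. This is only a behavioural sketch with a hypothetical function name, assuming the register shifts towards position 0 and that position 0 drives the LFSR input.

```python
def shift_out(data_bits, n_clocks):
    """Model an input shift register whose last position is fed by Load_reg."""
    reg = list(data_bits)   # parallel load, done while Load_reg = '1'
    load_reg = 0            # shifting starts only once Load_reg is '0'
    out = []
    for _ in range(n_clocks):
        out.append(reg[0])            # bit presented to the LFSR input
        reg = reg[1:] + [load_reg]    # Load_reg is wired to the last position
    return out
```

For example, shift_out([1, 0, 1, 1, 1], 8) produces the five data bits followed by zeros, without any extra switch, MUX or FSM state.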

NOTICE: This configuration has been changed for the purpose of testing only on the FPGA available in the ICL as will be shown in chapter 6.

5.4. THE SUMMATION COMBINER BLOCKS:

The first block to be considered is the five-input XOR gate, which produces the encryption stream Z. This is not a clocked block, but all its inputs are clocked and synchronization is almost guaranteed. Leonardo Spectrum shows that the synthesis produces two levels of XORing. The Spartan s10pc84 FPGA has a maximum of 4 inputs to any of its XOR gates, and here a problem appears in the back annotation simulation, which shows a delay of almost 15 ns due to glitching. Although this is not a big problem, it can be solved at the expense of more area by using two levels of XOR: the first level is two XOR gates and the second is an XOR gate that takes the outputs of the other two as its inputs. Glitching was found to be almost removed. Great thanks go to Prof. Al-Ghitany, who taught us this idea two years ago.
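The two-level arrangement computes the same function as a single five-input XOR, which can be confirmed exhaustively. In the Python sketch below the input names x1 to x4 and c0 are assumed labels, and placing the fifth input on the second first-level gate is our assumption about how the five inputs are split.

```python
from itertools import product

def xor5_flat(x1, x2, x3, x4, c0):
    """Reference: one five-input XOR."""
    return x1 ^ x2 ^ x3 ^ x4 ^ c0

def xor5_two_level(x1, x2, x3, x4, c0):
    """Two first-level XOR gates followed by one second-level XOR gate."""
    a = x1 ^ x2          # first-level gate 1
    b = x3 ^ x4 ^ c0     # first-level gate 2 (assumed to take the fifth input)
    return a ^ b         # second-level gate

def equivalent():
    """Compare both forms over all 32 input combinations."""
    return all(xor5_flat(*bits) == xor5_two_level(*bits)
               for bits in product((0, 1), repeat=5))
```

The logic is identical either way; what changes on the FPGA is only the balance of the path delays, which is what reduces the glitching.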

The second block is the first adder, which computes $y_t = x_t^1 + x_t^2 + x_t^3 + x_t^4$. The operation can be performed as a simple mathematical addition with the aid of the unsigned library in VHDL. But I thought of a general structure that can be easily synthesized on any ASIC or FPGA target. The adder is shown in figure 5.4. This adder acts as a means of counting the 1’s in the set $x_t^1$, $x_t^2$, $x_t^3$ and $x_t^4$. Glitching is not of great importance here because the output of this adder and of the next one passes through a D flip-flop.

Figure 5.4: structural adder
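One possible gate-level structure for such a ones counter is sketched below in Python. It is not necessarily the structure drawn in figure 5.4; it is just one arrangement built from two half adders, with hypothetical function names, checked against ordinary integer addition.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry) of two single bits."""
    return a ^ b, a & b

def count_ones(x1, x2, x3, x4):
    """Return (y2, y1, y0), the 3-bit count of ones among four bits."""
    s1, c1 = half_adder(x1, x2)
    s2, c2 = half_adder(x3, x4)
    y0, carry0 = half_adder(s1, s2)   # least significant bit of the count
    y1 = c1 ^ c2 ^ carry0
    y2 = c1 & c2                      # set only when all four inputs are 1
    return y2, y1, y0

def matches_integer_sum():
    """Exhaustively compare the gate-level count with x1 + x2 + x3 + x4."""
    for bits in product((0, 1), repeat=4):
        y2, y1, y0 = count_ones(*bits)
        if 4 * y2 + 2 * y1 + y0 != sum(bits):
            return False
    return True
```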


The output of this adder is a 3-bit signal, which is added to the 2-bit signal Ct. The result then passes through a divider that divides by two. Figure 5.5 shows a block diagram that performs these functions.

Figure 5.5: the second adder with divide by 2
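In hardware, dividing an unsigned number by two amounts to dropping its least significant bit. The Python sketch below (function name ours) models the second adder and the divider under that assumption and shows the range of the result.

```python
def add_and_halve(y, c):
    """Add the 3-bit count y (0..4) to the 2-bit value c (0..3), then halve."""
    total = y + c        # at most 4 + 3 = 7, so three bits are enough
    return total >> 1    # divide by two: drop the least significant bit

def result_range():
    """All values the divider output can take."""
    return sorted({add_and_halve(y, c) for y in range(5) for c in range(4)})
```

The result always lies between 0 and 3, so it fits in two bits.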

5.5. STORING THE OUTPUTS FOR PARALLEL LOADING:

It was shown in chapter 2 that at t=240 the last 128 generated bits are loaded in parallel into the four LFSRs at specified positions, to achieve a complicated shuffling. There are two ideas for solving this problem:

1. Four shift registers are built, one for each LFSR (with the same lengths). The encryption stream output is connected directly to their inputs. With control signals that enable and disable the clock of each shift register, the appropriate bits are shifted in according to the specified shuffling algorithm. At t=240, a parallel load from the shift registers to their corresponding LFSRs is done.

We have found a team working on the encryption engine at Stanford University, and they use this idea. I have tried it myself and found that there should be four individual control signals coming from the controller (FSM) at specified clocks according to the following timetable:

t    111 119 127 135 143 151 159 167 175 183 191 199 207 208 215 223 231 232 239
SR1   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   X   X   X
SR2   X   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   X   X
SR3   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   ☺   X   X
SR4   X   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   X   X   ☺   X   ☺   X

(☺) means the clock of that register is enabled; (X) means the clock of that register is disabled.

2. The second idea is to build a single 128-bit shift register whose clock is enabled and disabled only once. At t=240, the contents of this register are loaded in parallel into the four LFSRs at the specified positions with an appropriate port mapping.

Simulation shows that both ideas work. But the first one is more complicated and requires extra control states and signals in the FSM, which leads to an extra D flip-flop for each new state. The second idea is the one I am using in the final design. We do not even need to disable or enable the clock of the 128-bit shift register, as indicated at the end of this chapter when we present the final block diagram.

5.6. THE FINITE STATE MACHINE:

We need this controller to manage the operation of the different blocks in the Encryption Engine, which can be summarized as follows:

- Opening and closing the feedback switches of the LFSRs at different clocks.
- Resetting the blend registers when all switches are closed.
- Keeping the contents of the blend register at the end of the initialization step.
- Performing the parallel loading at the end of the initialization step.

It seems simple, but it needs a very careful timing design. We need 7 control signals, controlled through the output of an 8-bit counter, which is reset when an encryption command is issued. There is no need to stop the counter. Renoir can generate a state diagram from the VHDL code. The resulting state diagram is shown in figure 5.6.
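The idea of deriving the control signals from the counter value can be sketched in a few lines of Python. This is a simplified model, not the VHDL of the controller: it covers only the four feedback switches and the parallel load, it uses the times quoted in sections 5.2 and 5.5, and it leaves out En_CLK_C and C entirely. The exact counter values at which the real FSM changes state may differ by one clock from this model.

```python
# Times taken from the text: switches close at t = 25, 31, 33, 39;
# the parallel load happens at t = 240.
CLOSE_AT = {"FB1": 25, "FB2": 31, "FB3": 33, "FB4": 39}
LOAD_AT = 240

def control_signals(t):
    """Return the modelled control outputs for counter value t."""
    signals = {name: int(t >= when) for name, when in CLOSE_AT.items()}
    signals["Load"] = int(t == LOAD_AT)   # a one-clock pulse
    return signals
```

For example, control_signals(31) has FB1 and FB2 high with FB3 and FB4 still low, and Load is high only for t = 240.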

The design of a finite state machine is a critical part of today’s digital design. There are many configurations for implementing a single controller. I had to try many of these methodologies to find the most appropriate one from the area and delay points of view, especially when using the first idea presented above. Leonardo Spectrum was a good friend while synthesizing every configuration, as well as every block in the design, to get the smallest area. The final block diagram obtained from Renoir is shown in figure 5.7.

5.7. ENCRYPTION ENGINE SIMULATION:

Simulation was performed according to “Appendix IV” of the Bluetooth specification, Encryption Sample Data. All four sets of sample data, with 364 clock cycles each, were reproduced successfully. Figure 5.8 shows the waveforms obtained from ModelSim compared with the expected results from the first set of sample data. The waveform is shown around t=240, when the parallel load is performed. A correct parallel load indicates that the preceding bits are correct.

Another waveform, obtained for the fourth set of sample data, is shown in figure 5.9 together with the expected results from the Bluetooth specification.

For the purpose of simulation only, I have added a 9-bit counter that works in parallel with the 8-bit counter already existing in the design. This counter shows the clock number up to 364, which is the last clock in the sample data, and thus helps me trace all the clocks in all the sets.

A Visual C++ program was also used to verify some additional samples, and the results were also correct. A back annotation was also performed and will be discussed in chapter 6.


[State diagram generated by Renoir. States: s0, s25, s31, s33, s39, s40, s239, s240, s241 and s00. Transitions are conditioned on the 8-bit counter output Q ("00000000", "00011000", "00011110", "00100000", "00100110", "00100111", "11101110", "11101111", "11110000") and on INIT = '1'. Each state assigns the seven control outputs FB1, FB2, FB3, FB4, Load, En_CLK_C and C.]

Figure 5.6: the FSM


Figure 5.7a: E0 Block Diagram as entered in Renoir


Figure 5.7b: A closer look


Figure 5.7c: A closer look

Figure 5.7d: A closer look


Figure 5.8: selected waveforms and outputs of the first set of sample data


Figure 5.9: selected waveforms and outputs of the fourth set of sample data


5.8. THE RANDOM NUMBER GENERATOR:

Random numbers are required in a wide variety of applications, including data encryption, circuit testing and system simulation. In the past, random number generation was mostly done in software. However, as digital systems become faster and denser, it is feasible, and frequently necessary, to implement the generator directly in hardware. Although the software-based methods are well understood, they frequently require complex arithmetic operations and are thus not feasible to construct in hardware. Software may also be slower.

Ideally, the generated random numbers should be uncorrelated and satisfy any statistical test for randomness. A generator can be either “truly random” or “pseudo random”. The former exhibits true randomness, and the value of the next number is unpredictable. The latter only appears to be random. Its sequence is actually based on specific mathematical algorithms, and thus the pattern is repetitive and predictable. However, if the cycle period is very large, the sequence appears to be non-repetitive and random. Although it is possible to implement a true random number generator in hardware, it is slow and relatively expensive.

5.8.1. TRUE RANDOM NUMBER GENERATORS:

True randomness can be derived from certain physical phenomena, such as radioactive decay, thermal noise in semiconductors, sound samples taken in a noisy environment, and even digitized images of a lava lamp. In electronic circuits, thermal noise is frequently used as the source of randomness because of its well-characterized spectral and statistical properties. A representative implementation is shown in figure 5.10. In this circuit, the source of the noise is the thermal noise of a precision resistor, which is represented as Vnoise. It is amplified by a low-noise amplifier and then passed to a high-speed comparator. The threshold of the comparator (Vref) corresponds to the mean voltage of the input noise signal. The output of the comparator is sampled and latched into a register. The latched signal is a one-bit binary signal that exhibits true randomness.

Figure 5.10: A True 1-bit Random Number Generator

The true random number generator is fairly involved, since it needs to preserve and amplify the thermal noise and at the same time shield against all external disturbances. It consists mainly of analog components and cannot be implemented by purely digital circuitry. The mixed-signal implementation significantly increases the system complexity. This implementation is also relatively slow and cannot match the speed of high-speed digital circuits. One major application of the true random number generator is to generate the initial seed for a pseudo random number generator (PRNG).

5.8.2. LFSR TO BE USED IN SINGLE BIT PRNG:

A single-bit random number generator produces a value of 0 or 1. The most efficient implementation is to use an LFSR (Linear Feedback Shift Register). LFSRs are commonly used in applications where pseudo-random bit streams are required. LFSRs are the functional building blocks of circuits like the pseudo-random noise (PN) code generators and Gold code generators commonly used in Code Division Multiple Access (CDMA) systems. LFSRs will often satisfy this requirement, although the generated sequence is pseudo-random in nature. Pseudo-random patterns repeat over time; the longer the LFSR, however, the longer the sequence of random numbers before the pattern repeats.

5.8.2.1 LFSR Terminology:

LFSRs sequence through $(2^N - 1)$ states, where N is the number of flip-flops in the LFSR. At each clock edge, the contents of the flip-flops are shifted right by one position. There is a feedback path from predefined flip-flops to the leftmost flip-flop through an exclusive-NOR (XNOR) or an exclusive-OR (XOR) gate; XOR is the one used by Bluetooth for the Encryption Engine, as we saw. A value of all "1’s" is illegal in the case of XNOR feedback, and a value of all "0's" is illegal for XOR feedback. The illegal state causes the counter to remain in its present state, locking out any further new values from being registered.

LFSRs have several variables:

- The number of stages in the shift register
- The number of taps in the feedback path
- The position of each tap in the shift register stage
- The initial starting condition of the shift register, often referred to as the FILL state

The shift register length is often referred to as the degree, and the longer the shift register, the longer the duration of the PN sequence before it repeats. For a shift register of fixed length N, the number and duration of the sequences it can generate are determined by the number and position of taps used to generate the parity feedback bit.

The combination of taps and their location is often referred to as a polynomial, expressed for example as $P(x) = x^7 + x^3 + 1$. Various conventions are used to map the polynomial terms to register stages in the shift register implementation. The convention used here is consistent with the one used in the Bluetooth specification. In the polynomial $P(x) = x^7 + x^3 + 1$, the trailing "1" represents $x^0$, which is the output of the last stage of the shift register; $x^3$ is the output of register stage 3, and $x^7$ is the output of the XOR. A few points to note about LFSRs and the polynomials used to describe them: (1) The last tap of the shift register corresponds to the "1" term and is always used in the feedback path. (2) The length of the shift register can be deduced from the exponent of the highest-order term in the polynomial. (3) The highest-order term of the polynomial is the signal connecting the final XOR output to the shift register input. It does not feed back into the parity calculation along with the other taps identified in the polynomial.
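To make the polynomial concrete, the following Python sketch steps a 7-stage XOR LFSR whose output sequence obeys the recurrence s[n+7] = s[n+3] XOR s[n], which corresponds to $x^7 + x^3 + 1$. The list layout (index 0 is the oldest bit) and the function names are assumptions made for the example; other stage-numbering conventions give an equivalent register.

```python
def fibonacci_step(state):
    """One clock of a 7-stage XOR LFSR for x^7 + x^3 + 1.

    state is a list of 7 bits; state[0] is the oldest bit (the output).
    """
    new_bit = state[0] ^ state[3]     # parity of the tapped stages
    return state[1:] + [new_bit]      # shift by one, insert the new bit

def period(start):
    """Number of clocks until the register returns to its starting state."""
    state, count = fibonacci_step(start), 1
    while state != start:
        state = fibonacci_step(state)
        count += 1
    return count
```

Starting from any non-zero fill, this register visits all $2^7 - 1 = 127$ non-zero states before repeating, while the all-zeros fill stays locked, as described above for XOR feedback.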

5.8.2.2 LFSR Implementation:

There are two implementation styles of LFSRs, the Galois implementation and the Fibonacci implementation.

Galois Implementation: As shown in figure 5.11, the data flow is from left to right and the feedback path is from right to left. The polynomial increments from left to right with the $x^0$ term (the "1" in the polynomial) as the first term. This is referred to as a Tap polynomial, as it indicates which taps are to be fed back from the shift register. Since the XOR gate is in the shift register path, the Galois implementation is also known as an in-line or modular type (M-type) LFSR.


Figure 5.11: Galois Implementation

Fibonacci Implementation: In figure 5.12, the data flow is from left to right and the feedback path is from right to left, similar to the Galois implementation. However, the Fibonacci implementation polynomial decrements from left to right with $x^0$ as the last term in the polynomial. This polynomial is referred to as a Reciprocal Tap polynomial, and the feedback taps are incrementally annotated from right to left along the shift register. Since the XOR gate is in the feedback path, it is also known as an out-of-line or simple type (S-type) LFSR.
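The difference between the two styles is easy to see in software. The Python sketch below is a Galois-style register for the example polynomial $x^7 + x^3 + 1$, where the XOR sits in the shift path instead of the feedback path. The bit ordering, the mask value and the function names are our assumptions for this illustration.

```python
# Toggle mask for x^7 + x^3 + 1 in a right-shifting 7-bit Galois register:
# (P(x) + 1) / x = x^6 + x^2, i.e. bits 6 and 2.
GALOIS_MASK = 0b1000100

def galois_step(state):
    """One clock of the Galois LFSR; returns (new_state, output_bit)."""
    out = state & 1
    state >>= 1
    if out:
        state ^= GALOIS_MASK   # the in-line XOR gates flip the tapped stages
    return state, out

def galois_period(start):
    """Number of clocks until the register returns to its starting state."""
    state, _ = galois_step(start)
    count = 1
    while state != start:
        state, _ = galois_step(state)
        count += 1
    return count
```

As with the Fibonacci form, a non-zero start cycles through 127 states and zero stays at zero. In hardware the Galois form has only one gate of delay between flip-flops, at the cost of spreading the XOR gates along the register.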

Figure 5.12: Fibonacci Implementation

5.8.2.3 Multiple-bit RNG using LFSR:

Some applications require more accuracy and need more than a single-bit random number. Since the numbers produced by a single-bit LFSR random number generator are uncorrelated, one way to obtain a multiple-bit random number is to accumulate several single-bit numbers. There are several techniques to achieve this, and they are discussed below.

Single-LFSR Method: The Single-LFSR method requires only one LFSR. It uses the values stored in the shift register to form a multiple-bit number. For example, if a 4-bit random number is needed, we can use the register outputs of a 4-bit LFSR (i.e., X0, X1, X2 and X3). Although this implementation is simple, the generated random numbers are highly correlated and fail many statistical tests. This should not come as a surprise, since each new random number keeps most bits of the old number and contains only one bit of new information. To overcome the correlation problem, it is necessary to replace all the bits in the random number rather than just one bit.

Single-LFSR with a Counter Method: This implementation consists of a single LFSR and a counter. This method replaces the bits one at a time, so a k-bit random number requires k shift operations of the LFSR to form a new number. As long as k is relatively prime to the period of the LFSR, the k-bit number will cycle through all possible states. In order to keep track of the number of shift operations, an additional modulo-k counting circuit is required. The counter generates a special enable signal that is asserted once every k clock cycles. The LFSR operates as usual, but its output is interpreted as valid only when the enable signal of the counter is asserted. The disadvantage of this approach, of course, is the operation speed: the random number generator is slower by a factor of k.
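The counter method can be tried out on a small example. The Python sketch below reuses a 7-stage LFSR for $x^7 + x^3 + 1$ (period 127) as the single-bit source and collects k = 4 bits per number; since 4 and 127 are relatively prime, the 4-bit numbers pass through every value. The register size, seed and function names are choices made for the illustration and are not those of our chip.

```python
def lfsr_bits(seed):
    """Endless single-bit stream from a 7-stage XOR LFSR (x^7 + x^3 + 1)."""
    state = list(seed)
    while True:
        yield state[0]
        state = state[1:] + [state[0] ^ state[3]]

def k_bit_numbers(seed, k, count):
    """Form `count` numbers, each from k consecutive shifts of the LFSR."""
    stream = lfsr_bits(seed)
    numbers = []
    for _ in range(count):
        value = 0
        for _ in range(k):            # the modulo-k counter: k shifts per number
            value = (value << 1) | next(stream)
        numbers.append(value)         # valid only once every k clocks
    return numbers
```

Over 127 consecutive numbers (four full periods of the bit stream), all sixteen 4-bit values appear, with zero slightly less frequent than the others because the all-zeros register state never occurs.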

Parallel-LFSR Method: The Parallel-LFSR method is a straightforward extension of the single-bit random number generator. It uses k copies of identical one-bit generator hardware to generate k bits concurrently. The major advantage of this method is the operation speed, which is identical to that of a single-bit generator. However, this method also requires a large amount of hardware. First, k copies of the LFSR are required. Second, each LFSR must have a different seed in order to avoid correlation. Since the logic cells of most FPGA chips do not contain both reset and preset inputs, an additional initialization circuit is required. Recall that the original LFSR needs only a few XOR gates; the initialization circuit may significantly increase the circuit complexity. Another concern with the parallel LFSR method is the poor utilization of the FPGA’s resources. FPGA devices are made of a collection of generic logic cells, which normally contain a programmable combinational circuit plus one or two registers. Since LFSRs require little combinational circuitry, the logic cells are not fully utilized.

5.8.2.4 Maximal Length Sequences:

A maximal length sequence for a shift register of length N is referred to as an m-sequence, and its length is defined as $L = 2^N - 1$. An eight-stage LFSR, for example, will have a set of m-sequences of length 255, and 128 stages lead to a length of about $3.4 \times 10^{38}$. However, there are some techniques to make the LFSR count through all the $2^N$ values.

Table 5.2 lists the appropriate taps for maximum-length LFSRs with XNOR feedback of up to 168 bits. The same taps apply to the XOR case. The basic description and the table for the first 40 bits were originally published in XCELL and reprinted in the 1993 and 1994 Xilinx Data Books. This information is based on unpublished research done by Wayne Stahnke while he was at Fairchild Semiconductor in 1970.
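An eight-stage example shows the maximal length and the XNOR lock-up state together. The Python sketch below uses XNOR feedback with the recurrence of the well-known primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$; this polynomial is chosen for the illustration and is not read from table 5.2, and the list layout and function names are ours.

```python
def xnor_step(state):
    """One clock of an 8-stage XNOR LFSR; state[0] is the oldest bit."""
    parity = state[0] ^ state[2] ^ state[3] ^ state[4]
    return state[1:] + [1 ^ parity]   # XNOR = inverted XOR

def xnor_period(start):
    """Number of clocks until the register returns to its starting state."""
    state, count = xnor_step(start), 1
    while state != start:
        state = xnor_step(state)
        count += 1
    return count
```

With XNOR feedback the all-zeros fill is a legal starting state and runs through $2^8 - 1 = 255$ states, while the all-ones fill is the state that locks up.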

5.8.3 IMPLEMENTATION:

According to the Bluetooth specification, as shown in chapter 2, there are 7 different random numbers used in the security algorithms. However, these numbers are sent over the air in public during authentication between two devices, so they are no longer secrets.

In one of Prof. Hani’s lectures, we learned that in Bluetooth the PN code has a lifetime of 23 hours and 18 minutes. With a 3.2 kHz baseband clock, we can calculate the length of the LFSR required for a single-bit random number to be 28 stages. However, different blocks in the authentication algorithm need the 128-bit random number as a parallel input. This forces us to use a 128-bit shift register to store the number first. Actually, this leads to a much shorter period, besides the extra hardware and the slow operation: we have to wait 128 clock cycles for the random number to be stored before performing the parallel input.
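The figure of 28 stages follows from simple arithmetic, reproduced here as a short Python sketch (the function name is ours): the number of clock ticks in the lifetime is counted, and the smallest register length whose state count covers it is taken.

```python
def stages_needed(clock_hz, lifetime_s):
    """Return (ticks, stages): clock ticks in the lifetime and the
    smallest N such that 2**N is at least that many ticks."""
    ticks = clock_hz * lifetime_s
    return ticks, (ticks - 1).bit_length()

LIFETIME_S = 23 * 3600 + 18 * 60          # 23 hours and 18 minutes in seconds
ticks, stages = stages_needed(3200, LIFETIME_S)
```

Here ticks is 268,416,000, just below $2^{28} = 268,435,456$, so 28 stages are needed. Conversely, $2^{28}$ ticks at 3.2 kHz last about 83,886 s, which is 23 hours, 18 minutes and 6 seconds.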

We therefore found that we should use a 128-stage LFSR as our random number generator. In addition to the 128 parallel outputs, there is a single-bit output to get the random number serially and send it outside our chip. When the Bluetooth device starts working, say in the morning, a reset signal forces the LFSR to all zeros. Whenever we need a random number, a command is issued to perform the parallel load and, at the same time, to start counting out the 128 serial output bits. We have built some control blocks to enable or disable the operation of the random number generator as the situation requires. Would that be random? Yes. But if you need two consecutive random numbers, the second will be predictable. We have found no situation that requires two consecutive random numbers. It is also possible to fill this LFSR serially with a number, namely the random number coming from the other Bluetooth device. This leads to more randomness.
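The two operating modes described above (free-running with feedback, or serial filling from an external source) can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the VHDL of the core: the class name is hypothetical, the serial output is assumed to be taken from the last stage, and a short 8-stage register with assumed taps (8, 6, 5, 4) stands in for the 128-stage one, whose taps come from Table 5.2.

```python
class SerialLoadLfsr:
    """Behavioural model of an LFSR whose feedback can be switched off."""

    def __init__(self, n, taps):
        self.n = n
        self.taps = taps
        self.state = 0  # reset forces all zeros, as described in the text

    def clock(self, en_fb, rand_in=0):
        """Shift once and return the bit leaving the last stage."""
        serial_out = (self.state >> (self.n - 1)) & 1
        if en_fb:
            new_bit = 1  # XNOR feedback of the tapped stages
            for t in self.taps:
                new_bit ^= (self.state >> (t - 1)) & 1
        else:
            new_bit = rand_in & 1  # plain shift register: external bit enters
        self.state = ((self.state << 1) | new_bit) & ((1 << self.n) - 1)
        return serial_out

    def parallel_out(self):
        """All stages at once, as used by the algorithm blocks."""
        return self.state


rng = SerialLoadLfsr(8, (8, 6, 5, 4))
for bit in (1, 0, 1, 1, 0, 0, 1, 0):      # external number entered serially
    rng.clock(en_fb=0, rand_in=bit)
loaded = rng.parallel_out()
first_out = rng.clock(en_fb=1)             # feedback re-enabled
print(hex(loaded), first_out, hex(rng.parallel_out()))
```

Switching the feedback off turns the same flip-flops into the receive register for the other device's random number, so no extra storage is needed.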

As future work, we may make it possible to load our random number generator through a parallel input and not just with all zeros. This number may come from a higher level, for example the Link Manager, which is implemented in software. This would require a multiplexer at each bit.

Table 5.2: Taps for Maximum-Length LFSR


Chapter 6
The Bluetooth Security Core

The idea of tiny changes cumulated over many steps is an immensely powerful idea, capable of explaining an enormous range of things that would be otherwise inexplicable.
R. DAWKINS, The Blind Watchmaker

In this chapter, we present the top level of our design: a core capable of performing the different tasks required for Bluetooth security. It can generate the different keys required. It can perform the authentication process, producing both SRES and ACO. It encrypts data as well. Besides all these functions, it has its own pseudo-random number generator to produce the 128-bit random numbers required. In section 6.1, we talk about the design of the core and how we achieved these different functions. Section 6.2 is about the simulation of the core. Section 6.3 covers the ASIC flow done on the core. Finally, section 6.4 covers back annotation and FPGA tests.

6.1. THE DESIGN OF THE BLUETOOTH SECURITY CORE:

The core proposed here is capable of doing some of the link manager functions stated in the Bluetooth specifications. We designed it so that it could be delivered as a single chip. Combining E0, E1, E21, E22 and E3 was not an easy task in the absence of a RAM, because every algorithm depends on the outputs of other algorithms. For example, E0 gets its Kc from E3. E3 gets its COF from E1’s ACO or from the BD_ADDR. E1 in turn gets its key from either E21 or E22. All of the algorithms need a 128-bit random number, and some of the algorithms’ inputs or outputs have to be sent to the other user.

These were the challenges we faced:

- How to guide the output from one algorithm to another.
- How to store the outputs for future use in other algorithms.
- How to interface with the external world given such a huge number of pins.
- How to determine the random number used and send it over the air to other Bluetooth devices.
- How to get external keys, or external random numbers obtained from other devices.
- How to get the outputs out of the core to be sent to other devices.

To solve these problems we needed:

- A complicated controller, designed as an FSM and written in VHDL. We believe this controller performs some of the link manager functions. It took 3 days of continuous work.
- Some shift registers to convert from serial to parallel and vice versa, which allowed us to reduce the number of pins greatly.
- Some multiplexers and registers.


The pins of the core and their functions are shown in table 6.1.

Pin Name             Function
--------------------------------------------------------------------------
clk                  External clock input.
rst                  Resets the core.
data_in              Input for the data to be encrypted.
data_out             Output of the encrypted data.
cl26 (26 pins)       The 26-bit CL26 input to the encryption engine E0.
e_select (3 pins)    Selects among the various algorithms:
                       “000” : E21
                       “001” : E22
                       “010” : E1
                       “011” : E3 when the key used is a master key
                       “100” : E3 when the key used is any other key
                       “101” : E0 start
                       “110” : E0 stop
                       “111” : to be selected after any algorithm.
                     Note that the upper link manager still determines the
                     sequence in which the algorithms are run, and it is
                     therefore responsible for setting the values of these pins.
load_adr             Set ‘high’ to start entering the addresses serially and
                     set ‘low’ when finished.
adda_pin             Serial input for an address, or for a PIN.
addb                 Serial input for another address.
load_key             Set to ‘1’ to start entering an external key serially;
                     otherwise set to ‘0’.
key_in               Serial key input.
key_out              Serial key output.
en_fb                Enables the feedback of the pseudo-random generator.
                     When it is ‘0’, no feedback is present and the generator
                     works as a shift register so that an external random
                     number can be entered serially.
rand_in              Input for the external random number.
rand_out             Serial output of the random number used.
rand_start           Indicates the start of the random number used.
out_start            Indicates the start of the output key or SRES.
sres_out             Serial output of the SRES.

Table 6.1: The pins of the core and their functions.
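For an upper-layer controller written in software, the e_select encoding of table 6.1 can be captured as a simple lookup. This is an illustrative Python sketch only; the dictionary and function names are hypothetical, and the label "idle" for code “111” is our shorthand for "to be selected after any algorithm".

```python
# e_select codes as listed in table 6.1 (labels are descriptive only).
E_SELECT_CODES = {
    "000": "E21",
    "001": "E22",
    "010": "E1",
    "011": "E3 (master key)",
    "100": "E3 (any other key)",
    "101": "E0 start",
    "110": "E0 stop",
    "111": "idle",  # assumed label: selected after any algorithm
}


def decode_e_select(bits):
    """Return the operation selected by a 3-character bit string."""
    if bits not in E_SELECT_CODES:
        raise ValueError(f"not a 3-bit e_select code: {bits!r}")
    return E_SELECT_CODES[bits]


# The sequence used later in the simulation: E22, then idle, then E1.
for code in ("001", "111", "010"):
    print(code, decode_e_select(code))
```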

Now, let’s look at some of the functions of the internal controller of the core:

- According to the value of e_select entered, it issues outputs that select the required algorithm and open the correct path.
- It stops all operations as long as en_fb = ‘0’ or load_adr = ‘1’ or load_key = ‘1’. Each of these conditions means that an external input is being entered serially by the upper layer.
- It produces the signals required to start and stop the operation of E0.
- Together with another block written in VHDL, called RNG control, it controls the random number generator as required and stops it whenever an algorithm is running or whenever an external random number is being entered.
- It issues the ‘start_algo’ signal required by the algorithm block.
- It issues outputs that tell the upper layer controller to start saving the output.
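The hold condition in the list above reduces to a single Boolean expression. The following Python sketch only restates that rule; the function name is hypothetical, and the real controller is a VHDL finite state machine with further states not modelled here.

```python
def controller_holds(en_fb, load_adr, load_key):
    """True while an external value is still being shifted in serially.

    en_fb = 0   : an external random number is being entered
    load_adr = 1: addresses or a PIN are being entered
    load_key = 1: an external key is being entered
    """
    return en_fb == 0 or load_adr == 1 or load_key == 1


# Normal operation: feedback enabled, nothing being loaded.
print(controller_holds(en_fb=1, load_adr=0, load_key=0))
# Address entry in progress: every algorithm waits.
print(controller_holds(en_fb=1, load_adr=1, load_key=0))
```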


The final design is shown in figure 6.1.

[Schematic image. Blocks labelled in the drawing: controller, RNG, RNGcontrol, E0, REG_128, REG96, KEY_OUT_SR, SRES_SR and ADDR_SR. The top-level pins are those listed in table 6.1.]

Figure 6.1: The Bluetooth Security core as entered in Renoir.

The multiplexers shown in figure 6.1 select the COF: it is either equal to the ACO or, when the key used is a master key, obtained from the concatenation of the BD_ADDR. Making this selection is also one of the functions of the controller.

The AND gates shown are used to disable the clocks of the shift registers once the serial input has finished.
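The COF selection performed by these multiplexers can be sketched as follows. This is a hedged Python illustration, not the schematic itself: the function name and argument names are hypothetical, values are held as integers, and the master-key case is modelled as the concatenation of two 48-bit address values (upper and lower halves), on the assumption that both hold the BD_ADDR.

```python
MASK48 = (1 << 48) - 1
MASK96 = (1 << 96) - 1


def select_cof(e_select, aco, addr_upper, addr_lower):
    """Return the 96-bit COF for an E3 run as an integer."""
    if e_select == "011":  # E3 with a master key: concatenated addresses
        return ((addr_upper & MASK48) << 48) | (addr_lower & MASK48)
    if e_select == "100":  # E3 with any other key: COF is the ACO
        return aco & MASK96
    raise ValueError("COF is only selected for the two E3 codes")


bd_addr = 0xAAAAAAAAAAAA
aco = 0x53C5CB21C8AC8C5DB6A45912
print(f"{select_cof('011', aco, bd_addr, bd_addr):024X}")
print(f"{select_cof('100', aco, bd_addr, bd_addr):024X}")
```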


The core is now ready to be used with an upper layer controller that inputs the data at the correct times and sets the sequence of algorithms as needed for the different security levels.

6.2. THE SIMULATION OF THE CORE:

Because the pseudo-random generator is built into the core, simulation was a complicated task, as the random number used is not known in advance. Thanks to the rand_start output, however, the random number can be captured. Another problem was that the core could not be simulated when doing the E0 tasks, because the expected outputs were not known. That would have been the case for the other algorithms as well, were it not for the MATLAB code we had written.

So let’s try a sequence and test the core. First we have to generate an initialization key. We start by resetting the whole core and setting e_select = ‘001’, the code for E22. We also set load_adr = ‘1’, which means we will enter an address and a PIN code serially, but we start this only after some time so that a good random number is generated. Remember that en_fb should be ‘1’ for the RNG to operate, and load_key = ‘0’ because we don’t have an external key. After running the simulation for some time, we enter the address and the PIN code serially, and after 48 clock cycles we set load_adr = ‘0’. To enter this data serially we force a clock on both the adda_pin and addb pins with double the clock period, one starting with a rising edge and the other with a falling one. This makes PIN = ‘AAAAAAAAAAAA’ and address = ‘555555555555’.

From ModelSim we see the random number used to be ‘00F4CC003B000003000000000007E000’. Taking these values and running the MATLAB program we get Kinit = ‘84FC3B8C9F1AE0A237F78D4E51685337’. Now let’s set e_select = ‘111’ and continue running until out_start = ‘1’, then take the value of K produced. Kinit as produced by ModelSim = ‘84FC3B8C9F1AE0A237F78D4E51685337’, which is the same as MATLAB. See figure 6.2.

Figure 6.2: Simulation results of initialization key generation
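The stimulus trick used above, forcing a half-rate clock onto the serial inputs, produces alternating bit patterns. The short Python sketch below reproduces the two 48-bit values; it assumes the first bit shifted in ends up as the most significant bit, and the function name is illustrative.

```python
def toggling_pattern(first_bit, n_bits=48):
    """Pack n_bits of an alternating 1/0 stream into an integer, MSB first."""
    value = 0
    bit = first_bit
    for _ in range(n_bits):
        value = (value << 1) | bit
        bit ^= 1  # a clock at double the period toggles once per core clock
    return value


pin_hex = f"{toggling_pattern(1):012X}"    # input starting with a rising edge
addr_hex = f"{toggling_pattern(0):012X}"   # input starting with a falling edge
print(pin_hex, addr_hex)
```

With the opposite bit order the two values would simply swap, so the assumption affects only which pin carries which pattern.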

Now, let’s perform an authentication function. The current link key is Kinit, we have a new random number, and for the address we take the value already entered. So let’s set e_select = ‘010’ and then ‘111’ after about 5 clocks, and run the simulation. Notice that en_fb = ‘1’, load_adr = ‘0’ and load_key = ‘0’, thus the algorithm will run as soon as e_select = ‘010’.

From ModelSim, we see that the random number used is ‘3D33000EC00000C00000000001F80001’, whereas the BD_ADDR = ‘AAAAAAAAAAAA’, and from the initialization we know that the current key = ‘84FC3B8C9F1AE0A237F78D4E51685337’.

Taking these values to MATLAB, we get an output equal to ‘E4E3F1F353C5CB21C8AC8C5DB6A45912’, whereas in ModelSim we get Algo_out = ‘E4E3F1F353C5CB21C8AC8C5DB6A45912’, which is again the same. See figure 6.3.

Figure 6.3: Simulation of the authentication process

Now let’s run E3. This is done by setting e_select = ‘100’ and then ‘111’ after about 5 cycles. The COF is already known to be equal to ‘53C5CB21C8AC8C5DB6A45912’, the current link key is still ‘84FC3B8C9F1AE0A237F78D4E51685337’, and the random number as seen from ModelSim is equal to ‘001800000000003F000030FFF0003C00’.

When these values were entered into the MATLAB program, Kc was found to be equal to ‘EE32029B7027864E43A223B430FD240C’. ModelSim gives ‘EE32029B7027864E43A223B430FD240C’, which is again the same. See figure 6.4.

Figure 6.4 : Simulation of encryption key generation
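The COF used in the E3 run can be read directly off the E1 output quoted earlier. The Python sketch below splits the 128-bit algo_out value, assuming that SRES occupies the upper 32 bits and ACO the lower 96 bits; the function name is hypothetical.

```python
def split_e1_output(algo_out_hex):
    """Split a 128-bit E1 result into (SRES, ACO) hex strings."""
    value = int(algo_out_hex, 16)
    sres = value >> 96                 # upper 32 bits (assumed SRES)
    aco = value & ((1 << 96) - 1)      # lower 96 bits (assumed ACO)
    return f"{sres:08X}", f"{aco:024X}"


sres, aco = split_e1_output("E4E3F1F353C5CB21C8AC8C5DB6A45912")
print(sres, aco)
```

The lower 96 bits match the COF value quoted for the E3 run, which is consistent with COF being taken from the ACO when the key is not a master key.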


This is enough to show that the core works correctly from the simulation point of view.

6.3. THE ASIC FLOW:

The final core was targeted at the AMS 0.6 µm technology, a double-metal, single-poly technology.

The selection of the technology started from the very beginning, during synthesis using Leonardo. Leonardo produced an EDIF file containing the structure of our core.

After that, the EDIF file was converted to an EDDM file using Eniread. EDDM is the format supported by the Mentor Graphics tools.

Schematic Generator was then used to create schematic sheets for the different parts and hierarchy levels of the design.

Design Architect was then started to create a top-level sheet with the input pads, output pads and power pads.

A design viewpoint suitable for automatic placement and routing was then generated by DVE.

This viewpoint was then taken to IC Station. There we first created a new sheet and inserted an automatic floorplan. Then we used auto placement of cells, ports and corner pads. To our amazement, the pads were not placed. This problem occupied us for more than 10 days. No reason was found for it, and no instructor was able to explain it either. When we placed the pads ourselves rather than autoplacing them, auto routing was impossible. We tried our best to overcome this problem but we failed.

In any case, the pads are not that important, because the core we designed is just a part of the Bluetooth core and is not used by itself. Thus we worked on the core alone and performed auto routing for it.

After that, DRC was done and it was successful.

LVS failed due to the problem with the ports.

Parasitic extraction and back annotation were also performed, but Quicksim did not run because of an error in the names of VDD and GND.

We hope to solve these problems in the near future, when we can find a free instructor or other help.

The final area of the core was found to be 12.04 mm², whereas the area with the pad frame is 18.65 mm².

See figures 6.5, 6.6 and 6.7 for the obtained core.


Figure 6.5: The final obtained core.

Figure 6.6: A zoomed section in the core.

Figure 6.7: Pads and cells added before routing.


6.4. BACK ANNOTATION AND FPGA TESTS:

FPGAs are an efficient hardware target when only small series are needed, or for rapid prototyping. FPGAs are complex enough to implement more than glue logic, including complex designs of up to several thousand gates. FPGAs are making major inroads into electronic systems, and the technical advances in FPGA architectures and silicon technologies are making them a premier choice for system integrators. As the logic capacity of FPGAs increases, synthesis for FPGAs is becoming more important. To exploit the increased logic capacity efficiently, synthesis tools and efficient synthesis methods targeting FPGAs become necessary. Several synthesis tools exist for mapping HDL descriptions to various FPGA families. Using a synthesis-based approach, retargeting a design to other technologies becomes possible at little extra cost. When using FPGAs for rapid prototyping, synthesis can first be targeted at an FPGA to exercise the design for verification purposes, and later an ASIC implementation can be derived.

As with any new technology, EDA tools and design methodologies need to be developed to allow designers to take maximum advantage of these new devices. Exemplar Logic specializes in synthesis for advanced FPGAs with its synthesis tool LeonardoSpectrum. Exemplar has developed a new technology called TimeCloser, which tightly ties the RTL synthesis process to the physical world of place and route. This technology allows designers to create high-speed, high-capacity production circuits that previously needed ASIC technology for implementation.

Figure 6.8: TimeCloser Physical Design Flow

While ideally the synthesizable VHDL model should be the same for all target technologies, the efficiency of the resulting design depends very much on the description and the technology used. In order to generate efficient FPGA designs, the HDL description style has to be adapted to the requirements of the FPGA architecture. Unlike ASIC targets, FPGAs offer a fixed set of resources which can be used to generate an efficient design. This requires that the HDL source code of a design be adapted to exploit the available resources. Since the structure of the synthesized logic is inferred from the description style, the source-level coding has to be adapted to yield optimal designs. Generating optimized designs requires a good understanding of the FPGA target. In addition, the code has to take into account the description styles supported by the particular synthesis tool, while the specific hardware description language used is only of minor importance.

Due to their architecture, optimization problems found in ASIC designs may be amalgamated, heightened or outright reversed for FPGA designs. In this respect, LUT-based architectures in particular (such as the Xilinx devices) are different due to their coarse-grained architecture, while finer-grained architectures behave more like ASICs. One of the major differences when designing with FPGAs is that logic functions of the same size cannot be traded: there is a given number of every resource, and whether it is used or not will not change the chip size. On the other hand, trading a ‘cheaper’ (less complex) cell for a more ‘expensive’ (more complex) one can actually improve the device budget, if there is an ample amount of the more expensive resource available.

In our design, we have demonstrated how VHDL circuits can be optimized for FPGA targets by adapting the description style to the available resources, such as flip-flops, three-state buffers and others. This affects the coding style of many basic design blocks, such as storage elements, multiplexers and finite state machines. We targeted Xilinx FPGAs, using the Xilinx Alliance Series to perform place and route and timing analysis and then returning to ModelSim for back annotation.

Figure 6.9: Flow unique to Alliance and Leonardo Spectrum

The overall cycle can be summarized as follows:

Figure 6.10: The designer life cycle


FPGA TEST:

Unfortunately, not a single block was small enough to be tested on the FPGA chip available in the ICL, the Spartan S10PC84. Targeting a larger FPGA from the Xilinx family was the solution for performing back annotation with delays on the different blocks of the Bluetooth security chip. For the Encryption Engine, this gives correct timing results for all sets of sample data. However, modifications could be made to allow the Encryption Engine to be tested on the S10PC84.

The major problem with the Encryption Engine was the 208 bits of input signals and their corresponding registers. Only 61 input/output pins and 392 CLB flip-flops are available in the Spartan S10PC84 (the other 256 flip-flops of the design cannot be reduced). Although we could perform back annotation and timing simulation using ModelSim and the simprim library on a larger FPGA, and found the results exactly correct for all the sample data that come with the Bluetooth specification, we looked for a way to download the Encryption Engine onto the S10PC84 and test it experimentally. There were two choices:

1. Removing the input shift registers and feeding a serial input directly to the LFSRs. This idea was hard to realize due to difficulties in synchronizing the inputs with the clock.
2. Replacing the input registers with smaller ones.

We noticed that the first and the second sets of sample data are all zeros except for the first three or four bits. Thus, we can replace the large shift registers with smaller ones of only 4 bits each. Parallel loading is performed for the first 4 bits; then ‘0’ is shifted in once all four bits have been shifted into the LFSR. This idea has three advantages: (I) it reduces the number of required CLB flip-flops from 208 to 16, (II) it reduces the inputs by the same amount, and (III) it keeps the clock and the shifting procedure synchronized. The only disadvantage is the limitation to the first and the second sets of sample data.

In what follows, we present some results from Leonardo Spectrum, the Xilinx Alliance series and the experimental tests.

Leonardo Spectrum Summary Report

***********************************************
Device Utilization for S10PC84
***********************************************
Resource                  Used   Avail   Utilization
-----------------------------------------------
IOs                         32      61      52.46%
FG Function Generators     233     392      59.44%
H Function Generators        7     196       3.57%
CLB Flip Flops             294     392      75.00%
-----------------------------------------------
Using wire table: s10-3_avg

Clock Frequency Report
Clock : Frequency
------------------------------------
CLK   : 55.3 MHz


Figure 6.11: The Floor Planning and the obtained figure from the FPGA Editor tool of Xilinx Alliance

From the Place and Route report of Xilinx Alliance:

Device utilization summary:
Number of External IOBs          31 out of 61     50%
   Flops: 0   Latches: 0
Number of Global Buffer IOBs      1 out of 8      12%
   Flops: 0   Latches: 0
Number of CLBs                  190 out of 196    96%
   Total CLB Flops:             294 out of 392    75%
   4 input LUTs:                238 out of 392    60%
   3 input LUTs:                 71 out of 196    36%
Number of PRI-CLKs                1 out of 4      25%

The Delay Summary Report
The Number of signals not completely routed for this design is: 0
The Average Connection Delay for this design is: 3.125 ns
The Maximum Pin Delay is: 17.316 ns
The Average Connection Delay on the 10 Worst Nets is: 9.417 ns

From the Post Layout Timing Report of Xilinx Alliance:

Timing summary:
---------------
Timing errors: 0   Score: 0
Constraints cover 1279 paths, 410 nets, and 1138 connections (100.0% coverage)
Design statistics:
Minimum period: 37.875ns (Maximum frequency: 26.403MHz)
Maximum net delay: 17.316ns
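The figures in these reports are related by simple arithmetic, which is worth checking when comparing the synthesis estimate with the post-layout result. The Python sketch below is illustrative only (function names are hypothetical); it recomputes the maximum frequency from the minimum period and two of the utilization percentages from the S10PC84 reports.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns


def utilization_percent(used, available):
    """Share of a device resource that is used, in percent."""
    return 100.0 * used / available


# Post-layout minimum period reported by Xilinx Alliance.
f_max = max_frequency_mhz(37.875)
print(f"{f_max:.3f} MHz")

# CLB flip-flops and FG function generators from the Leonardo report.
print(f"{utilization_percent(294, 392):.2f}%")
print(f"{utilization_percent(233, 392):.2f}%")
```

The post-layout frequency is roughly half of the 55.3 MHz estimated by synthesis, since the synthesis figure relies on an average wire table and is produced before actual routing delays are known.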


The modified version of the Encryption Engine was ready to be tested by the end of June. Unfortunately, we could not find a free instructor to supervise the test until the 3rd of July.

Many thanks go to Prof. Hani for allowing us to use the advanced digital oscilloscope in this test. The tests were performed on the XS40 V1.4 board for the Spartan S10PC84, which is shown in figure 6.12. We took some photos of the setup and of the waveforms obtained from the oscilloscope. Figure 6.13 shows a sample of them.

Figure 6.12: Arrangement of components on the XS40 Board.

Figure 6.13: The setup and the obtained waveforms.


As we have so many controllers in our core, and their timing is critical for the correct operation of the whole design, we connected all the controllers together for the purpose of back annotation and testing. The success of the back annotation and timing analysis of these controllers gives a good indication about the other blocks in the design, which cannot be targeted at the available FPGA. The overall block diagram of the controllers is shown in figure 6.14. Back annotation and testing were done successfully.

[Block diagram image. Blocks labelled in the drawing: E0control, algo_control, controller, safer_control and round_control.]

Figure 6.14: The block diagram of all the controllers as entered in Renoir


To estimate the number of gates in our design, we used Leonardo Spectrum to perform synthesis targeted at a larger FPGA, the Virtex v1000fg680. The output report obtained is shown below.

Number of ports :                       45
Number of nets :                        776
Number of instances :                   63
Number of references to this view :     0
Total accumulated area :
Number of Dffs or Latches :             2553
Number of Function Generators :         4948
Number of MUX CARRYs :                  1006
Number of MUXF5 :                       367
Number of MUXF6 :                       110
Number of gates :                       4752

***********************************************
Device Utilization for v1000fg680
***********************************************
Resource                  Used   Avail   Utilization
-----------------------------------------------
IOs                         45     512       8.79%
Function Generators       4948   24576      20.13%
CLB Slices                2474   12288      20.13%
Dffs or Latches           2553   24576      10.39%
-----------------------------------------------
Using wire table: xcv1000-6_avg

Clock Frequency Report
Clock : Frequency
------------------------------------
clk   : 66.2 MHz

Those were all the tests we did on our Bluetooth Security core. We tried our best within the available time and with the available help from instructors. We still have more ideas in mind, and we hope we can fulfill them in the future. For example, we wish we could join our core with the cores of other projects, do more tests, and build the higher level controller either in hardware or in software.


References:

1. "The Bluetooth Core Specifications, v.1.1" available at http://www.bluetooth.com/developer/specification/Bluetooth_11_Specifications_Book.pdf
2. "The Bluetooth Profile Specifications, v.1.1" available at http://www.bluetooth.com/developer/specification/Bluetooth_11_Profiles_Book.pdf
3. BLUETOOTH BASEBAND, Joakim Persson, Ericsson Mobile Communications AB, 1999-12-07 http://www.nii.org.tw/3C/pdf/BluetoothBaseband.pdf
4. Dasgupta, Korak. "Bluetooth Protocol and Security Architecture Review." Online. Available. September 2000. http://www.cs.utk.edu/~dasgupta/bluetooth/
5. Bluetooth Security, Juha T. Vainio, Department of Computer Science and Engineering, Helsinki University of Technology, http://www.niksula.cs.hut.fi/~jiitv/bluesec.html
6. Muller T. "Bluetooth Security Architecture", July 07, 1999. Online. Available. http://www.bluetooth.com/developer/download/download.asp?doc=174
7. Bluetooth developers address security http://www.computingsa.co.za/2000/08/21/News/new15.htm
8. InfoTooth Knowledge Base http://palowireless.com/infotooth/knowbase.asp
9. Träskbäck M, Security in Bluetooth: An overview of Bluetooth security, 2000 11-2 http://www.cs.hut.fi/Opinnot/Tik-86.174/Bluetooth_Security.pdf
10. Ullgren T. Security in Bluetooth: Key management in Bluetooth http://www.cs.hut.fi/Opinnot/Tik-86.174/sectopics.html
11. Sand K., Bluetooth, 1999-03-04 [referred 2000-03-13] http://www.tcm.hut.fi/Opinnot/Tik111.550/1999/Esitelmat/Bluetooth/bluetooth.html
12. Saarinen M-J, A Software Implementation of the BlueTooth Encryption Algorithm E0, 2000-03-08 [referred 2000-03-15] http://www.jyu.fi/~mjos/e0.c
13. Massey and Rueppel . Schneier B., Applied Cryptography, 2nd Ed., John Wiley & Sons Inc., 1996
14. Motorola Bluetooth vision: http://www.motorola.com/bluetooth/vision/index.html
15. “Bluetooth solutions from Philips” http://www.semiconductors.philips.com/technology/bluetooth/systems
16. http://ieeexplore.ieee.org/lpdocs/epic03/VSearch.htm
17. http://www.palowireless.com/
18. http://www.bluetooth.com/
19. “Bluetooth Bluetooth Architecture Overview” by James Kardach Principle Engineer Bluetooth SIG Program Manager. Intel Corporation Intel Corporation.
20. A list of 20 Bluetooth company sites, with access to each site. The sites are ranked by “hits” per day. http://www.topsitelists.com/bestsites/bluetooth/
21. “Report on the Bluetooth technology” by Kjell Hedstrom, Erling Hedkvist, Samuel Henriksson
22. “Nomination of SAFER+ as Candidate Algorithm for the Advanced Encryption Standard AES” submission document of 12 June 1998
23. “WHITE PAPER BLUETOOTH AND BLUETOOTH INTERNET ACCESS POINTS” from http://www.picocommunications.com
24. “HandBook of applied cryptography” by A. Menezes, P. van Oorschot and S. Vanstone, CRC press, 1996 www.carc.math.uwaterloo.ca/hac
25. “Modern Data Encryption” by Digital Interactive Solutions Limited
26. “Answers to Frequently Asked Questions About Today’s Cryptography” RSA DATA SECURITY, INC.
27. X. Lai, R.A. Rueppel, and J. Woollven. A fast cryptographic checksum algorithm based on stream ciphers. In Advances in Cryptology - Auscrypt ’92, Springer-Verlag, 1992.

Page 138: VLSI Design and Implementation of ASICs for the Security ...samehibrahim.tripod.com/sitebuildercontent/sitebuilderfiles/... · generation of a combination key: 26 2.2.7. generating

References

126 ICL

28. R.A. Rueppel. Stream ciphers. In Contemporary Cryptology — The Science of Information

Integrity. IEEE Press, 1992. 29. Miia Hermelin: “Cryptographic properties of Bluetooth Combination Generator” Master’s

Thesis 30. “An Introduction to Cryptography” 1990-1999 Network Associates, Inc. 31. Pong P. Chu and Robert E. Jones “Design Techniques of FPGA Based Random Number

Generator” Department of Electrical and Computer Engineering, Cleveland State University, Cleveland, Ohio 44115 NASA Glen Research Center, Cleveland, Ohio 44112

32. P. Alfke, “Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators,” Xilinx Application Note, 1995. http://university.xilinx.com/xapp/xapp052.pdf

33. Xilinx official web site http://www.xilinx.com/products/spartan2/index.htm http://www.xilinx.com/products/logicore/alliance/tblpart.htm 34. “Spartan and Spartan-XL Series Data Sheet v1.5, 3/00”

http://support.xilinx.com/partinfo/ds060.pdf 35. “LFSRs as Functional Blocks in Wireless Applications” http://university.xilinx.com/xapp/xapp220.pdf 36. “XS40, XSP Board V1.4 User Manual” http://www.xess.com/FPGA/homepage.html 37. Michael Gschwind, Valentina Salapura “A VHDL Design Methodology for FPGAs” 38. Synopsys. Finite State Machines–Application Note. Synopsys, Inc., Mountain View, CA,

April 1995. 39. “ Introduction to ASICs” http://www-ee.eng.hawaii.edu/~msmith/ASICs/HTML/ASICs.htm 40. A Mentor Graphics Seminar on FPGA with Leonardo Spectrum by Michael A. Bohm,

Chief Scientist for Exemplar Logic http://seminar.techonline.com/mentorg4/live.rpm 41. Sudhakar Yalamanchili “VHDL Starter’s Guide:Simulation & Synthesis” Georgia Institute

of Technology


APPENDIX A

DESIGN TIME TABLE

We knew the outline of our project, VLSI design of Bluetooth Baseband components, from the beginning of the year. However, we chose to work on the security blocks only at the beginning of the second term. We therefore spent the whole first term practicing VHDL and the tools to be used, especially FPGA Advantage from Mentor Graphics and the workstations available in the ICL. Two reports were written in this period, one about a pipelined adder and the other about a multiplexer. During the same period we read about the Bluetooth Baseband blocks and attended the Doctor’s lectures on Bluetooth and various aspects of communications. We also attended lectures on VHDL, state diagrams and UNIX. This continued until we were interrupted by the term exams.

After choosing to work on Bluetooth security, we started searching the Internet for any existing implementation of this part. We realized that the work would be too much for either of us to do alone, so we agreed to divide it between us: Ahmed chose to work on encryption and Sameh on authentication. Unfortunately, we found no previous implementation of our parts, and no VHDL implementation on the net, so we both decided to start the design from scratch. February was therefore a month of net searching, understanding the Bluetooth Specifications and outlining our design.

March was the month when we started designing. More net searches were required to understand the details of every block in our designs. For example, the SAFER+ algorithm was not clearly described in the specifications, and extensive searching was required. The small blocks used in the design were created in this month. Working hours were limited because of our study load, so progress was slow.

Most of the Encryption Engine blocks were nearly done by the end of March, and both Ar and Ar’ were completely designed by the middle of April. It was then time for simulation. By the end of April, everything seemed to work perfectly. The main problem appeared at this time: the huge area used by the authentication part prevented us from performing back annotation or FPGA implementation of the whole design. At that point Prof. Hani suggested that we target the design to an ASIC, provided that our project still had a part implemented on FPGA, namely the encryption part. This was agreed upon when we held a seminar about our work at the end of April.

After the seminar, work on the project stopped for the final exams, but as soon as the exams finished we resumed work on the design. Sameh started by designing the interface that uses Ar and Ar’ in all the required algorithms: E1, E21, E22 and E3. Some new ideas also emerged for the Encryption Engine to minimize both area and time. This took about 3 days.

After that Sameh started to write a test bench for his part and MATLAB code to generate the test vectors. This succeeded after two days of continuous work. All the sample data in the Bluetooth specifications were successfully reproduced for both parts, Authentication and Encryption. In addition, a Visual C++ program was developed to further verify the proper operation of the Encryption Engine, and its results were also completely successful.


After that we worked together on the design of a higher-level controller to choose among all the available algorithms for authentication, key generation and encryption. We also added the random number generator, and shift registers that decrease the number of pins in the ASIC implementation by entering the data serially. After finishing this part we started work on the ASIC layout and the FPGA implementation of the Encryption Engine. Finally came the time for the report. Our time table is summarized below:

Time              Activity

First term        Learning VHDL
                  Attending project lectures
                  Practicing the tools
                  2 reports about a multiplexer and a pipelined adder
                  Reading about Bluetooth

February          Choosing the topics
                  Division of work among us
                  Net search

March             Net search about SAFER+, Encryption Engine and the RNG
                  Start of design

Middle of April   Basic design blocks finished
                  Start of simulation

End of April      A seminar about the progress of most of the final design
                  ASIC targeting

Middle of June    Interface for Ar and Ar’ to work as E1, E21, E22 and E3
                  Improvement on the Encryption Engine and the RNG
                  Test bench and MATLAB code

End of June       Working on the overall chip
                  ASIC layout
                  FPGA test of the Encryption Engine
                  Some changes on the design

July              Final modifications on the design
                  Writing the report