

Proceedings of the 2010 21st IEEE International Symposium on Rapid System Prototyping (RSP), Fairfax, VA, USA (2010.06.8-2010.06.11)

Counter Embedded Memory Architecture for Trusted Computing Platform

Gavin Xiaoxu Yao, Ray C.C. Cheung, Kim Fung Man
Department of Electronic Engineering
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong SAR

[email protected],{r.cheung, eekman}@cityu.edu.hk

Abstract

Due to various hacker attacks, trusted computing platforms have received a lot of attention recently. Encryption is introduced to maintain the confidentiality of data stored on such platforms, while Message Authentication Codes (MACs) and authentication trees are employed to verify the integrity of the data memory. These encryption and authentication architectures suffer from several potential vulnerabilities that previous work has overlooked. In this paper, we first address our concern about a type of cryptanalysis: a ciphertext stored in memory can be decrypted and attacked by an adversary, and the MACs and the authentication trees can themselves become victims of cryptanalytic attacks. In addition, we show that such an attack can be extended to multi-core systems by simply corrupting other unprotected cores and performing malicious behaviors. To handle these scenarios, we propose a Counter Embedded Memory (CEM) design, which employs embedded counters to record every data fetch and trace malicious operations. The proposed platform with CEM allows the system to trace unexpected memory accesses and thus to indicate a potential attack in progress. We present both qualitative discussion and quantitative analysis to show the effectiveness of the proposed architecture. Our FPGA rapid prototype shows that the additional memory overhead is only 0.10% and that the added latency can be hidden entirely.

I. Introduction

Trusted computing platforms are systems that prevent data from being accessed or altered without proper authorization and that protect data from software and physical attacks. These platforms are becoming more important because of the explosion of sensitive information that has to be processed on such systems. One requirement for a trusted computing environment is confidentiality, which means that the adversary cannot obtain the information. A typical approach is to apply symmetric-key encryption.

The Advanced Encryption Standard (AES) [1] is a widely used cryptographic algorithm. AES requires 10 rounds of substitution and permutation when the key length is 128 bits. A lot of cryptanalysis work on AES is being carried out. Although no effective attack on the full 10-round AES has been published, there exist attacks on reduced variants such as 7-round AES [6]. AES does not provide perfect secrecy in the theoretical sense, and there is no guarantee that cryptanalytic attacks on AES are absolutely infeasible. It is therefore unwise to rely on encryption alone to keep information confidential from adversaries. It is better to provide a system that does not allow any adversary to perform cryptanalysis, or that can at least detect any cryptanalytic attempt.

The other important requirement of a trusted computing platform is information integrity, which means that the adversary cannot modify the information and corrupt the subsequent computing result. The procedure of integrity verification is called authentication. A typical way to verify integrity is to compute and check Message Authentication Codes (MACs). A MAC is a short "fingerprint" of some data; if the data is modified, the recomputed fingerprint will not match the original MAC [12]. Integrity trees are also employed to reduce on-chip storage overhead, so that only the root of the tree is stored on the trusted processor chip [4]. However, integrity trees and MACs cannot detect cryptanalytic attacks, because the adversary may simply fetch the information as material for cryptanalysis. In addition, a MAC is usually shorter than the original data, so the mapping is no longer one-to-one: several different messages give the same MAC. Finding such a collision is easier than recovering the MAC key, and a collision can bypass the protection of MACs and integrity trees.

The above encryption and authentication tree mechanisms assume that the processor chip is trusted. However, there is no guarantee that the processor chip can always be trusted. For instance, King et al. [8] showed that corrupting a processor chip is feasible without too many resources and that such corruption can barely be discovered. Multi-core processors are also widely employed in high-end platforms. If a core is contaminated and controlled by the adversary, it can circumvent the above mechanisms and commit malicious behavior.

978-1-42447074-7/10/$26.00 © 2010 DOI 10.1109/rsp 2010.17

To deal with the issues introduced by MACs, integrity trees, and malicious processors and cores, this paper proposes a novel memory architecture, Counter Embedded Memory (CEM), with minor area overhead and hidden latency. The main contributions of this paper are:

∙ The vulnerabilities of the encryption and authentication mechanisms proposed in previous work are addressed, and a novel type of attack introduced by multi-core systems, namely the malicious core/processor attack, is presented.

∙ The design of CEM, which can indicate unexpected memory access operations without much memory overhead, is discussed in detail.

∙ The architecture with CEM is prototyped on an FPGA, and we verify that the CEM is able to defend against the proposed types of threats. The area and timing results for the proposed design are also given.

The rest of the paper is organized as follows. Section II introduces the related work. Section III describes the attack models and the assumptions adopted in this paper. Section IV proposes our design of CEM. Section V illustrates the whole architecture working with the new counter embedded memory and describes its operation. Section VI shows our experimental setup, and Section VII gives our results and analysis. Finally, Section VIII concludes this paper.

II. Related Work

Direct mode encryption is a well-known cryptographic scheme that derives the ciphertext directly from the plaintext with a cipher such as AES. Shi et al. [10] use prediction to accelerate not only the decryption process of direct mode but also the authentication process of the MAC. Another popular encryption scheme is counter mode encryption, which can hide the encryption latency [3]. It uses a counter value or sequence number to generate a crypto pad, and the pad is bitwise-XORed with the plaintext to produce the ciphertext. Previous work has studied optimizations of counter mode. The counter values or sequence numbers can be cached to hide the decryption latency [13]. Shi et al. [11] also store the initial counter values for every memory page and use them to predict the counter value and precompute the crypto pad to hide latency. The concept of split counters is introduced to reduce the memory overhead [14]. However, counter mode encryption does not provide memory integrity verification and is not secure against chosen-ciphertext attacks. Hence, additional measures should be applied to verify the integrity.
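As an illustration only, the pad-and-XOR structure of counter mode described above can be sketched in Python. This is a minimal sketch under stated assumptions: SHA-256 stands in for the AES block cipher, and the names `crypto_pad` and `xor_bytes`, the key, and the address are all hypothetical. The last two lines show why counter mode alone gives no integrity: flipping one ciphertext bit flips the same plaintext bit without any error.

```python
import hashlib


def crypto_pad(key: bytes, address: int, counter: int, length: int) -> bytes:
    """Derive a pad from (key, address, counter).

    Assumption: SHA-256 stands in for the AES block cipher of a real engine.
    """
    pad = b""
    block = 0
    while len(pad) < length:
        seed = (key + address.to_bytes(8, "big") + counter.to_bytes(8, "big")
                + block.to_bytes(4, "big"))
        pad += hashlib.sha256(seed).digest()
        block += 1
    return pad[:length]


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"hypothetical key"
plaintext = b"one cache line of data"

# The pad depends only on key, address and counter, so it can be
# computed before the data arrives, which is how latency is hidden.
pad = crypto_pad(key, address=0x1000, counter=7, length=len(plaintext))
ciphertext = xor_bytes(plaintext, pad)
recovered = xor_bytes(ciphertext, pad)

# Tampering with the ciphertext goes unnoticed by decryption itself.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
tampered_plain = xor_bytes(tampered, pad)
```

Because decryption is the same XOR as encryption, a separate MAC or integrity tree is still needed to notice the tampered byte.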

On the other hand, in order to accelerate the authentication process, Galois/Counter Mode and GMAC [14] are employed, which are much faster than other hash/MAC functions such as CBC-MAC. In addition, a MAC alone cannot protect the memory from replay attacks. Hence, the Merkle tree is employed [7]. To reduce the otherwise unacceptable latency, a cached tree architecture is proposed in which the frequently used nodes are cached [7]. The Bonsai Merkle Tree (BMT) [9] reduces the overhead further: the tree structure in the BMT architecture only covers the counters instead of the whole off-chip memory. Parallelizable integrity trees, which can accelerate the authentication process, have also been proposed [5]. Recently, researchers have also started to explore the issues related to multi-core systems [15].

III. Attack Models and Assumptions

3.1. Cryptanalytic Attacks and Birthday Attacks

In counter mode encryption, the counter serves as part of the key or encryption seed, and its value is also needed for decryption. Therefore, the counter value itself cannot be encrypted; otherwise decryption would become very difficult, if possible at all. Although it has been proved that the security of counter mode depends on the key and not on the counter value [3], the exposure of the counter value provides a shortcut for exhaustive key search. It is also possible that cryptanalytic attacks on AES other than exhaustive search exist.

In a typical trusted computing platform, either a MAC or a hash is used to provide authentication, and the two serve the same function. Without loss of generality, we only consider the cases that use MACs. As memory sizes grow from several MB to several GB, or even TB, more and more data-MAC pairs are needed to cover the whole external memory. As the system keeps running, new pairs are generated with every data update. This provides a large amount of material for the adversary to perform cryptanalytic attacks targeting MACs and integrity trees.

A specific type of cryptanalytic attack on MACs is the birthday attack, named after the "birthday paradox". The objective of a birthday attack is to find a collision, that is, two different messages that generate the same MAC, with a certain probability instead of 100% certainty. In theory, a birthday attack is much easier to achieve than recovering the key [12]. Unfortunately, cryptanalytic attacks targeting MACs and integrity trees have been neglected in previous work.
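To make the birthday argument concrete, the following Python sketch evaluates the standard birthday-bound approximation, $P \approx 1 - e^{-n(n-1)/(2N)}$, for $n$ observed MACs drawn from a space of $N$ possible values. It is a sketch under the assumption that MAC outputs behave like uniformly random values; the function names are hypothetical and the 64-bit MAC width is only an example, not a parameter taken from the paper.

```python
import math


def collision_probability(num_samples: int, space_size: int) -> float:
    """Approximate probability of at least one collision among
    num_samples uniformly random values from space_size possibilities."""
    exponent = num_samples * (num_samples - 1) / (2.0 * space_size)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent)


def samples_for_probability(p: float, space_size: int) -> float:
    """Approximate number of samples needed to reach collision probability p."""
    return math.sqrt(2.0 * space_size * math.log(1.0 / (1.0 - p)))


# Classic sanity check: 23 people and 365 birthdays give roughly 50%.
p_birthday = collision_probability(23, 365)

# Example (assumed width): data-MAC pairs needed for a 50% collision
# chance with a 64-bit MAC, on the order of 2**32 rather than 2**64.
pairs_needed = samples_for_probability(0.5, 2 ** 64)
```

The square-root growth is the point: the number of data-MAC pairs an adversary must collect scales with $2^{b/2}$ for a $b$-bit MAC, which is why a large and constantly updated memory is useful material.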

3.2. Malicious Core or Processor Attacks

The number of cores on one chip is expected to keep increasing over the next decade. In a multi-core system, if one core in the processor is controlled by the adversary, the adversary can use this core to access the memory, read the memory silently, or even inject malicious data into the off-chip memory with the help of the encryption engine on the processor chip. The malicious core attack is shown in Fig. 1(a). The malicious processor attack is similar. A malicious processor chip is attached to the memory interface by switching the bus and can access the data at will. The malicious processor will not be noticed by the legal processor if it retreats (becomes inactive) before the legal one uses the bus. Fig. 1(b) shows the malicious processor attack. Note that malicious core attacks and malicious processor attacks are functionally and topologically identical. Both break the assumption that all resources on the processor are trusted, and both use the resources of the processor to commit malicious behavior. For simplicity, we do not distinguish between core and processor in the remainder of the paper.

Fig. 1. Malicious core or processor attack: (a) a trusted core and a malicious core sharing the bus to the external memory; (b) a trusted processor and a malicious processor attached to the external memory, with the bus switched by the adversary.

3.3. Assumptions about Trusted Region

In previous work on memory encryption and integrity verification, one assumption of the threat model is that the processor chip is resistant to all physical attacks and is thus trusted, while the non-secure region includes all off-chip resources, primarily the memory bus and the physical memory, as shown in Fig. 2(a). The boundary of the secure region is defined as the security frontier.

In this paper, we assume that each chip, including the processor chip and the memory chip, naturally provides a secure boundary. Therefore, the adversary cannot change the data stored in a read-only register or read the data from a write-only register on a chip. This assumption is based on the fact that data is typically fetched or injected through the memory interface or bus signals, while changing the data by other means is more difficult or infeasible.

Furthermore, for multi-core systems, shared memory is the main concern. In a shared memory system, only the core that runs the sensitive program, together with its exclusive cache, is considered trusted, while the other cores are not. Therefore, the security frontier changes: each core along with its cache is treated as a trusted entity when necessary. Without loss of generality, we use a dual-core model to represent multi-core systems, treat Core A and its cache as trusted, and consider Core B as possibly occupied by the adversary. The security frontier for multi-core systems is shown in Fig. 2(b).

Fig. 2. Security frontier assumptions: (a) a trusted secure processor (core, cache, and encryption/decryption and integrity verification engine) inside the security frontier, with the external memory outside; (b) a dual-core processor in which the security frontier encloses only Core A and Cache A, leaving Core B, Cache B and the external memory outside.

For the attacks, we assume that the adversary is able to access the system only for a finite time. We also do not consider Lurker attacks, which merely monitor the bus or interface and are totally passive, without any active read operation, because in this circumstance the adversary can only obtain very limited information that is not enough to perform cryptanalysis.

IV. Counter Embedded Memory Design

4.1. CEM Design for Single Trusted Processor

Memory accesses can be divided into two types: read and write. An unexpected read implies that the adversary is attempting to perform cryptanalysis, and an unexpected write implies that the adversary has already taken action and mounted an active attack. As unexpected writes can be detected by the integrity authentication methods described in Section II, our goal is to detect unexpected read operations and protect the system from cryptanalysis. The simplest way to discover an unexpected read operation is to construct a trace that records every memory access and to keep a copy of the trace in the secure area. When necessary, we compare the trace from the untrusted area with the trusted copy. If they match, we can be confident that no unexpected read has occurred. The most basic trace is a counter that increments automatically on every read operation.

In order to create such a counter trace, we propose a new memory prototype architecture, Counter Embedded Memory (CEM). The CEM architecture for a single trusted processor is shown in Fig. 3. In the CEM, each memory row (8KB) is concatenated with a 48-bit register that stores the counter value. The row address is used as the index to locate the corresponding counter register, i.e., the row address also serves as the address of the counter register. A universal counter on the memory chip records every read operation to the currently open memory row. When the memory row is deactivated or left idle for a certain time, the number of read operations is added to the original counter value from the corresponding register, and the sum is written back to the same register. The counter registers can only be written by the universal counter and are therefore read-only to the processor and other peripheral devices. Hence, it is impossible to change a register value through the bus or the memory interface. When there is a mismatch, the proposed architecture identifies which row has been accessed.

Fig. 3. CEM for single trusted processor. The CEM chip contains memory rows (8 KB each), each paired with an embedded counter register (Cnt Reg, 48 bit), a universal counter, the integrity tree nodes, and an encrypted counter, connected to the control bus, the data bus and the counter data bus.

A width of 48 bits is large enough for the counter to record the memory access operations and resist overflow. Suppose the register is $n$ bits long. The adversary can access the memory and fetch data $2^n$ times so that the counter overflows and returns to its original value. Hence, a 48-bit counter overflows after $2^{48}$ fetch operations. Suppose the maximum access rate is 1 GB/s, i.e., about $2^{30}$ fetches per second; it then takes $2^{18}$ seconds to overflow, which is over 3 days. Thus, the overflow attack is infeasible when all the registers are verified every hour.
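The overflow estimate can be checked with a few lines of Python. The sketch assumes, as the paragraph implicitly does, that 1 GB/s corresponds to $2^{30}$ counted fetches per second; the helper `min_counter_bits` is a hypothetical addition that asks the reverse question, namely how narrow a counter could be for a given verification interval under the same assumption.

```python
COUNTER_BITS = 48
FETCHES_PER_SECOND = 2 ** 30  # assumption: 1 GB/s read as 2^30 fetches/s

# Time for an adversary to wrap the counter back to its original value.
seconds_to_overflow = 2 ** COUNTER_BITS // FETCHES_PER_SECOND
days_to_overflow = seconds_to_overflow / 86400


def min_counter_bits(interval_seconds: int, fetches_per_second: int) -> int:
    """Smallest n such that 2**n exceeds the fetches possible in one interval."""
    return (interval_seconds * fetches_per_second).bit_length()


# Counter width that an hourly verification alone would require.
bits_for_hourly_check = min_counter_bits(3600, FETCHES_PER_SECOND)
```

Under these assumptions the 48-bit width leaves a margin of several bits over what an hourly check strictly needs.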

It is important to note that our approach uses a universal counter framework rather than a global counter in order to provide better security. Given a row address, the universal counter stores its count in the counter register that belongs to that row. A global counter, on the other hand, records every operation but can only store one counter value. Compared to the global counter, our design provides several advantages. First, it can indicate which memory block and which program are affected when an unexpected read is detected. Second, in the multi-core situation a global counter is useless, because the access operations from every core are counted together and it is difficult or impossible to tell which core issued a given access.

In order to save precious storage on the processor chip, we also construct a tree structure to compress the digest and store only the root of the tree on the processor chip. Because this integrity tree has the same vulnerabilities described in Section 3.1, e.g., birthday attacks, we employ an encrypted counter to record the read operations that target the nodes of the tree and the encrypted counter itself. Therefore, only the read operations to the counter registers are not recorded. However, the counter registers give the adversary nothing but numbers that are unrelated to the content of the storage.

Fig. 4. An example of CEM operating. Timing diagram over clock cycles T0 to T7 showing the CLK, COMMAND (Activate, Read, NOP, Precharge), ADDRESS (Row Addr, Col. Addr, Bank Addr), DATA (D0 to D3 after the column latency) and Counter Control (Read Cnt Reg, Count Read, Write Cnt Reg, NOP) signals. NOP: No Operation; Col.: Column; Addr: Address; Cnt Reg: Counter Register.

Our embedded counters are suitable for most kinds of storage media. Without loss of generality, we take DDR2 SDRAM as an example. The column latency contributes the major part of the whole latency, and it is 3-5 clock cycles for DDR2 SDRAM. The counter registers are much smaller than the whole memory, so the latency for reading a counter register is less than that of a normal memory column. In addition, when data is read or written, the corresponding row address must be provided with the bank activate command, which precedes the read or write command. Therefore, the row address is available at least 4 clock cycles before the data is read from the memory, which provides an opportunity to hide the latency of reading and writing the counter register.

As shown in Fig. 4, when a bank activate command and the row address are issued, the counter value is fetched from the corresponding register. The universal counter increments on every read command. When a precharge command is issued, which means that the operation on this memory row is finished, the number of read operations recorded by the counter is added to the original counter value, and the new counter value is stored back to the original counter register.
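The activate/read/precharge sequence can be modelled behaviourally in Python. This is a toy sketch, not the hardware design: the class `CemChip`, its attribute names, and the `expected` list standing in for the trusted copy of the trace are all hypothetical, and timing, banks and the idle timeout are ignored.

```python
class CemChip:
    """Toy model: per-row counter registers plus one universal counter."""

    COUNTER_BITS = 48

    def __init__(self, num_rows: int):
        self.cnt_regs = [0] * num_rows  # one counter register per memory row
        self.active_row = None          # row opened by the last Activate
        self.base_value = 0             # register value fetched at Activate
        self.reads_seen = 0             # universal counter since Activate

    def activate(self, row: int) -> None:
        if self.active_row is not None:
            raise RuntimeError("precharge the open row first")
        self.active_row = row
        self.base_value = self.cnt_regs[row]  # "Read Cnt Reg"
        self.reads_seen = 0

    def read(self, column: int) -> None:
        if self.active_row is None:
            raise RuntimeError("no row is active")
        self.reads_seen += 1                  # "Count Read"

    def precharge(self) -> None:
        if self.active_row is None:
            return
        mask = (1 << self.COUNTER_BITS) - 1   # wrap like a 48-bit register
        self.cnt_regs[self.active_row] = (self.base_value + self.reads_seen) & mask
        self.active_row = None                # "Write Cnt Reg" done


chip = CemChip(num_rows=4)
expected = [0] * 4  # trusted copy of the trace (hypothetical bookkeeping)

# Legitimate access by the trusted processor: two reads from row 1.
chip.activate(1)
chip.read(0)
chip.read(8)
chip.precharge()
expected[1] += 2

# Unexpected access: some other party reads row 3 once.
chip.activate(3)
chip.read(0)
chip.precharge()

mismatched_rows = [r for r in range(4) if chip.cnt_regs[r] != expected[r]]
```

Comparing the registers against the trusted copy singles out row 3, which is the per-row localisation that a single global counter could not provide.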

4.2. CEM Design for Multi-core Processor

For multi-core systems, we can still trace cryptanalytic attacks by detecting unexpected reads. However, multi-core systems also raise issues for our proposed architecture. As the memory is shared and not all the memory blocks belong to the same core, a global tree structure that covers all counter registers becomes meaningless. Hence, we provide each core with a tree structure that covers all the counter registers, and with an encrypted counter that records data fetches from the tree nodes and from the encrypted counter itself. Additionally, a mask register is provided for each core, in which the memory allocation information of that core is stored. According to the mask information, the counter registers are masked in the corresponding tree. Therefore, one core's memory accesses do not affect other cores under normal conditions. Apart from these components, the CEM for multi-core systems is the same as that for a single trusted processor: every memory row is concatenated with a counter register, and a universal counter records every read operation. Without loss of generality, we use a dual-core system as an example. The design of CEM for a dual-core system is shown in Fig. 5.

Fig. 5. CEM design for dual-core system. The CEM chip contains the memory rows with their embedded counter registers (Cnt Reg), the universal counter, the tree nodes, Encrypted Counter A and Encrypted Counter B, and Mask Reg A and Mask Reg B (each holding mask information and a PIN), connected to the control bus, the data bus and the counter data bus.

The mask register is write-only and includes two parts: the mask information register and the PIN register. As the memory allocation varies at run time, the mask information changes frequently. In order to protect the core assignment information from sabotage without loss of flexibility, the correct PIN must be provided whenever the mask information is changed. If the PIN is incorrect, a signal is sent to the processor chip to raise an alert. The PIN value is a random number assigned by the core.
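A possible reading of the PIN-guarded mask update is sketched below in Python. The paper does not spell out the register protocol, so the details here are assumptions: the class `MaskRegister`, the `update` signature, and the idea of installing the next PIN together with each accepted update are hypothetical, and `secrets.randbits` stands in for the core's random number source.

```python
import secrets


class MaskRegister:
    """Toy write-only mask register guarded by a PIN (assumed protocol)."""

    def __init__(self, initial_pin: int, num_rows: int):
        self._pin = initial_pin           # PIN register, never readable
        self._mask = [False] * num_rows   # mask information register
        self.alert = False                # signal line to the processor chip

    def update(self, pin: int, new_mask: list, next_pin: int) -> bool:
        """Change the mask only if the supplied PIN is correct."""
        if pin != self._pin:
            self.alert = True             # wrong PIN: raise the alert signal
            return False
        self._mask = list(new_mask)
        self._pin = next_pin              # assumption: each PIN is used once
        return True

    def covers(self, row: int) -> bool:
        """Chip-internal use: is this row's counter included in the core's tree?"""
        return self._mask[row]


pin0 = secrets.randbits(64)
reg_a = MaskRegister(initial_pin=pin0, num_rows=4)

# The owning core assigns rows 0 and 1 to itself and installs a fresh PIN.
pin1 = secrets.randbits(64)
accepted = reg_a.update(pin0, [True, True, False, False], next_pin=pin1)

# A party that does not know the current PIN cannot change the allocation.
rejected = reg_a.update(pin1 ^ 1, [True] * 4, next_pin=0)
```

The register exposes no way to read the PIN or the mask over the bus, which mirrors the write-only property stated above.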

V. Trusted Computing Platform with CEM

The CEM can serve as ROM or RAM, and the operations on CEM ROM are similar to those on CEM RAM. For simplicity, we only introduce the architecture that employs CEM as RAM. It is important to note that the CEM does not provide any confidentiality or integrity protection for the information stored in the memory. Therefore, a Bonsai Merkle Tree (BMT) is employed to provide the basic confidentiality and integrity protection. BMT implies that confidentiality is achieved by split counter mode encryption. To build our own BMT, GMAC is used as the MAC for its efficiency. Thus, apart from the proposed CEM, the whole single trusted processor platform also includes a hardware counter mode encryption engine, which also serves as the decryption engine, a GMAC verification engine, and the trusted processor with its cache, which stores the root value of the integrity tree and the encrypted counter value. The architecture is shown in Fig. 6. For the multi-core platform, the architecture is almost the same. The difference is that each cache stores its corresponding root value and encrypted counter value, and the encryption engine contains a Pseudo Random Number Generator (PRNG) to provide PINs for the mask registers. Each random number is used once for a certain core and then discarded. Fig. 7 illustrates the architecture for multi-core systems, with the unchanged part not shown.

Fig. 6. Single core platform with CEM. The processor chip contains the processor, the cache (holding the root and the encrypted counter value) and the encryption, decryption and integrity verification block; the CEM chip contains the memory rows with their counter registers, the tree nodes and the encrypted counter. The two chips are linked by the cipher text data bus, and the plain text data bus stays inside the processor chip.

Fig. 7. Dual-core platform with CEM. The processor chip contains Core A with Cache A and Core B with Cache B (each cache holding its own root and encrypted counter value) and a PRNG with a FIFO; the CEM chip contains the counter registers, the tree nodes, Encrypted Counter A, Encrypted Counter B and the mask register.

There are two ways to perform the counter value authentication: online and off-line. Online authentication means that the counter value authentication is performed simultaneously with every read operation, while off-line authentication takes place at a fixed time interval. To reduce latency, online authentication usually only verifies the counter register and the corresponding nodes that are incremented and updated by the read operation. However, infrequently fetched memory blocks lose protection if only online authentication is employed. In order to detect cryptanalytic attacks on infrequently fetched blocks, we perform off-line authentication once per hour. Therefore, any unexpected read operation will be found within one hour. A dedicated GMAC engine is used to compute the nodes of the counter integrity tree so that the BMT authentication can work in parallel with the CEM authentication.

Fig. 8. The FPGA prototype with CEM. The processor chip of the proposed platform (processor, cache with root and encrypted counter, and the encryption, decryption and integrity verification block) and the counter registers, tree nodes and encrypted counter of the proposed CEM chip are implemented on the FPGA chip, while the memory rows reside in the external memory; the cipher text data bus and the plain text data bus are as in Fig. 6.
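The off-line check can be illustrated with a small Python sketch that folds the counter registers into a single root with a 4:1 tree and compares it with the root kept on the processor chip. Several simplifications are assumptions of this sketch: HMAC-SHA-256 truncated to 6 bytes stands in for GMAC, the whole tree is recomputed instead of only the affected path, and the names `node_mac`, `tree_root`, `offline_check` and the key are hypothetical.

```python
import hashlib
import hmac

ARITY = 4        # four children per node, i.e. a 4:1 compression ratio
NODE_BYTES = 6   # same width as a 48-bit counter register


def node_mac(key: bytes, children: list) -> bytes:
    """MAC over the concatenated children (HMAC-SHA-256 stands in for GMAC)."""
    return hmac.new(key, b"".join(children), hashlib.sha256).digest()[:NODE_BYTES]


def tree_root(key: bytes, counters: list) -> bytes:
    """Fold the counter registers into one root value."""
    if not counters:
        raise ValueError("at least one counter register is required")
    level = [c.to_bytes(NODE_BYTES, "big") for c in counters]
    while True:
        level = [node_mac(key, level[i:i + ARITY])
                 for i in range(0, len(level), ARITY)]
        if len(level) == 1:
            return level[0]


def offline_check(key: bytes, counters_from_cem: list, trusted_root: bytes) -> bool:
    """Periodic check: does the CEM state still match the on-chip root?"""
    return hmac.compare_digest(tree_root(key, counters_from_cem), trusted_root)


key = b"hypothetical on-chip key"
counters = [0] * 16
root = tree_root(key, counters)   # root kept on the processor chip

counters[5] += 1                  # legitimate read issued by the processor
root = tree_root(key, counters)   # processor refreshes its root
clean = offline_check(key, counters, root)

counters[9] += 1                  # read that the processor did not issue
detected = not offline_check(key, counters, root)
```

In a real design only the nodes on the path from the changed register to the root would be recomputed during online authentication; the full recomputation here corresponds to the hourly off-line pass.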

VI. Experimental Setup

We use an FPGA and external memory to prototype the trusted computing platform. The FPGA is a Xilinx XC5VLX110T, which has 17,280 Virtex-5 slices and 148 36Kb RAM blocks. In the single core system, MicroBlaze, a RISC soft core, serves as the processor. A 64KB direct-mapped write-back cache with 32B cache lines is attached to the processor. The processor chip also contains the AES counter mode encryption engine, the GMAC computing engine, the memory interface and other peripheral devices. As the fabrication of a prototype memory chip is very expensive, the counter and counter registers are actually built from the slices and block RAMs of the FPGA. However, this compromise does not affect the function of the platform for evaluating the performance of the proposed architecture with CEM. Fig. 8 shows the difference between the proposed CEM architecture (dotted-line regions) and the implemented FPGA prototype (shadowed regions). The dual-core system involves two MicroBlaze cores, each with its own 64KB cache. A multi-port memory interface lets the two cores share the external memory, which is a 256MB SDRAM.

TABLE I. Logic utilization of CEM prototype

| Module | Item | Used | Avail. | Ratio |
|---|---|---|---|---|
| Cnt Regs | BRAMs | 44 | 148 | 29.7% |
| Integrity Tree | BRAMs | 14 | 148 | 9.4% |
| Encrypted Cnt[2] | Slices | 400 | 69120 | 1.2% |
| Others | Slices | 823 | 69120 | 1.2% |
| Total (Single core) | Slices | 823 | 69120 | 1.2% |
| | BRAMs | 58 | 148 | 39.2% |
| Total (Dual core) | Slices | 1223 | 69120 | 1.8% |
| | BRAMs | 72 | 148 | 48.6% |

VII. Evaluation

The system with CEM can effectively detect unexpected read operations. When such an operation occurs, the hardware counter verification engine detects it and the external interrupt is set. The processor presents a warning through the monitor. When an active attack is mounted, the BMT authentication engine detects it and also provides a warning report that specifies which memory block is corrupted. In contrast, the system with BMT but without CEM only raises an alarm when active attacks happen and does not respond to read operations from sources other than the trusted processor. Besides, in the dual-core system, the two cores work normally as long as each of them only accesses the memory blocks allocated to it. When one core accesses a memory block that belongs to the other, a warning is raised. Additionally, the encrypted counter detects illegal read operations on the counter register tree nodes and on the encrypted counter itself. The above results illustrate that cryptanalytic attacks and malicious core attacks, as well as active attacks, are detected in our proposed architecture.

The overhead of the additional counters and counter registers is small relative to the whole memory. The FPGA logic utilization of the CEM prototype is listed in Table I. The FPGA slices are used to construct the embedded counter, its interface with the processor, the encrypted counter (not necessary for the single-core system) and the control unit. The slice consumption of our design is small, both in absolute terms and relative to the resources available. Block RAM is the main FPGA resource consumed. These RAM blocks are used to construct the counter registers and integrity tree nodes. The block RAM consumption appears high because the available block RAM is small compared to the 256MB memory: the total available block memory is 666KB, only 0.2% of the 256MB.

A fairer way to evaluate the proposed work is to compare the overhead of CEM with that of BMT and with the whole memory. The memory overhead of CEM and BMT is listed in Table II. In our experiment, a memory row is 8KB. A 256MB RAM contains $2^{15}$ memory rows, and therefore $2^{15}$ 48-bit counter registers are needed, which take 192KB. The MAC compression ratio we use is 4:1, so the integrity tree is 64KB. The total overhead is only 0.10% of the whole memory. BMT, on the other hand, employs split counters to perform counter-mode encryption. Each memory page has a 64-bit page counter and each cache line has an 8-bit small counter. In the experiment, a memory row contains 2 memory pages, each 4KB, and a cache line is 32B, so the BMT counters take 272B per memory row and 8.5MB in total. At the same compression ratio, the integrity tree for BMT is 2.8MB. Therefore, the 256MB RAM needs 11.3MB of overhead to perform BMT. The memory overhead of CEM is quite small, only 2.3% of the BMT overhead.

TABLE II. Memory overhead of CEM and BMT

Item             CEM      BMT       Memory
Per Memory Row   6B       272B      8KB
Counter Regs     192KB    8.5MB
Integrity Tree   64KB     2.8MB
Total overhead   256KB    11.3MB    256MB
Percentage       0.10%    4.41%
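The arithmetic behind these figures can be reproduced with a short script. This is a sketch under one stated assumption: the integrity tree size is approximated by the geometric-series limit $L/(r-1)$ for leaf size $L$ and compression ratio $r$, which matches the 64KB and 2.8MB values reported here but is not spelled out in the text. The function names are illustrative.

```python
KB = 1024
MB = 1024 * KB


def tree_size(leaf_bytes, ratio=4):
    """Approximate size of all tree levels above the leaves.

    Assumption: each level is `ratio` times smaller than the one below,
    so the total is L/r + L/r^2 + ... ~ L / (r - 1).
    """
    return leaf_bytes / (ratio - 1)


def cem_overhead(mem_bytes=256 * MB, row_bytes=8 * KB, counter_bits=48):
    """One counter register per memory row, plus the integrity tree."""
    rows = mem_bytes // row_bytes
    counters = rows * counter_bits // 8
    return counters, tree_size(counters)


def bmt_overhead(mem_bytes=256 * MB, page_bytes=4 * KB, line_bytes=32,
                 page_counter_bits=64, line_counter_bits=8):
    """Split counters: one per page and one per cache line, plus the tree."""
    pages = mem_bytes // page_bytes
    lines = mem_bytes // line_bytes
    counters = (pages * page_counter_bits + lines * line_counter_bits) // 8
    return counters, tree_size(counters)


cem_counters, cem_tree = cem_overhead()
bmt_counters, bmt_tree = bmt_overhead()
cem_percent = 100 * (cem_counters + cem_tree) / (256 * MB)
bmt_percent = 100 * (bmt_counters + bmt_tree) / (256 * MB)

print(f"CEM: {cem_counters / KB:.0f}KB + {cem_tree / KB:.0f}KB = {cem_percent:.2f}%")
print(f"BMT: {bmt_counters / MB:.1f}MB + {bmt_tree / MB:.1f}MB = {bmt_percent:.2f}%")
```

The script gives 192KB and 64KB for CEM and 8.5MB and about 2.8MB for BMT. Its BMT percentage comes out slightly above 4.41% because the script does not round the total to 11.3MB before dividing.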

Moreover, our design causes no additional latency compared to the BMT architecture. The main source of latency is integrity tree authentication and update. The depth of the integrity tree is a reasonable index of the latency, and it is proportional to $\log_A M$, where $A$ is the number of child nodes or leaves per node and $M$ is the number of leaves. In the experiment, the tree structure is the same for both CEM and BMT, while BMT has many more leaves than CEM (8.5MB versus 192KB). The CEM tree is therefore shallower than the BMT tree, and we succeed in hiding the latency caused by CEM under the latency caused by BMT.
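To make the depth comparison concrete, the following sketch evaluates $\lceil \log_A M \rceil$ for both trees. The arity $A = 4$ and the 32B leaf block size are assumptions chosen for illustration (they echo the 4:1 compression ratio and the cache line size, but the text does not state the actual node layout), so the resulting depths are indicative only.

```python
KB = 1024
MB = 1024 * KB


def tree_depth(num_leaves: int, arity: int) -> int:
    """Smallest d such that arity**d >= num_leaves, i.e. ceil(log_A M)."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= arity
        depth += 1
    return depth


ARITY = 4         # assumed number of children per node
LEAF_BLOCK = 32   # assumed leaf block size in bytes

cem_leaves = (192 * KB) // LEAF_BLOCK      # counter registers of CEM
bmt_leaves = (17 * MB // 2) // LEAF_BLOCK  # 8.5MB of BMT counters

cem_depth = tree_depth(cem_leaves, ARITY)
bmt_depth = tree_depth(bmt_leaves, ARITY)
print(cem_depth, bmt_depth)
```

Under these assumptions the CEM tree has 7 levels and the BMT tree has 10, which is consistent with the claim that the CEM tree traversal can complete within the time of the BMT traversal.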

VIII. Conclusions

In this paper, we emphasize attacks on the MACs and the integrity tree that have been neglected in trusted computing platforms. We also present a new type of attack, the malicious core or processor attack, which may bypass the existing verification methods and corrupt the trusted computing platform. To protect systems from these attacks, we propose a new memory prototype, Counter Embedded Memory (CEM), which uses read-only counters to record every memory access and identify the suspicious ones. The new components reside on the memory chip and do not occupy the precious resources of the processor chip. We also introduce the architecture of the trusted computing platform with CEM serving as RAM. Our experiment shows that the architecture works effectively and detects unexpected operations on the external memory with a small memory overhead, 0.10%, and without causing additional latency.

In future work, virtual memory and a paging mechanism will be introduced into the trusted computing platform where CEM serves as the main off-chip memory. Such counter-embedded protection can also be extended to protect hard drives. The development of this architecture will make the platform more secure, practical and universal.

Acknowledgement

The authors thank Prof. Ruby B. Lee for introducing the topic of memory integrity trees and security design issues.

References

[1] Advanced Encryption Standard (AES). Federal Information Processing Standard Publication 197, 2001.

[2] P. Bulens, F.-X. Standaert, J.-J. Quisquater, P. Pellegrin, and G. Rouvroy. Implementation of the AES-128 on Virtex-5 FPGAs. AFRICACRYPT, pages 16–26, 2008.

[3] W. Diffie and M. Hellman. Privacy and authentication: An introduction to cryptography. Proceedings of the IEEE, 67(3):397–427, March 1979.

[4] R. Elbaz, D. Champagne, C. Gebotys, R. B. Lee, N. Potlapally, and L. Torres. Hardware mechanisms for memory authentication: A survey of existing techniques and engines. Transactions on Computational Science IV, Lecture Notes in Computer Science (LNCS), pages 1–22, 2009.

[5] R. Elbaz, D. Champagne, R. B. Lee, L. Torres, G. Sassatelli, and P. Guillemin. TEC-Tree: A low-cost, parallelizable tree for efficient defense against memory replay attacks. In CHES ’07: Proceedings of the 9th International Workshop on Cryptographic Hardware and Embedded Systems, pages 289–302, Berlin, Heidelberg, 2007.

[6] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, and D. Whiting. Improved cryptanalysis of Rijndael. In FSE ’00: Proceedings of the 7th International Workshop on Fast Software Encryption, pages 213–230, London, UK, 2001.

[7] B. Gassend, G. E. Suh, D. Clarke, M. van Dijk, and S. Devadas. Caches and hash trees for efficient memory integrity verification. In HPCA ’03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture, page 295, Washington, DC, USA, 2003.

[8] S. T. King, J. Tucek, A. Cozzie, C. Grier, W. Jiang, and Y. Zhou. Designing and implementing malicious hardware. In LEET ’08: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, pages 1–8, Berkeley, CA, USA, 2008.

[9] B. Rogers, S. Chhabra, M. Prvulovic, and Y. Solihin. Using address independent seed encryption and Bonsai Merkle trees to make secure processors OS- and performance-friendly. In MICRO ’07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 183–196, Washington, DC, USA, 2007.

[10] W. Shi and H.-H. S. Lee. Accelerating memory decryption and authentication with frequent value prediction. In CF ’07: Proceedings of the 4th International Conference on Computing Frontiers, pages 35–46, New York, NY, USA, 2007.

[11] W. Shi, H.-H. S. Lee, M. Ghosh, C. Lu, and A. Boldyreva. High efficiency counter mode security architecture via prediction and precomputation. SIGARCH Comput. Archit. News, 33(2):14–24, 2005.

[12] D. R. Stinson. Cryptography: Theory and Practice, Third Edition (Discrete Mathematics and Its Applications). Chapman & Hall/CRC, November 2005.

[13] G. E. Suh, D. Clarke, B. Gassend, M. v. Dijk, and S. Devadas. Efficient memory integrity verification and encryption for secure processors. In MICRO 36: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, page 339, Washington, DC, USA, 2003.

[14] C. Yan, D. Englender, M. Prvulovic, B. Rogers, and Y. Solihin. Improving cost, performance, and security of memory encryption and authentication. SIGARCH Comput. Archit. News, 34(2):179–190, 2006.

[15] Y. Zhang, L. Gao, J. Yang, X. Zhang, and R. Gupta. SENSS: Security enhancement to symmetric shared memory multiprocessors. In HPCA ’05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 352–362, Washington, DC, USA, 2005.