
HSDPA - Adaptive Modulation and Coding acceleration with FPGA

Group 943

Sasikanth Munagala & Raju Muchanthula

May 2, 2006

AALBORG UNIVERSITY
Institute of Electronic Systems
Applied Signal Processing and Implementation, 9th semester

Fredrik Bajes Vej 7 9220 Aalborg Øst Phone: +45 98158522 fax: 98 15 97 57

TITLE: HSDPA - Adaptive Modulation and Coding acceleration with FPGA

PROJECT PERIOD: 9th semester, 20th September 2005 - 30th June 2006

PROJECT GROUP: ASPI 943

MEMBERS: Sasikanth Munagala, Raju Muchanthula

SUPERVISORS: Yannick Le Moullec, Lars Kristensen

Number of Reports: not yet

Total Pages: not yet

ABSTRACT

Contents

Table of Contents

List of Figures

List of Tables

1 Introduction
    1.1 Background
    1.2 HSDPA System
        1.2.1 Adaptive Modulation and Coding (AMC)
    1.3 Scope
        1.3.1 Physical Layer Operation
        1.3.2 Activities
        1.3.3 Limitations and Assumptions

2 Design Framework
    2.1 A3 Framework
        2.1.1 Design Model (Rugby Meta-Model)
        2.1.2 Design Methodology
    2.2 The Design Approach

I Analytical Model

3 System Specification
    3.1 System Specification
        3.1.1 Testbed Specification

4 Principle of Adaptive Modulation and Coding (AMC)
    4.1 HS-DSCH Channel
    4.2 Simulation Model
    4.3 Layer 1 Payload and Channel
    4.4 Forward Error Correction / Channel Coding

Thesis report 05gr943 i

        4.4.1 Cyclic Redundancy Check
    4.5 Turbo Coder
        4.5.1 Turbo Encoder
        4.5.2 Interleaving for Turbo Coding
        4.5.3 Puncturing
        4.5.4 Turbo Decoding
        4.5.5 The MAP and Log-MAP Algorithms
    4.6 Rate Matching and HARQ
    4.7 Constellation Rearrangement
    4.8 Adaptive Modulation
        4.8.1 QPSK
        4.8.2 16-QAM
    4.9 Spreading and Scrambling
    4.10 The Additive Noise Channel

5 Simulation
    5.1 Simulation Characteristics
    5.2 Simulated Performances
    5.3 A Comparison of Decoding Algorithms
        5.3.1 Computational Complexity of Log-MAP Algorithm
    5.4 Conclusion

II Design Model

6 Design Methodology
    6.1 DK Methodology
        6.1.1 Handel-C Language
    6.2 Design Flow using System Generator
    6.3 DSP Builder Design Flow
    6.4 Hardware Software Co-design
        6.4.1 Driving factors behind Hardware-Software Co-design
        6.4.2 Degree of Programmability
        6.4.3 Implementation Features
    6.5 Generic Co-design Flow
    6.6 Followed Methodology

7 Design Space Exploration
    7.1 Introduction
    7.2 Design Specification
        7.2.1 Cost Function for the Implementation
    7.3 Design Space Exploration (DSE)
    7.4 Conclusion

III Hardware Implementation

8 Hardware Implementation
    8.1 Introduction
    8.2 Methodology
        8.2.1 SpecC Methodology
    8.3 Clock Frequency
    8.4 Partition and Allocation
    8.5 Handel-C implementation
    8.6 Hardware-Software Testing
    8.7 Optimization
    8.8 Conclusion

9 Conclusion

A Different Types of Channels in HSDPA
    A.1 Transport channels
        A.1.1 Dedicated transport channels
        A.1.2 Common transport channels
    A.2 Physical channels
    A.3 Physical signals

B Derivation of the log MAP algorithm
    B.1 The MAP algorithm
    B.2 The log MAP algorithm

C Handel-C Language
    C.1 Comparison of ANSI C and Handel-C
    C.2 Efficient Use of Hardware in Handel-C
    C.3 Timing and Control in Handel-C
    C.4 Parallel Hardware Generation
    C.5 Channel Communication
    C.6 External Communication via Interfaces
    C.7 Bit Level Operators
    C.8 Calling C/C++ Libraries in Handel-C
    C.9 Some Restrictions when using Handel-C and FPGAs

Bibliography

List of Figures

1.1 Evolution of the Mobile Systems
1.2 HSDPA Inclusion
1.3 Physical Layer Operation

2.1 The A3 framework with focus on Algorithmic and Architecture Mapping
2.2 Generic Rugby Meta-Model
2.3 Rugby Meta-Model
2.4 The Design Approach

3.1 Rugby Meta-Model
3.2 Blocks of the physical layer necessary to simulate

4.1 Slot Formats of HS-DSCH
4.2 Coding Chain for the HS-DSCH Transport Channel
4.3 Factors affecting the design of channel coding and modulation scheme
4.4 LFSR for Cyclic Redundancy Check
4.5 Turbo encoder schematic
4.6 The HSDPA Turbo Encoder
4.7 A general m-stage shift register with linear feedback
4.8 A typical section of a code trellis
4.9 Structure of Turbo Decoder
4.10 Soft Output Viterbi Algorithm
4.11 Soft-Input Soft-Output Turbo Decoder Structure
4.12 The Implemented Log-MAP Decoder
4.13 Constellation Rearrangement for 16-QAM
4.14 Adaptive Modulation and Coding
4.15 Downlink Spreading and Modulation
4.16 The Additive Noise Channel

5.1 Symbol Error Rate for Modulation Schemes
5.2 Bit Error Rate for Modulation
5.3 Comparison of Frame Error Rates using Log-Map and SOVA algorithms
5.4 Comparison of Bit Error Rates using Log-Map and SOVA algorithms
5.5 Comparison of Bit Error Rates using Log-Map and SOVA algorithms

6.1 Design Flow
6.2 DK Design Suite
6.3 Overview of the process of translating code into hardware using Handel-C [41]
6.4 Design Flow using System Generator
6.5 System-level Design Flow using DSP Builder
6.6 Generic Co-Design Flow

8.1 Modem Top level specification
8.2 Slot Formats
8.3 Modem Functionality

C.1 Comparison of Ansi-C and Handel-C

List of Tables

2.1 Rugby Model for the AMC

4.1 Constellation Rearrangement for 16-QAM

5.1 Simulated Parameters for the System Model
5.2 Computational complexity comparison of Decoders where M = Encoder Memory Order
5.3 Complexity of Log-MAP algorithm in terms of States

6.1 Modern Co-design Framework

8.1 Specification for the Modem
8.2 Slot Formats

List of Abbreviations

3GPP 3rd Generation Partnership Project

ACK/NACK ACKnowledgement/No ACKnowledgement

AM Acknowledged Mode

AMC Adaptive Modulation and Coding

ASIC Application Specific Integrated Circuit

AWGN Additive White Gaussian Noise

BER Bit Error Rate

CCTrCH Coded Composite Transport CHannel

CPLD Complex Programmable Logic Device

CQI Channel Quality Indicator

CRC Cyclic Redundancy Check

CRNC Controlling Radio Network Controller

DCH Dedicated CHannel

DPDCH Dedicated Physical Data Channel

DL DownLink

DRNC Drift Radio Network Controller


DSCH Downlink Shared CHannel

DSE Design Space Exploration

DSP Digital Signal Processor

DTX Discontinuous Transmission

EDGE Enhanced Data rates for Global Evolution

EDIF Electronic Design Interchange Format

FCSS Fast Cell Site Selection

FDD Frequency Division Duplex

FEC Forward Error Correction

FER Frame Error Rate

FP Frame Protocol

FPGA Field Programmable Gate Array

FPLA Field Programmable Logic Array

GPRS General Packet Radio Service

GSM Global System for Mobile Communication

HARQ Hybrid Automatic Repeat Request

HDL Hardware Description Language

HI HS-DSCH Indicator

HS-DPCCH High Speed-Dedicated Physical Control CHannel

HS-DSCH High Speed Downlink Shared CHannel

HS-PDSCH High Speed Physical Downlink Shared Channel


HS-SCCH High Speed Shared Control CHannel

HSDPA High Speed Downlink Packet Access

HW/SW Hardware/Software

IP Internet Protocol

L1 Physical Layer of OSI Model / Layer 1

LA Link Adaptation

LFSR Linear Feedback Shift Register

MAC Medium Access Control

MAC-c/sh MAC common/shared

MAC-d MAC dedicated

MAC-hs MAC high speed

MAP Maximum A Posteriori

Mcps Mega Chips Per Second

ML Maximum Likelihood

NSC Nonsystematic Convolutional Coder

OVSF Orthogonal Variable Spreading Factor (codes)

PAL Programmable Array Logic

PDCP Packet Data Convergence Protocol

PDU Protocol Data Unit

PDSCH Physical Downlink Shared Channel

PLA Programmable Logic Array


PN Pseudo Random Noise

QAM Quadrature Amplitude Modulation

QPSK Quadrature Phase Shift Keying

RAM Random Access Memory

RAN Radio Access Network

RNC Radio Network Controller

RLC Radio Link Control

ROM Read Only Memory

RSC Recursive Systematic Convolutional Coder

RV Redundancy Version

S-RNC Serving - Radio Network Controller

SCCH Shared Control CHannel

SER Symbol Error Rate

SISO Soft In Soft Out

SF Spreading Factor

SOVA Soft Output Viterbi Algorithm

SPLD Simple Programmable Logic Device

SRNC Serving Radio Network Controller

SYNC Synchronous

TDD Time Division Duplex

TTI Transmission Time Interval


UE User Equipment

UL UpLink

UM Unacknowledged Mode

UMTS Universal Mobile Telecommunications System

UTRA Universal Terrestrial Radio Access

VoIP Voice over Internet Protocol

VSF Variable Spreading Factor

WCDMA Wideband Code Division Multiple Access


Chapter 1
Introduction

1.1 Background

One of the main drivers for the development of next-generation wireless communication systems is the increasing demand for high-rate data services. First and second generation communication systems were mainly focused on voice services, while the third generation system is bringing a revolution in communications by providing high-speed data transfer services. The increase in the number of mobile users accessing the internet and multimedia-based applications demands more bandwidth in wireless networks. Today's Universal Mobile Telecommunication System (UMTS) offers data services of acceptable quality, typically reaching data rates of 384 kbps. It can be foreseen that in the next couple of years, multimedia traffic will overtake voice traffic. By 2008, multimedia communication is expected to account for one quarter of the mobile traffic, as shown in figure 1.1 [1].

Figure 1.1: Evolution of the Mobile Systems [1]

Future applications will require even higher data rates, with the role of IP-based data services expanding. The ability to combine various services will also increase in importance. Applications such as intranet access and large file downloads result in an ever-increasing amount of data traffic in the mobile radio network. Mobile radio network operators are faced with increasing the capacity of their networks so that they can meet the demands of the increased data volume and provide quality service to subscribers of new services. For these reasons, High Speed Downlink Packet Access (HSDPA) has been developed to enhance UMTS technology, as specified by the 3rd Generation Partnership Project (3GPP) Release 5.

In 2nd generation wireless communication systems, the computational complexity of the baseband signal processing in the receiver, comprising equalisation and channel and source coding, is relatively low and can be handled by a single Digital Signal Processor (DSP). The HSDPA standards, in contrast, comprise advanced equalisation and coding algorithms. In particular, Turbo codes as the channel coding scheme place great demands on the computing power of devices and infrastructure. Along with the higher throughput requirements, the resulting computational complexity has risen by at least an order of magnitude compared to 2G. Dedicated hardware solutions for the baseband signal processing fulfill the requirements, but often lack the flexibility to support various existing or emerging communication standards. Therefore, efficient implementations on programmable architectures like Field Programmable Gate Arrays (FPGAs) are of great interest.

The extreme flexibility of FPGAs has made them the medium of choice for fast hardware prototyping. FPGAs are composed of thousands of small programmable logic cells, dynamically interconnected to allow the implementation of any logic function. Tremendous growth in device capacity has made it possible to implement complex functions in FPGAs. In addition, FPGAs offer a much faster time to market for time-critical applications and allow post-silicon, in-field modification of prototype or low-volume designs where an Application Specific Integrated Circuit (ASIC) is not justified.

Despite the growing importance of application-specific FPGA designs, these devices are still difficult to program, which makes them inaccessible to the average developer. Standard practice requires expressing the application in a hardware-oriented language such as Verilog or VHDL and synthesizing the design to hardware using one of a wide variety of synthesis tools. As the optimizations performed by synthesis tools are very limited, high-level and global optimizations performed by hand are still needed.

Because of the complexity of synthesis, it is difficult to predict the performance and area characteristics of the resulting design [47]. Such systems require, by nature, a heterogeneous specification and an implementation with heterogeneous architectural styles. Thus, in order to deal with these complex architectures and to meet more severe time-to-market constraints, new system design methods are needed. Hardware/Software (HW/SW) co-design has emerged as a promising approach to cope with this challenge. One of the most important issues of this approach is design space exploration (DSE), an iterative refinement cycle in which, at each step, transformations are applied manually, the design is synthesized, the results are examined, and the design is modified to trade off performance and area. In other words, it is important to find the best system architecture, including the right partition between hardware and software components and the right hardware components and communication protocols.


1.2 HSDPA System

As previously stated, the evolution of the mobile communication market is expected to bring a major increase in data traffic demand combined with high bit rate services. To satisfy these requirements, third generation systems must increase their spectral efficiency and support high user data rates, especially in the downlink direction of the communication path because of its heavier load. Several technologies can improve the downlink performance of the Wideband Code Division Multiple Access (WCDMA) radio interface. One of these is the High Speed Downlink Packet Access (HSDPA) concept, which 3GPP has included in the Release 5 specification as an evolutionary step to boost WCDMA performance for downlink packet traffic. HSDPA has technological enhancements that increase user peak data rates up to 10 Mbps. This is achieved by introducing a new transport channel for user data, the High Speed Downlink Shared Channel (HS-DSCH), and the High Speed Shared Control Channel (HS-SCCH), based on the existing downlink shared channel (DSCH) and shared control channel (SCCH) respectively; these are described in appendix A. Figure 1.2 shows the fundamental features included in and excluded from the HS-DSCH of HSDPA. For this new transport channel, two of the main features of WCDMA technology, closed-loop power control and variable spreading factor, have been deactivated.

[Figure: basic WCDMA technology features (AMC, variable SF, power control, soft handover, advanced PS, 2 ms TTI, H-ARQ, multicode operation), each marked as included in, enhanced in, or excluded from HSDPA]

Figure 1.2: Fundamental features to be included and excluded in the HS-DSCH of HSDPA [14]

In WCDMA, fast power control stabilizes the received signal quality by increasing the transmission power during fades of the received signal level. This causes peaks in the transmission power and a subsequent power rise, which reduces the total network capacity. Moreover, the operation of power control requires a certain headroom in the total Node-B transmission power to accommodate its variations. Eliminating power control avoids the power rise as well as the cell transmission power headroom. However, because power control is excluded, HSDPA requires other link adaptation mechanisms to adapt the transmitted signal parameters to the continuously varying channel conditions.



One of these techniques is called Adaptive Modulation and Coding (AMC). With this technique, the modulation and coding rate are adapted to the instantaneous channel quality instead of adjusting the power. The variable spreading factor is deactivated because its long-term adjustment to the average propagation conditions is no longer required. The spreading factor (SF) has been fixed to 16, which gives a good data rate resolution with reasonable complexity [4].

The transmission time interval (TTI) duration is reduced from 10 ms in WCDMA to 2 ms in HSDPA to minimize the channel quality variations. The fast Hybrid ARQ (H-ARQ) technique is added, which rapidly retransmits the missing transport block and combines the soft information from the original transmission with any subsequent retransmission before the decoding process. The network may include additional redundant information, which is transmitted incrementally in subsequent retransmissions (Incremental Redundancy).
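The soft combining step of H-ARQ can be illustrated with a minimal sketch. It assumes that the receiver keeps one soft value (log-likelihood ratio) per bit, that a positive value favours bit 0, and that the retransmission repeats the same bits (Chase-style combining); the function names are hypothetical and incremental redundancy is not modelled.

```python
def combine_soft(buffer, new_llrs):
    """Add the soft values of a (re)transmission to the stored soft buffer."""
    if buffer is None:                 # first transmission: nothing stored yet
        return list(new_llrs)
    if len(buffer) != len(new_llrs):
        raise ValueError("transmissions must carry the same number of soft bits")
    return [old + new for old, new in zip(buffer, new_llrs)]


def hard_decision(llrs):
    """Assumed sign convention: LLR >= 0 means bit 0, LLR < 0 means bit 1."""
    return [0 if llr >= 0 else 1 for llr in llrs]


# Example: the second and third soft bits are unreliable in the first
# transmission and change their decision after combining.
first = [0.4, -0.2, 0.1, -1.5]
retransmission = [0.3, 0.9, -0.6, -1.0]
soft_buffer = combine_soft(None, first)
soft_buffer = combine_soft(soft_buffer, retransmission)
decided = hard_decision(soft_buffer)
```

In a real receiver the combined soft values would be passed to the turbo decoder rather than sliced directly; the hard decision here only makes the effect of combining visible.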

The main focus of this Master's thesis is the implementation of the Adaptive Modulation and Coding (AMC) technique of HSDPA on an FPGA using HW/SW co-design.

1.2.1 Adaptive Modulation and Coding (AMC)

As mentioned in section 1.2, HSDPA uses other link adaptation techniques in place of power control and variable spreading factor. HSDPA adapts the modulation, the coding rate and the number of channelization codes to the instantaneous radio conditions. The combination of the first two mechanisms is called Adaptive Modulation and Coding (AMC). This ensures that each subscriber is provided with optimum service. For example, a subscriber close to the base station needs only minimal protection against transmission errors because the transmission quality is good. This subscriber can therefore receive a higher data rate than a subscriber at the edge of the cell.

Each mobile phone regularly sends messages to the base station about the channel quality. The base station uses this information to decide how many resources can be made available to a subscriber and which modulation mode can be selected. This process improves the protection against transmission errors and optimises the use of resources on the air interface.
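The selection step can be pictured as a table lookup on the reported channel quality. The sketch below is illustrative only: the quality measure (an SNR in dB), the thresholds and the modulation and code rate pairs are hypothetical values chosen for the example and are not taken from the 3GPP CQI tables.

```python
from typing import NamedTuple


class Mcs(NamedTuple):
    modulation: str
    code_rate: float


# Hypothetical switching thresholds in dB, lowest first (not 3GPP values).
MCS_TABLE = [
    (0.0, Mcs("QPSK", 1 / 3)),
    (5.0, Mcs("QPSK", 1 / 2)),
    (10.0, Mcs("16QAM", 1 / 2)),
    (15.0, Mcs("16QAM", 3 / 4)),
]


def select_mcs(reported_snr_db):
    """Pick the highest-rate entry whose threshold the report still meets."""
    chosen = MCS_TABLE[0][1]           # most robust scheme as the fallback
    for threshold_db, mcs in MCS_TABLE:
        if reported_snr_db >= threshold_db:
            chosen = mcs
    return chosen


cell_edge_user = select_mcs(-3.0)      # poor channel: robust QPSK, rate 1/3
near_user = select_mcs(12.0)           # good channel: 16QAM, rate 1/2
```

A user near the base station reports a high quality value and is given a higher-order modulation with less coding protection, while a user at the cell edge falls back to the most robust entry.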

Besides QPSK, HSDPA incorporates 16-QAM modulation to increase the peak data rates for users served under favorable radio conditions. Support for QPSK is mandatory for the mobile, whereas support for 16-QAM is optional for the network and the UE [5]. The higher-order modulation introduces some complexity challenges for the receiver terminal, which must estimate the relative amplitude of the received symbols, whereas in the QPSK case it only needs to detect the signal phase.
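The difference in receiver complexity can be seen in a small hard-decision sketch. The bit labelling used here (sign bits first, then amplitude bits) is a hypothetical choice for illustration and is not the 3GPP constellation mapping; the 16-QAM points are assumed to lie at ±d and ±3d on each axis, where d is the amplitude unit the receiver has to estimate.

```python
def qpsk_hard_bits(symbol):
    """QPSK decision: only the signs (the phase quadrant) matter."""
    return (int(symbol.real < 0), int(symbol.imag < 0))


def qam16_hard_bits(symbol, unit):
    """16-QAM decision with points assumed at +-unit and +-3*unit per axis.

    The inner/outer decision threshold is 2*unit, so the result depends on
    how well the receiver has estimated the amplitude unit.
    """
    threshold = 2 * unit
    return (
        int(symbol.real < 0),
        int(symbol.imag < 0),
        int(abs(symbol.real) > threshold),
        int(abs(symbol.imag) > threshold),
    )


sent = complex(3, -1)                  # outer point on I, inner point on Q
faded = 0.5 * sent                     # channel halves the amplitude

reference = qam16_hard_bits(sent, unit=1.0)
wrong = qam16_hard_bits(faded, unit=1.0)   # stale amplitude estimate
right = qam16_hard_bits(faded, unit=0.5)   # correct amplitude estimate
```

Scaling a QPSK symbol by any positive gain leaves `qpsk_hard_bits` unchanged, whereas the 16-QAM decision on the faded symbol only matches the reference once the amplitude estimate follows the channel gain.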

The channel coding is based on the Release '99 rate 1/3 turbo encoder and is always rate 1/3 (for every bit that goes into the coder, three bits come out). The effective code rate varies, however, depending on the parameters applied during the two-stage Hybrid Automatic Repeat Request (HARQ) rate matching process.
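A rough worked example shows how the effective code rate follows from the block size and the allocated channel resources. The sketch assumes the standard WCDMA chip rate of 3.84 Mcps (not stated in this chapter) together with the spreading factor of 16, the 2 ms TTI and the 24-bit CRC used for the HS-DSCH; turbo tail bits and code block segmentation are ignored, and all function names are illustrative.

```python
CHIP_RATE_CPS = 3_840_000      # assumed WCDMA chip rate (3.84 Mcps)
SPREADING_FACTOR = 16          # fixed for the HS-DSCH
TTI_MS = 2                     # HSDPA transmission time interval
CRC_BITS = 24                  # CRC attached to each transport block
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}


def channel_bits_per_tti(modulation, n_codes):
    """Channel bits available in one TTI on n_codes channelization codes."""
    chips = CHIP_RATE_CPS * TTI_MS // 1000        # chips per TTI
    symbols = chips // SPREADING_FACTOR           # symbols per code per TTI
    return symbols * BITS_PER_SYMBOL[modulation] * n_codes


def effective_code_rate(block_bits, modulation, n_codes):
    """Information bits (plus CRC) per transmitted channel bit."""
    return (block_bits + CRC_BITS) / channel_bits_per_tti(modulation, n_codes)


def rate_matching_action(block_bits, modulation, n_codes):
    """Whether the rate 1/3 coded block must be punctured or repeated."""
    coded_bits = 3 * (block_bits + CRC_BITS)      # tail bits ignored here
    available = channel_bits_per_tti(modulation, n_codes)
    if coded_bits > available:
        return "puncture"
    if coded_bits < available:
        return "repeat"
    return "none"
```

Under these assumptions, a 456-bit block sent on one QPSK code has 480 bits after CRC attachment and 960 channel bits available, so the effective rate is 1/2 and the 1440 coded bits have to be punctured down to 960.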


1.3 Scope

Considering the advantages of FPGA technology mentioned in section 1.1, the scope of the project is to design the AMC scheme of HSDPA according to the 3GPP standards [3]. The project is divided into two phases, described below.

Analytical part: A generic AMC scheme, as explained in section 1.3.1, is studied and modelled. The system is simulated and verified in Matlab. To test the performance and verify the end results of the system, the demodulation and the turbo decoder (Log-MAP algorithm) are also modelled and simulated.

Implementation part: The execution time of the Matlab code from the analytical part is optimized. Algorithmic-level characteristics and parallelism are explored through design space exploration. HW/SW partitioning is done and HW/SW co-design techniques are applied. Different design methodologies are considered during the design process, and two approaches are followed. Firstly, the simulated Matlab/C code is converted to Handel-C code and synthesis is performed; place and route is finally performed on a chosen hardware platform (FPGA). Secondly, Simulink is used along with the System Generator and DSP Builder EDA tools from Xilinx and Altera respectively, after which place and route can be done. The details are explained in section ??.

1.3.1 Physical Layer Operation

As described in the previous section, the analytical part of the project covers the algorithm of the simplified downlink transmitter structure for an HS-DSCH, which is depicted in figure 1.3. The operation has the following stages:

[Figure: block diagram with the blocks CRC addition, bit scrambling, turbo encoding, rate matching, LA/PS control, interleaving and constellation arrangement, physical channel segmentation, modulation, physical channel mapping, channel, demodulation, physical channel desegmentation, channel decoder and CRC detachment]

Figure 1.3: Physical Layer Operation

• A 24-bit CRC (Cyclic Redundancy Check) is attached to the data packet to provide packet error detection.

• The packet is scrambled to minimize the probability of a long sequence of bits with the same value, i.e. this block is used to increase the entropy of the source bits.

• The scrambled data packet is fed through a rate 1/3 turbo encoder, which provides a large degree of Forward Error Correction (FEC) potential.

• To allow flexible data rate adjustment, the amount of coding protection can be adjusted by the rate matcher block through either repetition or puncturing.

• After rate matching, the encoded data packet is split over the number of physical channels allocated to the UE (the HS-PDSCH multi-code operation).

• The interleaver shuffles the bits in a well-defined manner. Thus, bit errors are distributed evenly over the packet so that the decoder operates optimally.

• When 16QAM is used as the modulation scheme, two of the four bits making up each received symbol have a higher probability of error than the other two bits. To compensate for this effect, constellation rearrangement can be used for retransmissions, which provides the same average level of error probability after retransmission combining.

• The data bits are then mapped into symbols. The modulation schemes used by HSDPA are QPSK and 16QAM, which carry 2 and 4 bits per symbol respectively.

• Finally, the modulated symbols for the physical channels are fed to the spreading units, where each physical channel is spread using a spreading factor of 16, which is fixed for HSDPA.
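The first stage of the chain above, CRC attachment and the matching check at the receiver, can be sketched as a bitwise polynomial division. The generator polynomial D^24 + D^23 + D^6 + D^5 + D + 1 is an assumption quoted from the 3GPP channel coding specification and should be checked against it; the helper names are hypothetical, and any 3GPP-specific ordering of the parity bits is not modelled.

```python
# Assumed 24-bit generator: D^24 + D^23 + D^6 + D^5 + D + 1 (verify in the spec).
CRC24_POLY = (1 << 24) | (1 << 23) | (1 << 6) | (1 << 5) | (1 << 1) | 1
CRC_WIDTH = 24


def crc_remainder(bits):
    """Remainder of the bit sequence (MSB first) divided by the generator."""
    register = 0
    for bit in bits:
        register = (register << 1) | bit
        if register >> CRC_WIDTH:          # degree reached 24: subtract generator
            register ^= CRC24_POLY
    return register


def attach_crc(bits):
    """Append 24 parity bits so that the whole block divides evenly."""
    remainder = crc_remainder(list(bits) + [0] * CRC_WIDTH)
    parity = [(remainder >> i) & 1 for i in range(CRC_WIDTH - 1, -1, -1)]
    return list(bits) + parity


def crc_ok(block):
    """A received block passes the check when the remainder is zero."""
    return crc_remainder(block) == 0


packet = [1, 0, 1, 1, 0, 0, 1, 0] * 5      # 40 example data bits
block = attach_crc(packet)
corrupted = list(block)
corrupted[7] ^= 1                          # flip a single bit in transit
```

Here `crc_ok(block)` holds for the untouched block and fails for `corrupted`, which is the packet error detection that drives the ACK/NACK decision in H-ARQ.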

1.3.2 Activities

The activities of the thesis work in realizing an HSDPA channel coding system are:

• Analyze and design the Adaptive Modulation and Coding model for HSDPA.

• Simulation and verification using Matlab followed by optimisation.

• Design Space Exploration for the algorithmic implementation.

• Optimize the HW/SW partitioning.

• Use a hardware/software co-design methodology based on the Rugby meta-model.

• Use the Handel-C language to design the HSDPA channel coding system.

• Increase the speed of the design by making use of parallel operations.

• Synthesize the Handel-C core design, transfer it to the Xilinx Virtex-II and Altera Stratix FPGA hardware platforms, and compare the performance.

• Use System Generator from Xilinx and DSP Builder from Altera and compare the performances.


1.3.3 Limitations and Assumptions

• Only the downlink scenario is considered.

• The Link Adaptation and Packet Scheduler (LA/PS) (higher layer) control depicted in the block diagram in figure 1.3 is not applied, as its information is not available.

• Physical channel segmentation and scrambling are not simulated in the channel coding chain.

• An AWGN channel is used as the transmission channel.

• Parameters used for the upper layers are considered constant.

• Synchronization issues between transmitter and receiver are not considered.

• IP cores and the VHDL language are used for the implementation in addition to the Handel-C language.


Chapter 2
Design Framework

2.1 A3 Framework

A framework was adopted at the start of the project to describe the whole system. It is illustrated in figure 2.1 and relates the algorithmic part to the implementation part. The framework consists of three domains, namely Application, Algorithm, and Architecture. Using the first letters of the three domains, it is called the A3 framework. It states that there is more than one way to express a given application through algorithms. Likewise, there is a one-to-many mapping from algorithm to architecture, i.e., the functionality can be implemented on more than one type of architecture.

As shown in figure 2.1, the application is the HSDPA physical layer and the algorithm considered is adaptive modulation and coding (AMC). The oval in the algorithm circle represents the AMC algorithm, which contains sub-algorithms such as channel coding and modulation. These algorithms are mapped to the architecture domain, i.e., a target platform such as an FPGA, ASIC or DSP. In this project the target platform is the FPGA. Even here, different architectures are available from different vendors; the Xilinx Virtex-II XC2V3000 FG676-4 and Altera Stratix EP1S10F780C6 FPGA architectures are used. Each mapping process has a feedback path for verification, as shown in figure 2.1.

The main area of concern is the algorithm-to-architecture mapping, where design space exploration and different design methodologies are applied. Sections 2.1.1 and 2.1.2 present the design model and the methodologies used in the project.

2.1.1 Design Model (Rugby Meta-Model)

A model specifies the functionality of a system and describes in detail how the system will work. To obtain efficient implementations of an algorithm, the implementation space has to be explored on multiple abstraction levels. To express designs, design processes and design tools, a model named the Rugby meta-model is used. It has objectives similar to those of the Y-chart [45], but its scope is extended to complex systems requiring concurrent processes and mixed HW/SW implementations [43]. This model has four domains, namely computation,



Figure 2.1: The A3 framework with focus on Algorithmic and Architecture Mapping. [Diagram: Application (HSDPA physical layer), Algorithm (Adaptive Modulation and Coding) and Architecture (target platform FPGA: Xilinx Virtex2 XC2V3000-4fg676 and Altera Stratix EP1S10F780C6), linked by one-to-many mappings with feedback.]

communication, data and time. A domain is an aspect of a model that can be analysed logically independently of the other aspects. The model is able to express mixed HW/SW designs and design processes at various levels of abstraction. Hierarchy, abstraction and domains are the three important concepts for understanding the Rugby meta-model. Hierarchy is the partitioning of a design model such that the details of each part are concealed in a lower hierarchical level; it defines the amount of information presented and visible at a particular level. An abstraction level defines the modelling concepts and their semantics for representing a system; different types of information are available at the various levels. Figure 2.2 shows the generic Rugby meta-model.

The four domains of the Rugby model allow different aspects of the model to be analysed independently at different levels of abstraction. As shown in figure 2.3, the abstraction level of the model decreases during the design stages. The design process starts with an idea and finishes with a low-level description of the hardware part and compiled software. An important property of the meta-model is that it is designed for HW/SW systems. Hardware and software descriptions require different abstractions: for example, the lowest software description is source code compiled to its target instruction set, whereas hardware can go further, down to transistor level. Rugby's four domains are chosen to model different aspects of concurrent processes and mixed hardware/software systems and to analyse their different problems. To represent different implementation technologies, such as hardware and software, domain lines are split. At higher levels, with similar concepts such as communicating processes and abstract



Figure 2.2: Generic Rugby Meta-Model. [Diagram: the four domain lines Computation, Communication, Time and Data run along the development time line from the idea (high abstraction) to the physical system (low abstraction).]

data types, hardware and software can be treated uniformly. Thus, at higher levels, from requirements definition to communicating processes, only one line in each domain represents the system level. This is independent of the later implementation, which can be hardware, software or a combination of both. At lower levels, the characteristics of the implementation technology become dominant, requiring separate domain lines as shown in figure 2.3.

Representing AMC in Rugby Meta Model

The requirements include functional constraints, interface constraints, data constraints, and performance constraints. The system specification gives a description of the system model using a high-level language. Implementation requirements are also provided in the form of computational or memory usage constraints and performance constraints such as frames per second, data rate, energy consumption, etc.

The system specification corresponds to the initial stage of the Rugby model. In this stage, time is expressed in terms of performance constraints, communication is defined by data relationships, computation is given by the overall algorithm, and the data domain is specified by abstract data types. In order to proceed with the implementation in hardware, the system specification is expressed by an internal behavioral description.

Partitioning introduces parallelism into the system. The parallelism is still abstract at this point, because it is not yet clear which tasks are going to execute concurrently; the tasks can nevertheless be divided into concurrent processes. This corresponds to a lower abstraction level of the computation domain. In the Rugby model, partitioning is the step in which the system is split into separate concurrent processes and the communication between the processes is defined.



Figure 2.3: Rugby Meta-Model

The software descriptive elements are compiled for their respective processor instruction sets. The parts of the system that are allocated to hardware processing elements (PEs) are synthesized from their corresponding hardware descriptions. The functional description is converted into a data path by using a block representation and finally synthesized to a gate-level or transistor-level description. The time domain is expressed in clocked and physical form. Inter-node communications have to be synthesized into buses between processing units or into the data flow topology inside the PEs. The abstraction for data types is lowered by redefining the exact data format in terms of bits or analog values. In the final HW/SW integration step, the hardware description blocks are integrated together with the software and interface elements. This process is shown in table 2.1.

Abstraction level        | Computation            | Communication               | Data                 | Time
Requirements             | Functional constraints | Interface constraints       | AMC data             | Performance constraints
System Model             | Algorithm              | Inter-process communication | Symbols              | Causality
SW: Matlab               | Algorithm              | Parameter passing           | Symbols / numbers    | Causality
SW: Handel-C code        | Algorithm              | Parameter passing           | Processor data types | Processor cycle time
HW: VHDL                 | Algorithm/FSM          | Topology                    | Symbols / bits       | Clocked
HW: Synthesized netlist  | Logic blocks           | Topology                    | Bits                 | Physical time

Table 2.1: Rugby Model for the AMC



2.1.2 Design Methodology

A methodology is a set of models and transformations, possibly implemented by design tools, that refines the abstract, or behavioral, implementation description. Based on the specification, three steps of the methodology are to be performed prior to any implementation [46]:

1. Detailed algorithm analysis with respect to its computational complexity as well as data-transfer and storage demands

2. Extraction and exploration of the inherent algorithmic parallelism

3. Algorithmic transformation, i.e., to decrease the implementation complexity or increase the algorithmic parallelism.

During each design step, the design model is statically analysed to estimate certain quality metrics and how well they satisfy the constraints. Based on these steps, two methodologies, the DK methodology and Spec C (graphical representation), are used in this project to carry out the design process, as explained in chapter ??.

2.2 The Design Approach

The proposed design approach is composed of four main steps, as illustrated in figure 2.4: 1) algorithm design with Matlab; 2) algorithmic-level characterization and parallelism exploration using the Design-Trotter SoC framework; 3) HW/SW partitioning and synthesis performed with Celoxica's "DK4 Design Suite"; and 4) implementation on the hardware platform using the tools provided by Xilinx and Altera. The features of the design approach are as follows:

Design Space Exploration

Design space exploration is done using an academic tool, Design Trotter. It provides information about the functionalities of the system where parallelism can be exploited. HW/SW partitioning is then done using the area-time trade-off curves.

Hardware-Software Co-Design

The main issues of the design are presented below:

• Concurrent design of HW and SW components of the channel coding system.

• Development from algorithm to implementation using Handel-C, so that the development time can be reduced compared to traditional methods.

• New possibilities in HW, opened up by exploiting the reconfigurability of FPGAs and the easy-to-use language tool.



Figure 2.4: The Design Approach. [Flow diagram: (1) Algorithm Design with Matlab: simulations, verification, Matlab to C / Simulink conversion; (2) Design Space Exploration: parallelism, trade-off curves, HW/SW partitioning; (3) Hardware Implementation along two paths: either C to Handel-C, Handel-C to VHDL/EDIF with DK suite, ISE/Quartus tools, program device, trade-off curves; or Simulink blocks, .mdl to HDL, synthesis, ISE/Quartus tools, program device.]

Handel-C DK4 Suite from Celoxica

The DK4 suite is a product of Celoxica that features the Handel-C language. Handel-C provides numerous benefits, a few of which are mentioned in appendix C.

Target Platform

The system needs to meet requirements such as processing speed, and it is these stringent requirements that ultimately drive the choice of the hardware platform.

FPGAs provide an ideal implementation platform for developing systems such as AMC. State-of-the-art, high-end, high-performance FPGAs are usually used to accelerate performance and enable new functionality.

The target FPGAs are the Xilinx Virtex-II and the Altera Stratix. The functional description of the target platforms is presented in the appendix.

This chapter gives a brief introduction to the application, i.e., the HSDPA system and the AMC coding chain. The scope and the activities give the aim of the project. The considered design framework, the Rugby meta-model and the design approach are discussed.


Part I

Analytical Model


Chapter 3 System Specification

The idea of the system was presented in the previous chapter. Moving towards the physical system implementation, this chapter presents the specification, system model and algorithm steps of the Rugby meta-model, as shown in figure 3.1. These three steps together form the analytical model in this project. The project specification, as shown in figure 3.1, is discussed in section 3.1; the next steps, i.e., the system model and the algorithms, are described in section 4.2. In section ??, simulation results obtained using Matlab are presented. The later

Figure 3.1: Rugby Meta-Model

steps in the rugby meta-model are described in the implementation part 7 leading to the finalphysical system.

3.1 System Specification

In chapter 1, HSDPA was introduced as one of the candidates for higher data rates in mobile communication. The goals as per the 3GPP standards [2][3][5] were also outlined, especially the goal of adaptive techniques for modulation and channel coding. The Adaptive Modulation and Coding (AMC) system mentioned in section 1.2.1 is chosen for simulation. Generalising the WCDMA physical layer system, it is possible to verify using this specific system with a similar bandwidth.

3.1.1 Testbed Specification

For testing the FEC (Forward Error Correction) coding and modulation schemes and the corresponding decoder/demodulator, it is important to have a precise model of the system. At the same time, this model must be flexible enough to allow a fast implementation and to give fast results for the coding scheme's performance.

The following two factors are weighted in this project:

• Flexible Implementation: The testbed is implemented in Matlab. Furthermore, the coding blocks can be implemented in the C language, which makes it possible to generate VHDL / Handel-C code for these algorithms for FPGA implementation. Matlab (version 7.1) is available at the Embedded Systems lab, Aalborg University. Simulink was also used for the implementation, along with EDA tools such as System Generator and DSP Builder from Xilinx and Altera respectively.

• Precise Model: Implementing all the details and blocks in the system results in high complexity and long simulation times. Therefore, the number of bits tested in the simulation is limited, which reduces the complexity. The model is made with the minimum required details while still giving usably precise results.

AMC has many aspects which relate to network matters and higher layer protocols. Here, only the physical layer is considered. A model of the physical layer is illustrated in figure 3.2. The model contains the blocks necessary to simulate the coding system performance. Multiple code channels are left out, although they may influence the result. When more code channels are used, the delay spread on the channel and imperfections of the channel can introduce some cross-correlation between the code channels. This cross-correlation component decreases the performance of the demodulator, thus raising the BER; adding more users to the system therefore corresponds to a higher BER. By using one code channel the results are easier to generalise. Another approach would be to simulate the system at different bit rates, but this information can be calculated from the results for only one code channel, so this approach is not applied.



Figure 3.2: Blocks of the physical layer necessary to simulate. [Block diagram with the blocks: Payload, CRC attachment, Turbo Encoding, Puncturing, Channel Interleaver, Modulation (QPSK, 16-QAM), Channel (Noise + Interference), Demodulation (QPSK, 16-QAM), Deinterleaver, SOVA algorithm, Log-Map (SISO) algorithm, CRC detachment, Received Bit sequence; the coding and decoding blocks are grouped under the labels FEC Coding and Decoding.]


Chapter 4 Principle of Adaptive Modulation and Coding (AMC)

In cellular communication systems, the quality of the signal received by a UE depends on a number of factors, such as the distance between the desired and interfering base stations, the path loss exponent, multipath, log-normal shadowing, short-term Rayleigh fading and noise. In order to improve the system capacity, peak data rate and coverage reliability, the signal transmitted to and by a particular user is modified to account for the signal quality variation through a process commonly referred to as link adaptation [11]. Traditionally, CDMA systems have used fast power control as the preferred method for link adaptation. More recently, AMC has offered an alternative link adaptation method that promises to raise the overall system capacity [10]. AMC provides the flexibility to match the modulation-coding scheme to the average channel conditions of each user. With AMC, the power of the transmitted signal is held constant over a frame interval, and the modulation and coding format is changed to match the current received signal quality or channel conditions. In a system with AMC, users close to the Node-B are typically assigned higher order modulation with higher code rates (e.g. 16-QAM with R = 1/3 turbo codes), but the modulation order and/or code rate decrease as the distance from the Node-B increases.
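The selection step described above can be sketched as a threshold lookup. This is only a minimal illustration: the SNR thresholds in `MCS_TABLE`, the table itself and the function names are hypothetical and are not taken from this report or from the 3GPP specifications; only the modulation formats (QPSK, 16-QAM) and code rates (1/3, 1/2) come from the text.

```python
# Hypothetical MCS table: (minimum SNR in dB, modulation, code rate).
# The thresholds are illustrative assumptions, not measured or standardised values.
MCS_TABLE = [
    (12.0, "16-QAM", 1 / 2),
    (8.0, "16-QAM", 1 / 3),
    (4.0, "QPSK", 1 / 2),
    (float("-inf"), "QPSK", 1 / 3),   # most robust fallback
]

BITS_PER_SYMBOL = {"QPSK": 2, "16-QAM": 4}


def select_mcs(snr_db):
    """Return the first (highest throughput) scheme whose SNR threshold is met."""
    for threshold, modulation, rate in MCS_TABLE:
        if snr_db >= threshold:
            return modulation, rate


def info_bits_per_symbol(modulation, rate):
    """Information bits carried per modulated symbol for a given scheme."""
    return BITS_PER_SYMBOL[modulation] * rate


if __name__ == "__main__":
    for snr in (15.0, 5.0, -3.0):
        mod, rate = select_mcs(snr)
        print(snr, mod, rate, info_bits_per_symbol(mod, rate))
```

The transmit power is not touched by this lookup, in line with the description above: only the modulation order and code rate follow the reported channel quality.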

In order to implement the HSDPA functionalities for AMC, Release 5 of 3GPP [3] has introduced a new downlink transport channel for best-effort packet data, called HS-DSCH (High Speed Downlink Shared Channel), and related signalling channels in addition to the existing channels described in [2].

4.1 HS-DSCH Channel

The HS-DSCH (High Speed Downlink Shared Channel) is terminated in the Node-B and is controlled by the MAC-hs [2][3]. The HS-DSCH is shared by several UEs and is transmitted over the entire cell or over only part of the cell using, e.g., beam-forming antennas.

HS-DSCH uses a shorter TTI (Transmission Time Interval) of 2 ms (equivalent to one subframe of three slots, i.e. 7680 chips) compared to the 10 ms TTI of the dedicated channel, as shown in figure 4.1. Also, there is one transport block per HS-DSCH TTI.

Figure 4.1: Slot Formats of HS-DSCH

The shorter TTI helps in reducing the delay, as it reduces the retransmission time. It also allows a finer scheduling process, as it can track channel variations better. HS-DSCH is allocated to users on the basis of the HS-DSCH TTI: within one TTI, the HS-DSCH code resource is usually allocated to a single user; however, two or more users can use the HS-DSCH within one TTI by sharing the code resource. In other words, multiplexing of multiple users for HS-DSCH transmission is allowed in the code domain as well as in the time domain. The corresponding physical channel is denoted HS-PDSCH (High Speed Physical Downlink Shared Channel).
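The TTI figures quoted above can be checked with a short calculation, assuming the WCDMA chip rate of 3.84 Mcps and 2560 chips per slot (standard values that are not stated in this section):

```latex
\begin{align*}
N_{\text{chips}} &= 3 \ \text{slots} \times 2560 \ \text{chips/slot} = 7680 \ \text{chips}, \\
T_{\text{TTI}}   &= \frac{7680 \ \text{chips}}{3.84 \times 10^{6} \ \text{chips/s}} = 2 \ \text{ms}.
\end{align*}
```

By the same count, a 10 ms radio frame of 15 slots holds five such subframes.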

4.2 Simulation Model

The blocks of the specification presented in figure 3.2 are elaborated in this section.

4.3 Layer 1 Payload and Channel

The input to the system (figure 3.2) is a random bit sequence in which 1 and 0 each occur with probability 0.5. The bit sequence represents the information bits which have to be transmitted over the channel with a BER in the order of $10^{-4}$.

4.4 Forward Error Correction / Channel Coding

Forward error correcting (FEC) channel codes are commonly used to improve the energy efficiency of wireless communication systems. On the transmitter side, an FEC encoder adds redundancy to the data in the form of parity information. At the receiver, an FEC decoder exploits this redundancy in such a way that a reasonable number of channel errors can be corrected. The various factors that affect the channel coding design are shown in figure 4.3.

For example, given a certain transmission channel, it is feasible to design a coding and modulation system that further reduces the achieved BER. This typically implies increased implementational complexity and coding/interleaving delay as well as reduced effective throughput.

Different solutions are possible when optimising different coding features. For example, in many applications the most important coding parameter is the achievable coding gain, which quantifies the amount of bit-energy reduction attained by a codec at a certain target BER.



Figure 4.2: Coding Chain for the HS-DSCH Transport Channel



Viewing this system optimisation problem from a different perspective, it is feasible to transmit at a higher bit rate in a given fixed bandwidth by increasing the number of bits per modulated symbol. However, when aiming for a given target BER, the channel coding rate then has to be reduced in order to increase the transmission integrity [22]. Naturally, this reduces the effective throughput of the system and results in an overall increased system complexity. When the channel's characteristics and the associated bit error statistics change, different solutions may become more attractive. The 3GPP specifications take these factors into account and define different FEC schemes, such as a convolutional coder with code rate 1/2 or 1/3 and a turbo coder with code rate 1/3 [3]. For the HS-DSCH, the turbo coder with code rate 1/3 is applied as FEC. The code rate can, however, be varied according to the required BER by applying puncturing, which is detailed in later sections. The different sub-blocks of the turbo coder are explained in sections 4.4.1 to 4.5.1.

Figure 4.3: Factors affecting the design of channel coding and modulation scheme

4.4.1 Cyclic Redundancy Check

Cyclic Redundancy Check (CRC) codes are used for error detection on decoded transport blocks. CRC is an extension of parity checking codes: instead of adding one bit to a block of data, several bits are added. This gives the ability to reliably detect longer error sequences, and a lower chance of undetected errors when the error detection capability of the code is exceeded [15].

As per 3GPP, there are different cyclic generator polynomials, but for this specific application [3] the parity bits are generated by the cyclic generator polynomial given in equation 4.1:

$g_{CRC24}(D) = D^{24} + D^{23} + D^{6} + D^{5} + D + 1$ (4.1)

The bits of a transport block are denoted by $a_{im1}, a_{im2}, a_{im3}, \ldots, a_{imA_i}$ and the parity bits by $p_{im1}, p_{im2}, p_{im3}, \ldots, p_{imL_i}$. $A_i$ is the size of transport block i, m is the transport block number and $L_i$ is the number of parity bits; here $L_i$ takes the value 24. The encoding is performed such that the polynomial

$a_{im1} D^{A_i+23} + a_{im2} D^{A_i+22} + \ldots + a_{imA_i} D^{24} + p_{im1} D^{23} + p_{im2} D^{22} + \ldots + p_{im23} D^{1} + p_{im24}$

gives a remainder of 0 when divided by $g_{CRC24}(D)$ of equation 4.1.

With a correct choice of parameters, a CRC with $L_i$ parity bits allows the detection of all error sequences of $L_i$ bits or less, and longer error sequences have only a one in $2^{L_i}$ chance of escaping detection. The CRC's ability to detect all error sequences up to a given length is useful in communication systems, especially in channel coding, where errors tend to occur in bursts. A CRC of length $L_i$ is added independently to each transport block of the transport channel (TrCH) before the transport blocks are concatenated and coded.

CRCs can be implemented as a linear feedback shift register (LFSR), with one element for

Figure 4.4: LFSR for Cyclic Redundancy Check

each bit. Figure 4.4 shows a shift register used to generate the 24-bit CRC. Each connection in the register corresponds to a term in the generator polynomial. For each element in the shift register whose corresponding power of D is present in the generator polynomial, the connection to the next element is through an exclusive-OR (XOR) gate.

A CRC check can be implemented by setting the initial state of the shift register to all zeros and then shifting the $A_i$ received data bits into the register. The final state of the register gives the CRC bits that would have been sent if the received data bits had been transmitted. These bits are compared to the received parity bits; if the two sets of bits are the same, no errors have been detected in the received bits. This is called a CRC pass. If they are different, there has been at least one error amongst the $A_i$ data bits or $L_i$ parity bits, which is called a CRC fail.
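The CRC attachment and check described above can be sketched in software as follows. This is a minimal bit-serial sketch, not the implementation used in the project: it uses the textbook long-division form (data bits followed by 24 zeros shifted through the register), the function names are illustrative, and the ordering of parity bits on the air interface is left out.

```python
# Generator g_CRC24(D) = D^24 + D^23 + D^6 + D^5 + D + 1 (equation 4.1),
# written as a bit mask with the D^24 term as the most significant bit.
G_CRC24 = (1 << 24) | (1 << 23) | (1 << 6) | (1 << 5) | (1 << 1) | 1
L_I = 24  # number of parity bits


def crc24_parity(data_bits):
    """Return the 24 parity bits for a list of 0/1 data bits.

    The register starts in the all-zero state. The data bits followed by
    24 zeros are shifted in, which divides a(D) * D^24 by g_CRC24(D).
    """
    reg = 0
    for bit in list(data_bits) + [0] * L_I:
        reg = (reg << 1) | bit
        if reg >> L_I:           # D^24 term present: subtract (XOR) the generator
            reg ^= G_CRC24
    # Most significant parity bit first
    return [(reg >> (L_I - 1 - k)) & 1 for k in range(L_I)]


def crc24_check(received_bits):
    """Return True (CRC pass) if the recomputed parity equals the received parity."""
    data, parity = received_bits[:-L_I], received_bits[-L_I:]
    return crc24_parity(data) == list(parity)


if __name__ == "__main__":
    block = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    codeword = block + crc24_parity(block)
    print("pass" if crc24_check(codeword) else "fail")
    codeword[3] ^= 1             # inject a single bit error
    print("pass" if crc24_check(codeword) else "fail")
```

Dividing a complete error-free codeword by the generator leaves a zero remainder, which is the property stated for the encoding above.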

4.5 Turbo Coder

Turbo codes [18] are used for error control, especially in wireless systems. A turbo encoder consists of two recursive systematic convolutional component encoders connected in parallel and separated by a pseudo-random interleaver. A turbo decoder, which is iterative, is typically based on either the Soft Output Viterbi Algorithm (SOVA) or the Maximum A Posteriori (MAP) algorithm. MAP is roughly three times more computationally complex than SOVA, but provides 0.5 dB of coding gain over SOVA [19].



Figure 4.5: Turbo encoder schematic

4.5.1 Turbo Encoder

A turbo encoder consists of two recursive systematic convolutional (RSC) encoders connected in parallel [19]. The input to the second encoder is an interleaved version of the input to the first encoder. This structure is called parallel because both encoders operate on the same set of input bits, as shown in figure 4.5, rather than the output of one being the input to the other, as in serial concatenated convolutional codes (SCCC). The component encoders are called "systematic" because one of their outputs is the input itself, and "recursive" because of the feedback loop in the encoders. A rate 1/3 turbo encoder is shown in figure 4.6.

The generator matrix of a rate 1/2 component code can be represented as

$G(D) = \left[\, 1, \; \frac{g_1(D)}{g_0(D)} \,\right]$, (4.2)

where D represents the delay and $g_0(D)$ and $g_1(D)$ are the feedback and feed-forward polynomials respectively. The degree of these polynomials is n, which depends on the number of delay elements in the convolutional encoders. As shown in figure 4.6, the same set of input bits is encoded twice, once by each component encoder. The input to the first encoder is the original input sequence $x$, whereas the input to the second encoder is the interleaved version of $x$, denoted by $\tilde{x}$. The first output of the turbo encoder, which is also the first output of the first encoder, is the original input sequence itself, denoted by $y_0$. The second output of the first encoder is the parity information $y_1$. For the second encoder only the parity output $y_2$ is of interest, not the systematic bits. The three outputs $y_0$, $y_1$ and $y_2$ are multiplexed to form the output of the turbo encoder, so the resulting rate of this turbo encoder is 1/3.
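The encoder structure described above can be sketched as follows. The polynomials are an assumption: this section does not state them, so the sketch uses the 3GPP constituent code with feedback $g_0(D) = 1 + D^2 + D^3$ and feed-forward $g_1(D) = 1 + D + D^3$. The function names are illustrative, the interleaver is passed in as a 0-based permutation, and trellis termination is omitted.

```python
def rsc_encode(bits):
    """Parity output of one recursive systematic convolutional encoder.

    Assumed polynomials: feedback g0(D) = 1 + D^2 + D^3,
    feed-forward g1(D) = 1 + D + D^3.
    """
    s1 = s2 = s3 = 0                  # register contents for D, D^2, D^3
    parity = []
    for x in bits:
        fb = x ^ s2 ^ s3              # feedback bit defined by g0(D)
        parity.append(fb ^ s1 ^ s3)   # parity bit defined by g1(D)
        s1, s2, s3 = fb, s1, s2       # shift the register
    return parity


def turbo_encode(x, interleaver):
    """Rate 1/3 encoding: output is y0[0], y1[0], y2[0], y0[1], y1[1], y2[1], ...

    `interleaver` is a 0-based permutation of range(len(x)).
    """
    x_int = [x[i] for i in interleaver]   # interleaved input for encoder 2
    y0 = list(x)                          # systematic bits
    y1 = rsc_encode(x)                    # parity of the first encoder
    y2 = rsc_encode(x_int)                # parity of the second encoder
    out = []
    for triple in zip(y0, y1, y2):
        out.extend(triple)
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    perm = [3, 7, 0, 5, 2, 6, 1, 4]       # arbitrary example permutation
    print(turbo_encode(x, perm))
```

Each input bit produces three output bits, which is where the rate 1/3 comes from.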

4.5.2 Interleaving for Turbo Coding

In turbo coding, interleaving is employed before the information data is encoded by the second component encoder. In general, the interleaver size N is significantly larger than the code memory and the interleaver vector elements are chosen randomly [18]. The interleaver plays three important roles in turbo codes. Firstly, it generates long block codes by using small



Figure 4.6: The HSDPA Turbo Encoder

memory convolutional codes, and hence achieves a coding gain. Secondly, it decorrelates the inputs to the two decoders, so that an iterative suboptimum decoding algorithm, based on the exchange of information between the two decoders, can be applied. Due to the interleaver, there is a high probability that the three output bits corresponding to an input bit are not all corrupted at the same time (by a burst error), and hence, after some errors have been corrected in the first decoder, a few more errors can be corrected in the second decoder. This process can be carried out iteratively, exchanging information between the two decoders again and again and correcting more errors in every iteration. The interleaver must be known at the decoders as well, in order for the decoding to be performed correctly. Finally, it breaks low-weight input sequences, and hence increases the code free Hamming distance or reduces the number of codewords with small distances in the code distance spectrum. The most commonly used interleavers are pseudo-random interleavers, described in the following section.

Random Interleavers

In the random interleaver, a block of N input bits is read into the interleaver and read out in random order. The interleaver vector $\pi(i)$, $i \in \{1, 2, \ldots, N\}$, can be generated according to the following algorithm, which requires N steps:

• Step 1: Choose randomly an integer $i_1$ from the set $A = \{1, 2, \ldots, N\}$, according to a uniform distribution between 1 and N, with probability $p(i_1) = 1/N$. The chosen integer $i_1$ is set to be $\pi(1)$.

• Step k (k > 1): Choose randomly an integer $i_k$ from the set $A_k = \{i \in A, i \neq i_1, i_2, \ldots, i_{k-1}\}$, according to a uniform distribution, with probability $p(i_k) = 1/(N - k + 1)$. The chosen integer $i_k$ is set to be $\pi(k)$.

When k = N, the last integer $i_N$ is set to be $\pi(N)$.

When the interleaver size is $N = 2^m - 1$, a pseudo-random interleaver can be generated by an m-stage shift register with linear feedback, as illustrated in figure 4.7. The feedback polynomial is represented by

$1 + a_1 D + a_2 D^2 + \ldots + a_m D^m$ (4.3)

If the feedback polynomial is primitive and the initial state of the shift register is not the all-zero state, the shift register will go through



Figure 4.7: A general m-stage shift register with linear feedback

all $2^m - 1$ states cyclically. Therefore, the state of the m-stage shift register can represent the interleaving function. A disadvantage of random interleavers is that they require separate interleaving and deinterleaving units, each with its own hardware and lookup tables.
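Both constructions from this subsection can be sketched in a few lines. This is a minimal illustration with hypothetical function names: `random_interleaver` follows the N-step drawing procedure, and `lfsr_interleaver` reads the successive states of an m-stage shift register as integers. The example polynomials $1 + D + D^3$ and $1 + D + D^4$ are assumptions chosen because they are primitive; other feedback polynomials may not visit all $2^m - 1$ states.

```python
import random


def random_interleaver(n, rng=None):
    """Draw pi(1), ..., pi(n): step k picks uniformly among the n - k + 1 unused integers."""
    rng = rng or random.Random()
    remaining = list(range(1, n + 1))       # the set A = {1, ..., N}
    pi = []
    for _ in range(n):
        pi.append(remaining.pop(rng.randrange(len(remaining))))
    return pi


def lfsr_interleaver(coeffs, seed=1):
    """Pseudo-random interleaver of size 2^m - 1 from an m-stage LFSR.

    `coeffs` = (a1, ..., am) of the feedback polynomial 1 + a1*D + ... + am*D^m.
    `seed` is the non-zero initial register state, given as an integer.
    """
    m = len(coeffs)
    state = [(seed >> (m - 1 - j)) & 1 for j in range(m)]
    if not any(state):
        raise ValueError("the all-zero initial state must be avoided")
    pi = []
    for _ in range(2 ** m - 1):
        pi.append(int("".join(map(str, state)), 2))   # state read as an integer
        fb = 0
        for a, s in zip(coeffs, state):               # XOR of the tapped stages
            fb ^= a & s
        state = [fb] + state[:-1]                     # shift in the feedback bit
    return pi


if __name__ == "__main__":
    print(random_interleaver(8, random.Random(0)))
    print(lfsr_interleaver((1, 0, 1)))        # m = 3, polynomial 1 + D + D^3
    print(lfsr_interleaver((1, 0, 0, 1)))     # m = 4, polynomial 1 + D + D^4
```

The deinterleaver needs the inverse permutation, which is one reason why separate lookup tables are required for the two directions.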

4.5.3 Puncturing

After encoding, puncturing is performed on the encoded bitstream. In order to provide unequal error protection, different levels of puncturing are applied to different parts of the encoded bitstream. The puncturing is arranged such that the higher rate codes are embedded in the lower rate codes. The codes generated by this puncturing scheme are called rate compatible punctured codes. In this report, rate compatible punctured turbo codes are employed to provide different levels of error protection to the different parts of the bitstream. All the higher rate codes are derived from the same rate 1/3 mother code, produced by the encoder shown in figure 4.6. According to [21], changing the rate from R = 1/3 to R = 1/2 by deleting half the check bits reduces the coding gain by 0.5 dB at a BER of $10^{-3}$.
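As an illustration of deleting half the check bits, the sketch below punctures a rate 1/3 stream to rate 1/2. The alternating pattern (keep the first encoder's parity bit for even bit positions and the second encoder's for odd ones) and the function name are assumptions for illustration; the rate matching actually specified for HS-DSCH is more elaborate and is not reproduced here.

```python
def puncture_to_rate_half(coded):
    """Puncture a rate 1/3 stream ordered y0, y1, y2, y0, y1, y2, ... to rate 1/2.

    Every systematic bit y0 is kept. Of the two parity bits, y1 is kept for
    even information-bit indices and y2 for odd ones, so half of the check
    bits are deleted (assumed alternating pattern).
    """
    out = []
    for k in range(0, len(coded) - 2, 3):
        y0, y1, y2 = coded[k:k + 3]
        out.append(y0)
        out.append(y1 if (k // 3) % 2 == 0 else y2)
    return out


if __name__ == "__main__":
    # Positions 0..11 stand in for four coded triples, to show which ones survive
    print(puncture_to_rate_half(list(range(12))))
```

Four information bits now occupy eight transmitted bits instead of twelve, and the surviving bits are a subset of the rate 1/3 codeword, which is the rate compatible property described above.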

Figure 4.8: A typical section of a code trellis



Figure 4.9: Structure of Turbo Decoder

4.5.4 Turbo Decoding

The decoding of turbo codes can in principle be performed using the Maximum A Posteriori (MAP) algorithm or the Maximum Likelihood (ML) algorithm based on the overall trellis of the code. However, these methods can be implemented only for short interleavers and are too complex for medium and long interleavers: the overall trellis of a turbo code is time varying and its number of states grows exponentially with the size of the interleaver. One of the important practical features of turbo codes is therefore the use of a simple suboptimum iterative algorithm for decoding. Different iterative decoding algorithms, such as the iterative Log-Map [18] and the iterative Soft Output Viterbi Algorithm (SOVA) [18][19], are used to decode turbo codes; both are briefly explained in this report. Figure 4.9 shows the structure of a turbo decoder. Note that a feedback path exists from decoder 2 to decoder 1; it is used when several iterations are made on the code. To make iterative decoding possible, it is necessary to use soft-in/soft-out (SISO) algorithms for the component decoders.

Figure 4.10: Soft Output Viterbi Algorithm



Decoding of turbo codes using iterative SOVA

The block diagram of the iterative SOVA is shown in figure 4.10. The inputs to the first decoder are the received values corresponding to the information (systematic) bits and to the output of the first component encoder, represented by $r_0$ and $r_1$ respectively. The first decoder generates a soft estimate of the output and the extrinsic information $\Lambda_{1e}$. This extrinsic information is passed through the same interleaver as the one used in the encoder and then to the second decoder, which uses it as an estimate of the a priori probability. The other inputs to the second decoder are the received values corresponding to the output of the second encoder, represented by $r_2$, and the interleaved version of $r_0$, obtained with the same interleaver as used at the encoder. The second decoder also generates a soft decision $\Lambda_2$ and the extrinsic information $\Lambda_{2e}$.

This extrinsic information is then passed back to the first decoder after deinterleaving, as an a priori estimate for the next decoding iteration. After the desired number of iterations I has been performed, the decoder generates a hard decision, deinterleaves it and passes it to the output.
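The exchange of extrinsic information relies on the interleaver and deinterleaver being exact inverses of each other. The sketch below illustrates this with a seeded pseudo-random permutation; the helper names and the permutation are assumptions made for illustration and do not reproduce the 3GPP turbo interleaver.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n) (illustrative only)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(values, perm):
    """Output position i takes the value from input position perm[i]."""
    return [values[p] for p in perm]


def deinterleave(values, perm):
    """Undo interleave(): put each value back at its original position."""
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out


perm = make_interleaver(8, seed=1)
extrinsic = [0.5, -1.2, 2.0, -0.3, 0.9, -2.5, 1.1, 0.0]
to_decoder2 = interleave(extrinsic, perm)       # order seen by decoder 2
restored = deinterleave(to_decoder2, perm)      # order seen by decoder 1
```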

Decoding of turbo codes using Log-Map Algorithm

A turbo decoder consists of two soft-in/soft-out (SISO) decoders, each corresponding to one of the constituent codes. Decoding is done block by block in an iterative process, usually with five to ten iterations. During this process the SISO decoders exchange so-called a priori information. In the following we assume that the SISO decoders are realized with the Maximum A Posteriori (MAP) algorithm, which outperforms the SOVA. The MAP algorithm is an algorithm for estimating random parameters with prior distributions; here it is used to estimate the most likely information bit to have been transmitted in a coded sequence. To avoid numerical problems without degrading the decoding performance, the MAP algorithm is implemented in the logarithmic domain. This is also the first simplification of the MAP computations: it converts all multiplications into additions and all divisions into subtractions, and eliminates the exponentials without affecting BER performance, thus reducing the number of instructions required. The key operation that remains is $\ln(e^{\delta_1} + e^{\delta_2})$, which in the context of turbo decoders is usually written as $\max^*(\delta_1, \delta_2)$. Using the Jacobian logarithm this term is transformed into:

\[ \max{}^{*}(\delta_1, \delta_2) = \max(\delta_1, \delta_2) + \ln\left(1 + e^{-|\delta_1 - \delta_2|}\right) \tag{4.4} \]

where the approximation $\max^*(\delta_1, \delta_2) \approx \max(\delta_1, \delta_2)$ is used in the Max-Log-MAP decoder, which is therefore suboptimal. The total number of states M is given by the constraint length K of the code, $M = 2^{K-1}$, and the successors/predecessors (i, j) of an encoder state (m) depend on the component code structure. L is the size of the data block, which is defined in the 3GPP standard in the range of 40 . . . 5114 bits.
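A short numerical sketch of Equation 4.4 shows how the exact Jacobian logarithm and the Max-Log-MAP approximation differ. The function names are chosen here for illustration only.

```python
import math


def max_star(d1, d2):
    """Exact Jacobian logarithm ln(e^d1 + e^d2), as in Equation 4.4."""
    return max(d1, d2) + math.log1p(math.exp(-abs(d1 - d2)))


def max_log(d1, d2):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(d1, d2)


exact = max_star(1.0, 1.5)
approx = max_log(1.0, 1.5)
correction = exact - approx      # lies between 0 and ln 2

# Number of trellis states for constraint length K = 4
num_states = 2 ** (4 - 1)
```

The correction term depends only on $|\delta_1 - \delta_2|$ and is bounded by ln 2, which is why it is commonly stored in a small lookup table.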

A turbo decoder consists of two soft-input soft-output (SISO) decoders with an interleaver/deinterleaver between them. Decoding is performed iteratively through the two SISO decoders via the interleaver and the deinterleaver. Figure 4.11 shows the structure of a turbo decoder, where I and D denote the interleaver and the deinterleaver, respectively. SISO decoder 1 uses the input symbols x and y1 and the a priori value Le1 (initially zero) to produce the log-likelihood ratio Llr1 and the value Le2, which serves as a priori information for the second decoder. SISO decoder 2 then uses the interleaved input symbols x, the symbols y2 and the interleaved value of Le2 to produce the soft-output value Llr2 and the value Le1 for SISO decoder 1. By iterating these steps, a turbo decoder can achieve a low BER at very low SNR.

Figure 4.11: Soft-Input Soft-Output Turbo Decoder Structure

4.5.5 The MAP and Log-MAP Algorithms

The MAP algorithm is optimum, but it is computationally complex and therefore rarely used. Instead the suboptimum Log-MAP or SOVA algorithms are used, which still give near-optimum performance [21]. Here the Log-MAP algorithm is treated and the equations needed for its implementation are explained.

Figure 4.12 shows the block diagram of the Log-MAP algorithm. It takes three inputs: L(u), Lc and y. L(u) is the log-likelihood value of each bit in the sequence. In the first iteration it is equal to the a priori value of the bit source, that is:

\[ L(u_k) = \ln\frac{P(u_k = +1)}{P(u_k = -1)} \tag{4.5} \]

\[ L_c = 4 \cdot a \cdot \frac{E_b}{N_0} \tag{4.6} \]

Lc is called the soft value of the channel. It combines the channel estimates for the fading amplitude and the signal-to-noise ratio of each bit, and is calculated from Equation 4.6, where a represents the fading amplitude of the channel (a = 1 for the AWGN channel). The y input is the received sequence of systematic and check bits, which must be supplied as an ordered sequence $y_{1,k}, y_{2,k}, y_{1,k+1}, y_{2,k+1}, \ldots$, where $y_{1,k}$ is the received value corresponding to the output y0 in figure 4.6.
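Equations 4.5 and 4.6 can be evaluated directly. The sketch below is a minimal illustration; the function names are hypothetical, and Eb/N0 is assumed to be given in dB and converted to a linear ratio before use.

```python
import math


def a_priori_llr(p_plus):
    """Equation 4.5: ln(P(u = +1) / P(u = -1)) for a binary source."""
    return math.log(p_plus / (1.0 - p_plus))


def channel_reliability(ebn0_db, a=1.0):
    """Equation 4.6: Lc = 4 * a * Eb/N0, with Eb/N0 given in dB."""
    ebn0_linear = 10 ** (ebn0_db / 10)
    return 4.0 * a * ebn0_linear


llr_equiprobable = a_priori_llr(0.5)    # equally likely bits carry no prior
lc_awgn = channel_reliability(0.0)      # AWGN channel (a = 1) at 0 dB
```

For an equiprobable source the a priori value is zero, which is why the first iteration starts without any prior knowledge about the bits.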

The output of the Log-MAP algorithm is $L_e(u)$, the extrinsic value of each bit in the sequence. This value corresponds to an estimate of the a posteriori probability and is used as the input L(u) for the next iteration. The final output $L(\hat{u})$ is only used in the last iteration; it is calculated from Equation 4.10 and is used for detecting the estimated bit sequence. The magnitude of $L(\hat{u})$ represents the confidence in the estimated bit and its sign determines the hard decision.



The MAP algorithm provides

\[ L(\hat{u}_k) = \ln\frac{P(u_k = +1 \mid y)}{P(u_k = -1 \mid y)} = \ln\frac{\sum_{(s',s),\, u_k = +1} p(s', s, y)}{\sum_{(s',s),\, u_k = -1} p(s', s, y)} \tag{4.7} \]

where each sum of $p(s', s, y)$ is taken over all existing transitions in the trellis from state s′ to state s with the information bit equal to +1 or −1, respectively. In Appendix ?? it is shown that the MAP algorithm can be approximated by the Log-MAP algorithm, which is suboptimal but less computationally complex.

The algorithm is computed in four steps, starting with gamma.

Figure 4.12: The Implemented Log-MAP Decoder

• Step 1: Calculate Gamma
Gamma can be calculated for all possible transitions in the trellis, but it is zero for transitions that do not exist in the state machine. In a specific implementation of the algorithm these zero probabilities need not be calculated: they depend only on the convolutional code used and are therefore static at the time of implementation, which reduces the computational complexity.

• Step 2: Calculate Alpha
Alpha is calculated recursively: the alphas of the previous trellis stage and the transition probabilities gamma are used to calculate the alphas of the present stage.

For the first stage there is no previous alpha to use in the calculation; this is where trellis termination matters. If the state machine had been reset to state "000" at the beginning, it would be known that α(0) = 1, but for an unterminated code the state machine could be in any state. However, if the bit source is random with equal probability of "1" and "0", the states "000", "001", "011", "010", "100", "101", "110" and "111" are equally likely, each with probability 0.125.

\[ \alpha_{start}(S) = 0.125, \qquad \ln\alpha_{start}(S) \approx -2.079 \tag{4.8} \]

• Step 3: Calculate Beta
Beta is calculated in the same way as alpha, except that the recursion runs backwards through the trellis. The terminal state of the trellis should be known to calculate beta optimally; without termination, the beta probabilities are initialized using the same argument as for alpha.

\[ \beta_{End}(S) = 0.125, \qquad \ln\beta_{End}(S) \approx -2.079 \tag{4.9} \]



• Step 4: Iterate or calculate $L(\hat{u})$ in the final iteration
In every iteration the final step is to calculate the extrinsic output. This is either passed on to the following decoder or, in the final iteration, used to calculate Equation 4.10.

\[ L(\hat{u}) = L_c \cdot y + L(u) + L_e(u) \tag{4.10} \]

In this equation L(u) has been delivered by the previous decoder and $L_e(u)$ is obtained in the present iteration. A level detection (sign decision) of Equation 4.10 gives the estimated bit stream.
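The four steps can be illustrated with a small log-domain sketch. It is not the decoder implemented in this project: the trellis used below is a toy 8-state shift register in which every transition has probability 0.5, chosen only so that the result can be checked by hand, and all function names are assumptions made for illustration.

```python
import math

NUM_STATES = 8  # 8-state component code, as in the text


def max_star(a, b):
    """Jacobian logarithm ln(e^a + e^b) used by the Log-MAP algorithm."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward_step(log_alpha_prev, log_gamma):
    """One alpha recursion step in the log domain (Step 2).

    log_gamma maps (previous_state, state) to ln(gamma) and lists only
    the transitions that exist in the trellis (Step 1). States without
    an incoming transition keep the value None.
    """
    log_alpha = [None] * len(log_alpha_prev)
    for (s_prev, s), lg in log_gamma.items():
        candidate = log_alpha_prev[s_prev] + lg
        if log_alpha[s] is None:
            log_alpha[s] = candidate
        else:
            log_alpha[s] = max_star(log_alpha[s], candidate)
    return log_alpha


def soft_output(lc, y, l_apriori, l_extrinsic):
    """Final soft value: channel term + a priori term + extrinsic term."""
    return lc * y + l_apriori + l_extrinsic


def hard_decision(llr):
    """Level detection: the sign of the soft value gives the symbol."""
    return +1 if llr >= 0 else -1


# Unknown starting state: all 8 states equally likely, ln(0.125) = -2.079
log_alpha_start = [math.log(1.0 / NUM_STATES)] * NUM_STATES

# Toy trellis: state s moves to (2s + bit) mod 8, each with gamma = 0.5
toy_gamma = {}
for s_prev in range(NUM_STATES):
    for bit in (0, 1):
        toy_gamma[(s_prev, (2 * s_prev + bit) % NUM_STATES)] = math.log(0.5)

log_alpha_next = forward_step(log_alpha_start, toy_gamma)

llr = soft_output(lc=4.0, y=0.8, l_apriori=0.0, l_extrinsic=-0.5)
symbol = hard_decision(llr)
```

In the toy trellis every state has two predecessors, so the alphas remain uniform after one step. The beta recursion follows the same pattern with the transitions traversed in the opposite direction.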

4.6 Rate Matching and HARQ

The aim of the rate matcher is to adjust the effective code rate to the value chosen by the Link Adaptation (LA) decision for the active user. Using the bits provided by the turbo encoder, the rate matcher either removes bits (puncturing) or repeats bits (repetition) to reach the effective rate dictated by the LA algorithm. Higher layers assign a rate matching attribute to each transport channel, as shown in figure 1.3. This attribute is semi-static and can only be changed through higher layer signalling. In this project, the rate matcher is used only for puncturing the bits from the turbo encoder.

In the model presented by the 3GPP specifications [3], the functionality that includes the rate matcher is called HARQ (Hybrid Automatic Repeat Request). This functionality matches the number of bits at the output of the channel coder to the number of bits of the physical channel set to which the HS-DSCH is mapped. The HARQ is controlled by the redundancy version (RV) parameters received from higher layers. The exact set of bits (i.e., the distribution of the puncturing/repetition positions) at the output of the HARQ functionality depends on the number of input bits, the number of output bits, and the RV parameters. As HARQ requires information from the higher layers, which is not available for the implementation, this part is not dealt with in the project.

Figure 4.13: Constellation Rearrangement for 16-QAM



4.7 Constellation Rearrangement

16-QAM is a quadrature amplitude modulation based on a constellation of 16 symbols, depicted in figure ??. The bits to be transmitted are grouped in blocks of four, and each block defines a constellation symbol that is then transmitted over the communication channel. Among the four bits forming a 16-QAM symbol, the probability of error can be considerably lower for the most significant bits (MSBs) than for the least significant bits (LSBs). For example, consider symbol 2 (0010) of the constellation in figure ??: for the first bit to be demodulated erroneously, the perturbation of the real part of the transmitted signal has to be three times that necessary to induce an error in the third bit. To compensate for this effect, bits can be rearranged before retransmission in such a manner that some less protected bits become more protected. More precisely, denoting the four bits by $i_1 q_1 i_2 q_2$, one of the four transformations in table 4.1 is applied before they are mapped to a constellation symbol.

Constellation version parameter    Output bit sequence                 Operation
0                                  $i_1 q_1 i_2 q_2$                   None (mapping as in figure ??)
1                                  $i_2 q_2 i_1 q_1$                   Swapping MSBs with LSBs
2                                  $i_1 q_1 \bar{i}_2 \bar{q}_2$       Inversion of LSBs' logical values
3                                  $i_2 q_2 \bar{i}_1 \bar{q}_1$       Both swapping and inversion

Table 4.1: Constellation Rearrangement for 16-QAM
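The four transformations of table 4.1 can be written as a small function. This is a sketch under the assumption that, for versions 2 and 3, the inversion is applied to the two bits that end up in the LSB positions of the output; the function name is illustrative.

```python
def rearrange(bits, version):
    """Apply the constellation rearrangement of table 4.1.

    bits is the tuple (i1, q1, i2, q2) of 0/1 values and version is the
    constellation version parameter (0 to 3).
    """
    i1, q1, i2, q2 = bits
    if version == 0:
        return (i1, q1, i2, q2)              # no change
    if version == 1:
        return (i2, q2, i1, q1)              # swap MSBs with LSBs
    if version == 2:
        return (i1, q1, 1 - i2, 1 - q2)      # invert LSBs
    if version == 3:
        return (i2, q2, 1 - i1, 1 - q1)      # swap, then invert LSBs
    raise ValueError("constellation version must be 0, 1, 2 or 3")


symbol_2 = (0, 0, 1, 0)
versions = [rearrange(symbol_2, v) for v in range(4)]
```

With version 1 the bits that were weakly protected in the first transmission occupy the MSB positions in the retransmission, which evens out the reliability over the four bits.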

4.8 Adaptive Modulation

Modulation techniques are used to increase the capacity and speed of the wireless network in an HSDPA system. Higher-order modulations send more bits per symbol and thus achieve higher throughput or better spectral efficiency. However, a higher-order modulation such as 16-QAM needs a better signal-to-noise ratio (SNR) than QPSK to overcome interference and maintain a certain bit error ratio (BER) [25].

Adaptive modulation allows a wireless system to choose the highest-order modulation that the channel conditions permit. Figure 4.14 shows a general estimate of the modulation techniques used under different channel conditions. As the range increases, a lower-order modulation like QPSK is used, whereas closer to the transmitter a higher-order modulation like 16-QAM can be used for increased throughput. In addition, adaptive modulation allows the system to overcome fading and other interference.
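The selection rule can be sketched as a simple threshold comparison. The switching threshold of 12 dB below is purely hypothetical and serves only to make the example concrete; in a real system it would be derived from link-level error rate curves and the target BER.

```python
# (name, bits per symbol, minimum SNR in dB); the 12 dB value is a
# hypothetical threshold for illustration, not a simulated result.
MODULATIONS = [
    ("16-QAM", 4, 12.0),
    ("QPSK", 2, float("-inf")),
]


def select_modulation(snr_db):
    """Return the highest-order modulation whose threshold is met."""
    for name, bits_per_symbol, min_snr_db in MODULATIONS:
        if snr_db >= min_snr_db:
            return name, bits_per_symbol
    raise ValueError("no modulation available")


near_user = select_modulation(15.0)   # good channel
far_user = select_modulation(5.0)     # poor channel
```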

4.8.1 QPSK

QPSK (which is QAM without an amplitude component) is inherently robust and economical. The system is less bit-efficient but more noise-resistant, and has advantages in its ability to operate over long distances with many interfering sources, such as those found in a neighbourhood network. Different data streams are sent on the I and Q channels. The two dedicated physical channels, DPCCH and DPDCH, are time multiplexed. In the downlink, the common channels are also transmitted, all of them continuously.

Figure 4.14: Adaptive Modulation and Coding

4.8.2 16-QAM

QAM systems combine PSK and ASK: both the amplitude and the phase of the carrier are changed by the modulation, which increases the number of states per symbol. QAM is used in some uplink/downlink traffic designs; it is more bit-efficient than QPSK, but less noise-resistant.

Odd-numbered bits in the input stream are combined in pairs to form one of 4 levels which modulate the sine term. Even-numbered bits are similarly combined to modulate the cosine term. The sine and cosine terms are then combined as shown in Equation 4.11. The constellation points obtained after demodulation for 16-QAM are shown on the right-hand side of figure ??.

\[ V(t) = x(t) \cdot \cos(t/2) + y(t) \cdot \sin(t/2) \tag{4.11} \]
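The formation of the two four-level components can be sketched as follows. The bit-to-level mapping used here is an assumption: the first bit of each pair selects the sign and the second bit selects the magnitude (1 or 3, in arbitrary units). It was chosen because it reproduces the unequal protection described in section 4.7 for symbol 0010; it is not taken from the simulation code of this project.

```python
def level(sign_bit, magnitude_bit):
    """Map a bit pair to one of the four levels -3, -1, +1, +3."""
    magnitude = 3 if magnitude_bit else 1
    return -magnitude if sign_bit else magnitude


def map_16qam(i1, q1, i2, q2):
    """Map four bits (i1, q1, i2, q2) to a 16-QAM symbol x + jy."""
    return complex(level(i1, i2), level(q1, q2))


symbol = map_16qam(0, 0, 1, 0)

# Perturbation of the real part needed to flip the first bit (decision
# boundary at 0) and the third bit (decision boundary at 2).
flip_first = abs(symbol.real - 0)
flip_third = abs(symbol.real - 2)

all_points = {map_16qam(a, b, c, d)
              for a in (0, 1) for b in (0, 1)
              for c in (0, 1) for d in (0, 1)}
```

For this symbol the first bit tolerates three times the perturbation of the third bit, which matches the example given for the constellation rearrangement.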

4.9 Spreading and Scrambling

HSDPA uses a complex radio waveform that is encoded to ensure that only the desired recipient is able to correlate the sequences. Each physical channel is spread with a unique, variable spreading sequence. The downlink physical channel is spread by a combination of a channelization code and a scrambling code.

The channelization code is used to increase the data rate to the transmit rate. User information symbols are spread over a wide bandwidth by multiplying the user symbols with quasi-random bits called chips. At its simplest level, the channelization process transforms each data symbol into several chips. The number of chips per symbol is called the spreading factor.

A transport channel can be mapped to a number of physical channels, or a physical channel can be shared by a number of transport channels, depending on the data rates of the transport channels and the data rates of the available physical channels.

After the transport channels are mapped to the physical channels, the data on the physical channels is modulated. The modulation that can be used depends on the physical channel. On DPDCHs and PDSCHs only QPSK modulation can be used [4]. On the HS-PDSCH, QPSK as well as 16-QAM modulation can be used. Modulation on the HS-PDSCH is controlled by an adaptive modulation and coding (AMC) algorithm [8].

On the downlink, the chip rate of HSDPA is 3.84 Mcps and the modulation is QPSK or 16-QAM. With QPSK each transmitted symbol represents 2 bits of information, whereas with 16-QAM each symbol represents 4 bits. Considering 16-QAM, the rate at the modulator is essentially 15.36 Mbps (3.84 × 4). Thus, if the data rate is 30 kbps, it needs to be spread by a factor of 512 to reach the required chip rate for transmission. Similarly, if the data rate is 3.84 Mbps, it only needs to be spread by a factor of 4 to reach the same chip rate. This shows that the processing gain is low at high data rates: for the same reception conditions and error protection, a lower spreading factor will produce more errors than a higher one (i.e., transmitting higher data rates requires more power or better sensitivity) [16].
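The spreading factors quoted above follow from dividing the chip rate by the symbol rate. The following sketch reproduces the arithmetic; the function names are illustrative, and the processing gain is taken as 10·log10(SF).

```python
import math

CHIP_RATE = 3.84e6  # chips per second


def spreading_factor(data_rate_bps, bits_per_symbol):
    """Chips per symbol needed to reach the chip rate."""
    symbol_rate = data_rate_bps / bits_per_symbol
    return CHIP_RATE / symbol_rate


def processing_gain_db(sf):
    """Processing gain in dB for a given spreading factor."""
    return 10 * math.log10(sf)


sf_low_rate = spreading_factor(30e3, bits_per_symbol=4)     # 16-QAM, 30 kbps
sf_high_rate = spreading_factor(3.84e6, bits_per_symbol=4)  # 16-QAM, 3.84 Mbps

gain_low_rate = processing_gain_db(sf_low_rate)
gain_high_rate = processing_gain_db(sf_high_rate)
```

The difference between the two processing gains is about 21 dB, which quantifies why the high data rate needs more transmit power or a more sensitive receiver.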

The channelization codes are used to separate the different downlink physical channels within one cell. In each cell the same set of OVSF channelization codes is used. The data rate of the physical channel determines the SF that is used to spread it. The SF of the HS-PDSCH is always 16, so to change the data rate of the HS-PDSCH the modulation or the coding rate of the channel code must be changed. The channelization process uses orthogonal variable spreading factor (OVSF) codes to spread the data. For OVSF codes to separate signals, the signals must be synchronous so that the codes remain orthogonal. Within a particular cell, the OVSF codes are used to separate users in the downlink. However, the node-Bs are not synchronized with each other in the downlink, so OVSF channelization is not sufficient for separating node-Bs: without synchronization, the signals could interfere with each other. A second set of codes is therefore applied in a process called scrambling. The first step in the transmitter chain is thus the channelization process using the OVSF codes, followed by the scrambling process, which identifies the node-Bs as well as the individual UEs, as shown in figure 4.15. The scrambling process uses PN codes: after being spread by the OVSF code to the required chip rate, the chips are scrambled by a PN code.
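The orthogonality of synchronous OVSF codes can be demonstrated with a short sketch. It builds the code tree recursively (each code c produces the children (c, c) and (c, −c)), spreads two hypothetical users with different codes of SF 16, adds the signals and recovers each user by correlation. Scrambling is left out, and the function names and user data are illustrative assumptions.

```python
def ovsf_codes(sf):
    """Return the sf OVSF codes of length sf (sf must be a power of 2)."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of 2")
    codes = [[1]]
    while len(codes) < sf:
        next_level = []
        for c in codes:
            next_level.append(c + c)
            next_level.append(c + [-x for x in c])
        codes = next_level
    return codes


def spread(symbols, code):
    """Replace each symbol by the symbol multiplied with every chip."""
    return [s * chip for s in symbols for chip in code]


def despread(chips, code):
    """Correlate each block of chips with the code and normalize."""
    sf = len(code)
    return [sum(chips[i + j] * code[j] for j in range(sf)) / sf
            for i in range(0, len(chips), sf)]


codes = ovsf_codes(16)            # SF of the HS-PDSCH
user_a = [1, -1, 1]
user_b = [-1, -1, 1]

# Synchronous superposition of the two spread signals
combined = [a + b for a, b in zip(spread(user_a, codes[1]),
                                  spread(user_b, codes[2]))]

recovered_a = despread(combined, codes[1])
recovered_b = despread(combined, codes[2])
```

If one of the two signals is shifted by a few chips before the addition, the cross-correlation is in general no longer zero, which is the reason a scrambling code is needed between unsynchronized node-Bs.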

Figure 4.15: Downlink Spreading and Modulation

4.10 The Additive Noise Channel

The additive noise channel is the simplest mathematical model for a communication channel. Figure 4.16 shows the mathematical representation of this channel, where n(t) represents an additive random noise. This kind of noise may arise from electronic components and amplifiers at the receiver of the communication system, or from interference encountered in transmission, as in the case of radio signal transmission [23].

This kind of noise is characterised statistically as a Gaussian noise process; hence the resulting mathematical model for the channel is usually called the Additive White Gaussian Noise (AWGN) channel. This channel is commonly used for the analysis and design of communication systems because of its mathematical tractability.
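The channel model r(t) = s(t) + n(t) can be sketched for sampled, real-valued symbols as follows. The sketch assumes unit average symbol energy and a noise variance of N0/2 per real dimension; the function name and the parameters are illustrative and do not correspond to the Matlab simulation of this project.

```python
import math
import random


def awgn_channel(symbols, ebn0_db, bits_per_symbol=1, seed=None):
    """Return r = s + n for real-valued symbols with unit average energy."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    n0 = 1.0 / (bits_per_symbol * ebn0)   # Es = 1, Eb = Es / bits_per_symbol
    sigma = math.sqrt(n0 / 2)             # noise variance N0/2
    return [s + rng.gauss(0.0, sigma) for s in symbols]


tx = [1.0, -1.0, 1.0, 1.0, -1.0]
rx = awgn_channel(tx, ebn0_db=6.0, seed=42)
rx_almost_clean = awgn_channel(tx, ebn0_db=100.0, seed=42)
```

Fixing the seed makes a simulation run repeatable, which is convenient when comparing decoders on the same noise realization.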

Figure 4.16: The Additive Noise Channel


Chapter 5
Simulation

The simulation model shown in figure 3.2, which follows the 3GPP specification, has been implemented in Matlab and is documented in this chapter.

5.1 Simulation Characteristics

The main characteristics of the simulation of the designed system are reported in table 5.1. The value of each parameter is chosen according to the 3GPP specification and is discussed below.

• Coding Rate r: The designed turbo coder is composed of two RSCs and a systematic part, thus involving three outputs (the systematic output of each RSC is considered as a "sink"). Puncturing the outputs to increase the rate up to 1/2 is possible.

• Constraint Length K: To simplify the investigations, only K = 4 is used as constraint length for the simulations, i.e., an 8-state PCCC with optimal polynomials.

• Block length k: Three different block sizes allowed by the 3GPP specification are simulated to investigate the effect of the interleaver size: 40, 1024 and 5114. The interleaver size is equal to the block length.

• Number of iterations = 1, 3 and 5: Increasing the number of iterations is known to increase the decoding delay. Therefore it was decided to set the upper limit for the iterative process to 5 iterations.

5.2 Simulated Performances

This section considers the performance of the modelled system described in section 4.2 and of the iterative decoding based on the Log-MAP algorithm and SOVA described in section 4.5.4. Several interleaver lengths are simulated and investigated. The effect of the number of decoding iterations is also analysed.



Parameter              Type/Size
Channel                AWGN
CRC length             24
Encoder                Rate 1/3 Turbo Coder
Constraint length K    4 (1011, 1101)
Block length k         40, 1024 & 5114
Turbo interleaving     Pseudo-random Interleaver
Decoder                Log-Map & SOVA
Modulation             QPSK, 16-QAM

Table 5.1: Simulated Parameters for the System Model

Figure 5.1: Symbol Error Rate for Modulation Schemes

Figure 5.2: Bit Error Rate for Modulation



The SER and BER performance of the QPSK and 16-QAM modulation schemes on the AWGN channel is shown in figures 5.1 and 5.2, respectively. Both the BER and the SER are higher for 16-QAM than for QPSK under noisy channel conditions. The modulation schemes were simulated using non-spread data bursts. At an SER of the order of $10^{-3}$, 16-QAM requires about 7 dB more $E_b/N_0$ than QPSK, and at a BER of the order of $10^{-3}$ it requires about 1 dB more.

The turbo decoder was run with 1 to 5 iterations using the Log-MAP and SOVA algorithms, as shown in figures 5.5 and 5.3. From the first to the third iteration a large improvement can be noticed, whereas from the third to the fifth the improvement is small; beyond that almost no further improvement is made. The tests show that the Log-MAP decoder gives better performance, with a lower BER (bit error rate) and FER (frame error rate), than the SOVA decoder.

Figure 5.3: Comparison of Frame Error Rates using Log-Map and SOVA algorithms

Figure 5.4: Comparison of Bit Error Rates using Log-Map and SOVA algorithms



Operation                 SOVA                  Log-MAP
Additions                 2·2^M + 6·M + 14      16·2^M − 1
Multiplications by ±1     8                     8
Max operations            2·2^M + 6·M + 14      4·2^M − 2
Lookup operations         -                     4·2^M − 2
Bit comparisons           6(M + 1)              -
Inversions                3                     3

Table 5.2: Computational complexity comparison of the decoders, where M is the encoder memory order

5.3 A Comparison of Decoding Algorithms

In many cases, the complexity required to implement the mathematical equations described so far directly precludes a physical implementation. As suggested earlier, this is due to the massive amount of computation that has to fit either on a chip or in a cost-constrained area. The size of the final implemented design can, and often will, impose performance and throughput constraints. Table 5.2 presents a rough comparison of the complexities of the Log-MAP and SOVA decoders based on the number of mathematical operations.

SOVA is basically a traditional Viterbi algorithm (VA) plus soft information calculation operations. A SOVA algorithm for practical implementation was introduced in [32], where the SISO decoding is performed in two stages: best path selection (a classical VA) and LLR/extrinsic information generation (including a second VA with double trace-back operation and LLR updating). Following the decoding procedure described in [32], the complexity of SOVA is calculated and presented in table 5.2. Assuming that the max, table-lookup and inversion operations each have a complexity similar to that of an addition, for a memory order of 4 the total number of operations per decoded bit is 382 for Log-MAP and 166 for SOVA. The Log-MAP algorithm thus needs about 2.3 times as many operations as the SOVA algorithm. The execution time of each SISO algorithm on the same computer simulation platform is another indication of the computational complexity. Using a PC equipped with a 2800 MHz Pentium 4 processor and 1024 MB of memory, the average CPU time taken by the decoders to process a data block of 5114 bits is reported in figure 5.5. It shows that the MAP algorithm takes twice the time of the Log-MAP algorithm. The complexity of Log-MAP is approximately 1.5 times that of the Max-Log-MAP algorithm and about 2.5 times that of SOVA.
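The Log-MAP total quoted above can be reproduced from the expressions in table 5.2. The sketch below counts additions, max operations, lookup operations and inversions, following the assumption stated in the text that these all cost about as much as an addition; the multiplications by ±1 are left out. The SOVA figure of 166 is taken from the text rather than recomputed.

```python
def log_map_ops_per_bit(m):
    """Operations per decoded bit for Log-MAP, memory order m (table 5.2)."""
    additions = 16 * 2 ** m - 1
    max_ops = 4 * 2 ** m - 2
    lookups = 4 * 2 ** m - 2
    inversions = 3
    return additions + max_ops + lookups + inversions


SOVA_OPS_PER_BIT = 166  # value quoted in the text for memory order 4

log_map_total = log_map_ops_per_bit(4)
ratio = log_map_total / SOVA_OPS_PER_BIT
```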

Since the simulations were performed on a general-purpose CPU and no specific optimizations were applied to any of the decoding algorithms, this diagram provides only an illustrative comparison rather than an accurate comparison from a hardware implementation perspective. Nevertheless, it does provide some insight into the relative complexity when the implementation platform is a DSP or a general-purpose processor.



Figure 5.5: Comparison of Bit Error Rates using Log-Map and SOVA algorithms

5.3.1 Computational Complexity of Log-MAP Algorithm

The Log-MAP algorithm requires the bit stream to be split up into frames of the interleaver length; here the algorithm was implemented for N = 40, 1024 and 5114 bits. Log-MAP is a symbol-by-symbol detection algorithm, in contrast to, for instance, the Maximum Likelihood Sequence Estimation algorithm, which operates on a whole sequence of data. The latter would have to compare $2^N$ sequences to find the optimum one, and for N = 1024 this would be far too complex. For a symbol-by-symbol detection algorithm the complexity is linearly proportional to N; a larger N just translates into a larger delay before a result can be calculated, because all bits have to be available when the algorithm starts. Hence it is sufficient to count the computations needed to output one bit, which can then be multiplied by the bit rate to give the number of computations per second.

In each iteration of the algorithm, γ, α, β and $L_e(u)$ have to be calculated once for every bit in the sequence. Table ?? counts how many operations each step requires.

                   add/subtract    multiply
γ                  0               2S
α                  2S−1            3S
β                  2S−1            3S
Le(u)              2S−2            2S+1
Extrinsic Value    0               2
C_T                6S−4            10S+2

Table 5.3: Complexity of Log-MAP algorithm in terms of States

This result is for an S = 8 state convolutional encoder as component code. If more memory is used in the encoder, the complexity of the calculations scales with S according to the relations in table 5.3. The total computational complexity of the overall decoder can therefore be written as

\[ C_{computation} = \sum_{M} C_T \cdot I \cdot K \tag{5.1} \]

where I is the number of iterations, K is the information block length, and M is the number of decoders.
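Equation 5.1 can be evaluated for the simulated configuration. The sketch below uses the C_T expressions of table 5.3 as they are printed and assumes, for illustration only, that additions and multiplications are weighted equally and that the two component decoders have the same complexity; the function names are hypothetical.

```python
def ops_per_bit(num_states):
    """C_T from table 5.3 as (add/subtract, multiply) counts per bit."""
    s = num_states
    return 6 * s - 4, 10 * s + 2


def total_operations(num_states, iterations, block_length, num_decoders=2):
    """Equation 5.1: sum over the decoders of C_T * I * K."""
    adds, mults = ops_per_bit(num_states)
    c_t = adds + mults  # assumption: both kinds of operation weigh the same
    return num_decoders * c_t * iterations * block_length


per_bit = ops_per_bit(8)
total = total_operations(num_states=8, iterations=5, block_length=5114)
```

For S = 8, five iterations and a block of 5114 bits this gives roughly 6.4 million operations per decoded block.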

5.4 Conclusion

Following the rugby meta-model, the first three steps were discussed. The theory related to the analytical model, i.e., the specification of the system and the system model, was presented together with the simulations. With the goal of developing an AMC coding system, testing was done using the Log-MAP and SOVA decoders. Two modulation schemes were developed, QPSK and 16-QAM. The BER obtained with the Log-MAP algorithm is lower than with the SOVA algorithm. The Log-MAP decoder is considerably more complex than the SOVA decoder, but offers higher performance.


Part II

Design Model


Chapter 6
Design Methodology

6.1 DK Methodology

The DK Design Suite enables the designer to enter system descriptions in a high-level programming language such as Handel-C, and to simulate and debug the code using an integrated development environment (IDE). The DK Suite includes the Data Streaming Manager (DSM) and the Platform Abstraction Layer (PAL), which facilitate the development of Handel-C applications and the migration of software from microprocessor implementations to FPGAs. PAL provides a consistent API (Application Programming Interface) through which Handel-C applications can access hardware I/O and other features. DSM is an integration mechanism between software applications executing on processors and functions in FPGA hardware.

The DK compiler allows the user to perform technology mapping of general logic into device-specific logic blocks. The Electronic Design Interchange Format (EDIF) output of the DK compiler is a device-specific netlist, which uses logic gates to describe the design. The Xilinx place and route tools perform technology mapping, which involves translating this gate-level netlist into the Look-Up Tables (LUTs) physically present in the FPGA. The DK Suite generates a report of the number of LUTs, flip-flops and memory bits synthesized for each line of Handel-C code. The Logic Estimation Tool, which is part of the DK Suite, provides a logic area and depth summary from a pre-place-and-route estimation based on the design and the target device specifications.

Alternatively, the program code can be debugged. The debugger is a cycle-accurate simulator, which makes it possible to test the implementation without a real FPGA. The execution speed can be estimated with the debugger, as it reports the number of clock cycles passed. This allows the designer to experiment with different optimization strategies within the DK environment until the design goals are met, before the design is placed.

6.1.1 Handel-C Language

Handel-C is a C-like language used to describe a digital design. It is based on CSP algebra and structurally based on Occam [40], a parallel programming language. Handel-C is based on ANSI-C, with additional constructs and libraries to support hardware functionality. Although the syntax of Handel-C is similar to C, it differs in that the design is compiled directly for the respective reconfigurable hardware, and is optimized for it.

Figure 6.1: Design Flow

Figure 6.2: DK Design Suite

Handel-C has some key features, such as parallel programming constructs, channel communication and pipelined instructions, which make it a strong language for describing digital designs. More is explained in appendix C. However, Handel-C does not support floating point. There is also no math library as part of Handel-C itself, but several libraries, such as the CORDIC library for mathematical functions and signal processing libraries such as FFT, are available as part of the Celoxica PDK suite.

Methodology

In hardware/software co-design, Handel-C is a good option, as it describes the behavioural level of a system in terms of C syntax and supports parallel execution in a digital design. Handel-C can be used to model both the data path and the control logic of a design, in a way similar to a software language like C. In Handel-C each assignment takes one clock cycle and everything else is free. This gives control over the clock timing, and makes it possible to start from a sequential execution of statements and then add parallelism until given timing constraints are met.
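The effect of this timing rule can be illustrated with a toy cycle-count model. The model is written in Python, not Handel-C, and rests on two assumptions: a sequential block takes the sum of the cycles of its statements, and a parallel block takes as long as its slowest branch. The node representation is hypothetical.

```python
def cycles(node):
    """Clock cycles of a design described as nested tuples.

    ("assign",)            one assignment, one clock cycle
    ("seq", [children])    statements executed one after another
    ("par", [children])    statements executed in parallel
    """
    kind = node[0]
    if kind == "assign":
        return 1
    if kind == "seq":
        return sum(cycles(child) for child in node[1])
    if kind == "par":
        return max(cycles(child) for child in node[1])
    raise ValueError("unknown node type: " + str(kind))


assign = ("assign",)

# Four assignments executed sequentially
sequential = ("seq", [assign, assign, assign, assign])

# The same four assignments grouped into two parallel pairs
parallel = ("seq", [("par", [assign, assign]),
                    ("par", [assign, assign])])

sequential_cycles = cycles(sequential)
parallel_cycles = cycles(parallel)
```

Grouping independent assignments into parallel blocks halves the cycle count in this example, at the cost of additional hardware for the parallel branches.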

Handel-C supports two targets. The first is a simulator target that allows development and testing of code without the need to use any hardware. This is supported by a debugger and other tools. The second target is the synthesis of a netlist for input to place and route¹ tools. This allows the design to be translated into configuration data for particular chips. An overview of the process is shown in figure 6.3. When compiling the design for a hardware target, Handel-C emits the design in Electronic Design Interchange Format (EDIF). A cycle count is available from the simulator, and an estimate of the gate count is generated by the Handel-C compiler. To get definitive timing information and actual hardware usage, the place and route tools need to be invoked [41].

¹ Place and route is the process of translating a netlist into a hardware layout.

Figure 6.3: Overview of the process of translating code into hardware using Handel-C [41]

6.2 Design Flow using System Generator

The System Generator design flow can be broken into 6 sections and is illustrated in figure 6.4.

Figure 6.4: Design Flow using System Generator

1. DSP System Modeling: Using familiar tools like MATLAB and Simulink, users develop models of their DSP systems. System Generator includes a Xilinx blockset that comprises basic building blocks like FFTs and advanced DSP algorithms like digital down converters. Users can also bring in their own HDL modules via HDL co-simulation, or write MATLAB code for combinational control logic or state machines.

2. System Generation: System Generator for DSP is invoked from Simulink through the System Generator for DSP token. Pushing the "Generate" button generates VHDL and cores for all the Xilinx blocks on the sheet containing the token, and on any sheets beneath it in the design hierarchy. FPGA designs are generated using Xilinx optimized LogiCOREs, ensuring that the most efficient implementation is being produced.

3. HDL Synthesis: Once VHDL has been generated by System Generator for DSP, users may want to synthesize it for an optimal FPGA implementation, whether optimized for high performance or for area. Users can choose from three popular synthesis engines: Xilinx's own XST, Synplify Pro from Synplicity and Leonardo Spectrum from Mentor Graphics.

4. Simulation/Verification: A VHDL testbench and data vectors can also be created by System Generator for DSP. These vectors represent the inputs and expected outputs seen in the Simulink simulation, and allow the designer to easily see any discrepancies between the Simulink and VHDL simulation results. ModelSim can be used to conduct simulations of DSP systems prior to implementation. For HDL co-simulation, ModelSim is required.

5. FPGA Implementation: Finally, designers use the ISE implementation tools to place, route and verify the design in a Xilinx FPGA.

6. In-System Debug: The hardware co-simulation capability is used to accelerate simulation and verify the design in hardware. Adding ChipScope Pro to the design flow allows real-time debugging at system speed.

6.3 DSP Builder Design Flow

When using DSP Builder, a design model is created using the MATLAB/Simulink software. After the model is created, the designer can either obtain VHDL files for synthesis and Quartus II compilation or generate files for VHDL or Verilog HDL simulation. The design flow involves the following steps:

1. Create a model with a combination of Simulink and DSP Builder blocks using the MATLAB/Simulink software.

2. Use the SignalCompiler block to analyze your design.

3. Simulate the model in Simulink using a Scope block to monitor the results.

4. Run SignalCompiler to set up RTL simulation and synthesis.

5. Perform RTL simulation.

6. Use the output files generated by the DSP Builder SignalCompiler block to perform RTL synthesis.

7. Compile the design in the Quartus II software.

8. Download to a hardware development board and test.



Figure 6.5: System-level Design Flow using DSP Builder


6.4 Hardware Software Co-design

Since 1990, the term Hardware Software Co-design has been used to describe a blend of problems relevant to both the hardware and the software components in an IC design. Co-design gives a software developer an inner view of the virtual hardware under development before the fabricated silicon is available, and enables a hardware developer to exercise the design with the actual operational software.

One of the key differences between the hardware and software design approaches is the execution scheme. Software design typically follows a sequential scheme, while hardware design follows a parallel scheme. In software design, the operating system emulates parallelism by threading; however, this is only a time division of CPU time. In contrast, hardware implements real parallelism; hardware designs are therefore modular and are synchronized by a clock to maintain coordination between processes.

6.4.1 Driving factors behind Hardware-Software Co-design

The majority of systems are programmable and thus consist of hardware and software components. The value of a system can be measured by objectives that are specific to its application domain (e.g., performance, design and manufacturing cost, and ease of programmability), and it depends on both the hardware and the software components [42]. Hardware/software co-design means meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design. Computer-aided design tools for co-design raise the potential quality and shorten the development time of the product. The trend toward smaller mask-level geometries leads to higher integration and higher cost of fabrication, hence to the need to amortize hardware design cost over large production volumes. This suggests the idea of using software as a means of differentiating products based on the same hardware platform. As a result, an increasing amount of software runs on semiconductor chips, which are referred to as systems on chip. Thus hardware (e.g., cores) and software (e.g., microkernels) can be viewed as commodities with large intellectual property values.

FPGAs have blurred the distinction between hardware and software [42]. Traditionally, a hardware circuit was configured at manufacturing time; its functions could then be chosen by the execution of a program, and only the program could be modified at run time. With FPGAs, the hardware itself can be configured after manufacturing. Co-design can be characterized by the degree of programmability and by implementation features.

6.4.2 Degree of Programmability

The hardware is programmed by software programs to perform the desired functions. Hence the abstraction level used for the programming model is the means of interaction between hardware and software. The important issue in programming is the level at which it is performed. The highest abstraction level is the application level, where the system runs dedicated software programs that allow the user to specify desired functionality options using a specific language. Hardware-level programming means configuring the hardware (after manufacturing) in a desired way, as in microprogramming. Reconfigurable circuits (FPGAs) allow the interconnections among logic blocks to be configured and the personality of the blocks to be determined. Reconfiguration can be global or local (the entire circuit or a portion of it can be altered) and may be applied more than once. Thus, reconfigurable circuits can be modified in both the data path and the control path.

6.4.3 Implementation Features

System implementation deals with circuit design style, manufacturing technology and integration level. System-level field programmability can be achieved by storing programs in read/write memories and exploiting programmable interconnections. In FPGAs, circuit configuration is achieved by programming connections using transistors driven by memory arrays.

6.5 Generic Co-design Flow

Although many research efforts attempt to integrate the different stages and tendencies of the methodology in a single tool, there is still a need for a general and mature one. Such a tool should give support during the high-level stages of the flow, enabling an effective design space exploration before the low-level steps are tackled and the final technology is reached. This should prevent costly and time-consuming redesign loops.

Co-design stages

A generic co-design flow is presented in this section in order to define the concepts and characteristics attached to the co-design flow.

Heterogeneous systems are complex to design because of the number of parameters and design choices to take into account. The main goal in the area of hardware/software co-design is to find a systematic design technique to specify, analyze, verify and synthesize mixed hardware/software embedded systems at a high abstraction level, so that most decisions are taken at the system level during the early stages of the design and design loops are avoided as far as possible.

As a result, the co-design flow can be divided into two consecutive parts, the high-level and the low-level phases, as shown in table (6.1). The first allows the designer to validate the system functionality and to evaluate various trade-off alternatives on an ideal platform, whereas the second performs the synthesis and the integration of the hardware and software parts, before the final co-verification.

A generic co-design flow is presented in figure (6.6), and the modern co-design framework can be decomposed into the following steps:

Modelling

During this step, the requirements of the application, as shown in figure (6.6), are translated into a formal description of all the functionalities. An abstract behavioural description is given for the complete system, regardless of the target architecture. Using this representation, different implementation alternatives can be evaluated.



High Level Flow

• Modelling

• System Analysis

• Design Space Exploration

– Partitioning

– Architecture Selection

– Co-simulation

Low Level Flow

• Low level synthesis

– Hardware Synthesis

– Compilation

• Integration

• Low level Co-Verification

Table 6.1: Modern Co-design Framework



System Analysis

This step consists of going through the whole system specification, as shown in figure (6.6), in order to extract the parameters representing the constraints for the subsequent hardware/software partitioning. The analysis stage has to emphasize two aspects of the system:

• The static aspect: memory space, parallelism rate, and all the parameters that can be obtained after a static compilation.

• The dynamic aspect: obtained after simulation or profiling. Constraints like the average execution time or the data transfer rate for the different code segments can be obtained.

Figure 6.6: Generic Co-Design Flow

Design Space Exploration

This step forms the heart of the co-design process. It may be decomposed into three interacting tasks: partitioning, architecture selection and co-simulation.

For the implementation, it must be decided which components to include and how these should be connected in the hardware architecture: this is called the architecture selection phase. It must also be decided which part of the behaviour should be implemented on which component: this is called the partitioning phase. These two activities are influenced by performance requirements, implementation cost, reconfigurability, and application-specific issues. Co-simulation evaluates the system behaviour from a functional or a timing point of view, in order to validate either the specification or the performed partitioning.
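The partitioning decision described above can be illustrated with a small sketch. The task names, the timing and area figures, and the greedy heuristic itself are illustrative assumptions, not values or methods taken from this project; the sketch also assumes that tasks run one after another, so that execution times simply add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    sw_time_us: float   # execution time in software (hypothetical figure)
    hw_time_us: float   # execution time in hardware (hypothetical figure)
    hw_area: int        # hardware cost in arbitrary area units (hypothetical)


def greedy_partition(tasks, deadline_us, area_budget):
    """Move tasks to hardware, best time saving per area unit first, until
    the deadline is met or no remaining task fits in the area budget."""
    in_hw = set()
    total_time = sum(t.sw_time_us for t in tasks)  # everything starts in software
    used_area = 0
    ranked = sorted(
        tasks,
        key=lambda t: (t.sw_time_us - t.hw_time_us) / t.hw_area,
        reverse=True,
    )
    for task in ranked:
        if total_time <= deadline_us:
            break  # timing constraint already satisfied
        if used_area + task.hw_area > area_budget:
            continue  # this task does not fit, try the next one
        in_hw.add(task.name)
        used_area += task.hw_area
        total_time -= task.sw_time_us - task.hw_time_us
    return in_hw, total_time, used_area


tasks = [
    Task("turbo_decoder", sw_time_us=1500.0, hw_time_us=200.0, hw_area=500),
    Task("demodulator", sw_time_us=600.0, hw_time_us=100.0, hw_area=300),
    Task("crc_check", sw_time_us=100.0, hw_time_us=50.0, hw_area=100),
]
hw_tasks, total_time_us, used_area = greedy_partition(
    tasks, deadline_us=2000.0, area_budget=1000
)
print(sorted(hw_tasks), total_time_us, used_area)
```

With these made-up figures, moving a single task to hardware is enough to meet the deadline, so the remaining tasks stay in software. A real flow would follow this step with co-simulation to validate the partition.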

Low-Level Synthesis

The partitioned hardware and software specifications are translated into their final form: a netlist for the hardware and assembly code for the software. The translation process is referred to as hardware synthesis for the hardware blocks and compilation for the software components.

Integration

The critical issue in the integration of heterogeneous systems is the HW/SW interface synthesis. In complex designs, Intellectual Property (IP) blocks are used to implement standard interfaces and buses.

Low-Level Co-Verification

The low-level models are simulated with a higher level of detail. At this stage, area, time and power figures are known and can be used to derive the exact characteristics of the complete system.

System production

Finally, the system is produced, first as a prototype; after further verification and validation, mass production starts.

Major advantages of co-design

It is now possible to present briefly how the co-design flow influences the embedded system design process, revealing the major advantages of the co-design approach.

• A detailed high-level specification of the system behaviour is made prior to architecture selection and partitioning. By means of analysis, more information is therefore available, and these crucial steps can be made with increased accuracy.

• A uniform description of hardware and software modules allows parts of the system to be exchanged between different partitions or at later stages of development. It also enhances the possibility of early debugging and validation of the complete system.

• Co-design brings the development of the hardware and software parts closer together, thus reducing the cost of the final integration between the different technology domains.

At the system specification level, the functionality written in a high-level language is verified. The performance of the partitioned and allocated system is verified against various requirements and constraints, such as timing, area, clock rate and energy. Once the partition is set and the hardware and software architectures are determined, the focus moves to verifying their behaviors. The hardware and software designs are implemented and verified separately, along with the interface between them. Verification takes place again at the system level after the HW/SW integration step. If all the performance constraints are met and the cost of the design is acceptable, the co-design process stops. Otherwise, the process is repeated from the system partitioning step to optimize the design until a satisfactory system implementation is found.

6.6 Followed Methodology


Chapter 7

Design Space Exploration

7.1 Introduction

7.2 Design Specification

The digital system design process starts with a specification of the intended design and ends with an implementation model that describes the implemented system and its components accurately. To obtain an accurate implementation model, the design specification and the design model need to be defined clearly. Once the design idea and market requirements have been analysed, the designer writes a specification that defines the design’s functionality and its interface to the environment in which it operates, and that limits the design complexity.

7.2.1 Cost Function for the Implementation

In general, a design decision that leads to an outcome involves a few parameters. A suitable architecture can be analysed and selected on the basis of the algorithm, the constraints and the platform, in the following manner:

{Architecture(i)} = f{Algorithm(i), Constraints(i), Platform(i)}

This means that the intrinsic properties of the algorithm, the constraints and the final target platform can have a strong influence on the final architecture decision. In particular, the final architecture should inherit the intrinsic properties of the algorithm.

The parameters for choosing the final architecture can be described by a cost function based on the common design metrics in the following way:

C = f{TE, A, N, TD}

• Execution time (TE): The execution time is the time needed to execute the algorithm, which corresponds to performance in the design metrics. The word length is a main determinant of the execution time: a longer word length results in a longer execution time, as longer arithmetic operations have to be performed.



• Area (A): Area is defined as the amount of hardware used in the system. It relates to the physical size of the product and influences the power consumption and the financial cost. The applied technology and the word length determine the area.

• Numerical properties (N): Rounding noise is produced when data are rounded to a finite word length. The applied word length determines the amount of rounding noise, i.e., a longer word length causes less noise. Rounding noise can degrade algorithm performance compared with the full-precision algorithm. Algorithms sensitive to rounding noise may become unstable in the worst case.

• Development time (TD): Development time is defined as the time needed to design and implement the algorithm on the simulation platform.
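The relation between word length and rounding noise can be checked numerically. The following is a minimal sketch, assuming a fixed-point format with a given number of fractional bits and a sine wave as test signal; both choices are illustrative and not taken from the project.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def rounding_noise_rms(samples, frac_bits):
    """Root-mean-square rounding error over a list of samples."""
    errors = [quantize(x, frac_bits) - x for x in samples]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical test signal: one period of a sine wave, 1000 samples.
signal = [math.sin(2.0 * math.pi * n / 1000.0) for n in range(1000)]

rms_8 = rounding_noise_rms(signal, frac_bits=8)
rms_12 = rounding_noise_rms(signal, frac_bits=12)
print(rms_8, rms_12)
```

Each rounding error is bounded by half a quantisation step, so every additional fractional bit halves the worst-case error, at the price of wider arithmetic and hence more area and longer execution time.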

The purpose of this cost function is to optimise the execution time and the area usage with respect to the timing constraint. The other parameters are not considered due to the complexity of the project.
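One possible reading of this cost function is "choose the smallest design that still meets the timing constraint". The sketch below follows that reading; the candidate names, their figures and the tie-breaking rule are assumptions made for illustration, and the 2 ms deadline is used only as an example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    exec_time_ms: float  # TE, hypothetical figure
    area_slices: int     # A, hypothetical figure


def select_architecture(candidates, deadline_ms):
    """Return the smallest-area candidate that meets the timing constraint,
    breaking ties by shorter execution time; None if nothing is feasible."""
    feasible = [c for c in candidates if c.exec_time_ms <= deadline_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.area_slices, c.exec_time_ms))


candidates = [
    Candidate("fully_parallel", exec_time_ms=0.4, area_slices=9000),
    Candidate("partly_shared", exec_time_ms=1.6, area_slices=4000),
    Candidate("fully_serial", exec_time_ms=3.5, area_slices=1500),
]
best = select_architecture(candidates, deadline_ms=2.0)
print(best)
```

Other weightings of TE and A are equally possible; the point of the sketch is only that the timing constraint acts as a filter and the remaining metrics as the objective.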

7.3 Design Space Exploration (DSE)

Design space exploration (DSE) is the process of mapping the requirements of a system optimally onto the target architecture. In general, DSE efficiently prunes the design space and identifies the zones to be developed. With the aid of EDA tools, system-level DSE can explore and characterise the design space rapidly for the specified application and a set of design metrics. The goal of DSE is to explore the design space of the system in order to converge rapidly towards a promising match between architecture and application. For design decisions based on DSE, it is crucial to obtain early estimates in order to shorten the design time, to measure rapidly the impact of algorithmic choices or transformations, and to adapt the subsequent architectural choices to the application parallelism. The information from DSE can be used in three ways:

• When using a fixed architecture, DSE information guides the algorithmic choices.

• For a fixed specification, different implementations can be selected using DSE.

• When neither the specification nor the architecture is definitively set, the designer can refine both aspects jointly based on the information from DSE.

Therefore, by focusing on DSE, the designer is guided very early in the design process to evaluate rapidly the impact of the algorithmic choices and to choose or build the most appropriate architecture for the final application. DSE thereby reduces the complexity of the refinement task and the time to market of the product [48].
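Pruning the design space can be sketched as the removal of dominated design points, leaving a Pareto front of execution time against area. This is one common way to prune, shown here as an assumption rather than as the method used in this project; the design points are made up.

```python
def dominates(a, b):
    """True if design a is no worse than b in both metrics and better in one.
    Each design is a tuple (name, exec_time, area); smaller is better."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    better = a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


designs = [
    ("A", 0.4, 9000),
    ("B", 1.6, 4000),
    ("C", 1.8, 5000),  # slower and larger than B, so it is pruned
    ("D", 3.5, 1500),
]
front = pareto_front(designs)
print([d[0] for d in front])
```

The designs that survive are the "zones to be developed": each of them is the best choice for some trade-off between speed and area.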

7.4 Conclusion


Part III

Hardware Implementation


Chapter 8

Hardware Implementation

8.1 Introduction

8.2 Methodology

8.2.1 SpecC Methodology

This section describes the modem standard based on [4]. The SpecC design process is chosen to represent and develop the system, starting with the specification model, going through architecture exploration and communication synthesis, and finally implementing the modem on a specific architecture.

The modem specifications are given in table (8.1). The modem standard specifies a timing constraint of 2 ms on the modulation latency, i.e. the time between feeding data into the modulator and receiving data from the demodulator output.

Parameter     Specification
Radio frame   15 slots, 38400 chips, 10 ms
Slot          2560 chips
Subframe      2 ms, 3 slots

Table 8.1: Specification for the Modem

Specification for the Modem

The specification model is the model with the highest level of abstraction. It is an accurate model of the intended system in terms of pure functionality, but it does not reflect its structure or timing [46]. Typically, the specification model executes in zero simulation time: neither the computation nor the communication is modelled with timing. The model reflects only the algorithmic behavior, without implying anything about the system architecture to be implemented, as shown in figure (8.1).



Figure 8.1: Modem Top level specification

An HS-PDSCH may use QPSK or 16QAM modulation symbols. In figure (8.2), M is the number of bits per modulation symbol, i.e. M = 2 for QPSK and M = 4 for 16QAM. The subframe and slot structure of the HS-PDSCH is shown in figure (8.2). The slot formats are shown in table (8.2).

Figure 8.2: Slot Formats

Slot format #i   Channel Bit Rate (kbps)   Channel Symbol Rate (ksps)   SF   Bits/HS-DSCH subframe   Bits/Slot   Ndata
0 (QPSK)         480                       240                          16   960                     320         320
1 (16QAM)        960                       240                          16   1920                    640         640

Table 8.2: Slot Formats
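The entries of the slot format table follow from the chip rate, the spreading factor SF and the number of bits per symbol M. The sketch below recomputes them; the chip rate of 3.84 Mcps is derived from the radio frame definition (38400 chips in 10 ms), and the function and field names are chosen here for illustration.

```python
CHIP_RATE_HZ = 3_840_000   # 38400 chips per 10 ms radio frame
CHIPS_PER_SLOT = 2560
SLOTS_PER_SUBFRAME = 3


def slot_format(bits_per_symbol, spreading_factor=16):
    """Derive the slot format entries for M bits per modulation symbol."""
    symbol_rate_ksps = CHIP_RATE_HZ // spreading_factor // 1000
    symbols_per_slot = CHIPS_PER_SLOT // spreading_factor
    bits_per_slot = symbols_per_slot * bits_per_symbol
    return {
        "bit_rate_kbps": symbol_rate_ksps * bits_per_symbol,
        "symbol_rate_ksps": symbol_rate_ksps,
        "bits_per_slot": bits_per_slot,
        "bits_per_subframe": bits_per_slot * SLOTS_PER_SUBFRAME,
    }


qpsk = slot_format(bits_per_symbol=2)    # M = 2
qam16 = slot_format(bits_per_symbol=4)   # M = 4
print(qpsk)
print(qam16)
```

Both formats share the same symbol rate because the spreading factor is fixed at 16; only the number of bits carried per symbol changes.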

Representation of modem using SpecC methodology

Modulator Functionality: The QAM transmitter comprises an encoder block, which maps the 16 quantized levels of data onto 4 levels each for the I and Q components. Both I and Q are pulse shaped using a Root Raised Cosine filter and then multiplied with the cosine and sine carriers respectively. The two streams are then added together.

In modulation, the I and Q components are multiplied (bit by bit) with the generated carrier waves. The cosine wave is multiplied with the I component, whereas the sine wave is multiplied with the Q component. The waveforms obtained from multiplying the I and Q components with the carriers are then added to give the symbols that form the transmitted signal.
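The modulator steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the bit-to-level mapping is a hypothetical Gray mapping rather than the one defined in the standard, the Root Raised Cosine pulse shaping is omitted, and the carrier and sample rates are arbitrary.

```python
import math

# Hypothetical Gray mapping of 2 bits to one of 4 amplitude levels.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def map_16qam(bits):
    """Split 4 bits into an (I, Q) amplitude pair, 2 bits per component."""
    if len(bits) != 4:
        raise ValueError("16QAM carries 4 bits per symbol")
    i_level = LEVELS[(bits[0], bits[1])]
    q_level = LEVELS[(bits[2], bits[3])]
    return i_level, q_level


def modulate(i_level, q_level, carrier_hz, sample_rate_hz, num_samples):
    """Multiply I with a cosine carrier and Q with a sine carrier, then add.
    Pulse shaping is left out of this sketch."""
    samples = []
    for n in range(num_samples):
        phase = 2.0 * math.pi * carrier_hz * n / sample_rate_hz
        samples.append(i_level * math.cos(phase) + q_level * math.sin(phase))
    return samples


i_level, q_level = map_16qam([1, 0, 0, 1])
waveform = modulate(i_level, q_level, carrier_hz=1.0, sample_rate_hz=8.0,
                    num_samples=8)
print(i_level, q_level)
```

At phase zero the output equals the I amplitude and a quarter period later it equals the Q amplitude, which is what allows the demodulator to separate the two components again.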


Demodulator Functionality: Demodulation is straightforward and is more or less the reverse process of modulation. The demodulator receives a modulated symbol stream at a rate of 240 ksps.

Figure 8.3: Modem Functionality

Incoming symbol streams are received and the corresponding data is reconstructed. Finally, the bit streams are passed through a Low Pass Filter in order to improve the signal shape. A more detailed block diagram of the demodulator, at all levels of hierarchy, is shown in figure (8.3). Compared to the modulation process, demodulation is much simpler and computationally much cheaper.

8.3 Clock Frequency

One of the important considerations in any FPGA design is deciding what clock speed is needed within the FPGA. The fastest clock in the design determines the clock rate that the FPGA must be able to handle. The maximum clock rate is determined by the propagation time, P, of a signal between two flip-flops in the design. If P is greater than the clock period, T, then when the signal changes at one flip-flop, it does not change at the next stage of logic until two clock cycles later. Each clock used in an FPGA design, whatever its rate, must have low skew. The skew, S, is the maximum delay from the clock input of one flip-flop to the clock input of another flip-flop. For the circuit to work properly, the skew must be less than the propagation time between the two flip-flops [49].
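The two conditions stated above, P not exceeding T and S smaller than P, can be written as a small check. The figures used are hypothetical, and the sketch deliberately ignores further terms such as setup and hold times that a real timing analysis would include.

```python
def max_clock_hz(propagation_delay_ns):
    """Highest clock rate for which the clock period T is not shorter than
    the flip-flop to flip-flop propagation time P."""
    return 1.0e9 / propagation_delay_ns


def timing_ok(clock_hz, propagation_delay_ns, skew_ns):
    """Check the two conditions from the text: P <= T and S < P."""
    period_ns = 1.0e9 / clock_hz
    return propagation_delay_ns <= period_ns and skew_ns < propagation_delay_ns


# Hypothetical figures for one register-to-register path.
P_NS = 8.0
S_NS = 0.5
print(max_clock_hz(P_NS))             # 125 MHz for an 8 ns path
print(timing_ok(100e6, P_NS, S_NS))   # 10 ns period
print(timing_ok(200e6, P_NS, S_NS))   # 5 ns period
```

In practice these figures come from the place and route tools, since P depends on the final routing of the design.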

8.4 Partition and Allocation

8.5 Handel-C implementation

8.6 Hardware-Software Testing

8.7 Optimization

8.8 Conclusion


Chapter 9

Conclusion


Appendix A

Different Types of Channels in HSDPA

A.1 Transport channels

Transport channels are services offered by Layer 1 to the higher layers. General concepts about transport channels are described in [9].

A transport channel is defined by how and with what characteristics data is transferred over the air interface. Transport channels are generally classified into two groups:

• Dedicated channels, using inherent addressing of UE;

• Common channels, using explicit addressing of UE if addressing is needed.

A.1.1 Dedicated transport channels

There exist two types of dedicated transport channel, the Dedicated Channel (DCH) and the Enhanced Dedicated Channel (E-DCH).

DCH - Dedicated Channel: The Dedicated Channel (DCH) is a downlink or uplink transport channel. The DCH is transmitted over the entire cell or over only a part of the cell using e.g. beam-forming antennas. This is a bi-directional channel for transporting dedicated user and control data. It serves the DTCH and DCCH logical channels. The DPDCH and DPCCH physical channels are used for transporting data [12].

E-DCH - Enhanced Dedicated Channel: The Enhanced Dedicated Channel (E-DCH) is an uplink transport channel.

A.1.2 Common transport channels

There are seven types of common transport channels: BCH, FACH, PCH, RACH, CPCH, DSCH and HS-DSCH.



BCH - Broadcast Channel: The Broadcast Channel (BCH) is a downlink transport channel that is used to broadcast system- and cell-specific information. The BCH is always transmitted over the entire cell and has a single transport format. It transports data from the logical BCCH in the direction of the UE using the P-CCPCH physical channel. The system information transported by the BCH is critical for all subscribers, i.e. UEs. It must therefore reach the entire cell reliably, i.e. the BCH (or the P-CCPCH physical channel used to transport it) should be transmitted with sufficient power. The system information is transmitted at a very low data rate of 3.4 kbits/sec so that even low-end mobile stations are capable of processing all of the necessary system data with no limitations [12].

FACH - Forward Access Channel: The Forward Access Channel (FACH) is a downlink transport channel. The FACH is transmitted over the entire cell. It is capable of transporting smaller quantities of data from different logical channels such as BCCH, CCCH, DCCH and CTCH when required in the direction of the UE. Transmission is either over the entire cell or to a specific receiver.

PCH - Paging Channel: The Paging Channel (PCH) is a downlink transport channel. The PCH is always transmitted over the entire cell. This unidirectional channel transports data from the PCCH logical channel, containing paging messages, in the direction of the UE.

RACH - Random Access Channel: The Random Access Channel (RACH) is an uplink transport channel. The RACH is always received from the entire cell. It is susceptible to collisions with RACH transmissions from other UEs that attempt simultaneous access, since the time of access can be chosen arbitrarily within the time slot limits.

CPCH - Common Packet Channel: The CPCH is a true packet channel which can be used to send data packet by packet. Network access is comparable to access via the RACH, and a packet directly follows the confirmation by the AP-AICH physical channel.

DSCH - Downlink Shared Channel: The DSCH is a downlink channel that carries dedicated control or traffic data and is used in conjunction with a dedicated channel (DCH).

HS-DSCH - High Speed Downlink Shared Channel: The High Speed Downlink Shared Channel is a downlink transport channel shared by several UEs. The HS-DSCH is associated with one downlink DPCH and one or several Shared Control Channels (HS-SCCH). The HS-DSCH is transmitted over the entire cell or over only part of the cell using e.g. beam-forming antennas.


A.2 Physical channels

Physical channels are defined by a specific carrier frequency, scrambling code, channelization code (optional), time start & stop (giving a duration) and, on the uplink, relative phase (0 or π/2). The downlink E-HICH and E-RGCH are each further defined by a specific orthogonal signature sequence. Scrambling and channelization codes are specified in [4]. Time durations are defined by start and stop instants, measured in integer multiples of chips. Suitable multiples of chips also used in the specifications are [2]:

• Radio frame: A radio frame is a processing duration which consists of 15 slots. The length of a radio frame corresponds to 38400 chips.

• Slot: A slot is a duration which consists of fields containing bits. The length of a slot corresponds to 2560 chips.

• Sub-frame: A sub-frame is the basic time interval for E-DCH and HS-DSCH transmission and E-DCH and HS-DSCH-related signalling at the physical layer. The length of a sub-frame corresponds to 3 slots (7680 chips).
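The chip counts above translate into time durations once a chip rate is fixed. The sketch below assumes the 3.84 Mcps chip rate, which is the rate implied by a 38400-chip radio frame lasting 10 ms; exact fractions are used so that the slot duration of 2/3 ms is not rounded.

```python
from fractions import Fraction

CHIP_RATE_HZ = 3_840_000  # assumed chip rate, 3.84 Mcps


def duration_ms(chips):
    """Exact duration of a chip count, in milliseconds."""
    return Fraction(chips * 1000, CHIP_RATE_HZ)


radio_frame_ms = duration_ms(38400)
slot_ms = duration_ms(2560)
subframe_ms = duration_ms(7680)
print(radio_frame_ms, slot_ms, subframe_ms)
```

The results give a 10 ms radio frame, a slot of 2/3 ms and a 2 ms sub-frame, so that 15 slots, or 5 sub-frames, fill one radio frame.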

The default time duration for a physical channel is continuous from the instant when it is started to the instant when it is stopped.

Transport channels are described (in more abstract higher layer models of the physical layer) as being capable of being mapped to physical channels. Within the physical layer itself the exact mapping is from a composite coded transport channel (CCTrCH) to the data part of a physical channel. In addition to data parts there also exist channel control parts and physical signals.

A.3 Physical signals

Physical signals are entities with the same basic on-air attributes as physical channels but do not have transport channels or indicators mapped to them. Physical signals may be associated with physical channels in order to support the function of physical channels.


Appendix B

Derivation of the log MAP algorithm

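B.1 The MAP algorithm

The maximum a posteriori probability (MAP) algorithm provides the log-likelihood ratio

$$
L(u_k) = \ln \frac{P(u_k = +1 \mid y)}{P(u_k = -1 \mid y)} = \ln \frac{\sum_{(s',s),\, u_k = +1} p(s', s, y)}{\sum_{(s',s),\, u_k = -1} p(s', s, y)} \tag{B.1}
$$

Each sum $\sum_{(s',s)} p(s', s, y)$ is taken over all existing transitions in the trellis from state $s'$ to state $s$ for which the information bit is $+1$ (numerator) or $-1$ (denominator).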

From [21], the joint probability $p(s', s, y)$ equals the product of three terms:

$$
\begin{aligned}
p(s', s, y) &= p(s', y_{j<k}) \cdot p(s, y_k \mid s') \cdot p(y_{j>k} \mid s) \\
&= p(s', y_{j<k}) \cdot P(s \mid s') \cdot p(y_k \mid s', s) \cdot p(y_{j>k} \mid s) \\
&= \alpha_{k-1}(s') \cdot \gamma_k(s', s) \cdot \beta_k(s)
\end{aligned} \tag{B.2}
$$

Here $y_{j<k}$ denotes the sequence of received symbols $y_j$ from the beginning of the trellis up to $t = k - 1$, and $y_{j>k}$ is the corresponding sequence of symbols to the end of the trellis.

The forward recursion of the MAP algorithm is given by

$$
\alpha_k(s) = \sum_{s'} \gamma_k(s', s) \cdot \alpha_{k-1}(s') \tag{B.3}
$$

It can be seen from equation B.3 that each step in the trellis depends on the previously calculated $\alpha$ value and on $\gamma_k$. Hence $\alpha$ is calculated for every $k$ until the end of the trellis is reached. Now $\beta$ can be calculated as

$$
\beta_{k-1}(s') = \sum_{s} \gamma_k(s', s) \cdot \beta_k(s) \tag{B.4}
$$

Likewise, $\beta$ is calculated recursively, but backwards through the trellis: each value depends on the value from the following trellis step. It is therefore useful to have a terminated trellis, in which the first and last states are known. If the termination is done so that all paths start and end in the zero state, we have $\alpha_{start}(0) = 1$ and $\beta_{end}(0) = 1$.

Where the trellis has a transition from state $s'$ to $s$, the branch transition probability is given by

$$
\gamma_k(s', s) = p(y_k \mid u_k) \cdot P(u_k) \tag{B.5}
$$

Using the log-likelihood

$$
L(u) = \ln \frac{P(u = +1)}{1 - P(u = +1)} \tag{B.6}
$$

the a priori value is given by

$$
P(u_k = \pm 1) = \frac{\exp[\pm L(u_k)]}{1 + \exp[\pm L(u_k)]} = \left( \frac{\exp[-L(u_k)/2]}{1 + \exp[-L(u_k)]} \right) \cdot \exp[L(u_k) u_k / 2] = A_k \cdot \exp[L(u_k) u_k / 2] \tag{B.7}
$$

For systematic convolutional codes, the conditional probability is [21]

$$
p(y_k \mid u_k) = B_k \cdot \exp\left[ \frac{1}{2} L_c y_{k,1} u_k + \frac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon} \right] \tag{B.8}
$$

Where some of the parity bits $x_{k,2}, \ldots$ are punctured, the summation over $\upsilon$ is taken only over the unpunctured bits.

It can also be seen that $A_k$ and $B_k$ take the same value whether $u_k$ is $+1$ or $-1$; hence, when inserted in equation B.1, they can be moved outside the summations and cancel out.

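Omitting the constants $A_k$ and $B_k$, the transition probability $\gamma$ can now be written as

$$
\begin{aligned}
\gamma_k(s', s) &= \exp\left[\tfrac{1}{2} u_k L(u_k)\right] \cdot \exp\left[\tfrac{1}{2} L_c y_{k,1} u_k + \tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right] \\
&= \exp\left[\tfrac{1}{2} u_k L(u_k)\right] \cdot \exp\left[\tfrac{1}{2} L_c y_{k,1} u_k\right] \cdot \exp\left[\tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right] \\
&= \exp\left[\tfrac{1}{2} u_k \left(L(u_k) + L_c y_{k,1}\right)\right] \cdot \exp\left[\tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right]
\end{aligned} \tag{B.9}
$$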

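The first exponential function is common to all terms within each sum of equation B.1 and can therefore be moved outside the sums, where it contributes the terms $L_c y_{k,1} + L(u_k)$. Denoting the remaining second exponential by $\gamma_k^{(e)}(s', s)$, equation B.1 can be rewritten as

$$
L(\hat{u}_k) = L_c y_{k,1} + L(u_k) + \ln \frac{\sum_{(s',s),\, u_k = +1} \gamma_k^{(e)}(s', s) \cdot \alpha_{k-1}(s') \cdot \beta_k(s)}{\sum_{(s',s),\, u_k = -1} \gamma_k^{(e)}(s', s) \cdot \alpha_{k-1}(s') \cdot \beta_k(s)} \tag{B.10}
$$

Here $L(\hat{u}_k)$ denotes the a posteriori value of equation B.1, written with a hat to distinguish it from the a priori value $L(u_k)$ of equation B.6 on the right-hand side. It is the estimated a posteriori log-likelihood ratio of the information bit: its sign gives the hard decision output and its magnitude the confidence of the decision.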


B.2 The log MAP algorithm

The MAP algorithm is the optimum symbol-by-symbol decoder for systematic convolutional codes [21], but it can be made less computationally demanding, while still being close to optimum, by using log-likelihoods and the approximation $\ln(e^{L_1} + e^{L_2}) \approx \max[L_1, L_2]$ [21].
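The quality of this approximation is easy to check numerically. The sketch below compares the exact value of ln(e^L1 + e^L2) with max[L1, L2]; the function names and the sample values are chosen here for illustration only.

```python
import math


def log_sum_exp(l1, l2):
    """Exact ln(e^l1 + e^l2), computed in a numerically stable way."""
    return max(l1, l2) + math.log1p(math.exp(-abs(l1 - l2)))


def max_log(l1, l2):
    """The approximation used in the text: keep only the larger term."""
    return max(l1, l2)


for l1, l2 in [(0.0, 0.0), (2.0, 0.0), (6.0, 0.0)]:
    exact = log_sum_exp(l1, l2)
    approx = max_log(l1, l2)
    print(l1, l2, exact, approx, exact - approx)
```

The error equals ln(1 + e^(-|L1 - L2|)): it is largest, ln 2, when the two values are equal and shrinks quickly as they move apart, which is why the approximation stays close to the optimum.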

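For each step in the trellis, the probabilities of all possible transitions have to be computed, as described by the joint probability $p(s', s, y)$. From the MAP algorithm we have the a posteriori log-likelihood ratio

$$
L(u_k) = \ln \frac{P(u_k = +1 \mid y)}{P(u_k = -1 \mid y)} = \ln \frac{\sum_{(s',s),\, u_k = +1} p(s', s, y)}{\sum_{(s',s),\, u_k = -1} p(s', s, y)} \tag{B.11}
$$

By using the approximation $\ln(e^{L_1} + e^{L_2}) \approx \max[L_1, L_2]$ we get

$$
\begin{aligned}
L(u_k) &= \ln \sum_{(s',s),\, u_k = +1} p(s', s, y) - \ln \sum_{(s',s),\, u_k = -1} p(s', s, y) \\
&\approx \max_{u_k = +1} [\ln p(s', s, y)] - \max_{u_k = -1} [\ln p(s', s, y)]
\end{aligned} \tag{B.12}
$$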

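From equation B.2 we have

$$
\ln p(s', s, y) = \ln \alpha_{k-1}(s') + \ln \gamma_k(s', s) + \ln \beta_k(s) \tag{B.13}
$$

and from equation B.3

$$
\begin{aligned}
\alpha_k(s) &= \sum_{s'} \gamma_k(s', s) \cdot \alpha_{k-1}(s') \\
\ln \alpha_k(s) &= \ln \sum_{s'} \gamma_k(s', s) \cdot \alpha_{k-1}(s') \\
&\approx \max_{s'} [\ln \gamma_k(s', s) + \ln \alpha_{k-1}(s')]
\end{aligned} \tag{B.14}
$$

and from equation B.4

$$
\begin{aligned}
\beta_{k-1}(s') &= \sum_{s} \gamma_k(s', s) \cdot \beta_k(s) \\
\ln \beta_{k-1}(s') &= \ln \sum_{s} \gamma_k(s', s) \cdot \beta_k(s) \\
&\approx \max_{s} [\ln \gamma_k(s', s) + \ln \beta_k(s)]
\end{aligned} \tag{B.15}
$$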

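The last term, $\ln \gamma_k$, can be found from equation B.9:

$$
\begin{aligned}
\gamma_k(s', s) &= \exp\left[\tfrac{1}{2} u_k \left(L(u_k) + L_c y_{k,1}\right)\right] \cdot \exp\left[\tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right] \\
\ln \gamma_k(s', s) &= \tfrac{1}{2} u_k \left(L(u_k) + L_c y_{k,1}\right) + \tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}
\end{aligned} \tag{B.16}
$$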

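Now the resulting log MAP expression for the soft output can be found. Writing it as $L(\hat{u}_k)$ to distinguish it from the a priori value $L(u_k)$ contained in $\ln \gamma_k$, we get

$$
\begin{aligned}
L(\hat{u}_k) &\approx \max_{u_k = +1} [\ln p(s', s, y)] - \max_{u_k = -1} [\ln p(s', s, y)] \\
&= \max_{u_k = +1} [\ln \alpha_{k-1}(s') + \ln \gamma_k(s', s) + \ln \beta_k(s)] \\
&\quad - \max_{u_k = -1} [\ln \alpha_{k-1}(s') + \ln \gamma_k(s', s) + \ln \beta_k(s)] \\
&= L(u_k) + L_c y_{k,1} \\
&\quad + \max_{u_k = +1} \left[\ln \alpha_{k-1}(s') + \ln \beta_k(s) + \tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right] \\
&\quad - \max_{u_k = -1} \left[\ln \alpha_{k-1}(s') + \ln \beta_k(s) + \tfrac{1}{2} \sum_{\upsilon=2}^{n} L_c y_{k,\upsilon} x_{k,\upsilon}\right]
\end{aligned} \tag{B.17}
$$

In the last step, the term $\tfrac{1}{2} u_k (L(u_k) + L_c y_{k,1})$ of equation B.16 is the same for all transitions with a given $u_k$, so it is moved outside each maximisation; the two contributions combine to $L(u_k) + L_c y_{k,1}$, in agreement with equation B.10.

Instead of calculating the actual probabilities as in the optimum MAP algorithm, a close estimate can now be found by searching the trellis for the maximum path as described by equations B.14, B.15, B.16 and B.17.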
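The recursions can be tried out on a toy trellis. The sketch below is an illustration under stated assumptions: the 2-state trellis, the branch metric values and the function name are hypothetical, the trellis is assumed to start in state 0, and the end state is treated as unknown instead of terminated. It applies the forward recursion (B.14), the backward recursion (B.15) and the difference of maxima from the second line of (B.17), with ln γ supplied directly as input.

```python
NEG_INF = float("-inf")

# Hypothetical 2-state trellis: input u = +1 toggles the state, u = -1 keeps it.
BRANCHES = [(0, 0, -1), (0, 1, +1), (1, 1, -1), (1, 0, +1)]  # (s', s, u_k)
NUM_STATES = 2


def max_log_map(log_gamma):
    """log_gamma[k][(s', s)] holds ln gamma for trellis step k.
    Returns one soft output value per step."""
    steps = len(log_gamma)
    # Forward recursion; the trellis is assumed to start in state 0.
    log_alpha = [[NEG_INF] * NUM_STATES for _ in range(steps + 1)]
    log_alpha[0][0] = 0.0
    for k in range(steps):
        for s_prev, s, _ in BRANCHES:
            metric = log_alpha[k][s_prev] + log_gamma[k][(s_prev, s)]
            log_alpha[k + 1][s] = max(log_alpha[k + 1][s], metric)
    # Backward recursion; all end states are assumed equally likely.
    log_beta = [[NEG_INF] * NUM_STATES for _ in range(steps + 1)]
    log_beta[steps] = [0.0] * NUM_STATES
    for k in range(steps - 1, -1, -1):
        for s_prev, s, _ in BRANCHES:
            metric = log_gamma[k][(s_prev, s)] + log_beta[k + 1][s]
            log_beta[k][s_prev] = max(log_beta[k][s_prev], metric)
    # Soft output: best path with u_k = +1 minus best path with u_k = -1.
    soft = []
    for k in range(steps):
        best = {+1: NEG_INF, -1: NEG_INF}
        for s_prev, s, u in BRANCHES:
            metric = (log_alpha[k][s_prev] + log_gamma[k][(s_prev, s)]
                      + log_beta[k + 1][s])
            best[u] = max(best[u], metric)
        soft.append(best[+1] - best[-1])
    return soft


# Hypothetical branch metrics: step 0 favours u = -1, step 1 favours u = +1.
log_gamma = [
    {(0, 0): -0.2, (0, 1): -1.5, (1, 1): -0.2, (1, 0): -1.5},
    {(0, 0): -2.0, (0, 1): -0.1, (1, 1): -2.0, (1, 0): -0.1},
]
soft = max_log_map(log_gamma)
print(soft)
```

The sign of each soft value gives the hard decision and its magnitude the confidence, as described for equation B.10. Only additions and comparisons are needed, which is what makes this form attractive for a hardware implementation.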


Appendix C

Handel-C Language

Handel-C is a high-level programming language designed for compiling programs into syn-chronous hardware. It was created at the now discontinued Hardware Compilation Group atOxford, and is currently being developed by Embedded Solutions Ltd., a spin-off companystarted by people from the University of Oxford.[37]

Handel-C does not require an intermediate HDL step; it converts the codes directly to netlist.The designers accelerate the algorithm implementation, because Handel-C supports inherenthardware parallelism and the usual C-like coding style. Handel-C design cycle is much shorterthan the other HDL based designs. This is because of the key advantages demonstrated byHandel-C: its support for sequential C-like logic, its compact representation of functionality,and its software-like methodology which provides fast turnouts in the design methodology [38].

C.1 Comparison of ANSI C and Handel-C

Handel-C is a subset of ANSI C with additional constructs to support the required hardware functionality. Although the syntax of Handel-C is very much like standard C, it is a distinct language. A large number of the types, statements and operators of ANSI C are available in Handel-C, but it differs from standard C in a number of aspects. It is used to compile a design directly to reconfigurable hardware, i.e. an FPGA, by creating the information needed by the FPGA implementation tools. That is why Handel-C contains elements that are not part of standard C.

Figure C.1 gives a graphical representation of the differences between the instruction sets of the two languages. Handel-C algorithms are coded in a sequential software style, with a 'par' construct that implements parallelism. A channel, 'chan', allows communication and synchronization between two parallel branches of operations. The traditional sequential software development environment of Handel-C allows the designer to describe the behavior of the intended hardware in the same sense as a software programmer describes the intended behavior of a processor executing a program. This is fundamentally different from using standard C/C++ syntax to describe the structure of a hardware implementation. Handel-C is used for cycle-accurate modeling and for the high-level design of hardware.

Figure C.1: Comparison of ANSI C and Handel-C

Important differences between Handel-C and ANSI C include the following [39]:

• Handel-C does not support recursion, so functions cannot be called recursively as they can in ANSI C. This is because hardware has no 'call stack' as software does. However, Handel-C does support compile-time macro recursion.

• Handel-C does not allow side effects in expressions as ANSI C does, e.g. a statement like "a = b * c++;" is not supported.

• Handel-C does not have floating point types such as float and double. To implement floating-point operations in hardware, one must define the algorithms at the bit level. However, the Handel-C compiler DK1 includes a library for fixed and floating-point arithmetic.

• The standard libraries available in Handel-C are very limited. In principle, C/C++ libraries can be called from the Handel-C compiler, but only for simulation and debugging; they cannot be used for hardware description.

C.2 Efficient Use of Hardware in Handel-C

Handel-C has a single native integer data type, but fixed and floating point data types are supported by means of a library. Integers may be signed or unsigned. Each bit of a variable is mapped to one flip-flop in hardware, so the number of flip-flops needed for a variable depends on its width. One interesting feature of Handel-C is that the designer needs to specify the variable width. A variable can be declared as:

signed int 7 number;

In this case, 'number' can take any value from $-2^{6}$ to $2^{6}-1$. Because each variable has an explicitly specified width, no more flip-flops are allocated than the variable needs, which avoids unnecessary allocation of hardware.
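The value range of a width-limited signed integer can be checked with a small Python sketch. It assumes a two's complement representation and assumes that overflow simply discards the bits above the declared width; the helper names `signed_range` and `wrap_signed` are hypothetical and are not part of Handel-C.

```python
def signed_range(width):
    """Smallest and largest value of a two's complement integer of the given width."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1

def wrap_signed(value, width):
    """Keep only the lowest `width` bits and reinterpret them as a signed value."""
    value &= (1 << width) - 1
    if value >> (width - 1):          # sign bit set, so the value is negative
        return value - (1 << width)
    return value

low, high = signed_range(7)           # range of a 7-bit signed variable
overflowed = wrap_signed(high + 1, 7) # one past the maximum wraps to the minimum
```

A 7-bit variable therefore costs seven flip-flops, whereas a default-width C `int` holding the same values would typically cost 32.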

C.3 Timing and Control in Handel-C

Handel-C is designed around a very simple timing model. In the synthesized hardware, the evaluation of expressions and the execution of statements are synchronized to one or more clocks. The rule of thumb is that every assignment takes exactly one clock cycle and everything else is free. Expressions are built from combinatorial logic, and data is clocked only when an assignment is performed. Consider the following statements:

f = a; f++; f += b;

This takes three clock cycles to complete. In contrast, consider the following:

f = a + 1 + b;

This takes only one clock cycle and gives the same result as the first three expressions.

It is straightforward in Handel-C to look at a piece of code and tell which instructions execute on which clock cycles. This gives the programmer full control over what is happening in the design at any time. The designer can keep track of the flow of execution, and it is easy to predict and control the performance of a program in terms of clock cycles. However, this simple timing model carries a penalty: the more complex an expression, the deeper the logic required to implement it. Deeper logic has a higher propagation delay, which in turn limits the maximum clock rate at which the design can run.
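The trade-off between fewer clock cycles and a slower clock can be illustrated with a toy Python calculation. All numbers are hypothetical: the sketch assumes that each statement costs one cycle, that the clock period must cover the deepest expression anywhere in the design, and that one adder stage has a delay of 2 ns. Real timing depends on the device and on the place-and-route result.

```python
def total_time_ns(statement_depths, unit_delay_ns):
    """One cycle per assignment; the clock period is set by the deepest expression."""
    cycles = len(statement_depths)
    period_ns = max(statement_depths) * unit_delay_ns
    return cycles * period_ns

UNIT_DELAY_NS = 2           # hypothetical delay of one adder stage

split = [1, 1, 1]           # f = a; f++; f += b;  (each modelled as one stage deep)
merged = [2]                # f = a + 1 + b;       (two chained adders)
rest = [1] * 100            # hypothetical remainder of the design, one stage deep

time_split_alone = total_time_ns(split, UNIT_DELAY_NS)
time_merged_alone = total_time_ns(merged, UNIT_DELAY_NS)
time_split_in_design = total_time_ns(split + rest, UNIT_DELAY_NS)
time_merged_in_design = total_time_ns(merged + rest, UNIT_DELAY_NS)
```

In isolation the merged statement is faster, but once it sits in a larger design its deeper logic lengthens the clock period for every other statement, so the total run time can increase.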

One consequence of this timing model concerns side effects. Expressions with side effects, a popular feature of ANSI C, are not accepted in Handel-C because they would imply more than one assignment in a single statement. Thus, familiar code like the following cannot be compiled in DK:

q = p++;

Instead, it should be broken into two statements. Since the post-increment yields the old value of p, the assignment to q comes first:

q = p; p = p + 1;



C.4 Parallel Hardware Generation

Perhaps the most significant difference between hardware and software design is hardware parallelism. Whilst software is usually sequential (one statement at a time), hardware usually performs many operations at once. In hardware, each instruction is executed by a dedicated piece of hardware, so instructions can run strictly concurrently, whereas software is coded sequentially and its statements are executed one by one. Of course, software may run in a truly concurrent fashion on multi-processor systems, provided there are enough processors and inter-thread communication is limited [39]. Handel-C is sequential by default, so to exploit the inherent parallelism of the hardware it provides an explicit 'par' statement. Handel-C programs can achieve dramatic speedups by executing several tasks at the same time. Consider the following code:

par {
    a = 10;
    b = 20;
    c = 30;
}

seq {
    a = 10;
    b = 20;
    c = 30;
}

The 'par' statement executes all three assignments in one clock cycle, whereas the 'seq' statement takes three clock cycles for the three assignments. Large blocks of functionality that execute in parallel can be generated using arrays of functions or inline code. Hardware can be replicated using the construct

par (i = 0; i < 10; i++) { a[i] = b[i]; }

which results in 10 parallel assignment operations. Multiple copies of functions may be created by using the macro language of Handel-C, by using the inline keyword, which instantiates a copy of a function every time it is referenced, or by explicitly defining an array of functions, as in the following fragment that defines 32 instances of a function:

void evaluate[32](unsigned int population){...}
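The cycle counts quoted in this appendix for sequential and parallel blocks can be reproduced with a toy Python model. It assumes that an assignment costs one cycle, that a 'seq' block costs the sum of its children and that a 'par' block finishes when its slowest branch finishes. The tuple encoding and the function name `cycles` are hypothetical and serve only as an illustration.

```python
def cycles(block):
    """Clock cycles of a block: ("assign",), ("seq", [blocks]) or ("par", [blocks])."""
    kind = block[0]
    if kind == "assign":
        return 1                                  # every assignment takes one cycle
    children = [cycles(child) for child in block[1]]
    if kind == "seq":
        return sum(children)                      # branches run one after another
    if kind == "par":
        return max(children, default=0)           # branches run side by side
    raise ValueError("unknown block kind: " + str(kind))

ASSIGN = ("assign",)

seq_block = ("seq", [ASSIGN, ASSIGN, ASSIGN])     # a = 10; b = 20; c = 30;
par_block = ("par", [ASSIGN, ASSIGN, ASSIGN])     # the same assignments in a par
replicated = ("par", [ASSIGN] * 10)               # par (i = 0; i < 10; i++) ...
mixed = ("par", [seq_block, ASSIGN])              # limited by the longer branch

counts = [cycles(b) for b in (seq_block, par_block, replicated, mixed)]
```

The replicated 'par' finishes in a single cycle at the cost of ten copies of the assignment hardware, which is the area-for-time trade that Handel-C makes explicit.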



C.5 Channel Communication

Channels are provided in Handel-C for communication between processes. When data is written to a channel, a copy of the data is sent to the receiving process, where it must be read. When a process writes to a channel, it waits until the other process reads from the channel. Similarly, when a process reads from a channel, it waits until the sending process writes to the channel. In this way, parallel blocks can exchange data at any point during execution. In addition, channels provide synchronization between parallel processes.
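The blocking behavior described above can be imitated in software. The following Python sketch models a channel between exactly one writer and one reader using two single-slot queues; the class `Channel` and the function `demo` are hypothetical names, and threads only approximate what Handel-C implements with handshake logic in hardware.

```python
import queue
import threading

class Channel:
    """Rendezvous channel for one writer and one reader."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, value):
        self._data.put(value)     # hand the value over
        self._ack.get()           # block until the reader has taken it

    def read(self):
        value = self._data.get()  # block until the writer has written
        self._ack.put(None)       # release the writer
        return value

def demo(values):
    """Send the values through a channel from one thread to another."""
    chan = Channel()
    received = []

    def producer():
        for v in values:
            chan.write(v)

    def consumer():
        for _ in values:
            received.append(chan.read())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

result = demo([10, 20, 30])
```

Because each write completes only after the matching read, the two threads stay in step, which is the synchronization role that channels play between parallel branches.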

C.6 External Communication via Interfaces

Communication between the hardware and the outside world is performed using interfaces. An interface may be specified as input or output and, as with assignment, a write to or a read from an interface takes one clock cycle. The language allows the designer to target particular hardware, assign input and output pins, specify the timing of signals, and generally control the low-level hardware interfacing details. Macros are available to help target particular devices [41].

C.7 Bit Level Operators

Handel-C provides a number of bit manipulation operators that are not available in ANSI C. The following bit operators are provided:

y = x <- n      Takes the n least significant bits from x
y = x \\ n      Drops the n least significant bits from x
y = x @ z       Concatenates the bit patterns that represent x and z
y = x[5:3]      Selects bits 3, 4 and 5 from x

The width operator returns the number of bits in an expression.
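For readers more familiar with software, the effect of these operators can be mimicked with shifts and masks. The Python sketch below assumes non-negative integers and passes widths explicitly, since Python integers carry no declared width; the function names are hypothetical.

```python
def take_lsbs(x, n):
    """Counterpart of x <- n: keep the n least significant bits."""
    return x & ((1 << n) - 1)

def drop_lsbs(x, n):
    """Counterpart of x \\\\ n: discard the n least significant bits."""
    return x >> n

def concat(x, z, z_width):
    """Counterpart of x @ z: x becomes the upper bits, z the lower z_width bits."""
    return (x << z_width) | z

def select(x, high, low):
    """Counterpart of x[high:low]: bits low..high inclusive."""
    return (x >> low) & ((1 << (high - low + 1)) - 1)

x = 0b10110110
taken = take_lsbs(x, 3)          # 0b110
dropped = drop_lsbs(x, 3)        # 0b10110
joined = concat(0b101, 0b01, 2)  # 0b10101
picked = select(x, 5, 3)         # bits 5, 4 and 3 of x, i.e. 0b110
```

In hardware none of these operations costs logic, as they only select or regroup wires, whereas the software versions need explicit shift and mask operations.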

C.8 Calling C/C++ Libraries in Handel-C

It is possible to call C/C++ library functions from Handel-C directly. In this way, useful C/C++ functions, such as reading from and writing to files, can be used in Handel-C. Similarly, almost all C/C++ statements can be used in Handel-C simulations. However, C/C++ libraries cannot be used for hardware compilation; in that case, all C/C++ code and libraries have to be converted to Handel-C.

C.9 Some Restrictions when using Handel-C and FPGAs

Because Handel-C targets hardware, it imposes some programming restrictions compared to a traditional C compiler. These need to be taken into consideration when designing code that can be compiled by Handel-C.

Firstly, there is no stack available, so recursive functions cannot be directly supported by the language.

Secondly, the size of memory that can be implemented using standard logic cells on an FPGA is limited, because implementing memory is an inefficient use of FPGA resources. However, some FPGAs have internal RAM that can be used by Handel-C. For example, the Xilinx Virtex and Spartan series support internal memory that Handel-C allows the user to declare as RAM or Read Only Memory (ROM). For example the definition

ram int 8 mem[128];

declares a RAM block of 128 cells, each 8 bits wide, which can be accessed as a standard array.

A limitation of using RAM or ROM is that it cannot be accessed more than once per clock cycle. This restricts the potential for parallel execution of code that accesses RAM or ROM.

Thirdly, expressions are not allowed to have side effects, since this would break the single-cycle assignment rule. Therefore code such as

a = ++b;

is not allowed and needs to be rewritten as:

b = b + 1; a = b;


Bibliography

[1] Hans Peter Schwefel, "Wireless Networks-III Fall 2005", Lecture Slides, Aalborg University, 2005

[2] 3GPP TS 25.211 V6.5.0 (2005-06), Technical Specification Group Radio Access Network; Physical channels and mapping of transport channels onto physical channels (FDD) (Release 6)

[3] 3GPP TS 25.212 V6.5.0 (2005-06), Technical Specification Group Radio Access Network; Multiplexing and channel coding (FDD) (Release 6)

[4] 3GPP TS 25.213 V6.4.0 (2005-09), Technical Specification Group Radio Access Network; Spreading and modulation (FDD) (Release 6)

[5] 3GPP TS 25.308 V6.3.0 (2004-12), Technical Specification Group Radio Access Network; High Speed Downlink Packet Access (HSDPA); Overall description; (Release 6)

[6] 3GPP TR 25.855 V5.0.0 (2001-09), Technical Specification Group Radio Access Network; High Speed Downlink Packet Access; Overall UTRAN Description (Release 5)

[7] 3GPP TR 25.858 V5.0.0 (2002-03), Technical Specification Group Radio Access Network; High Speed Downlink Packet Access: Physical Layer Aspects (Release 5)

[8] 3GPP TR 25.950 V4.0.1 (2005-07), Technical Specification Group Radio Access Network; UTRA High Speed Downlink Packet Access (Release 4)

[9] 3GPP TS 25.302 V6.5.0 (2005-09), Technical Specification Group Radio Access Network; Services provided by the physical layer (Release 6)

[10] TSGR1#16(00)1316, "TR on HSDPA"

[11] TSGR1#17(00)1395, "Adaptive Modulation and Coding (AMC)"

[12] Reinhold Kruger and Heinz Mellein, "UMTS Introduction and Measurement", Rohde and Schwarz.

[13] Harri Holma and Antti Toskala, "WCDMA for UMTS", Third Edition, John Wiley and Sons Limited, 2004



[14] Pablo Jose Ameigeiras Gutierrez, "Packet Scheduling and Quality of Service in HSDPA", Ph.D. Thesis report, Aalborg University, 2003

[15] Rudolf Tanner and Jason Woodard, "WCDMA Requirements and Practical Design", First Edition, John Wiley and Sons Limited, 2004

[16] Andrew Miceli, "Wireless Technician's Handbook", Second Edition, Mobile Communications Series, Artech House, 2003

[17] Christina Gessner, "UMTS Booster-HSDPA paves the way for mobile multimedia development", Telecommunications Magazine, International Issue, October 2005 issue.

[18] B. Vucetic and J. Yuan, "Turbo Codes", First Edition, Kluwer Academic Publishers, 2000.

[19] M. Farooq Sabir, Rashmi Tripathi, Brian L. Evans and Alan C. Bovik, "A Real-Time Embedded Software Implementation of a Turbo Encoder and Soft Output Viterbi Algorithm based Turbo Decoder", Dept. of Electrical and Comp. Eng., The University of Texas, Austin, TX, 2002.

[20] L. Perez, J. Seghers, and D. J. Costello, "A Distance Spectrum Interpretation of Turbo Codes", IEEE Transactions on Information Theory, Vol. 42, November 1996, pp. 1698-1708.

[21] Joachim Hagenauer, Elke Offer and Lutz Papke, "Iterative Decoding of Binary Block and Convolutional Codes", IEEE Transactions on Information Theory, Vol. 42, No. 2, March 1996.

[22] L. Hanzo, T. H. Liew, B. L. Yeap, "Turbo Coding, Turbo Equalisation and Space-Time Coding for Transmission over Wireless Channels", Department of Electronics and Computer Science, University of Southampton, UK

[23] John G. Proakis, Masoud Salehi, "Communication System Engineering", Prentice-Hall, Inc., 1994.

[24] Matthew C. Valenti and Jian Sun, "Handbook of RF and Wireless Technologies", 25th September 2003, pp. 375-400.

[25] Sam W. Ho, Intel, http://www.intel.com/netcomms/technologies/wimax/303788.pdf, 22nd November 2004.

[26] M. Frodigh, S. Parkvall, C. Roobol, P. Johansson and P. Larsson, "Future-Generation Wireless Networks", IEEE Personal Communications, Vol. 8, Issue 5, October 2001.

[27] S. Parkvall, E. Dahlman, P. Frenger, P. Beming and M. Persson, "The evolution of WCDMA towards higher speed downlink packet data access", Vehicular Technology Conference, 2001. VTC 2001 Spring. IEEE VTS 53rd, Vol. 3, 2001.

[28] B. Vucetic, J. Yuan, "Turbo Codes-Principles and Applications", Kluwer Academic Publishers, 2002.



[29] Patrick Robertson, Emmanuelle Villebrun and Peter Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain", in Proc. ICC '95, pp. 1009-1013.

[30] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate", IEEE Transactions on Information Theory, Vol. IT-20, pp. 284-287, March 1974.

[31] Yanhui Tong, Tet-Hin Yeap and Jean-Yves Chouinard, "VHDL Implementation of a Turbo Decoder with Log-MAP based Iterative Decoding", IEEE Transactions on Instrumentation and Measurement, Vol. 53, No. 4, August 2004.

[32] C. Berrou, P. Adde, E. Angui, and S. Faudeil, "A low complexity soft-output Viterbi decoder architecture", in Proc. IEEE ICC 1993, Geneva, Switzerland, May 1993, pp. 737-740.

[33] G. Masera, G. Piccinini, M. Roch, and M. Zamboni, "VLSI architectures for turbo codes", IEEE Trans. on VLSI Systems, Vol. 7, No. 3, September 1999, pp. 369-379.

[34] Stephen Brown and Jonathan Rose, "Architecture of FPGAs and CPLDs: A tutorial", IEEE Journal of Design and Test of Computers, Vol. 13, No. 2, pp. 42-57, 1996

[35] Jordy Potman, Fokke Hoeksema and Kees Slump, "Tradeoffs between Spreading Factor, Symbol Constellation Size and Rake Fingers in UMTS", University of Twente, Department of Electrical Engineering, Signals and Systems Group, Enschede, The Netherlands, 30th October 2005.

[36] F. Adachi, M. Sawahashi, and K. Okawa, "Tree-structured generation of orthogonal spreading codes with different lengths for forward link of DS-CDMA mobile radio", IEE Electronics Letters, Vol. 33, No. 1, 2nd January 1997.

[37] http://web.comlab.ox.ac.uk/oucl/work/christian.peter/overview_handelc.html, 14th October 2005.

[38] Celoxica White Paper, "Handel-C for Hardware Design", Celoxica Limited UK, August 2002.

[39] Yannick Le Moullec, "HW Platform Analysis, Compilation and Optimisation Fall 2005", Lecture Slides, Aalborg University, 2005

[40] Hoare C.A.R., "Communicating Sequential Processes", Prentice-Hall International, Englewood, 1985.

[41] Peter N. Martin, "Genetic Programming in Hardware", PhD thesis, Department of Computer Science, University of Essex, Spring 2003.

[42] Giovanni De Micheli, Rajesh K. Gupta, "Hardware/Software Co-Design", Proceedings of the IEEE, Vol. 85, No. 3, March 1997.



[43] Axel Jantsch, Shashi Kumar, Ahmed Hemani, "The Rugby Model: A Conceptual Frame for the Study of Modelling, Analysis and Synthesis Concepts of Electronic Systems"

[44] A. Jantsch, S. Kumar, A. Hemani, "A Metamodel for Studying Concepts in Electronic System Design", IEEE Design & Test of Computers, Vol. 17, No. 3, July/September 2000, pp. 78-85.

[45] Daniel D. Gajski and Robert H. Kuhn, "Guest Editor's Introduction: New VLSI Tools", IEEE Computer, December 1983, pp. 11-14.

[46] Daniel D. Gajski, Jianwen Zhu, Rainer Domer, Andreas Gerstlauer, Shuqing Zhao, "SpecC: Specification Language and Methodology", Kluwer Academic Publishers, Second Print, 2001

[47] Heiko Michel, Alexander Worm, Michael Munch and Norbert Wehn, "Hardware/Software Trade-offs for Advanced 3G Channel Coding", IEEE Computer Society, Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 2002

[48] Juergen Jaeger, Shawn McCloud, "The four Rs of efficient system design", Embedded Systems Design, 3rd January 2005

[49] Tim Behne, "FPGA Clock Schemes", Embedded Systems Design, http://www.embedded.com/story/OEG20030210S0053, 2nd October 2003.

