vhdl implementation of cordic algorithm for wireless …19532/fulltext01.pdf · vhdl implementation...

VHDL implementation of CORDIC algorithmfor wireless LAN

Master thesis performed in Electronics Systems by

Anastasia Lashko,

Oleg Zakaznov

LiTH−ISY−EX−3515−2004

Linköping, 2004

VHDL implementation of CORDIC algorithmfor wireless LAN

Master thesis in Electronics Systems

at Linköping Institute of Technology

by

Anastasia Lashko,Oleg Zakaznov

LiTH−ISY−EX−3515−2004

Supervisor: Kent Palmkvist

Examiner: Kent Palmkvist

Linköping, 20th of February, 2004

Avdelning, InstitutionDivision, Department

Institutionen för systemteknik581 83 LINKÖPING

DatumDate2003−02−20

SpråkLanguage

RapporttypReport category

ISBN

Svenska/SwedishX Engelska/English

LicentiatavhandlingX Examensarbete

ISRN LITH−ISY−EX−3515−2004

C−uppsatsD−uppsats

Serietitel och serienummerTitle of series, numbering

ISSN

Övrig rapport____

URL för elektronisk versionhttp://www.ep.liu.se/exjobb/isy/2004/3515/

TitelTitle

VHDL Implementation of CORDIC Algorithm for Wireless LAN

Författare Author

Anastasia Lashko, Oleg Zakaznov

SammanfattningAbstract

This work is focused on the CORDIC algorithm for wireless LAN. The primary task is to create aVHDL description for CORDIC vector rotation algorithm.

The basic research has been carried out in MATLAB. The VHDL implementation of the CORDICalgorithm is based on the results obtained from the MATLAB simulation. Mentor Graphics FPGAAdvantage© for Xilinx 4010XL FPGA has been used for the hardware implementation.

NyckelordKeyword

FPGA, CORDIC, implementation, wireless LAN, intearliving, pipelining

Abstract

This work is focused on the CORDIC algorithm for wireless

LAN. The primary task is to create a VHDL description for

CORDIC vector rotation algorithm.

The basic research has been carried out in MATLAB. The

VHDL implementation of the CORDIC algorithm is based

on the results obtained from the MATLAB simulation.

Mentor Graphics FPGA Advantage© for Xilinx 4010XL

FPGA has been used for the hardware implementation.

− i −

− ii −

Acknowledgments

This thesis has been written in Electronics Systems division,

department of Electrical Engineering, Linköpings Universitet in the

contents of the international master program in SoCware − Integrated

Systems for Communication and Media.

We would like to give thanks to all ISY department staff,

especially, our supervisor Kent Palmkvist for his useful advices and

continuous attention to us.

We also thank all our neighbors, especially Deborah Capello for

her help during all the time we were writing our thesis.

Special thanks our friend Fedor Merkelov for giving many

advices in VHDL language and MATLAB.

− iii −

− iv −

Table of Contents

1 Introduction.........................................................................................1

1.1 About CORDIC..............................................................................1

1.2 Wireless LAN.................................................................................1

1.3 Background....................................................................................2

2 WLAN basics.......................................................................................3

2.1 Modulation.....................................................................................3

2.2 OFDM system model......................................................................4

2.3 Maximum likelihood synchronization.............................................6

2.3.1 Correlation properties of r(·)...................................................6

2.3.2 The likelihood function............................................................7

2.3.3 Simultaneous estimation..........................................................8

2.3.4 Synchronization.......................................................................9

2.4 Correcting sample frequency error................................................10

3 Algorithm...........................................................................................13

3.1 Modes of the CORDIC algorithm.................................................13

3.2 Arithmetics...................................................................................15

3.3 Another method to implement CORDIC algorithm.......................17

4 Solution..............................................................................................19

4.1 MATLAB program.......................................................................19

4.2 VHDL program............................................................................24

4.2.1 Interleaving...........................................................................25

Input control block .....................................................................27

Arithmetic blocks, CORDIC.......................................................28

Output control block...................................................................28

4.2.2 Pipelining..............................................................................29

Input control block......................................................................29

− v −

Arithmetic blocks.......................................................................30

Output control block...................................................................30

4.3 FPGA implementation..................................................................31

5 Conclusion..........................................................................................33

5.1 Future work..................................................................................33

References.............................................................................................35

Abbreviations........................................................................................37

Appendix A MATLAB program..........................................................39

Appendix B Wave diagrams................................................................41

− vi −

Index of figuresFigure 1. QAM constellation......................................................................................3

Figure 2. OFDM system model...................................................................................5

Figure 3. Structure of OFDM signal with cyclically extended frames.........................6

Figure 4. Block diagram of timing and frequency synchronization algorithms...........9

Figure 5. Signals that generate the estimates...............................................................9

Figure 6. Receiver structure for correcting sample frequency error...........................10

Figure 7. The rotation and vectoring modes of the CORDIC algorithm....................13

Figure 8. Block diagram of CORDIC algorithm.......................................................15

Figure 9. Calculations of vector angle.......................................................................20

Figure 10. Angle absolute values..............................................................................21

Figure 11. Accuracy calculation................................................................................22

Figure 12. Main block of VHDL program................................................................24

Figure 13. Structure diagram of interleaved VHDL block........................................25

Figure 14. Input control block..................................................................................27

Figure 15. CORDIC block........................................................................................28

Figure 16. Output control block................................................................................28

Figure 17. Structure diagram of pipelined VHDL architecture..................................29

Figure 18. Input control block..................................................................................29

Figure 19. Arithmetic blocks for pipelined version...................................................30

Figure 20. Output control block................................................................................30

Index of TablesTable 1. CORDIC accuracy and minimal number of steps........................................23

Table 2. Angle calculation........................................................................................26

Table 3. Signal structure...........................................................................................26

Table 4. FPGA utilization.........................................................................................31

− vii −

− viii −

1 Introduction 1

1 Introduction

1.1 About CORDICAll trigonometric functions can be computed using vector rotation. The

CORDIC (COordinate Rotation DIgital Computer) algorithm was developed by

Volder [6] in 1959. It rotates the vector, step−by−step, with a given angle.

Additional theoretical work has been done by Walther [7] in 1971. The main

principle of CORDIC are calculations based on shift−registers and adders

instead of multiplications, what saves much hardware resources.

CORDIC is used for polar to rectangular and rectangular to polar

conversions and also for calculation of trigonometric functions, vector

magnitude and in some transformations, like discrete Fourier transform (DFT)

or discrete cosine transform (DCT).

In particular case, the CORDIC algorithm is used in wireless LAN

(WLAN) by receivers.

1.2 Wireless LAN

Wireless LAN is specified with IEEE 802.11 standard. It was accepted in

1999 and led to organization of local networks development. WLAN is a

flexible data communications system implemented to extend or substitute for, a

wired LAN. Radio frequency (RF) technology is used by a wireless LAN to

transmit and receive data over the air, minimizing the need for wired

connections. A WLAN enables data connectivity and user mobility [8].

2 1 Introduction

1.3 Background

The main purpose of current work is to make the necessary research and

implement the CORDIC algorithm for wireless LAN. This algorithm calculates

the angle of a received vector in a signal constellation by means of the

arctangent function, and it is used in receivers to find frequency offset and phase

shift.

The work is targeted to research the number of bits of the input vector and

angle signals and specify the number of steps for a recursive CORDIC

algorithm. The research and intermediate results are made in the MATLAB 6.5

environment, and the final program has been implemented in the VHDL

language for FPGA.

2 WLAN basics 3

2 WLAN basics

This chapter describes the basic theoretical aspects, regarded to current

work, which show how the angle calculation is used by a receiver.

2.1 Modulation

Modulation is the process of transforming the carrier into waveforms

suitable for transmission across channel, according to parameters of modulated

signal [2].

The most useful type of modulation for OFDM (orthogonal frequency

division multiplexing) is quadrature amplitude modulation (QAM). It is shown

on Figure 1. QAM changes phase and amplitude of the carrier, and it is a

combination of amplitude shift keying (ASK) and phase shift keying (PSK).

Figure 1. QAM constellation

2

4

6

2 4 6−2−4−6

−2

−4

−6

QPSK

16−QAM

64−QAM

I

Q

4 2 WLAN basics

Equation 1 shows how amplitude and phase are combined in QAM, represented

in so−called IQ−form [1]:

s t =I k cos ωc t BQk sin ωc t =Ak cos ωc t+φk , (1)where

Ak= I k2+Qk

2 − amplitude,

φk=tanB1 Qk

I k

− phase.

In this work 64−QAM has been used.

2.2 OFDM system modelConsider the transmission of a signal through a wireless channel.

Assume that OFDM modulation is used. It could be amplitude modulation as

well as PSK. The signal in a channel is normally represented as a sequence of

complex numbers, xk, from the signal constellation, which is modulated with N

subcarriers by an inverse DFT. It is divided in frames. Each OFDM frame

consist of N+L samples (see Figure 2), where last L samples are copy of first L

samples (cyclic prefix). The transmission is performed over a discrete−time

channel [3].

2 WLAN basics 5

Figure 2. OFDM system model

Assume that transmitted signal is affected by a complex, additive white

Gaussian noise (AWGN), n(k). There are two uncertainties that may affect the

received signal: uncertainty in the arrival time, n(k) = δ(k − Θ), where Θ is the

integer−valued unknown arrival time of a frame, and the uncertainty in carrier

frequency due to a difference between transmitter and receiver frequency, which

is modeled as multiplicative distortion factor ej2πεk/N, where ε denotes the

difference in the transmitter and receiver oscillators relative to the inter−carrier

spacing. Hence, the received data is:

r(k) = (s(k − Θ) + n(k − Θ))·ej2πεk/N.

Data yk is extracted from r(k) by removing the cyclic prefix (first L

symbols) and applying DFT to N remaining symbols of a frame.

Since the symbol error highly depends on Θ and ε, these parameters are

considered in the next chapter.

Channel

IDF

T

DF

T

Ser

ial /

Par

alle

l

Par

alle

l / S

eria

lx0

x N − 1

x k

s0

s k

s N + L − 1

s(k) r(k)

r0

rk

r N + L − 1

y0

yk

yN − 1

. . .

. . .

. . .

. . .

6 2 WLAN basics

2.3 Maximum likelihood synchronization

This paragraph illustrates the main theoretical aspects regarded to the

synchronization in WLANs. More formally this material is shown in [3].

2.3.1 Correlation properties of r(·)

Consider that an observation interval of sample OFDM frame consists of

2N + L consecutive samples of r(k) contain one complete (N + L) OFDM frame

(see Figure 3).

Collect the observed samples in (2N + L) x 1 − vector

r «�[r(1) ... r(2N + L)]T. Define the index sets I « [Θ, Θ + L − 1] and

I’ « [Θ + Ν, Θ + Ν + L − 1]. The set I’ thus contains the indices of the data

samples that are copied into the cyclic prefix, and the set I contains the indices

of this prefix. Notice that the samples in the cyclic prefix are their replicas,

k ∈ I ∪ I’ are pairwise correlated, i.e., ∀k ∈ I:

Figure 3. Structure of OFDM signal with cyclically extended frames,s(k).

I I'

k2N + L

s(k)

1 Θ

Frame i

Observation interval

Frame i −1 Frame i +1

Θ + L Θ + N

2 WLAN basics 7

E r k r k+m =σ s

2+σn2 m=0

σ s2⋅eℑ2 ρε m=N

0 othervise

where σ s2«E s k 2 and σn

2«E n k 2 , while the remaining samples

k ∉ Ι ∪ Ι’ are mutually uncorrelated.

2.3.2 The likelihood function

The log−likelihood function for Θ and ε is the logarithm of the

probability density function of observing the 2N + L samples in r given the

arrival time Θ and the carrier frequency offset ε:

Λ Θ ,ε =log f r|Θ ,ε . (2)

The ML (maximum likelihood) estimate Θ̂ , ε̂ is the maximizing

argument of this function. If the condition (Θ, ε) is dropped for notational

clarity, (2) can be written as:Λ Θ ,ε =

= log ∏k∈I

f r k ,r k+N ∏k∉I ∪I’

f r k =

= log ∏k∈I

f r k ,r k+N

f r k f r k+N∏

k

f r k

(3)

where f(·) denotes the probability density function of its respective arguments.

The last factor in (3) is independent of Θ and may be dropped. Then, (3)

becomes:

Λ Θ ,ε = ∑k=Θ

Θ+LB1

2 ℜ e j2Πε r k r∗ k+N Bρ r k 2+ r k+N 2 , (4)

where ρ«σ s

2

σ s2+σn

2 is the magnitude of the correlation coefficient between

r(k) and r(k + N).

,

8 2 WLAN basics

2.3.3 Simultaneous estimation

The maximization of the log−likelihood function, can be performed in

two steps:

maxΘ ,ε

Λ Θ ,ε =maxΘ

maxε

Λ Θ ,ε =maxΘ

Λ Θ , ε̂ Θ

To find the maximum with respect to the frequency offset ε, we set the

derivate of the log−likelihood function (4) to zero and obtain

tan ε̂ Θ =B

ℑ ∑k=Θ

Θ+LB1

r k r∗ k+N

ℜ ∑k=Θ

Θ+LB1

r k r∗ k+N

.

By substitution of the ML−estimate ε̂ Θ the log−likelihood function

for Θ ,Λ Θ , ε̂ Θ can be calculated, and the maximum with respect to Θ can

be found. The simultaneous ML−estimation of Θ and ε becomes:

Θ̂=arg maxΘ

λ Θ , ε̂=B 1

2πγ Θ |Θ=Θ̂ ’ (5)

where

λ Θ =2 ∑k=Θ

Θ+LB1

r k r∗ k+N Bρ ∑k=Θ

Θ+LB1

r k 2+ r k+N 2 ,

γ Θ =r ∑k=Θ

Θ+LB1

r k r∗ k+N .

2 WLAN basics 9

2.3.4 Synchronization.

A synchronizing algorithm based on the estimators in (5) can now be

designed. Figure 4 shows the estimators in a block diagram. Figure 5 shows two

essential signals that are generated if a received signal consisting of several

OFDM frames is processed continuously by this estimator.

Figure 4. Block diagram of timing and frequency synchronization algorithms

ρ•2

ρ•2

zN

(•)∗

2•

r B 1

2π

⊕ ⊕

⊗

movingsum

movingsum

argmax

r(k)

−

+

λ(θ)

γ(θ) ε̂

Θ̂

Figure 5. Signals that generate the estimates

10 2 WLAN basics

The estimation methods can be improved if the parameters Θ and ε can

be considered constant over several OFDM frames. If the observation interval

contains M completed OFDM frames the log−likelihood function of Θ and ε

given this observation interval becomes

Λ Θ ,ε = 1

M∑m=0

M B1

Λm Θ ,ε ,

where Λm Θ ,ε is the log−likelihood function of Θ and ε given that

only frame m is observed. Thus, to obtain the log−likelihood function, the log−

likelihood functions of the individual frames must be averaged.

2.4 Correcting sample frequency errorThe receiver structure for correcting sample frequency error is shown on

Figure 6. The rotation caused by the sampling frequency offset can be corrected

after the DFT processing by derotating the subcarriers.

Figure 6. Receiver structure for correcting sample frequency error

The non−synchronized sampling receiver shows an additional block

named "rob/stuff", just after the ADC. This block is necessary, because the drift

in sampling instant will eventually be lager than the sampling period. When it

occurs, the "rob/stuff" block will either "rob" one sample from the signal or

"stuff" a duplicate sample, depending on the receiver clock, whether it faster or

slower than the transmitter clock. This process prevents the receiver sampling

decision H(f)

fixedXtal

ROTOR FFT ADCrob/stuff

DPLL

⊕r(t)

n(t)

s(t)

2 WLAN basics 11

instant from drifting so much that the symbol timing would be incorrect. The

"ROTOR" block performs the required phase corrections with the information

provided by the Digital Phase−Locked Loop (DPLL) that estimates the sampling

frequency error [2].

3 Algorithm 13

3 Algorithm

CORDIC is based on the common rotation equations:x’=x⋅cosφBy⋅sinφ=cosφ⋅ xBy⋅tanφy’=y⋅cosφ+x⋅sinφ=cosφ⋅ y+x⋅tanφ

(6)

If tan φ = ±2−i, multiplication in (6) can be performed by a simple shift

operation. This allows the vector to be rotated by desired angle in a sequence of

smaller rotations by angle φι = ± tan−1 (2−i):

xi+1 = ki [xi − yi · di · 2−i]

yi+1 = ki [yi + xi · di · 2−i],

where k i=cos tanB1 2Bi =1 ⁄ 1+2B2i , d i=±1 .

3.1 Modes of the CORDIC algorithm

CORDIC rotator works in two modes: rotation and vectoring (Figure 7)

[5].

Figure 7. The rotation (X’, Y’) and vectoring(X, Y) modes of the CORDIC algorithm

Y’

Y

R

X’ X

θα

14 3 Algorithm

In the first mode it rotates the input vector by specified angle. The angle

accumulator is initialized with the desired rotation angle. For this mode,

CORDIC equations are:

xi+1 = xi − yi * di* 2−i

yi+1 = yi + xi * di* 2−i

zi+1 = zi − di * tan−1( 2−i),

where d i=B1, z i<0

1, otherwise

This gives the following result:

xn = An· [x0 cosz0 − y0 sinz0]

yn = An· [y0 cosz0 − x0 sinz0]

zn = 0,

where An = ∏n

1+2B2i

.

In the vectoring mode input vector is being rotated to the x axis while

recording the angle required to make that rotation. The result of the vectoring

operation is a rotation angle and the scaled magnitude of original vector. The

CORDIC equations in this mode are:

xi+1 = xi − yi * di* 2−i

yi+1 = yi + xi * di* 2−i

zi+1 = zi − di * tan−1( 2−i),

where d i=B1, y i<0

1, otherwise(7)

Then:

xn = An20

20 yx +

yn = 0

3 Algorithm 15

zn = z0 + tan−1 (0

0x

y)

An = ∏ −+n

i221

The CORDIC algorithm in every mode is limited between 2

π− and 2π .

This limitation is caused by the first rotation angle φ0 = tan−1(20).

3.2 ArithmeticsFigure 8 represents the block diagram of conventional CORDIC

algorithm [10], based on ripple carry adders or subtractors.

Figure 8. Block diagram of CORDIC algorithm

A/S

±

A/S

±

A/S

±

A/S

±

A/S

±

A/S

±

X0

Y0

Xn

Yn

σ̄0

σ̄1

σ̄nB1

σ0

σ1

σnB1

X1 Y1X

12B1

Y1 2B1

XnB1

2n+1YnB1

2n+1XnB1

YnB1

16 3 Algorithm

An adder/subtractor (A/S), depending on a selection input, performs an

addition or a subtraction. This input indicates whether an operand is negative.

The basic cell of A/S is decomposed by two functions with 4 bits input each.

One of them is for calculating the output and another to transmit the carry.

According to this an N−bit A/S can fit in (2N+1)/2 CLBs (configurable logic

block). The additional half CLB is required for introducing the least significant

bit (LSB) one in case of the substraction. The critical path here is indicated by

the ripple carry propagation and the routing delay of the A/S wire. This net has a

fan−out of 2N in this case. It decreases the performance of the circuit and it is

the main disadvantage of conventional CORDIC implementations. As the

solution to this, redundant arithmetic could be used to increase the speed of the

CORDIC. Implementation avoids the carry propagation from the LSB to the

most significant bit (MSB), due to its carry−free property.

Redundant arithmetic is good to accelerate those operations, which have

a long propagation delay. On the other hand redundant arithmetic also has some

disadvantages. For example, it is impossible to detect the sign of a redundant

number without checking all the digits which expects a propagation from the

MSB to the LSB. Another problem, that the redundant arithmetic uses digit set

{−1,0,1}, and needs more hardware resources to execute simple tasks, than the

conventional one which uses digit set {−1,1}.

According to results of the research, the redundant arithmetic is more

accurate, but it needs much more hardware than the conventional arithmetic and

for this reason conventional arithmetic has been used in this work.

3 Algorithm 17

3.3 Another method to implement CORDIC algorithm

This part demonstrates some other implementations of the CORDIC

algorithm.

The equations bellow demonstrate how the rotation could be performed

by finite number of so−called micro−rotations

x i+1=x iBµi⋅y i 2Bi

y i+1=y i+µi⋅x i 2Bi

Θi+1=ΘiBµi⋅atan 2Bi

where µi=±1

After (n + 1) iterations the basic equations of CORDIC system become:

x n=K1 x 0 cos Θ +y0 sin Θ

yn=K1 y0 cos Θ Bx 0 sin Θ

There are two methods to implement CORDIC: merged and truncated

algorithms [10]. The main idea of the merged algorithm is to combine two

iterations into one, which results in a reduction of the number of iteration.

The truncated algorithm based on the first iterations of the conventional

algorithm. When iterations increases, the micro−rotation of angle Θi decreases

progressively, the last iteration could be replaced by a full rotation around a

reduced angle.

4 Solution 19

4 SolutionThe work has been started with the research in MATLAB. There are two

versions of VHDL program, based on MATLAB research.

4.1 MATLAB programIt is good approach to start to research the CORDIC algorithm in the

MATLAB environment.

MATLAB has been chosen for this purpose because:

: It provides very good arithmetic methods, functions, and graphical interface,

that will decrease the time of programming.

: The MATLAB language allows to create both small draft applications and

big complicated programs.

: Models are easy to debug.

It works in many different operating systems as Windows, UNIX

(Solaris) and Mac OS.

The main use of the MATLAB program is to debug the CORDIC

algorithm and to understand how accurate the angle value should be.

20 4 Solution

Consider the signal constellation shown on Figure 9. The coordinates of

the received vector are normally not exactly in the center of a constellation

point. This is caused by channel noise, which is assumed to be complex additive

white Gaussian noise (AWGN) [2].

The accuracy of vector value in the receiver depends on its number of

bits. The main task is to make the calculated angle value point the vector to the

same constellation point, not to nearest one. This is the cause that determines

the required accuracy of the angle calculation.

The program in MATLAB calculates the first 20 steps of the CORDIC

algorithm and displays how accurate each step is.

It takes the closest vectors in the signal constellation as input. The

distance between them, ∆x, is defined by the value of the least significant bit.

Figure 9. Calculations of vector angle.

Basic vector

Error vector

QAM−64 signalconstellation

Φ

Φ+∆Φ

∆Φ

∆x

I

Q

4 Solution 21

For example, if it equals to 2−2, then the distance between these two

vectors is 0.25, and the accuracy, ∆Φ, should be less than 0.02.

The program looks onto the CORDIC arctangent values on each step,

and interpolates them to an exponent using a built−in MATLAB function. Then,

depending on this interpolation and inaccuracy values, it finds the minimal

number of steps necessary to calculate the angle.

The text of program is given in Appendix A.

As a result, program plots two graphics: the angle values at each step,

and the accuracy of those values, as is shown on Figures 10 and 11.

Figure 10. Angle absolute values

22 4 Solution

Figure 11 shows CORDIC algorithm calculated in 2 ways:

� when factor d in (7) belongs to conventional arithmetic, i.e., d ∈ (−1; 1);

� when d belongs to set (−1; 0; 1) from the redundant arithmetic.

As it can be seen from this Figure, the redundant arithmetic does not

have much gain against conventional arithmetic, but it takes more hardware

resources to decide whether to perform rotation on each step or not. Hence the

redundant arithmetic is not a good approach, and for this reason the

conventional one is selected to write the program.

This program has been used to research accuracy and minimal number

of CORDIC steps. From geometry point of view, input vectors must be

maximally remote from the center of signal constellation to obtain maximum

Figure 11. Accuracy calculation

4 Solution 23

accuracy value. This gives minimal value of ∆Φ (see Figure 9). The result is

shown in Table 1.

Table 1. CORDIC accuracy and minimal number of steps

Vectorwidth*, (bit) Range* Vector accuracy Angle accuracy, (rad) Nmin

4 [−8; 7] 20 = 1 0.07677189 ≥ 2−4 5

5 [−8; 7.5] 2−1 = 0,5 0.03934975 ≥ 2−5 6

6 [−8; 7.75] 2−2 = 0,25 0.01985645 ≥ 2−6 7

7 [−8; 7.875] 2−3 = 0.125 0.00996644 ≥ 2−7 8

8 [−8; 7.9375] 2−4 = 0.0625 0.00499195 ≥ 2−8 9

9 [−8; 7.96875] 2−5 = 0.03125 0.00249801 ≥ 2−9 10

10 [−8; 7.984375] 2−6 = 0.015625 0.00124951 ≥ 2−10 11

11 [−8; 7.9921875] 2−7 = 0.0078125 0.00062488 ≥ 2−11 12

12 [−8; 7.99609375] 2−8 = 0.00390625 0.00031247 ≥ 2−12 13*effective width, same for real and imaginary parts

According to a given specification the input vector has 12 bit length and

therefore its accuracy is 2−8. Hence, the angle accuracy is 2−12 and the number of

steps required for calculations is 13. Due to the probability theory, the received

vector is normally close to the center of constellation point and not close to a

boundary between points. This means that so high accuracy is not actually

needed for calculations and the number of steps can be reduced.

From the theoretical point of view, IEEE 802.11a standard has data bit

rate of 20MSamples/s and FPGA has the clock frequency of 100 MHz that gives

the sample periods of 5 clock cycles. Then, the effective number of steps must

be 5·k, where k has a natural value.

From research in MATLAB it has been found that 5 CORDIC steps are

not enough to obtain accurate results, but 10 steps give higher accuracy. Even

higher accuracy could be given by 15 CORDIC steps, but it is actually not

proved. Hence, the number of steps required to get the necessary angle accuracy

should be equal to 10. This allows for an input vector of 9 bit for both real and

24 4 Solution

imaginary parts and output 12 bits (3 bit per integer part and 9 bits per fractional

part).

There is also a tradeoff to specify bit width for the internal vector signals

because the vectoring mode CORDIC angle rotation uses the sign of the

imaginary part to specify the direction of the rotation, and it must be correct in

all steps. Since the imaginary part gets closer and closer to ’0’, it must have

enough round bits. The practical tests in the VHDL testbench show that 15 to 16

bits are enough.

4.2 VHDL program

There are two realizations of the VHDL program in Mentor Graphics

FPGA Advantage© tool for Xilinx 4010XL FPGA. Since the found number of

steps in MATLAB is equal to 10, one VHDL program uses interleaving of

processing blocks and other uses pipelining.

The main block is the same for both programs, and it is shown on Figure

12. It inputs a complex vector. The signal start indicates that values on input are

ready, and the program should output angle value.

This block consists of two control units and two arithmetic blocks.

Figure 12. Main block of VHDL program.

clk

rst

start

in_real

in_img

angleCORDICmain

4 Solution 25

Consider each program separately.

4.2.1 Interleaving

The structure diagram of interleaved program is given on Figure 13.

This block is responsible for vector transformation into the internal

format, because the CORDIC algorithm rotates the vector in the right half of the

complex plane, but in wireless LAN the vector directs to any side of the signal

constellation.

Figure 13. Structure diagram of interleaved VHDL block.

start1

start2

start

clk

rst

angle1

angle2

angle

input_control

output_control

CORDIC

CORDIC

c0_imgc0_real

start1

start2angle1

angle2

in_real

in_img

GLOBAL:

q

start1

start2

q

q

c0_img

c0_img

c0_real

c0_real

26 4 Solution

Hence the input vector is transformed according to this requirement.

When the input vector has negative real part, it is inverted and the initial angle

value is plus or minus π (see Table 2). In the other case the input vector is not

changed and the initial angle value is 0.

The quarter signal indicates the quarter of the original vector for the

internal blocks of the program.

Table 2. Angle calculation

Quarter signal, Qi Angle range Initial angle value

"00", "10" [−π/2; π/2] Angle0 = 0

"01" [π/2; π] Angle0 = π

"11" [−π; −π/2] Angle0 = −π

The input control block reads input signals and outputs two−sample

period start signals and internal vector values to each CORDIC block. Then,

each block performs the CORDIC vector rotations and outputs the angle to the

second control unit which actually is a multiplexer.

All signals are presented with fixed point format. Their structure is

shown in table 3:

Table 3. Signal structure

Signal name Type Interpretation Range

in_real, in_img std_logic_vector (11 downto 0) S###.######## [−8; 7.99609]

x, y (internalsignals)

std_logic_vector (15 downto 0) S#####.##...## [−32; 31.9990234]

angle1, angle2 std_logic_vector (11 downto 0) S##.######### [−4; 3.998046]

Signals x and y are used inside the arithmetic blocks. They have enough

guard bits to avoid overflows and round bits to keep the necessary accuracy of

vector rotations.

Consider each block.

4 Solution 27

Input control block

Figure 14 displays the structure of input control block. The time diagram

for the program is shown in Appendix B. On a high level of the start signal, it

looks to its internal flag, inverse it and outputs start1 if it is low and start2

otherwise. These control signals indicate that 1st or 2nd arithmetic block,

correspondently, must start its calculation. Also it transforms the input vector

values into the internal 16−bit representation. If the real part is negative, both

parts are taken with negative sign (c0_real, c0_img) = (−in_real, −in_img).

Signal Q (quarter) consists of two bits which are the sign of in_img and in_real

signals.

Figure 14. Input control block

input_control

clkrst

start

start1start2

in_real

in_img

c0_realco_img

q

28 4 Solution

Arithmetic blocks, CORDIC

On a high level of start signal each CORDIC block resets its internal

counter (cnt) signal value to 1 and calculates the 1st CORDIC step. Calculated

values are stored in signals angle, x and y for angle, real and imaginary parts,

respectively. Then the block begins to perform the CORDIC rotations while cnt

is less than N_steps which is specified as a generic constant. After that, when

cnt is equal to N_steps, the angle signal is ready, and the block stops doing

CORDIC rotations and waits for the next start signal or, if next start signals

arrives instead, it correctly outputs angle value and starts its next loop.

Output control block

This is the smallest block. It plays a multiplexer role. It outputs angle1

signal when start1 comes or angle2 on start2 signal.

Figure 15 CORDIC block

CORDIC

clk

rst

startangle

in_real

in_imgq

Figure 16. Output control block

output_control

clk

rst

start1 angle

start2

angle1

angle2

4 Solution 29

4.2.2 Pipelining

This program is similar to program described above, but it divides the

CORDIC block into two parts, CORDIC1 and CORDIC2, with 5−clock cycle

per sample period each. The structural diagram is shown on Figure 17.

Input control block

This block is similar to input control block for interleaving version. It

transforms in_real and in_img signals into the internal format and outputs the q

signal in the same way. It also outputs pipelined start1 signal for all blocks

which indicates that a new sample period has begun.

Figure 17. Structure diagram of pipelined VHDL architecture

in_real

in_img

start

c0_real

c0_img

q : (1:0)

start1start1

start1

c1_real

c1_img

angle1

angle_in angle

output_control

CORDIC2CORDIC1input_control

clk

rst

GLOBAL:

Figure 18. Input control block.

in_real

in_imgstart

input_control

clkrst

c0_real

c0_img

q : (1:0)

start1

30 4 Solution

Arithmetic blocks

These blocks are shown on Figure 19. They have the same structure, but

the 1st block performs the first 5 CORDIC steps, and the 2nd does the last 5

steps. As in the interleaved version, each block uses its internal counter signal to

make rotations. The main difference between these two blocks is that the second

block does not need the angle quarter signal (q), and it does not output vector

value either.

Output control block

This block has also been slightly changed. For the pipelined program it

acts as the flip−flop for angle_in signal and outputs it when it is ready, i.e., on a

high level of pipelined signal start1.

Figure 19. Arithmetic blocks for pipelined version.

clkrst

clkrst

c0_real

c0_img

start

q : (1:0) start

c1_real

c1_img

angle1

CORDIC2CORDIC1

c1_real

c1_img

angle1

angle_in

Figure 20. Output control block

start1angle_in angleoutput_

controlclkrst

4 Solution 31

4.3 FPGA implementation

Both programs have been implemented in Leonardo tool for FPGA

4010xLPC84. Table 4 demonstrates that they have used about 50% of available

resources.

Table 4. FPGA utilization

Resource Interleaving Pipelining

Used Available Utilization Used Available Utilization

IOs 39 61 63.93% 39 61 63.93%

FG FunctionGenerators

447 800 55.88% 407 800 50.50%

H FunctionGenerators

198 400 49.50% 143 400 35.75%

CLB flip−flops 140 800 17.50% 136 800 17.12%

Theoretically interleaving of two blocks requires almost twice more area

than pipelining. This is proved by Table 4 despite of optimization of all blocks.

Hence, the pipelining is more preferable than interleaving, but if the arithmetic

block is complex and cannot be split into smaller blocks, the interleaving is the

only way to solve the task with all timing constraints.

5 Conclusion 33

5 Conclusion

In this work regarded to CORDIC implementation in VHDL there has

been done a specific research on angle calculation in wireless LAN receiver

block. As a result of this research there has been found vector and angle widths

necessary and sufficient for an implementation and the number of CORDIC

steps for calculating an arctangent function.

Then two programs have been written on VHDL language: one for an

interleaving and the other for a pipelining realizations, and there is found that a

pipelining implementation is better than an interleaving because it saves

hardware. Both programs have been implemented for FPGA 4010xLPC84 for

further implementation.

5.1 Future workDuring this work the program was implemented for the existing version

of FPGA with clock frequency of 12MHz that does not give the required sample

rate of 20 MSamples/sec. Therefore this program must be implemented to a new

version of FPGA with 100 MHz clock cycle to obtain the necessary sample rate.

References 35

References

[1] Nee R., Prasad R.: "OFDM for Wireless Multimedia Communications",

Boston: Artech House, 2000.

[2] Terry J., Heiskala J.: "OFDM Wireless LANs: A Theoretical and Practical

Guide", Indianapolis, Ind.: Sams, 2002.

[3] Sandell M., van de Beek J.−J., Börjesson P.: "Timing and Frequency

Synchronization in OFDM Systems Using the Cyclic Prefix", pp 16−19, Essen,

Germany, Desember 1995.

[4] Andraka R.: "A Survey of CORDIC Algorithms for FPGA Based

Computers", 1998

[5] Valls, J.; Kuhlmann, M.; Parhi, K.K.: "Efficient Mapping of CORDIC

Algorithms on FPGA", Signal Processing Systems, 2000. IEEE Workshop, pp

336−343 , 11−13 Oct. 2000.

[6] Volder J.E.: "The CORDIC Trigonometric Computing Technique", IRE

Trans. Electronic Computers, vol. EC−8, pp 330−334, 1959.

[7] Walter J.S. "A Unified Algorithm for Elementary Functions", Proc. Spring.

Joint Comput. Conference, vol. 38, pp 379−385, Jul. 1992.

[8] http://www.wirelesslan.tk/

[9] Hemkumar N.D.: "A Systolic VLSI Architecture for Complex SVD",

Houston, Texas, 1991.

[10] Kharrat, M.W.; Loulou, M.; Masmoudi, N.; Kamoun, L.: "A new method

to implement CORDIC algorithm", Electronics, Circuits and Systems, 2001.

ICECS 2001. The 8th IEEE International Conference on , vol 2, pp 715 − 718

Sept. 2001

[11] Armstrong, J.R., Gray F.G.: "VHDL Design Representation and Synthesis"

2nd edition, Upper Saddle River, N.J.: Prentice Hall PTR, 2000

Abbreviations 37

AbbreviationsADC − analog−digital convertor

A/S − adder/subtractor

ASK − amplitude shift keying

CORDIC − COordinate Rotation DIgital Computer

CLB − configurable logic block

DFT − discrete Fourier transform

DPLL − digital phase−locked loop

LSB − least significant bit

MSB − most significant bit

OFDM − orthogonal frequency−division multiplexing

QAM − quadrature amplitude modulation

PSK − phase shift keying

RF − radio frequency

VHDL − Very high speed integrated circuit Hardware Description Language

WLAN − Wireless Local Area Network

Appendix A MATLAB program 39

Appendix A MATLAB program% Calculation of atan(y(1)/x(1))

clear;delta = 2^(−8);

% values on input:% given vector: error vector:x(1) = 6+delta; x1 = 6;y(1) = 8−delta; y1 = y(1);

% initial valuesz(1) = 0;n = 40;N_st = 0;

% Control value:z(n+2) = atan(y(1)/x(1));

% calculate necessary vector accuracy (to differ two given vectors):j = 0:1:n;a1(j+1) = abs(z(n+2)−atan(y1/x1));a(1) = z(n+2);xx(1) = x(1); yy(1) = y(1); zz(1) = 0; % values for the table for printing

% in MatLab command window% Calculate CORDIC values, inaccuracy and minimal number of steps:for i = 1:n if y(i) < 0 % conventional arithmetic d(i) = 1; else d(i) = −1; end if abs(yy(i)) < abs(xx(i)*2^(−i+1)) % redundant arithmetic d1(i) = 0; else if (yy(i) − xx(i)*2^(−i)) < 0 d1(i) = 1; else d1(i) = −1; end end x(i+1) = x(i) − y(i)*d(i)*2^(−i+1); y(i+1) = y(i) + x(i)*d(i)*2^(−i+1); z(i+1) = z(i) − d(i)*atan(2^(−i+1)); xx(i+1) = xx(i) − yy(i)*d1(i)*2^(−i+1); yy(i+1) = yy(i) + xx(i)*d1(i)*2^(−i+1); zz(i+1) = zz(i) − d1(i)*atan(2^(−i+1)); % calculate the inaccuracy of i−th step z(i): a(i+1) = abs(z(i+1) − z(n+2)); aa(i+1) = abs(zz(i+1)− z(n+2));end

% interpolated inaccuracy values:logp1 = polyfit(j,log10(a),1); % linear interpolation of log10(inaccuracy)logpred1 = 10.^polyval(logp1,j);logpred2 = logpred1 * expm(max(log(a)−log(logpred1))); % upper bound of interpolation% Number of steps, correctedN_st = 0;for i = j if (a1(i+1) > logpred2(i+1)) & (N_st == 0) N_st = num2str(i); endend

% vector values for plotting:a2(j+1) = z(j+1); % calculated CORDIC valuesa21(j+1)= zz(j+1);a3(j+1) = z(n+2); % angle value of given vectora4(j+1) = z(n+2)+a1(j+1); % upper bound of CORDIC inaccuracya5(j+1) = z(n+2)−a1(j+1); % lower bound −"−

% plot the inaccuracy in the logarythmic scalefigure(1);semilogy (j, a, ’b−o’, j, aa, ’g−x’, j, a1, ’r−−’, j, logpred1, ’k−.’, j, logpred2, ’r:’);title(’Inaccuracy graph’); grid on;xlabel([’Number of steps (’ N_st ’)’]); ylabel(’\Delta\phi’);

40 Appendix A MATLAB program

% plot the CORDIC angle valuesfigure(2); plot(j, a2, ’b−’, j, a21, ’g−’, j, a3, ’g−.’, j, a4, ’r:’, j, a5, ’r:’);title(’Arctangent calculations’); grid on;xlabel([’Number of steps (’ N_st ’)’]); ylabel(’\phi’);

Appendix B Wave diagrams 41

Appendix B Wave diagrams

På svenska

Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare –under en längre tid från publiceringsdatum under förutsättning att inga extra−ordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner,skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat förickekommersiell forskning och för undervisning. Överföring av upphovsrättenvid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning avdokumentet kräver upphovsmannens medgivande. För att garantera äktheten,säkerheten och tillgängligheten finns det lösningar av teknisk och administrativart.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman iden omfattning som god sed kräver vid användning av dokumentet på ovanbeskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådanform eller i sådant sammanhang som är kränkande för upphovsmannens litteräraeller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press seförlagets hemsida http://www.ep.liu.se/

In English

The publishers will keep this document online on the Internet − or its possiblereplacement − for a considerable time from the date of publication barringexceptional circumstances.

The online availability of the document implies a permanent permission foranyone to read, to download, to print out single copies for your own use and touse it unchanged for any non−commercial research and educational purpose.Subsequent transfers of copyright cannot revoke this permission. All other usesof the document are conditional on the consent of the copyright owner. Thepublisher has taken technical and administrative measures to assure authenticity,security and accessibility.

According to intellectual property law the author has the right to bementioned when his/her work is accessed as described above and to be protectedagainst infringement.

For additional information about the Linköping University Electronic Pressand its procedures for publication and for assurance of document integrity,please refer to its WWW home page: http://www.ep.liu.se/

© Anastasia Lashko, Oleg Zakaznov

vhdl implementation of cordic algorithm for wireless …19532/fulltext01.pdf · vhdl implementation...

Documents