cordic algorithm

ABSTRACT

CHAPTER-1

CORDIC THEORY: AN ALGORITHM FOR VECTOR ROTATION

All of the trigonometric functions can be computed or derived from functions using

vector rotations, as will be discussed in the following sections. Vector rotation can also

be used for polar to rectangular and rectangular to polar conversions, for vector

magnitude, and as a building block in certain transforms such as the DFT and DCT. The

CORDIC algorithm provides an iterative method of performing vector rotations by

arbitrary angles using only shifts and adds. The algorithm, credited to Volder, is derived

from the general (Givens) rotation transform

x’ xcosf-ysinf y’ ycosf xsinf

which rotates a vector in a Cartesian plane by the angle f. These can be rearranged so that:

x’ cosf x- ytanf

y’ cosfy xtanf

So far, nothing is simplified. However, if the rotation angles are restricted so that tan(f-

2-i, the multiplication by the tangent term is reduced to simple shift operation. Arbitrary

angles of rotation are obtainable by performing a series of successively smaller elementary

rotations. If the decision at each iteration, i, is which direction to rotate rather than

whether or not to rotate, then the cos(di) term becomes a constant (because cos(di) =

cos(-di)). The iterative rotation can now be expressed as:

X i+1= Ki xi- yid 2 –i]

Y i+1= Ki yixid 2 –i]

WhereKi costan -12 -i 1 1 2di + or -1

Removing the scale constant from the iterative equations yields a shift-add algorithm for

vector rotation. The product of the Ki’s can be applied elsewhere in the system or treated as

part of a system processing gain. That product approaches 0.6073 as the number of iterations

goes to infinity. Therefore, the rotation algorithm has a gain, An of approximately 1.647. The

exact gain depends on the number of iterations, and obeys the relation

An = n

The angle of a composite rotation is uniquely defined by the sequence of the directions of the

elementary rotations. That sequence can be represented by a decision vector. The set of all

possible decision vectors is an angular measurement system based on binary arctangents.

Conversions between this angular system and any other can be accomplished using a look-

up. A better conversion method uses an additional adder-subtractor that accumulates the

elementary rotation angles at each iteration. The elementary angles can be expressed in any

convenient angular unit. Those angular values are supplied by a small lookup table (one entry

per iteration) or are hardwired, depending on the implementation.

arctangent base, this extra element is not needed.

The CORDIC rotator is normally operated in one of two modes. The first, called rotation by

Volder, rotates the input vector by a specified angle (given as an argument). The second

mode, called vectoring, rotates the input vector to the x axis while recording the angle required to

make that rotation. In rotation mode, the angle accumulator is initialized with the desired

rotation angle. The rotation decision at each iteration is made to diminish the magnitude of

the residual angle in the angle accumulator. The decision at each iteration is therefore

based on the sign of the residual angle after each step. Naturally, if the input angle is

already expressed in the binary arctangent base, the angle accumulator may be

eliminated.

xi1 xiyidi 2-i

yyixid 2-i

zi1 zidi tan-i 2-i di= -1 if zi < 0, +1 otherwise

which provides the following result:

x A x cosz y sinz n n 0 0 0 0

yA y cosz x sinz n n 0 0 0 0

z n=0

In the vectoring mode, the CORDIC rotator rotates the input vector through whatever angle is necessary to

align the result vector with the x axis. The result of the vectoring operation is a rotation angle and the scaled

magnitude of the original vector (the x component of the result). The vectoring function works by seeking

to minimize the y component of the residual vector at each rotation. The sign of the residual y component is

used to determine which direction to rotate next. If the angle accumulator is initialized with zero, it will

contain the traversed angle at the end of the iterations. In vectoring mode, the CORDIC equations are:

x1 xiyidi 2-i

y i+1yixid 2-i

zi1 zidi tan-i 2-i

The CORDIC rotation and vectoring algorithms as stated are limited to rotation angles between -/2 and /2.

This limitation is due to the use of 20 for the tangent in the first iteration. For composite rotation angles larger than

/2, an additional rotation is required. Volder[4] describes an initial rotation /2. This gives the correction

iteration:

x’dy

y’dx

z’zd2

where d = +1 if y<0, -1 otherwise.

There is no growth for this initial rotation. Alternatively, an initial rotation of either or 0 can be made, avoiding

the reassignment of the x and y components to the rotator elements. Again, there is no growth due to the

initial rotation:

x’dx

y’dy

z’ = z if d= 1, or z - if d= -1 d = -1 if x<0, +1 otherwise.

Both reduction forms assume a modulo 2 representation of the input angle. The style of first reduction is

more consistent with the succeeding rotations, while the second reduction may be more convenient when

wiring is restricted, as is often the case with FPGAs.

The CORDIC rotator described is usable to compute several trigonometric functions directly and

others indirectly. Judicious choice of initial values and modes permits direct computation of sine, cosine,

arctangent, vector magnitude and transformations between polar and Cartesian coordinates.

Sine and Cosine

The rotational mode CORDIC operation can simultaneously compute the sine and cosine of the

input angle. Setting the y component of the input vector to zero reduces the rotation mode result to

xn Anx0cosz 0

yn Anx0sinz 0

By setting x0 equal to 1/ An, the rotation produces the unscaled sine and cosine of the angle

rgument, z0. Very often, the sine and cosine values modulate a magnitude value. Using

other techniques (e.g., a look up table) requires a pair of multipliers to obtain the

modulation. The CORDIC technique performs the multiply as part of the rotation operation,

and therefore eliminates the need for a pair of explicit multipliers. The output of the

CORDIC rotator is scaled by the rotator gain. If the gain is not acceptable, a single

multiply by the reciprocal of the gain constant placed before the CORDIC rotator will yield

unscaled results. It is worth noting that the hardware complexity of the CORDIC

rotator is approximately equivalent to that of a single multiplier with the same word

size.

Polar to Rectangular Transformation

A logical extension to the sine and cosine computer is a polar to Cartesian coordinate

transformer. The transformation from polar to Cartesian space is defined by:

x=rcos

y = rsin

As pointed out above, the multiplication by the magnitude comes for free using the

CORDIC rotator. The transformation is accomplished by selecting the rotation mode with

x0=polar magnitude, z0=polar phase, and y0=0. The vector result represents the polar input

transformed to Cartesian space. The transform has a gain equal to the rotator gain, which

needs to be accounted for somewhere in the system. If the gain is unacceptable, the polar

magnitude may be multiplied by the reciprocal of the rotator gain before it is presented to the

CORDIC rotator

General vector rotation

The rotation mode CORDIC rotator is also useful for performing general vector

rotations, as are often encountered in motion correction and control systems. For general

rotation, the 2 dimensional input vector is presented to the rotator inputs. The rotator rotates the

vector through the desired angle. The output is scaled by the CORDIC rotator gain, which

must be accounted for elsewhere in the system. If the scaling is unacceptable, a pair of

constant multipliers is required to compensate for the gain. CORDIC rotators may be

cascaded in a tree architecture for general rotation in n-dimensions. Some optimization of

multidimensional rotation is possible to permit computational savings over the general

n-dimensioned case, as reported by Hsiao et al.

ArctangentThe arctangent, =Atan(y/x), is directly computed using the vectoring mode

CORDIC rotator if the angle accumulator is initialized with zero. The argument must

be provided as a ratio expressed as a vector (x, y). Presenting represent infinity (by setting x=0).

Since the arctangent result is taken from the angle accumulator, the CORDIC rotator growth does not affect

the result.

Vector MagnitudeThe vectoring mode CORDIC rotator produces the magnitude of the input vector as a

byproduct of computing the arctangent. After the vectoring mode rotation, the vector is

aligned with the x axis. The magnitude of the rotated vector. The magnitude result is scaled

by the processor gain, which needs to be accounted for elsewhere in the system. This

implementation of vector magnitude has a hardware complexity of roughly one

multiplier of the same width. The CORDIC implementation represents a significant

hardware savings over an equivalent Pythagorean processor. The accuracy of the

magnitude result improves by 2 bits for each iteration performed.

Cartesian to Polar transformation

The Cartesian to Polar transformation consists of finding the magnitude (r=sqrt(x2+y2)) and phase angle

(=atan[y/x]) of the input vector, (x, y). The reader will immediately recognize that both functions are provided

simultaneously by the vectoring mode CORDIC rotator. The magnitude of the result will be scaled by the

CORDIC rotator gain, and should be accounted for elsewhere in the system. If the

gain is unacceptable, it can be corrected by multiplying the resulting magnitude by the reciprocal of the gain

constant.

Inverse CORDIC functions

In most cases, if a function can be generated by a CORDIC style computer, its inverse can also

be computed. Unless the inverse is calculable by changing the mode of the rotator, its

computation normally involves comparing the output to a target value. The CORDIC inverse is

illustrated by the Arcsine function.

Arcsine and Arccosine

The Arcsine can be computed by starting with a unit vector on the positive x axis, then

rotating it so that its y component is equal to the input argument. The arcsine is

then the angle subtended to cause the y component of the rotated vector to match the

argument. The decisionfunction in this case is the result of a comparison between

the input value and the y component of the rotated vector at each iteration:

x1 xiyidi 2-i

y i+1yixid 2-i

zi1 zidi tan-i 2-i

The arcsine function as stated above returns correct angles for inputs -1 < c/Anx0 < 1, although the accuracy suffers

as the input approaches 1 (the error increases rapidly for inputs larger than about 0.98). This loss of accuracy

is due to the gain of the rotator. For angles near the y axis, the rotator gain causes the rotated vector to be

shorter than the reference (input), so the decisions are made improperly.

The gain problems can be corrected using a “double iteration algorithm”[9] at the cost of an increase

in complexity.The Arccosine computation is similar, except the difference between the x component and the

input is used as the decision function. Without modification, the arccosine algorithm works only for inputs

less than 1/An, making the double iteration algorithm a necessity. The Arccosine could also be computed by

using the arcsine function and subtracting /2 from the result, followed by an angular reduction if the result is in

the fourth quadrant.

IMPLEMENTATION IN AN FPGAThere are a number of ways to implement a CORDIC processor. The ideal architecture

depends on the speed versus area tradeoffs in the intended application. First we will

examine an iterative architecture that is a direct translation from the CORDIC

equations. From there, we will look at a minimum hardware solution and a maximum

performance solution.

Iterative CORDIC Processors

An iterative CORDIC architecture can be obtained simply by duplicating each of the three

difference equations in hardware as shown in Figure 1. The decision function, di, is driven by

the sign of the y or z register depending on whether it is operated in rotation or vectoring

mode. In operation, the initial values are loaded via multiplexers into the x, y and z registers.

Then on each of the next n clock cycles, the values from the registers are passed through the

shifters and adder-subtractors and the results placed back inthe registers. The shifters are modified

on each iteration to cause the desired shift for the iteration. Likewise, the ROM address is

incremented on each iteration so that theappropriate elementary angle value is presented

to the z adder-subtractor. On the last iteration, the results are read directly from the adder-

subtractors. Obviously, a simple state machine is required keep track of the current iteration,

and to select the degree of shift and ROM address for each iteration.The design depicted in

Figure 1 uses word-wide data paths (called bit-parallel design). The bit-parallel variable shift

shifters do not map well to FPGA architectures because of the high fan-in required. If

implemented, those shifters will typically require several layers of logic (i.e., the signal will

need to pass through a number of FPGA cells). The result is a slow design that uses a large

number of logic cells.

A considerably more compact design is possible using bit serial arithmetic. The simplified

interconnect and logic in a bit serial design allows it to work at a much higher clock rate than

the equivalent bit parallel design. Of course, the design also needs to clocked w times for each

iteration (w is the width of the data word). The bit serial design consists of three bit serial adder-

subtractors, three shift registers and a serial Read Only Memory (ROM). Each shift register has a

length equal to the word width. There is also some gating or multiplexers to select taps

off the shift registers for the right shifted cross terms (shifting is accomplished using bit

delays in bit serial systems). In this design, w clocks are required for each of the n

iterations, where w is precision of the adders. In operation, the load multiplexers on the left are

opened for w clock periods to initialize the x, y and z registers (these registers could also be

parallel loaded to initialize). Once loaded, the data is shifted right through the serial adder-

subtractors and returned to the left end of the register. Each iteration requires w clocks to

return the result to the register. At the beginning of each iteration, the control state machine

reads the sign of the y (or z) register and sets the add/subtract controls accordingly.

The appropriate tap off the register for the cross terms is also selected at the beginning

of each iteration. During the nth iteration, the results can be read from the outputs of the

serial adders while the next initialization data is shifted into the registers

The simplicity of the bit serial design is apparent. Even in this case, the wiring of the

shift tap multiplexers can present problems in some FPGAs (this is one place where tri-state

long lines can come in handy). Even so, the interconnect is minimal and the logic between

registers is simple. This combination permits bit clock ratesnear the maximum toggle

frequency of the FPGA. The possibility of using extreme bit clock frequencies makes up for

the large number of clock cycles required to complete each rotation Now, if the design isin a

Xilinx 4000E series part, the shift registers can be implemented in the CLB RAM[2]. The

RAM emulates a shift register by incrementing the read/write address after each

access. The dual port capability of the CLB RAM provides the capability to read

two locations in the 16x1 RAM simultaneously [9]. By properly sequencing the second

address, the effect of the shift tap multiplexer is achieved without a physical

multiplexer. The result is the shift register and multiplexerfor word lengths up to 16 bits are

implemented in a single CLB (plus 8 CLBs for the 2 address sequencers and iteration

counter, which are shared by the three shifters). The serial ROM also uses the CLB for data

storage. One CLB is required for every two iterations. The 16 bit, 8 iteration CORDIC

processor shown in Figure 3 uses only21 CLBs, and will run at bit rates up to about 90

MHz (mainly limited by the RAM write cycle). This translates to about a 1.5S processing

time, which is only about three and a half times longer than the best one could expect from the

much larger bit parallel iterative solution

On-Line CORDIC Processo

The CORDIC processors discussed so far are iterative, which means the processor has to

perform iterations at ntimes the data rate. The iteration process can unrolled so that each of n

processing elements always performs the same iteration. An unrolled CORDIC processor is shown

in Figure 4. Unrolling the processor results in two significant implifications. First the shifters

are each a fixed shift,which means that they can be implemented in the wiring.

Second, the lookup values for the angle accumulator are distributed as constants to each

adder in the angleaccumulator chain. Those constants can be hardwired instead of

requiring storage space. The entire CORDIC processor is reduced to an array of

interconnected adder-subtractors. The need for registers is also eliminated, making the

unrolled processor strictly combinatorial. The delay through the resulting circuit would be

substantial, but the processing time is reduced from that required by the iterative circuit (if by

nothing else than the set-up and hold times of the register). Most times, especially in an FPGA, it

does not make sense to use such a large combinatorial circuit. The unrolled processor is

easily pipelined by inserting registers between the adder-subtractors. In the case of most

FPGA architectures there are already registers present in each logic cell, so the addition of the

pipeline registers has no hardware cost The unrolled processor can also be converted to a bit

serial design. Each adder subtractor is replaced by a serial adder-subtractor, separated by w bit

shift registers. The shift registers are necessary to extract the sign of the y or z element

before the first bits (lsbs) reach the next adder-subtractors. The right shifted cross terms are

taken from fixed taps in the shift registers. Some method of sign extension for the shifted

terms is required too

INTRODUCTION TO VERILOG HDL

What is HDL

A typical Hardware Description Language (HDL) supports a mixed-level description in

which gate and netlist constructs are used with functional descriptions. This mixed-level

capability enables you to describe system architectures at a high level of abstraction, then

incrementally refine a design’s detailed gate-level implementation.

HDL descriptions offer the following advantages:

• We can verify design functionality early in the design process. A design written as an HDL

description can be simulated immediately. Design simulation at this high level — at the

gate-level before implementation — allows you to evaluate architectural and design decisions.

• An HDL description is more easily read and understood than a netlist or schematic description.

HDL descriptions provide technology-independent documentation of a design and its

functionality. Because the initial HDL design description is technology independent, you

can use it again to generate the design in a different technology, without having to

translate it from the original technology.

• Large designs are easier to handle with HDL tools than schematic tools.

Verilog Overview :

Introduction

Verilog is a HARDWARE DESCRIPTION LANGUAGE (HDL). A hardware description

Language is a language used to describe a digital system, for example, a microprocessor or

a memory or a simple flip-flop. This just means that, by using a HDL one can describe any

hardware (digital ) at any level.

Verilog provides both behavioral and structural language structures. These structures allow

expressing design objects at high and low levels of abstraction. Designing hardware with a

language such as Verilog allows using software concepts such as parallel processing and

object-oriented programming. Verilog has a syntax similar to C and Pascal.

Design Styles

Verilog like any other hardware description language permits the designers to create a design in

either Bottom-up or Top-down methodology.

Bottom-Up Design

The traditional method of electronic design is bottom-up. Each design is performed at the

gate-level using the standard gates. With increasing complexity of new designs this

approach is nearly impossible to maintain. New systems consist of ASIC or

microprocessors with a complexity of thousands of transistors. These traditional bottom-up

designs have to give way to new structural, hierarchical design methods. Without these new

design practices it would be impossible to handle the new complexity.

Top-Down Design

The desired design-style of all designers is the top-down design. A real top-down design

allows early testing, easy change of different technologies, a structured system design and

offers many other advantages. But it is very difficult to follow a pure top-down design. Due

to this fact most designs are mix of both the methods, implementing some key elements of

both design styles.

Complex circuits are commonly designed using the top down methodology. Various

specification levels are required at each stage of the design process.

Abstraction Levels of Verilog

Verilog supports a design at many different levels of abstraction. Three of them are very

important:

Behavioral level

Register-Transfer Level

Gate Level

Behavioral level

This level describes a system by concurrent algorithms (Behavioral). Each algorithm itself is

sequential, that means it consists of a set of instructions that are executed one after the other.

Functions, Tasks and Always blocks are the main elements. There is no regard to the structural

realization of the design.

Register-Transfer Level

Designs using the Register-Transfer Level specify the characteristics of a circuit by operations

and the transfer of data between the registers. An explicit clock is used. RTL design contains

exact timing possibility; operations are scheduled to occur at certain times. Modern definition

of a RTL code is "Any code that is synthesizable is called RTL code".

Gate Level

Within the logic level the characteristics of a system are described by logical links and

their timing properties. All signals are discrete signals. They can only have definite logical

values (`0', `1', `X', `Z`). The usable operations are predefined logic primitives (AND, OR,

NOT etc gates). Using gate level modeling might not be a good idea for any level of logic

design. Gate level code is generated by tools like synthesis tools and this Netlist is used for gate

level simulation and for backend.

vlsi design flow

Introduction

Design is the most significant human endeavor: It is the channel through which creativity is

realized. Design determines our every activity as well as the results of those activities; thus it

includes planning, problem solving, and producing. Typically, the term "design" is applied

to the planning and production of artifacts such as jewelry, houses, cars, and cities. Design is

also found in problem-solving tasks such as mathematical proofs and games. Finally,

design is found in pure planning activities such as making a law or throwing a party.

More specific to the matter at hand is the design of manufacturable artifacts. This

activity uses all facets of design because, in addition to the specification of a producible

object, it requires the planning of that object's manufacture, and much problem solving

along the way. Design of objects usually begins with a rough sketch that is refined by

adding precise dimensions. The final plan must not only specify exact sizes, but also include a

scheme for ordering the steps of production. Additional considerations depend on the

production environment; for example, whether one or ten million will be made, and how

precisely the manufacturing environment can be controlled.

A semiconductor process technology is a method by which working circuits can be

manufactured from designed specifications. There are many such technologies, each of

which creates a different environment or style of design.

INTRODUCTION OF VLSI

Very-large-scale integration (VLSI) is the process of creating integrated

circuits by combining thousands of transistor-based circuits into a single chip. VLSI

began in the 1970s when complex semiconductor and communication

technologies were being developed. The microprocessor is a VLSI device. The

term is no longer as common as it once was, as chips have increased in complexity

into the hundreds of millions of transistors.

Overview

The first semiconductor chips held one transistor each. Subsequent

advances added more and more transistors, and, as a consequence, more

individual functions or systems were integrated over time. The first integrated

circuits held only a few devices, perhaps as many as ten diodes, transistors,

resistors and capacitors, making it possible to fabricate one or more logic gates on

a single device. Now known retrospectively as "small-scale integration" (SSI),

improvements in technique led to devices with hundreds of logic gates, known as

large-scale integration (LSI), i.e. systems with at least a thousand logic gates.

Current technology has moved far past this mark and today's microprocessors

have many millions of gates and hundreds of millions of individual transistors.

At one time, there was an effort to name and calibrate various levels

of large-scale integration above VLSI. Terms like Ultra-large-scale Integration

(ULSI) were used. But the huge number of gates and transistors available on

common devices has rendered such fine distinctions moot. Terms suggesting

greater than VLSI levels of integration are no longer in widespread use. Even VLSI

is now somewhat quaint, given the common assumption that all microprocessors

are VLSI or better.

As of early 2008, billion-transistor processors are commercially

available, an example of which is Intel's Montecito Itanium chip. This is expected

to become more commonplace as semiconductor fabrication moves from the

current generation of 65 nm processes to the next 45 nm generations (while

experiencing new challenges such as increased variation across process corners).

Another notable example is NVIDIA’s 280 series GPU.

This microprocessor is unique in the fact that its 1.4 Billion transistor

count, capable of a teraflop of performance, is almost entirely dedicated to logic

(Itanium's transistor count is largely due to the 24MB L3 cache). Current designs,

as opposed to the earliest devices, use extensive design automation and

automated logic synthesis to lay out the transistors, enabling higher levels of

complexity in the resulting logic functionality. Certain high-performance logic

blocks like the SRAM cell, however, are still designed by hand to ensure the

highest efficiency (sometimes by bending or breaking established design rules to

obtain the last bit of performance by trading stability).

What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the

field which involves packing more and more logic devices into smaller and

smaller areas.

VLSI

1. Simply we say Integrated circuit is many transistors on one chip.

2. Design/manufacturing of extremely small, complex circuitry using

modified semiconductor material

3. Integrated circuit (IC) may contain millions of transistors, each a few mm

in size

4. Applications wide ranging: most electronic logic devices

History of Scale Integration

late 40s Transistor invented at Bell Labs

late 50s First IC (JK-FF by Jack Kilby at TI)

early 60s Small Scale Integration (SSI)

10s of transistors on a chip

late 60s Medium Scale Integration (MSI)

100s of transistors on a chip

early 70s Large Scale Integration (LSI)

1000s of transistor on a chip

early 80s VLSI 10,000s of transistors on a

chip (later 100,000s & now 1,000,000s)

Ultra LSI is sometimes used for 1,000,000s

SSI - Small-Scale Integration (0-102)

MSI - Medium-Scale Integration (102-103)

LSI - Large-Scale Integration (103-105)

VLSI - Very Large-Scale Integration (105-107)

ULSI - Ultra Large-Scale Integration (>=107)

Advantages of ICs over discrete components

While we will concentrate on integrated circuits , the

properties of integrated circuits-what we can and cannot efficiently put in an

integrated circuit-largely determine the architecture of the entire system.

Integrated circuits improve system characteristics in several critical ways. ICs have

three key advantages over digital circuits built from discrete components:

Size. Integrated circuits are much smaller-both transistors and wires

are shrunk to micrometer sizes, compared to the millimeter or

centimeter scales of discrete components. Small size leads to

advantages in speed and power consumption, since smaller

components have smaller parasitic resistances, capacitances, and

inductances.

Speed. Signals can be switched between logic 0 and logic 1 much

quicker within a chip than they can between chips. Communication

within a chip can occur hundreds of times faster than communication

between chips on a printed circuit board. The high speed of circuits

on-chip is due to their small size-smaller components and wires have

smaller parasitic capacitances to slow down the signal.

Power consumption. Logic operations within a chip also take much

less power. Once again, lower power consumption is largely due to

the small size of circuits on the chip-smaller parasitic capacitances

and resistances require less power to drive them.

VLSI and systems

These advantages of integrated circuits translate into advantages at the system

level:

Smaller physical size. Smallness is often an advantage in itself-

consider portable televisions or handheld cellular telephones.

Lower power consumption. Replacing a handful of standard parts

with a single chip reduces total power consumption. Reducing

power consumption has a ripple effect on the rest of the system:

a smaller, cheaper power supply can be used; since less power

consumption means less heat, a fan may no longer be necessary;

a simpler cabinet with less shielding for electromagnetic shielding

may be feasible, too.

Reduced cost. Reducing the number of components, the power

supply requirements, cabinet costs, and so on, will inevitably

reduce system cost. The ripple effect of integration is such that

the cost of a system built from custom ICs can be less, even

though the individual ICs cost more than the standard parts they

replace.

Understanding why integrated circuit technology has such profound influence on

the design of digital systems requires understanding both the technology of IC

manufacturing and the economics of ICs and digital systems.

Applications

Electronic system in cars.

Digital electronics control VCRs

Transaction processing system, ATM

Personal computers and Workstations

Medical electronic systems.

Etc….

Applications of VLSI

Electronic systems now perform a wide variety of tasks in daily life.

Electronic systems in some cases have replaced mechanisms that operated

mechanically, hydraulically, or by other means; electronics are usually smaller,

more flexible, and easier to service. In other cases electronic systems have

created totally new applications. Electronic systems perform a variety of tasks,

some of them visible, some more hidden:

Personal entertainment systems such as portable MP3 players

and DVD players perform sophisticated algorithms with

remarkably little energy.

Electronic systems in cars operate stereo systems and displays;

they also control fuel injection systems, adjust suspensions to

varying terrain, and perform the control functions required for

anti-lock braking (ABS) systems.

Digital electronics compress and decompress video, even at high-

definition data rates, on-the-fly in consumer electronics.

Low-cost terminals for Web browsing still require sophisticated

electronics, despite their dedicated function.

Personal computers and workstations provide word-processing,

financial analysis, and games. Computers include both central

processing units (CPUs) and special-purpose hardware for disk

access, faster screen display, etc.

Medical electronic systems measure bodily functions and perform

complex processing algorithms to warn about unusual conditions.

The availability of these complex systems, far from overwhelming

consumers, only creates demand for even more complex systems.

The growing sophistication of applications continually pushes the design and

manufacturing of integrated circuits and electronic systems to new levels of

complexity. And perhaps the most amazing characteristic of this collection of

systems is its variety-as systems become more complex, we build not a few

general-purpose computers but an ever wider range of special-purpose systems.

Our ability to do so is a testament to our growing mastery of both integrated

circuit manufacturing and design, but the increasing demands of customers

continue to test the limits of design and manufacturing

ASIC

An Application-Specific Integrated Circuit (ASIC) is an

integrated circuit (IC) customized for a particular use, rather than intended for

general-purpose use. For example, a chip designed solely to run a cell phone is an

ASIC. Intermediate between ASICs and industry standard integrated circuits, like

the 7400 or the 4000 series, are application specific standard products (ASSPs).

As feature sizes have shrunk and design tools improved over

the years, the maximum complexity (and hence functionality) possible in an ASIC

has grown from 5,000 gates to over 100 million. Modern ASICs often include

entire 32-bit processors, memory blocks including ROM, RAM, EEPROM, Flash and

other large building blocks. Such an ASIC is often termed a SoC (system-on-a-

chip). Designers of digital ASICs use a hardware description language (HDL), such

as Verilog or VHDL, to describe the functionality of ASICs.

Field-programmable gate arrays (FPGA) are the modern-day technology for

building a breadboard or prototype from standard parts; programmable logic

blocks and programmable interconnects allow the same FPGA to be used in many

different applications. For smaller designs and/or lower production volumes,

FPGAs may be more cost effective than an ASIC design even in production.

1. An application-specific integrated circuit (ASIC) is an integrated circuit

(IC) customized for a particular use, rather than intended for general-

purpose use.

2. A Structured ASIC falls between an FPGA and a Standard Cell-based

ASIC

3. Structured ASIC’s are used mainly for mid-volume level designs

4. The design task for structured ASIC’s is to map the circuit into a fixed

arrangement of known cells

Migrating Projects from Previous ISE Software Releases

When you open a project file from a previous release, the ISE® software prompts you to migrate

your project. If you click Backup and Migrate or Migrate Only, the software automatically

converts your project file to the current release. If you click Cancel, the software does not

convert your project and, instead, opens Project Navigator with no project loaded.

Note: After you convert your project, you cannot open it in previous versions of the ISE

software, such as the ISE 11 software. However, you can optionally create a backup of the

original project as part of project migration, as described below.

To Migrate a Project

1. In the ISE 12 Project Navigator, select File > Open Project.

2. In the Open Project dialog box, select the .xise file to migrate.

Note You may need to change the extension in the Files of type field to display .npl

(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.

3. In the dialog box that appears, select Backup and Migrate or Migrate Only.

4. The ISE software automatically converts your project to an ISE 12 project.

Note If you chose to Backup and Migrate, a backup of the original project is created at

project_name_ise12migration.zip.

5. Implement the design using the new version of the software.

Note Implementation status is not maintained after migration.

Properties

For information on properties that have changed in the ISE 12 software, see ISE 11 to ISE 12

Properties Conversion.

IP Modules

If your design includes IP modules that were created using CORE Generator™ software or

Xilinx® Platform Studio (XPS) and you need to modify these modules, you may be required to

update the core. However, if the core netlist is present and you do not need to modify the core,

updates are not required and the existing netlist is used during implementation.

Obsolete Source File Types

The ISE 12 software supports all of the source types that were supported in the ISE 11

software.

If you are working with projects from previous releases, state diagram source files (.dia), ABEL

source files (.abl), and test bench waveform source files (.tbw) are no longer supported. For state

diagram and ABEL source files, the software finds an associated HDL file and adds it to the

project, if possible. For test bench waveform files, the software automatically converts the TBW

file to an HDL test bench and adds it to the project. To convert a TBW file after project

migration, see Converting a TBW File to an HDL Test Bench.

Migrating Projects from Previous ISE Software Releases

When you open a project file from a previous release, the ISE® software prompts you to migrate

your project. If you click Backup and Migrate or Migrate Only, the software automatically

converts your project file to the current release. If you click Cancel, the software does not

convert your project and, instead, opens Project Navigator with no project loaded.

Note After you convert your project, you cannot open it in previous versions of the ISE

software, such as the ISE 11 software. However, you can optionally create a backup of the

original project as part of project migration, as described below.

To Migrate a Project

1. In the ISE 12 Project Navigator, select File > Open Project.

2. In the Open Project dialog box, select the .xise file to migrate.

Note You may need to change the extension in the Files of type field to display .npl

(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.

3. In the dialog box that appears, select Backup and Migrate or Migrate Only.

4. The ISE software automatically converts your project to an ISE 12 project.

Note If you chose to Backup and Migrate, a backup of the original project is created at

project_name_ise12migration.zip.

5. Implement the design using the new version of the software.

Note Implementation status is not maintained after migration.

Properties

For information on properties that have changed in the ISE 12 software, see ISE 11 to ISE 12

Properties Conversion.

IP Modules

If your design includes IP modules that were created using CORE Generator™ software or

Xilinx® Platform Studio (XPS) and you need to modify these modules, you may be required to

update the core. However, if the core netlist is present and you do not need to modify the core,

updates are not required and the existing netlist is used during implementation.

Obsolete Source File Types

The ISE 12 software supports all of the source types that were supported in the ISE 11

software.

If you are working with projects from previous releases, state diagram source files (.dia), ABEL

source files (.abl), and test bench waveform source files (.tbw) are no longer supported. For state

diagram and ABEL source files, the software finds an associated HDL file and adds it to the

project, if possible. For test bench waveform files, the software automatically converts the TBW

file to an HDL test bench and adds it to the project. To convert a TBW file after project

migration, see Converting a TBW File to an HDL Test Bench.

Using ISE Example Projects

To help familiarize you with the ISE® software and with FPGA and CPLD designs, a set of

example designs is provided with Project Navigator. The examples show different design

techniques and source types, such as VHDL, Verilog, schematic, or EDIF, and include different

constraints and IP.

To Open an Example

1. Select File > Open Example.

2. In the Open Example dialog box, select the Sample Project Name.

Note To help you choose an example project, the Project Description field describes

each project. In addition, you can scroll to the right to see additional fields, which

provide details about the project.

3. In the Destination Directory field, enter a directory name or browse to the

directory.

4. Click OK.

The example project is extracted to the directory you specified in the Destination Directory

field and is automatically opened in Project Navigator. You can then run processes on the

example project and save any changes.

Note If you modified an example project and want to overwrite it with the original example

project, select File > Open Example, select the Sample Project Name, and specify the same

Destination Directory you originally used. In the dialog box that appears, select Overwrite the

existing project and click OK.

Creating a Project

Project Navigator allows you to manage your FPGA and CPLD designs using an ISE® project,

which contains all the source files and settings specific to your design. First, you must create a

project and then, add source files, and set process properties. After you create a project, you can

run processes to implement, constrain, and analyze your design. Project Navigator provides a

wizard to help you create a project as follows.

Note If you prefer, you can create a project using the New Project dialog box instead of the

New Project Wizard. To use the New Project dialog box, deselect the Use New Project wizard

option in the ISE General page of the Preferences dialog box.

To Create a Project

1. Select File > New Project to launch the New Project Wizard.

2. In the Create New Project page, set the name, location, and project type, and

click Next.

3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,

select the input and constraint file for the project, and click Next.

4. In the Project Settings page, set the device and project properties, and click

Next.

5. In the Project Summary page, review the information, and click Finish to

create the project.

Project Navigator creates the project file (project_name.xise) in the directory you specified.

After you add source files to the project, the files appear in the Hierarchy pane of the Design

panel. Project Navigator manages your project based on the design properties (top-level

module type, device type, synthesis tool, and language) you selected when you created the

project. It organizes all the parts of your design and keeps track of the processes necessary to

move the design from design entry through implementation to programming the targeted

Xilinx® device.

Note For information on changing design properties, see Changing Design Properties.

You can now perform any of the following:

Create new source files for your project.

Add existing source files to your project.

Run processes on your source files.

Modify process properties.

Creating a Copy of a Project

You can create a copy of a project to experiment with different source options and

implementations. Depending on your needs, the design source files for the copied project and

their location can vary as follows:

Design source files are left in their existing location, and the copied project

points to these files.

Design source files, including generated files, are copied and placed in a

specified directory.

Design source files, excluding generated files, are copied and placed in a

specified directory.

Copied projects are the same as other projects in both form and function. For example, you can

do the following with copied projects:

Open the copied project using the File > Open Project menu command.

View, modify, and implement the copied project.

Use the Project Browser to view key summary data for the copied project and

then, open the copied project for further analysis and implementation, as described in

Using the Project Browser.

Note Alternatively, you can create an archive of your project, which puts all of the project

contents into a ZIP file. Archived projects must be unzipped before being opened in Project

Navigator. For information on archiving, see Creating a Project Archive.

To Create a Copy of a Project

1. Select File > Copy Project.

2. In the Copy Project dialog box, enter the Name for the copy.

Note The name for the copy can be the same as the name for the project, as long as you

specify a different location.

3. Enter a directory Location to store the copied project.

4. Optionally, enter a Working directory.

By default, this is blank, and the working directory is the same as the project directory.

However, you can specify a working directory if you want to keep your ISE® project

file (.xise extension) separate from your working area.

5. Optionally, enter a Description for the copy.

The description can be useful in identifying key traits of the project for reference later.

6. In the Source options area, do the following:

o Select one of the following options:

o Keep sources in their current locations - to leave the design

source files in their existing location.

If you select this option, the copied project points to the files in their

existing location. If you edit the files in the copied project, the changes

also appear in the original project, because the source files are shared

between the two projects.

o Copy sources to the new location - to make a copy of all the

design source files and place them in the specified Location directory.

If you select this option, the copied project points to the files in the specified

directory. If you edit the files in the copied project, the changes do not appear in

the original project, because the source files are not shared between the two

projects.

o Optionally, select Copy files from Macro Search Path directories to

copy files from the directories you specify in the Macro Search Path property in

the Translate Properties dialog box. All files from the specified directories are

copied, not just the files used by the design.

Note If you added a netlist source file directly to the project as described in

Working with Netlist-Based IP, the file is automatically copied as part of Copy

Project because it is a project source file. Adding netlist source files to the project

is the preferred method for incorporating netlist modules into your design,

because the files are managed automatically by Project Navigator.

o Optionally, click Copy Additional Files to copy files that were not

included in the original project. In the Copy Additional Files dialog box, use the

Add Files and Remove Files buttons to update the list of additional files to copy.

Additional files are copied to the copied project location after all other files are

copied.

7. To exclude generated files from the copy, such as implementation results and

reports, select Exclude generated files from the copy.

When you select this option, the copied project opens in a state in which processes have

not yet been run.

8. To automatically open the copy after creating it, select Open the copied project.

Note By default, this option is disabled. If you leave this option disabled, the original

project remains open after the copy is made.

Click OK.

Creating a Project Archive

A project archive is a single, compressed ZIP file with a .zip extension. By default, it contains all

project files, source files, and generated files, including the following:

User-added sources and associated files

Remote sources

Verilog `include files

Files in the macro search path

Generated files

Non-project files

To Archive a Project

1. Select Project > Archive.

2. In the Project Archive dialog box, specify a file name and directory for the ZIP

file.

3. Optionally, select Exclude generated files from the archive to exclude

generated files and non-project files from the archive.

4. Click OK.

A ZIP file is created in the specified directory. To open the archived project, you must first

unzip the ZIP file, and then, you can open the project.

Note Sources that reside outside of the project directory are copied into a remote_sources

subdirectory in the project archive. When the archive is unzipped and opened, you must either

specify the location of these files in the remote_sources subdirectory for the unzipped project, or

manually copy the sources into their original location.

CONCLUSION

The CORDIC algorithms presented in this paper are well known in the research and super-computing circles.

It is, however, my experience that the majority of today’s hardware DSP designs are being done by

engineers with little or no background in hardware efficient DSP

algorithms. The new DSP designers must become familiar with these algorithms and the techniques for

implementing them in FPGAs in order to remain competitive. The CORDIC algorithm is a powerful tool in

the DSP toolbox. This paper shows that tool is available for use in FPGA

based computing machines, which are the likely basis for the next generation DSP systems.

REFERENCES

[1] Ahmed, H. M., Delosme, J.M., and Morf, M., "Highly Concurrent Computing Structure for

Matrix Arithmetic and Signal Processing," IEEE Comput. Mag., Vol. 15, 1982, pp. 65-82.

[2] Alfke, P., “Efficient Shift Registers, LFSR Counters, and Long Pseudo Random Sequence

Generators,” Xilinx application note, August, 1995.

[3] Andraka, R. J., “Building a High Performance Bit-Serial Processor in an FPGA,”

Proceedings of Design SuperCon '96, Jan 1996. pp5.1 - 5.21

[4] Deprettere, E., Dewilde, P., and Udo, R., "Pipelined CORDIC Architecture for Fast VLSI

Filtering and Array rocessing," Proc. ICASSP'84, 1984, pp. 41.A.6.1-41.A.6.4

[5] Despain, A.M., "Fourier Transform Computations Using CORDIC Iterations," IEEE

Transactions on Computers, Vol.23, 1974, pp. 993-1001.

[6] Duh, W.J., and Wu, J.L., "Implementing the Discrete Cosine Transform by Using

CORDIC Techniques,"Proceedings the International Symposium on VLSI Technology,

Systems and Applications, Taipei, Taiwan, 1989, pp. 281-285

[7] Duprat, J. and Muller, J.M., "The CORDIC Algorithm: New Results for Fast VLSI

Implementation," IEEE Transactions on Computers, Vol. 42, pp. 168-178, 1993

8] Hsiao, S.F. and Delosme, J.M., "The CORDIC Householder Algorithm," Proceedings of the

10th Symposium on Computer Arithmetic, pp. 256-263, 1991. [9] Hu, Y.H., and Naganathan, S.,

"A Novel Implementation of Chirp Z-Transformation Using a CORDIC Processor," IEEE

Transactions on ASSP, Vol. 38, pp. 352-354, 1990.

[10] Hu, Y.H., and Naganathan, S., "An Angle Recoding Method for CORDIC Algorithm

Implementation", IEEE Transactions on Computers, Vol. 42, pp. 99-102, January

cordic algorithm

Documents

1 ki yi

input vector

vectoring

vector rotation

rotation angles

angle accumulator

rotation

angle