
Chapter 4 DSP systems – interfacing with the outside world

In this chapter many issues related to DSP peripheral components will be considered. This will include a consideration of both on- and off-chip components and their operation. Many ideas presented in Chapter 3, such as the use of efficient pipelines, fast data manipulation and arithmetic operations, all rely on the ability of a DSP device to move large amounts of data quickly and seamlessly into and out of the DSP's processing core. To facilitate this effective movement and storage of data, most DSPs incorporate a wide range of peripheral management facilities and these will be considered here. As with Chapter 3, a hypothetical DSP device will be introduced which incorporates a generic set of typical features. The hypothetical device will be used as the basis for discussions and then related to real-life examples, again taken mainly from the Texas Instruments range of DSP devices.

4.1 DSP devices – beyond the core
4.2 Hardware interfacing and I/O control
4.3 System management and control
4.4 All the analog bits and pieces (i.e. ADC, DAC, anti-aliasing, over-sampling, etc.)
4.5 Getting signals in
4.6 Getting signals out
4.7 Getting signals in and out
4.8 Digital up- and down-conversion
4.9 Interfacing with the real world
4.10 Questions
4.11 References

4.1 DSP devices – beyond the core

In Chapter 3 many issues related to the operation and use of the central core of the DSP processor were introduced. In particular the operation of the ALU, hardware multiplier, various registers, control registers and the accumulator were all considered. In this section we move on to consider the other facilities typically found on a DSP device. Figure 4.1 shows the hypothetical DSP device introduced in the previous chapter. As already mentioned we will use this device again as the basis for our discussion and, as with Chapter 3, we will relate the generic DSP features presented in the hypothetical device to examples taken from real devices.

In Figure 4.1, the central core is shown at the center of the DSP device. In this section we will consider the components arranged outside the core but still inside the DSP device itself. Most of these components are able to run in parallel with the DSP core, taking little of the core’s available instruction cycle time. A good example of this is the on-chip timer, which can be initialized to interrupt the DSP device at a regular rate, e.g. every 1 ms. If the timer were not available this task would have to be carried out using a software routine, which would directly load the core itself. By delegating tasks to on-chip components, the processing core can be freed up to handle the more demanding arithmetic requirements of the algorithm. Of course at start-up, when the DSP is reset, all of the peripheral components must be initialized and set to a known state. This has already been considered a little in Section 2.2.1, although initialization issues will also be considered further here.

We start by looking at each of the components shown in Figure 4.1, as follows.


[Figure 4.1 Hypothetical DSP device. Blocks shown: DSP central core, register file, address generation units, instruction cache, program memory, data memory, program bus, data bus, external memory interface, general purpose I/O interface, DMA processor, interrupt control unit, Timer (0), Timer (1), multi-channel serial port (0), multi-channel serial port (1), clock (PLL), wait state generator, host port interface, power management, and test and emulation logic (JTAG).]

4.1.1 Memory structures

The idea of the Harvard architecture used within most DSP devices has already been introduced in Chapter 3. Harvard architectures, as already mentioned, make use of separate program and data storage areas that can be accessed simultaneously. This makes many processing operations far more efficient than would be possible on a traditional von Neumann architecture. The program and data storage areas may be constructed using a range of different memory types as appropriate for any given application. Most DSPs are provided with a limited amount of on-chip memory which can be accessed at full speed; this is usually divided into program and data areas which can truly be accessed simultaneously. If extra memory space is required for a particular application, then this can be added via an external memory interface which provides data, address and control buses to the outside world. One of the problems with interfaces to external memory is that the on-chip dual bus architecture is rarely replicated to the outside world and so simultaneous accesses to external memory are not possible. For this reason, high-speed processing operations that use simultaneous accesses to memory usually require that data and program instructions reside on-chip (Ref. 4.1).

It is common practice to represent the addressable memory or I/O space of a DSP device using a memory map. The memory map of a TMS320C54x DSP device is shown in Figure 4.2. In fact, this is the memory map of a C548 DSP device, which is one of a number of different devices within the C54x family. The memory map of Figure 4.2, which is quite typical of many DSP devices, shows the two identifiable areas of program and data memory. These are further subdivided into areas of internal and external memory space. The memory space defined for this device is actually organized into three individually selectable spaces labelled as program, data, and I/O space.

The memory map of the C548 DSP shown in Figure 4.2 can be set to one of a number of different configurations according to settings given in the Processor Mode Status Register, PMST, associated with this device. The operation of the PMST register is discussed in Section 3.7.5. In summary, two register flags are important for memory configuration on the C54x devices, the OVLY flag and the MP/MC flag. The OVLY flag is used to enable or disable the mapping of data memory into program memory space, and the MP/MC flag is used to enable or disable the on-chip ROM and hence determines the boot mode of the device.

Figure 4.2 Memory map showing the program and data space defined for a TMS320C548 DSP device

'548 Program memory
  0000h–1FFFh   OVLY = 0: External (paged)
                OVLY = 1: 0000h–007Fh Reserved; 0080h–1FFFh On-chip DARAM
  2000h–7FFFh   OVLY = 0: External (paged)
                OVLY = 1: On-chip SARAM
  8000h–EFFFh   External (paged)
  F000h–FFFFh   MP/MC = 0: F000h–F7FFh Reserved; F800h–FF7Fh On-chip ROM; FF80h–FFFFh Interrupt vectors
                MP/MC = 1: F000h–FF7Fh External (paged); FF80h–FFFFh Interrupt vectors

'548 Data memory
  0000h–005Fh   Memory-mapped registers
  0060h–007Fh   Scratch-pad DARAM
  0080h–1FFFh   On-chip DARAM
  2000h–7FFFh   On-chip SARAM
  8000h–FFFFh   External

Most DSP devices are provided with a limited amount of on-chip read-only memory, ROM, and random access memory, RAM, and the C54x is no exception to this. All C54x devices contain both RAM and ROM. Among the different C54x devices, two types of RAM are represented: dual-access RAM, DARAM, and single-access RAM, SARAM. Table 4.1 shows the allocation of internal memory for each of the different devices in the C54x family.

Table 4.1 Texas Instruments C54x DSP – on-chip program and data memory

Memory type       C541   C542   C543   C545   C546   C548
ROM               28K    2K     2K     48K    48K    2K
  Program         20K    2K     2K     32K    32K    2K
  Program/data    8K     0      0      16K    16K    0
DARAM*            5K     10K    10K    6K     6K     8K
SARAM*            0      0      0      0      0      24K

* The dual-access RAM (DARAM) and single-access RAM (SARAM) can be configured as data memory or program/data memory.

On-chip ROM. The on-chip ROM is part of the program memory space and, for some devices, forms part of the data memory space. The amount of on-chip ROM available on each device varies, as indicated in Table 4.1. On devices with a small amount of ROM (2K words), the ROM contains a boot loader, which is useful for booting to faster on-chip or external RAM during the start-up sequence. The boot loader algorithm initializes the DSP to a known state and provides a simple mechanism by which the user’s application code can be loaded onto the DSP and program execution initiated. The boot loader is very flexible and allows the application code to be loaded from slow external ROMs, via a serial interface or host port, or through the use of JTAG. On devices with larger amounts of ROM, a portion of the ROM may be mapped into both data and program space. The larger ROMs are also custom ROMs: the user provides the code or data to be programmed into the ROM in object file format and Texas Instruments generates the appropriate process mask to program the ROM.

On-chip dual-access RAM (DARAM). The DARAM is composed of several blocks. Because each DARAM block can be accessed twice per machine cycle, the central processing unit, CPU, can read from and write to a single block of DARAM in the same cycle. The DARAM is always mapped in data space and is primarily intended to store data values. It can also be mapped into program space and used to store program code.

On-chip single-access RAM (SARAM). The SARAM is also composed of several blocks. Each block is accessible once per machine cycle for either reading or writing. The SARAM is always mapped in data space and is primarily intended for storage of data values. It can also be mapped into program space and used to store program code.

Memory-mapped registers. The data memory space contains memory-mapped registers for the CPU and the on-chip peripherals. These registers are located on data page 0, simplifying access to them. The memory-mapped access provides a convenient way to save and restore the registers for context switches and to transfer information between the accumulators and the other registers.
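The effect of the OVLY and MP/MC flags on the C548 program space can be made concrete with a small lookup routine. The following is only a sketch based on the address ranges given in Figure 4.2; the function name and the returned strings are illustrative choices, not part of any Texas Instruments tool.

```python
def classify_program_address(addr, ovly, mp_mc):
    """Name the C548 program-space region containing a 16-bit address.

    Region boundaries follow Figure 4.2. `ovly` and `mp_mc` are the
    values (0 or 1) of the OVLY and MP/MC flags in the PMST register.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if addr >= 0xFF80:
        return "interrupt vectors"
    if addr >= 0xF000:
        if mp_mc == 1:              # on-chip ROM disabled
            return "external (paged)"
        return "reserved" if addr <= 0xF7FF else "on-chip ROM"
    if addr >= 0x8000:
        return "external (paged)"
    # Lower 32 kwords: on-chip RAM appears here only when OVLY = 1
    if ovly == 0:
        return "external (paged)"
    if addr <= 0x007F:
        return "reserved"
    return "on-chip DARAM" if addr <= 0x1FFF else "on-chip SARAM"


if __name__ == "__main__":
    for address in (0x0100, 0x2000, 0x9000, 0xF800):
        print(hex(address), classify_program_address(address, ovly=1, mp_mc=0))
```

With OVLY = 0 the same addresses below 8000h would all be reported as external, which is why code that must run at full speed is normally placed in the overlaid on-chip RAM.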

It has been mentioned that the C54x DSP presented in this example is able to make use of dual-access RAM, DARAM. This type of fast random access memory can be written to or read from twice in each instruction cycle and hence the potential data throughput rate of the DSP device is increased. Dual accesses are achieved through the use of a multiple bus architecture, as shown in Figure 4.3. The diagram of Figure 4.3 shows the eight internal buses used on the C54x DSP. Of these, four are allocated to addressing and the other four carry program or data information. Specifically, the C54x DSP uses three internal data ‘highways’ labelled CB, DB and EB, with associated address buses CAB, DAB and EAB respectively. The remaining data bus is allocated to carrying program instructions and is labelled PB, with an associated program address bus, PAB.

Although internally many DSP devices, including the C54x, are structured with a multibus architecture, this is rarely replicated to the outside world. Most DSPs are provided with some sort of memory interface that will multiplex the internal multibus structure down to a single set of external I/O lines including one full-width address and data bus. In the case of the C54x DSP this interface is represented by the block shown on the right hand side of Figure 4.3. This fact is confirmed by observing the external pin connections provided on the DSP chip itself, as identified in Figure 4.4.

[Figure 4.3 Texas Instruments C54x DSP device internal address and data/program bus architecture. Blocks shown: system control interface; program address generation logic (PAGEN) with PC, IPTR, RC, BRC, RSA and REA; data address generation logic (DAGEN) with ARAU0, ARAU1, AR0–AR7, ARP, BK, DP and SP; memory and external interface; peripheral interface. Buses shown: PAB, PB, CAB, CB, DAB, DB, EAB and EB.]

[Figure 4.4 Texas Instruments C54x DSP external interface connections: A0–A15 external address bus, D0–D15 external data bus, and control bus.]

4.1.2 Address generation units

Because DSP devices make use of a Harvard architecture incorporating at least one program and one data memory area, they will usually have separate hardware to generate the required addressing information. In the case of the program address generation unit, this will incorporate the program instruction counter, PC, which is used to step through the program instructions. The program address generator is usually capable of simple arithmetic operations so that it can calculate and perform branches forward and backward in program memory space. The program address generation unit is also often able to perform hardware looping of instructions so that a segment of code can be run a number of times with very little additional processing overhead. In order to perform hardware looping the program address generator usually incorporates a loop counter, which decrements every time the segment of code has run, and loop start and end address registers, so that the beginning and end of the code segment can be identified. Finally, the program address generation unit usually performs program jumps to interrupt vector addresses. When an enabled interrupt is asserted, the current value of the program counter, PC, is stored onto the system stack and the address of the appropriate interrupt service routine is loaded into the PC. When the interrupt service routine has completed its operation, program execution is returned to where it left off by restoring the original PC value from the stack.
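The hardware looping mechanism described above can be sketched in a few lines. This is a simplified behavioural model, not the logic of any particular device: the function name and its arguments are hypothetical, and `count` is taken to be the total number of passes through the block (real devices often store this value minus one).

```python
def run_hardware_loop(start, end, count, first_pc, last_pc):
    """Return the list of program addresses fetched by a simple PC model.

    The block from `start` to `end` (inclusive) is executed `count`
    times. No branch instruction is fetched: the address generator
    reloads the PC with `start` whenever it reaches `end` and passes
    through the block are still outstanding.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    trace = []
    pc = first_pc
    remaining = count               # loop counter
    while pc <= last_pc:
        trace.append(pc)
        if pc == end and remaining > 1:
            remaining -= 1          # one more pass completed
            pc = start              # zero-overhead jump back
        else:
            pc += 1                 # normal sequential fetch
    return trace


if __name__ == "__main__":
    # Addresses 2..3 form the loop body and are run three times.
    print(run_hardware_loop(start=2, end=3, count=3, first_pc=0, last_pc=5))
```

The trace contains only the instructions of the program itself, which illustrates why hardware looping adds so little overhead compared with a decrement-and-branch sequence in software.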

The data address generation unit is usually able to perform a number of arithmetic operations, similar to the program address generation unit described in the previous paragraph. The data address generator is responsible for generating the appropriate addresses into data memory as directed by the instructions in the instruction pipeline. Circular and bit-reversed addressing modes are performed by the data address generator. Circular, or modulo, addressing is used to implement data buffers that wrap round, and bit-reversed addressing is used primarily to re-order sequences of Fourier transform data, as described in Section 7.3.5.
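The two special addressing modes can be illustrated with index arithmetic. The sketch below works on buffer indices rather than physical addresses and uses hypothetical function names; a real data address generator performs the equivalent operations in hardware, typically with buffer alignment restrictions that are ignored here.

```python
def circular_next(index, step, length):
    """Advance a buffer index by `step`, wrapping modulo the buffer length."""
    return (index + step) % length


def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


if __name__ == "__main__":
    # Stepping through an 8-entry circular buffer three places at a time.
    index = 0
    for _ in range(5):
        print(index, end=" ")
        index = circular_next(index, 3, 8)
    print()
    # Re-ordering for an 8-point FFT: natural index -> bit-reversed index.
    print([bit_reverse(i, 3) for i in range(8)])
```

For an 8-point transform the bit-reversed order is 0, 4, 2, 6, 1, 5, 3, 7, which is the order in which a radix-2 FFT consumes or produces its data.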

Referring to Figure 4.3, the PAGEN and DAGEN units are used on the TI C54x DSP to generate program and data addresses respectively. The function of these is summarized as follows.

On the C54x DSP the PC, which is used as a pointer to individual instructions, is loaded by the program address generation logic, PAGEN. Typically, the PAGEN increments the PC as sequential instructions are fetched. However, the PAGEN may load the PC with a non-sequential value as a result of certain instructions or other operations. Operations that cause a discontinuity in the sequential PC flow include branches, calls, returns, conditional operations, single instruction repeats, multiple instruction repeats, reset, and interrupts. For calls and interrupts, the current PC is saved onto the stack, which is referenced by the stack pointer, SP. When the called function or interrupt service routine is finished, the PC value that was saved is restored from the stack via a return instruction.

The C54x offers seven basic data addressing modes and these are implemented using the Data Address Generation, DAGEN, logic. The data addressing modes are defined as follows. Immediate and absolute addressing incorporate the data addressing information into the instruction word. Accumulator-based addressing uses a value held in an accumulator as the effective address of a memory location. Direct addressing uses seven bits of the instruction to encode an offset relative to the data page pointer, DP, or to the stack pointer, SP. The offset plus DP or SP determines the actual address in data memory. Data pages refer to the organization of memory on the C54x DSPs, where the 64 kword data space is organized as 512 pages of 128 words each. Indirect addressing uses a set of auxiliary registers to access memory: the auxiliary registers can be used as pointers to data memory and may be specified with an auto-increment or auto-decrement operation. Associated with the auxiliary registers is an arithmetic logic unit that is used to increment or decrement the auxiliary registers automatically when they are used as memory pointers. The auxiliary register file and associated arithmetic logic unit, ARAU, are shown in Figure 4.5. Memory-mapped register addressing modifies the memory-mapped registers without affecting either the current DP value or the current SP value. Stack addressing manages the addition and removal of items from the system stack.
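Direct addressing is easy to express as arithmetic. The sketch below assumes a 16-bit data address, so that a 7-bit offset leaves nine bits for the page number held in DP; the function names are hypothetical and the SP-relative form is modelled simply as an addition that wraps at 16 bits.

```python
OFFSET_BITS = 7                      # offset field carried in the instruction
ADDRESS_BITS = 16                    # width of a data address


def dp_direct_address(dp, offset):
    """Form a data address from a page number and a 7-bit offset."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    if not 0 <= dp < (1 << (ADDRESS_BITS - OFFSET_BITS)):
        raise ValueError("page number out of range")
    return (dp << OFFSET_BITS) | offset


def sp_direct_address(sp, offset):
    """Form a data address as stack pointer plus a 7-bit offset."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    return (sp + offset) & ((1 << ADDRESS_BITS) - 1)


if __name__ == "__main__":
    print(hex(dp_direct_address(dp=1, offset=5)))        # second page, word 5
    print(hex(sp_direct_address(sp=0x1000, offset=3)))   # third word above SP
```

Because page 0 starts at address 0, the memory-mapped registers described earlier can be reached with DP set to zero and a short offset.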

The memory arrangement on the Texas Instruments C6xxx devices is somewhat different to that shown for the C54x. The C6xxx is provided with internal data and program spaces of 64 kbytes each, and externally the C6xxx can access an additional 52 Mbytes of memory space. As would be expected, separate memory controllers are provided to control addressing modes and accesses from other peripherals, e.g. the DMA controller. Because the C6xxx processing core is provided with two sides, A and B, the data memory controller is configured to allow multiple accesses from both sides during a single machine cycle, thus maximizing the achievable processing throughput. The internal structure of the C6xxx is shown in the block diagram of Figure 4.6, where the interconnection between different on-chip peripherals and the memory areas is shown. Additionally, Figure 4.7 shows a more detailed view of the interconnection between the internal functional units of the C6xxx processing core and the memory controllers (Refs. 4.2, 4.3 and 4.4).

On the C6xxx DSP the processing core fetches very long, 256-bit wide, instruction words from program memory in order to supply up to eight 32-bit instructions during every clock cycle, i.e. one 32-bit instruction per functional unit. The C6xxx architecture features controls by which all eight units do not have to be supplied with instructions if they are not ready to execute. The least significant bit of every 32-bit instruction determines whether the next instruction belongs to the same execute packet, or whether it should be executed during the following clock cycle as a part of the next execute packet. Fetch packets are always 256 bits wide; however, the execute packets can vary in size. The variable length execute packets are a key memory saving feature which is not found on other similar architectures.


[Figure 4.5 Texas Instruments C54x DSP auxiliary registers and associated arithmetic logic units]

The processing flow begins when a 256-bit wide instruction fetch packet is fetched from program memory. The 32-bit instructions destined for the individual functional units are ‘linked’ together by setting the least significant bit, LSB, position of the instructions. The instructions that are ‘chained’ together for simultaneous execution, up to eight in total, compose an execute packet. A ‘0’ in the LSB of an instruction breaks the chain, effectively placing the instructions that follow it in the next execute packet. If an execute packet would cross the 256-bit wide fetch packet boundary, the assembler places it in the next fetch packet, while the remainder of the current fetch packet is padded with NOP instructions. The number of execute packets within a fetch packet can vary from one to eight. Execute packets are dispatched to their respective functional units at the rate of one per clock cycle and the next 256-bit fetch packet is not fetched until all the execute packets from the current fetch packet have been dispatched. After decoding, the instructions simultaneously drive all active functional units for a maximum execution rate of eight instructions every clock cycle. While most results are stored in 32-bit registers, they can be subsequently moved to memory as bytes or half-words as well. All load and store instructions are byte, half-word, or word addressable.
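The way a fetch packet is divided into execute packets follows directly from the LSB rule just described. The following sketch treats each instruction simply as a 32-bit integer and uses a hypothetical function name; it models only the chaining rule, not instruction decoding or dispatch.

```python
def split_execute_packets(fetch_packet):
    """Split a fetch packet of eight 32-bit words into execute packets.

    An LSB of 1 chains an instruction to the one that follows it;
    an LSB of 0 ends the current execute packet.
    """
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    packets = []
    current = []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:            # chain broken: packet is complete
            packets.append(current)
            current = []
    if current:                      # a chain never crosses the fetch packet boundary
        packets.append(current)
    return packets


if __name__ == "__main__":
    # LSB pattern 1,1,0,0,1,0,0,0 across the eight instruction slots.
    words = [0x3, 0x5, 0x2, 0x4, 0x7, 0x6, 0x8, 0xA]
    for number, packet in enumerate(split_execute_packets(words)):
        print(number, [hex(w) for w in packet])
```

In this example the fetch packet yields five execute packets of three, one, two, one and one instructions, so it takes five clock cycles to dispatch before the next fetch packet is requested.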

[Figure 4.6 Texas Instruments C6xxx DSP architecture, showing on-chip peripherals and their interconnection]

The C6xxx DSP core features two ‘operational’ sides, A and B (Figure 4.7). Each side contains four units and a register file. One set contains functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2. The two register files each contain sixteen 32-bit registers, giving a total of 32 general-purpose registers. The two sets of functional units, along with the two register files, compose sides A and B of the C6xxx core. The four functional units on each side can freely share the 16 registers belonging to that side. Additionally, each side features a single data bus connected to all the registers on the other side, by which the two sets of functional units can access data from the register files on the opposite side. If all of the inputs to a functional unit on a particular side of the core are derived from registers on the same side, then all units can be serviced in a single clock cycle. Register access using the register file across the processing core, i.e. from the opposing side, can only support one read and one write per cycle.

Two sets of data addressing units, .D1 and .D2, are responsible for all data transfers between the register files and the memory. The data address driven by the .D units allows data addresses generated from one register file to be used to load or store data to or from the other register file. The C6xxx DSP supports a variety of indirect addressing modes using either linear or circular addressing modes with 5- or 15-bit offsets. All instructions are conditional, and most can access any one of the 32 registers. Some registers, however, are singled out to support specific addressing or to hold the condition for conditional instructions, if the condition is not automatically ‘true’. The two .M functional units are dedicated to multiplies. The two .S and .L functional units perform a general set of arithmetic, logical, and branch functions with results available during every clock cycle.

[Figure 4.7 Texas Instruments C6xxx DSP internal architecture, showing interconnections between the processing core, the functional units and on-chip memory and peripheral devices]

On the C6xxx DSP the 64 kbytes of internal data RAM are organized as two blocks of 32 kbytes located from address 8000 0000h to 8000 7FFFh and from 8000 8000h to 8000 FFFFh. Each block is then further subdivided into four banks, each holding 4 k 16-bit halfwords, as shown in Table 4.2. Simultaneous access to the data memory by the DMA controller, side A and side B of the DSP core is possible as long as they do not access the same bank within the same block.

The arrangement of memory banks and blocks used by the data memory controller of the C6xxx DSP is shown in Figure 4.8. This diagram also shows the interface between the data memory controller and side A and side B of the C6xxx DSP.
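The bank and block that a given byte address falls in can be computed from the interleaving shown in Table 4.2. The sketch below is a simplified model with hypothetical function names: it considers single 16-bit halfword accesses only, whereas a 32-bit word access would occupy two adjacent banks.

```python
DATA_RAM_BASE = 0x80000000           # start of internal data RAM
DATA_RAM_SIZE = 0x10000              # 64 kbytes in two 32-kbyte blocks


def block_and_bank(address):
    """Return (block, bank) for a byte address in internal data RAM."""
    offset = address - DATA_RAM_BASE
    if not 0 <= offset < DATA_RAM_SIZE:
        raise ValueError("address is outside internal data RAM")
    block = offset >> 15             # each block spans 32 kbytes
    bank = (offset >> 1) & 0x3       # halfwords interleave across four banks
    return block, bank


def accesses_conflict(address_a, address_b):
    """True if two halfword accesses land in the same bank of the same block."""
    return block_and_bank(address_a) == block_and_bank(address_b)


if __name__ == "__main__":
    print(block_and_bank(0x80000003))                      # block 0, bank 1
    print(accesses_conflict(0x80000000, 0x80000008))       # same bank
    print(accesses_conflict(0x80000000, 0x80008000))       # different blocks
```

Placing the buffers used by the DMA controller in a different block from the data being processed by the core is therefore one simple way of guaranteeing that the two never collide.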

The complete memory map of the C6xxx DSP can be configured in one of two different ways, MAP 0 and MAP 1, and these are summarized in Table 4.3. They differ in that MAP 0 has external memory mapped at address 0, and MAP 1 has internal memory mapped at address 0. The shaded regions indicate memory that is mapped to external space. The CE0, CE1, CE2 and CE3 labels indicate the external chip enable signal used to select the specified memory area.

Table 4.2 Internal data memory organization of the TMS320C6201 (Revision 3) DSP

                          Bank 0              Bank 1              Bank 2              Bank 3
First address (Block 0)   80000000 80000001   80000002 80000003   80000004 80000005   80000006 80000007
                          80000008 80000009   8000000A 8000000B   8000000C 8000000D   8000000E 8000000F
                          ...                 ...                 ...                 ...
                          80007FF0 80007FF1   80007FF2 80007FF3   80007FF4 80007FF5   80007FF6 80007FF7
Last address (Block 0)    80007FF8 80007FF9   80007FFA 80007FFB   80007FFC 80007FFD   80007FFE 80007FFF
First address (Block 1)   80008000 80008001   80008002 80008003   80008004 80008005   80008006 80008007
                          80008008 80008009   8000800A 8000800B   8000800C 8000800D   8000800E 8000800F
                          ...                 ...                 ...                 ...
                          8000FFF0 8000FFF1   8000FFF2 8000FFF3   8000FFF4 8000FFF5   8000FFF6 8000FFF7
Last address (Block 1)    8000FFF8 8000FFF9   8000FFFA 8000FFFB   8000FFFC 8000FFFD   8000FFFE 8000FFFF

[Figure 4.8 Texas Instruments C6201 internal memory organization]

Table 4.3 Texas Instruments C6xxx DSP complete memory map (rows with a single entry apply to both maps)

Address range (hex)      Size (bytes)   MAP 0                             MAP 1
0000 0000 – 0000 FFFF    64K            External memory interface CE 0    Internal program RAM
0001 0000 – 003F FFFF    4M–64K         External memory interface CE 0    Reserved
0040 0000 – 00FF FFFF    12M            External memory interface CE 0    External memory interface CE 0
0100 0000 – 013F FFFF    4M             External memory interface CE 1    External memory interface CE 0
0140 0000 – 0140 FFFF    64K            Internal program RAM              External memory interface CE 1
0141 0000 – 017F FFFF    4M–64K         Reserved                          External memory interface CE 1
0180 0000 – 0183 FFFF    256K           Internal peripheral bus EMIF registers
0184 0000 – 0187 FFFF    256K           Internal peripheral bus DMA controller registers
0188 0000 – 018B FFFF    256K           Internal peripheral bus HPI register
018C 0000 – 018F FFFF    256K           Internal peripheral bus McBSP 0 registers
0190 0000 – 0193 FFFF    256K           Internal peripheral bus McBSP 1 registers
0194 0000 – 0197 FFFF    256K           Internal peripheral bus Timer 0 registers
0198 0000 – 019B FFFF    256K           Internal peripheral bus Timer 1 registers
019C 0000 – 019F FFFF    256K           Internal peripheral bus interrupt selector registers
01A0 0000 – 01FF FFFF    6M             Internal peripheral bus (reserved)
0200 0000 – 02FF FFFF    16M            External memory interface CE 2
0300 0000 – 03FF FFFF    16M            External memory interface CE 3
0400 0000 – 7FFF FFFF    2G–64M         Reserved
8000 0000 – 803F FFFF    64K            Internal data RAM
8040 0000 – FFFF FFFF    2G–4M          Reserved

4.1.3 System example – Texas Instruments C6xxx EVM memory configuration

In order to put the previous sections into context we will now consider the configuration of a system which uses the C6xxx DSP from Texas Instruments. In fact, we will look at the C6xxx EVM, the evaluation board provided by Texas Instruments. In addition to the on-chip memory mentioned in the previous section, the EVM provides two external memory areas, one of which is very fast and the other not so fast. The first area of external memory is a 64K×32-bit SBSRAM bank which runs at the full 133 MHz bus speed, i.e. with an access time of 7.5 ns. The second external memory area is in the form of two banks of 1M×32-bit SDRAM with an access time of 10 ns, i.e. the external bus can only run at 100 MHz when accessing this memory. An expansion memory connector is also provided to enable asynchronous memory and memory-mapped devices to be added using a daughterboard. The memory arrangement is shown in Figure 4.9.

The External Memory Interface, EMIF, address and data buses are buffered as soon as they exit the DSP chip in order to preserve signal integrity. Buffering is important when using external buses because timing errors and loading effects can easily cause problems when running at high speeds. A 32-bit bus switch is also used to isolate the rest of the data bus during SBSRAM accesses, as shown in Figure 4.9. This isolation preserves signal integrity and allows the EVM to run the SBSRAM at the full 133 MHz bus speed.

Very fast line drivers are used to buffer the address and control signals, which limits the loading on the DSP's outputs. Also, fast data transceivers are used to provide voltage translation and the necessary drive to the expansion connector on the EVM board. The complete memory map for the C6xxx EVM board is shown in Table 4.4. The memory map shown is that of configuration MAP 0 and hence the external SBSRAM is mapped into the memory space starting at address zero.

[Figure 4.9 TI C6xxx EVM external memory configuration. Blocks shown: TI C6x DSP, buffer, 32-bit bus switch, 32-bit data bus, SBSRAM (133 MHz, 64k×32), two SDRAM banks (100 MHz, 1M×32 each), and expansion memory connector and buffer.]

Table 4.4 Texas Instruments C6xxx EVM board, MAP-0 memory map

Start address   End address   External memory space   Size (bytes)   Description
00000000        0003FFFF      CE0                     256K           SBSRAM
00040000        00FFFFFF      CE0                     16M – 256K     Unused
01000000        012FFFFF      CE1                     3M             Asynchronous expansion memory
01300000        0130003F      CE1                     64             PCI add-on registers
01300040        0130FFFF      CE1                     64K – 64       Unavailable
01310000        01310003      CE1                     4              PCI FIFO
01310004        0131FFFF      CE1                     16             Audio codec registers
01320010        0137FFFF      CE1                     320K           Reserved
01380000        0138001F      CE1                     32             DSP control/status registers
01380000        013FFFFF      CE1                     448K           Reserved
01400000        0140FFFF      N/A                     64K            Internal program memory (IPM)
01410000        017FFFFF      N/A                     4M – 64K       Reserved (future IPM)
01800000        01BFFFFF      N/A                     4M             Internal peripherals
01C00000        01FFFFFF      N/A                     4M             Reserved
02000000        023FFFFF      CE2                     4M             SDRAM (bank 0) or optional asynchronous expansion memory
02400000        02FFFFFF      CE2                     12M            Reserved
03000000        033FFFFF      CE3                     4M             SDRAM (bank 1) or optional asynchronous expansion memory
03400000        03FFFFFF      CE3                     12M            Reserved
04000000        7FFFFFFF      N/A                     1984M          Reserved
80000000        8000FFFF      N/A                     64K            Internal data memory (IDM)
80010000        803FFFFF      N/A                     4M – 64K       Reserved (future IDM)
80400000        FFFFFFFF      N/A                     2044M          Reserved

4.1.4 Wait state generator – coping with slow memory

Although it is quite reasonable to expect a DSP device to be able to access on-chip peripherals at the full internal bus speed, it is quite likely that not all external devices will operate at this full speed. This is particularly true of external read-only memories, ROMs, which are generally only used at start-up during the boot procedure. Clearly slower memories are cheaper, so it is often a matter of providing a DSP system with the minimal amount of fast memory and using slower memory devices wherever possible. External memory-mapped I/O devices, such as AD/DA converters and CODECs, may also not be able to interface to the DSP at the full operating speed. For this reason most DSP devices incorporate a mechanism by which the external buses may be slowed down momentarily during an access to a slow device. The mechanism uses a technique where extra clock cycles are inserted during the external bus access: these extra clock cycles are referred to as wait states. Wait state insertion can be implemented in hardware by the addition of a small amount of external logic which holds a control pin on the DSP in an inactive state until the external device is ready to respond. More typically on current DSP devices, wait state insertion is implemented under software control and the necessary wait state logic is incorporated into the DSP's own internal circuitry.
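A rough estimate of how many wait states a slow device needs can be obtained from its access time and the bus cycle time. The sketch below is a deliberately simplified model, assuming that an access with w wait states lasts (1 + w) bus cycles and ignoring set-up, hold and decoding delays; the function name and the example figures are illustrative only.

```python
def wait_states_needed(access_time_ps, cycle_time_ps):
    """Smallest number of wait states for which the access fits.

    Times are given in picoseconds so that the arithmetic stays in
    integers. An access with w wait states is assumed to last
    (1 + w) bus cycles.
    """
    if access_time_ps <= 0 or cycle_time_ps <= 0:
        raise ValueError("times must be positive")
    cycles = -(-access_time_ps // cycle_time_ps)   # ceiling division
    return max(0, cycles - 1)


if __name__ == "__main__":
    cycle = 10_000                                 # 10 ns bus cycle (100 MHz)
    for access in (10_000, 25_000, 70_000):
        print(access // 1000, "ns device:", wait_states_needed(access, cycle))
```

A data sheet calculation would use the detailed bus timing parameters of the DSP and of the memory device, but the simple model shows how quickly the required number of wait states grows as cheaper, slower memory is chosen.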

On the Texas Instruments C54x DSP, a software programmable wait state generator is provided which can extend external bus cycles by up to seven machine cycles so that very slow external devices can be interfaced. The software wait state generator is incorporated without any external hardware. For off-chip memory accesses, from zero to seven wait states can be specified within the software wait state register, SWWSR. For software wait state purposes the external address space is divided up into different regions, each of which can be independently controlled by the SWWSR. The division of external program, data and I/O space is summarized in Table 4.5. Figure 4.10 shows the bus timing of the C54x DSP when a read–read–write access is made to external memory with no wait states inserted. In comparison Figure 4.11 shows the same access but this time to slower memory with a single wait state inserted.

Although the C54x DSP provides software control over external wait state insertion, external devices requiring more than seven wait states can be interfaced using the hardware READY line. When all external accesses are configured for zero wait states, the software wait state generator is shut down and placed into a low power state.

As already mentioned, the software programmable wait state generator used on the C54x DSP is controlled by the 16-bit software wait state register, SWWSR. The SWWSR is used to control the program and data spaces, each consisting of two 32 kword blocks, and the I/O space, consisting of one 64 kword block. Each of these blocks has a corresponding 3-bit field in the SWWSR. These fields are shown in Figure 4.12 and described in Table 4.5.

The value stored in each of the 3-bit fields of the SWWSR identifies the number of wait states to be inserted when accessing each of the memory areas, as defined in Table 4.5. Between 0 and 7 wait states can be inserted, corresponding to the 3-bit binary number stored in the SWWSR register fields. The minimum value, which adds no wait states, is 000₂. Alternatively a value of 7, i.e. 111₂, causes the maximum number of wait states to be inserted.

Figure 4.10 Memory access at full speed, i.e. with no wait state insertion

Figure 4.13 shows a block diagram of the C54x DSP's on-chip wait state generator logic that is used when accessing external program space. When an external program access is decoded, the appropriate field of the SWWSR is loaded into the counter. If the field is not 000₂, a not-ready signal is sent to the DSP core and the wait state counter is started. The not-ready condition is maintained until the counter decrements to 0 and the external


Figure 4.11 Memory access at reduced speed using two wait states

Bits      15             14–12   11–9   8–6    5–3       2–0
Field     Reserved/XPA   I/O     Data   Data   Program   Program
Access    R              R/W     R/W    R/W    R/W       R/W

Figure 4.12 Software Wait State Register, SWWSR, of the TI C54x DSP

Table 4.5 SWWSR bit fields used to define wait state insertion for defined areas within memory

SWWSR bit field    Memory space affected
15                 Reserved
14–12              I/O space 0000–FFFF hex
11–9               Data space 8000–FFFF hex
8–6                Data space 0000–7FFF hex
5–3                Program space 8000–FFFF hex
2–0                Program space 0000–7FFF hex

READY line is set high. The external READY and the wait state READY are ORed together to generate the DSP WAIT signal. Finally, when the DSP WAIT signal is cleared, the DSP will complete the bus access.

When the DSP is reset, all fields in the SWWSR are set to 111₂, so that all external memory is accessed using the maximum number of wait states. This feature ensures that the DSP can communicate with slow external memories during processor initialization.
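The field layout of Table 4.5 can be illustrated with a few lines of C. This is only a sketch for experimenting with the bit arithmetic on a host computer: the names swwsr_set, swwsr_get, swwsr_prog_field and SWWSR_RESET are hypothetical, and the reset value assumes that the reserved bit 15 reads as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of each 3-bit wait state field, as listed in Table 4.5. */
enum swwsr_shift {
    SWWSR_PROG_LO = 0,   /* program space 0000-7FFF hex */
    SWWSR_PROG_HI = 3,   /* program space 8000-FFFF hex */
    SWWSR_DATA_LO = 6,   /* data space 0000-7FFF hex */
    SWWSR_DATA_HI = 9,   /* data space 8000-FFFF hex */
    SWWSR_IO      = 12   /* I/O space 0000-FFFF hex */
};

/* All five fields at 111 binary; bit 15 assumed to be zero (assumption). */
#define SWWSR_RESET 0x7FFFu

/* Return a copy of swwsr with one field set to `waits` (0 to 7). */
static uint16_t swwsr_set(uint16_t swwsr, enum swwsr_shift field, unsigned waits)
{
    assert(waits <= 7u);
    swwsr = (uint16_t)(swwsr & ~(7u << field));   /* clear the old field */
    return (uint16_t)(swwsr | (waits << field));  /* insert the new value */
}

/* Read back the number of wait states held in one field. */
static unsigned swwsr_get(uint16_t swwsr, enum swwsr_shift field)
{
    return (swwsr >> field) & 7u;
}

/* Address line A15 selects which program space field applies. */
static enum swwsr_shift swwsr_prog_field(uint16_t addr)
{
    return (addr & 0x8000u) ? SWWSR_PROG_HI : SWWSR_PROG_LO;
}
```

For example, swwsr_set(SWWSR_RESET, SWWSR_PROG_HI, 1) leaves every region at seven wait states except the upper program block, which is reduced to one.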

Figure 4.13 Software wait state generator block diagram for the Texas Instruments C54x DSP. [Block diagram: within the wait state generator, a 1-to-2 decoder driven by PSEL and A15 selects SWWSR field 5–3 or 2–0, which is loaded into a 3-bit counter; the counter output is combined with the READY signal from the external logic to produce the WAIT signal to the DSP core.]

4.1.5 The direct memory access processor, DMA

The Direct Memory Access, DMA, facilities provided on some DSP devices can be used to automatically transfer blocks of data between memory areas and external devices without disrupting the flow of normal DSP execution. The DMA processor is usually in the form of an on-chip operational block that can, once correctly initialized, operate independently of the main processing core. This is an extremely useful facility because it frees up more of the processing time for more useful and arithmetically demanding operations.

The structure of the memory within a DSP system is an important issue if the DMA facility is to be able to perform any useful function. This is because DMA accesses to memory areas are likely to take place concurrently with accesses by the main processing core. If the DMA processor and the processing core are required to access a particular area of memory at the same time then either there will be a memory conflict or some mechanism must be put into place so that a conflict does not occur. One approach is to divide the memory areas into separate blocks which can be independently and simultaneously accessed; another is to allow the DMA processor to access the memory during an idle part of the DSP instruction cycle. With highly optimized DSP processors where efficient pipelining techniques are used, it is unlikely that an idle period will exist and so the memory division approach is most commonly used. The Texas Instruments C6xxx DSP is a good example of this where, as we have already seen (see Table 4.2), the data memory is divided into a series of banks and blocks which can be accessed simultaneously without conflict. On the C6xxx the data memory space is also managed by a data memory controller which arbitrates between accesses in the background in order to avoid potential conflicts.

A good example of the use of DMA is in a system where a high-speed CODEC transfers sample data to and from the DSP continuously. If the DSP core itself had to handle the exchange of CODEC data, through the use of a software routine, then a significant chunk of processing time would be wasted. With the use of DMA, the DMA processor can be configured to handle the exchange of sample data, place it in a specified memory location and then flag the DSP core, through the use of an interrupt, to tell it that data are ready and waiting. In this DMA-CODEC example, typically the DMA processor would be configured to transfer a whole block of sample data into memory and then flag the DSP core only when the block transfer is complete. Using the block transfer method, often referred to as frame based processing, a further reduction in the number of times the DSP core is interrupted is achieved and hence an even greater increase in processing throughput can be obtained.

A number of scenarios may exist for the frame based DMA processing example previously discussed. The DMA processor may be set to operate its memory accesses in circular buffer mode such that old sample data is always overwritten. The circular buffer DMA mode requires relatively little memory, but the DSP core will be required to collect the samples continuously if they are not to be overwritten. An alternative approach is to use two independent frame stores such that the DMA processor can access one store whilst the DSP core accesses the other. When the DMA processor has completed all accesses to the first store then the DMA processor and DSP core will swap frame stores and carry on. This mode of access is sometimes referred to as ping-pong access for obvious reasons. A diagram depicting the ping-pong mode of access is shown in Figure 4.14. This diagram could represent the Texas Instruments C6xxx processor where the separate frame stores are allocated into independent memory blocks so that all accesses can take place concurrently without conflicts.
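The ping-pong arrangement can be modelled on a host computer in a few lines of C. The sketch below is not a driver for any real DMA controller: the type pingpong_t, the functions pingpong_init and pingpong_dma_write and the frame length of four samples are all hypothetical choices made to keep the example easy to trace. Each call stands in for one DMA transfer, and a non-NULL return marks the point at which a real system would interrupt the DSP core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 4   /* samples per frame (kept tiny for illustration) */

typedef struct {
    int16_t frame[2][FRAME_LEN]; /* two independent frame stores, A and B */
    int     dma_store;           /* store currently being filled by the DMA */
    size_t  fill;                /* samples written to that store so far */
} pingpong_t;

static void pingpong_init(pingpong_t *pp)
{
    pp->dma_store = 0;
    pp->fill = 0;
}

/* Model one DMA transfer of a CODEC sample. Returns the completed frame
 * when the current store fills, otherwise NULL. */
static const int16_t *pingpong_dma_write(pingpong_t *pp, int16_t sample)
{
    assert(pp->fill < FRAME_LEN);
    pp->frame[pp->dma_store][pp->fill++] = sample;
    if (pp->fill < FRAME_LEN)
        return NULL;                 /* frame not yet complete */

    const int16_t *done = pp->frame[pp->dma_store];
    pp->dma_store ^= 1;              /* swap: DMA moves to the other store */
    pp->fill = 0;
    return done;                     /* DSP core may now process this frame */
}
```

While the DSP core processes the returned frame, subsequent calls fill the other store, so the core has one full frame period to finish before its data is overwritten.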


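Figure 4.14 DMA and DSP core accesses to frame based memory stores. [Diagram: memory block 0 holds input frame A and output frame A, and memory block 1 holds input frame B and output frame B. The DMA processor moves data between the CODEC (IN and OUT) and one block while the DSP core works on the other; the two swap frames when complete.]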

4.2 Hardware interfacing and I/O control

In addition to the comprehensive arrangement of program and data memory areas, all DSP devices incorporate an extensive array of I/O interfacing capabilities. This includes a parallel interface that allows external devices to be mapped into the addressable I/O space, a range of serial interfaces and interrupt control lines. The trend for many DSP peripherals, such as CODECs and AD/DA converters, is to use a serial connection to interface to the DSP. In most cases serial connection is quite straightforward both in terms of software handling on the DSP side and the physical layout on the DSP board. Clearly the three or four connections required for a serial interface are far easier to route on a printed circuit board than the 30 or so required for a conventional parallel interface. We start this section with an overview of the parallel I/O and interrupt facilities found on a typical DSP device. This is then followed by a look at serial interface mechanisms (Ref. 4.5).

4.2.1 General purpose I/O interface

It has already been mentioned in the discussion about external program and data memory that the on-chip multibus architecture is rarely replicated to the external connection pins of the DSP device (see Section 4.1.1). Instead the usual arrangement is that one complete set of address and data lines is replicated to the outside world and accesses to external program and data memory must therefore share these connections. This arrangement is also true for external I/O interfacing, i.e. the same data and address bus lines are used.

The question then arises as to how the DSP can access the different external spaces using only one address and data bus. Usually additional control lines are used, as shown in Figure 4.15, each distinguishing between the different addressable program, data and I/O areas. Also, additional instructions are usually available that specifically access the I/O space so that, for example, a value held in a register can be passed directly to a location in I/O space.

In the case of the Texas Instruments C54x DSP the I/O memory space is an external 64 kword address space, ranging from address 0000 hex to FFFF hex. Two assembly instructions, PORTR and PORTW, are used to read from and write to this space. Read timings vary from those of the program and data memory spaces and can be independently set with wait-state values to facilitate access to a range of I/O-mapped devices. Furthermore, with the use of the external ready input signal, READY, devices can be granted extra bus cycles to complete an access. When communicating with slower devices, the DSP waits until the other device completes its function and sends the READY signal to continue execution. The mutually exclusive interfaces to external memory and I/O space are controlled on the C54x by the MSTRB and IOSTRB signals. MSTRB is activated for memory accesses, program or data, and IOSTRB is used to access I/O ports. The R/W signal controls the direction of the access and the PS, DS and IS signals are used to select the relevant chip select logic. The key external interface signals on the C54x are summarized in Table 4.6.

Figure 4.15 External I/O and memory space. [Diagram: the DSP core's external memory interface drives a shared address bus and data bus to the data, program and I/O spaces, with separate data select, program select, I/O select and R/W select control lines.]

In order to gain a better understanding of how the external buses operate, it is helpful to consider their operation when used to access the different addressable spaces. Figures 4.16 and 4.17 demonstrate the sequencing of external signals used to access a memory interface and I/O space, respectively.

With reference to Figure 4.16, the sequence of events during the memory read–read–write operation is as follows. Starting with the read operation, the DSP's external memory interface places the correct address and the program memory select line, PS, into the required state, thus enabling the address decode logic associated with the required memory location. The R/W select line is held by the DSP in the appropriate state, high in this case because it is a memory read. The DSP also holds the MSTRB line low, indicating that this is a memory space selection. The memory device is given a short time to respond to the read access request and the DSP captures the state of the data bus just before the second falling edge of CLKOUT, as shown in Figure 4.16. The cycle repeats for the second read operation and the data is captured from the bus just before the third falling edge of CLKOUT.

Table 4.6 Texas Instruments C54x DSP external interface signals

Signal name    Description
A0–A15         Address bus
D0–D15         Data bus
MSTRB          External memory access strobe
PS             Program space select
DS             Data space select
IOSTRB         I/O access strobe
IS             I/O space select
R/W            Read/write signal
READY          Data ready to complete cycle
HOLD           Request for control of memory interface
HOLDA          Acknowledge HOLD request
IACK           Interrupt acknowledge

The memory write procedure shown in Figure 4.16 follows a similar sequence of events to the read procedure; however, this time the R/W line is held low. Also the DSP places signals onto both the address and data bus long enough for the memory device to respond. The data is strobed into the memory device on the rising edge of the MSTRB signal shown during the write operation. This ensures that any transients present in the bus signals will have settled before the data is finally strobed in.

The I/O space read–write–read operation shown in Figure 4.17 follows a similar sequence to the memory accesses shown in Figure 4.16. This time, however, the IOSTRB and IS signals are used to strobe data and select the I/O address map.
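The way the select and strobe lines distinguish the three address spaces can be summarized as a small truth table in C. The sketch below assumes that the PS, DS, IS, MSTRB and IOSTRB lines are all active low; the text confirms this only for MSTRB during a memory access, so the polarity of the others is an assumption here. The names space_t, bus_pins_t and bus_access are hypothetical, and the model ignores timing and wait states.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SPACE_PROGRAM, SPACE_DATA, SPACE_IO } space_t;

/* Logic levels on the external interface pins; true means high. */
typedef struct {
    bool ps, ds, is;      /* space select lines (assumed active low) */
    bool mstrb, iostrb;   /* memory and I/O strobes (assumed active low) */
    bool rw;              /* R/W: high for a read, low for a write */
} bus_pins_t;

/* Pin states during the active part of one external access. */
static bus_pins_t bus_access(space_t space, bool is_read)
{
    /* Start with every select and strobe inactive (high). */
    bus_pins_t p = { true, true, true, true, true, is_read };

    switch (space) {
    case SPACE_PROGRAM: p.ps = false; p.mstrb  = false; break;
    case SPACE_DATA:    p.ds = false; p.mstrb  = false; break;
    case SPACE_IO:      p.is = false; p.iostrb = false; break;
    default:            assert(0 && "unknown address space"); break;
    }
    return p;
}
```

Because only one select and one strobe are driven low at a time, external decode logic can share a single address and data bus between program memory, data memory and I/O devices.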


Figure 4.16 External memory interface signals during a read–read–write operation

Figure 4.17 I/O space access signals used during an I/O read–write–read operation

4.2.2 Interrupt control unit

Interrupts can be hardware or software driven signals that cause the DSP to stop its main program and execute another function called an interrupt service routine, ISR. When the ISR has finished execution, program flow will continue from where it left off before the interrupt occurred. Interrupts are usually generated by hardware events such as an external device needing to transfer data to the DSP. Typically devices such as ADC/DACs or CODECs will use interrupts to flag a request to send or receive data to or from the DSP device. Other interrupts may be generated internally within the DSP. This could, for example, come from a serial communications port, indicating that new data has arrived, or from an on-chip timer. Interrupts can also be generated in software and most DSP devices have specific instructions used to trigger a software interrupt. Software interrupts can be useful for error catching or when implementing a real-time operating system in which multiple software threads need to execute.

Within a real DSP application, it is likely that different interrupts will be able to occur at any time or even at the same time and so it is necessary for the DSP to be able to prioritize the order in which the ISRs are executed. For this reason an interrupt priority level can usually be set for each of the interrupt types. Clearly an interrupt from a high-speed data acquisition device is likely to be more time-critical than an interrupt associated with switching an indicator lamp on or off. Usually some of the hardware interrupt signal lines appear on the external connection pins of the DSP device, while others are internal only and are associated with specific types of event. A list showing the different interrupts available on the C54x DSP is shown in Table 4.7. This type of list is often referred to as an interrupt vector table; the reasons for this will be considered shortly.

The interrupt list shown in Table 4.7 gives an indication of the wide range of interrupts that are likely to be found on any DSP device. Interrupt numbers 19–23 and 25 are associated with on-chip facilities, such as the timer, serial ports and host port. Interrupts 16–18 and 24 are available for user defined purposes and appear as physical signal lines on the external connection pins of the device. A wide range of user definable software interrupts is available on this particular DSP; these include interrupts 2–15, the priority levels of which are user defined. Interrupt 1 is known as a nonmaskable interrupt, NMI. Unlike other interrupts an NMI cannot be disabled and is typically allocated to time-critical interrupt sources. Usually a register associated with the different interrupts is used to enable or disable the interrupt signals, apart from the NMI. This register is often referred to as an interrupt mask register, IMR, an example of which is shown for the Texas Instruments C54x DSP in Figure 4.18.

In addition to using the IMR register to enable or disable individual interrupts there is a flag in the Status control register called the Interrupt Mask, INTM. This can be used to globally enable or disable all interrupts, as discussed in Section 3.7.5. Note that this flag does not affect the RESET or NMI interrupts.

The last remaining interrupt to be mentioned is the RESET interrupt, number 0. This particular interrupt is used to trigger a complete reset of the DSP and is used during start-up. The reset interrupt can be triggered through a software or hardware event and is used to place the DSP into a known state where all registers, counters and on-chip peripherals are set to default values. During start-up, the reset interrupt is used to place the DSP into the default state and call the application program which is at a user-specified location in program memory.


Another register associated with interrupts is used on the C54x DSP. This is referred to as the interrupt flag register, IFR, as shown in Figure 4.19. This register has two functions: firstly, it provides a software readable indicator showing which interrupt events have taken place. The second function of the IFR is to provide a software mechanism by which interrupts can be acknowledged and cleared. When an interrupt occurs the corresponding flag in the IFR is set and remains in this state until the DSP core acknowledges and clears the flag.

Table 4.7 Texas Instruments C54x DSP interrupt vector table

INTR No   Priority   Name             Location (hex)   Function
0         1          RS/SINTR         0                Reset (hardware and software reset)
1         2          NMI/SINT16       4                Nonmaskable interrupt
2         –          SINT17           8                Software interrupt #17
3         –          SINT18           C                Software interrupt #18
...
7         –          SINT22           1C               Software interrupt #22
...
11        –          SINT26           2C               Software interrupt #26
...
14        –          SINT29           38               Software interrupt #29, reserved
15        –          SINT30           3C               Software interrupt #30, reserved
16        3          INT0/SINT0       40               External user interrupt #0
...
19        6          TINT/SINT3       4C               Internal timer interrupt
20        7          BRINT0/SINT4     50               Buffered serial port receive interrupt
21        8          BXINT0/SINT5     54               Buffered serial port transmit interrupt
22        9          TRINT/SINT6      58               TDM serial port receive interrupt
23        10         TXINT/SINT7      5C               TDM serial port transmit interrupt
...
25        12         HPINT/SINT9      64               HPI interrupt
26–31     –          –                68–7F            Reserved

Figure 4.18 Texas Instruments C54x DSP Interrupt Mask Register, IMR

Figure 4.19 Interrupt flag register, IFR, used on the Texas Instruments C54x DSP

In a typical application the arrangement may be as shown in Figure 4.20 where program flow switches from the main routine to the interrupt service routine, ISR, when an interrupt event occurs. The sequence of events which takes place immediately after the interrupt ensures that the correct service routine is called and that the ISR is not repeatedly re-triggered.

On the Texas Instruments C54x DSP the processing core operates in the following manner when a maskable interrupt is requested:

• The corresponding bit in the IFR is set.

• The acknowledgment conditions, INTM = 0 and IMR bit = 1, are tested. If the conditions are true, the CPU acknowledges the interrupt, generating an interrupt acknowledge, IACK, signal. Otherwise, the DSP ignores the interrupt and continues with the main program.

• When the interrupt has been acknowledged, its flag bit in the IFR is cleared to 0 and the INTM bit is set to 1 so that no other maskable interrupts can take place.

• The current value of the program counter, PC, is saved onto the stack.

• The DSP branches to and executes the interrupt service routine, ISR. What actually happens is that the DSP core branches to the interrupt vector address associated with the interrupt.

• The ISR is concluded by a return instruction, which pops the return address off the stack.

• The DSP continues with the main program.
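The steps listed above can be traced with a small host-side model written in C. This is a sketch of the decision logic only, not of the real hardware: the structure cpu_t and the functions irq_request and irq_return are hypothetical, the bit number and vector address are passed in by the caller, and the option to clear INTM again on return is an assumption about how the ISR ends rather than something stated in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t ifr;        /* interrupt flag register: 1 = pending */
    uint16_t imr;        /* interrupt mask register: 1 = enabled */
    bool     intm;       /* global mask: true (1) disables maskable interrupts */
    uint16_t pc;         /* program counter */
    uint16_t stack[8];   /* small software stack for return addresses */
    int      sp;         /* number of entries currently on the stack */
} cpu_t;

/* A maskable interrupt request arrives on flag `bit`.
 * Returns true if it was acknowledged and the core branched to `vector`. */
static bool irq_request(cpu_t *cpu, unsigned bit, uint16_t vector)
{
    assert(bit < 16u && cpu->sp < 8);
    cpu->ifr |= (uint16_t)(1u << bit);            /* flag bit set in IFR */

    if (cpu->intm || !(cpu->imr & (1u << bit)))
        return false;                             /* not acknowledged; stays pending */

    cpu->ifr &= (uint16_t)~(1u << bit);           /* acknowledged: clear flag */
    cpu->intm = true;                             /* block other maskable interrupts */
    cpu->stack[cpu->sp++] = cpu->pc;              /* save PC on the stack */
    cpu->pc = vector;                             /* branch to the vector address */
    return true;
}

/* Return from the ISR: pop the saved PC, optionally re-enabling interrupts. */
static void irq_return(cpu_t *cpu, bool enable_interrupts)
{
    assert(cpu->sp > 0);
    cpu->pc = cpu->stack[--cpu->sp];
    if (enable_interrupts)
        cpu->intm = false;
}
```

A request that arrives while INTM = 1 leaves its IFR bit set, which is how a pending interrupt is remembered until the core is able to service it.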

In the case of the C54x DSP, the interrupt causes the program flow to branch to one of a number of memory locations depending upon the specific interrupt that occurred. The locations are defined in the interrupt vector table, as shown previously in Table 4.7 and also shown in a different form in Figure 4.21. For example, if the external interrupt pin INT3 is toggled by some external device, the program flow would initially branch to the vector address FFE0h and run the code at this memory location. The address FFE0h is the default location of the INT3 ‘vector’ and program flow will always jump to this location when the INT3 pin is toggled. It is the programmer’s task to place meaningful code at this address, usually by including a vectors.asm file when compiling and linking the code.


Figure 4.20 Typical real-time application using event driven interrupt service routines, ISRs. [Diagram: after Initialization( ), Main runs while (1) { .. wait for ever }; an interrupt diverts flow to ISR_1( ) or ISR_2( ), and a return resumes the main loop.]

Because a small amount of code, no more than four words, can be located at the vector address it is possible, if the ISR is very short, to place its code within the vector table as shown for the HPINT/SINT9 interrupt in Figure 4.21. More typically though, the only instruction that will be placed at the vector address will be a branch instruction which re-directs program flow to a specific ISR located elsewhere in program memory.
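The relationship between an interrupt number and its vector address follows directly from the four-word spacing. The short C sketch below assumes the default table location, with the reset vector at FF80h as in Figure 4.21; the function name vector_address and the two macros are hypothetical. It also assumes that INT3 is interrupt number 24, which is consistent with its vector address of FFE0h.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BASE   0xFF80u  /* default address of the reset vector */
#define WORDS_PER_VEC 4u       /* each vector slot holds up to four words */

/* Address of the vector slot for interrupt number 0 to 31. */
static uint16_t vector_address(unsigned intr_no)
{
    assert(intr_no < 32u);
    return (uint16_t)(VECTOR_BASE + WORDS_PER_VEC * intr_no);
}
```

With these assumptions vector_address(24) gives FFE0h for INT3 and vector_address(25) gives FFE4h for HPINT, matching the locations 60 and 64 hex in Table 4.7 once the base address is added.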

Also shown in Figure 4.21 is the vector address FF80h which is associated with the non-maskable RESET interrupt. It has already been mentioned that when the C54x DSP is reset or during its start-up sequence, many registers are set to predefined values. One of these is the program counter, PC, which is set to the vector address FF80h. The programmer should therefore place a branch instruction at this location which causes program flow to be re-directed to the programmer’s own application program. If the programmer fails to set the RESET vector correctly then it is likely that the application program will never start. The sequence of events followed by the C54x DSP when it receives a maskable or non-maskable interrupt request is summarized in the flow diagram of Figure 4.22.

Non-maskable interrupts are very similar in operation to the maskable type already considered. The main difference is that the interrupt acknowledge signal, IACK, is issued immediately and program flow branches directly to the relevant interrupt vector address, after first storing the program counter value onto the stack and disabling all further interrupts through the use of the global interrupt disable flag INTM in the DSP control register.

The previous example detailing the interrupt structure of the C54x DSP is quite generic in that many of the ideas presented are equally applicable to the operation of interrupts on other DSP devices. This is certainly true of the TI C6xxx device which also makes use of an interrupt vector table and an assortment of maskable and non-maskable interrupt types. The main differences are in the detail: the exact location of the vector table and the naming conventions used for the various signals. Also, in the case of the C6xxx, additional interrupts associated with on-chip peripherals are present such as the multi-channel serial ports and the extensive DMA features.

Figure 4.21 Interrupt vector table and associated interrupt service routines, ISRs. [Diagram: the vector table runs from FF80h (RESET) through FF84h (NMI) and the software interrupts SINT17 onward, to FFE0h (INT3/SINT8) and FFE4h (HPINT/SINT9), with the remaining locations up to FFFFh reserved. The RESET, NMI and INT3 vectors branch to their service routines. The accompanying vectors.asm listing places B SysInit at Reset, two instructions followed by B nmi_isr at NMI, B Int3_isr at INT3, and two instructions followed by a return at HPINT.]

Figure 4.22 Flow diagram of interrupt operation for the Texas Instruments C54x DSP. [Flow: an interrupt request is received, from a hardware interrupt or an INTR instruction; if the interrupt is maskable it proceeds only if INTM = 0 and the IMR mask bit = 1; the interrupt is acknowledged and IACK generated; INTM is set to 1; the PC is saved on the software stack; the interrupt service routine is run; the return instruction restores the PC; the main program continues.]

Because the C6xxx DSP fetches packets of eight 32-bit instructions at a time, the number of instructions that can be stored at each vector address is eight, i.e. one fetch packet. The fetch packets used on the C6xxx DSP and associated with interrupts are referred to as Interrupt Service Fetch Packets, ISFPs. As with the C54x, the group of instructions placed at the vector address may form the ISR in their own right and simply return flow immediately back to the main routine after they have completed, or a branch instruction may be used to jump to a more substantial service routine elsewhere in memory. Both of the situations just described are depicted in Figure 4.23.

Figure 4.23 Two different types of ISFP used on the Texas Instruments C6xxx DSP

4.2.3 Serial ports and the multi-channel buffered serial port

All DSP devices incorporate some form of serial communications interface which can be used to easily connect the DSP device to a wide range of different types of peripheral device (Refs. 4.6 and 4.7). Commonly, peripheral devices interfaced to DSPs include AD/DA converters, CODECs and other DSP devices for inter-processor communication. The standard serial communication interface uses a four wire bus comprising a data clock, frame sync, data receive and data transmit signals: this arrangement is shown in Figure 4.24. The data clock signal is used as the reference by which the serial data bits are shifted in or out, the frame sync is used to indicate the start of an 8-, 16- or 32-bit word and the data receive and transmit lines carry the actual data. Such a simple interface mechanism eases the problems of circuit board layout and reduces the potential bottlenecks that would occur if real-time data I/O of this type were performed over the DSP's parallel external interface. Some DSP devices offer modes of serial data transfer that reach quite impressive peak data transfer rates of the order of 50–80 Mbits/s per channel. Also, because many DSPs incorporate a number of independent high speed serial ports the combined data transfer rate on and off chip can sometimes exceed 200–300 Mbits/s.

For most DSP serial ports the frame sync and clock signals can be generated by the DSP itself under user control via a set of serial port configuration registers. Alternatively the signals can be derived from the external device. The arrangement just described is referred to as a Master–Slave mode of operation. In the first scenario the DSP operates as the Master and the external device the Slave, and in the second scenario the reverse is true. Inside the DSP a set of registers is usually available to control the serial port itself in terms of Master–Slave mode of operation, clocking and frame rates, data widths and so on. Additional registers, often referred to as the Data Receive, Rx, Register and Data Transmit, Tx, Register are used to pass Rx and Tx data between the DSP core and the communications interface.

When new data arrives on the receive side or is required on the transmit side the communications interface logic usually generates an interrupt signal to flag its requirements. On some DSPs the interrupt will be responded to by the processing core, although on more recent devices an on-chip DMA processor will handle the interchange of data via a user specified location in memory. When the DSP core is continually interrupted by regular sample data requests issued by the communications interface, the associated processing overhead can be excessive. By using a DMA processor to manage data transfers the processing overhead for the DSP core will be significantly reduced because the DMA can be set to transfer large blocks of data and only interrupt the core at the end of each new block. For example the DMA transfers could be set to use a 1024 sample buffer in memory: the DMA processor would send and receive each 1024 sample block and only interrupt the DSP core when the transfer is complete.
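The saving offered by block transfers is easy to quantify. The C function below, whose name core_interrupt_rate is hypothetical, simply divides the sample rate by the block length to give the number of interrupts per second seen by the DSP core. The 48 kHz sample rate used in the note that follows is an assumed figure chosen for illustration and does not come from any particular device.

```c
#include <assert.h>

/* Interrupts per second seen by the DSP core when it is flagged once per
 * block of `block_len` samples. A block length of 1 models the case where
 * the core is interrupted for every sample. */
static double core_interrupt_rate(double sample_rate_hz, unsigned block_len)
{
    assert(block_len > 0u && sample_rate_hz >= 0.0);
    return sample_rate_hz / (double)block_len;
}
```

At an assumed 48 kHz, interrupting on every sample means 48 000 interrupts per second, whereas a 1024 sample block reduces this to fewer than 47 per second.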

An example showing the signal states for the four wire serial interface is shown in Figure 4.25. The signals shown are FSync, CLK, DRx and DTx, representing the frame sync, data clock, data receive and transmit signals, respectively. The sequence shows the full duplex mode of operation, that is, both transmitted and received data are communicated simultaneously. Figure 4.25 also shows two extra signals representing the receive and transmit interrupts that are generated inside the DSP device. These are used to indicate that new data is ready or that it is required for transmission.


Figure 4.24 Basic four wire serial communications. [Diagram: the serial interface of the DSP device is connected to the serial interface of a CODEC by the DRx, DTx, CLK and FSync lines.]

The communications port available on the C54x DSP from Texas Instruments is very much as described in the previous subsection. One of the features that distinguishes the different devices within the C54x family is the number and type of serial communications ports that are available. Ports can be standard serial, buffered serial (in effect a standard port with a built-in DMA processor) or TDM serial. TDM stands for Time Division Multiplex and is a more advanced type of serial port that allows up to seven devices to communicate over a slightly modified four wire connection.

In the standard mode of serial communication, the C54x DSP makes use of three memory-mapped 16-bit registers and two additional registers that are not accessible to the DSP core itself. The memory-mapped registers are the data receive register, DRR, into which new data arriving into the port is placed and the data transmit register, DXR, where the user application places new data to be transmitted. The third memory-mapped register is the serial port configuration register, SPC, which is used to set up various aspects of the serial communications. The 16-bit SPC register is shown in Figure 4.26.

We will consider some of the key SPC register bits in the following discussion. This is particularly useful because many of these will have an equivalent that can be found on most other DSP devices. Starting from the LSB end of the SPC register and moving upward, the DLB bit sets the serial port into a loop back mode of operation. Loop back is equivalent to physically connecting the DTx output directly to the DRx input. This can be useful when setting up and testing a serial communication port or algorithm. The FO bit selects the word size to be used in transmission and reception; the possibilities on the C54x DSP are 16 or 8 bits. The FSM, MCM and TXM bits are used to set the frame sync and clock operating modes, in effect enabling them to be configured to be derived on-chip or sourced externally. XRST and RRST are used to reset and enable the transmission and reception sides of the serial port, respectively. XSREMPTY is a flag used for error trapping and is used by the communication port to indicate that the transmit buffer has been emptied before new data was available, in effect indicating a buffer underflow has occurred.


Figure 4.25 Four wire communications using FSync, CLK, DTx and DRx. Additional signals are shown to indicate the interrupts TxINT and RxINT. [Timing diagram: traces for CLK, FSync, DTx, TxINT, DRx and RxINT; each 16-bit word on DTx and DRx is shown as bits D0 to D15 with the MSB and LSB marked.]

Figure 4.26 Texas Instruments C54x DSP 16-bit serial port configuration, SPC, register

RSRFULL is used by the receive side of the port to indicate that the receive register has overflowed, i.e. new data has arrived and the previous data has not yet been collected.

Internally the basic C54x serial port is arranged as shown in Figure 4.27. The data bus shown at the top of the diagram is the standard 16-bit data bus used by the DSP core to access all the memory-mapped registers including the serial port data transmit register, DXR, and the data receive register, DRR. An additional two registers are used, the RSR and XSR: these are shift registers that are not directly accessible by the DSP core. The names RSR and XSR are used to denote the receive and transmit shift registers respectively.

Assuming that the serial port control register, SPC, has been set to full duplex, bi-directional, operation the sequence of events will be as follows. New data to be transmitted is loaded by the DSP core, under software control, into the DXR register. When the previous transmission data has been shifted out of the XSR, transmission shift register, the new data to be sent is parallel loaded from the DXR. Each time the transmission has completed and all of the data bits in the XSR have been shifted out, an interrupt is generated which flags the DSP core for new transmission data. The sequence is very similar on the receive side: each time a complete data word has been shifted into the RSR, receive shift register, it is parallel loaded into the DRR and an interrupt is generated to indicate to the DSP core that new data is available.

On the C54x DSP, the receive and transmit interrupts are labelled TRINT and TXINT, and these can be observed in the interrupt vector table shown previously in Section 4.2.2. In fact these interrupts are associated with the slightly more advanced TDM serial port which is available on some of the C54x DSPs. Also note that, because the data passes through two registers for either receive or transmit, a useful double buffering of data takes place.
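The double buffering provided by the DXR/XSR and RSR/DRR register pairs can be demonstrated with a simple bit-level model in C. The sketch below connects the transmitter to the receiver as in loop back mode and shifts one bit per call. It is a host-side illustration only: the type sport_t and the functions sport_write and sport_clock are hypothetical, frame sync is not modelled, and sending the most significant bit first is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t dxr, xsr;   /* transmit: data register and shift register */
    uint16_t drr, rsr;   /* receive: data register and shift register */
    bool     dxr_full;   /* DXR holds a word not yet copied to XSR */
    unsigned bits;       /* bits shifted so far in the current word */
    bool     tx_int;     /* set when a word has been fully shifted out */
    bool     rx_int;     /* set when a word has been copied into DRR */
} sport_t;

/* The DSP core writes a new word for transmission. */
static void sport_write(sport_t *sp, uint16_t word)
{
    sp->dxr = word;
    sp->dxr_full = true;
    sp->tx_int = false;              /* request for data has been serviced */
}

/* One serial clock in loop back: the bit leaving XSR enters RSR. */
static void sport_clock(sport_t *sp)
{
    if (sp->bits == 0u) {
        if (!sp->dxr_full)
            return;                  /* nothing waiting to be sent */
        sp->xsr = sp->dxr;           /* parallel load DXR into XSR */
        sp->dxr_full = false;
    }

    unsigned bit = (sp->xsr >> 15) & 1u;          /* MSB first (assumed) */
    sp->xsr = (uint16_t)(sp->xsr << 1);
    sp->rsr = (uint16_t)((sp->rsr << 1) | bit);

    if (++sp->bits == 16u) {         /* a complete word has been moved */
        sp->bits = 0u;
        sp->drr = sp->rsr;           /* parallel load RSR into DRR */
        sp->rx_int = true;
        sp->tx_int = true;
    }
}
```

Because the DXR is free again as soon as its contents have been copied into the XSR, the core can write the next word while the current one is still being shifted out, which is the benefit of the two-register arrangement.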


Figure 4.27 Texas Instruments C54x DSP serial port interface block diagram

Some of the C54x DSPs include a more advanced buffering scheme that makes use of a type of DMA access to on-chip data memory. In this buffered serial port, the incoming data is automatically placed into a user specified circular buffer and an interrupt is only generated when the buffer is full. Similarly, outgoing data is placed into a second circular buffer which only needs to be re-filled when the buffer is empty. The benefit of this buffering scheme is that the interrupt overhead for the DSP core is vastly reduced, since the core only needs to service an interrupt when the buffers need attention. Figure 4.28 shows a block diagram of the buffered serial port used on the C54x DSP.

Figure 4.28 Texas Instruments C54x DSP buffered serial port block diagram (blocks: C54x DSP data memory interface with address and data buses, autobuffering unit (ABU), C54x DSP core interface, interrupt logic, serial port control logic and serial port interface; registers: BSPC, BSPCE, BDRR, BDXR, BRSR and BXSR; pins: BCLKX, BFSX, BDX, BDR, BCLKR and BFSR)

The buffered serial port incorporates the standard serial port, previously mentioned, in addition to an autobuffering unit, ABU. The autobuffering unit contains its own set of addressing registers that can be used to implement a circular buffer within a user specified area of data memory. The buffer area is restricted to the on-chip dual access RAM, which ensures that simultaneous accesses to this memory area by the ABU and the DSP core will not cause timing conflicts. In addition to automatically generating addresses and storing/retrieving communication data, the ABU takes control of the serial port transmit and receive interrupts. The ABU only issues an interrupt when the receive buffer is full or the transmit buffer is empty. Note that in the diagram the labels BXINT and BRINT are used to denote the transmit and receive interrupts of the buffered serial port. Additional registers are provided for the buffered serial port to control the memory addressing action and buffer modes; these will not be considered here.
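To see why autobuffering reduces the interrupt load, consider the following minimal Python sketch of a receive-side circular buffer. It is only a software analogy of the ABU: the class name, the callback mechanism and the buffer size used below are assumptions for illustration, and the real unit works in hardware on a block of dual access RAM.

```python
class AutoBuffer:
    """Circular receive buffer that raises one 'interrupt' per full buffer."""

    def __init__(self, size, on_full):
        self.data = [0] * size
        self.index = 0          # plays the role of the ABU address register
        self.on_full = on_full  # callback standing in for the receive interrupt

    def store(self, word):
        self.data[self.index] = word
        self.index = (self.index + 1) % len(self.data)  # wrap around
        if self.index == 0:
            # The buffer has just been filled: notify the core once.
            self.on_full(list(self.data))
```

With a 100-word buffer, 1000 received words produce 10 calls to `on_full`, compared with 1000 interrupts if every word were flagged individually.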

In addition to standard serial and autobuffered communications, some of the C54x DSP devices support a time division multiplexed, TDM, mode of operation. TDM mode allows more than one external device to be connected in parallel onto the four wire bus. On the C54x the maximum number of devices that can be connected to the TDM bus is eight, and this includes the master DSP processor itself. Some implementations of the TDM communication port allow many more devices to be connected, for example the C6xxx DSP supports 128 parallel devices on each four wire TDM bus. Although in principle many different devices could be connected to the same TDM bus, such as other processors and CODECs, etc., in practice only devices that support TDM can be used. Very few CODECs or AD/DA converters are commercially available that fully support the TDM mode of operation, although it is often possible to add extra logic circuitry to facilitate this mode. Most often TDM operation will be used for processor-to-processor communication where, for example, an array of C54x DSPs will be configured in a multiprocessing arrangement.

Time division multiplexing, TDM, is a scheme in which the available transmission time is divided into a number of intervals or slots. Each time slot represents a transmission channel, and all data for that channel must be sent within its slot. The collection of channel time slots forms a full ‘frame’ period and the bursts of data sent for each channel are often referred to as data packets. This arrangement is shown in Figure 4.29 where a four-channel frame containing 16-bit data packets is used. Channel 1 is active during the first communications period and during every fourth period thereafter. The remaining three channels are interleaved in time with channel 1.
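The interleaving of channels into frames can be expressed in a few lines. The Python sketch below is a minimal illustration with hypothetical function names; it treats each data packet as a single 16-bit value and numbers channels from 1, as in the text.

```python
def tdm_interleave(channels):
    """Merge equal-length per-channel packet lists into one slot stream."""
    return [packet for frame in zip(*channels) for packet in frame]


def tdm_channel_of_slot(slot_index, n_channels):
    """Channel number (1-based) that owns a given slot (0-based)."""
    return slot_index % n_channels + 1


def tdm_deinterleave(stream, n_channels):
    """Recover the per-channel packet lists from the slot stream."""
    return [stream[ch::n_channels] for ch in range(n_channels)]
```

For a four-channel frame, slots 0, 4, 8 and so on belong to channel 1, which matches the statement that channel 1 is active during the first period and every fourth period thereafter.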

Figure 4.29 Time division multiplexed, TDM, data packet structure (one complete data packet of bits D0 to D15 on DTx, clocked by CLK, with the LSB and MSB marked; a TDM frame made up of channels CH1 to CH4, repeating in time)

As previously mentioned the C54x TDM port can support up to eight channels. The specific operation of the TDM bus on the C54x is configurable through software via a TDM bus configuration register. This can be used to determine which device is to transmit or receive within each of the available time slots. The configuration is very flexible and allows more than one receiver to take data during any time slot. For example, a system could be configured so that all devices receive data from a master device transmitting during channel time slot one, while at the same time device 3 could be set to be the only device receiving channel 2 data and so on. Figure 4.30(a) shows the architecture and connection arrangements for TDM on a C54x DSP. The four wire bus consists of the conventional serial port bus connections of clock, frame, and data wires plus an additional wire, TADD, that carries the device addressing information. The data wire, TDAT, is slightly different to that discussed previously in that it is bi-directional and is used for transmission and reception of data during different time periods. The TADD signal is also bi-directional. Any device within the TDM system can be configured to drive the bi-directional TDAT and TADD buses during different time slots within a given frame.

As was the case for standard serial port communications, all TDM port operations are synchronized to the clock and frame signals. In the TDM system these are labelled TCLK and TFRM. These signals must be derived from a single device within the system and it is the programmer’s responsibility to ensure that this is the case.

The bi-directional TDM address line, TADD, and TDM data line, TDAT, can only be driven by one device during any single time slot. The TDAT and TADD outputs of all other devices should be in a high impedance state during that slot. This is achieved through proper programming of the TDM port control registers. When one device transmits, all devices in the system, including the one driving the time slot, sample the TDAT and TADD lines to determine if the current transmission represents valid data to be read. When a device recognizes an address to which it is supposed to respond, a valid TDM read occurs and the data value is transferred to the receive register. A receive interrupt is also generated within the receiving device to indicate that new valid data has arrived.


Figure 4.30 Time division multiplexing using the Texas Instruments C54x DSP. (a) TDM system architecture and (b) normal connection of TDM pins on a C54x DSP chip

On the C54x DSP the TDM port operation is controlled by six memory-mapped registers, which are shown in Figure 4.31. The TRCV and TDXR registers have the same functions as the DRR and DXR registers respectively and are used to hold valid receive and transmit data. The TSPC register is identical to the SPC register previously discussed although certain bit positions are interpreted slightly differently. These bit positions in the TSPC register determine such things as which device is to provide the TCLK and TFRM timing signals, since no more than one device within a system should provide the reference data and frame clocks. Usually one device is initialized at start-up to provide the reference clock and this device remains in this state at all times.

Other registers used in the TDM serial communications port include the TCSR, TRTA and TRAD. The TDM channel select register, TCSR, is used to determine the time slot during which each device will transmit. An important system level constraint is that no more than one device can transmit during the same time slot and, because devices do not check for bus contention, it is important that the programmer specifies this correctly via the TCSR. There is no limitation on the number of transmission time slots allocated to a particular device. For example, if a device within a system is to be allocated TDM channel slots one and four then bit1 and bit4 in the TDM channel select register, TCSR, should be set high.
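Because the hardware does not detect bus contention, it can be worth checking a planned slot allocation in software before the registers are programmed. The Python sketch below shows one way this might be done. The function names and device labels are hypothetical, and the mapping of channel slot to bit position simply follows the bit1 and bit4 example given above.

```python
def tcsr_mask(bit_positions):
    """Build a TCSR value with the given bit positions set high."""
    mask = 0
    for bit in bit_positions:
        mask |= 1 << bit
    return mask


def find_contention(tcsr_by_device):
    """Return a mask of time-slot bits claimed by more than one device."""
    seen = 0
    clash = 0
    for mask in tcsr_by_device.values():
        clash |= seen & mask   # bits already claimed by an earlier device
        seen |= mask
    return clash
```

A result of zero from `find_contention` means that no two devices have been given the same transmit slot; any non-zero bit identifies a slot that would be driven by two devices at once.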

The TDM receive/transmit address register, TRTA, within each device specifies the receive and transmit address of the device. The lower half, RA0 – RA7, specifies the receive address and the upper half, TA0 – TA7, specifies the transmit address. When the TDM system buses are active, the TADD line will carry 8-bit address information about which device is transmitting at any time. All devices in the system continually sample the TADD line and compare the address word obtained with the value held in their TRTA register. If the address indicated by the sampled value corresponds with an address in the TRTA then the device will successfully receive the data. The transmit address, TRTA bits TA0 – TA7, is the address that the device drives onto the TADD line during a transmit operation within an assigned channel slot. The transmit address establishes which receiving devices may execute a valid TDM receive on the driven data.
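The split of the TRTA register into two 8-bit halves can be illustrated as follows. This Python sketch only shows the field packing described above, with hypothetical helper names; it deliberately does not model the address comparison itself, since the exact matching rule is not detailed here.

```python
def pack_trta(transmit_address, receive_address):
    """Combine TA0-TA7 (upper half) and RA0-RA7 (lower half) into 16 bits."""
    return ((transmit_address & 0xFF) << 8) | (receive_address & 0xFF)


def unpack_trta(trta):
    """Return (transmit_address, receive_address) from a 16-bit TRTA value."""
    return (trta >> 8) & 0xFF, trta & 0xFF
```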

The last additional register used for TDM communication is the TDM receive address register, TRAD. This register holds the address information that was most recently sent and can be used to help verify the relationship between instruction cycles and TDM port timing.

Figure 4.31 Texas Instruments C54x DSP TDM serial port registers

In principle the serial communication port available on the Texas Instruments C6xxx DSP processor is very similar to that considered for the C54x DSP. The main difference is that the C6xxx ports are automatically buffered by the use of the DMA processor which is available on this device. Also, the C6xxx ports support multichannel, TDM, operation for up to 128 channels. The C6xxx Multichannel Buffered Serial Port, McBSP, as it is called, also contains a wider range of control registers that facilitate a selection of different data sizes (8, 12, 16, 20, 24 and 32 bits), companding and signal polarity selection. Furthermore the C6xxx McBSP contains a highly programmable sample rate generator from which the clock and frame signals can be derived. Although the C6xxx McBSP provides a wide range of control and configuration options that can support just about any device you would wish to connect, the basic principle of operation is very similar to the previous serial communication schemes discussed. This device can be configured so that the DMA processor collects and places data within a specified memory area on the DSP device so that frame based processing can take place.

Figure 4.32 shows a block diagram of the C6xxx McBSP. The diagram shows the normal memory-mapped registers, DRR, DXR and SPCR as seen previously, although on this system all registers are 32 bits wide. Additional registers include the SRGR, which is the sample rate generator control register and is used to determine the transmission clock and frame rates. A number of registers are also available to control various aspects of multichannel, TDM, operation. Separate interrupt lines are provided that can be configured to flag data events to the DSP core or the DMA processor.

Figure 4.32 Block diagram of the Texas Instruments C6xxx DSP multichannel buffered serial port

The TI C6xxx DSP supports and is compatible with companded data streams. Companding is a scheme where linear data is compressed before transmission so that shorter word lengths can be used and hence a reduction in transmission bandwidth can be achieved. The companding scheme available on this device supports A-law and µ-law compression/expansion of the data stream. In Japan and the United States, the µ-law standard is used which allows a 14-bit dynamic range to be compressed to an 8-bit representation. In Europe the A-law standard is used which compresses a 13-bit dynamic range, also to 8 bits. The scheme used is shown in Figure 4.33. Data companding can be enabled or disabled by changing the appropriate bit in the transmit and receive control registers. The companding scheme allows compatibility of the transmitted and received serial data streams with the standards set out in the CCITT G.711 recommendation. When the McBSP is set to the companding mode of operation, received data is decompressed, i.e. expanded, from the 8-bit serial representation to the 14- or 13-bit representation, depending upon which companding law is used. The expanded data is in a linear format that can be freely manipulated arithmetically as required by the application. After processing, the data can be retransmitted and automatically compressed by the McBSP according to the required companding law. The µ-law companding ratio is described by the following relationship:

$$ m = \frac{\log(1 + \mu n)}{\log(1 + \mu)} \quad \text{for } n \ge 0 $$

where m is the output magnitude, n is the magnitude of the input, and µ is the compression factor which is a positive value that defines the exact compression curve used.


Figure 4.33 Companding scheme used on the TI C6xxx DSP multichannel buffered serial port (signal path: µ-law compressed serial input data stream, 8-bit compressed data → data expansion in the multichannel buffered serial port → 14-bit linear data → data processing function using linear data representation on the DSP device → data compression → 8-bit compressed data, µ-law compressed serial output data stream)

The choice of compression factor, µ, must be constant throughout the system and different values can be defined to suit specific applications such as voice or music.
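The µ-law relationship given above, together with its inverse, can be evaluated directly. The following Python sketch is an illustration of the continuous formula only, not of the 8-bit segmented encoding performed by the McBSP hardware. It assumes that the input magnitude n has been normalized to the range 0 to 1, and it uses µ = 255, the value commonly associated with G.711 µ-law, as a default; both choices are assumptions made for the example.

```python
import math

MU = 255.0  # assumed default compression factor


def mu_compress(n, mu=MU):
    """Compress a normalized magnitude n (0 <= n <= 1) using the mu-law curve."""
    return math.log1p(mu * n) / math.log1p(mu)


def mu_expand(m, mu=MU):
    """Invert mu_compress: recover the linear magnitude from m."""
    return math.expm1(m * math.log1p(mu)) / mu
```

Because the same value of µ appears in both directions, a signal compressed with one factor and expanded with another will not be recovered correctly, which is why µ must be held constant throughout a system.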

4.2.4 Host port interface

The host port interface provides a simple and efficient mechanism by which to connect to standard microprocessor buses. This is a parallel connection which is entirely separate to the main processor address and data bus interface but, in common with these, allows parallel data to be transferred in either direction. The host interface mechanism usually shares an area of addressable memory, which resides on the DSP device, between the host and the DSP core. The interface is also asynchronous so that the host, with its separate clock, can access any shared memory location as required. Similarly the DSP core can access any of the shared memory locations. A good example of the use of a host interface would be a system where an ‘intelligent’ control panel is interfaced to the DSP. By ‘intelligent’ we are actually referring to the fact that the control panel in this example is assumed to contain an embedded microprocessor. The microprocessor scans and reads the control panel knobs and switches and passes the relevant information back, in real time, to the DSP algorithm via the host port interface.

As with the previous discussions covering on-chip peripheral interface hardware, let us consider the host port interface in the context of a specific device. We will consider the host port interface, HPI, provided on the Texas Instruments C54x DSP. In fact, the interface provided on the C54x is almost identical to that provided on the C6xxx, so we will not consider the C6xxx further here. The only significant difference between the C54x and C6xxx HPI interfaces is that the C54x is used to transfer 16-bit data over an 8-bit interface bus, while on the C6xxx, 32-bit data is transferred via a 16-bit bus. The data transfer rates are also significantly higher on the C6xxx, which can be configured to use the DMA processor to arbitrate and manage all HPI data transfers.

As already mentioned the HPI interface provided on the C54x DSP is configured as an 8-bit interface over which 16-bit data words can be transferred. 16-bit words are transferred in two parts, i.e. first byte followed by second byte, and this can be configured to be in high byte, low byte order or vice versa. As can be seen in Figure 4.34, the host side of the interface uses 8 data lines, 2 address lines, a read/write line and data latch and strobe. Additionally there is an interrupt connection by which the DSP can request new data from the host. The host can also interrupt the DSP although no specific host-to-DSP interrupt line is provided; instead the interrupt is automatically generated when the host writes new data.

Figure 4.34 Texas Instruments C54x DSP host port interface, HPI

The two address lines allow the host to directly communicate with each of the HPI registers. The three HPI registers are as follows: HPIA, which is the data memory address register, HPIC, the control register, and HPID, the data register. Via the HPI, the host and the DSP core can be configured to share a 2 K block of the DSP’s dual access memory. The HPIA register is used as the address pointer into the dual access memory space. When the host requests a transfer of data, it will initially set the required memory address by loading the HPIA register with the appropriate hexadecimal number. The host then transfers the low and high data bytes into the predefined memory location. The internal arrangement of the HPI on the C54x DSP can be seen in Figure 4.35, where the data and address interface and shared memory areas are shown. The DSP core and the host interface can simultaneously access the shared memory space without concern for possible bus contention problems.
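The transfer of a 16-bit word over the 8-bit host bus can be sketched as two byte operations. The Python functions below are a hypothetical host-side illustration, not part of any TI library; the parameter `high_byte_first` stands in for the configurable byte order mentioned in the text.

```python
def split_word(word, high_byte_first=True):
    """Split a 16-bit word into the two bytes sent over the 8-bit bus."""
    high, low = (word >> 8) & 0xFF, word & 0xFF
    return (high, low) if high_byte_first else (low, high)


def join_bytes(first, second, high_byte_first=True):
    """Rebuild the 16-bit word from the two bytes in arrival order."""
    high, low = (first, second) if high_byte_first else (second, first)
    return ((high & 0xFF) << 8) | (low & 0xFF)
```

Both ends of the link must agree on the byte order: joining the bytes with the opposite setting to that used for splitting swaps the two halves of the word.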

Figure 4.35 Simplified internal block diagram of the Texas Instruments C54x DSP host port interface (blocks: HPI control logic, HPI control register, address register, data latch, multiplexers and HPI memory block; host side: data lines HD(7–0) and interface control signals; DSP side: DSP address and DSP data buses)

4.3 System management and control

In this section we shall consider the operation and configuration of the four remaining on-chip peripherals, namely the system clock, timer unit, JTAG scanning logic and the power management module. All of these peripherals are independent blocks that each perform a crucial role within any DSP system.

4.3.1 System clock and PLL unit

A fundamental component of any real time digital signal processing system is its clock. If a DSP system does not have a well defined system clock that is rigidly stable with variations in temperature and time then all other aspects of system performance will ultimately suffer. Most DSP devices make use of a crystal based external clock source against which all system events are synchronized. Ideally the frequency of the external clock should be chosen to be as low as practically possible because high clocking rates tend to increase the overall power requirements and electromagnetic radiation of the system. Unfortunately there is also a requirement for high clocking rates so that the achievable processing speed is high enough for the most demanding applications. For these reasons all DSP devices incorporate quite a complex on-chip clock generator that can be configured in order to suit different applications. At the heart of any DSP clock generator is the phase locked loop, PLL, circuit which can be used to multiply the frequency of an external clock source by a user specified PLL factor. The benefit offered by the PLL approach is that it facilitates a wide range of clocking modes which can be derived from a single external oscillator. Also, because the frequency of the clock inside the DSP chip can be multiplied up to be considerably higher than the external clock, the potential for electromagnetic interference and clock contamination is greatly reduced. A block diagram of a PLL circuit is shown in Figure 4.36 followed by a brief explanation of its operation.

Figure 4.36 Phase locked loop, PLL, circuit used to increase the DSP’s on-chip clocking rates (CLK_in from an external crystal locked clock source, 10 MHz → phase detector → filter → VCO → CLK_out, 40 MHz, with a programmable divider, set by the PLL multiplication ratio, in the feedback path)

Figure 4.36 shows the basic configuration of a PLL circuit in which the main components are the VCO, filter, phase detector and programmable divider. In the diagram, example frequencies and a divider ratio are shown. These will be used in the following discussion. Two signals are applied to the input of the phase detector circuit. The first signal is the external clock, which is usually based on a very stable crystal oscillator, and the second input signal is a feedback signal taken from the programmable divider. The programmable divider is a simple logic circuit that divides an incoming clock by a set binary ratio, e.g. ÷2, ÷4, ÷8, etc. The higher the division ratio the higher the resulting PLL output frequency will be. The phase detector compares the two incoming signals and produces an output signal which is proportional to the phase difference between them. This error signal is filtered and used to drive the voltage control input of the VCO. The term ‘VCO’ in this case is used to describe a voltage controlled oscillator. If an error signal is produced by the phase detector, indicating that the two input signals are not similar, then the resulting VCO control voltage will force a frequency change at the VCO output. At some point the VCO output frequency will stabilize such that the output of the divider circuit perfectly matches the external clock frequency and phase. The time taken for the PLL to stabilize is known as the PLL lock time and will usually only last for a very short period at system start-up. Any drift at the output of the VCO will be immediately corrected by the resulting error signal produced by the phase detector. In the diagram example values are given for the input clock (10 MHz), the divider ratio (÷4) and the resulting VCO output frequency (40 MHz).
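The way in which the feedback divider sets the output frequency can be demonstrated with a very crude numerical model. The Python sketch below is a simplification made for illustration: it compares frequencies rather than phases, replaces the loop filter with a single gain term, and uses hypothetical function and parameter names. It is not a model of any particular DSP clock generator.

```python
def pll_output_frequency(f_in_hz, divider):
    """Locked output frequency: the divided VCO output equals the input."""
    return f_in_hz * divider


def simulate_lock(f_in_hz, divider, gain=2.0, steps=100, f_start_hz=0.0):
    """Iterate a simplified loop until the VCO settles.

    Each step compares the input clock with the divided VCO output and
    nudges the VCO frequency in proportion to the error. The loop is
    stable in this model only when 0 < gain < 2 * divider.
    """
    f_vco = f_start_hz
    for _ in range(steps):
        error = f_in_hz - f_vco / divider   # role of the phase detector
        f_vco += gain * error               # role of the filter and VCO
    return f_vco
```

With the example values used in the text, `simulate_lock(10e6, 4)` settles at the same 40 MHz given by `pll_output_frequency(10e6, 4)`, and the number of steps needed to get there is a loose analogy of the PLL lock time.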

In terms of its implementation on a DSP device, the system clock and associated PLL circuit are usually controlled via external configuration pins or software or both. The external configuration pins are used to select the PLL frequency multiplication ratio (i.e. the programmable divider value) at start-up. The software controllable PLL circuits are provided with a register through which the multiplication ratio can be selected. Usually the PLL ratio and system clock settings are set during the start-up initialization and are never changed thereafter.

On the C54x DSP two possible scenarios apply depending upon which device is used: some allow the PLL ratio to be set via hardware configuration pins while others make use of a programmable register. Here, we will consider the C548 because it incorporates both hardware and software mechanisms. The hardware configuration of the PLL division ratio, which is used as the default start-up value, is determined through the use of three external pins labelled CLKMD1, CLKMD2 and CLKMD3. Using these pins the C548 DSP can support up to eight different PLL settings, although the maximum PLL frequency is limited to 40 MHz. In addition to the PLL scheme already discussed, the C548 DSP incorporates a programmable divider logic circuit at the VCO clock output. This allows for a wider range of possible system clock frequencies.

Table 4.8 shows the PLL settings for different signal levels attached to the external clock mode, CLKMD1–3, pins.

Table 4.8 Texas Instruments C548 DSP clock circuit configuration settings

Mode select pins
CLKMD1   CLKMD2   CLKMD3   Clock mode
0        0        0        PLL × 3 with external source
1        0        0        PLL × 3 with internal source
0        1        0        PLL × 1.5 with external source
0        0        1        Divide-by-2 with external source
1        1        1        Divide-by-2 with internal source
0        1        1        Stop mode

On the C548 DSP the PLL circuitry can also be configured using software control. This allows a much wider range of control compared to that provided by the clock mode pins. Specifically the external clock can be multiplied by one of 31 ratios ranging from 0.25 to 15 and divided by either 2 or 4. Software configuration is achieved using a simple memory-mapped register called the clock mode, CLKMD, register. Figure 4.37 shows a diagram of the CLKMD register and an explanation of its use follows.

The CLKMD register is a 16-bit memory-mapped register that can be written to just like any other memory-mapped register. The function of many of the CLKMD bit positions is self-explanatory, such as PLLON/OFF. The main bit positions used for controlling the clock rate are PLLMUL and PLLDIV. These bit positions relate to the programmable divider seen previously in Figure 4.36. The PLLCOUNT bit position is used to determine the number of clock cycles taken before the PLL begins clocking the processor.
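Assembling a value for a register such as CLKMD is a matter of shifting each field into position. The Python sketch below uses the bit positions shown in Figure 4.37 (PLLMUL in bits 15–12, PLLDIV in bit 11, PLLCOUNT in bits 10–3, PLLON/OFF in bit 2, PLLNDIV in bit 1 and PLLSTATUS in bit 0). The helper names are hypothetical and the field values used with them are arbitrary; the sketch does not attempt to state which clock ratio a given combination selects.

```python
# Field name -> (least significant bit, width in bits), per Figure 4.37.
CLKMD_FIELDS = {
    "PLLSTATUS": (0, 1),   # read-only status bit
    "PLLNDIV": (1, 1),
    "PLLONOFF": (2, 1),
    "PLLCOUNT": (3, 8),
    "PLLDIV": (11, 1),
    "PLLMUL": (12, 4),
}


def pack_clkmd(**fields):
    """Build a 16-bit CLKMD value from named field values."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = CLKMD_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << lsb
    return value


def unpack_clkmd(value):
    """Split a 16-bit CLKMD value back into its named fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in CLKMD_FIELDS.items()
    }
```

Range checking each field before it is shifted into place guards against one field silently overwriting its neighbour.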

Figure 4.37 Clock mode control register, CLKMD, used on the C548 DSP

Bits     Field       Access
15–12    PLLMUL      R/W
11       PLLDIV      R/W
10–3     PLLCOUNT    R/W
2        PLLON/OFF   R/W
1        PLLNDIV     R/W
0        PLLSTATUS   R

4.3.2 On-chip timer units

The on-chip timer unit is used to count events and generate interrupts accordingly. Usually the counter/timer can be used to count a number of system clock cycles and generate a periodic interrupt which can be used as the basis for accurate sample rate generation. The counter/timer is also useful when implementing a real-time operating system or task scheduler which needs to share out processing time between a series of different algorithms. Although it is quite simple to implement some form of counter/timer entirely in software, the associated overhead can be quite considerable. Also, if a highly accurate sampling clock is required, the software approach is not desirable because it is difficult to guarantee its performance at all times. The on-chip counter/timer is therefore a highly useful added feature of most DSP devices. In addition to counting the internal system clock, many DSPs include an external connection pin that can be used to count external events; for example, in an active speed control application the counter could be used to count pulses taken from a tacho generator attached to a motor. The output of the timer counter is also often available at an external connection pin so that it can be used to provide a regular sampling pulse for an external A/D or D/A converter.

The block diagram of the timer used on the Texas Instruments C54x DSP is shown in Figure 4.38. This device uses three 16-bit memory-mapped registers to control the timer operation. These are the timer register, TIM, the period register, PRD, and the timer control register, TCR. The TCR is used to enable or disable the timer and to determine the operating mode, i.e. whether it operates continuously or just counts once and stops. The timer period on the C54x DSP is controlled through the use of the PRD register and a 4-bit divider ratio specified in the TDDR bits of the timer control register. These are shown in Figure 4.39.

Initially, the timer PRD register and TDDR field of the control register are loaded with the required count values. The PRD is 16 bits wide and the TDDR is 4 bits wide so the timer has a total count resolution of 20 bits. When the PRD and TDDR have been loaded with the required values, the timer is reset and enabled and down counting begins. Because the timer is driven by the system clock, every new system clock pulse will result in the pre-scaler counter, PSC, being decremented by one. At some point the PSC will be decremented to zero and a ‘borrow’ output pulse is generated. The borrow pulse is used to re-load the TDDR value back into the PSC and decrement the TIM count value by one. This process continues until the TIM count reaches zero at which point a timer interrupt, TINT, and timer output pulse, TOUT, will be generated. The PRD value is also re-loaded into the TIM register and counting down begins again. The process repeats continually until the timer is reset or halted.
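The cascaded action of the pre-scaler and the main counter can be checked with a short simulation. The Python sketch below uses a hypothetical function name and makes one assumption that is not spelled out above: each counter is taken to borrow on the clock after it has reached zero, so that the interrupt period works out as (TDDR + 1) × (PRD + 1) system clock cycles.

```python
def timer_interrupt_clocks(prd, tddr, n_clocks):
    """Return the clock numbers (1-based) at which TINT would occur."""
    psc, tim = tddr, prd          # counters loaded at reset
    hits = []
    for clk in range(1, n_clocks + 1):
        if psc > 0:
            psc -= 1              # pre-scaler counts every system clock
            continue
        psc = tddr                # PSC borrow: re-load from TDDR
        if tim > 0:
            tim -= 1              # main counter counts PSC borrows
            continue
        tim = prd                 # TIM borrow: re-load from PRD
        hits.append(clk)          # TINT and TOUT generated here
    return hits
```

With PRD = 9 and TDDR = 3, for example, the model produces an interrupt every 4 × 10 = 40 clock cycles. Dividing the system clock frequency by this period gives the interrupt rate, which is how a sample clock might be derived.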

A very similar scheme to that employed on the C54x DSP is used on the C6xxx, although the timer has a higher resolution, the control register provides greater control over the operating modes and the C6xxx device contains two timers. The C6xxx timer is based on a 32-bit counter and it can be clocked from either an external or internal source.

A block diagram of the timer used in the C6xxx DSP is shown in Figure 4.40. Although it looks more complex than the previous example shown for the C54x DSP, the principle of operation is the same. A counter is loaded from a memory-mapped register with the period and then down counting is synchronized to the chosen clock source. As before, when the counter reaches zero, a timer interrupt is generated and the counter period is re-loaded from the memory-mapped register. The C6xxx DSP timer also provides some control over the width of the resulting output pulse.

Figure 4.38 Block diagram of the timer used in the Texas Instruments C54x DSP (labels: CLK, RESET, load enable, PSC, TDDR, TIM, PRD, borrow, TINT and TOUT)

Figure 4.39 Timer control register, TCR, used in the Texas Instruments C54x DSP (fields: TDDR in bits 3–0, TSS in bit 4, TRB in bit 5, Free in bit 10 and Soft in bit 11, together with the PSC and reserved fields in bits 9–6 and 15–12)

Figure 4.40 Block diagram of the timer used in the Texas Instruments C6xxx DSP

4.3.3 Test and emulation logic (JTAG)

The JTAG test and emulation logic provided on many DSP devices is a means by which aspects of hardware and software designs can be evaluated in-circuit and tested whilst an application is running in real time. JTAG actually stands for the Joint Test Action Group, which is a working group made up mainly of leading electronics manufacturers who set out to establish a common standard for in-circuit testing and emulation. The result was the IEEE 1149.1 standard which describes a set of rules by which a common test bus system will operate. The idea of the test bus system is that complex components within a system can be linked to a common bus through which they can be probed and tested whilst still in-circuit. This provides a simple mechanism by which modern systems containing a collection of highly integrated densely packed chips can be tested with relative ease. The JTAG standard allows the user to force conditions within a device as required to perform a particular test, so for example using the JTAG interface a user could force the logic level of certain connection pins on a device to specific values and subsequently observe the resulting operation. This technique is often referred to as boundary scan, because the boundary conditions, i.e. those of the external connection pins, can be monitored and controlled.

A subset of the JTAG standard also allows other aspects of a device’s performance and operation to be controlled. Of particular interest to the DSP developer is the ability to use the JTAG interface when testing software. In this mode of operation the DSP device can be single stepped, register values and memory areas monitored and forced and peripheral devices configured. In Chapter 2 many aspects of DSP emulation were considered and in particular the operation of DSP development tools referred to as Code Composer Studio from Texas Instruments. In fact Code Composer Studio is an example of a hosted software package that communicates with target hardware entirely through a fast JTAG emulation port. All aspects of software debugging, loading software, testing and uploading data to the host are performed via the JTAG link. The JTAG support provided on the target DSP will run in parallel to the operation of the DSP device so that, for example, the value within a target register could be monitored without the need to halt the DSP itself. The physical connection between the host computer and the DSP’s JTAG port is made via a standard 14-pin header as shown in Figure 4.41.
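The boundary scan idea mentioned above is essentially a long shift register threaded past the pins of a device. The Python sketch below is a heavily simplified illustration with a hypothetical function name: it models only the shifting of a boundary register between the test data input and output, and omits the state machine, instruction register and capture/update stages defined by IEEE 1149.1.

```python
def scan_shift(chain, tdi_bits):
    """Shift bits through a boundary register.

    chain    -- current cell values, index 0 nearest the data input
    tdi_bits -- bits presented at the data input, one per test clock
    Returns (new_chain, tdo_bits), where tdo_bits are the values that
    emerged at the data output, in the order they appeared.
    """
    chain = list(chain)
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(chain.pop())   # cell nearest the output leaves first
        chain.insert(0, bit)           # new bit enters at the input end
    return chain, tdo_bits
```

Shifting as many bits as there are cells reads out the previously captured pin states while loading a new pattern that could then be forced onto the pins.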

The header carries serial data between the target device and the host via the TDO, test data out, and TDI, test data in, connectors. The test clock which is used to synchronize all data transfers is carried on the TCK connection and a return clock signal is carried on the TCK_RET connection. Pin six acts as a key so that the test cable cannot be installed incorrectly and other connection pins such as TMS, EMU0 and EMU1 are used to control various operating modes of the interface.

The C6xxx DSP evaluation board, EVM, is provided with an embedded JTAG emulation system so that the DSP device, its peripherals and software operation can be evaluated and application software debugged. The standard JTAG connection from the DSP device is interfaced to an on-board test bus controller chip which then connects to the host PC via the PCI bus. The test bus controller, TBC, chip is used to provide low-level control signals to the DSP device and relay information back to the PC host. In addition to the on-board TBC, support for a standard 14-pin header is also provided which will interface to an industry standard test system. The C6xxx EVM JTAG test bus configuration is shown in Figure 4.42. A set of DIP switches and/or software switches determine the TBC source for the EVM, i.e. whether the on-board or external TBC is used. To the PC host side, the on-board TBC provides memory-mapped control of the JTAG interface. This appears, to the PC, as 24 addressable 32-bit memory locations through which test data and control signals can be passed. The flexible two-way approach, i.e. on-board or external, allows the EVM board to operate inside or outside the host PC. The benefit of operating inside the PC, using the on-board emulation logic, is that no additional development support hardware is required and there is no need for a boot ROM during start-up because boot code can be downloaded via the PC interface.


Figure 4.41 Standard 14-pin JTAG interface connector

Figure 4.42 Texas Instruments C6xxx EVM board flexible JTAG interface

4.3.4 Power management unit

DSP devices, and any other device for that matter, dissipate the most power when switching from one logic state to another. In the case of a microprocessor or DSP, switching is a continuous process and is driven by the system clock. With complex DSP devices switching at high clock rates, the associated power dissipation can be a considerable portion of the system’s power budget requirements. For this reason many DSP manufacturers provide methods for placing a device into a low power state where the clock is effectively disconnected from certain parts or all of the DSP device. During this low power sleeping state the operating context of the device is preserved because the power supply remains connected all of the time. By operating context we are referring to the processing state of the device, i.e. the conditions within the registers and memory areas. When no switching takes place within the device, the power consumption is reduced to a minimal level. Often the sleeping low power state can be initiated by either a software command or by toggling an external ‘hold’ pin. Obviously when the device is sleeping and no clock source is active, no useful processing work can take place. In order to wake up the device and bring it out of the low power state, the usual procedure is to provide an interrupt event, usually through an external interrupt pin or via a host/serial interface. When the device receives the interrupt it will enable its clock source and carry on with normal processing. When designing for low power applications such as those requiring battery operation, the sleeping state can be used during IDLE periods where no processing function is needed.

4.4 All the analog bits and pieces (i.e. ADC, DAC, anti-aliasing, over-sampling, etc.)

Quite frankly, there is no point getting too excited about DSP if you cannot easily get signals into or out of the device. There is equally little value in DSP if, in the process of signal conversion, the information is noticeably corrupted (distorted). Further, if the cost of signal conversion is too great, the power consumption too high, or the board space too large, you might as well put this book to one side and pick up a good novel instead. Fortunately, for the majority of applications, none of the above holds true, which is obviously a big factor in the success of DSP today.

Most signals currently processed by DSP are analog in origin or destination, e.g. speech, music, radio waves, pictures. Some may be digitized in advance (e.g. CD or MP3 files) prior to reaching your DSP. Others you will need to convert yourself.

For those cases where the task of getting signals into or out of the DSP rests squarely on your shoulders, this section is for you. Apart from issues of cost, power consumption and chip size, where you are at the mercy of the manufacturer, performance parameters such as dynamic range, distortion, sampling rate and SNR can be optimized to a greater or lesser extent by your own designs and device selection.

Before we can unpack the treasure chest of analog to digital and digital to analog converters, we first need to do a bit of lateral thinking about ‘sampling rate’.

4.4.1 Setting the sample rate

Let us tackle the big question first – what sampling rate to use? There is a simple answer and a not so simple answer. The simple answer is you need to sample the analog signal at a rate at least twice the value of the highest input signal frequency – one of the many Shannon theorems. Shannon however stuck his neck out a bit further and stated that all that was actually needed was to sample at a rate greater than twice the bandwidth of the input signal. This is called the:

2 × bandwidth rule

This distinction between 2 × frequency and 2 × bandwidth is very important as discussed below.

Nyquist, who is usually credited with the sampling theorem, went on to say that if sampling occurs at less than twice the signal bandwidth, then a phenomenon called aliasing will occur.

4.4.2 Baseband sampling

To help explain the issues surrounding the choice of sampling rate and how it affects the operation of DSP so fundamentally, consider an analog input signal with the time waveform shown in Figure 4.43. This signal is ideally sampled by the analog to digital conversion process at regular intervals, with the value of the waveform at each sampling instant converted into a digital number for subsequent processing by the DSP.

Because the word length of the digital number is finite, it can only ever represent a finite number of discrete analog levels. If the digital number corresponds to a level close to the actual sample value, then the sample (quantization) error is small. If the digital number does not closely match the input level, then the quantization error is high. The greater the word length used to represent the samples, the greater the set of possible sample values available and the smaller the quantization error.
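The link between word length and quantization error can be illustrated with a short Python sketch. It assumes an idealized uniform quantizer over a ±1.0 full-scale range that returns the mid-point of each level; the function names `quantize` and `worst_case_error` are made up for this illustration and do not describe any particular converter.

```python
import math

def quantize(value, n_bits, full_scale=1.0):
    """Map value (within -full_scale..+full_scale) onto one of 2**n_bits levels.

    Assumption: an ideal uniform quantizer that returns the mid-point of the level.
    """
    levels = 2 ** n_bits
    step = 2.0 * full_scale / levels                 # size of one quantization level
    code = math.floor((value + full_scale) / step)   # index of the level containing value
    code = max(0, min(levels - 1, code))             # clamp to the available codes
    return -full_scale + (code + 0.5) * step

def worst_case_error(n_bits, points=1000):
    """Largest quantization error seen over a sweep of the full-scale range."""
    sweep = (-1.0 + 2.0 * i / points for i in range(points + 1))
    return max(abs(quantize(x, n_bits) - x) for x in sweep)

for bits in (4, 8, 12):
    print(bits, "bits: worst-case error", worst_case_error(bits))
```

Under these assumptions the worst-case error never exceeds half a quantization step, and it shrinks as the word length grows.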

The impression given by the sampled section of the waveform in Figure 4.43 is that, if you were to join up the dots, you could faithfully recreate the waveform shape – in other words, the impression is that the sampling rate is high enough to capture the fastest changes in the waveform and as such satisfy the Shannon/Nyquist criterion.

Take a look at Figure 4.44, however, which shows what actually was happening in between the sample periods for that portion of the waveform. Clearly, in this fictitious example, the sampling rate was nowhere near high enough to represent accurately the information, and the waveform that you might have predicted from Figure 4.43 would in fact have been an ‘alias’ of the true signal.

Let us now revert to sine waves to help understand the sampling process further. Figure 4.45 shows a sine wave with frequency of f1 = 1 MHz, sampled at a rate of fs = 6f1 = 6 MHz. The process of sampling is represented by multiplication of the input signal with a stream of impulses of fixed height (see Section 9.3.5 for theoretical backup!).

The spectrum of the analog input consists of a single spectral line as expected, whilst the spectrum of the impulse train in the time domain is a set of equal height discrete spectral lines in the frequency domain, spaced at the sampling frequency (rate) of 6 MHz. The output of the multiplication (sampling) process thus has a spectrum consisting of components spaced at the sum and difference frequencies between the input, f1, and all of the sampler spectral lines.
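The sum and difference components described above can be listed with a few lines of Python. This is only a sketch for an ideal sampler and a single input tone; the function name `sampled_spectrum` and the choice to stop after a given number of sampler lines are assumptions made for the illustration.

```python
def sampled_spectrum(f_in, fs, n_lines):
    """Frequencies (Hz) present after ideally sampling a tone at f_in.

    Returns the original line plus k*fs - f_in and k*fs + f_in
    for the first n_lines sampler spectral lines.
    """
    lines = [f_in]
    for k in range(1, n_lines + 1):
        lines.append(k * fs - f_in)   # difference component
        lines.append(k * fs + f_in)   # sum component
    return sorted(lines)

# 1 MHz sine wave sampled at 6 MHz, first two sampler lines
print(sampled_spectrum(1_000_000, 6_000_000, 2))
```

For the 1 MHz tone sampled at 6 MHz, the first components above the input are at fs – f1 = 5 MHz and fs + f1 = 7 MHz.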

Figure 4.43 Sampling and Quantization of an Analog Signal (analog input, waveform samples and quantization levels against time)

Figure 4.44 Actual waveform for Figure 4.43 between sample points

Now, what happens if we wish to reconstruct the original sine wave from the sample values? We somehow need to ‘interpolate’ between the sample points, ensuring that we recover a sine wave and not some other shape still matching the sample points. Using a digital to analog converter to turn the binary number to an actual sample value is the obvious first step (Section 4.6.1), and intuitively we know that if we then filter the DAC output, we will smooth out the sample values to give a sine wave shape (Figure 4.46). This example clearly demonstrates however that a pure sine wave will only be reproduced if all of the components of the sampled signal, except for those corresponding to the original input signal, are fully suppressed (stating the obvious really!). The filter following the DAC must therefore be a low pass filter (in this example) able to suppress components with frequencies ≥ fs – f1 = 5 MHz.

Now, consider the case where we reduce the sample rate for the sine wave to just 3f1 = 3 MHz (i.e. 3 times the highest input signal frequency), Figure 4.47.


Figure 4.45 Sampled Sine Wave (fs = 6f1)

Figure 4.46 Reconstructed Sine Wave (DAC output filtered with cut-off fc > fs – f1 and with fc < fs – f1)

Here, the spectrum of the sampled signal is much more crowded, with the line representing the input spectral component at 1 MHz now only a short distance from the next component at 2 MHz. Remembering now that for accurate reconstruction of the sine wave we need to filter out all but the 1 MHz component of the sampled signal, the low pass filter now needs to be much sharper (i.e. higher order). Note that the filter which worked perfectly for the case where fs = 6f1 is now no longer adequate for the job.

From here on, it is not a great step of reasoning to see that if we were to reduce the sampling rate still further to fs = 2f1, then the next highest component reduces in frequency to the point where it converges with the input component. This just happens to be the magic point where the sample rate (2 MHz) exactly equals twice the input frequency (1 MHz) – the limit at which the Shannon/Nyquist sampling criterion is satisfied. It is obvious that, as we approach this limit, the ‘reconstruction’ filter needed on the DAC output becomes more and more difficult to implement, eventually becoming an impractical ‘brick wall’ filter.
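The way the reconstruction filter tightens as the sample rate falls can be put into rough numbers. The Python sketch below measures the gap between the wanted tone and the first unwanted component at fs – f1 in octaves, then estimates a filter order. The order estimate rests on an assumption that is not in the text: an asymptotic roll-off of about 6.02 dB per octave per filter pole, which is only a rule of thumb. The names `guard_band_octaves` and `min_filter_order` are hypothetical.

```python
import math

DB_PER_OCTAVE_PER_POLE = 6.02   # asymptotic roll-off of a single pole (rule of thumb)

def guard_band_octaves(f_in, fs):
    """Octaves between the wanted tone f_in and the first image at fs - f_in."""
    return math.log2((fs - f_in) / f_in)

def min_filter_order(attenuation_db, f_in, fs):
    """Rough number of poles needed to suppress the first image by attenuation_db."""
    octaves = guard_band_octaves(f_in, fs)
    if octaves <= 0:
        return math.inf          # image sits on top of the signal: 'brick wall' case
    return math.ceil(attenuation_db / (DB_PER_OCTAVE_PER_POLE * octaves))

for ratio in (6, 3, 2):
    print("fs =", ratio, "x f1:", min_filter_order(60, 1e6, ratio * 1e6), "poles for 60 dB")
```

The 60 dB target is an arbitrary figure chosen for the example; the point is only the trend towards an unrealizable filter as fs approaches 2f1.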

In a few simple steps, we have deduced that we must sample at a rate of at least twice the highest input frequency in order to be able to accurately reconstruct the waveform from the samples taken. This however would be a misleading deduction as will hopefully become apparent from the following discussion.

Figure 4.47 Sampled Sine Wave (fs = 3f1)

HOT TIP: Increasing the sampling rate above the minimum required (over-sampling) relaxes the specification of the reconstruction filter.

4.4.3 Intermediate frequency (IF) sampling

Consider our 1 MHz input sine wave, now joined by two further sine waves 10 kHz lower and higher in frequency at 0.99 MHz and 1.01 MHz, representing an AM (amplitude modulated) signal. We know already that sampling this composite signal at 3 MHz presents no real problems. Now observe what happens (Figure 4.48) if we sample at only 300 kHz!

We are now clearly sampling at a rate much lower than the highest input frequency: however, in the sampled output the three spectral components representing the input AM waveform are still clearly present and distinguishable, albeit replicated many times. If we were now to output this sampled waveform through a DAC with a reconstruction filter, selecting only the first set of three components, we would recover the waveform in Figure 4.49.

Whilst not at the same ‘carrier’ frequency as the original input, the shape of the signal (the AM information) is still very much intact. In fact, it is a relatively simple matter to recreate the original input signal by mixing this waveform with a carrier sine wave at 900 kHz and filtering the upper sideband from the mixing process (Figure 4.50).

This simple example demonstrates that it is indeed possible to sample at a rate much lower than the highest input frequency of a waveform and still get a faithful reconstruction of the original waveform from the sampled version. This is because the bandwidth of the input signal was only 20 kHz (much lower than the sampling rate of 300 kHz), and the Shannon/Nyquist criteria are not violated.
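The frequencies in this IF sampling example can be checked with a short Python sketch. It assumes ideal sampling of pure tones; the helper name `alias_frequency` is made up for the illustration. Each input tone is folded into the 0 to fs/2 band, and adding the 900 kHz carrier (the upper sideband of the mixing process) returns the original frequencies.

```python
def alias_frequency(f_in, fs):
    """Frequency (Hz) at which a tone at f_in appears in the 0..fs/2 band after sampling at fs."""
    f = f_in % fs                 # position relative to the nearest lower multiple of fs
    return fs - f if f > fs / 2 else f

fs = 300_000
am_signal = [990_000, 1_000_000, 1_010_000]      # lower sideband, carrier, upper sideband

baseband = [alias_frequency(f, fs) for f in am_signal]
print(baseband)                                  # components seen by the DSP

carrier = 900_000
recovered = [f + carrier for f in baseband]      # upper sideband after mixing
print(recovered)
```

The 20 kHz spread of the three components is preserved, which is why the AM information survives the move to a 100 kHz centre frequency.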


Figure 4.48 IF Sampling Example

Figure 4.49 Reconstruction of IF Sampled Waveform at Baseband (original input and reconstructed output after sub-sampling)

This process is often referred to as ‘under-sampling’, which rather implies that the Shannon/Nyquist criterion is being violated. In fact, we know this is not the case since it is the sampling rate relative to the input signal bandwidth that matters and not the input frequency itself. We shall therefore endeavour to refer to this process as IF sampling or sub-sampling.

To leave the discussion on sampling at this point would be wholly inappropriate, as we have not adequately addressed the issue of aliasing of the input signal – or rather how to prevent it! We have seen that for correct reconstruction, we need to comply with the 2 × bandwidth rule. This also holds for digitization.

Sticking with the digital IF example, consider now our original AM input signal, together with a fourth input sine wave located at 1.5 MHz (Figure 4.51).

Again, sampling at 300 kHz, we now see that the two sets of input signals are no longer distinguishable and it would be impossible to separate one from the other in a reconstruction process. This is because we have now violated the 2 × bandwidth rule (input bandwidth = 1.5 MHz – 0.990 MHz = 510 kHz). In order to avoid this ‘aliasing’ problem, we must restrict the input bandwidth by pre-filtering the signal to a bandwidth of less than fs/2, selecting either the AM signal centered on 1 MHz, or the sine wave centered on 1.5 MHz if this is of interest (Figure 4.52).

As was found for the reconstruction process, the specification for the anti-alias filter is more relaxed if the sample rate is not pared down to the absolute minimum. A 3 × bandwidth factor is often used by practicing engineers as a good compromise. In some cases however, ‘over-sampling’ by very large amounts is employed to dramatically simplify the requirements for anti-alias and reconstruction filters as described in the next subsection.

Figure 4.50 Ability to recover original IF signal from baseband waveform

HOT TIP: A major system benefit of IF sampling in many applications is the inherent down-conversion of the input signal frequencies to the baseband space (see Section 5.10).

Figure 4.51 Digital IF Example Violating the 2 × Bandwidth Rule

Figure 4.52 Using Pre-Filtering (Anti-alias Filtering) to Comply with the 2 × Bandwidth Rule

Just before we leave this subsection, a brief note on noise. From the idealized portrayal of input and output signals thus far, you could be forgiven for assuming that anti-alias filters would not be necessary if the input was known not to contain any components outside the 2 × bandwidth zone. If there was truly nothing at all lurking there, then fine, but do not forget the noise. Figure 4.53 shows the zones (often called Nyquist zones), where energy will be reflected by the sampling process into the baseband (DSP) zone. Any noise or spurious components within these zones will fold back on top of the wanted signal and can easily add up to a significant level of unwanted and avoidable corruption of your carefully sampled waveform.

Figure 4.53 Nyquist Zones for IF Sampling (Sub-sampling) (zone boundaries at fs/2, fs, 3fs/2, 2fs, 5fs/2)
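A small Python sketch can make the Nyquist zone picture concrete. It assumes the numbering of Figure 4.53, with Zone 1 running from 0 to fs/2 and each later zone spanning a further fs/2. The function names are hypothetical, and a frequency lying exactly on a boundary is arbitrarily assigned to the upper zone. A band that fits inside one zone folds into baseband without overlapping itself; a band that straddles several zones does not.

```python
def nyquist_zone(f, fs):
    """Nyquist zone number for frequency f (Hz), with Zone 1 covering 0..fs/2.

    Assumption: a frequency exactly on a zone boundary is counted in the upper zone.
    """
    return int((2 * f) // fs) + 1

def band_fits_one_zone(f_low, f_high, fs):
    """True if the whole band f_low..f_high lies within a single Nyquist zone."""
    return nyquist_zone(f_low, fs) == nyquist_zone(f_high, fs)

fs = 300_000
print(nyquist_zone(990_000, fs), nyquist_zone(1_010_000, fs))   # AM signal edges
print(band_fits_one_zone(990_000, 1_010_000, fs))               # AM signal alone
print(band_fits_one_zone(990_000, 1_500_000, fs))               # with the 1.5 MHz tone added
```

With fs = 300 kHz the AM signal sits wholly inside Zone 7, whereas the band stretching up to 1.5 MHz spans several zones, which is the situation the anti-alias filter is there to prevent.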

HOT TIP: Pay particular attention to spurious clock components which could alias into the baseband zone. Often these are coupled onto the input of the ADC from neighboring PCB tracks, exacerbated by poor earthing design. The problem is particularly important if the coupling occurs after the anti-alias filter as the pickup has a clear path to the baseband zone. Tying system clocks to the sampling rate (or vice versa) can greatly ease this problem.

4.4.4 Over-sampling

In the previous section, we established that sampling at a rate greater than twice the input signal bandwidth was necessary both to avoid aliasing at the input and reconstruction problems at the output. We also saw that increasing the sampling rate beyond the bare minimum would greatly relax the specification on these external analog filters.

A major application where massive over-sampling has been effectively employed is in digital audio processing. The sound card in every new computer uses a minimum of 64 × bandwidth sampling (referred to as 64 times over-sampling) in order to simplify the specification and cost of the analog filters and ensure that they introduce negligible distortion into the sampled or reconstituted hi-fi signal. Figure 4.54 compares the filter specification for a 3 × bandwidth digital audio solution and the 64 × bandwidth approach.

The anti-alias and reconstruction filters in the former case would need to be perhaps 12th-order Bessel filters, requiring multiple op-amps for realization. Even then, the gain and phase response of the Bessel filter in the audio passband will not be entirely flat, sending jitters down the spine of the connoisseur classical music enthusiast. In contrast, the filter for the 64 × bandwidth approach can be realized using a single resistor and capacitor.

4.4.5 Interpolation and decimation

Over-sampling is great news for the analog interface engineer, who, having designed in his resistor and capacitor, has gone home early. What about the poor DSP engineer who is now presented with a set of samples to process, converging on his DSP input port at a rate several times greater than it really needs to be? Fortunately, the answer is elegantly simple – throw them away (Figure 4.55). OK, this is perhaps a bit too simple as the 2 × bandwidth rule must not be violated even within the DSP domain. However, with the judicious use of one or more digital filters to keep Shannon and Nyquist happy, the task of sample rate reduction (decimation) is easily achieved. To find out more you need to get the other side of the digital filter learning curve or simply jump to the discussion on decimation in Toolbox II (Section 6.9.9.1).

Equally, with the analog engineer having specified that the output sample rate must be many times that needed for your DSP activity, how do you up the sample rate? The answer again is trivial – simply insert lots of zero samples at regular intervals (interpolation) until the required output sampling rate is reached. As with decimation, a sprinkling of digital filters will ensure no violation of the 2 × bandwidth rule (Toolbox II, Section 6.9.9.2).
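Decimation and interpolation can be sketched in a few lines of Python. The digital filter used here is deliberately crude: a simple moving average (boxcar), chosen as an assumption for brevity and not a filter recommended by the text. A practical design would use a properly specified low pass filter. The function names `decimate` and `interpolate` are hypothetical.

```python
def decimate(samples, m):
    """Reduce the sample rate by m: average each block of m samples, keep one value per block.

    Assumption: a boxcar average stands in for the anti-alias digital filter.
    """
    usable = len(samples) - len(samples) % m      # ignore an incomplete final block
    return [sum(samples[i:i + m]) / m for i in range(0, usable, m)]

def interpolate(samples, l):
    """Increase the sample rate by l: insert l - 1 zeros after each sample, then smooth.

    Assumption: the smoothing filter is a length-l moving sum (a boxcar with gain l).
    """
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (l - 1))           # zero samples at regular intervals
    out = []
    for n in range(len(stuffed)):
        window = stuffed[max(0, n - l + 1):n + 1]
        out.append(sum(window))
    return out

print(decimate([1.0, 3.0, 5.0, 7.0], 2))
print(interpolate([1.0, 2.0], 2))
```

With this particular filter the interpolator simply repeats each sample l times, which is enough to show the mechanics of zero insertion followed by filtering.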

Figure 4.54 3 × Bandwidth vs 64 × Bandwidth over-sampling

Figure 4.55 Adjusting the Sample Rate – A Salesman’s View

4.4.6 Up- and down-conversion

For all the wireless (radio) engineers out there, just a brief word on using the sampling process to facilitate up- and down-conversion of signals.

We have seen above that sampling at a rate below the input frequency does not necessarily violate the 2 × bandwidth rule and will in fact result in a mixing down of the input signals to a lower frequency which is the difference between the input frequency and the closest multiple of the sampling rate. Careful selection of the sampling rate means that it is possible to use the sampling process as a replacement for the local oscillator and mixer traditionally found in a receiver system.

Is there a catch? Yes! The analog to digital converter must be capable of handling signals at the input signal frequency without introducing excessive distortion, loss and noise, even though it will only be performing conversions at a much lower rate. This sometimes brings a cost, size and power consumption penalty.

To achieve up-conversion from sampling, take another look at the output of the DAC for a sampled sine wave (Figure 4.56).

If we put a bandpass filter on the output, rather than the low pass filter previously discussed, we can select an up-converted version of the same waveform. Again, the proviso here is that the DAC is able to produce these signals at this frequency without adding too much noise and distortion.

A comprehensive discussion of alternative ways of achieving up- and down-conversion is given in ToolBox I (Section 5.10) and also picked up later in this chapter (Section 4.8).

Figure 4.56 Using Aliasing for Up-conversion

4.5 Getting signals in

In the world of block diagrams and salesmen, the process of ‘getting signals in’ is simply achieved by connecting an analog to digital converter to the DSP (Figure 4.57). (You might even be tempted by the option of a DSP with inbuilt converter.) There appears to be no need for pre-filtering of the signal or for logic to interface the ADC to the DSP. In a relatively small (but thankfully growing) number of cases this could be true. Both TI and Analog Devices for example now market converters with ‘zero glue logic’ interfaces to their own DSP devices. For most cases, however, some logic interfacing is needed and careful study of the data sheets and timing diagrams is advised.

Figure 4.57 Simplified Analog to DSP Interface

4.5.1 Analog to digital converters

ADCs come in almost as many flavors as ice creams, and at least as much care is needed in choosing the former as is required with the latter.

A popular and readily understood type of ADC is the Flash ADC (Figure 4.58). This is capable of very high speed conversion and thus can accommodate high sampling rates, but in its basic form is very power hungry.

The flash converter operates by simultaneously presenting the input signal to a bank of 2^N – 1 comparators, whose reference voltages are set by a resistor chain to exactly correspond to all of the possible sample levels which can be represented by the converter. The output from each comparator (either a 1 or a 0) is then encoded into an N-bit word representing the input sample level. This approach is the simplest, most intuitive and also the fastest solution for ADC implementation. For large numbers of bits (e.g. >14 bits), the number of resistors needed (2^N – 1) becomes prohibitively large for most practical applications. Also, the power consumption is considerably higher than for some of the slightly slower and more exotic solutions.

Figure 4.58 Flash ADC Block Diagram (analog input and reference voltage feeding a comparator bank, encoding logic, clock, N-bit digital output)
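The flash conversion process described above can be imitated in Python as a minimal behavioural sketch. Two details are assumptions made for the illustration rather than facts from the text: the comparator thresholds are placed at equal steps of Vref / 2^N, and the encoding logic is modelled by simply counting how many comparators have tripped (the so-called thermometer code). The function name `flash_adc` is hypothetical.

```python
def flash_adc(v_in, v_ref, n_bits):
    """Behavioural model of an N-bit flash converter.

    Assumptions: 2**n_bits - 1 comparators with thresholds at equal steps
    of v_ref / 2**n_bits; the encoder counts the comparators that output 1.
    """
    n_comparators = 2 ** n_bits - 1
    lsb = v_ref / 2 ** n_bits
    thresholds = [(k + 1) * lsb for k in range(n_comparators)]   # resistor chain taps
    thermometer = [1 if v_in > t else 0 for t in thresholds]     # all comparators at once
    return sum(thermometer)                                      # encode to an N-bit code

# 3-bit converter with an 8 V reference: one code per volt
for v in (0.5, 3.5, 7.9):
    print(v, "V ->", flash_adc(v, 8.0, 3))

# Number of comparators grows quickly with resolution
print([2 ** n - 1 for n in (8, 10, 14)])
```

The final line shows why high resolution flash converters become impractical: the comparator count roughly doubles with every extra bit.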

Figure 4.59 shows the output spectrum of a typical flash converter as determined by taking an FFT (Fast Fourier Transform – ToolBox III, Chapter 7) of the converter output samples for a pure sine wave input.

One thing is immediately apparent. The spectrum does not simply consist of the pure input sine wave component, but also has a mass of other components spread throughout the measurement band. These largely arise from the inevitable quantization error (noise) alluded to in Section 4.4, where the converter is trying to represent the analog input level from a finite number of available sample values (dictated by the number of bits in the ADC).

For our test full scale sine wave, the effective signal to quantization noise ratio as a function of converter resolution can be readily determined (with a few basic assumptions) to be (Ref. 4.8):

SNR = (6.02N + 1.76) dB

where N is the number of bits for the converter.

The converter resolution for the ADC generating the plot in Figure 4.59 is 12 bits, giving a theoretical SNR of 74 dB. Looking at the plot, the difference in levels between the sine wave component and the individual noise components is much nearer 104 dB. The reason for the difference between these two values is that the SNR formula refers to the whole noise contribution, comprising the sum of all the individual noise components making up the FFT. The reason for pointing out this feature is that it can be used to increase the effective resolution of a converter by trading off sampling rate as shown below.
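The two figures quoted above can be reproduced with a short Python sketch. The first function is the SNR formula from the text. The second applies the standard result that spreading the total noise over the M/2 bins of an M-point FFT lowers the level of each individual bin by 10 log10(M/2) dB. The FFT length of 2048 points is purely an assumption, chosen because it gives a gap close to the one read from the plot; the text does not state the length actually used.

```python
import math

def ideal_snr_db(n_bits):
    """Signal to quantization noise ratio for a full scale sine wave (formula from the text)."""
    return 6.02 * n_bits + 1.76

def fft_noise_floor_db(n_bits, fft_points):
    """Level of the sine wave above the individual FFT noise bins.

    Assumption: quantization noise is spread evenly over fft_points / 2 bins.
    """
    return ideal_snr_db(n_bits) + 10 * math.log10(fft_points / 2)

print(round(ideal_snr_db(12), 1))                 # whole-band SNR for 12 bits
print(round(fft_noise_floor_db(12, 2048), 1))     # per-bin figure, assumed 2048-point FFT
```

Under that assumed FFT length, the whole-band figure is 74 dB and the per-bin figure is about 104 dB.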

Figure 4.59 FFT of Typical ADC Samples (74 dB whole-band SNR; 104 dB between the sine wave and individual noise components)

4.5.2 Over-sampling to achieve processor gain

Assume that we need to achieve a minimum 70 dB SNR in the conversion process for a given audio application. The formula above suggests a minimum of 12 bits converter resolution is required (full scale sine wave input and ideal converter), where the noise within the whole band from 0 Hz to fs/2 is included in the measurement. Now, if we employ for example 8 × bandwidth sampling (Figure 4.60), we see that the actual audio signal only occupies 1/4 of the baseband (0 to fs/2) space whereas the noise is spread uniformly across the band (again, a bit of a simplification). If we were to now digitally filter this sampled signal, we could remove approximately 3/4 of the noise, increasing the signal to noise ratio by a factor of 4 or 6 dB. This effective increase in SNR is termed the processing gain, achieved by over-sampling the input relative to the 2 × bandwidth rule.

A simple formula for the maximum processing gain can easily be derived:

processing gain (dB) = 10 log10(sample rate / (2 × signal bandwidth))

where it is assumed that digital filtering is employed to restrict the sample bandwidth to exactly match the wanted input signal bandwidth and that the noise is uniformly distributed.
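The processing gain formula is easily evaluated in Python. The sketch below assumes ideal digital filtering and uniformly distributed noise, as the text does, and converts the gain into equivalent extra bits using the 6.02 dB per bit slope of the SNR formula. The function names are made up for the illustration.

```python
import math

def processing_gain_db(sample_rate, signal_bandwidth):
    """Maximum processing gain in dB for a given sample rate and signal bandwidth."""
    return 10 * math.log10(sample_rate / (2 * signal_bandwidth))

def extra_bits(gain_db):
    """Equivalent gain in converter resolution, at 6.02 dB per bit."""
    return gain_db / 6.02

# Sample rates expressed as multiples of a 1 Hz signal bandwidth
for multiple in (8, 128):
    gain = processing_gain_db(multiple, 1)
    print(multiple, "x bandwidth:", round(gain, 1), "dB,", round(extra_bits(gain), 1), "bits")
```

Sampling at 8 × bandwidth yields about 6 dB (one extra bit), and 128 × bandwidth about 18 dB (three extra bits).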

Thus, if we use a 128 × bandwidth over-sampling design (typically found in minidisc recorders and PC sound cards), we can achieve a real 18 dB improvement in signal to quantization noise, effectively increasing the resolution of the converter from 12 bits to 15 bits.

By way of an example, consider using this method to improve the performance of a data converter within a digital cellular phone. The signal bandwidth for a GSM cellular channel is 200 kHz. It is now possible to obtain high speed ADCs with a sampling rate of 80 MSPS and 14-bit resolution, giving an impressive measured 75 dB SNR over the full 0 to fs/2 bandwidth. The processing gain possible is thus:

processing gain (dB) = 10 log10(80 000 000 / (2 × 200 000)) = 10 log10(200) ≈ 23 dB

resulting in a very respectable 75 + 23 = 98 dB SNR for the sampled GSM signal.

Figure 4.60 Over-Sampling and Quantization Noise (ADC followed by digital filter and decimation; quantization noise suppressed by the digital filter)

4.5.3 Sigma–delta converters

This concept of processing gain leads us nicely into the topic of sigma–delta converters. These take the notion of processing gain to the extreme to achieve very high performance, low cost, low power devices ideally suited for audio applications, with simple analog interfacing (little or no anti-aliasing filters – Section 4.4.4).

A block diagram of a basic sigma–delta converter (again of the salesman variety for ease of explanation) is shown in Figure 4.61.

The converter is essentially a highly over-sampling 1-bit ADC (the comparator) followed by digital filtering and decimation to realize the processing gain. The effective performance of the converter is greatly enhanced by the addition of circuitry to shape the quantization noise such that, instead of being uniformly spread throughout the 0 to fs/2 band, it is minimized in the band of interest (Figure 4.62).

For a typical 128 × bandwidth over-sampling system, the processing gain alone would give an extra 3 bits of effective resolution (i.e. a 4-bit converter). Noise shaping however can extend this effective resolution much further, with some sigma–delta converters now achieving 24-bit accuracy for audio band applications. The modern converters use a much more sophisticated form of noise shaping processing than that shown in Figure 4.62, which is a simple first-order sigma–delta design, but the basic principle of exploiting processing gain with noise shaping remains the same.
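The behaviour of a first-order sigma–delta loop can be explored with a small Python simulation. This is a sketch only, built on assumptions not taken from the text: the input is a constant (DC) level between –1 and +1, the 1-bit DAC feeds back exactly ±1, and the digital filter and decimator are reduced to a plain average of the bit stream. The function names are hypothetical.

```python
def sigma_delta_bits(x, n_samples):
    """First-order sigma-delta modulator for a constant input x in the range -1..+1.

    Loop: integrator accumulates (input - fed-back DAC level); a comparator
    (the 1-bit ADC) decides the output bit; a 1-bit DAC feeds back +1 or -1.
    """
    integrator = 0.0
    feedback = 0.0                       # DAC output before the first decision
    bits = []
    for _ in range(n_samples):
        integrator += x - feedback       # integrate the difference
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0  # 1-bit DAC
        bits.append(bit)
    return bits

def decode(bits):
    """Crude 'digital filter and decimation': average the bit stream as +1/-1 values."""
    return sum(2 * b - 1 for b in bits) / len(bits)

bits = sigma_delta_bits(0.5, 1024)
print(bits[:12])
print(decode(bits))
```

Although each output sample carries only one bit, the average of many samples converges on the input level, which is the essence of trading sampling rate for resolution.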

In fact, there are many tens of ADC methods in use: Successive Approximation, Multipass, Interpolating, Subranging and Bit-Per-Stage, to name but a few of the flavors, each potentially having some benefit in performance over its rivals. Luckily, it is not usually necessary to understand how the ADC works in order to make the correct choice of converter for the application. Instead, careful study of the performance specifications on the data sheet will determine the best choice for your application.


Figure 4.61 Basic Sigma–Delta Converter (analog input, integrator, clocked 1-bit ADC with DAC feedback, digital filter and decimation, N-bit output)

Figure 4.62 Quantization Noise Shaping in Sigma–Delta Converters (digital filter response; unshaped quantization noise; 1st order and Nth order sigma–delta noise shaping)

4.5.4 Sample and hold devices (track and hold)

Before leaving the section on ADCs, it would be unfair not to mention the role of the much ignored sample and hold (track and hold) component in successful operation of the conversion process. Successful analog to digital conversion requires the analog signal value (sample) that is next in line for digitization to be held constant whilst the conversion calculation is taking place. Enter the S/H (T/H) device. This device commonly works by using the input voltage/current to charge up a capacitor until it reaches the sample value, with the capacitor then holding the charge (hopefully) whilst the conversion process is performed. This is a very crucial task. The time taken for the capacitor to charge to the correct sample voltage places an upper limit on the maximum sample rate for the whole converter. The accuracy with which the capacitor holds the charge during conversion directly impacts the accuracy of the digitized sample (the number of bits resolution). The noise, distortion, and dynamic range properties of the S/H all set limits on the ADC performance.

In the early days of ADC operation, the S/H device was a separate component and could be selected independently of the ADC. In the interests of ease of design, cost and power consumption, the S/H is now included within the ADC chip, and again, knowledge of the operation of a S/H becomes a matter of academic interest only and its performance is subsumed within that of the whole ADC on the data sheet specs.

4.5.5 ADC device availability

With converters frequently optimized to application, it is not possible to list all converter types and devices here. Table 4.9 gives a summary of current ADC performance for the two dominant sectors of the market: audio codecs, with obvious application in consumer hi-fi products, and high-speed converters, with growing application in video and wireless receiver applications.

Table 4.9 Summary of Typical ADC Devices

A/D Converters

Sampling rate   Resolution   S/Nq    Max input frequency   Power

Audio ADCs
96 kSPS         16 bits      78 dB   48 kHz                40 mW

High Speed ADCs
400 MSPS        8 bits       43 dB   1 GHz                 3 W
250 MSPS        10 bits      56 dB   400 MHz               2 W
120 MSPS        12 bits      70 dB   350 MHz               1 W
80 MSPS         14 bits      77 dB   300 MHz               1.2 W
65 MSPS         14 bits      75 dB   500 MHz               0.6 W

The table clearly shows the trade-offs between speed and resolution, and between resolution and power consumption, that are always present to tax the designer. Up-to-date information on converter performance and availability can be found on the DSPStore.com™ web site.

4.6 Getting signals out

Attaching a digital to analog converter to your DSP is the basis of getting signals out, but as we have discovered in the sampling discussion, there is a lot more to the design process than that.

Figure 4.63 shows the components that make up the DAC chain. In addition to the conversion of the digital word to a discrete voltage or current level, a zero-order hold is used to ‘hold’ the signal level until the next update. To smooth out the recovered waveform, reconstruction filtering meeting the criteria given in Section 4.4.1 is needed.

Figure 4.63 DAC Output Chain (N-bit samples, DAC, zero-order hold)

4.6.1 Digital to analog converters

We have seen that over-sampling on the output (providing samples more frequently than is strictly necessary to satisfy the Shannon/Nyquist 2 × bandwidth rule) can greatly reduce the specification of the reconstruction filter. We shall again consider just two of the many varieties of DAC architectures.

Figure 4.64 shows the components of the most straightforward DAC, which essentially operates as the reverse of the flash ADC described earlier.

HOT TIP: The zero-order hold process introduces a small error into the frequency response of the output of the DAC, giving it a sin x/x or sinc weighting. This can be overcome by implementing an ‘inverse sinc’ digital compensation filter prior to the DAC chain as discussed in Toolbox I (Section 5.4.5). Some DAC devices have an in-built compensation filter so check the data sheet carefully.
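The size of the sin x/x weighting can be estimated with a short Python sketch. It assumes an ideal zero-order hold lasting exactly one sample period, for which the magnitude response is sin(x)/x with x = πf/fs. The function names and the 48 kHz sample rate are assumptions made for the illustration.

```python
import math

def zoh_droop_db(f, fs):
    """Gain (dB) of an ideal one-sample zero-order hold at frequency f for sample rate fs."""
    x = math.pi * f / fs
    if x == 0:
        return 0.0                                  # no droop at DC
    return 20 * math.log10(abs(math.sin(x) / x))

def inverse_sinc_gain(f, fs):
    """Linear gain an 'inverse sinc' compensation filter would need at frequency f."""
    return 10 ** (-zoh_droop_db(f, fs) / 20)

fs = 48_000
for f in (1_000, 12_000, 24_000):
    print(f, "Hz:", round(zoh_droop_db(f, fs), 2), "dB droop,",
          "compensation gain", round(inverse_sinc_gain(f, fs), 3))
```

The droop grows towards fs/2, where it reaches about 3.9 dB, and it shrinks when the output is over-sampled because the wanted band then occupies a smaller fraction of fs.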

4.6.2 The resistor divider DAC

A known reference voltage is applied across a chain of 2^N – 1 resistors such that the voltage measured at any of the tap points exactly corresponds to one of the 2^N available output sample values for an N-bit converter. By switching in the tap point that corresponds to the value of the digital word to be converted, a basic DAC operation is achieved.
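The tap voltages of such a divider follow directly from the resistor count. The Python sketch below assumes equal-value resistors and counts both ends of the chain (0 V and the reference voltage) as tap points, so that 2^N – 1 resistors give 2^N output levels. The function name `divider_dac` is hypothetical.

```python
def divider_dac(code, v_ref, n_bits):
    """Output voltage of an ideal resistor divider DAC for a given digital code.

    Assumptions: 2**n_bits - 1 equal resistors across v_ref, with tap 0 at
    ground and the highest tap at v_ref.
    """
    n_resistors = 2 ** n_bits - 1
    if not 0 <= code <= n_resistors:
        raise ValueError("code out of range for this resolution")
    return v_ref * code / n_resistors     # the switch decoder selects this tap

# 2-bit converter with a 3 V reference: taps at 0, 1, 2 and 3 V
print([divider_dac(code, 3.0, 2) for code in range(4)])
```

As with the flash ADC, the resistor count makes this architecture unattractive at high resolution.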

Figure 4.64 Basic resistor divider DAC (reference voltage applied across the resistor chain; the N-bit digital input drives a switch decoder that selects the tap for the analog output)

4.6.3 Sigma–delta DAC

The over-sampling approach with a 1-bit DAC is just as effective for output conversion as the 1-bit ADC with over-sampling and noise shaping is for input conversion.

A simple sigma–delta DAC is shown in Figure 4.65. The first element in the sigma–delta DAC is the interpolation process, which inserts zero samples between each valid sample to realize the increase in data rate, together with digital filtering to perform part of the signal reconstruction (see Section 4.4 and Toolbox II, Section 6.9.9). The digital sigma–delta modulator performs a shaping function on the quantization noise, such that in the final reconstructed output, most of the quantization noise is pushed out of the band of interest.

The 1-bit DAC is basically a switch, typically selecting either 0 V or a positive voltage reference level (single supply operation), operating at the very high sampling rate. The analog filter smooths out the transitions on the output to yield a continuous, high fidelity output waveform.
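To make the modulator and 1-bit DAC stages more concrete, the following is a minimal sketch of a first-order digital sigma–delta modulator. It is an illustration only: practical converters normally use higher-order loops, and here the two switch levels are represented as +1 and –1 rather than 0 V and a positive reference. The function name is hypothetical.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator.

    samples: over-sampled input values in the range -1.0 to +1.0.
    Returns a list of +1.0/-1.0 values (the 1-bit stream).
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Accumulate the error between the input and the previous output
        integrator += x - feedback
        # 1-bit quantizer
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits

if __name__ == "__main__":
    # A constant input of 0.3 yields a bit stream whose average approaches 0.3
    stream = sigma_delta_1bit([0.3] * 10_000)
    print(sum(stream) / len(stream))
```

Averaging the bit stream, which is crudely what the analog reconstruction filter does, recovers the input level, while the rapid switching between the two levels carries the quantization noise at high frequencies.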

Figure 4.65 Basic sigma–delta DAC (N bits at fs → interpolation filter → N bits at k·fs → sigma–delta modulator → 1 bit at k·fs → 1-bit DAC → analog reconstruction filter → analog output)

4.6.4 DAC device availability

With converters frequently optimized to application, it is not possible to list all converter types and devices here. Table 4.10 gives a summary of current DAC performance for the two dominant sectors of the market: audio codecs, with obvious application in consumer hi-fi products, and high speed converters, with growing application in video and wireless receiver applications.

Table 4.10 Summary of typical DAC devices

D/A converters
Sampling rate (settling time) | Resolution | Dynamic range | THD | Power
Audio DACs
192 kSPS | 16 bits | 95 dB | 90 dB | 40 mW
192 kSPS | 24 bits | 120 dB | 100 dB | 70 mW
High speed DACs
500 MSPS | 10 bits | 80 dB | 250 mW

The table clearly shows the trade-offs between speed and resolution, and between resolution and power consumption, that are always present to tax the designer. Up-to-date information on converter performance and availability can be found on the DSPStore.com™ web site.



4.7 Getting signals in and out

With a large number of DSP applications (particularly audio products) requiring both ADC and DAC conversion, usually in stereo, it is not surprising that manufacturers have produced single-chip devices incorporating multiple DACs and ADCs. These all use sigma–delta conversion technology, which has the big advantage of taking up a comparatively small amount of silicon area with low power consumption, yet offering high resolution.

Figure 4.66 gives an example of such a converter manufactured by Crystal Semiconductor (Cirrus Logic). The device boasts 2 × 24-bit ADCs and 6 × 24-bit DACs, all with independent digital volume control. These combo converters represent excellent value for money for the number and resolution of channels provided, and the ‘stereo’ facility also makes them well suited to quadrature processing in many other DSP applications, including wireless (see Section 4.8).

Table 4.11 gives a summary of current combined ADC/DAC audio converters. Up-to-date information on converter performance and availability can be found on the DSPStore.com™ web site.


Figure 4.66 CS4228 multi DAC/ADC audio converter (block diagram: serial audio data interface with SDIN1–SDIN3, SDOUT, LRCK and SCLK; digital filters with de-emphasis and digital volume feeding six ΔΣ DACs, each with an analog low-pass and output stage, AOUT1–AOUT6; left and right ADCs with digital filters on differential inputs AINL± and AINR±; clock manager, mute control and control port)

Table 4.11 Summary of combo ADC/DAC audio converters

Combo ADC/DAC audio converters
Sampling rate | Resolution | No. of channels | Dynamic range | Power
96 kSPS | 16 bits | 4 × ADC, 4 × DAC | 77 dB | 200 mW
96 kSPS | 24 bits | 2 × ADC, 6 × DAC | 120 dB

4.8 Digital up- and down-conversion

There are many applications, wireless and optical communications in particular, where it is necessary to process analog signals at a frequency many times higher than the baseband (0 to fs/2) range where the DSP optimally manipulates the waveform samples.

4.8.1 Digital down-conversion

Consider a mobile phone receiver operating at a carrier frequency of 1.8 GHz, with channels spaced at 200 kHz intervals (Figure 4.67). Whilst it would be nice to be able to sample (digitize) the 1.8 GHz RF signal directly at the antenna (Figure 4.67(a)), the performance of current ADC technology comes nowhere near to achieving this with any sensible resolution or practical power consumption (see Table 4.9 for current ADC specifications). It is therefore necessary to ‘down-convert’ the RF signal using analog components to an intermediate frequency (IF) range where digitization is feasible. One option is to use a direct conversion or ‘zero-IF’ design (Figure 4.67(b)), where the RF signal is mixed down to two quadrature components with the channel information centered about 0 Hz and bandwidth extending from –100 kHz to 100 kHz. This approach allows the digitization to occur close to the minimum sample rate (2 × bandwidth = 200 kHz), where converter cost and power consumption are usually optimized.

An alternative approach (Figure 4.67(c)) is to insert an additional analog mixing stage, which can have benefits in terms of additional filtering and gain control. Here, the digitization still takes place at baseband, but usually the specification of the converters (resolution, dynamic range, etc.) can be relaxed compared with the full zero-IF approach.

The third option (Figure 4.67(d)) is to digitize at a non-zero IF, performing the quadrature down-conversion process in DSP rather than with analog processing. In the digital domain, the conversion process can be made near perfect, with almost zero quadrature gain or phase imbalance. (This topic is discussed in more detail in Toolbox I, Chapter 5.)

Due to current ADC performance limitations, the sampling rate for this type of design is typically set within the range 20 MSPS to 100 MSPS. In some designs, sub-sampling (cf. Section 4.4) is employed, such that the analog input frequency at conversion can be as high as 1 GHz with current technology.

Conversion of the digitized IF signal to baseband is a simple process of multiplying the waveform samples with samples of a sine and cosine wave representing the required channel frequency (a few lines of code!). This gives two sets of samples representing the received signal, with a sampling rate many times higher than needed to represent a single 200 kHz wide mobile phone channel (e.g. a sample rate of 80 MSPS, with only 200 kSPS minimum required for each quadrature baseband stream). A few stages of decimation and filtering (see Section 4.4.5 and Toolbox II, Section 6.9.9.1) soon do away with the unnecessary samples, to give the required digital version of the mobile phone signal, ready for all that sophisticated demodulation, equalization, decoding, decompression, etc., that is the bread and butter of DSP.
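The ‘few lines of code’ for quadrature mixing and decimation might look something like the sketch below. It is a deliberately simplified illustration: the decimation filter here is a plain block average, standing in for the multi-stage filters a real design would use, and the function name and test frequencies are assumptions chosen for the example.

```python
import math

def down_convert(samples, fs_hz, f_channel_hz, decimation):
    """Mix real IF samples to baseband I/Q and decimate.

    Each output pair is the average of `decimation` mixed input samples
    (a crude low-pass filter followed by sample-rate reduction).
    """
    i_out, q_out = [], []
    acc_i = acc_q = 0.0
    for n, x in enumerate(samples):
        phase = 2.0 * math.pi * f_channel_hz * n / fs_hz
        acc_i += x * math.cos(phase)   # in-phase mixer
        acc_q += -x * math.sin(phase)  # quadrature mixer
        if (n + 1) % decimation == 0:
            i_out.append(acc_i / decimation)
            q_out.append(acc_q / decimation)
            acc_i = acc_q = 0.0
    return i_out, q_out

if __name__ == "__main__":
    fs = 80e6       # 80 MSPS input rate, as in the text
    f_if = 20e6     # assumed channel frequency within the digitized IF
    tone = [math.cos(2.0 * math.pi * f_if * n / fs) for n in range(4000)]
    # Decimating by 400 takes 80 MSPS down to 200 kSPS per quadrature stream
    i_bb, q_bb = down_convert(tone, fs, f_if, 400)
    print(i_bb[:3], q_bb[:3])
```

A carrier sitting exactly on the channel frequency appears at baseband as a constant I value of half its amplitude with Q close to zero; modulation on the carrier would appear as variations in I and Q at the reduced sample rate.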

Because this function of IF digitization, quadrature mixing and decimation is so common in modern receiver designs, a number of dedicated DSP devices have recently been developed which are optimized for this task. Designed to sit sandwiched between a high-speed ADC and a more general purpose DSP, they take the load off the latter by presenting the general purpose DSP with samples already packed in quadrature form and with the sample rate reduced to the minimum needed to represent any given mobile channel (or whatever other application you care to choose!).


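Figure 4.67 Receiver architectures and digitization options: (a) direct digitization of the 1.8 GHz RF signal; (b) zero-IF conversion with two ADCs; (c) additional analog mixing stage with two ADCs; (d) digitization at a non-zero IF with a single ADC and quadrature down-conversion in the DSP. The spectrum sketch shows 200 kHz channels at 1.8 GHz.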

Figure 4.68 shows the block diagram for one of these devices, manufactured by Analog Devices. The AD6624 in fact provides a four-channel implementation of the digital down-conversion process, typically designed for use in mobile phone base stations, where large numbers of channels need to be detected. In this case, each digital oscillator is tuned to select an individual channel. The digitally generated sine and cosine wave samples can be set with fractions of a hertz resolution, and the decimation filters programmed to operate with a wide range of input signal bandwidths and shapes.
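The ‘fractions of a hertz’ tuning resolution follows directly from the width of the phase accumulator in a numerically controlled oscillator (NCO). The sketch below assumes a 32-bit accumulator clocked at the 80 MSPS input rate; both figures are assumptions for illustration and should be checked against the data sheet of the device being used. The function names are hypothetical.

```python
def nco_resolution_hz(f_clock_hz, accumulator_bits=32):
    """Smallest frequency step of a phase-accumulator NCO."""
    return f_clock_hz / 2 ** accumulator_bits

def tuning_word(f_out_hz, f_clock_hz, accumulator_bits=32):
    """Nearest integer tuning word for the requested output frequency."""
    modulus = 2 ** accumulator_bits
    return round(f_out_hz * modulus / f_clock_hz) % modulus

if __name__ == "__main__":
    f_clk = 80e6  # assumed NCO clock, matching an 80 MSPS input
    print(f"resolution: {nco_resolution_hz(f_clk):.4f} Hz")
    print(f"tuning word for 20 MHz: {tuning_word(20e6, f_clk)}")
```

Under these assumptions the step size is roughly 0.02 Hz, far finer than the 200 kHz channel spacing, so any channel centre frequency can be selected essentially exactly.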

Table 4.12 gives a snapshot of the digital down-conversion devices on the market. For up-to-date information, check out the DSPStore.com™ web site.


Figure 4.68 Analog Devices AD6624 digital down-converter (block diagram: input matrix fed from ports INA[13:0]/EXPA[2:0] and INB[13:0]/EXPB[2:0]; four channels, each with an NCO and quadrature mixers, rCIC2 re-sampler, CIC5 filter and RAM coefficient filter; serial and microport outputs; external sync circuit with SYNCA–SYNCD, JTAG interface and built-in self-test circuitry)

Table 4.12 Selection of current digital down-conversion solutions

Digital demodulator devices – summary
Device | Sample rate (input) | No. of channels | Decimation factor | Input data width (bits)
Harris 50216 | 70 MSPS | 4 | up to 64,000 | 16
Gray Chip 4014 | 63 MSPS | 4 | up to 64,000 | 14
Analog Devices AD6624 | 80 MSPS | 4 | up to 131,000 | 16
Nat Semi CLC5902 | 52 MSPS | 2 | up to 16,384 | 14

4.8.2 Digital up-conversion

Not surprisingly, the same operation of digitally converting DSP-processed baseband signals to an IF frequency prior to digital to analog conversion is viewed as highly beneficial for transmitter design. In this case, the low sampling rate from the general purpose DSP needs to be increased, using interpolation (Section 4.4), before being multiplied with the sine and cosine samples and output via the DAC.

Again, a growing number of dedicated devices exist to perform this task, an example of which is the AD9857 shown in Figure 4.69. The AD9857 includes a 14-bit high speed DAC on the IC, making it a very compact solution for digital up-conversion. Manufacturers also make multi-channel up-conversion devices.

A list of current devices is given in Table 4.13. For up-to-date information, check out the DSPStore.com™ web site.
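The up-conversion process (raise the sample rate, then multiply by cosine and sine samples and combine) can be sketched as follows. This is a simplified illustration: each baseband sample is simply repeated to raise the rate, standing in for the proper interpolation filtering a real design needs, and the function name and frequencies are assumptions for the example.

```python
import math

def up_convert(i_samples, q_samples, interpolation, fs_out_hz, f_if_hz):
    """Quadrature-modulate baseband I/Q samples onto a digital IF carrier.

    Each baseband sample is held for `interpolation` output samples
    (a crude interpolator), then mixed: out = I*cos(wt) - Q*sin(wt).
    """
    out = []
    n = 0
    for i_val, q_val in zip(i_samples, q_samples):
        for _ in range(interpolation):
            phase = 2.0 * math.pi * f_if_hz * n / fs_out_hz
            out.append(i_val * math.cos(phase) - q_val * math.sin(phase))
            n += 1
    return out

if __name__ == "__main__":
    # Two baseband samples raised by a factor of 4 onto a carrier at fs/4
    print(up_convert([1.0, 0.0], [0.0, 1.0], 4, 80e6, 20e6))
```

The output sequence is what would be passed to the DAC; dedicated up-converter devices perform the same steps in hardware at rates a general purpose DSP would struggle to sustain.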


Figure 4.69 AD9857 digital up-converter (block diagram: 14-bit parallel data in; demultiplexer to 14-bit I and Q paths; inverse CIC filter; fixed interpolator (4×); programmable CIC interpolator (2×–63×); quadrature modulator driven by COS and SIN outputs of a DDS core with a 32-bit tuning word; inverse sinc filter; 8-bit output scale value; 14-bit DAC with IOUT outputs; timing and control, control registers, clock multiplier (4×–20×), profile select logic, power-down logic and serial port)

Table 4.13 Selection of current digital up-conversion solutions

Digital modulator devices – examples
Device | Sample rate (output) | No. of channels | Interpolation factor | Output data width
Harris 50415 | 100 MSPS | 2 | up to 261,184 | 14 bits (inc. 2 × 12-bit DACs)
Gray Chip 4114 | 70 MSPS | 4 | up to 64,000 | 16 bits
Analog Devices AD6622 | 65 MSPS | 4 | up to 4,096 | 18 bits
Analog Devices AD9857 | 80 MHz | 1 | up to 252 | 14 bits (inc. DAC)

More discussion of digital up- and down-conversion can be found in Toolbox I and in the Software Radio case study, Chapter 10.

4.9 Interfacing with the real world

The standard serial bus and external interface bus have both been considered in previous sections. These interfaces are provided so that the DSP can perform some useful function by processing real signals, or at least so that data can be moved in and out of the DSP device efficiently. For many applications the requirement is for real-time processing of analog signals, that is, analog signals must be sampled and converted to a digital form which can then be processed by the DSP device. In this section we will consider the standard mechanisms commonly used to interface analog I/O devices to the DSP chip. By way of an example we will consider the use of a CODEC device. A CODEC, in this context, is simply a chip containing an A/D and a D/A converter, analog conditioning circuits, anti-aliasing filters and some form of interface mechanism through which the device can be initialized and its data passed. Most often the CODEC is the simplest approach that can be used to interface a DSP device to the outside analog world (Refs. 4.9 and 4.10).

4.9.1 What is a CODEC and how is it used?

The word ‘CODEC’ is used to describe a device which can be used to COde and DECode data between different formats. In the context considered here, the coding is from analog to digital form and the decoding is from digital back to analog. CODEC can also be used to describe any coder/decoder device or algorithm; for example, an algorithm that converts between a linear and a compressed data format and back again could be described as a CODEC. As already mentioned, we will consider analog interface CODECs in this subsection. The block diagram of a standard CODEC device is shown in Figure 4.70. The device shown is a Crystal Semiconductor CS4231A stereo CODEC, which is often used for multimedia applications.

Figure 4.70 Crystal Semiconductor CS4231A stereo multimedia CODEC (block diagram: input MUX for the left and right line, mic and aux inputs; gain control blocks; filters and 16-bit A to D converters; µ-law/A-law compressor and 16-sample FIFO on the input path; 16-sample FIFO, µ-law/A-law expander, filters and 16-bit converters on the output path to L_out, R_out and mono; dither generator; serial port interface with SDout, SDin, CLK and Fsync; parallel port interface with data, address, R/W, EN, CS and INT; oscillators on XTAL1 and XTAL2; timer)

The CODEC shown in Figure 4.70 incorporates a parallel port which is generally used as a low speed connection through which the DSP device can send initialization information and monitor various aspects of the CODEC's operation. The sample data is usually, though not necessarily, sent via the CODEC's serial port, which is connected directly to one of the DSP's on-chip serial ports, as described in Section 4.2.3. The serial connection uses a four wire bus comprising a data clock, frame sync, data in and data out lines. The stereo audio interface provided on the Texas Instruments C6xxx evaluation module (EVM) board is shown in block diagram form in Figure 4.71. The C6xxx and CODEC connections that have already been mentioned can be clearly identified in the EVM board block diagram.

Figure 4.71 CS4231A stereo CODEC interface used on the Texas Instruments C6xxx EVM board (block diagram: TMS320C6000 McBSP serial interface connected to the CODEC's SCLK, FSYNC, SDOUT and SDIN lines; EMIF parallel interface connected through 5 V to 3.3 V voltage translation to the CODEC's D[7:0], A[1:0], CS, RD and WR lines; CODEC IRQ connected to a DSP external interrupt, EXT_INTx)

The CS4231A CODEC provides a very compact solution for general purpose audio applications. It requires only a minimal number of external support components, such as basic I/O buffering and filters, power smoothing and a crystal clock. The I/O buffering can be made using standard op-amp designs and the filters only need to provide AC coupling, i.e. an RC filter will suffice. The crystal clocks are used to provide the sample rate for the device and this can be set to operate at any one of a number of standard rates, as outlined in Table 4.14. Because the CODEC incorporates its own sample rate generator, it is not necessary for the DSP to provide this from its internal clock source. On the C6xxx EVM the serial port is configured such that the CODEC generates the data clock and frame sync signals and the C6xxx McBSP simply synchronizes itself to the CODEC. The CS4231A CODEC is provided with two clock crystal inputs and an internal divider circuit so that a wide range of different sampling rates can be generated (as already mentioned, these are outlined in Table 4.14). The sampling rate is selected under software control through a number of on-chip registers provided on the CODEC. The registers are accessible via either the serial port or the parallel port although, under normal operation when continuous sampling is taking place, the obvious choice is to use the parallel port for configuration setting and the serial port as a dedicated high speed sample data channel.

The CS4231A CODEC is able to operate using one of a number of different standard serial data formats, all of which the C6xxx DSP's McBSP can accommodate. The most commonly used formats are the 64-bit types shown in Figure 4.72. In this mode of operation a complete serial data frame is made up of 64 data bits, of which the first 16 correspond to the left channel data sample, the next 16 are the right channel data sample and the remaining 32 bits are used to pass control and status information. An alternative 64-bit data format is also provided which incorporates the left and right channel data in the first 32 bits, followed by a 32-bit blank period in which no data is sent. The benefit of using the 32 data bits inserted into the 64-bit frame is that no driver software will be required on the DSP chip to strip out the control information from the data samples.
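The layout of the 64-bit data and control frame can be illustrated with a short packing and unpacking sketch. It follows the field order given above (left sample, right sample, then 32 control bits, most significant bit first) and assumes 16-bit two's complement samples; the meaning of the individual control bits is not modelled, and the function names are hypothetical.

```python
def pack_frame(left, right, control):
    """Build a 64-bit frame: 16-bit left, 16-bit right, 32-bit control."""
    return ((left & 0xFFFF) << 48) | ((right & 0xFFFF) << 32) | (control & 0xFFFFFFFF)

def _to_signed16(value):
    """Interpret a 16-bit field as a two's complement sample."""
    return value - 0x10000 if value & 0x8000 else value

def unpack_frame(frame):
    """Split a 64-bit frame into (left, right, control)."""
    left = _to_signed16((frame >> 48) & 0xFFFF)
    right = _to_signed16((frame >> 32) & 0xFFFF)
    control = frame & 0xFFFFFFFF
    return left, right, control

if __name__ == "__main__":
    frame = pack_frame(-2, 1234, 0x00000000)
    print(f"{frame:016X}", unpack_frame(frame))
```

With the data-only format the control field is simply a blank period, so the DSP can take the first 32 bits of each frame as the stereo sample pair and ignore the rest, which is the saving in driver software mentioned above.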

Figure 4.73 shows the timing of the serial data stream for the standard 64-bit data and control frame format. Other data formats supporting companded data and mono-only signals are also supported by the CODEC. Note that the serial interface would normally run in full-duplex mode, where input and processed output data pass between the devices simultaneously.

Table 4.14 CS4231A CODEC sampling rates

Sampling rate (kHz)
 5.5125   11.0250   27.4286   44.1000
 6.6150   16.0000   32.0000   48.0000
 8.0000   18.9000   33.0750   –
 9.6000   22.0500   37.8000   –

Figure 4.72 Serial stereo data frame formats used on the CS4231A CODEC (standard 64-bit frame. 64-bit data and control format: 16-bit left sample, 16-bit right sample, 32 bits of control data. 32-bit data only format: 16-bit left sample, 16-bit right sample, 32-bit blank period)

When the CODEC is operated in 64-bit frame mode, the serial data clock that must be generated will be at 64 times the sampling rate. For the maximum sampling rate of 48 kHz, the serial clock is therefore 3.072 MHz. This sounds like quite a high serial data rate; however, to put this into context, the C6xxx can accommodate a maximum data rate of 100 MHz when operating with a system clock of 200 MHz, so the CS4231A is operating well within the capabilities of the C6xxx DSP.
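The serial clock arithmetic is easy to check for any of the sampling rates in Table 4.14. The sketch below uses the 64 bits per frame and the 100 MHz maximum serial rate quoted in the text; the function name is hypothetical.

```python
def serial_clock_hz(sampling_rate_hz, bits_per_frame=64):
    """Serial bit clock needed to send one frame per sample period."""
    return sampling_rate_hz * bits_per_frame

if __name__ == "__main__":
    max_serial_rate_hz = 100e6  # C6xxx figure quoted in the text
    for fs in (8_000, 44_100, 48_000):
        clk = serial_clock_hz(fs)
        print(f"fs = {fs / 1e3:7.3f} kHz -> serial clock {clk / 1e6:.4f} MHz "
              f"({100.0 * clk / max_serial_rate_hz:.2f}% of maximum)")
```

Even at the highest sampling rate the serial port is running at only about 3% of the quoted maximum, which leaves ample margin.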

The parallel interface provided on the CS4231A CODEC has an 8-bit data bus and a 2-bit address bus; it is also provided with chip select and read/write strobe lines. This parallel interface appears as a standard microprocessor interface which can be easily interfaced to a DSP device. On the C6xxx EVM board, already mentioned, the external interface is connected and mapped into addressable space such that the CODEC's internal registers appear to the DSP programmer as a set of memory-mapped registers that can be written to and read from just like any other memory location.


Figure 4.73 Serial stereo data frame formats used on the CS4231A CODEC (timing of CLK, FSync, SDOUT and SDIN for the standard 64-bit serial data frame: 16-bit left and right sample data sent MSB first, L15, L14, ..., followed by 32 bits of control data ending with the LSB, C0)

4.10 Questions

1. Taking each of the following on-chip peripherals in turn, describe how they operate and explain why they are included in a typical DSP device.
(a) AGU (b) Timer (c) Wait state generator (d) DMA processor (e) JTAG interface

2. For a DSP device of your choice, find out how you can interface an external memory device in order to expand the addressable memory space. You should do this in the context of a real DSP system, an EVM or similar development platform. Find out what address/data and control lines are required, how much external space is available, speed requirements of memory devices, and so on.

3. Describe the typical sequence of events that take place, from a hardware point of view, when an external interrupt signal arrives at the input of a DSP device.

4. Explain the meaning of the term ‘companding’ in the context of data I/O. What benefits are offered by the use of companding and what steps must be taken when processing streams of companded data?

5. Obtain a data sheet for a multimedia CODEC and investigate the methods used to interface the CODEC to a DSP system of your choice. Consider the serial I/O timing requirements and data format, additional connections and software initialisation requirements. If you are not sure where to obtain a relevant data sheet then check out the DSPStore.com™ web site for further information.

6. A professional microphone is able to pick up signals in the range 50 Hz to 20 kHz. Assuming that the signal content from the microphone outside this range is zero, what is the minimum sampling rate needed to accurately capture the information from the microphone?

7. A studio mixing desk needs a very high signal to quantization noise ratio, greater than 120 dB, to meet the industry requirements. What is the minimum number of bits of resolution needed in the ADCs to support this specification?

8. The input channel spacing for a police mobile radio system is 25 kHz. A digital receiver design uses an ADC capable of sampling at 65 MSPS, with a resolution of 14 bits. What is the processing gain that can be realized for this implementation, and what is the effective number of bits for the ADC when this processing gain is fully realized?

Solutions to these questions and additional questions can be found on the DSPStore.com™ web site.

4.11 References

4.1. TMS320C54x CPU and Peripherals – Reference set, Vol. 1, Texas Instruments, SPRU131d (1997)
4.2. TMS320C62xx/C67xx CPU and Instruction set, Texas Instruments, SPRU189c (1998)
4.3. TMS320C62xx/C67xx Peripherals reference guide, Texas Instruments, SPRU190b (1998)
4.4. TMS320C62xx/C67xx Programmers guide, Texas Instruments, SPRU198b (1998)
4.5. TMS320C54x Applications guide – Reference set, Vol. 4, Texas Instruments, SPRU173 (1996)
4.6. TMS320C6000 McBSP Interface to the CS4231A Multimedia CODEC, Texas Instruments, SPRA477 (1998)
4.7. Jordan, M., Hardware and software interface issues for DSPs and serial audio CODECs – Application notes, Crystal Semiconductor Corporation (1996)
4.8. Ifeachor, E. C. and Jervis, B. W., Digital Signal Processing – A Practical Approach, Addison-Wesley
4.9. CS4231A Multimedia CODEC Applications Guide, Crystal Semiconductor Corporation (1994)
4.10. TMS320C6xxx EVM Users Manual and Technical Reference, Texas Instruments, SPRU269 (1998)
