240662_633888485056270520.ppt

7/29/2019 240662_633888485056270520.ppt

1/115

Syllabus Architecture of TMS 320C6x

functional units fetch and execute

Pipelining

Registers

addressing modes

instruction sets

Timers

Interrupts

serial ports

DMA

memory

Introduction to DSP

A digital signal processor (DSP) is a type of microprocessor that is optimized for digital signal processing.

It integrates system control and math-intensive functions.

Its advantages are speed, cost and energy efficiency.

It is a key component in many communication, medical, military and industrial products.

Alternatives

FPGA

Field-Programmable Gate Arrays can be reconfigured within a system.

But they are more expensive and have high power dissipation.

ASIC

Application-Specific Integrated Circuits can perform specific functions extremely well, and can be made quite power efficient.

But since ASICs are not field-programmable, their functionality cannot be iteratively changed or updated during product development.

Why go digital?

Digital signal processing techniques are now so powerful that it is sometimes extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.

Examples:

FIR filter with linear phase.

Adaptive filters.


With DSP it is easy to:

Change applications.

Correct applications.

Update applications.

Additionally DSP reduces:

Noise susceptibility.

Chip count.

Development time.

Cost.

Power consumption.

Why do we need DSP processors?

Use a DSP processor when the following are required:

Cost saving.

Smaller size.

Low power consumption.

Processing of many high frequency signals in real-time.


Applications

General DSP System Block Diagram

[Block diagram: Peripherals, Central Processing Unit, Internal Memory, Internal Buses, External Memory]

Classification of DSP

Von Neumann's architecture

Harvard architecture

Super Harvard architecture

VON NEUMANN'S ARCHITECTURE

One shared memory for instructions (program) and data, with one data bus and one address bus between processor and memory.

Instructions and data have to be fetched in sequential order (known as the von Neumann bottleneck), limiting the operation bandwidth.

Its design is simple.

It is mostly used to interface to external memory.

HARVARD ARCHITECTURE

Uses physically separate memories for instructions and data, requiring dedicated buses for each of them.

Instructions and operands can therefore be fetched simultaneously.

Different program and data bus widths are possible, allowing program and data memory to be better optimized to the architectural requirements.

E.g.: If the instruction format requires 14 bits, then the program bus and memory can be made 14 bits wide, while the data bus and data memory remain 8 bits wide.


Efficient Memory Access

OR

Bus

General purpose processors

Early DSP processors

More optimized DSP processors

Classification of DSP

Fixed-point processors perform integer operations.

Floating-point processors perform both integer and floating-point operations.

It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.

For educational purposes, use the floating-point device as it can support both fixed and floating point operations.

Fixed point: TMS320C1x, C2x, C5x, ...

Floating point: TMS320C3x, C4x, C67x, ...
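To make the fixed-point versus floating-point distinction concrete, here is a minimal sketch of how an integer-only device can still handle fractional values. The Q15 format (1 sign bit, 15 fractional bits) and the helper names are assumptions chosen for illustration; the slides do not prescribe a particular number format.

```python
# Illustrative sketch only: Q15 is an assumed fixed-point format and the
# function names are hypothetical, not part of any TI library.

Q = 15  # number of fractional bits


def to_q15(x):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    v = int(round(x * (1 << Q)))
    return max(-32768, min(32767, v))


def q15_mul(a, b):
    """Multiply two Q15 integers; the wider product is shifted back to Q15."""
    return (a * b) >> Q


def from_q15(v):
    """Convert a Q15 integer back to a float."""
    return v / (1 << Q)


half = to_q15(0.5)
quarter = to_q15(0.25)
product = q15_mul(half, quarter)
print(half, quarter, product, from_q15(product))
```

On a floating-point device the same product is simply 0.5 * 0.25; on a fixed-point device the programmer has to track the scaling by hand, which is one reason the floating-point device is easier to learn on.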

C versus Assembly language

Programs in C are more flexible and quicker to develop.

Programs in assembly often have better performance; they run faster and use less memory, resulting in lower cost.

C / Assembly?

How complicated is the program?

If it is large and intricate, you will probably want to use C. If it is small and simple, assembly may be a good choice.

Are you pushing the maximum speed of the DSP?

If so, assembly will give you the last drop of performance from the device.

For less demanding applications, you should consider using C.

How many programmers will be working together?

If the project is large enough for more than one programmer, lean toward C; use in-line assembly only for time-critical segments.

Which is more important, product cost or development cost?

If it is product cost, choose assembly; if it is development cost, choose C.

What is your background?

If you are experienced in assembly (on other microprocessors), choose assembly for your DSP.

If your previous work is in C, choose C for your DSP.

The Digital Signal Processor Market

The digital signal processor market is dominated by 4 companies.

Analog Devices (www.analog.com/dsp)

ADSP-21xx: 16 bit, fixed point

ADSP-21xxx: 32 bit, floating and fixed

Lucent Technologies (www.lucent.com)

DSP16xxx: 16 bit, fixed point

DSP32xx: 32 bit, floating point

Motorola (www.mot.com)

DSP561xx: 16 bit, fixed point

DSP560xx: 24 bit, fixed point

DSP96002: 32 bit, floating point

Texas Instruments (www.ti.com)

TMS320Cxx: 16 bit, fixed point

TMS320Cxx: 32 bit, floating point

TMS320 Family

C2000: Lowest Cost

Control Systems: Motor Control, Storage, Digital Ctrl Systems

C5000: Efficiency, Best MIPS

Wireless phones, Internet audio players, Digital still cameras, Modems, Telephony, VoIP

C6000: Best Performance & Ease-of-Use

Multi Channel and Multi Function App's: Comm. Infrastructure, Wireless Base-stations, Audio and Speech Processing, Imaging, Multi-media Servers, Video

C6000 Roadmap

[Figure: C6000 roadmap, performance versus time.]

1st Generation, fixed-point (C62x): C6201, C6202, C6203, C6204, C6205, C6211

1st Generation, floating-point (C67x): C6701, C6711, C6712, C6713

2nd Generation (Fixed Point), C64x DSP: C6411, C6414, C6415, C6416 (General Purpose, Media Gateway, 3G Wireless Infrastructure)

Multi-core C64x DSP, 1.1 GHz

Features of the TMS320C6x

The Texas Instruments TMS320C6x family of microprocessors is one of the largest VLIW success stories to date.

This family of processors is built to deliver speed.

Family members differ in size, cost, memory, peripherals, and power consumption specifications.

E.g.:

Fixed-point C6201 version

5-ns instruction cycle time

200-MHz clock rate

Performance of up to 1600 MIPS

Eight 32-bit instructions/cycle

Floating-point C6701 version

Can operate at 167 MHz

6-ns instruction cycle time

1 giga floating-point operations per second (GFLOPS)

Very Long Instruction Word (VLIW)

Refers to a CPU architecture designed to take advantage of instruction-level parallelism.

Executes operations in parallel based on a fixed schedule determined when programs are compiled.

The order of execution of operations (including which operations can execute simultaneously) is handled by the compiler; hence the processor does not need the scheduling hardware.

VLIW CPUs offer significant computational power with less hardware complexity but greater compiler complexity.

VLIW architectures execute multiple instructions/cycle and use simple, regular instruction sets.

More parallelism, higher performance

Better compiler targets

Disadvantages of VLIW Architectures

New kinds of programmer/compiler complexity

Programmer (or code-generation tool) must keep track of instruction scheduling

Deep pipelines and long latencies can be confusing, may make peak performance elusive

Increased memory use

High program memory bandwidth requirements

High power consumption

Misleading MIPS ratings

VelociTI

The VLIW modification done by TI is called VelociTI.

Reduces code size.

Increases performance when instructions reside off-chip.

The C6x architecture is based on the high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI).

It is an excellent choice for multichannel and multifunction applications (several instructions captured and processed simultaneously).

TMS320C6x with VelociTI Enables Cost-Effective Solutions for Emerging Applications

Unlimited Internet bandwidth

Universal wireless communication

New telephony features

Remote medical diagnostics

Automated cruise control

Personal home base station

Personalized home security

TMS320C6000 DSP Device Nomenclature

TMS320C6711

A floating-point processor with VLIW architecture

Internal memory includes a two-level cache architecture

- 4 KB of level 1 program cache (L1P)

- 4 KB of level 1 data cache (L1D)

- 64 KB of RAM / level 2 cache for data/program (L2)

Has a direct interface to both synchronous memories (SDRAM and SBSRAM) and asynchronous memories (SRAM and EPROM)

With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB

It requires 3.3 V for I/O and 1.8 V for the core

Operates at 150 MHz

Performs up to 900 million floating-point operations per second (MFLOPS)

Translates to 1200 million instructions per second (MIPS)
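The headline figures above follow from simple arithmetic, sketched below. The per-cycle counts are taken from other slides in this deck (eight 32-bit instructions per cycle, and six of the eight functional units able to execute floating-point instructions); treating every unit as busy on every cycle is an assumption that yields peak values only, not sustained ones. The variable names are illustrative.

```python
# Peak-rate arithmetic for the C6711 figures quoted on the slide.
# Assumption: every functional unit issues one operation on every cycle.

CLOCK_MHZ = 150
INSTRUCTIONS_PER_CYCLE = 8  # one per functional unit
FP_UNITS = 6                # .L1, .S1, .M1, .M2, .S2, .L2
ADDRESS_BITS = 32

peak_mips = CLOCK_MHZ * INSTRUCTIONS_PER_CYCLE      # million instructions/s
peak_mflops = CLOCK_MHZ * FP_UNITS                  # million FP operations/s
address_space_bytes = 2 ** ADDRESS_BITS
address_space_gb = address_space_bytes // 2 ** 30   # in units of 2^30 bytes

print(peak_mips, peak_mflops, address_space_gb)
```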

DSK Contents

[Board photo labels:]

C6711 DSP

16M SDRAM

128K FLASH

1.8V Power Supply

3.3V Power Supply

Power Jack

Power LED

Parallel Port I/F

JTAG Header

Emulation JTAG Header

Reset

Line Level Output (speakers)

Line Level Input (microphone)

16-bit codec (A/D & D/A)

Three User LEDs

User DIP switches

Daughter Card I/F (EMIF Connector)

D. Card I/F (Periph Con.)

TMS320C6711 Block diagram

CPU

There are two sets of functional units, A and B.

Each set contains four units and a register file.

One set contains functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2.

.M unit: multiplication operations

.L unit: logical and arithmetic operations

.S unit: branch, bit manipulation, and arithmetic operations

.D unit: load/store and arithmetic operations

The C67x CPU executes all C62x instructions.

In addition to C62x fixed-point instructions, six of the eight functional units (.L1, .S1, .M1, .M2, .S2, and .L2) also execute floating-point instructions.

The remaining two functional units (.D1 and .D2) also execute the new LDDW instruction, which loads 64 bits per CPU side for a total of 128 bits per cycle.

TMS320C6711 Memory

3 Access Levels of the Memory Map

1. L1 Memory

- Cache-based architecture

- Program cache and data cache

- Size: program cache (4 Kbyte), data cache (4 Kbyte)

2. L2 Memory

- Size: 64 Kbyte

- Program and data

3. L3 Memory

- External memory

External Memory

- Synchronous Memory (SDRAM, SBSRAM)

- Asynchronous Memory (SRAM, EPROM)

Internal Memory

- Program

- Data

Registers

The two register files each contain 16 32-bit registers, for a total of 32 general-purpose registers (A0-A15, B0-B15).

Interaction with the CPU must be done through these registers.

The four functional units on each side of the CPU can freely share the 16 registers belonging to that side.

Two cross paths, 1X and 2X, connect each side to the register file on the opposite side, so that functional units can access data from the other register file.

If register access is by functional units on the same side of the CPU, the register file can service all the units in a single clock cycle.

Register access using the register file across the CPU supports one read and one write per cycle.

Restrictions on Register Accesses

Registers A1, A2, B0, B1, and B2 are used as conditional registers

Registers A4-A7 and B4-B7 are used for circular addressing

Registers A0-A9 and B0-B9 (except B3) are temporary registers

Any of the registers A10-A15 and B10-B15 that are used are saved and later restored before returning from a subroutine

Each functional unit has read/write ports

Data path 1 (2) units read/write A (B) registers

Data path 2 (1) can read one A (B) register per cycle

40-bit words are stored in an adjacent register pair

Used in extended-precision accumulation

The 32 LSBs are stored in the even register (e.g., A2) and the remaining 8 bits are stored in the 8 LSBs of the next upper (odd) register (A3)

64-bit values are stored in a similar fashion

Two simultaneous memory accesses cannot use registers of the same register file as address pointers
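The register-pair layout for 40-bit words can be sketched as follows. This is an illustration of the bit split described above, not device code; the function names are hypothetical and plain Python integers stand in for the 32-bit registers.

```python
# Illustrative sketch: split a 40-bit value across an even/odd register pair
# (e.g. A3:A2) as described on the slide. Function names are hypothetical.

MASK_40 = (1 << 40) - 1


def split_40bit(value):
    """Return (even_reg, odd_reg): the 32 LSBs and the remaining 8 MSBs."""
    value &= MASK_40
    even_reg = value & 0xFFFFFFFF   # 32 LSBs go in the even register (A2)
    odd_reg = (value >> 32) & 0xFF  # top 8 bits go in the 8 LSBs of the odd register (A3)
    return even_reg, odd_reg


def join_40bit(even_reg, odd_reg):
    """Rebuild the 40-bit value from the register pair."""
    return ((odd_reg & 0xFF) << 32) | (even_reg & 0xFFFFFFFF)


a2, a3 = split_40bit(0x123456789A)
print(hex(a2), hex(a3))
```

The extra 8 bits act as guard bits: a long accumulation of 32-bit products can overflow 32 bits many times before the 40-bit pair itself overflows.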

C6x internal buses

C6x Peripherals

[Block diagram: C6x CPU with EMIF, External Memory, DMA, Boot, McBSP, HPI/XB, Timer, PLL]

EMIF: External Memory Interface.

A 32-bit bus on which external memories and other devices can be connected.

It includes features like internal wait state generation and SDRAM control.

The EMIF can interface to both synchronous and asynchronous memories.

McBSP

2 McBSPs: multichannel buffered serial ports.

Each McBSP can be used for high-speed serial data transmission with external devices or reprogrammed as general-purpose I/Os.

McBSP1 is used to transmit and receive audio data from the AIC23 stereo codec.

McBSP0 is used to control the codec through its serial control port.

On-chip PLL: generates the processor clock rate from a slower external clock reference.

Timers: generate periodic timer events as a function of the processor clock. Used by DSP/BIOS to create time slices for multitasking.

Power-down units: save power for durations when the CPU is inactive.

EDMA controller: the Enhanced DMA controller allows high-speed data transfers without intervention from the DSP.

Boot:

- Boot from 4M external block

- Boot from HPI/XB

SBSRAM: Synchronous Burst Static Random Access Memory

Host Port Interface (HPI)

The host port interface (HPI) is a parallel port through which a host processor can directly access the CPU's memory space.

The host device is the master of the interface, therefore increasing its ease of access.

The host and the CPU can exchange information via internal or external memory.

In addition, the host has direct access to memory-mapped peripherals.

Connectivity to the CPU's memory space is provided through the DMA controller.

The expansion bus (XB) is a replacement for the HPI, as well as an expansion of the EMIF.

The expansion bus provides two distinct areas of functionality (host port and I/O port), which can co-exist in a system.

CPU operations

Fetch instruction from memory (DSP program memory)

Decode instruction

Execute instruction, including reading data values

Program Fetch (F)

Program fetching consists of 4 phases:

generate fetch address (PG)

send address to memory (PS)

wait for data ready (PW)

read opcode (PR)

[Diagram: C6x and memory, phases PG, PS, PW, PR]

Decode Stage (D)

Decode stage consists of two phases:

dispatch instruction to functional unit (DP)

instruction decoded at functional unit (DC)

[Diagram: C6x and memory, phases PG, PS, PW, PR, DP, DC]

Execute Stage (E)

An execute packet (EP) consists of a group of instructions that can be executed in parallel within the same cycle.

The number of EPs within a fetch packet can vary from one (with 8 parallel instructions) to 8 (with no parallel instructions).

Bit 0 (LSB) of every 32-bit instruction determines whether the next instruction belongs to the same EP or not:

if 1, same EP

if 0, part of next EP

FETCH and EXECUTION PACKETS

(Fetch packet consists of 8 32-bit instructions)

Consider an FP with three EPs:

   Instruction A

|| Instruction B

   Instruction C

|| Instruction D

|| Instruction E

   Instruction F

|| Instruction G

|| Instruction H

[Diagram: the eight 32-bit instruction slots A B C D E F G H of the fetch packet, each with bits 31 to 0]

In the fetch packet, EP1 contains 2 parallel instructions, EP2 contains 3, and EP3 has 3 parallel instructions.
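The grouping rule for bit 0 can be sketched in a few lines. This is a software illustration of the rule stated above (1 means the next instruction joins the same execute packet, 0 means it starts the next one); the function name and the list-based representation are assumptions made for the example.

```python
# Illustrative sketch: split a fetch packet into execute packets using the
# LSB (bit 0) of each instruction, as described on the slide.
# The function name and data layout are hypothetical.

def split_execute_packets(names, lsb_bits):
    """Group instruction names into execute packets.

    lsb_bits[i] == 1: instruction i+1 is in the same EP as instruction i.
    lsb_bits[i] == 0: instruction i is the last one in its EP.
    """
    packets, current = [], []
    for name, bit in zip(names, lsb_bits):
        current.append(name)
        if bit == 0:
            packets.append(current)
            current = []
    if current:  # trailing group if the last bit was 1
        packets.append(current)
    return packets


# The fetch packet from the slide: A || B, then C || D || E, then F || G || H
names = list("ABCDEFGH")
lsb_bits = [1, 0, 1, 1, 0, 1, 1, 0]
packets = split_execute_packets(names, lsb_bits)
print(packets)
```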

Pipelining

Overlap operations to increase performance

Pipeline CPU operations to increase clock speed over a sequential implementation

Separate parallel functional units

Peripheral interfaces for I/O do not burden the CPU

It is a key feature in DSP to get parallel instructions working properly

Requires careful timing

Non-pipelined scalar architecture

- A processor that executes every instruction one after the other

- May use processor resources inefficiently, potentially leading to poor performance

Pipelining

- Executing different sub-steps of sequential instructions simultaneously

Superscalar architectures

- Executing multiple instructions entirely simultaneously

There are 3 stages of pipelining:

Program fetch, composed of 4 phases

PG: program address generate, to generate the fetch address

PS: program address send, to send the address

PW: program access ready wait, to wait for data

PR: program fetch packet receive, to read the opcode from memory

Decode stage, composed of 2 phases

DP: dispatch all the instructions within an FP to the appropriate functional units

DC: instruction decode

Execute stage, composed of up to 6 phases (fixed point) or up to 10 phases (floating point)

a) multiplication instruction consists of 2 phases due to 1 delay

b) load instruction consists of 5 phases due to 4 delays

c) branch instruction consists of 6 phases due to 5 delays

Pipeline phases

Program fetch: PG PS PW PR

Decode: DP DC

Execute: E1-E6 (E1-E10 for double precision)

Pipelining effects

Cycle  1   2   3   4   5   6   7   8   9   10
FP1    PG  PS  PW  PR  DP  DC  E1  E2  E3  E4
FP2        PG  PS  PW  PR  DP  DC  E1  E2  E3
FP3            PG  PS  PW  PR  DP  DC  E1  E2
FP4                PG  PS  PW  PR  DP  DC  E1
FP5                    PG  PS  PW  PR  DP  DC
FP6                        PG  PS  PW  PR  DP
FP7                            PG  PS  PW  PR

Each row represents an FP

PG of the first FP starts in cycle 1, PG of the second FP starts in cycle 2, and so on.

Each FP has 4 phases for fetch and 2 phases for decode, and execution can take from 1 to 10 phases.

At cycle 7,

instructions in the first FP are in the first execution phase E1,

instructions in the second FP are in the decoding phase,

instructions in the third FP are in the dispatching phase,

and so on.

All the instructions are proceeding through various phases.

Therefore the pipeline is FULL.
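The cycle-by-cycle picture above can be reproduced with a small model. This is a sketch under simplifying assumptions: one fetch packet enters the pipeline per cycle, each fetch packet holds a single execute packet, and there are no stalls. The function name is hypothetical.

```python
# Illustrative pipeline model: fetch packet k (1-based) enters PG in cycle k
# and advances one phase per cycle. Assumes no stalls and one EP per FP.

PHASES = ["PG", "PS", "PW", "PR", "DP", "DC"] + [f"E{i}" for i in range(1, 11)]


def phase_at(fp_number, cycle):
    """Return the phase that fetch packet fp_number occupies in the given cycle.

    Returns None if the packet has not entered the pipeline yet or has
    already left it.
    """
    index = cycle - fp_number
    if index < 0 or index >= len(PHASES):
        return None
    return PHASES[index]


# State of the first seven fetch packets at cycle 7 (pipeline full)
cycle_7 = [phase_at(fp, 7) for fp in range(1, 8)]
print(cycle_7)
```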

Most instructions have 1 execute phase

Multiply (MPY) has 2

Load (LDH/LDW) has 5

Branch (B) has 6 phases

Additional execute phases are associated with floating-point and double-precision instructions (up to 10 phases)

E.g.: MPYDP has 9 delay slots and a total of 10 phases

Functional unit latency:

The number of cycles that an instruction ties up a functional unit.

It is 1 for all instructions except double-precision instructions.

During those cycles, no other instruction can use the functional unit.

It is different from the delay slot.

E.g.: MPYDP has a functional unit latency of 4 but 9 delay slots

Delay slot: some instructions that are physically after the instruction are executed as if they were located before it.

Classic examples are branch and call instructions, which often execute the following instruction before the branch or call is performed.

Instruction Set

Assembly code format:

Label || [ ] Instruction Unit operands ; comments

A label represents a specific address/memory location that contains an instruction or data (a label must be in the first column).

Parallel bars (||) are used if the instruction is executed in parallel with the previous instruction.

The field ([ ]) is optional and makes the associated instruction conditional.

- 5 registers are used as conditional registers

- [A2] specifies that the associated instruction executes if A2 is not zero

- [!A2] specifies that the associated instruction executes if A2 is zero

The instruction field can be an assembler directive or a mnemonic.

- An assembler directive is a command for the assembler

.short : initialize 16-bit integer

.int : initialize 32-bit integer

.float : initialize 32-bit IEEE single-precision constant

- A mnemonic is an actual instruction that executes at run time

The unit field can be any one of the 8 functional units (optional).

Comments starting in column 1 begin with an asterisk or a semicolon, whereas comments starting in any other column must begin with a semicolon.

E.g.:

ADD .L1 A3,A7,A7 ; add A3 + A7 -> A7

MPY .M2 A7,B7,B6 ; multiply 16 LSBs of A7,B7 -> B6

|| MPYH .M1 A7,B7,A6 ; multiply 16 MSBs of A7,B7 -> A6
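What the MPY and MPYH instructions in the example compute can be modelled in a few lines. This is an arithmetic sketch only, assuming signed 16 x 16-bit multiplies on 32-bit register values; the function names are hypothetical, and pipeline effects such as the multiply delay slot are not modelled.

```python
# Illustrative model of the two multiplies in the example above.
# Register values are passed as 32-bit unsigned integers.

def _signed16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mpy(src1, src2):
    """MPY: signed multiply of the 16 LSBs of each operand."""
    return _signed16(src1) * _signed16(src2)


def mpyh(src1, src2):
    """MPYH: signed multiply of the 16 MSBs of each operand."""
    return _signed16(src1 >> 16) * _signed16(src2 >> 16)


a7 = 0x00030002  # upper half 3, lower half 2
b7 = 0x00050004  # upper half 5, lower half 4
b6 = mpy(a7, b7)   # lower halves
a6 = mpyh(a7, b7)  # upper halves
print(b6, a6)
```

Packing two 16-bit samples into one 32-bit register and issuing MPY and MPYH in parallel on the two .M units yields two products per cycle.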

Instruction set

Instructions are designed to make maximum use of the processor's resources and at the same time minimize the memory space required to store the instructions.

Minimizing the storage space ensures the cost effectiveness of the overall system.

To ensure the maximum use of the hardware of the DSP, the instructions are designed to perform several parallel operations in a single instruction, typically including fetching of data in parallel with the main arithmetic operation.

Instructions are kept short by restricting which registers can be used with which operations and which operations can be combined in an instruction.

Some of the latest processors use VLIW architectures, wherein multiple instructions are issued and executed per cycle.

In such architectures the instructions are short and designed to perform much less work, thus requiring less memory, while speed increases because of the VLIW architecture.

C67x Additional Instructions (by unit)

.S Unit: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.M Unit: MPYI, MPYID, MPYSP, MPYDP

.L Unit: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTSPU, INTDP, INTDPU, SPINT, SPTRUNC, DPINT, DPSP

.D Unit: ADDAD, LDDW

Control Register File

The interrupt flag register (IFR)

- Contains the status of INT4-INT15 and the NMI interrupt.

- Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0.

- If you want to check the status of interrupts, use the MVC instruction to read the IFR.

The interrupt return pointer register (IRP)

- Contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt.

- A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
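Once the IFR has been read, checking which interrupts are pending is a matter of testing bits. The sketch below assumes a layout in which bit n reports INTn for n = 4 to 15 and bit 1 reports NMI; that bit assignment is an assumption not stated on the slide and should be checked against the CPU reference guide. The function name is hypothetical.

```python
# Illustrative decode of an IFR value.
# Assumed layout (verify against TI documentation): bit 1 = NMI flag,
# bits 4..15 = flags for INT4..INT15.

def pending_interrupts(ifr):
    """Return the names of the interrupts whose flag bits are set."""
    names = []
    if ifr & (1 << 1):
        names.append("NMI")
    for n in range(4, 16):
        if ifr & (1 << n):
            names.append(f"INT{n}")
    return names


print(pending_interrupts(0x0012))
```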

Addressing modes

Determine how one accesses memory.

Addressing refers to the means of specifying the location of operands for instructions.

- Types of addressing are called addressing modes

- Operands may be input operands for the operation as well as results of the operation

Addressing modes supported by the TMS320C67x include

register-indirect,

indexed register-indirect,

and modulo addressing (circular addressing).

Immediate data is also supported.

The TMS320C67x does not support modulo addressing for 64-bit data.

Circular Buffer

At the beginning of each sample period, a new sample is read into the circular buffer, overwriting the oldest sample.

The newest sample x(n) is stored at the memory location pointed at by auxiliary register AR(i).

The need to process digital signals in real time gives rise to the concept of circular buffering.

Circular buffers are used to store the most recent values of a continually updated signal.

Circular buffering allows processors to access a block of data sequentially and then automatically wrap around to the beginning address, exactly the pattern used to access coefficients in an FIR filter.

Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR delay lines.
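The wrap-around behaviour can be emulated in software with a modulo operation, which is what the DSP's circular addressing hardware does for free. The following is a minimal sketch of a circular buffer used as an FIR delay line; the class, method and function names are hypothetical.

```python
# Illustrative circular buffer used as an FIR delay line.
# On the DSP the index wrap-around is done by the addressing hardware;
# here it is emulated with the modulo operator.

class CircularBuffer:
    def __init__(self, size):
        self.data = [0.0] * size
        self.index = 0  # next write position; always holds the oldest sample

    def push(self, sample):
        """Overwrite the oldest sample with the newest one and advance."""
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)

    def newest_first(self):
        """Return samples ordered x(n), x(n-1), x(n-2), ..."""
        n = len(self.data)
        return [self.data[(self.index - 1 - k) % n] for k in range(n)]


def fir_step(buf, coeffs, sample):
    """Push one input sample and return one FIR output sample."""
    buf.push(sample)
    return sum(h * x for h, x in zip(coeffs, buf.newest_first()))


buf = CircularBuffer(3)
outputs = [fir_step(buf, [0.5, 0.5, 0.0], x) for x in [1.0, 3.0, 5.0, 7.0]]
print(outputs)
```

No samples are ever moved in memory; only the index changes, which is why the technique suits real-time processing.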

AMR mode and description

Mode  Description
00    Linear addressing
01    Circular addressing using BK0
10    Circular addressing using BK1
11    Reserved

Block size = 2^(N+1) bytes

E.g.:

MVK .S2 0x0004,B2 ; lower 16 bits to B2

MVKLH .S2 0x0005,B2 ; upper 16 bits to B2

The value 0x0004 (binary 0100) in the 16 LSBs of the AMR sets bit 2 (the third bit) to 1 and all other bits to zero.

This sets the mode to 01 and selects register A5 as the pointer to a circular buffer using BK0.

The value 0x0005 (binary 0101) in the 16 MSBs of the AMR sets bits 16 and 18 to 1.

This corresponds to the value of N used to select the size of the buffer: 2^(N+1) = 64 bytes using BK0.
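The example can be checked by decoding the resulting 32-bit AMR value. The sketch below assumes a field layout that is consistent with the example but not spelled out on the slide: two mode bits per pointer register in the order A4, A5, A6, A7, B4, B5, B6, B7 starting at bit 0, a 5-bit BK0 field starting at bit 16, and a 5-bit BK1 field starting at bit 21. Treat these positions and the function name as assumptions to verify against the CPU reference guide.

```python
# Illustrative decode of an AMR value.
# Assumed layout: mode fields of 2 bits each for A4..A7, B4..B7 from bit 0;
# BK0 in bits 16-20; BK1 in bits 21-25.

POINTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]
MODES = {0: "linear", 1: "circular BK0", 2: "circular BK1", 3: "reserved"}


def decode_amr(amr):
    """Return (modes per pointer register, BK0 block size, BK1 block size) in bytes."""
    modes = {reg: MODES[(amr >> (2 * i)) & 0x3] for i, reg in enumerate(POINTERS)}
    n0 = (amr >> 16) & 0x1F
    n1 = (amr >> 21) & 0x1F
    return modes, 2 ** (n0 + 1), 2 ** (n1 + 1)


# Value built by the MVK / MVKLH pair in the example: 0x0005 upper, 0x0004 lower
amr = (0x0005 << 16) | 0x0004
modes, bk0_bytes, bk1_bytes = decode_amr(amr)
print(modes["A5"], bk0_bytes)
```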

Reset (RESET)

Reset is the highest-priority interrupt and is used to halt the CPU and return it to a known state.

The reset interrupt is unique in a number of ways:

- RESET is an active-low signal. All other interrupts are active-high signals.

- RESET must be held low for 10 clock cycles before it goes high again to reinitialize the CPU properly.

- The instruction execution in progress is aborted and all registers are returned to their default states.

- RESET is not affected by branches.

Nonmaskable Interrupt (NMI)

- NMI is the second-highest priority interrupt.

- It is generally used to alert the CPU of a serious hardware problem such as imminent power failure.

- For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.

Multichannel Buffered Serial Port (McBSP)

The standard serial port interface provides:

- Full-duplex communication

- Double-buffered data registers, which allow a continuous data stream

- Independent framing and clocking for reception and transmission

- Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices

- Multichannel transmission and reception of up to 128 channels

- Element sizes of 8, 12, 16, 20, 24, or 32 bits

- 8-bit data transfers with LSB or MSB first

The DMA controller uses the bus request pin to notify the DSP core that it is ready to make a transfer to or from external memory.

The DSP core completes its current instruction, releases control of external memory, and signals the DMA controller via the bus grant pin that the DMA transfer can proceed.

The DMA controller then transfers the specified number of data words and optionally signals completion through an interrupt.

Some processors also have multiple DMA channels, managing DMA transfers in parallel.

Timer

The C67x has two 32-bit general-purpose timers that can be used to:

Time events

Count events

Generate pulses

Interrupt the CPU

Send synchronization events to the DMA controller

The timer works in one of two signaling modes, depending on whether it is clocked by an internal or an external source.

The timer has an input pin (TINP) and an output pin (TOUT).

The TINP pin can be used as a general-purpose input, and the TOUT pin can be used as a general-purpose output.

When an internal clock is provided, the timer generates timing sequences to trigger peripheral or external devices, such as the DMA controller or an A/D converter, respectively.

When an external clock is provided, the timer can count external events and interrupt the CPU after a specified number of events.

Load/Store options

In the 'C6x, the instruction set supports several types of load/store instructions:

Four load instructions:

LDDW Load 64-bit double word (C67x only)

LDW Load 32-bit word

LDH Load 16-bit half-word (short)

LDB Load 8-bit byte

Three store instructions:

STW

STH

STB

Load and Store Paths

The C67x DSP has two 32-bit paths for loading data from memory to the register file: LD1 for register file A, and LD2 for register file B.

The C67x DSP also has a second 32-bit load path for both register files A and B.

This allows the LDDW instruction to simultaneously load two 32-bit values into register file A and two 32-bit values into register file B.

For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path for the 32 MSBs.

For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs.

There are also two 32-bit paths, ST1 and ST2, for storing register values to memory from each register file.
