240662_633888485056270520.ppt
TRANSCRIPT
-
7/29/2019 240662_633888485056270520.ppt
1/115
Syllabus: Architecture of TMS320C6x
functional units fetch and execute
Pipelining
Registers
addressing modes
instruction sets
Timers
Interrupts
serial ports
DMA
memory
-
Introduction to DSP
A digital signal processor (DSP) is a type of microprocessor optimized for digital signal processing.
DSPs integrate system control and math-intensive functions.
Their advantages are speed, cost and energy efficiency.
A DSP is a key component in many communication, medical, military and industrial products.
-
Alternatives
FPGA
Field-Programmable Gate Arrays can be reconfigured within a system.
But they are more expensive and have high power dissipation.
ASIC
- Application Specific Integrated Circuits
can perform specific functions extremely well, and can be made quite power efficient.
But since ASICs are not field-programmable, their functionality cannot be iteratively changed or updated during product development.
-
Why go digital?
Digital signal processing techniques are now so powerful that it is sometimes extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.
Examples:
FIR filters with linear phase.
Adaptive filters.
-
With DSP it is easy to:
Change applications.
Correct applications.
Update applications.
Additionally DSP reduces:
Noise susceptibility.
Chip count.
Development time.
Cost.
Power consumption.
-
Why do we need DSP processors?
Use a DSP processor when the following are required:
Cost saving.
Smaller size.
Low power consumption.
Processing of many high-frequency signals in real time.
-
Applications
-
General DSP System Block Diagram
(Figure: blocks labelled Peripherals, Central Processing Unit, Internal Memory, Internal Buses and External Memory.)
-
Classification of DSP
Von Neumann architecture
Harvard architecture
Super Harvard architecture
-
VON NEUMANN'S ARCHITECTURE
-
One shared memory for instructions (program) and data, with one data bus and one address bus between the processor and memory.
Instructions and data have to be fetched in sequential order (known as the Von Neumann bottleneck), limiting the operation bandwidth.
Its design is simple.
It is mostly used to interface to external memory.
-
HARVARD ARCHITECTURE
-
Uses physically separate memories for instructions and data, requiring dedicated buses for each of them.
Instructions and operands can therefore be fetched simultaneously.
Different program and data bus widths are possible, allowing program and data memory to be better optimized to the architectural requirements.
E.g.: if the instruction format requires 14 bits, the program bus and memory can be made 14 bits wide, while the data bus and data memory remain 8 bits wide.
-
Efficient Memory Access
(Figure: memory and bus arrangements of general purpose processors, early DSP processors and more optimized DSP processors.)
-
Classification of DSP
Fixed-point processors perform integer operations; floating-point processors perform both integer and floating-point operations.
It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.
For educational purposes, use a floating-point device, as it can support both fixed- and floating-point operations.
Fixed point: TMS320C1x, C2x, C5x, ...
Floating point: TMS320C3x, C4x, C67x, ...
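To make the fixed-point case concrete: on a fixed-point device, fractional values are commonly held as Q15 integers (raw value / 32768) and multiplied with integer instructions. The C sketch below assumes that Q15 convention; the helper names `q15_mul` and `q15_to_float` are hypothetical and do not come from any TI library.

```c
#include <stdint.h>

/* Multiply two Q15 fractions (value = raw / 32768) using only integer
 * arithmetic, as a fixed-point DSP would. The 32-bit product is in Q30,
 * so shifting right by 15 returns it to Q15. The only overflow case,
 * -1.0 * -1.0, is saturated to the largest positive Q15 value. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 result */
    int32_t shifted = product >> 15;            /* back to Q15 */
    if (shifted > INT16_MAX) {
        shifted = INT16_MAX;                    /* saturate */
    }
    return (int16_t)shifted;
}

/* Convert a Q15 raw value to a float for display or checking. */
float q15_to_float(int16_t x)
{
    return (float)x / 32768.0f;
}
```

For example, 0.5 is 16384 in Q15, and `q15_mul(16384, 16384)` gives 8192, i.e. 0.25; a floating-point device would simply compute `0.5f * 0.5f` in hardware.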
-
C versus Assembly language
Programs in C are more flexible and quicker to develop.
Programs in assembly often have better performance: they run faster and use less memory, resulting in lower cost.
-
C or Assembly?
How complicated is the program?
If it is large and intricate, you will probably want to use C. If it is small and simple, assembly may be a good choice.
Are you pushing the maximum speed of the DSP?
If so, assembly will give you the last drop of performance from the device.
For less demanding applications, you should consider using C.
-
How many programmers will be working together?
If the project is large enough for more than one programmer, lean toward C and use in-line assembly only for time-critical segments.
Which is more important, product cost or development cost?
If it is product cost, choose assembly; if it is development cost, choose C.
What is your background?
If you are experienced in assembly (on other microprocessors), choose assembly for your DSP.
If your previous work is in C, choose C for your DSP.
-
The Digital Signal Processor Market
-
The digital signal processor market is dominated by 4 companies:
Analog Devices (www.analog.com/dsp): ADSP-21xx 16-bit fixed point; ADSP-21xxx 32-bit floating and fixed point
Lucent Technologies (www.lucent.com): DSP16xxx 16-bit fixed point; DSP32xx 32-bit floating point
Motorola (www.mot.com): DSP561xx 16-bit fixed point; DSP560xx 24-bit fixed point; DSP96002 32-bit floating point
Texas Instruments (www.ti.com): TMS320Cxx 16-bit fixed point; TMS320Cxx 32-bit floating point
-
TMS320 Family
C2000: lowest cost; control systems (motor control, storage, digital control systems)
C5000: efficiency, best MIPS; wireless phones, Internet audio players, digital still cameras, modems, telephony, VoIP
C6000: best performance and ease of use; multi-channel and multi-function apps (comm. infrastructure, wireless base stations, audio and speech processing, imaging, multimedia servers, video)
-
C6000 Roadmap
(Figure: performance versus time.)
1st generation: fixed-point C6201, C6202, C6203, C6204, C6205, C6211 (C62x); floating-point C6701, C6711, C6712, C6713 (C67x)
2nd generation (fixed point): C64x DSP up to 1.1 GHz; C6411, C6414, C6415, C6416 (general purpose, media gateway, 3G wireless infrastructure)
Multi-core C64x DSP
-
Features of the TMS320C6x
The Texas Instruments TMS320C6x family of microprocessors is one of the largest VLIW success stories to date.
This family of processors is built to deliver speed.
Family members differ in size, cost, memory, peripherals and power consumption specifications.
E.g.:
Fixed-point C6201 version: 5-ns instruction cycle time; 200-MHz clock rate; performance of up to 1600 MIPS; eight 32-bit instructions per cycle.
Floating-point C6701 version: can operate at 167 MHz; 6-ns instruction cycle time; 1 giga floating-point operations per second (GFLOPS).
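The quoted figures follow from simple arithmetic: the instruction cycle time is the reciprocal of the clock rate, and the peak rate is the clock rate times the number of operations issued per cycle. A small C sketch; the function names are hypothetical, and the 6 floating-point operations per cycle used for the C6701 is an assumption based on its six floating-point-capable functional units.

```c
/* Instruction cycle time in nanoseconds for a clock given in MHz. */
double cycle_time_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}

/* Peak rate in millions of operations per second:
 * clock (MHz) x operations issued per cycle. */
double peak_mops(double clock_mhz, int ops_per_cycle)
{
    return clock_mhz * (double)ops_per_cycle;
}
```

For the C6201, `cycle_time_ns(200.0)` is 5 ns and `peak_mops(200.0, 8)` is 1600 MIPS. For the C6701, `cycle_time_ns(167.0)` is about 5.99 ns (quoted as 6 ns) and `peak_mops(167.0, 6)` is 1002 MFLOPS, i.e. about 1 GFLOPS. These are peak figures that assume every unit does useful work in every cycle.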
-
Very Long Instruction Word (VLIW)
Refers to a CPU architecture designed to take advantage of instruction-level parallelism: it executes operations in parallel based on a fixed schedule determined when programs are compiled.
The order of execution of operations (including which operations can execute simultaneously) is handled by the compiler, hence the processor does not need scheduling hardware.
VLIW CPUs offer significant computational power with less hardware complexity, but greater compiler complexity.
-
VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets:
More parallelism, higher performance
Better compiler targets
-
Disadvantages of VLIW Architectures
New kinds of programmer/compiler complexity: the programmer (or code-generation tool) must keep track of instruction scheduling.
Deep pipelines and long latencies can be confusing and may make peak performance elusive.
Increased memory use
High program memory bandwidth requirements
High power consumption
Misleading MIPS ratings
-
VelociTI
The VLIW modification done by TI is called VelociTI.
It reduces code size and increases performance when instructions reside off-chip.
The C6x architecture is based on the high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI).
It is an excellent choice for multichannel and multifunction applications (several instructions are captured and processed simultaneously).
-
TMS320C6x with VelociTI Enables Cost-Effective Solutions for Emerging Applications
Unlimited Internet bandwidth
Universal wireless communication
New telephony features
Remote medical diagnostics
Automated cruise control
Personal home base station
Personalized home security
-
TMS320C6000 DSP Device Nomenclature
-
TMS320C6711
A floating-point processor with VLIW architecture
Internal memory includes a two-level cache architecture:
- 4 KB of level 1 program cache (L1P)
- 4 KB of level 1 data cache (L1D)
- 64 KB of RAM / level 2 cache for data/program (L2)
Has a direct interface to both synchronous memories (SDRAM and SBSRAM) and asynchronous memories (SRAM and EPROM).
With a 32-bit address bus, the total memory space is 2^32 bytes = 4 GB.
It requires 3.3 V for I/O and 1.8 V for the core.
Operates at 150 MHz.
Performs up to 900 million floating-point operations per second (MFLOPS).
Executes up to 1200 million instructions per second (MIPS), i.e. 8 instructions per cycle at 150 MHz.
-
DSK Contents
-
(Figure: DSK board layout showing the C6711 DSP, 1.8 V power supply, 3.3 V power supply, 16M SDRAM, 128K flash, daughter card I/F (EMIF connector), daughter card I/F (peripheral connector), parallel port I/F, power jack, power LED, JTAG header, emulation JTAG header, reset, line level output (speakers), line level input (microphone), 16-bit codec (A/D & D/A), three user LEDs and user DIP switches.)
-
TMS320C6711 Block diagram
-
CPU
There are two sets of functional units, A and B.
Each set contains four units and a register file. One set contains functional units .L1, .S1, .M1 and .D1; the other set contains units .D2, .M2, .S2 and .L2.
.M unit: multiplication operations
.L unit: logical and arithmetic operations
.S unit: branch, bit manipulation and arithmetic operations
.D unit: load/store and arithmetic operations
-
The C67x CPU executes all C62x instructions.
In addition to the C62x fixed-point instructions, six of the eight functional units (.L1, .S1, .M1, .M2, .S2 and .L2) also execute floating-point instructions.
The remaining two functional units (.D1 and .D2) also execute the new LDDW instruction, which loads 64 bits per CPU side for a total of 128 bits per cycle.
-
TMS320C6711 Memory
-
Three access levels of the memory map:
1. L1 memory
- Cache-based architecture
- Program cache and data cache
- Size: program cache 4 KB, data cache 4 KB
2. L2 memory
- Size: 64 KB
- Program and data
3. L3 memory
- External memory
-
External Memory
- Synchronous memory (SDRAM, SBSRAM)
- Asynchronous memory (SRAM, EPROM)
Internal Memory
- Program
- Data
Registers:
-
The two register files each contain 16 32-bit registers, for a total of 32 general-purpose registers (A0-A15, B0-B15).
Interaction with the CPU must be done through these registers.
The four functional units on each side of the CPU can freely share the 16 registers belonging to that side.
Two cross paths, 1X and 2X, connect the units on each side to the register file on the other side, so a unit can access data from the opposite register file.
If register access is by functional units on the same side of the CPU, the register file can service all the units in a single clock cycle.
Register access using the register file across the CPU supports one read and one write per cycle.
-
Registers A1, A2, B0, B1 and B2 are used as conditional registers.
Registers A4-A7 and B4-B7 are used for circular addressing.
Registers A0-A9 and B0-B9 (except B3) are temporary registers.
Any of the registers A10-A15 and B10-B15 that are used must be saved and later restored before returning from a subroutine.
Restrictions on Register Accesses
-
Each functional unit has read/write ports.
Data path 1 (2) units read/write A (B) registers.
Data path 2 (1) units can read one A (B) register per cycle through the cross path.
40-bit words are stored in an adjacent register pair and are used in extended-precision accumulation:
the 32 LSBs are stored in the even register (e.g. A2) and the remaining 8 bits in the 8 LSBs of the next upper (odd) register (A3).
64-bit values are stored in a similar fashion.
Two simultaneous memory accesses cannot use registers of the same register file as address pointers.
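The even/odd split of a 40-bit value can be modelled in C. This is a minimal sketch, assuming the upper 24 bits of the odd register are simply ignored; the `reg_pair` struct and the `split40`/`join40` helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of an even/odd register pair such as A2:A3. */
typedef struct {
    uint32_t even;  /* 32 LSBs of the 40-bit value (e.g. A2) */
    uint32_t odd;   /* bits 39..32 held in the 8 LSBs (e.g. A3) */
} reg_pair;

#define MASK40 0xFFFFFFFFFFull  /* 40 one-bits */

/* Store a signed 40-bit value into an even/odd register pair. */
reg_pair split40(int64_t value)
{
    uint64_t bits = (uint64_t)value & MASK40;  /* keep bits 39..0 */
    reg_pair p;
    p.even = (uint32_t)(bits & 0xFFFFFFFFu);
    p.odd = (uint32_t)(bits >> 32);            /* only 8 bits remain */
    return p;
}

/* Rebuild the signed 40-bit value, sign-extending from bit 39. */
int64_t join40(reg_pair p)
{
    uint64_t bits = ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
    if (bits & (1ull << 39)) {
        return (int64_t)bits - ((int64_t)1 << 40);
    }
    return (int64_t)bits;
}
```

For instance, the 40-bit value 0x123456789A lands as 0x3456789A in the even register and 0x12 in the odd one, and `join40` recovers the original value.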
-
-
C6x internal buses
-
C6x Peripherals
(Figure: the C6x CPU with its EMIF, DMA, Boot, McBSP, HPI/XB, Timer and PLL peripherals; the EMIF connects to external memory.)
EMIF: External Memory Interface.
A 32-bit bus to which external memories and other devices can be connected.
It includes features such as internal wait-state generation and SDRAM control.
The EMIF can interface to both synchronous and asynchronous memories.
-
McBSP
2 McBSPs (multichannel buffered serial ports). Each McBSP can be used for high-speed serial data transmission with external devices or reprogrammed as general-purpose I/Os.
McBSP1 is used to transmit and receive audio data from the AIC23 stereo codec.
McBSP0 is used to control the codec through its serial control port.
-
On-chip PLL: generates the processor clock rate from a slower external clock reference.
Timers: generate periodic timer events as a function of the processor clock. Used by DSP/BIOS to create time slices for multitasking.
Power-down units: save power for durations when the CPU is inactive.
EDMA controller: the Enhanced DMA controller allows high-speed data transfers without intervention from the DSP.
BOOT: boot from 4M external block, or boot from HPI/XB.
SBSRAM: Synchronous Burst Static Random Access Memory
-
Host Port Interface (HPI)
The host port interface (HPI) is a parallel port through which a host processor can directly access the CPU's memory space.
The host device is the master of the interface, which increases its ease of access.
The host and the CPU can exchange information via internal or external memory. In addition, the host has direct access to memory-mapped peripherals.
Connectivity to the CPU's memory space is provided through the DMA controller.
The expansion bus (XB) is a replacement for the HPI, as well as an expansion of the EMIF.
The expansion bus provides two distinct areas of functionality (host port and I/O port), which can coexist in a system.
-
CPU operations
Fetch instruction from memory (DSP program memory)
Decode instruction
Execute instruction, including reading data values
-
Program Fetch (F)
Program fetching consists of 4 phases:
generate fetch address (PG)
send address to memory (PS)
wait for data ready (PW)
read opcode (PR)
(Figure: fetch phases PG, PS, PW and PR between the C6x and memory.)
-
Decode Stage (D)
The decode stage consists of two phases:
dispatch instruction to functional unit (DP)
instruction decoded at functional unit (DC)
(Figure: fetch phases PG, PS, PW and PR between the C6x and memory, followed by decode phases DP and DC.)
-
Execute Stage (E)
An execute packet (EP) consists of a group of instructions that can be executed in parallel within the same cycle.
The number of EPs within a fetch packet can vary from one (with 8 parallel instructions) to 8 (with no parallel instructions).
Bit 0 (LSB) of every 32-bit instruction determines whether the next instruction belongs to the same EP:
if 1, the next instruction is in the same EP;
if 0, the next instruction is part of the next EP.
Fetch and Execution Packets
-
(A fetch packet consists of 8 32-bit instructions.)
Consider an FP with three EPs:
Instruction A
|| Instruction B
Instruction C
|| Instruction D
|| Instruction E
Instruction F
|| Instruction G
|| Instruction H
(Figure: instructions A through H placed in consecutive 32-bit slots, bits 31-0, of the fetch packet.)
In this fetch packet, EP1 contains 2 parallel instructions, EP2 contains 3 and EP3 contains 3 parallel instructions.
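The p-bit rule can be sketched in C. The function below is a hypothetical illustration, not TI tooling: it takes the eight 32-bit instruction words of one fetch packet and reports the size of each execute packet, looking only at bit 0 of each word. For the packet above, the p-bits of A to H are 1, 0, 1, 1, 0, 1, 1, 0.

```c
#include <stddef.h>
#include <stdint.h>

#define FP_SIZE 8  /* instructions per fetch packet */

/* Split one fetch packet into execute packets using the p-bit (bit 0)
 * of each 32-bit instruction: p = 1 means the next instruction executes
 * in parallel with this one; p = 0 ends the current execute packet.
 * Writes the size of each EP to ep_sizes and returns the number of EPs. */
size_t split_execute_packets(const uint32_t fp[FP_SIZE], size_t ep_sizes[FP_SIZE])
{
    size_t count = 0;
    size_t current = 0;
    for (size_t i = 0; i < FP_SIZE; i++) {
        current++;
        uint32_t p_bit = fp[i] & 1u;
        /* An EP also ends at the last slot of the fetch packet. */
        if (p_bit == 0u || i == FP_SIZE - 1) {
            ep_sizes[count++] = current;
            current = 0;
        }
    }
    return count;
}
```

With the p-bits listed above, the function returns 3 and fills in sizes 2, 3 and 3, matching EP1 to EP3; all-zero p-bits give 8 single-instruction EPs.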
Pipelining
-
Overlap operations to increase performance.
Pipeline CPU operations to increase clock speed over a sequential implementation.
Separate parallel functional units.
Peripheral interfaces for I/O do not burden the CPU.
Pipelining is a key DSP feature for getting parallel instructions working properly, and it requires careful timing.
-
Non-pipelined scalar architecture
- A processor that executes every instruction one after the other; this may use processor resources inefficiently, potentially leading to poor performance.
Pipelining
- Executing different sub-steps of sequential instructions simultaneously.
Superscalar architectures
- Executing multiple instructions entirely simultaneously.
-
There are 3 stages of pipelining:
Program fetch is composed of 4 phases:
PG: program address generate (generate the fetch address)
PS: program address send (send the address)
PW: program address ready wait (wait for data)
PR: program fetch packet receive (read the opcode from memory)
Decode stage is composed of 2 phases:
DP: dispatch all the instructions within an FP to the appropriate functional units
DC: instruction decode
Execute stage is composed of 6 (fixed point) to 10 (floating point) phases:
a) a multiply instruction consists of 2 phases due to 1 delay slot
b) a load instruction consists of 5 phases due to 4 delay slots
c) a branch instruction consists of 6 phases due to 5 delay slots
Pipeline phases
-
Program fetch: PG PS PW PR; decode: DP DC; execute: E1-E6 (E1-E10 for double precision)
Pipelining effects (clock cycles 1 to 10):
Cycle 1  2  3  4  5  6  7  8  9  10
FP1   PG PS PW PR DP DC E1 E2 E3 E4
FP2      PG PS PW PR DP DC E1 E2 E3
FP3         PG PS PW PR DP DC E1 E2
FP4            PG PS PW PR DP DC E1
FP5               PG PS PW PR DP DC
FP6                  PG PS PW PR DP
FP7                     PG PS PW PR
Each row represents an FP.
-
PG of the first FP starts in cycle 1, PG of the second FP starts in cycle 2, and so on.
Each FP has 4 phases for fetch and 2 phases for decode; execution can take from 1 to 10 phases.
At cycle 7:
the instructions in the first FP are in the first execution phase, E1;
the instructions in the second FP are in the decoding phase (DC);
the instructions in the third FP are in the dispatching phase (DP);
and so on.
All the instructions are proceeding through the various phases; therefore the pipeline is full.
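The staggered table can be reproduced with a small C model. It is a sketch under simplifying assumptions: one new FP enters PG every cycle, there are no stalls, and every FP is given all ten execute phase names even though most instructions use only E1. The helpers `phase_at` and `in_phase` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Fetch (PG..PR) and decode (DP, DC) phases, then execute phases E1..E10. */
static const char *const PHASES[] = {
    "PG", "PS", "PW", "PR", "DP", "DC",
    "E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9", "E10"
};
#define NUM_PHASES (sizeof PHASES / sizeof PHASES[0])

/* Phase occupied in clock cycle `cycle` by fetch packet `fp` (both
 * 1-based), assuming FP k enters PG in cycle k and the pipeline never
 * stalls. Returns NULL before the FP starts or after E10. */
const char *phase_at(int fp, int cycle)
{
    int step = cycle - fp;  /* 0 means PG */
    if (step < 0 || (size_t)step >= NUM_PHASES) {
        return NULL;
    }
    return PHASES[step];
}

/* Returns 1 if fetch packet `fp` is in phase `name` during `cycle`. */
int in_phase(int fp, int cycle, const char *name)
{
    const char *p = phase_at(fp, cycle);
    return p != NULL && strcmp(p, name) == 0;
}
```

`in_phase(1, 7, "E1")`, `in_phase(2, 7, "DC")` and `in_phase(3, 7, "DP")` all return 1, matching the cycle-7 snapshot, while FP8 has not started yet in cycle 7.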
Most instructions have 1 execute phase
Multiply (MPY) has 2.
Load (LDH/LDW) has 5.
Branch (B) has 6 phases.
Additional execute phases are associated with floating-point and double-precision instructions (up to 10 phases).
E.g.: MPYDP has 9 delay slots and a total of 10 phases.
Functional unit latency:
The number of cycles for which an instruction ties up a functional unit. It is 1 for all instructions except the double-precision instructions and the multi-cycle integer multiplies MPYI and MPYID.
During that time no other instruction can use the functional unit.
It is different from the number of delay slots.
E.g.: MPYDP has a functional unit latency of 4 but 9 delay slots.
Delay slot: on the C6x, the delay slots of an instruction are the cycles after it executes before its result is available. Instructions that are physically after the instruction execute in these slots as if they were located before it.
Classic examples are branch and call instructions, which often execute the following instructions before the branch or call is performed.
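The difference between delay slots and functional unit latency can be expressed as two small formulas. This C sketch assumes an instruction whose E1 phase falls in `issue_cycle` and ignores stalls; the function and constant names are hypothetical.

```c
/* Delay-slot counts quoted above. */
enum {
    DS_SINGLE_CYCLE = 0,  /* most instructions */
    DS_MPY = 1,
    DS_LOAD = 4,          /* LDH, LDW */
    DS_BRANCH = 5,
    DS_MPYDP = 9
};

/* First cycle in which a later instruction can use the result of an
 * instruction whose E1 phase is in issue_cycle. */
int first_use_cycle(int issue_cycle, int delay_slots)
{
    return issue_cycle + delay_slots + 1;
}

/* First cycle in which the same functional unit can start another
 * instruction, given the functional unit latency. */
int unit_free_cycle(int issue_cycle, int unit_latency)
{
    return issue_cycle + unit_latency;
}
```

A load issued in cycle 1 delivers its data for use in cycle 6, so the four cycles in between must hold independent instructions or NOPs. An MPYDP issued in cycle 1 frees its .M unit in cycle 5, but its result is usable only in cycle 11.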
-
Instruction Set
Assembly code format:
Label || [ ] Instruction Unit Operands ; comments
A label represents a specific address/memory location that contains an instruction or data (the label must be in the first column).
Parallel bars (||) are used if the instruction is executed in parallel with the previous instruction.
The [ ] field is optional; it makes the associated instruction conditional.
- 5 registers are used as conditional registers
- [A2] specifies that the associated instruction executes if A2 is not zero
- [!A2] specifies that the associated instruction executes if A2 is zero
The instruction field can be an assembler directive or a mnemonic.
-
- An assembler directive is a command for the assembler, e.g.:
.short: initialize a 16-bit integer
.int: initialize a 32-bit integer
.float: initialize a 32-bit IEEE single-precision constant
- A mnemonic is an actual instruction that executes at run time.
The unit field can be any one of the 8 functional units (optional).
Comments starting in column 1 begin with an asterisk or a semicolon, whereas comments starting in any other column must begin with a semicolon.
Eg:
        ADD   .L1  A3,A7,A7   ; add A3 + A7 -> A7
        MPY   .M2  A7,B7,B6   ; multiply 16 LSBs of A7, B7 -> B6
||      MPYH  .M1  A7,B7,A6   ; multiply 16 MSBs of A7, B7 -> A6
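To check what the example instructions compute, their integer arithmetic can be emulated in C. This sketch assumes that MPY and MPYH treat their 16-bit halves as signed and models registers as plain 32-bit integers; the helper names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x (portable; avoids
 * implementation-defined narrowing casts). */
static int32_t sext16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* MPY: signed 16 LSBs x signed 16 LSBs -> 32-bit result. */
int32_t mpy(uint32_t src1, uint32_t src2)
{
    return sext16(src1) * sext16(src2);
}

/* MPYH: signed 16 MSBs x signed 16 MSBs -> 32-bit result. */
int32_t mpyh(uint32_t src1, uint32_t src2)
{
    return sext16(src1 >> 16) * sext16(src2 >> 16);
}

/* ADD: 32-bit addition that wraps modulo 2^32 like a register. */
uint32_t add32(uint32_t a, uint32_t b)
{
    return a + b;
}
```

For example, with A7 = 0x00050003 and B7 = 0x00070004, `mpy` gives 3 x 4 = 12 (the B6 result) and `mpyh` gives 5 x 7 = 35 (the A6 result).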
-
Instruction set
The instructions are designed to make maximum use of the processor's resources and, at the same time, minimize the memory space required to store them.
Minimizing the storage space ensures the cost effectiveness of the overall system.
To ensure maximum use of the DSP hardware, the instructions are designed to perform several parallel operations in a single instruction, typically including fetching of data in parallel with the main arithmetic operation.
-
Instructions are kept short by restricting which registers can be used with which operations and which operations can be combined in an instruction.
Some of the latest processors use VLIW architectures, in which multiple instructions are issued and executed per cycle.
In such architectures the instructions are short and designed to perform much less work, thus requiring less memory, while the VLIW architecture increases speed.
-
C67x Additional Instructions (by unit)
-
.S unit: ABSDP, ABSSP, CMPEQDP, CMPEQSP, CMPGTDP, CMPGTSP, CMPLTDP, CMPLTSP, RCPDP, RCPSP, RSQRDP, RSQRSP, SPDP
.M unit: MPYDP, MPYI, MPYID, MPYSP
.L unit: ADDDP, ADDSP, DPINT, DPSP, INTDP, INTDPU, INTSP, INTSPU, SPINT, SPTRUNC, SUBDP, SUBSP
.D unit: ADDAD, LDDW
Control Register File
-
The interrupt flag register (IFR)
- contains the status of the INT4-INT15 and NMI interrupts.
- Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0.
- To check the status of interrupts, use the MVC instruction to read the IFR.
The interrupt return pointer register (IRP)
- contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt.
- A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
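Once the IFR has been copied into a general-purpose register with MVC, testing individual flags is ordinary bit masking. The C sketch below assumes the C6000 IFR layout with the NMI flag in bit 1 and the INT4 to INT15 flags in bits 4 to 15, and takes the IFR value as a plain argument; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IFR_NMIF_BIT 1u  /* assumed position of the NMI flag */

/* True if the NMI flag is set in the given IFR value. */
bool nmi_pending(uint32_t ifr)
{
    return ((ifr >> IFR_NMIF_BIT) & 1u) != 0u;
}

/* True if maskable interrupt INTn (4 <= n <= 15) is flagged. */
bool int_pending(uint32_t ifr, unsigned n)
{
    if (n < 4u || n > 15u) {
        return false;  /* not a maskable interrupt flag */
    }
    return ((ifr >> n) & 1u) != 0u;
}

/* Number of flagged maskable interrupts among INT4..INT15. */
unsigned count_pending(uint32_t ifr)
{
    unsigned count = 0;
    for (unsigned n = 4u; n <= 15u; n++) {
        if (int_pending(ifr, n)) {
            count++;
        }
    }
    return count;
}
```

For an IFR value of 0x0812, the sketch reports NMI, INT4 and INT11 as pending.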
-
Addressing modes
-
Addressing determines how one accesses memory: it refers to the means of specifying the location of operands for instructions.
- The types of addressing are called addressing modes.
- Operands may be input operands for the operation as well as results of the operation.
Addressing modes supported by the TMS320C67x include register-indirect, indexed register-indirect and modulo (circular) addressing.
Immediate data is also supported.
The TMS320C67x does not support modulo addressing for 64-bit data.
-
Circular Buffer
-
At the beginning of each sample period, a new sample is read into the circular buffer, overwriting the oldest sample. The newest sample x(n) is stored at the memory location pointed to by auxiliary register AR(i).
The need to process digital signals in real time gives rise to the concept of circular buffering. Circular buffers are used to store the most recent values of a continually updated signal.
Circular buffering allows processors to access a block of data sequentially and then automatically wrap around to the beginning address, exactly the pattern used to access coefficients in an FIR filter.
Circular buffering is also very helpful in implementing first-in, first-out buffers, commonly used for I/O and for FIR delay lines.
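A circular FIR delay line can be sketched in C with the wrap-around done in software by modulo arithmetic; on the C6x the same wrap-around is done by the addressing hardware once AMR selects circular mode. The 4-tap length, the `delay_line` struct and the function names are hypothetical.

```c
#include <stddef.h>

#define NTAPS 4  /* hypothetical filter length */

typedef struct {
    float x[NTAPS];  /* the NTAPS most recent samples */
    size_t newest;   /* index of x(n), the newest sample */
} delay_line;

void dl_init(delay_line *d)
{
    for (size_t i = 0; i < NTAPS; i++) {
        d->x[i] = 0.0f;
    }
    d->newest = 0;
}

/* Advance the pointer with wrap-around and overwrite the oldest sample. */
void dl_push(delay_line *d, float sample)
{
    d->newest = (d->newest + 1) % NTAPS;
    d->x[d->newest] = sample;
}

/* y(n) = sum over k of h[k] * x(n - k), walking backwards with wrap-around. */
float fir_output(const delay_line *d, const float h[NTAPS])
{
    float y = 0.0f;
    size_t idx = d->newest;
    for (size_t k = 0; k < NTAPS; k++) {
        y += h[k] * d->x[idx];
        idx = (idx == 0) ? NTAPS - 1 : idx - 1;
    }
    return y;
}
```

Feeding a unit impulse through taps {1, 2, 3, 4} returns 1, 2, 3, 4 and then 0, as the impulse moves through the buffer and is finally overwritten.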
-
AMR mode and description
Mode  Description
00    linear addressing
01    circular addressing using BK0
10    circular addressing using BK1
11    reserved
-
Block size = 2^(N+1) bytes
-
Eg:
        MVK    .S2  0x0004,B2   ; lower 16 bits to B2
        MVKLH  .S2  0x0005,B2   ; upper 16 bits to B2
        MVC    .S2  B2,AMR      ; copy B2 into AMR
Writing the value 0x0004 (binary 0100) into the 16 LSBs of AMR sets bit 2 (the third bit) to 1 and all other bits to zero.
This sets the mode field for A5 to 01, selecting register A5 as a pointer to a circular buffer using BK0.
Writing the value 0x0005 (binary 0101) into the 16 MSBs of AMR sets bits 16 and 18 to 1.
This is the value of N used to select the buffer size: 2^(N+1) = 2^6 = 64 bytes, using BK0.
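The same AMR value can be computed in C. This sketch assumes the AMR layout in which each of A4-A7 and B4-B7 has a 2-bit mode field in bits 0-15 (A4 in bits 0-1, A5 in bits 2-3, and so on) and BK0 occupies bits 16-20; the helper names are hypothetical.

```c
#include <stdint.h>

enum amr_mode { AMR_LINEAR = 0, AMR_CIRC_BK0 = 1, AMR_CIRC_BK1 = 2 };

/* reg_index: 0..3 select A4..A7, 4..7 select B4..B7. */
uint32_t amr_set_mode(uint32_t amr, unsigned reg_index, enum amr_mode mode)
{
    unsigned shift = 2u * reg_index;
    amr &= ~(3u << shift);                  /* clear the 2-bit field */
    amr |= ((uint32_t)mode & 3u) << shift;  /* write the new mode */
    return amr;
}

/* BK0 is the 5-bit field in bits 16..20. */
uint32_t amr_set_bk0(uint32_t amr, unsigned n)
{
    amr &= ~(0x1Fu << 16);
    return amr | ((n & 0x1Fu) << 16);
}

/* Circular block size in bytes selected by a block-size field N: 2^(N+1). */
uint64_t block_size_bytes(unsigned n)
{
    return (uint64_t)1 << ((n & 0x1Fu) + 1u);
}
```

`amr_set_bk0(amr_set_mode(0, 1, AMR_CIRC_BK0), 5)` yields 0x00050004, the same value that MVK and MVKLH assemble in B2, and `block_size_bytes(5)` gives 64.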
-
Reset (RESET)
Reset is the highest-priority interrupt and is used to halt the CPU and return it to a known state.
The reset interrupt is unique in a number of ways:
- RESET is an active-low signal; all other interrupts are active-high signals.
- RESET must be held low for 10 clock cycles before it goes high again to reinitialize the CPU properly.
- The instruction execution in progress is aborted and all registers are returned to their default states.
- RESET is not affected by branches.
Nonmaskable Interrupt (NMI)
-
- NMI is the second-highest-priority interrupt.
- It is generally used to alert the CPU of a serious hardware problem such as an imminent power failure.
- For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.
-
Multichannel Buffered Serial Port (McBSP)
-
The standard serial port interface provides:
Full-duplex communication
Double-buffered data registers, which allow a continuous data stream
Independent framing and clocking for reception and transmission
Direct interface to industry-standard codecs, analog interface chips (AICs) and other serially connected A/D and D/A devices
- Multichannel transmission and reception of up to 128 channels
- Element sizes of 8, 12, 16, 20, 24 or 32 bits
- 8-bit data transfers with LSB or MSB first
-
The DMA controller uses the bus request pin to notify the DSP core that it is ready to make a transfer to or from external memory.
The DSP core completes its current instruction, releases control of external memory and signals the DMA controller via the bus grant pin that the DMA transfer can proceed.
The DMA controller then transfers the specified number of data words and optionally signals completion through an interrupt.
Some processors also have multiple DMA channels, managing DMA transfers in parallel.
-
Timer
The C67x has two 32-bit general-purpose timers that can be used to:
Time events
Count events
Generate pulses
Interrupt the CPU
Send synchronization events to the DMA controller
-
The timer works in one of two signaling modes, depending on whether it is clocked by an internal or an external source.
The timer has an input pin (TINP) and an output pin (TOUT). The TINP pin can be used as a general-purpose input, and the TOUT pin can be used as a general-purpose output.
When an internal clock is provided, the timer generates timing sequences to trigger peripheral or external devices such as the DMA controller or an A/D converter, respectively.
When an external clock is provided, the timer can count external events and interrupt the CPU after a specified number of events.
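Event counting with an external clock can be modelled in software. The struct and functions below are a hypothetical illustration of the counting behaviour only; they are not the C67x timer registers, and the reset-to-zero-on-interrupt behaviour is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a timer counting external events. */
typedef struct {
    uint32_t period;      /* events per interrupt */
    uint32_t count;       /* events seen since the last interrupt */
    uint32_t interrupts;  /* interrupts raised so far */
} event_counter;

void counter_init(event_counter *t, uint32_t period)
{
    t->period = period;
    t->count = 0;
    t->interrupts = 0;
}

/* Called for each external event (e.g. an edge on TINP); returns true
 * when the specified number of events is reached and the CPU would be
 * interrupted. The count then restarts from zero. */
bool counter_event(event_counter *t)
{
    t->count++;
    if (t->count >= t->period) {
        t->count = 0;
        t->interrupts++;
        return true;
    }
    return false;
}
```

With a period of 3, the third event raises an interrupt; after five events the model has raised one interrupt and holds a count of 2.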
Load/Store Operations
In 'C6x the instruction set supports several types of load/store instructions:
Four load instructions:
LDDW Load 64-bit double word (C67x only)
LDW Load 32-bit word
LDH Load 16-bit half-word (short)
LDB Load 8-bit byte
Three store instructions:
STW Store 32-bit word
STH Store 16-bit half-word
STB Store 8-bit byte
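The load widths differ in how the loaded value is extended to 32 bits. The C sketch below assumes a little-endian memory image held in a byte array and that LDB and LDH sign-extend, as the signed forms do on the C6x; it ignores the alignment rules the hardware enforces, and the function names are hypothetical.

```c
#include <stdint.h>

/* LDB: load a byte and sign-extend it to 32 bits. */
int32_t ldb(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (int32_t)v - 0x100 : (int32_t)v;
}

/* LDH: load a little-endian half-word and sign-extend it. */
int32_t ldh(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* LDW: load a little-endian 32-bit word. */
uint32_t ldw(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* LDDW: two consecutive words; the lower-address word forms the 32 LSBs. */
uint64_t lddw(const uint8_t *mem, uint32_t addr)
{
    return (uint64_t)ldw(mem, addr) | ((uint64_t)ldw(mem, addr + 4) << 32);
}
```

With the bytes FE FF 34 12 at address 0, LDB and LDH both yield -2, while LDW yields 0x1234FFFE.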
-
Load and Store Paths
-
The C67x DSP has two 32-bit paths for loading data from memory to the register files: LD1 for register file A and LD2 for register file B.
The C67x DSP also has a second 32-bit load path for both register files A and B. This allows the LDDW instruction to simultaneously load two 32-bit values into register file A and two 32-bit values into register file B.
For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs.
There are also two 32-bit paths, ST1 and ST2, for storing register values to memory from each register file.