Programmable Dsp Lecture4
8/11/2019
P.D. Sawaant
Contents:
Memory space of TMS320C67XX
Program Control.
Interrupts of TMS320C67XX processors.
Pipeline Operation of TMS320C67XX Processors.
On-Chip peripherals.
Memory Map of TMS320C67xx Processor.
The processor uses a two-level cache-based architecture.
The Level 1 program cache (L1P) is a 4K-byte direct-mapped cache and
the Level 1 data cache (L1D) is a 4K-byte 2-way set-associative cache.
The Level 2 memory/cache (L2) consists of a 256K-byte
memory space that is shared between program and data
space.
64K bytes of the 256K bytes in L2 memory can be
configured as mapped memory, cache, or combinations of
the two.
The remaining 192K bytes in L2 serve as mapped SRAM.
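The split of L2 between cache and mapped memory can be illustrated with a small sketch. The 256K-byte total and the 64K-byte configurable portion come from the text above; the 16K-byte step between cache sizes and the function name `l2_split` are assumptions made for illustration only.

```python
L2_TOTAL_KB = 256          # total L2 memory, from the text
L2_CONFIGURABLE_KB = 64    # portion usable as cache, mapped memory, or a mix
CACHE_STEP_KB = 16         # assumed granularity, not stated in the text


def l2_split(cache_kb):
    """Return (cache_kb, mapped_sram_kb) for a requested L2 cache size."""
    if cache_kb < 0 or cache_kb > L2_CONFIGURABLE_KB:
        raise ValueError("cache size must be between 0 and 64 KB")
    if cache_kb % CACHE_STEP_KB != 0:
        raise ValueError("cache size must be a multiple of the assumed 16 KB step")
    # Whatever is not used as cache remains mapped SRAM.
    return cache_kb, L2_TOTAL_KB - cache_kb


for cache_kb in range(0, L2_CONFIGURABLE_KB + 1, CACHE_STEP_KB):
    cache, sram = l2_split(cache_kb)
    print(f"L2 cache {cache:2d} KB -> mapped SRAM {sram} KB")
```

With the full 64K bytes used as cache, the sketch leaves the 192K bytes of mapped SRAM mentioned above.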
[Figure: L2 memory configuration]
On-Chip Memory and Peripherals for TMS320C67xx Processors.
Processor    Data memory        Program memory     L2 memory                            Peripherals and external memory interface
TMS320C6701  16K x 32           16K x 32           none                                 4-channel DMA, 16-bit HPI, 2 BSPs, 2 timers, 32-bit EMIF
TMS320C6711  32K bits L1 cache  32K bits L1 cache  512K bits unified cache              16-channel enhanced DMA, 16-bit HPI, 2 BSPs, 2 timers, 32-bit EMIF
TMS320C6712  32K bits L1 cache  32K bits L1 cache  512K bits unified cache              16-channel enhanced DMA, 16-bit HPI, 2 serial ports, 2 timers, 16-bit EMIF
TMS320C6713  4K bytes L1 cache  4K bytes L1 cache  64K bytes cache and 192K bytes SRAM  16-channel enhanced DMA, 16-bit HPI, 2 McBSPs, 2 timers, 32-bit EMIF
Interrupts of TMS320C67xx Processors:
Often, while the CPU is in the midst of executing a program, a peripheral device
may require service from the CPU. In such a situation, the main program may be
interrupted by a signal generated by the peripheral device. The processor then
suspends the main program in order to execute another program, called an interrupt
service routine, to service the peripheral device. On completion of the interrupt service
routine, the processor returns to the main program and continues from where it left off.
An interrupt may be generated by either an internal or an external device. It may also be
generated by software.
Not all interrupts are serviced when they occur. Only those interrupts that are called
nonmaskable are serviced whenever they occur.
Other interrupts, which are called maskable interrupts, are serviced only if they are
enabled.
Priority determines which interrupt is serviced first if more than
one interrupt occurs simultaneously.
Almost all the devices of the TMS320C67xx family have 32 interrupts. However, the
types and the number under each type vary from device to device.
Some of these interrupts are reserved for use by the CPU.
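The selection rules described above (nonmaskable interrupts are always eligible, maskable ones only when enabled, and priority breaks ties) can be sketched as follows. This is a simplified model, not the device's actual interrupt logic; the interrupt names, the priority numbers and the convention that a smaller number means higher priority are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interrupt:
    name: str
    priority: int          # assumption: smaller number = higher priority
    maskable: bool = True


def next_to_service(pending, enabled_names):
    """Pick the pending interrupt to service next, or None if none qualifies."""
    eligible = [
        irq for irq in pending
        if not irq.maskable or irq.name in enabled_names
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda irq: irq.priority)


# Hypothetical interrupt sources, for illustration only.
nmi = Interrupt("NMI", priority=1, maskable=False)
timer = Interrupt("TIMER0", priority=5)
serial = Interrupt("SERIAL_RX", priority=7)

# TIMER0 is pending but not enabled, so the enabled SERIAL_RX wins.
print(next_to_service([serial, timer], enabled_names={"SERIAL_RX"}).name)
# A nonmaskable interrupt is serviced even when nothing is enabled.
print(next_to_service([serial, timer, nmi], enabled_names=set()).name)
```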
Pipeline Operation of TMS320C67XX Processors
The CPU of the 67xx devices has a 16-level-deep instruction pipeline.
FETCH Phase: Includes
1) PG (Program-Address-Generation phase): Computes the next sequential fetch-packet address or branch address.
2) PS (Program-Address-Send phase): Sends the program address to memory.
3) PW (Program-Address-Ready-Wait phase): Waits until the memory access is completed.
4) PR (Program-Fetch-Packet-Receive phase): Receives the fetch packet from memory.
DECODE Phase: Includes
5) DP (Instruction-Dispatch phase): Separates fetch packets into execute packets.
6) DC (Instruction-Decode phase): Decodes the source registers, destination registers and associated paths.
EXECUTE Phase:
It is divided into phases 7-11 (E1-E5).
Different instructions require different numbers of phases.
The CPU executes instructions on its 8 functional units.
Most instructions require only one execution phase, E1, and have no delay.
Multiply instructions such as MPY and SMPY require two execution
phases, E1 and E2.
This implies that a multiply instruction has a latency of 2 instruction
cycles and a delay of 1 instruction cycle.
Latency:
Latency is the number of instruction cycles, counted from E1, that an
instruction needs before its result is available.
Delay:
Delay is the number of additional cycles after E1 until the result is ready, so
delay = latency - 1.
E.g. LDB and LDH require E1 to E5, so their latency is 5 instruction cycles and their delay is 4.
E.g. the branch instruction (B) needs only E1 but reaches its target 5 cycles later. Therefore
a branch has a delay of 5 and a latency of 6 instruction cycles.
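The relationship between delay and latency in these examples can be written as a short sketch. It assumes that delay counts only the cycles after E1, so that latency = delay + 1; under that assumption a load occupying E1 to E5 has a delay of 4. "ADD" is a hypothetical stand-in for a single-cycle instruction, and the function names are illustrative.

```python
# Delay (cycles after E1 before the result is ready) for the examples
# in the text. "ADD" is a hypothetical single-cycle instruction.
DELAY_SLOTS = {
    "ADD": 0,    # only E1, no delay
    "MPY": 1,    # E1 and E2
    "SMPY": 1,   # E1 and E2
    "LDB": 4,    # E1 to E5
    "LDH": 4,    # E1 to E5
    "B": 5,      # needs E1, target reached 5 cycles later
}


def latency(mnemonic):
    """Latency in instruction cycles, assuming latency = delay + 1."""
    return DELAY_SLOTS[mnemonic] + 1


def earliest_use_cycle(mnemonic, issue_cycle):
    """First cycle in which a dependent instruction can start its E1 phase."""
    return issue_cycle + latency(mnemonic)


for name in DELAY_SLOTS:
    print(f"{name:4s} delay={DELAY_SLOTS[name]} latency={latency(name)}")
```

For example, a value loaded by an LDH that enters E1 in cycle 0 can first be consumed by an instruction entering E1 in cycle 5.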
Some floating-point instructions require additional delay slots
(E2-E10), which comprise the additional delay after the E1 stage of the
pipeline.
[Figure: TMS320C67xx pipeline phases]
[Figure: progression of instruction cycles in the pipeline]
Parallel Operations:
The instruction word for each functional unit is 32 bits long.
Instructions are fetched 8 at a time, giving 8 × 32 = 256 bits.
This group is called a Fetch Packet.
Fetch packets must start at an address that is a multiple of eight
32-bit words (32 bytes).
Up to 8 instructions can be executed in parallel.
Each must use a different functional unit.
Each group of parallel instructions is called an Execute Packet.
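The alignment and parallelism rules above can be checked with a small sketch. The unit names (.L1, .S1, .M1, .D1 and their side-2 counterparts) are the usual C6000 functional unit names and are not given in the text; the function names are illustrative assumptions.

```python
WORD_BYTES = 4                      # one 32-bit instruction word
WORDS_PER_FETCH_PACKET = 8
FETCH_PACKET_BYTES = WORD_BYTES * WORDS_PER_FETCH_PACKET   # 32 bytes = 256 bits

# Eight functional units (names are not listed in the text).
FUNCTIONAL_UNITS = {".L1", ".S1", ".M1", ".D1", ".L2", ".S2", ".M2", ".D2"}


def fetch_packet_base(address):
    """Start address of the fetch packet that contains a byte address."""
    return address & ~(FETCH_PACKET_BYTES - 1)


def is_valid_execute_packet(units):
    """Up to 8 instructions, each on a different functional unit."""
    units = list(units)
    return (
        1 <= len(units) <= len(FUNCTIONAL_UNITS)
        and set(units) <= FUNCTIONAL_UNITS
        and len(set(units)) == len(units)
    )


print(hex(fetch_packet_base(0x1024)))                 # packet holding 0x1024
print(is_valid_execute_packet([".M1", ".L1", ".D2"])) # three different units
print(is_valid_execute_packet([".M1", ".M1"]))        # same unit used twice
```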
Peripherals of TMS320C6713
The TMS320C67x devices contain peripherals for
communication with off-chip memory, co-processors, host
processors and serial devices.
The following subsections discuss the peripherals of C6713
processor.
Enhanced DMA (EDMA)
Host-Port Interface (HPI)
External Memory Interface (EMIF)
Multichannel Buffered Serial Port (McBSP)
Timers
Multichannel Audio Serial Ports (McASP)
Power Down Logic
Enhanced DMA (EDMA):
The EDMA has the following features:
Background operation: The DMA operates independently of the CPU.
The EDMA has 16 independently programmable channels.
High throughput: Elements can be transferred at the CPU clock rate.
Sixteen channels: The EDMA can keep track of the contexts of
sixteen independent transfers.
Split operation: A single channel may be used simultaneously to
perform both receive and transmit element transfers to or from two
peripherals and memory.
Programmable priority: Each channel has an independently
programmable priority versus the CPU.
Each channel's source and destination address registers can have
configurable indexes for each read and write transfer. The address may
remain constant, increment, decrement, or be adjusted by a
programmable value.
Programmable-width transfers: Each channel can be independently
configured to transfer bytes, 16-bit half words, or 32-bit words.
Autoinitialization: Once a block transfer is complete, an EDMA channel
may automatically reinitialize itself for the next block transfer.
Linking: Each EDMA channel can be linked to a subsequent transfer
to perform after completion.
Event synchronization: Each channel is initiated by a specific event.
Transfers may be synchronized either by element or by frame.
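The four address update options and the three element widths listed above can be illustrated with a small address generator. This is only a sketch of the idea, not the EDMA register interface; the mode names, the `index` parameter and its interpretation as a signed byte offset are assumptions.

```python
def element_addresses(start, count, element_bytes, mode, index=0):
    """List the byte address used for each element of a transfer.

    mode is one of "constant", "increment", "decrement", "index";
    index is a signed byte offset used only in "index" mode (assumption).
    """
    if element_bytes not in (1, 2, 4):   # bytes, half words, words
        raise ValueError("element size must be 1, 2 or 4 bytes")
    steps = {
        "constant": 0,
        "increment": element_bytes,
        "decrement": -element_bytes,
        "index": index,
    }
    if mode not in steps:
        raise ValueError(f"unknown address mode: {mode}")
    step = steps[mode]
    return [start + i * step for i in range(count)]


# A peripheral data register is read at a constant address while the
# destination buffer in memory advances one 32-bit word per element.
print([hex(a) for a in element_addresses(0x8000, 4, 4, "constant")])
print([hex(a) for a in element_addresses(0x0100, 4, 4, "increment")])
```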
Host Port Interface (HPI):
The HPI is a 16-bit-wide parallel port through which a host processor can
directly access the CPU's memory space.
The host device functions as a master to the interface, which increases
ease of access.
The host and CPU can exchange information via internal or external
memory.
The host also has direct access to memory-mapped peripherals.
The HPI is connected to the internal memory via a set of registers.
Either the host or the CPU may use the HPI Control register (HPIC) to
configure the interface.
The host can access the host address register (HPIA) and the host data
register (HPID) to access the internal memory space of the device.
The host accesses these registers using external data and interface
control signals.
The HPIC is a memory-mapped register, which allows the CPU access.
The data transactions are performed within the EDMA, and are
invisible to the user.
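The host's view of the HPIA and HPID registers can be sketched with a toy model: the host first places an address in HPIA and then moves data through HPID. The class and method names are hypothetical, the dictionary merely stands in for the DSP memory space, and HPIC configuration and the underlying EDMA transfers are deliberately left out.

```python
class HpiModel:
    """Toy model of host access through HPIA and HPID (illustration only)."""

    def __init__(self):
        self.memory = {}   # stands in for the DSP memory space
        self.hpia = 0      # host address register

    def host_set_address(self, address):
        """Host writes the target address into HPIA."""
        self.hpia = address

    def host_write_data(self, value):
        """Host writes HPID; the value lands at the address held in HPIA."""
        self.memory[self.hpia] = value & 0xFFFFFFFF

    def host_read_data(self):
        """Host reads HPID; unwritten locations read as 0 in this model."""
        return self.memory.get(self.hpia, 0)


hpi = HpiModel()
hpi.host_set_address(0x0000_1000)
hpi.host_write_data(0xCAFE)
print(hex(hpi.host_read_data()))
```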
Multichannel Buffered Serial Port (McBSP):
The standard serial port interface provides:
Full-duplex communication
Double-buffered data registers, which allow a continuous data stream
Independent framing and clocking for reception and transmission
Direct interface to industry-standard codecs, analog interface chips
(AICs), and other serially connected A/D and D/A devices.
External shift clock generation or an internal programmable frequency
shift clock.
Multichannel transmission and reception of up to 128 channels.
8-bit data transfers with LSB or MSB first.
Programmable polarity for both frame synchronization and data clocks.
Highly programmable internal clock and frame generation.
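The choice between LSB-first and MSB-first 8-bit transfers comes down to the order in which the bits of a byte are shifted onto the serial line. A minimal sketch of that ordering, with hypothetical function names and no claim about how the McBSP registers are programmed:

```python
def serialize_byte(value, msb_first=True):
    """Return the 8 bits of a byte in the order they would be shifted out."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    lsb_first_bits = [(value >> i) & 1 for i in range(8)]
    return lsb_first_bits[::-1] if msb_first else lsb_first_bits


def deserialize_byte(bits, msb_first=True):
    """Rebuild a byte from 8 received bits, given the agreed bit order."""
    if len(bits) != 8:
        raise ValueError("expected exactly 8 bits")
    ordered = bits if msb_first else bits[::-1]
    value = 0
    for bit in ordered:
        value = (value << 1) | bit
    return value


print(serialize_byte(0xA1, msb_first=True))    # MSB leaves first
print(serialize_byte(0xA1, msb_first=False))   # LSB leaves first
```

Transmitter and receiver must agree on the order: a byte sent MSB first and decoded LSB first comes out bit-reversed.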
Timers:
The C62x/C67x has two 32-bit general-purpose timers that can be
used to:
Time events
Count events
Generate pulses
Interrupt the CPU
Send synchronization events to the DMA controller
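How a counting timer can time events and raise a periodic interrupt or synchronization event is sketched below. This is a simplified software model under assumed behaviour (the counter resets when it reaches a programmed period); the class and attribute names are hypothetical and do not correspond to the device's timer registers.

```python
class TimerModel:
    """Simplified 32-bit up-counter that signals an event every `period` ticks."""

    def __init__(self, period):
        if not 1 <= period <= 0xFFFFFFFF:
            raise ValueError("period must fit in 32 bits and be at least 1")
        self.period = period
        self.count = 0
        self.events = 0    # how many interrupts/sync events were signalled

    def tick(self):
        """Advance one input clock; return True when an event is signalled."""
        self.count += 1
        if self.count == self.period:
            self.count = 0
            self.events += 1
            return True
        return False


timer = TimerModel(period=3)
print([timer.tick() for _ in range(7)])
print(timer.events, timer.count)
```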
Multichannel Audio Serial Port:
The C6713 processor includes two Multichannel Audio Serial Ports
(McASP).
The McASP interface modules each support one transmit and one
receive clock zone. Each McASP has eight serial data pins, which
can be individually allocated to either of the two zones.
The serial port supports time-division multiplexing on each pin from 2
to 32 time slots.
The McASP also provides extensive error-checking and recovery
features, such as the bad clock detection circuit for each high-frequency
master clock, which verifies that the master clock is within a programmed
frequency range.
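The pin allocation, the 2 to 32 time-slot limit and the clock range check can be expressed as simple validation rules. The sketch below assumes that all eight pins are in use and that both zones run the same number of time slots; the function names and the returned slots-per-frame figure are illustrative, not part of the McASP programming model.

```python
SERIAL_DATA_PINS = 8
CLOCK_ZONES = ("transmit", "receive")


def slots_per_frame(pin_zones, tdm_slots):
    """Validate a pin allocation and return the time slots carried per zone.

    pin_zones lists the zone of each of the eight pins; tdm_slots is the
    number of time slots multiplexed on each pin (same for both zones here).
    """
    if len(pin_zones) != SERIAL_DATA_PINS:
        raise ValueError("expected a zone for each of the 8 serial data pins")
    for zone in pin_zones:
        if zone not in CLOCK_ZONES:
            raise ValueError(f"unknown clock zone: {zone}")
    if not 2 <= tdm_slots <= 32:
        raise ValueError("TDM supports 2 to 32 time slots per pin")
    return {zone: list(pin_zones).count(zone) * tdm_slots for zone in CLOCK_ZONES}


def master_clock_ok(measured_hz, min_hz, max_hz):
    """Bad clock check: is the master clock inside the programmed range?"""
    return min_hz <= measured_hz <= max_hz


pins = ["transmit"] * 2 + ["receive"] * 6
print(slots_per_frame(pins, tdm_slots=4))
print(master_clock_ok(24_576_000, 24_000_000, 25_000_000))
```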
Power Down Logic:
Most of the operating power of CMOS logic is dissipated during
circuit switching, from one logic state to another.
By preventing some or all of the chip's logic from switching,
significant power savings can be realized without losing any data or
operational context.
Power-down mode PD1 blocks the internal clock inputs at the
boundary of the CPU, preventing most of its logic from switching and
effectively shutting down the CPU.
Additional power savings are accomplished in power-down mode PD2,
in which the entire on-chip clock structure (including multiple buffers) is
halted at the output of the PLL.
Power-down mode PD3 shuts down the entire internal clock tree (like
PD2) and also disconnects the external clock source (CLKIN) from
reaching the PLL. Wake-up from PD3 takes longer than wake-up from
PD2 because the PLL needs to be relocked, just as it does following
power up.
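The three power-down modes differ in how much of the clock distribution they stop, which a small lookup table can summarize. The table below is a simplification of the description above (PD1 is modelled as stopping the CPU clock entirely, although the text says most of its logic); the key names and helper functions are assumptions for illustration.

```python
# True means the block still receives a clock in that mode.
CLOCKED = {
    "ACTIVE": {"cpu": True,  "on_chip_clock_tree": True,  "pll_input": True},
    "PD1":    {"cpu": False, "on_chip_clock_tree": True,  "pll_input": True},
    "PD2":    {"cpu": False, "on_chip_clock_tree": False, "pll_input": True},
    "PD3":    {"cpu": False, "on_chip_clock_tree": False, "pll_input": False},
}


def needs_pll_relock(mode):
    """The PLL must relock on wake-up only if CLKIN was cut off from it."""
    return not CLOCKED[mode]["pll_input"]


def stopped_blocks(mode):
    """Names of the blocks whose clock is stopped in a given mode."""
    return [name for name, running in CLOCKED[mode].items() if not running]


for mode in ("PD1", "PD2", "PD3"):
    print(mode, stopped_blocks(mode), "relock:", needs_pll_relock(mode))
```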