Introduction to Digital Signal Processors
01/03/2007 Bryan Stenquist - University of Utah
Overview of topics
• DSP in the world
• Motivations of DSP Architectures
• Overview of DSP-specific hardware
• Types of DSP Architectures
• State of DSP
Where are DSPs found?
• Consumer products
– DVD, MP3, HDTV
• Communications
– Radio, cellular phones
• Medical
– Pacemakers, MRI, spectrometers
• Industrial
– Motor controllers, environmental controls
Technologies used for Digital Signal Processing
• General Purpose Processors (GPP)
– Intel IPX400, IBM PowerPC
• Field Programmable Gate Array (FPGA)
– Xilinx, Altera
• Application Specific Integrated Circuit (ASIC)
• DSP Processors (or DSPs)
Advantages of DSPs
• Reprogrammable
– High-level to assembly programming with well-documented tools
• Cost effective in low-volume applications
– Modestly priced development hardware
• Desirable speed / cost / efficiency for evaluation of design trade-offs
Motivations for DSP Architecture
• Most features of DSPs mirror the structure of DSP algorithms
– Fast multipliers
– Multiple execution units
– Efficient memory access
– Data formats
– Streamlined I/O
– Special instruction sets
The Fast Multiplier
• Case study: the dot product (FIR filter)
– Requires fast multiplication and addition of the product terms, referred to as accumulation.
– MAC (Multiply-ACcumulate)
$y = \mathbf{h} \cdot \mathbf{x}$
where $\mathbf{h} = \{h_0, h_1, \ldots, h_N\}$, $\mathbf{x} = \{x_0, x_1, \ldots, x_N\}$, and $N$ is the number of terms
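The dot product in this case study can be sketched in C as a loop of multiply-accumulate steps. This is only an illustration under stated assumptions: the function name `dot_product_mac` and the use of `double` are hypothetical choices, not taken from the slides, and a real DSP would typically map each iteration onto a single MAC instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product y = h . x over n terms.
 * Each loop iteration is one multiply-accumulate (MAC):
 * the product h[i] * x[i] is added to the running sum. */
double dot_product_mac(const double *h, const double *x, size_t n)
{
    assert(h != NULL && x != NULL);

    double acc = 0.0;               /* the retained accumulated value */
    for (size_t i = 0; i < n; i++) {
        acc += h[i] * x[i];         /* one MAC per term */
    }
    return acc;
}
```

Calling `dot_product_mac(h, x, 3)` with `h = {1, 2, 3}` and `x = {4, 5, 6}` accumulates 4 + 10 + 18. For an FIR filter, `h` would hold the coefficients and `x` the most recent input samples.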
The Fast Multiplier (cont.)
• Other MAC-intensive algorithms include:
– Convolution
– Correlation
– IIR filtering
– Fourier transform
• All DSPs provide single-cycle MAC
The Multiple Execution Unit
• High computational complexity requires multiple types of arithmetic and logic operations to be performed in parallel to increase speed.
• Parallel execution units will have:
– MAC
– ALU
– Shifter
The Multiple Execution Unit (cont.)
• Example: Texas Instruments C6713 DSP
– It has 8 execution units (two sections, each with units called .L, .M, .S, .D)
– All blocks are multi-function and can add integers
– Only the .M blocks can multiply
– .D is mostly used to load and store data
– .L mostly does logical operations
– .S mostly does shifting
Efficient Memory Access
• Performing a MAC every cycle would require:
– 1 instruction (perform MAC)
– 1 retained accumulated value
– 2 data values (two new terms to multiply)
(A lot of data bandwidth is required)
Efficient Memory Access (cont.)
• Techniques used to provide greater bandwidth
– Use of a separate instruction bus and multiple data buses
– Caching of instructions so they don't consume bandwidth
– Providing L1 and L2 caching for single-cycle access
– Automatic address generation that is incremented without software intervention, used for FIFO buffers, auto-increment array access, and delay lines
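The delay line mentioned above can be sketched in C as a circular buffer whose index wraps around after every update. On a DSP the address generation hardware performs this wrap-around automatically; here it is written out explicitly with a modulo. All names (`delay_line`, `delay_push`, `delay_tap`, `fir_step`) and the length `DELAY_LEN` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_LEN 4   /* hypothetical delay-line length (number of taps) */

typedef struct {
    double samples[DELAY_LEN];
    size_t newest;                /* index of the most recent sample */
} delay_line;

/* Clear the delay line so all past samples read as zero. */
void delay_init(delay_line *d)
{
    for (size_t i = 0; i < DELAY_LEN; i++) {
        d->samples[i] = 0.0;
    }
    d->newest = 0;
}

/* Store a new sample, overwriting the oldest one. The modulo is the
 * wrap-around that circular addressing hardware would do for free. */
void delay_push(delay_line *d, double s)
{
    d->newest = (d->newest + 1) % DELAY_LEN;
    d->samples[d->newest] = s;
}

/* Read the sample delayed by k steps (k = 0 is the newest sample). */
double delay_tap(const delay_line *d, size_t k)
{
    assert(k < DELAY_LEN);
    return d->samples[(d->newest + DELAY_LEN - k) % DELAY_LEN];
}

/* One FIR output: push the new input, then MAC over all taps. */
double fir_step(delay_line *d, const double *h, double s)
{
    delay_push(d, s);
    double acc = 0.0;
    for (size_t k = 0; k < DELAY_LEN; k++) {
        acc += h[k] * delay_tap(d, k);
    }
    return acc;
}
```

No samples are ever copied or shifted; only the index moves, which is why hardware support for the index update removes the per-sample overhead.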
DSP Data formats
• Most DSPs use fixed-point math, which is a form of integer math
– Smaller area used on chip
– Less power
– Decreased cost
– Lower precision
• For applications that require high precision, floating-point math is used
DSP Data Formats (cont.)
• DSP processors use the shortest data word width that can provide the required accuracy
– 16-bit fixed point is common
– 24- or 32-bit hardware is for better accuracy
– Accumulation registers may have more bits to avoid overflow / saturation
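As a rough sketch of why the accumulator is wider than the data word, the following C function computes a dot product of 16-bit fixed-point values. It assumes the Q15 convention (a raw value `r` represents `r / 32768`), which the slides do not name explicitly; the function name `q15_dot` and the 64-bit accumulator standing in for hardware guard bits are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of Q15 vectors with a wide accumulator.
 * Assumes Q15 format: a raw int16_t value r represents r / 32768. */
int16_t q15_dot(const int16_t *h, const int16_t *x, size_t n)
{
    assert(h != NULL && x != NULL);

    int64_t acc = 0;   /* wider than any single product, like guard bits */
    for (size_t i = 0; i < n; i++) {
        /* Q15 * Q15 gives a Q30 product that fits in 32 bits. */
        acc += (int32_t)h[i] * (int32_t)x[i];
    }

    /* Scale Q30 back to Q15 (division truncates toward zero). */
    acc /= 32768;

    /* Saturate instead of wrapping if the result exceeds 16 bits. */
    if (acc > INT16_MAX) {
        acc = INT16_MAX;
    }
    if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (int16_t)acc;
}
```

With raw inputs of 16384 (0.5 in Q15), two terms of 0.5 * 0.5 sum to 0.5, which comes back as 16384. Sums that exceed the representable range are clamped to the largest or smallest 16-bit value rather than wrapping around.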
Zero overhead looping
• Special loop instructions or hardware are used to implement “for” loops so that the processor automatically
– Stores and updates an increment
– Evaluates a conditional
– Repeats only a specific number of times
Streamlined I/O
• Built-in Input/Output (I/O) units are provided on the DSP chip, such as
– Serial and/or parallel ports
– Low-overhead interrupt handling
– Direct Memory Access (DMA) to deliver data to on-chip memory without processor intervention
The Special Instruction Set
• Special instruction sets are provided for a DSP to maximize hardware use.
– Perform parallel operations with single instructions
– Include data operations, pointer updates and arithmetic simultaneously
• They can also minimize the memory space used
– Use of a smaller instruction word length
– Change of hardware configuration to decrease the number of instructions used
• Specialized instructions may need to be coded in assembly due to the added complexity
Conventional DSP Processors
• Low cost / good performance
– Operate at 20 to 50 MHz
– Include only one MAC, an ALU, and a few additional execution units
• Target applications include
– Disk drives
– Answering machines
• Examples:
– ADI ADSP-21xx
– TI TMS320C2xxx
– Freescale (Motorola) DSP560xx
Conventional DSP Processors (cont.)
• Mid-range, higher performance
– Increased speed
– Extra hardware such as a shifter or instruction cache
– Deeper pipeline
• Examples include
– Freescale 563xx
– TI TMS320C5xxx
Enhanced Conv. DSP Processors
• Maintain cost/performance trade-off for more intensive applications
• Provide multiple MACs with an ALU and shifter, executing in parallel
• Extended instruction sets
• Wider data/instruction
• Examples include:
– Lucent DSP16xxx
– ADI Blackfin BF533
Multi-issue Architecture
• Motivation:
– Enhanced conventional DSPs use such specialized hardware and compound instructions that they require assembly programming
– Provide friendly structures for C compiler optimization
– Use simpler instructions in parallel groups
Types of Multi-issue Architecture
• VLIW (Very Long Instruction Word) or its hybrid VelociTI (TI's VLIW-based architecture)
– Many execution units, each having its own instruction. Four to eight instructions execute per cycle
– Instructions are grouped for parallel execution at the time the program is compiled/assembled
• Superscalar
– Uses special hardware to group instructions for parallel execution
– Extremely complex design and not deterministic (the same code may execute in a different order each time it is called, changing the time taken to run)
State of DSPs
• They have diversified
• They are a key part in designs aimed at
– increasing speed
– decreasing energy consumption
– decreasing memory usage
while balancing cost.
• There is a need for more efficient compilers because assembly programming is too time-intensive