![Page 1: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/1.jpg)
Digital signal processors and applications
By Prof. Sridhar Ranganathan
VIT, Chennai
![Page 2: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/2.jpg)
Why DSP?
• DSP algorithms need a lot of mathematical operations on every sample of data
• These need to be done quickly [before the next sample of data arrives]
• Deferred processing is NOT possible
• General purpose processors provide add, subtract and shift operations
• They provide multiply and divide too, but these typically take many clock cycles
![Page 3: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/3.jpg)
Need for DSP
• General purpose processors also suffer from space constraints
• They also consume a lot of power
![Page 4: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/4.jpg)
Hardware features visible through DSP
• Hardware modulo addressing
• Memory architecture designed for streaming data – it may support several memory accesses per cycle
• DMA
• Multiple arithmetic units
• Harvard architecture
![Page 5: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/5.jpg)
Features of DSP contd..
• Special SIMD instructions
• Some processors use VLIW [very long instruction word] techniques
• Specialized DSP instructions [like MAC – multiply and accumulate] execute quickly
• Common algorithms are packaged as libraries for quick implementation
• Bit reversed addressing, which helps in calculating the FFT
• Deliberate exclusion of a memory management unit – they do not support virtual memory
![Page 6: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/6.jpg)
Data operations
• Saturation arithmetic – an operation that overflows produces the maximum (or minimum) representable value as its result instead of wrapping around
• Fixed point arithmetic
• Single cycle operations
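The difference between saturation and ordinary wraparound can be sketched as follows. This is a minimal illustration that assumes a 16-bit signed word; the function names are hypothetical and do not correspond to any particular DSP instruction.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturating_add16(a: int, b: int) -> int:
    """Add two 16-bit signed values, clamping the result on overflow."""
    total = a + b
    return max(INT16_MIN, min(INT16_MAX, total))


def wrapping_add16(a: int, b: int) -> int:
    """Add two 16-bit signed values with two's-complement wraparound."""
    total = (a + b) & 0xFFFF
    # Reinterpret the 16-bit pattern as a signed number.
    return total - 0x10000 if total & 0x8000 else total


if __name__ == "__main__":
    print(saturating_add16(30000, 10000))  # clamps at the largest 16-bit value
    print(wrapping_add16(30000, 10000))    # wraps around to a negative value
```

For audio or control signals a clamped peak is usually far less harmful than a sign flip, which is why saturation is provided in hardware.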
![Page 7: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/7.jpg)
History of DSP
• Originally, bit-slice processors were used for implementing DSPs
• Example – AMD2901 4-bit processors
• By connecting several AMD20
![Page 8: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/8.jpg)
Multiplier and Multiplier Accumulator[MAC]
• Array multiplication is one of the common operations required in DSP
• Example operations that require array multiplication are
  – Convolution
  – Correlation
• An important requirement of an array multiplier is that it must process the signals in real time
• Operations related to one sample need to be completed before the next sample arrives
• If the sampling frequency is 100 Hz, the operations needed for the present sample must be completed within 0.01 s (one sampling period)
• The higher the sampling frequency, the less time is available for computation on the present sample
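The real-time constraint above can be turned into a rough cycle budget. The sketch below assumes a hypothetical processor clock of 20 MHz purely for illustration; the figure is not taken from any datasheet.

```python
def sample_period_s(sampling_hz: float) -> float:
    """Time available per sample, in seconds."""
    return 1.0 / sampling_hz


def cycles_per_sample(sampling_hz: float, clock_hz: float) -> int:
    """Whole clock cycles that fit in one sampling period."""
    return int(clock_hz // sampling_hz)


if __name__ == "__main__":
    ASSUMED_CLOCK_HZ = 20e6  # hypothetical clock rate, for illustration only
    for fs in (100, 8000, 48000):
        print(fs, sample_period_s(fs), cycles_per_sample(fs, ASSUMED_CLOCK_HZ))
```

With a single-cycle MAC, the cycle count per sample is also an upper bound on the number of multiply-accumulate operations that can be spent on each sample.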
![Page 9: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/9.jpg)
How to construct a real time array multiplier
• Two approaches
  – A dedicated MAC unit may be implemented in hardware, integrating the multiplier and accumulator in a single unit. Example: Motorola DSP5600X
  – A separate multiplier and accumulator may be used. Example: TI TMS320C5X. Here the output of the multiplier is stored in the product register, and the content of the product register is added to the accumulator register in the central ALU
• In both approaches the MAC operation can be completed in one cycle
• Thus the presence of hardware multipliers and multiplier-accumulators is one of the mandatory requirements of a P-DSP [Programmable DSP]
![Page 10: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/10.jpg)
How the array multiplier operates
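• Let the input signal [the present sample and the previous M−1 samples] be the array
$\mathbf{x}_n = [x_n, x_{n-1}, x_{n-2}, \dots, x_{n-M+3}, x_{n-M+2}, x_{n-M+1}]$
and let the array corresponding to the impulse response of the system be
$\mathbf{h} = [h_0, h_1, h_2, \dots, h_{M-3}, h_{M-2}, h_{M-1}]$
• The output at the nth sampling instant, $y_n$, is obtained by multiplying the array $\mathbf{x}_n$ element by element with the array $\mathbf{h}$ and summing the products:
$y_n = \sum_{k=0}^{M-1} h_k \, x_{n-k}$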
![Page 11: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/11.jpg)
Array multiplier operation - II
• The array for the next instant, $\mathbf{x}_{n+1}$, is obtained by shifting $\mathbf{x}_n$: the [n+1]th sample becomes the first element, and all other elements are shifted right so that the ith element of $\mathbf{x}_n$ becomes the [i+1]th element of $\mathbf{x}_{n+1}$
• The content of the product register is added to the accumulator before the new product is stored
• Further, the content of the data memory address ‘dma’ is copied to the next location, whose address is ‘dma+1’
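The operation described on this slide can be sketched in software as follows. This is a simplified model, not the actual instruction sequence of any processor: it performs the data shift and the multiply-accumulate pass as two separate loops for clarity, whereas hardware typically combines them. The function name `fir_step` is hypothetical.

```python
def fir_step(delay_line, h, new_sample):
    """Shift new_sample into delay_line (in place) and return the output y_n."""
    # Shift right: element i moves to position i + 1, the oldest sample drops out.
    for i in range(len(delay_line) - 1, 0, -1):
        delay_line[i] = delay_line[i - 1]
    delay_line[0] = new_sample

    accumulator = 0
    product = 0
    for k in range(len(h)):
        accumulator += product            # add the previous product first
        product = h[k] * delay_line[k]    # then store the new product
    accumulator += product                # add the final product
    return accumulator


if __name__ == "__main__":
    h = [1, 2, 3]
    delay_line = [0, 0, 0]
    # Feeding a unit impulse should reproduce the coefficients in h.
    print([fir_step(delay_line, h, s) for s in (1, 0, 0, 0)])
```

Feeding a unit impulse is a convenient check because the outputs should equal the coefficient array itself.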
![Page 12: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/12.jpg)
Harvard architecture
![Page 13: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/13.jpg)
Harvard architecture explained
• It employs entirely separate memory systems to store instructions and data
• The CPU fetches the next instruction
• It also fetches data simultaneously
• Its distinguishing feature is that the instruction address space and the data address space are separate
• The same address can exist in each address space
• So an address alone does NOT uniquely specify a memory location
• You also need to specify which address space you are referring to
• It uses two buses – one for accessing instructions and one for accessing data
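The point that an address alone does not identify a location can be illustrated with a toy model. The memory contents and mnemonics below are invented for illustration and do not belong to any real instruction set.

```python
# Two separate address spaces; both contain an address 0x0000.
program_memory = {0x0000: "LOAD", 0x0001: "MAC"}   # hypothetical mnemonics
data_memory = {0x0000: 1234, 0x0001: 5678}         # hypothetical data words


def read(space: str, address: int):
    """Read a word; the address space must be named along with the address."""
    if space == "program":
        return program_memory[address]
    if space == "data":
        return data_memory[address]
    raise ValueError(f"unknown address space: {space}")


if __name__ == "__main__":
    # Same numeric address, different contents depending on the space.
    print(read("program", 0x0000))
    print(read("data", 0x0000))
```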
![Page 14: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/14.jpg)
Von Neumann architecture
![Page 15: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/15.jpg)
Von Neumann architecture explained
• It employs one address space
• Instructions and data are stored in the same address space
• The PC refers to the next instruction
• The CPU fetches the instruction and examines it; the instruction contains pointers to its operands
• If a pointer gets corrupted, the program may terminate abnormally
• Because it fetches the instruction and then the data one after the other, this architecture is slower
• So P-DSPs rarely use this architecture
![Page 16: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/16.jpg)
Modified Harvard architecture
• In a pure Harvard architecture, mechanisms need to be provided to load programs into program memory and initial data into data memory
• Modern machines use multiple buses
  – One accesses both program memory and data memory
  – One accesses only data memory
  – Data can also be transferred from one memory to the other
• This feature is used in modern day P-DSPs
• This is helpful at start time too, as constant data can be transferred from program memory to data memory
![Page 17: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/17.jpg)
Advantage of having multiple buses
• The number of memory accesses per cycle can be increased
• Motorola DSP5600X and DSP96002 have three memory buses and allow three memory accesses per cycle
• TMS320C54X has four memory buses and allows four memory accesses per cycle
![Page 18: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/18.jpg)
Multiple access memory
• Memory that permits more than one memory access per cycle is called Multiple access memory
• Dual access RAM technology permits two memory accesses per clock cycle
• Four memory accesses per cycle are also possible if dual access RAM is connected to the P-DSP through two independent address and data buses
![Page 19: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/19.jpg)
Multiported memory
• The number of accesses can be increased using multiport memory
• A typical 2 port memory has two address buses and two data buses
• Thus two different chips need not be used in a Harvard architecture
• Disadvantages
  – Increased complexity
  – More pins, more area and increased cost
![Page 20: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/20.jpg)
VLIW architecture
• VLIW – Very long instruction word
• Transmeta Crusoe is a chip that uses this technique
• TMS320C6X also uses a similar technique
• Such processors read a relatively large group of instructions
• They execute them at the same time
• For this purpose they have
  – Many ALUs
  – Many multipliers
  – Many shifters etc.
• The VLIW is fetched from memory and specifies the operations and operands for each of the different data paths
• It simply increases the number of instructions executed per cycle
• The performance gain with VLIW depends on the parallelism achievable with the algorithm
![Page 21: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/21.jpg)
Instruction pipelining
• An instruction may have many phases
  – Fetch
  – Decode
  – Execute
  – Write
• Throughput will be low if all these are executed serially, because when one stage is busy the others are idle
• With pipelining, all these stages operate in parallel on different instructions, which improves throughput
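The throughput gain can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls or branches; real pipelines fall short of this.

```python
def serial_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline: fill time plus one cycle per extra instruction."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, stages = 10, 4  # four stages: fetch, decode, execute, write
    print(serial_cycles(n, stages), pipelined_cycles(n, stages))
```

For long instruction streams the ideal pipeline approaches one completed instruction per cycle, a speed-up close to the number of stages.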
![Page 22: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/22.jpg)
Pipelining diagram
![Page 23: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/23.jpg)
Special addressing modes in P-DSPs
• Short immediate addressing
• Short direct addressing
• Memory mapped addressing
• Indirect addressing
• Bit reversed addressing
• Circular addressing
![Page 24: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/24.jpg)
Special addressing modes explained - 1
• Short immediate addressing
  – The operand is specified as a short constant
  – This forms part of the instruction
  – Its length depends on the P-DSP
  – Example – TMS320C5X – an 8 bit constant could be used
• Short direct addressing
  – The lower order bits of the operand address are specified as part of the instruction
  – The higher order bits are stored elsewhere – for example in a page pointer
  – Example
    • TI TMS320 DSP – the lower 7 bits are specified in the instruction
    • Motorola DSP5600X – the lower 6 bits are specified in the instruction
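Forming a full address from a page pointer and a short offset can be sketched as below. The 7-bit offset follows the TMS320 example above; the 16-bit address width, and therefore the 9-bit page pointer, is an assumption made for this illustration, and the function name is hypothetical.

```python
OFFSET_BITS = 7   # lower address bits carried in the instruction (from the slide)
PAGE_BITS = 9     # assumed: 16-bit data address minus the 7 offset bits


def direct_address(page_pointer: int, offset: int) -> int:
    """Concatenate the page pointer (high bits) with the instruction offset (low bits)."""
    page = page_pointer & ((1 << PAGE_BITS) - 1)
    low = offset & ((1 << OFFSET_BITS) - 1)
    return (page << OFFSET_BITS) | low


if __name__ == "__main__":
    print(hex(direct_address(0, 0x10)))   # location 0x10 within page 0
    print(hex(direct_address(1, 0x00)))   # first location of page 1
```

Each page therefore spans 128 words, and changing pages requires reloading the page pointer rather than lengthening every instruction.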
![Page 25: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/25.jpg)
Special addressing modes explained - 2
• Memory mapped addressing
  – CPU registers and I/O registers are accessed as memory locations
  – This is done by mapping them into the first or last page
  – Example
    • TMS320C5x – page 0 corresponds to CPU registers and I/O registers
    • Motorola DSP5600X – the last page is used
• Indirect addressing
  – Addresses of operands can be stored in registers called indirect access registers
  – When operands are fetched from the addresses held in these registers, the registers are updated
  – This is done by a separate, dedicated arithmetic unit that updates these addresses
  – The increment can be 1 or an offset held in a special register
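Indirect addressing with automatic update can be modelled as a register that is advanced every time it is used. The class below is a hypothetical software model, not the register set of any specific processor; `step` stands in for the increment of 1 or the offset register mentioned above.

```python
class AddressRegister:
    """Holds an operand address and updates it after every access."""

    def __init__(self, address: int, step: int = 1):
        self.address = address
        self.step = step  # 1, or an offset taken from another register

    def read_and_update(self, memory):
        """Fetch the operand at the current address, then post-increment the address."""
        value = memory[self.address]
        self.address += self.step
        return value


if __name__ == "__main__":
    memory = [10, 20, 30, 40]
    ar = AddressRegister(0, step=2)
    print(ar.read_and_update(memory), ar.read_and_update(memory), ar.address)
```

Because the update happens alongside the fetch, stepping through an array costs no extra instructions.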
![Page 26: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/26.jpg)
Special addressing modes explained - 3
• Bit reversed addressing
  – The bit reversed pattern for a given index is obtained by writing its natural binary representation in reverse order
  – Therefore the LSB becomes the MSB and the MSB becomes the LSB
  – The address is incremented or decremented in bit reversed form
• Circular addressing mode
  – In real time, data arrives continuously
  – If it is stored in a linear buffer, the buffer will eventually be exhausted
  – If it is stored in a circular buffer, new data overwrites the oldest data
  – There is no need to check whether the end of the buffer has been reached
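Returning to bit reversed addressing, the index mapping can be sketched in a few lines. This is a software illustration of the ordering only; a P-DSP generates these addresses in hardware. The function name is hypothetical.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the current LSB to the other end
        index >>= 1
    return result


if __name__ == "__main__":
    # Access order for an 8-point FFT (3 address bits).
    print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

This is the order in which a radix-2 FFT reads or writes its data, which is why the addressing mode is useful for FFT computation.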
![Page 27: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/27.jpg)
Use of linear buffer
![Page 28: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/28.jpg)
Use of circular buffer
![Page 29: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/29.jpg)
Example of circular addressing
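A minimal software sketch of circular addressing, assuming a buffer of fixed size whose write index wraps with a modulo operation. On a P-DSP the wrap is done by the address generation hardware; the class and method names here are hypothetical.

```python
class CircularBuffer:
    """Fixed-size buffer in which new samples overwrite the oldest ones."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.index = 0  # next location to be written

    def write(self, sample):
        self.data[self.index] = sample
        # Modulo wrap: no explicit end-of-buffer check is needed.
        self.index = (self.index + 1) % len(self.data)


if __name__ == "__main__":
    buf = CircularBuffer(4)
    for sample in range(1, 7):   # six samples into a four-word buffer
        buf.write(sample)
    print(buf.data, buf.index)   # [5, 6, 3, 4] 2
```

Samples 5 and 6 have overwritten samples 1 and 2, and the write index has wrapped back to position 2.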
![Page 30: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/30.jpg)
Limitations of circular buffering
![Page 31: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/31.jpg)
Methodology for a circular buffer
![Page 32: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/32.jpg)
On Chip peripherals
• On chip timer
  – It generates periodic interrupts to the DSP
  – It also generates sampling clocks for A/D converters
• Serial port
  – It enables data communication between the P-DSP and peripherals such as an ADC, a DAC or an RS-232C device
  – The port has buffers: the DSP writes data to and reads data from the port in parallel form, while the port sends data out and receives data in serial form
![Page 33: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/33.jpg)
On Chip peripherals contd..
• TDM serial port
  – A special serial port which permits the P-DSP to communicate with other devices or other P-DSPs using a time division multiplexing format
• Parallel ports
  – They are faster than serial ports
• Bit I/O port
  – These are only a single bit wide
  – They can be individually set, reset or read
  – These bits are used for control purposes or also for data transfer
![Page 34: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/34.jpg)
On Chip peripherals contd..
• Host port
  – A special type of parallel port that P-DSPs have
  – It enables the P-DSP to communicate with a processor or a PC, which is called the host
  – Data can be communicated through it
  – It can generate interrupts
  – It also helps the P-DSP load a program from ROM to RAM
• Communication ports
  – They are used for communication between many P-DSPs in a multiprocessor system
• On chip ADCs and DACs
  – They enable the P-DSP to communicate with the analog world
  – They are used in cellular phones and tapeless answering machines
![Page 35: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/35.jpg)
TMS320C50
![Page 36: DSP architecture - part 1.ppt](https://reader031.vdocuments.us/reader031/viewer/2022031815/55cf9b0b550346d033a48532/html5/thumbnails/36.jpg)
Complex DSP operations
• The sum of products is the key element in most DSP algorithms
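| Algorithm | Equation |
| --- | --- |
| Finite Impulse Response Filter | $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$ |
| Infinite Impulse Response Filter | $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$ |
| Convolution | $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$ |
| Discrete Fourier Transform | $X(k) = \sum_{n=0}^{N-1} x(n) \, \exp[-j(2\pi/N)nk]$ |
| Discrete Cosine Transform | $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \, \cos\left[\frac{\pi}{2N} u (2x+1)\right]$ |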
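To show how these algorithms reduce to the same multiply-accumulate loop, the sketch below computes one FIR output and one DFT bin with a shared `sum_of_products` helper. The helper and function names are hypothetical, and input samples before index 0 are assumed to be zero.

```python
import cmath


def sum_of_products(a, b):
    """Multiply corresponding elements and accumulate: the basic MAC loop."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def fir_output(coeffs, x, n):
    """y(n) = sum over k of a_k * x(n-k), with x taken as zero before index 0."""
    history = [x[n - k] if n - k >= 0 else 0 for k in range(len(coeffs))]
    return sum_of_products(coeffs, history)


def dft_bin(x, k):
    """X(k) = sum over n of x(n) * exp[-j(2*pi/N)nk]."""
    N = len(x)
    twiddles = [cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)]
    return sum_of_products(x, twiddles)


if __name__ == "__main__":
    print(fir_output([1, 2, 3], [1, 0, 0, 0], 2))  # third impulse response value
    print(abs(dft_bin([1, 1, 1, 1], 0)))           # DC bin of a constant signal
```

Only the two input arrays change between the algorithms; the inner loop is the same, which is why a fast MAC benefits all of them.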