
Digital Signal Processors Introduction / 6

Dr. Wolf-Dieter Heinitz University of Rostock Institute of Automation

Signal Processing with

Digital Signal Processors

Contents

1. Why digital signal processing • Analog and digital signal processing • Advantages and problems of DSP • Useful functions and typical applications

2. Data formats for signal processing • Integer and fractional data formats • Real floating point formats • Dynamic range, saturation and rounding

3. Signal processing with standard microprocessors and PCs • Base functions for digital signal processing - MAC operation • DSP-Filter functions without multiplication • FIR - filter with a fantasy standard microprocessor

4. Historical development and types of DSPs • John von Neumann architecture • Harvard architecture • DSP - types

5. Motorola DSP 56301 • Architectural overview • Hardware structure and bus operations • Instruction set and assembler language • Programming examples • DSP56600-, DSP96000-Family

6. Texas Instruments TMS 320C33 • Architectural overview • Hardware structure and bus operations • Instruction set and assembler language • Programming examples • TMS320C40, C44

7. Overview of other DSPs • Zilog Z89175 / 176; Motorola 566xx • Texas Instruments: C8x, C2xxx, C5xx • Analog Devices: ADSP 21xx, ADSP SHARC 2106x

8. New DSP Technologies and Alternatives • TI C6xxx • ADSP 21160

University Rostock Institute of Automation 2002 / 2003


1. Why Digital Signal Processing • What is Digital Signal Processing

We can use electronic sensor systems to convert physical signals into analog electrical signals, for example a microphone for a sound signal. We sample such a signal and convert it into a sequence of digital numbers; this conversion process is called analog-to-digital conversion (ADC). The real-time processing that we apply to the signal can be carried out by a digital computer; we call this Digital Signal Processing (DSP). Once the signal has been processed by the DSP, it is still in the form of a sequence of numbers. It must be converted back into an analog signal by a digital-to-analog converter (DAC). Then the signal can be passed to an actuator, for example a loudspeaker. The misconception here is that digital processing is simple.

• Advantages of digital signal processing
Programmability, stability, repeatability, adaptive signal processing and the possibility of new functions:
− Self-test and error correction codes for the retrieval or transmission of data (for CDs or modems)
− Linear phase filters (FIR filters with linear phase response), notch filters with a steep cutoff
− Lossless data compression or special data compression such as MPEG
− Signal storage: lossless storage of analog signals, systems for artificial echoes (Dolby Surround)

• Problems and disadvantages
− Analog low-pass filters are needed to limit the frequency range and to reconstruct an analog signal without the upper frequency bands
− New problems caused by time and amplitude quantization; some personal problems in using new technologies
− Power requirement of DSP systems; simple analog systems are very often cheaper
− The main problem is the speed of DSP systems: in particular, the time needed to carry out the arithmetic functions limits the frequency range

• Useful Functions
Digital Filtering: FIR, IIR, Matched Filter, Hilbert Transforms, Windowing
Numeric & Data Processing: Encrypting/Scrambling, Encoding/Decoding, Scalar, Vector/Matrix Arithmetic, Transcendental Function Computing (sin(x), exp(x)), Non-linear Functions
Signal Processing: Compression and Decompression (e.g. audio and video signals), Averaging, A/µ-Law
Modulation / Demodulation: Amplitude, Frequency, Phase, or Special Digital Modulations
Spectral Analysis: DFT, FFT, Sine / Cosine Transforms, MA, AR and ARMA Modelling

• Applications
Telecommunication: Voice Mail and Teleconferencing, ISDN and GSM, High-Speed Modems and Fax
Image Processing: Pattern and Optical Character Recognition, Real-time Image Compression
Instrumentation: Data Acquisition, Transient and Spectral Analysis, DSOs, Waveform Generation
Medical Electronics: X-Ray Analysis with Computer Tomographs, Sonographs, Electrocardiograms
Radar & Sonar: Navigation (GPS, Glonass), Search and Tracking, Oil and Geological Exploration
Audio and Video: CD / DVD Players, Digital Radio (DAB), Acoustic and Music Processing
High-Speed Control: Laser-Printer Servo, Hard-Disk Servo, Robotics, Motor and Engine Controllers



2. Data formats for digital signal processing

• General: a number z is written with the digits z_i to the base B:

$$ z = \sum_{i=-n}^{n} z_i \cdot B^{i} $$

• Integer
Dual (binary) numbers: unsigned integers, each digit is weighted with a power of 2.
Negative numbers, two possibilities: sign and absolute value, or two's complement integers.

• Fractional numbers for numbers < 1
The same as integers, but with a binary point (normal integers have the point to the right of the last digit).
The digits to the right of the point are weighted with the powers 2^-x.
Example: DSP56000 data format
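The fractional format can be tried out with a few lines of Python. This is only a sketch: it assumes a 24-bit two's complement word with 1 sign bit and 23 fraction bits (the word size named for the DSP56000 later in these notes), and the function names `to_fractional` and `from_fractional` are made up for this illustration.

```python
FRACTION_BITS = 23            # assumed layout: 1 sign bit + 23 fraction bits
WORD_MASK = 0xFFFFFF          # 24-bit word


def to_fractional(x: float) -> int:
    """Convert a real number to a 24-bit two's complement fractional word."""
    scaled = int(round(x * (1 << FRACTION_BITS)))
    # Clamp to the representable range -1.0 ... 1.0 - 2^-23 instead of wrapping.
    scaled = max(-(1 << FRACTION_BITS), min((1 << FRACTION_BITS) - 1, scaled))
    return scaled & WORD_MASK


def from_fractional(word: int) -> float:
    """Interpret a 24-bit word as a two's complement fraction."""
    if word & 0x800000:       # sign bit set: negative value
        word -= 1 << 24
    return word / (1 << FRACTION_BITS)


if __name__ == "__main__":
    for value in (0.5, -0.5, 0.999, -1.0):
        word = to_fractional(value)
        print(f"{value:7.3f} -> ${word:06X} -> {from_fractional(word):.6f}")
```

Note that +1.0 itself is not representable: the largest positive word is $7FFFFF.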

• Real floating-point for a large dynamic range
General: N = M · B^E, where N = number, M = mantissa, B = base, E = exponent
Question: how to represent positive and negative mantissas and exponents
Normally the mantissa is normalized; a hidden bit is possible
Example: C31/40 data format

• Dynamic range, precision, resolution
Dynamic range: D = 20 log10(max/min), for example 16 bit: 96 dB (CD player)
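The 96 dB figure can be checked numerically. The sketch below assumes that max/min for an n-bit integer format is taken as 2^n (full scale relative to one LSB); the function name is chosen here for illustration.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Dynamic range D = 20*log10(max/min), assuming max/min = 2**bits."""
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (16, 24, 32):
        print(bits, "bit:", round(dynamic_range_db(bits), 1), "dB")
```

Each additional bit adds about 6.02 dB, which is where the 96 dB for 16 bits comes from.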

• Saturation: the signal or a calculated number becomes greater than the representable range
• Rounding: what happens to the bits that are lower than the LSB: truncation, rounding, unbiased rounding (statistical rounding)
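The difference between the rounding variants and saturation is easiest to see on integers. The following sketch works on plain Python integers; `drop` is the number of low-order bits that are discarded. The function names are illustrative, and "unbiased rounding" is implemented here as round-half-to-even, which is one common interpretation.

```python
def truncate(value: int, drop: int) -> int:
    """Discard the low bits (always rounds towards minus infinity)."""
    return value >> drop


def round_half_up(value: int, drop: int) -> int:
    """Add half an LSB, then truncate (ties always go up)."""
    return (value + (1 << (drop - 1))) >> drop


def round_unbiased(value: int, drop: int) -> int:
    """Round to nearest; an exact tie goes to the even result."""
    half = 1 << (drop - 1)
    quotient = value >> drop
    remainder = value & ((1 << drop) - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient


def saturate(value: int, bits: int) -> int:
    """Clamp a value to the range of a signed two's complement word."""
    high = (1 << (bits - 1)) - 1
    low = -(1 << (bits - 1))
    return max(low, min(high, value))


if __name__ == "__main__":
    for v in (0x18, 0x28, 0x2F):
        print(hex(v), truncate(v, 4), round_half_up(v, 4), round_unbiased(v, 4))
    print(saturate(40000, 16), saturate(-40000, 16))
```

With exact ties (0x18 = 1.5 and 0x28 = 2.5 when 4 bits are dropped), round-half-up always goes up, while the unbiased variant goes to 2 in both cases, so the rounding error averages out.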

3. Signal processing with standard microprocessors and PCs
Base functions for digital signal processing - MAC operation
FIR filter with a fantasy standard microprocessor

• Structure of a simple microprocessor

[Figure: structure of a simple microprocessor. A bus system connects the memory (program and data) with the ALU and multiplier (MUL), the data registers R1 to R4 and the pointer registers P1 and P2.]

• Simple Instruction Set

Load src, dest              Arithmetic src, dest     Branch cond, dest
Ld #abs.value, reg          Inc reg                  jp   abs.adr
Ld reg, reg                 Dec reg                  jz   cond, abs.adr
Ld (abs.value), reg         Cp  #abs.value, reg      jnz  cond, abs.adr
Ld reg, (abs.adr)   (st)    Add reg, reg             jc   cond, abs.adr
Ld (reg.pnt.), reg          Sub reg, reg             jnc  cond, abs.adr
Ld reg, (reg.pnt)   (st)    Mul reg, reg             djnz reg, abs.adr



Memory Map for a 20-Tap FIR Filter

Inp:   input data       ; new data

Data:  FIR data 1       ; D-pointer (P1)
       data 2
       :
       data 19
DEnd:  data 20

Coff:  coeff. a19       ; C-pointer (P2)
       a18
       :
       a1
CEnd:  a0

Outp:  output data      ; output result

• 20-Tap FIR Filter Assembler Program

;-----------------------------------------------------------------------------------------------
Start:  ld   Data, P1       ; load data pointer
Next:   ld   (Inp), R1      ; read new data from input
        ld   R1, (P1)       ; write data into the filter
        inc  P1             ; pointer to the next place
        cp   P1, DEnd+1     ; check end address of pointer
        jnz  NoInc1
        ld   Data, P1       ; wrap around to the start of the data buffer
NoInc1: ld   Coff, P2       ; new load of coefficient pointer
        ld   #0, R3         ; sum register R3 = 0
        ld   #20, R4        ; loop counter R4 = 20
;-----------------------------------------------------------------------------------------------
Step:   ld   (P1), R1       ; load R1 with oldest value
        inc  P1             ; pointer to the next place
        cp   P1, DEnd+1     ; check end address of pointer
        jnz  NoInc2
        ld   Data, P1       ; wrap around to the start of the data buffer
NoInc2: ld   (P2), R2       ; load R2 with coefficient
        inc  P2             ; coeff.-pointer to the next
        mul  R1, R2         ; R2 := R2 * R1
        add  R2, R3         ; R3 := R3 + R2
        djnz R4, Step       ; (also possible to write the same block 20 times)
;-----------------------------------------------------------------------------------------------
        ld   R3, (Outp)     ; write result to output
        jp   Next
;-----------------------------------------------------------------------------------------------

• Result: 20 taps need approximately 190 instruction cycles
6 instructions are essential for one tap (the base element of all filters):
- load data, increment data pointer
- load coefficient, increment coefficient pointer
- multiply, accumulate (MAC)
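The figure of about 190 can be reproduced by counting the instructions of the listing. The sketch below assumes one cycle per instruction and ignores the occasional extra pointer reload at the wrap-around; the counts (8 instructions before the loop, 9 per tap including pointer check and loop control, 2 after the loop) are read off the program above, and the function name is illustrative.

```python
def fir_instruction_count(taps: int, setup: int = 8, per_tap: int = 9,
                          finish: int = 2) -> int:
    """Instructions per output sample for the fantasy-processor FIR loop.

    setup   : read input, store it, advance/check pointer, init P2, R3, R4
    per_tap : 6 essential operations + cp + jnz + djnz
    finish  : write the result and jump back
    """
    return setup + taps * per_tap + finish


if __name__ == "__main__":
    print(fir_instruction_count(20))   # 20 taps
    print(fir_instruction_count(4))    # a 4-tap filter for comparison
```

Under these assumptions only 6 of the 9 loop instructions do filter work; the remaining 3 are addressing and loop overhead.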

• Aims for DSPs:
- put all 6 operations into one DSP instruction
- wrap around the pointers automatically (circular pointers)
- repeat program parts without overhead (hardware DO loop)
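The data flow of the assembler program (circular data buffer, coefficient table, multiply-accumulate loop) can be modelled in a few lines. This is a behavioural sketch only, not a model of any particular processor; the class name `CircularFir` and its attributes are made up for this illustration. As in the memory map, the coefficients are stored from the highest index down to a0, so the oldest sample meets the highest coefficient.

```python
class CircularFir:
    """FIR filter y[k] = sum(a[i] * x[k-i]) with a circular data buffer."""

    def __init__(self, coeffs):
        # Stored as a[n-1] ... a[0], like the coefficient table in memory.
        self.coeffs = list(reversed(coeffs))
        self.data = [0.0] * len(self.coeffs)
        self.pos = 0                      # plays the role of pointer P1

    def step(self, x):
        n = len(self.coeffs)
        self.data[self.pos] = x           # overwrite the oldest sample
        self.pos = (self.pos + 1) % n     # wrap around: now points to the oldest
        acc = 0.0                         # sum register (R3 in the listing)
        idx = self.pos
        for c in self.coeffs:             # one MAC per tap
            acc += c * self.data[idx]
            idx = (idx + 1) % n           # circular pointer increment
        return acc


if __name__ == "__main__":
    fir = CircularFir([1.0, 2.0, 3.0])
    # An impulse at the input reproduces the coefficients at the output.
    print([fir.step(x) for x in (1.0, 0.0, 0.0, 0.0)])
```

The two modulo operations are exactly the pointer checks (cp / jnz / reload) that a DSP address generation unit performs in hardware.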



4. Historical development and types of DSPs

• John von Neumann architecture (Princeton)
Structure of a classical processor, since 1943

[Figure: von Neumann architecture. One memory for data and instructions is connected to the CPU by an address bus and a shared data / instruction bus.]

- All operations go through one bus system
- Universal structure
- Instructions and data are in the same memory area
- Simple model, technically not so difficult
- The bottleneck is the bus system

The main idea of the Harvard architecture was to separate program and data memory.

• Harvard architecture: a simple model

[Figure: Harvard architecture. The CPU is connected to an instruction memory by an address bus and an instruction bus, and to a separate data memory by a second address bus and a data bus.]

- Double bus system, double memory
- Double Harvard architecture: double data memory (Super Harvard)
- Pipelining of the instruction stream increases the power of the system
- By using different bus systems for data and instructions it is possible to use different data formats
- In most applications there is a special module that couples the bus systems for data transfer from program memory to data memory

In conclusion: always a lot of bus systems working in parallel!

• Historical development:
1979 Intel I 2920
1982 Texas Instruments TMS 32010
1987 Motorola DSP 56000
1989 Texas Instruments TMS 320C30
1990 Motorola DSP 96000
1992 Texas Instruments TMS 320C40
1995 Motorola DSP 56300
1998 Texas Instruments TMS 320C6201
1999 Texas Instruments TMS 320C33
2000 Texas Instruments TMS 320C6701
2001 Analog Devices DSP 21160

• Four types of DSPs:
General Purpose DSPs                Common applications
Function-specific DSPs (FASICs)     Modems, sound cards, ...
Microcontrollers (MCUs)             Digital filters, FFT, ...
Building blocks                     Sqr, Sqrt, Sin, ...

• DSP IC Market (Forward Concepts, US, 9/2000)

Place   Company               Share of Market 1999   Growth Rate
1       Texas Instruments     48,0 %                 27,6 %
2       Lucent Technologies   25,1 %                 11,9 %
3       Motorola              11,4 %                 19,0 %
4       Analog Devices        10,3 %                 42,0 %
5       Other                 5,3 %                  81,9 %
6       World Market          4,4 Bill. US$          25,9 %

DSP for GSM - TI 60 % Share of the Market, Growth Rate 2001 - 30 %



Type | Cycle (ns) | MIPS | MOPS | MFLOPS | Data type | Internal memory (data, program) | External addresses (program, data) | External buses | Mul / ALU | Interrupts | DMA

Intel I2920 400 2,5 7,5 I 25 40 192*24 --- 25 - -
NEC uPD7720 250 4,0 I 16 128+512 512 16*16>31 16 y y
TI TMS32010 200 5,0 I 16 144 1,5 K 4 k 1 16*16>32 40 y -

NS LM32900 100 10 I 16 - - 2*64 K 64 K 2,1 16*16>32 32 y -
AD ADSP2100A 50 20 I 16 16 k 32K*24 1,1 16*16 >40 4 1
AT&T DSP16 55 18,2 I 16 512 2k ROM 64 K 1 16*16_>32, 36 y -
Motor. DSP56000 50 20 120 I 24 2* 256 2 K 2*64 K 64 K 1 24*24>56 56 4 -
Motor. DSP56300 12,5 80 I 24 2*2K 3 K 2*16 M 16 M 1 24*24>56 56 6 6

NEC uPD77230 150 6,7 F 32 512+1K 2K ROM 8 K 4 K 1 32*32>47 55 2 -
AT&T DSP32C 80 12,5 25 F 32 2*512 0,5/4K 16 M 1 32*32 >40, 40 y y
Motor. DSP96000 50 20 60 F 32 2 * 512 32K ROM 2 * 1 G 2 43*43>96 96 4 y
TI TMS320C30 50 20 40 F 32 2 * 1K 4K ROM 16 M 2 40*40>40 40 4 y
TI TMS320C40 40 25 50 F 32 2 * 1K 4K ROM 4 G 2 40*40>40 40 4 6
AD ADSP21060 25 40 120 F 32 128 K*32 ... 4 G 1 40*40>40 40 y 10
TI TMS320CV33 13,3 75 825 150 F 32 2*(16k+1K) 16 M 2 40*40>40 40 4 y

TI TMS320C80 20 50 2000 100 4 * I32 25 * 2KB 4 G 4 * integer DSP + RISC + Video-Controller
AD ADSP21160 10 100 600 F32 2* 64KW 4 G 1 40*40>40 40 14
TI TMS320C60 5 1600 8 * I32 2k*256 2k*256 4 G 8 function units
TI TMS320C67 6 1336 1000 8 * I/F32 2k*256 2k*256 4 G 8 function units

DSP Types with Main Parameters (a selection)


5. Motorola DSP 56301

DSP56300 Core Features
• 66/80/100 MIPS at a 66/80/100 MHz clock and 3.0 – 3.6 V, fully static logic with operation down to DC
• Phase-Locked Loop (PLL): allows change of the low-power Divide Factor (DF) without loss of lock
• Very low-power CMOS, optimized power management circuitry, Wait and Stop low-power standby modes
• Object code compatible with the DSP56000 core
• Highly parallel instruction set, up to 6 operations per instruction

• Data Arithmetic Logic Unit (Data ALU)
- Fully pipelined 24 x 24-bit parallel multiplier-accumulator (MAC)
- 56-bit parallel barrel shifter (fast shift and normalization; bit stream generation)
- Conditional ALU instructions, 24-bit or 16-bit arithmetic support under software control

• Program Control Unit (PCU)
- Addressing modes optimized for DSP applications (including immediate offsets)
- On-chip instruction cache controller
- On-chip memory-expandable hardware stack
- Nested hardware DO loops
- Fast auto-return interrupts

• Direct Memory Access (DMA)
- Six DMA channels supporting internal and external accesses
- One-, two-, and three-dimensional transfers (including circular buffering)
- End-of-block-transfer interrupts
- Triggering from interrupt lines and all peripherals

• On-chip memories: Program RAM, instruction cache, X data RAM and Y data RAM; the sizes are programmable

• Off-chip memory expansion:
- Data memory expansion to two 16M x 24-bit word spaces (24-bit mode) or two 64K spaces (16-bit compatibility mode)
- Program memory expansion to one 16M x 24-bit word space (24-bit mode) or 64K (16-bit compatibility mode)
- External memory expansion port with chip select logic for glueless interface to SRAMs
- On-chip DRAM controller for glueless interface to DRAMs

• On-chip peripherals:
- Glueless 32-bit universal host bus interface to PCI, ISA and other DSP563xx buses
- Two Enhanced Synchronous Serial Interfaces (ESSI0 and ESSI1)
- Serial Communications Interface (SCI) with baud rate generator
- Triple timer module
- Up to 42 programmable General Purpose Input/Output (GPIO) pins

• Hardware debugging support:
- On-Chip Emulation (OnCE) module
- Joint Test Action Group (JTAG) Test Access Port (TAP)
- Address Trace mode reflects internal Program RAM accesses



DSP56301 Block Diagram

• Core Buses: the following 24-bit buses provide data exchange between the main core blocks:
- Global Data Bus (GDB): between the Program Control Unit and other core structures
- Peripheral I/O Expansion Bus (PIO_EB): to peripherals
- Program Memory Expansion Bus (PM_EB): to Program ROM
- Program Data Bus (PDB): carries program data throughout the core
- Program Address Bus (PAB): carries program memory addresses throughout the core
- X Memory Expansion Bus (XM_EB): to X memory
- X Memory Data Bus (XDB): carries X data throughout the core
- X Memory Address Bus (XAB): carries X memory addresses throughout the core
- Y Memory Expansion Bus (YM_EB): to Y memory
- Y Memory Data Bus (YDB): carries Y data throughout the core
- Y Memory Address Bus (YAB): carries Y memory addresses throughout the core
- DMA Data Bus (DDB): transfers data with DMA channels
- DMA Address Bus (DAB): transfers address information with DMA channels



DSP 56300 - ALU - Data arithmetic logic unit

• Components: input registers, output registers (accumulators), multiplier unit, arithmetic and logic unit, MAC unit, accumulator shifter, data bus shifter

• Data formats:
Word operand: 24-bit fractional data
Long word operand: 48-bit fractional data
Accumulator operand: 56-bit fractional data

• Four independent 24-bit ALU input registers: X0, X1, Y0, Y1, or two 48-bit registers called:
X = X1:X0 (X1 = MSB)
Y = Y1:Y0 (Y1 = MSB)

• Two 56-bit accumulators:
A = EXT : MSP : LSP = A2 : A1 : A0
B = EXT : MSP : LSP = B2 : B1 : B0
8-bit extension against overflow (256 operations)

• Two's complement fractional multiplier: 24 * 24 = 48 bit (right justified)
MPY, MAC: input two 24-bit operands; the 48-bit result is the input for the ALU
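The interplay of the fractional multiplier and the 56-bit accumulator can be sketched with plain integers. The sketch assumes fractional arithmetic in which the integer product of two 24-bit operands is shifted left by one bit so that it lines up with the 48-bit MSP:LSP part of the accumulator; the function name `mac` and the wrap-around handling are illustrative, not a bit-exact model of the chip.

```python
ACC_BITS = 56


def mac(acc: int, x: int, y: int) -> int:
    """acc + x*y for signed 24-bit fractional x, y and a signed 56-bit acc."""
    product = (x * y) << 1            # align the fractional product (assumption)
    acc = (acc + product) & ((1 << ACC_BITS) - 1)
    if acc >> (ACC_BITS - 1):         # re-interpret bit 55 as the sign
        acc -= 1 << ACC_BITS
    return acc


if __name__ == "__main__":
    half = 0x400000                   # 0.5 as a 24-bit fraction
    print(hex(mac(0, half, half)))    # 0.25 -> bit 45 set
    acc = 0
    for _ in range(256):              # 256 accumulations of almost 1.0
        acc = mac(acc, 0x7FFFFF, 0x7FFFFF)
    print(acc < (1 << 55))            # still inside the 56-bit range
```

The second part illustrates the "256 operations" remark: the 8 extension bits leave room for 256 full-scale products before the accumulator itself can overflow.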

• Two's complement ALU:
Sources: 24, 48 or 56 bit
Destination: always A or B (A1 or B1)
Arithmetic: ABS, ADD, SUB, INC, DEC (for example)

• Accumulator shifter:
16/24-bit shift right: DMAC
No shift or force to zero

• Bit Field Unit (BFU)
Multibit shift left, right: ASL, LSL, ASR, LSR
1-bit rotate right or left: ROR, ROL
Bit field merge, insert and extract: MERGE, INSERT, EXTRACT
Logical operations and normalization: AND, OR, NOT, EOR, NORMF

• Convergent or two's complement rounding of the accumulator to a 24-bit result in the MSP (A1, B1)
The rounding position (bit 22, 23 or 24) is controlled by the scaling bits
The LSP is forced to zero
MPYR, MACR are the MPY, MAC operations with rounding

• Data shifter (and limiter) between the accumulators (A, B) and the data buses (XDB, YDB)
Used for 24-bit transfer operations (Move); the accumulators are not modified
Two 1-bit left/right shifters controlled by the scaling bits S1, S0 (inside MR)
The two shifters work independently for two parallel operations, or combined for one 48-bit long word operation, in the same instruction cycle

• Data limiter - arithmetic saturation (follows the shifter)
If the accumulator extension (EXT = A2 or B2) is in use (E flag in the CCR is set), the contents of the accumulator cannot be represented without overflow in a 24- or 48-bit destination (Move transfer to XDB, YDB). To minimize the error due to overflow, the maximum (limited) value is written to the destination instead. The limiting algorithm sets the Limit flag in the condition code register (CCR).
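The limiting rule can be written down as a small function. This is a sketch under simplifying assumptions: no scaling (the shifter is bypassed), a signed Python integer stands for the 56-bit accumulator, and only the 24-bit move of the MSP is modelled. The name `limit_to_word` is invented for this example.

```python
def limit_to_word(acc: int):
    """Move a signed 56-bit accumulator value to a 24-bit destination.

    Returns (word, limited): the signed 24-bit result and whether limiting
    took place (which would set the L flag).
    """
    # The extension is "in use" when the value does not fit into the
    # 48-bit MSP:LSP part, i.e. bits 55..47 are not all equal.
    extension_in_use = not (-(1 << 47) <= acc < (1 << 47))
    if extension_in_use:
        word = 0x7FFFFF if acc > 0 else -0x800000   # most positive / negative
        return word, True
    return acc >> 24, False                          # plain copy of the MSP


if __name__ == "__main__":
    print(limit_to_word(0x400000 << 24))   # 0.5 fits: no limiting
    print(limit_to_word(1 << 47))          # +1.0 does not fit: limited
    print(limit_to_word(-(1 << 50)))       # large negative value: limited
```

Without the limiter, moving only the MSP of +1.0 would yield $800000, that is -1.0, the largest possible error; with limiting the error is one LSB.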

• Flags:
Carry          C   Set if a carry (borrow) is generated from bit 55 of the result.
Overflow       V   Set if an arithmetic overflow occurs in the 56-bit result.
Zero           Z   Set if the result is zero.
Negative       N   Set if the MSB (bit 55) of the result is set.
Unnormalized   U   Set if the two MSBs of the MSP (bits 47, 46) are identical.
Extension      E   Indicates that the extension is in use. Cleared if bits 55..47 are all ones or all zeros.
Limit          L   Latches the overflow flag (sticky flag). Reset only by a special instruction.
Scaling        S   Indicates scaling.
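Four of these flags depend only on the bit pattern of the result and can be computed directly. The sketch below takes the accumulator as an unsigned 56-bit pattern and assumes no scaling; C, V, L and S depend on the operation performed and are left out. The function name is illustrative.

```python
def result_flags(acc56: int) -> dict:
    """Derive N, Z, E and U from a 56-bit accumulator bit pattern."""
    acc56 &= (1 << 56) - 1
    top9 = (acc56 >> 47) & 0x1FF                 # bits 55..47
    bit47 = (acc56 >> 47) & 1
    bit46 = (acc56 >> 46) & 1
    return {
        "N": bool(acc56 >> 55),                  # sign bit of the result
        "Z": acc56 == 0,
        "E": top9 not in (0x000, 0x1FF),         # extension in use
        "U": bit47 == bit46,                     # two MSBs of MSP identical
    }


if __name__ == "__main__":
    print(result_flags(0x400000 << 24))   # 0.5: normalized, no extension
    print(result_flags(0x100000 << 24))   # 0.125: unnormalized
    print(result_flags(1 << 47))          # +1.0: extension in use
```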



Data ALU Block Diagram



Types of Addressing :

• Register Direct Mode - without AGU
• PC-Relative Mode - without AGU
• Special Addressing Mode - without AGU
• Register Indirect Mode - with AGU

DSP56301 AGU for indirect addressing

AGU Programming Model



Address Register Indirect Summary

Mode                       Mn / Operand Reference (P X Y L XY)   Address Pointer   Register Operation   Assembler Syntax

No Update                  * * * * *                             (Rn)                                   (Rn)
Postincrement by 1         * * * * * *                           (Rn)              Rn := Rn + 1         (Rn)+
Postdecrement by 1         * * * * * *                           (Rn)              Rn := Rn - 1         (Rn)-
Postinc. by Offset Nn      * * * * * *                           (Rn)              Rn := Rn + Nn        (Rn)+Nn
Postdec. by Offset Nn      * * * * *                             (Rn)              Rn := Rn - Nn        (Rn)-Nn
Indexed by Offset Nn       * * * * *                             (Rn + Nn)                              (Rn+Nn)
Predecrement by 1          * * * * *                             (Rn - 1)          Rn := Rn - 1         -(Rn)
Short/Long Displacement    * * *                                 (Rn + displ.)                          (Rn + displ.)

Types of Address Modifier

Easy creation of: FIFOs, stacks, circular buffers, FFT buffers

Modifier types:
Linear modifier          Mn = $FFFF (without modifier)
Modulo modifier          Mn = Modulus - 1
Reverse carry modifier   Mn = $0000

Modulo modifier (Mn = $0001 ... $7FFF)
Mn = M - 1, M = modulus = buffer length
Lower boundary = multiple of 2^k with 2^k ≥ M (the k LSBs of the address are 0)
Upper boundary = lower boundary + M - 1

Reverse carry (bit reverse)
For the butterfly operation of the FFT, with a 2^k-point FFT:
Mn = $0000, Nn = 2^(k-1), Rn = buffer address
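Both modifier types can be emulated in software to see which address sequences they generate. The following is a behavioural sketch, not a model of the AGU hardware: `modulo_update` assumes the step is no larger than the modulus, and `bit_reversed_addresses` produces the sequence that repeated reverse-carry addition of Nn = 2^(k-1) yields, by reversing the k index bits directly. Both function names are made up for this illustration.

```python
def modulo_update(rn: int, step: int, mn: int) -> int:
    """Post-update Rn by step inside a circular buffer of length mn + 1."""
    modulus = mn + 1
    k = (modulus - 1).bit_length()        # smallest k with 2**k >= modulus
    base = rn & ~((1 << k) - 1)           # lower boundary: k LSBs are zero
    return base + (rn - base + step) % modulus


def bit_reversed_addresses(base: int, k: int) -> list:
    """Address sequence for a 2**k point buffer in bit-reversed order."""
    addresses = []
    for i in range(1 << k):
        reversed_index = int(format(i, "0{}b".format(k))[::-1], 2)
        addresses.append(base + reversed_index)
    return addresses


if __name__ == "__main__":
    # 20-element buffer at $100: incrementing past $113 wraps to $100.
    print(hex(modulo_update(0x113, 1, 19)))
    # 8-point FFT buffer at $200.
    print([hex(a) for a in bit_reversed_addresses(0x200, 3)])
```

For a 20-element buffer k is 5, so the buffer has to start at a multiple of 32; this is the alignment requirement expressed by the lower boundary rule.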

[Figure: Modulo Modifier]

[Figure: Bit-Reverse Address Calculation Example]

DSP56301 Program Control Unit - PCU

The PCU coordinates the execution of instructions with the following functions (with 7-stage pipelining):
- Fetch instructions, decode instructions, execute instructions
- Control DO loops and REP
- Process interrupts and the Reset, Wait, Stop and Debug states

System Stack and Stack Extension
System stack: internal, 2 * 24 bit wide with 16 levels (16 * 48 bit)
Controlled by the stack pointer SP, which indicates the last used level; if SP = 0, the stack is empty
For subroutines and interrupts, the return address and the flags are stored:

SP := SP + 1
SSH ⇐ PC
SSL ⇐ Status Register (SR)

For DO loops and the REP instruction, the loop address and loop counter as well as PC and SR are stored:

1. SP := SP + 1     2. SP := SP + 1
   SSH ⇐ LA            SSH ⇐ PC
   SSL ⇐ LC            SSL ⇐ Status Register (SR)

Check for underflow, overflow, stack error

Extended system stack in X or Y data memory (SEN = 1 in OMR)
The external stack is controlled by the stack extension pointer (EP inside the AGU)
If the internal stack is full, the least recently used words are moved to data memory
Always one or two 48-bit items (2 * 24 bit)
Stack limitation by stack size, extension underflow, extension overflow

Loop Address (LA) and Loop Counter (LC)
LA: 24-bit address that indicates the last instruction of a hardware DO loop
LC: 24-bit counter that specifies the number of repetitions of DO or REP
Automatic push and pop for DO (also pop for ENDDO, BRKcc)

Vector Base Address (VBA)
Base address of the interrupt vector table
24-bit register; the lower 8 bits are always zero (they are inserted by the interrupt)
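How the VBA and the interrupt source combine into a vector address can be shown with a short sketch. The offsets are copied from the interrupt source table in these notes (only a few entries are included); the dictionary keys and the function name are invented for this example.

```python
# Offsets taken from the DSP56301 interrupt source table (a selection).
VECTOR_OFFSETS = {
    "RESET": 0x00,
    "STACK_ERROR": 0x02,
    "ILLEGAL_INSTRUCTION": 0x04,
    "IRQA": 0x10,
    "DMA_CHANNEL_0": 0x18,
    "TIMER0_COMPARE": 0x24,
}


def vector_address(vba: int, source: str) -> int:
    """Starting address VBA:offset of a two-word interrupt vector."""
    # The lower 8 bits of VBA are always zero; they are replaced by the
    # offset that belongs to the interrupt source.
    return (vba & 0xFFFF00) | VECTOR_OFFSETS[source]


if __name__ == "__main__":
    print(hex(vector_address(0x001200, "IRQA")))            # 0x1210
    print(hex(vector_address(0x001200, "TIMER0_COMPARE")))  # 0x1224
```

Each source owns two consecutive words, which is why the offsets advance in steps of 2.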



DSP 56300 PCU Programming Model

DSP56300 Interrupts and Processing States

• External interrupts: RESET, IRQA – IRQD, NMI
• Software interrupts: illegal instruction, trap
• Internal interrupts: stack error, debug, DMA, timer, ...
• Interrupt priority: 0 ... 3

• Fast interrupt: fetches only the 2 interrupt instruction words at the two vector addresses and then automatically resumes execution
• Long interrupt: starts like a fast interrupt, but one of the 2 fetched interrupt instructions is a JSR instruction to the interrupt service routine (ISR); the ISR ends with the RTI instruction



Interrupt Starting Address   Interrupt Priority Level   Interrupt Source

VBA:$00   3     Hardware RESET
VBA:$02   3     Stack error
VBA:$04   3     Illegal instruction
VBA:$06   3     Debug request interrupt
VBA:$08   3     Trap
VBA:$0A   3     Nonmaskable interrupt (NMI)
VBA:$0C   3     Reserved
VBA:$0E   3     Reserved
VBA:$10   0-2   IRQA
VBA:$12   0-2   IRQB
VBA:$14   0-2   IRQC
VBA:$16   0-2   IRQD
VBA:$18   0-2   DMA channel 0
VBA:$1A   0-2   DMA channel 1
VBA:$1C   0-2   DMA channel 2
VBA:$1E   0-2   DMA channel 3
VBA:$20   0-2   DMA channel 4
VBA:$22   0-2   DMA channel 5
VBA:$24   0-2   TIMER 0 compare
VBA:$26   0-2   TIMER 0 overflow
VBA:$28   0-2   TIMER 1 compare
VBA:$2A   0-2   TIMER 1 overflow
VBA:$2C   0-2   TIMER 2 compare
VBA:$2E   0-2   TIMER 2 overflow
VBA:$30   0-2   ESSI0 receive data
VBA:$32   0-2   ESSI0 receive data with exception status
VBA:$34   0-2   ESSI0 receive last slot
VBA:$36   0-2   ESSI0 transmit data
VBA:$38   0-2   ESSI0 transmit data with exception status
VBA:$3A   0-2   ESSI0 transmit last slot
:         :
VBA:$40   0-2   ESSI1 receive data
:         :
VBA:$50   0-2   SCI receive data
:         :
VBA:$60   0-2   Host PCI transaction termination
:         :
VBA:$FE   0-2   Not assigned

DSP56301 Interrupt Sources

Reset Processing State
The DSP enters the reset processing state when the external RESET pin is asserted. The core remains in this state until RESET is deasserted. (130 mA)

Wait Processing State
A low-power state that is entered when the WAIT instruction executes. The core waits for an interrupt or DMA request. (7,5 mA)

Stop Processing State
The lowest-power mode, entered when the STOP instruction executes. The core halts until RESET, or until a low level is applied to the IRQA pin (IRQA asserted). (0,1 mA)

Debug State
The debug state is invoked and used with the JTAG/OnCE port.



Instruction Set of DSP 5630x - General Aspects

• General format of an instruction word: one or two 24-bit words

Parallel instruction:       bits 23..8: Parallel Data Bus Movement    bits 7..0: Opcode
                            optional second word: Effective Address or Immediate Data

Non-parallel instruction:   bits 23..0: Non-Parallel Opcode
                            optional second word: Effective Address or Immediate Data

• Parallel instruction format: organized into five columns

Opcode   Operands    XDB            YDB            Condition
MAC      X0,Y0,A     X:(R0)+,X0     Y:(R4)+,Y0
MOVE                 X:(R1),X1
MAC      X1,Y1,B
MPY      X0,Y0,A                                   IFeq

• Non-parallel instruction format: organized into two columns

Opcode   Operands
JEQ      (R6)
MOVEP    #data,X:ipr
RTS      X1,Y1,B

Operand sizes:

Bits     Registers                                      Size
7..0     A2, B2, OMR, MR                                Byte
15..0    PC, LC, LA, SR (or 16-bit mode registers)      Short Word
23..0    X1, X0, Y1, Y0, A1, A0, B1, B0, Rn, Nn, Mn     Word
47..0    A10, B10, AB, BA, X, Y                         Long Word
55..0    A, B                                           Accumulator



DSP 56301 - Instructions 116 instructions in 6 groups

Arithmetic Instructions
ADD          Add                                                           √ *
ADC          Add Long With Carry                                           √
SUB          Subtract                                                      √ *
SBC          Subtract Long With Carry                                      √
CMP          Compare                                                       √ *
CMPM         Compare Magnitude                                             √
CMPU         Compare Unsigned
TST          Test Accumulator                                              √
INC          Increment by One
DEC          Decrement by One
NEG          Negate Accumulator                                            √
CLR          Clear Accumulator                                             √
RND          Round Accumulator                                             √
ABS          Absolute Value                                                √
MPY          Signed Multiply                                               √
MPYR         Signed Multiply and Round                                     √
MPYI         Signed Multiply With Immediate Operand
MPYRI        Signed Multiply and Round With Immediate Operand
MPY (su,uu)  Mixed Multiply
MAC          Signed Multiply Accumulate                                    √
MACR         Signed Multiply Accumulate and Round                          √
MAC (su,uu)  Mixed Multiply Accumulate
DMAC         Double (Multi) Precision Multiply, Accumulate With Right Shift
MACI         Signed Multiply Accumulate With Immediate Operand
MACRI        Signed Multiply Accumulate and Round With Immediate Operand
ASL          Arithmetic Shift Accumulator Left                             √ *
ASR          Arithmetic Shift Accumulator Right                            √ *
ADDL         Shift Left and Add Accumulators                               √
ADDR         Shift Right and Add Accumulators                              √
SUBL         Shift Left and Subtract Accumulators                          √
SUBR         Shift Right and Subtract Accumulators                         √
MAX          Transfer by Signed Value                                      √
MAXM         Transfer by Magnitude                                         √
NORM         Norm Accumulator Iteration
NORMF        Fast Accumulator Normalization
DIV          Divide Iteration

Bit Manipulation Instructions
BCHG         Bit Test and Change
BCLR         Bit Test and Clear
BSET         Bit Test and Set
BTST         Bit Test

√ - allow parallel data moves √ * - allow parallel data moves, but not for all addressing modes



Logical Instructions
NOT          Logical Complement                     √
AND          Logical AND                            √ *
ANDI         AND Immediate With Control Register
OR           Logical Inclusive OR
ORI          OR Immediate With Control Register
EOR          Logical Exclusive OR                   √ *
LSL          Logical Shift Left                     √ *
LSR          Logical Shift Right                    √ *
ROL          Rotate Left                            √
ROR          Rotate Right                           √
EXTRACT      Extract Bit Field
EXTRACTU     Extract Unsigned Bit Field
INSERT       Insert Bit Field
MERGE        Merge Two Half Words
CLB          Count Leading Bits

Move Instructions

MOVE         Move Data                              √
MOVEC        Move Control Register
MOVEP        Move Peripheral Data
MOVEM        Move Program Memory
LUA          Load Updated Address
LRA          Load PC-Relative Address
Tcc          Transfer Conditionally
TFR          Transfer Data ALU Register
VSL          Viterbi Shift Left

Parallel move types:
U            Address Register Update
R            Register-to-Register Data Move
I            Immediate Short Data Move
X:           X Memory Data Move
Y:           Y Memory Data Move
X:R          X Memory and Register Data Move
R:Y          Register and Y Memory Data Move
X:Y:         XY Memory Data Move
L:           Long Memory Data Move

Loop Instructions

REP          Repeat Next Instruction
DO           Start Hardware Loop
DO FOREVER   Start Infinite Loop
DOR          Start PC-Relative Hardware Loop
DOR FOREVER  Start PC-Relative Infinite Loop
ENDDO        End Current DO Loop
BRKcc        Exit Current DO Loop Conditionally

√ - allow parallel data moves √ * - allow parallel data moves, but not for all addressing modes



Program Control Instructions JMP Jump Jcc Jump Conditionally JSET Jump if Bit Set JCLR Jump if Bit Clear

JSR Jump to Subroutine JScc Jump to Subroutine Conditionally JSSET Jump to Subroutine if Bit Set JSCLR Jump to Subroutine if Bit Clear

BRA Branch Always Bcc Branch Conditionally BRCLR Branch if Bit Clear BRSET Branch if Bit Set

BSR Branch to Subroutine BScc Branch to Subroutine Conditionally BSCLR Branch to Subroutine if Bit Clear BSSET Branch to Subroutine if Bit Set

RTS Return From Subroutine RTI Return From Interrupt

NOP No Operation ILLEGAL Illegal Instruction Interrupt TRAP Software Interrupt TRAPcc Conditional Software Interrupt

IFcc Execute Conditionally IFcc.U Execute Conditionally With CCR Update

DEBUG Enter Debug Mode DEBUGcc Enter Debug Mode Conditionally WAIT Wait for Interrupt or DMA Request STOP Stop Instruction Processing RESET Reset On-Chip Peripherals Devices

PFLUSH Program Cache Flush PFLUSHUN Program Cache Flush Unlocked Sectors PFREE Program Cache Global Unlock PLOCKR Lock Instruction Cache Relative Sector PUNLOCKR Unlock Instruction Cache Relative Sector PUNLOCK Unlock Instruction Cache Sector

Condition codes (cc)

Flags                        cc                     cc
C                            CS  carry set          CC  carry clear
Z                            EQ  equal              NE  not equal
N EXOR V                     LT  less than          GE  greater than or equal
Z OR (N EXOR V)              GT  greater than       LE  less than or equal
N                            MI  minus              PL  plus
E                            ES  extension set      EC  extension clear
L                            LS  limit set          LC  limit clear
Z OR (NOT U AND NOT E)       NR  normalized         NN  not normalized

DSP56300 Addressing Modes



• Register Indirect Mode: see AGU
• Register Direct Mode: the operand source or destination is a data, control or address register
• PC-Relative Mode:
Short Displacement: 9 bits inside the instruction word, sign-extended and added to the PC. Example: BGT 20
Long Displacement: 24 bits in a one-word instruction extension. Example: LRA 23456,X0
Address Register: the address is the sum of address register and PC. Example: BSEQ R3
• Special Addressing Modes:
Immediate Data: one-word instruction extension containing immediate data. Example: MOVE #$123456,A1
Absolute Address: one-word instruction extension containing the absolute address. Example: MOVE Y:345678,B0
Immediate Short: 8 or 12 bits inside the instruction; unsigned integer, low-order portion: MOVE #$FF,A1; signed fraction, high-order portion: MOVE #$1F,A
Short Jump Address: 12 bits inside the instruction, allowing 000000 ... 000FFF. Example: JMP $123
Absolute Short: 6-bit address inside the instruction (000000 ... 00003F for X, Y memory). Example: MOVE A1,X:$3
I/O Short: 6-bit address inside the instruction (FFFFC0 ... FFFFFF for I/O). Example: MOVEP A1,X:<<$FFFFFE
Implicit Reference: implicit inside the instruction. Example: REP

Parallel Data Moves

Type   Description                  Opcode / Operands   Parallel Move Examples
I      Immediate Short Data         ADD X0,A            #$05,Y1
U      Address Register Update      ADD X0,A            (R0)+N0
R      Register to Register         ADD X0,A            A1,Y0
X      X-Memory                     ADD X0,A            X0,X:(R3)+
XR     X-Memory plus Register       ADD X0,A            X:(R4)-,X1 A,Y0
Y      Y-Memory                     ADD X0,A            Y:(R6)+N6,X0
YR     Y-Memory plus Register       ADD X0,A            A,X0 B,Y:(R0)
XY
L



Motorola DSP Assembler - translates an assembly language source file into an object file

• Assembly language: mnemonic opcodes for machine instructions and directives
• Source statement format:

Label   Operation   Operands   X-field      Y-field      Condition   Comment
Start   mac         x0,y0,a    x:(r0)-,x0   a,y:(r4)+    ifz         ; free comment text

All fields must be separated by one or more blanks or tabs!
Labels: symbolic marker for an address; the first character must be alphabetic; _label = local label
Operation: opcodes, assembler directives, macro calls
Operands: depend on the opcode; contain symbols and/or expressions, separated by commas
Data transfer fields: single or double two-address transfers; the first address is the source, the second the destination
Comment: starts with a semicolon; not significant to the assembler

• Symbol names: for address or data values; the first character must be alphabetic; 1..255 characters
• Numeric and string constants: constants represent quantities of data that do not vary
Binary: %01101110
Hexadecimal: $123ABF
Decimal: 123456
String: 'ABCD' = $41424344
• Operators can be used to calculate with values (+ - * / % << >> |)
• Assembler significant characters (a selection)

;     comment delimiter character
;;    unreported comment delimiter (in the list file)
>     long addressing mode force operator
<     short addressing mode force operator
<<    I/O short addressing mode force operator
#     immediate addressing mode
#>    immediate long addressing mode
#<    immediate short addressing mode
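The constant notations listed above (%, $, plain decimal, quoted string) can be decoded with a small helper, which also shows why 'ABCD' equals $41424344: each character contributes one byte. The helper is an illustration in Python only; it is not part of the Motorola tools, and the name `parse_constant` is made up.

```python
def parse_constant(text: str) -> int:
    """Decode a constant written in the assembler's notation."""
    text = text.strip()
    if text.startswith("%"):                  # binary
        return int(text[1:], 2)
    if text.startswith("$"):                  # hexadecimal
        return int(text[1:], 16)
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        value = 0
        for ch in text[1:-1]:                 # one byte per character
            value = (value << 8) | ord(ch)
        return value
    return int(text, 10)                      # decimal


if __name__ == "__main__":
    for constant in ("%01101110", "$123ABF", "123456", "'ABCD'"):
        print(constant, "=", hex(parse_constant(constant)))
```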

• Directives (a selection)

[<label>] DC <arg1>[,<arg2>,...,<argN>]              ; define constant <arg>, allocates memory and initializes it
[<label>] DS <expression>                            ; define storage, reserves a block of memory, expression = length
[<label>] EQU [X: | Y: | L: | P:] <expression>       ; equate symbol to a value, assigns a memory address to the label
          ORG X: | Y: | L: | P: <expression>         ; sets the runtime memory counter to specify addresses
          DEFINE <symbol> <'string'>                 ; define a substitution string
          GLOBAL <symbol>[,<symb>,...,<symb>]        ; define global list of symbols
          LOCAL <symbol>[,<symb>,...,<symb>]         ; define local list of symbols
          INCLUDE <'string'>                         ; insert another file, 'string' = file name
          END                                        ; end of source program

• Running the assembler: ASM56300 -B[<objfil>] -L[<lstfil>] MyProg.asm

Motorola DSP Linker
• Links one or more relocatable object files and macro libraries into an absolute executable file.
• Running the linker: DSPLink -B[<outpfil>] [<objfil1>] ... [-L<library>] [-M<mapfil>]
• Memory control file: optional; contains module identification, global start address, base addresses, ...

Data file types
• *.asm  assembler source file from an editor, input file for the assembler
• *.lst  list file from the assembler with error messages
• *.cln  relocatable object file from the assembler, input file for the linker
• *.cld  absolute object file, output from the linker or from the assembler with option -A
• *.lod  loadable file for the monitor, produced by the converter program cldlod.exe



Instruction Cache

• A buffer memory of 1024 24-bit words between external memory and the DSP core processor, logically divided into eight 128-word cache sectors.

• Eight-way, fully associative instruction cache with sectored placement policy
• Least Recently Used (LRU) sector replacement algorithm
• Transparent operation (that is, no user management is required)
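The sector organization and LRU replacement described above can be sketched as a small model. The class `SectorCache` is a hypothetical illustration that tracks sector tags only; details of the real hardware such as per-word valid information, locking, or flushing are deliberately left out.

```python
from collections import OrderedDict


class SectorCache:
    """Toy model: 8 fully associative sectors of 128 words, LRU replacement."""

    def __init__(self, sectors: int = 8, sector_words: int = 128):
        self.sectors = sectors
        self.shift = sector_words.bit_length() - 1   # 128 words -> shift by 7
        self.tags = OrderedDict()                    # oldest entry = least recently used

    def access(self, address: int) -> bool:
        """Return True on a sector hit, False on a sector miss."""
        tag = address >> self.shift
        if tag in self.tags:
            self.tags.move_to_end(tag)               # mark as most recently used
            return True
        if len(self.tags) >= self.sectors:
            self.tags.popitem(last=False)            # evict least recently used sector
        self.tags[tag] = True
        return False


if __name__ == "__main__":
    cache = SectorCache()
    print(cache.access(0x000))   # first access to sector 0: miss
    print(cache.access(0x005))   # same sector: hit
```

With eight sectors in use, touching a ninth sector evicts the one that has gone unused for the longest time.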

Hardware Debugging Support

1. JTAG (Joint Test Action Group) over Test Access Port (TAP) based on the IEEE 1149.1

TCK     Test Clock
TMS     Test Mode Select
TDI     Test Data Input
TDO     Test Data Output
/TRST   Test Reset
DE      Debug Event (OnCE)

JTAG Debugging Control Signals

2. OnCE (On Chip Emulation) – the module functions are provided through the JTAG TAP pins

• Allows nonintrusive interaction with the core and its peripherals for developers

• Examines registers, memory, and the on-chip peripherals

• Trace logic and breakpoint logic with breakpoint counter

External Memory Expansion Port (Port A)

D23 ... D00                    Data Bus
A23 ... A00                    Address Bus
AA0 ... AA3                    Address Attribute (CS or additional address lines)
RD, WR, CAS, RAS               Read / Write signals for static and dynamic memory
BR, BG, BB, BS, BL, TA, BCLK   Bus arbitration and bus master control signals

• DRAM support with an efficient interface for random cycles or page mode; it is controlled by the DRAM Control Register (DCR)

• Four Address Attribute Registers (AAR0...3) control the activity of AA0 ... AA3

• The Port A bus is controlled by the Bus Control Register (BCR)

SRAM Access with 1 wait state



DMA Controller

• DMA saves core MIPS because the core can operate in parallel.

• DMA saves power because it requires less circuitry than the core to move data.

• DMA saves pointers because core AGU pointer registers are not needed.

• DMA has no modulo block size restrictions, unlike the core AGU.

• Six DMA Channels supporting internal and external accesses

• One-, two-, and three-dimensional transfers (including circular buffering)

• Triggering from interrupt lines and peripherals

• End-of-block transfer interrupts

DMA Controller Data Transfers

DCR [5-0]   DMA Control Register               DOR 0   DMA Offset Register 0
DSR [5-0]   DMA Source Address Register        DOR 1   DMA Offset Register 1
DDR [5-0]   DMA Destination Address Register   DOR 2   DMA Offset Register 2
DCO [5-0]   DMA Counter                        DOR 3   DMA Offset Register 3
DSTR        DMA Status Register

DMA Controller Programming Model

Functions                  Examples                                                   Bits
Base Control Bits          Channel Enable, Interrupt Enable, Continuous Mode          3
Transfer Mode              Block, Word, Line, ...                                     3
DMA Source / Destination   X:, Y:, P:                                                 4
Interrupt Priority         IPL: 0 ... 3                                               2
DMA Request Trigger        Ext.-Interrupt, DMA, ESSI, SCI, Timer, Host                5
DMA Address Mode           Counter Mode: Post Increment, 2D, 3D (with Offset Reg.)    7

DMA Control Register Bit Definition



Triple Timer Module

Each Timer can be used as:

• Timed pulse generator or pulse-width modulator
• Event counter to capture an event or to measure the width or period of a signal
• All timer signals can also be used as GPIO signals

The triple timer module contains:

• Three independent and identical general-purpose 24-bit timers / event counters
• A common 21-bit prescaler that works on the rising edge of the prescaler input clock

DSP56301 Triple Timer Module Block Diagram

Each timer has the following capabilities:

• Uses internal or external clocking
• Interrupts the DSP56301 after a specified number of events (clocks) or signals an external device after counting internal events
• Triggers DMA transfers after a specified number of events (clocks) occurs
• Connects to the external world through one bidirectional signal, designated TIO[0–2] for timers 0–2

DSP56301 Timer Module Block Diagram



Enhanced Synchronous Serial Interface (ESSI)

• Two independent and identical full-duplex ESSI serial ports, also usable in asynchronous mode
• Independent transmit and receive sections with a common clock generator
• Network mode operation with as many as 32 time slots
• Programmable word length (8, 12, 16, 24, or 32 bits)
• Program options for frame synchronization and clock generation
• One receiver and three transmitters per ESSI (6 channels for surround sound)
• Alternatively, 2 * 6 programmable signals for general purpose I/O pins

DSP56300 ESSI Block Diagram

Pin   Description            Function
STD   Serial Transmit Data   TxData, TxData0
SRD   Serial Receive Data    RxData
SCK   Serial Clock           Different clock sources
SC0   Serial Control 0       RxClk, TxData1, ...
SC1   Serial Control 1       RxFrame, TxData2, ...
SC2   Serial Control 2       Tx/Rx Frame Sync

Register   Description
CRA        Control Register A
CRB        Control Register B
SSISR      ESSI Status Register
TSR        Time Slot Register
TSMA/B     Tx Slot Mask Register
RSMA/B     Rx Slot Mask Register



Serial Communication Interface (SCI)

• Full-duplex port for serial communication, e.g. for RS-232C, RS-422, etc.
• Interfaces to peripherals without additional logic (only level buffers)
• Supports standard bit rates: ... 2.4; 4.8; 9.6; 19.2 Kbaud ... up to 12.5 Mbps (100/8)
• Programmable baud-rate generator provides the transmit and receive clocks
• Separate SCI transmit and receive sections can operate asynchronously
• SCI pins and the baud-rate generator can also be used for general-purpose functions
• The operating modes for the DSP56301 SCI are as follows:

- 8-bit synchronous (shift register mode)
- 10-bit asynchronous (1 start, 8 data, 1 stop)
- 11-bit asynchronous (1 start, 8 data, 1 even parity, 1 stop)
- 11-bit asynchronous (1 start, 8 data, 1 odd parity, 1 stop)
- 11-bit multidrop asynchronous (1 start, 8 data, 1 data type, 1 stop)
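The asynchronous frame layouts listed above can be made concrete with a short sketch. The function `sci_frame` is hypothetical and assumes the usual UART conventions (start bit 0, stop bit 1, data sent least significant bit first); it models the bit sequence only, not the SCI registers.

```python
def sci_frame(data: int, parity=None) -> list:
    """Build the bit sequence of one asynchronous frame.

    parity=None    -> 10-bit frame (1 start, 8 data, 1 stop)
    parity="even"  -> 11-bit frame with even parity bit
    parity="odd"   -> 11-bit frame with odd parity bit
    Assumption: data bits are sent LSB first, as on common UARTs.
    """
    data_bits = [(data >> i) & 1 for i in range(8)]
    frame = [0] + data_bits                   # start bit, then data
    if parity is not None:
        ones = sum(data_bits)
        if parity == "even":
            frame.append(ones % 2)            # total number of ones becomes even
        elif parity == "odd":
            frame.append(1 - ones % 2)        # total number of ones becomes odd
        else:
            raise ValueError("parity must be None, 'even' or 'odd'")
    frame.append(1)                           # stop bit
    return frame


if __name__ == "__main__":
    print(sci_frame(0x55))            # 10-bit frame
    print(sci_frame(0x01, "even"))    # 11-bit frame, parity bit = 1
```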

Pin    Description        Alternative function
TXD    Transmit Data      General purpose input / output
RXD    Receive Data       General purpose input / output
SCLK   SCI Serial Clock   General purpose input / output

Register Description

PCRE   Port Control Register              Programming pins for general purpose
SCR    SCI Control Register               Rx & Tx enable; interrupt enable; SCI mode selection, ...
SSR    SCI Status Register                Error flags; full & empty flags
SCCR   SCI Clock Control Register         Tx clock source; Rx clock source; clock prescaler, clock divider
SRX    SCI Receive Data Register (3 x)    Three receiver registers for double buffering, plus one serial-to-parallel receiver shift register
STX    SCI Transmit Data Register (4 x)   Four transmitter registers for unpacking 24-bit transfers, plus one parallel-to-serial transmitter shift register

SCI Programming Model – Data Registers



DSP56300 Host Interface (HI32)

General Aspects: The Host Interface is a fast parallel host port up to 32 bits wide. It supports a variety of standard buses and provides glueless connection. HI32 supports three classes of interfaces:

• Peripheral Component Interconnect (PCI 2.1) bus: 32-bit, 8-word-deep data path. HI32 is a dedicated bidirectional target (slave) / initiator (master) that can connect directly to the PCI bus
• Universal bus interface (UB) mode: 8 / 16 / 24-bit, 6-word-deep data path
• General-purpose I/O (GPIO) port: up to 24 GPIO pins

Host Port Pins



DSP56303

Special Features
• High-performance DSP56300 core, 66/80/100 Million Instructions Per Second (MIPS)
• Standard on-chip RAM memory, 8 KW total; reduced external memory space of 256 KW
• Simple 8-bit parallel Host Interface (HI08), ISA-compatible bus interface, no PCI

Target Applications
The DSP56303 targets telecommunication applications, such as multi-line voice/data/fax processing, videoconferencing, audio applications, control, and general digital signal processing.

DSP56305

Special Features
• High-performance DSP56300 core, 80 Million Instructions Per Second (MIPS)
• Large RAM and ROM memory, total 21.25 KW
• Filter Co-Processor (FCOP) implements a wide variety of convolution and correlation filtering
• Viterbi Co-Processor (VCOP) implements the Maximum Likelihood Sequential Estimation algorithm
• Cyclic-code Co-Processor (CCOP) executes cyclic code calculations for ciphering and deciphering
• Very low power CMOS design with optimized power management circuitry

Target Applications
GSM

DSP56307

Special Features
• High-performance DSP56300 core, 100 MIPS at 2.5 V or 3.3 V
• Very large on-chip RAM memory, total 64 KW
• Reduced external memory space of 256 KW
• Additional Enhanced Filter Coprocessor (EFCOP) that runs in parallel to the DSP core

Target Applications
• Applications requiring a large amount of on-chip data memory, such as wireless infrastructure systems
• The EFCOP may be used to accelerate general filtering applications, such as echo cancellation, correlation, and general purpose convolution-based algorithms

DSP56309

Special Features
• High-performance DSP56300 core, 80/100 MIPS at 3.0 V or 3.6 V
• Large on-chip RAM memory, total 34 KW
• Reduced external memory space of 256 KW
• Simple 8-bit parallel Host Interface (HI08), ISA-compatible bus interface, providing a cost-effective solution for applications not requiring the PCI bus

Target Applications
Applications requiring a large amount of on-chip memory, such as wireless infrastructure systems




DSP56311

Special Features
• High-performance DSP56300 core, 150 MIPS at 1.8 V (255 MIPS using the EFCOP in filtering)
• Ultra large on-chip RAM memory, total 128 KW; reduced external memory space of 256 KW
• Additional Enhanced Filter Coprocessor (EFCOP) that runs in parallel to the DSP core

Target Applications
• Applications requiring a large amount of on-chip data memory, such as wireless infrastructure systems
• The EFCOP may be used to accelerate general filtering applications, such as echo cancellation, correlation, and general purpose convolution-based algorithms

DSP56362

Special Features
• High-performance DSP56300 core, 100 MIPS at 3.3 V
• Large on-chip RAM and ROM memory, total 56 KW; reduced external memory space of 256 KW
• Enhanced Serial Audio Interface (ESAI) includes 6 serial data lines, for I2S, Sony, AC97, and other audio protocol implementations
• Serial Host Interface (SHI): SPI and I2C protocols, ten-word receive FIFO, 8-, 16-, and 24-bit words
• DAX features one serial transmitter capable of supporting S/PDIF, IEC958, IEC1937, CP-340, and AES/EBU digital audio formats

Target Applications
• Multimode, multichannel decoder software functionality: Dolby Digital, Pro Logic, MPEG2 5.1, Digital Theater Systems (DTS)
• Digital audio post-processing capabilities, such as bass management, 3D virtual surround sound, Lucasfilm THX5.1, soundfield processing

DSP56364

Special Features
• High-performance DSP56300 core, 100 MIPS at 3.3 V, without timer
• Low-cost version with small on-chip RAMs, total 11 KW; reduced external memory space of 256 KW
• Enhanced Serial Audio Interface (ESAI) includes 6 serial data lines, for I2S, Sony, AC97, and other audio protocol implementations
• Serial Host Interface (SHI): SPI and I2C protocols, ten-word receive FIFO, 8-, 16-, and 24-bit words

Target Applications
Low-cost systems for Dolby ProLogic A/V receivers, televisions, and minisystems, soundfield processing, 3D virtual surround, graphic/parametric equalization, and spectrum analysis.

DSP96002

Special Features (end of life)
• 96 bit general purpose IEEE floating point processor, 60 MHz, 30 MIPS, 60 MFLOPS, at 5 V
• 32-bit DSP engine, conforms to the IEEE 754-1985 standard for single precision (32-bit) and single extended precision (44-bit) arithmetic
• 2 KW on-chip RAM, 1 KW ROM with sine and cosine table
• Dual external bus system, dual-channel DMA controller, two programmable timers/counters


6. Texas Instruments - TMS 320C30/C31/CV33

Key Features TMS320C30:

• ROM: one 4K * 32-bit, dual access
• RAM: two 1K * 32-bit, dual access
• Instruction cache: 64 * 32-bit; four-level pipeline structure
• Internal buses: three 32-bit buses for instruction and data, four 24-bit address buses
• External interface ports: two 32-bit buses for instruction and data, two 24-bit address buses, two control buses
• 40/32-bit FP / integer multiplier, ALU, 32-bit barrel shifter and 8 extended-precision registers R0....R7
• Parallel ALU and multiplier instructions in a single cycle (all R0...R7 can be used as accumulator)
• Two address generators (2 * AGUs) with 8 auxiliary registers AR0....AR7
• Repeat capability, zero overhead loops (single cycle branches)
• One DMA controller for concurrent I/O and CPU operation
• Two serial ports to support 8/16/24/32-bit transfers
• Two 32-bit timers
• Four external interrupts
• 181-pin grid array (PGA), 1 µm CMOS; 5 V / typ. 250 mA @ 40 MHz
• 27 / 33 / 40 / 50 / 60 / 80 MHz, 2 cycles per instruction ---> 40 MIPS = 25 ns; 2 FLOPS per instruction ---> 80 MFLOPS

320C31 to C30 differences:

• The C31 is a low-cost 32-bit DSP, object-code compatible with the C30; a ROM-less version with a flexible boot program loader
• Only one external interface port: only the external primary bus, no expansion bus system
• Only one serial port to support 8/16/24/32-bit transfers
• 132-pin plastic quad flat pack (PQFP), 0.8 µm CMOS, 5 V / typ. 180 mA @ 40 MFLOPS

Special Features of TMS320CV33:

• A new, C30/C31 object-code compatible high speed version of the TMS320C31, 120/150 MHz
• Internal memory increased 16 times by two additional RAM blocks, 2 * 16K * 32 bit
• Four predecoded page strobes and an internal clock PLL
• A new JTAG emulation port has replaced the old MPSD emulation port
• Low power version, 1.8 V core / 3.3 V for I/O, <200 mW @ 150 MFLOPS
• 144-pin thin quad flat pack (TQFP), 0.18 µm CMOS, low price of 5 $ (in 100K quantities)

TMS320CV33 CPU Registers

Register   Assigned Function             Register   Assigned Function
R0         Extended-precision register   AR0        Auxiliary register 0
R1         Extended-precision register   AR1        Auxiliary register 1
R2         Extended-precision register   AR2        Auxiliary register 2
R3         Extended-precision register   AR3        Auxiliary register 3
R4         Extended-precision register   AR4        Auxiliary register 4
R5         Extended-precision register   AR5        Auxiliary register 5
R6         Extended-precision register   AR6        Auxiliary register 6
R7         Extended-precision register   AR7        Auxiliary register 7
IE         CPU/DMA interrupt enable      DP         Data-page pointer
IF         CPU interrupt flags           SP         Stack pointer
IOF        I/O flags                     IR0        Index register 0
RS         Repeat start address          IR1        Index register 1
RE         Repeat end address            BK         Block size register
RC         Repeat counter                ST         Status register
PC         Program counter



DSP TMS320VC33 Functional Block Diagram



TMS320VC33 Memory Maps (MCBL/MP = 0 and MCBL/MP = 1)

External memory select signals:
Page_0   00.0000 - 3F.FFFF
Page_1   40.0000 - 7F.FFFF
Page_2   80.0000 - BF.FFFF
Page_3   C0.0000 - FF.FFFF
Strb     00.0000 - FF.FFFF



Data Formats and Conversion

Integer Formats

Format                     Bits   Sign         Use                          Conversion to 32 bit
Short integer              15-0   S (bit 15)   immediate short integer      bits 31-16 = S (sign extension)
Short unsigned integer     15-0   none         e.g. logical values          bits 31-16 = 0 (zero fill)
Single-precision integer   31-0   S (bit 31)   normal integer values
Unsigned integer           31-0   none         e.g. for unsigned operands

Floating-Point Formats (E = exponent, S = sign, F = fraction)

Format               E       S    F       Use
Extended precision   39-32   31   30-0    R0 – R7 registers
Single precision     31-24   23   22-0    AR0 – AR7 registers, memory
16-bit short         15-12   11   10-0    immediate short float

Conversion of the 16-bit short format (n = sign extension of the exponent)

Target               n       E       S    F       Zero fill
Single precision     31-28   27-24   23   22-12   11-0
Extended precision   39-36   35-32   31   30-20   19-0

Mantissa and Exponent: two's complement, mantissa with hidden bit
Number X is given by (01.F and 10.F are binary two's-complement values):
X := 01.F * 2^E   if S = 0 (= positive number)
X := 10.F * 2^E   if S = 1 (= negative number)
X := 0            if E = 80h (= -128, or -8 for the short data format)
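The rule above for the value X can be checked with a short decoder for the 32-bit single-precision layout (E in bits 31-24, S in bit 23, F in bits 22-0). The function name `c3x_single_to_float` is hypothetical; the sketch only restates the formula and does not model rounding or overflow behaviour of the hardware.

```python
def c3x_single_to_float(word: int) -> float:
    """Decode a 32-bit word in the single-precision layout described above."""
    word &= 0xFFFFFFFF
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:                 # exponent is two's complement
        exponent -= 0x100
    sign = (word >> 23) & 1
    fraction = word & 0x7FFFFF

    if exponent == -128:                 # E = 80h encodes zero
        return 0.0

    # Two's-complement mantissa with hidden bit:
    #   S = 0: 01.F ->  1 + F / 2^23
    #   S = 1: 10.F -> -2 + F / 2^23
    mantissa = (-2.0 if sign else 1.0) + fraction / float(1 << 23)
    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    for w in (0x00000000, 0x01400000, 0xFF800000, 0x80000000):
        print(f"{w:08X}h -> {c3x_single_to_float(w)}")
```

Note that -1.0 is stored with E = -1 and F = 0 (10.0 * 2^-1), which differs from the sign-magnitude layout of IEEE 754.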



TMS320VC33 - 6 Types of Addressing

Type              Addressing                                                          Example
Register          CPU register contains the operand                                   ABSF R1
Short-immediate   16 bit data contained in the instruction word                       SUBI 1,R0
Long-immediate    24 bit data contained in the instruction word                       BR 8000h
PC-relative       16 bit displacement in the instruction word                         BU NewPC
                  (PC=1001, NewPC=1005 → Displ.=3)
Direct            8 bit address from data-page pointer DP                             ADDI @0BCDEh,R7
                  + 16 bit address inside the instruction word
Indirect          Addresses in AR0 ... AR7, or IR0, IR1                               LDI *AR3++,R5
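Two of the address calculations in the table above are simple enough to sketch. The helper names are hypothetical; the direct address is formed from the 8-bit data page and the 16-bit field of the instruction, and the PC-relative displacement follows the example given (PC=1001, NewPC=1005, displacement 3, i.e. the target is PC + 1 + displacement).

```python
def direct_address(dp: int, addr16: int) -> int:
    """Combine the 8-bit data page (DP) with a 16-bit address field: 24-bit address."""
    return ((dp & 0xFF) << 16) | (addr16 & 0xFFFF)


def pc_relative_displacement(pc: int, target: int) -> int:
    """Displacement so that target = pc + 1 + displacement (standard branch)."""
    return target - (pc + 1)


if __name__ == "__main__":
    # ADDI @0BCDEh,R7 with DP = 80h (the DP value is an assumed example)
    print(hex(direct_address(0x80, 0xBCDE)))
    print(pc_relative_displacement(1001, 1005))
```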



Indirect Addressing Modes

(c) the same indirect addressing modes with IR1

Legend:
*       indirect addressing
ARn     auxiliary register AR0-AR7
IRn     index register IR0 or IR1
disp    displacement
addr    memory address
++      add and modify
--      subtract and modify
%       perform circular addressing
circ()  address in circular addressing
B       perform bit-reversed addressing
B()     address in bit-reversed addressing
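The circ() notation in the legend can be illustrated with a simplified model of a circular address update. This is a sketch under assumptions, not a cycle-accurate description: the buffer length is taken from the block size register BK, the buffer start is assumed to be aligned so that its K low address bits are zero (with 2^K > BK), and the step is assumed to be no larger than BK. The function name `circ` simply mirrors the legend.

```python
def circ(addr: int, step: int, bk: int) -> int:
    """Return the next address after a circular update by `step`.

    addr : current address inside the circular buffer
    step : signed increment (|step| <= bk assumed)
    bk   : block size in words (value of the BK register)
    """
    if bk <= 0 or abs(step) > bk:
        raise ValueError("need bk > 0 and |step| <= bk")
    k = bk.bit_length()            # smallest K with 2**K > bk
    mask = (1 << k) - 1
    base = addr & ~mask            # aligned start of the buffer
    index = (addr & mask) + step   # position inside the buffer
    if index >= bk:
        index -= bk                # wrap around at the end
    elif index < 0:
        index += bk                # wrap around at the start
    return base | index


if __name__ == "__main__":
    a = 0x809800
    for _ in range(6):             # walks 0,1,2,3,0,1 in a 4-word buffer
        print(hex(a))
        a = circ(a, 1, 4)
```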



Instruction Set TMS 320C30/C31/C32/CV33 Overview

Load and Store                  Integer         Floating Point
Load                            LDI             LDF
Load conditionally              LDIcond         LDFcond
Store                           STI             STF
Load exponent                                   LDE
Load mantissa                                   LDM
Pop from stack                  POP             POPF
Push on stack                   PUSH            PUSHF
Load data pointer               LDP

Arithmetic Operations           Integer         Floating Point
Add                             ADDI  ADDI3     ADDF  ADDF3
Subtract                        SUBI  SUBI3     SUBF  SUBF3
Compare                         CMPI  CMPI3     CMPF  CMPF3
Multiply                        MPYI  MPYI3     MPYF  MPYF3
Subtract reverse                SUBRI           SUBRF
Absolute value                  ABSI            ABSF
Negate                          NEGI            NEGF
Add with carry                  ADDC  ADDC3
Subtract with borrow            SUBB  SUBB3
Subtract reverse with borrow    SUBRB
Subtract conditionally          SUBC
Negate with borrow              NEGB

Logical, Bit & Shift Instructions   Integer
AND operation                   AND   AND3
AND with complement             ANDN  ANDN3
Bit complement                  NOT
OR operation                    OR    OR3
Exclusive OR                    XOR   XOR3
Test bit fields                 TSTB  TSTB3

Shift and Rotations             Integer
Arithmetic shift                ASH   ASH3
Logical shift                   LSH   LSH3
Rotate left                     ROL
Rotate left through carry       ROLC
Rotate right                    ROR
Rotate right through carry      RORC

Format Instructions
Convert FP --> Integer          FIX
Convert Integer --> FP          FLOAT
Normalize FP value              NORM
Round FP value                  RND



Instruction Set TMS 320C30/C31/C32/CV33 Overview

Parallel Load and Store                                Integer           Floating Point
Double load                                            LDI || LDI        LDF || LDF
Double store                                           STI || STI        STF || STF
Load and store                                         LDI || STI        LDF || STF

Parallel Arithmetic and Store                          Integer           Floating Point
Add and store                                          ADDI3 || STI      ADDF3 || STF
Subtract and store                                     SUBI3 || STI      SUBF3 || STF
Multiply and store                                     MPYI3 || STI      MPYF3 || STF
Absolute value and store                               ABSI || STI       ABSF || STF
Negate and store                                       NEGI || STI       NEGF || STF
Convert integer to floating point and store FP                           FLOAT || STF
Convert floating point to integer and store integer    FIX || STI

Parallel Logical or Shift and Store                    Integer
Complement and store                                   NOT || STI
Logical AND and store                                  AND3 || STI
Logical OR and store                                   OR3 || STI
Exclusive OR and store                                 XOR3 || STI
Arithmetic shift and store                             ASH3 || STI
Logical shift and store                                LSH3 || STI

Parallel Arithmetic and Arithmetic                     Integer           Floating Point
Multiply and add                                       MPYI3 || ADDI3    MPYF3 || ADDF3
Multiply and subtract                                  MPYI3 || SUBI3    MPYF3 || SUBF3

Program Control Instructions                           unconditionally   conditionally
Branch                                                 BR                Bcond
Branch delayed                                         BRD               BcondD
Call subroutine                                        CALL              CALLcond
Return from subroutine                                                   RETScond
Return from interrupt                                                    RETIcond
Decrement and branch                                                     DBcond
Decrement and branch delayed                                             DBcondD
Trap                                                                     TRAPcond
Perform an emulation interrupt                         SWI
Idle until interrupt                                   IDLE
Interrupt acknowledge                                  IACK
Repeat single instruction                              RPTS
Repeat block of instructions                           RPTB
No operation                                           NOP

Interlock Operations                                   Integer           Floating Point
Load interlocked                                       LDII              LDFI
Store interlocked                                      STII              STFI
Signal interlocked                                     SIGI



Condition Codes and Flags

Flag        Description                  Compare     Description                 Flags         Code
Condition                                Condition
U           unconditional                                                        Don’t care    00000
C           carry                        LO          lower than                  C             00001
                                         LS          lower than or same as       C OR Z        00010
                                         HI          higher than                 /C AND /Z     00011
NC          no carry                     HS          higher than or same as      /C            00100
Z           zero                         EQ          equal to                    Z             00101
NZ          not zero                     NE          not equal                   /Z            00110
N           negative                     LT          less than                   N             00111
                                         LE          less than or equal          N OR Z        01000
P           positive                     GT          greater than                /N AND /Z     01001
NN          non negative                 GE          greater than or equal       /N            01010
NV          no overflow                                                          /V            01100
V           overflow                                                             V             01101
NUF         no underflow                                                         /UF           01110
UF          underflow                                                            UF            01111
NLV         no latched overflow                                                  /LV           10000
LV          latched overflow                                                     LV            10001
NLUF        no latched underflow float                                           /LUF          10010
LUF         latched underflow float                                              LUF           10011
ZUF         zero or float underflow                                              Z OR UF       10100
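The flag expressions in the table above can be written down directly as boolean functions. The sketch below covers the compare conditions only; the dictionary `CONDITIONS` and the function `condition_true` are hypothetical names, and the flags are passed in as a plain dictionary rather than read from the status register.

```python
# Each entry evaluates the "Flags" column of the table for one compare condition.
CONDITIONS = {
    "U":  lambda f: True,                          # unconditional
    "LO": lambda f: bool(f["C"]),                  # C
    "LS": lambda f: bool(f["C"] or f["Z"]),        # C OR Z
    "HI": lambda f: not f["C"] and not f["Z"],     # /C AND /Z
    "HS": lambda f: not f["C"],                    # /C
    "EQ": lambda f: bool(f["Z"]),                  # Z
    "NE": lambda f: not f["Z"],                    # /Z
    "LT": lambda f: bool(f["N"]),                  # N
    "LE": lambda f: bool(f["N"] or f["Z"]),        # N OR Z
    "GT": lambda f: not f["N"] and not f["Z"],     # /N AND /Z
    "GE": lambda f: not f["N"],                    # /N
}


def condition_true(cond: str, flags: dict) -> bool:
    """Evaluate a compare condition for the given C, Z, N flag values."""
    return CONDITIONS[cond](flags)


if __name__ == "__main__":
    after_equal_compare = {"C": 0, "Z": 1, "N": 0}
    for name in ("EQ", "NE", "LS", "HI", "GE", "GT"):
        print(name, condition_true(name, after_equal_compare))
```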

Program Flow Control

1. Repeat Modes

Instruction   Function                      RS      RE        RC         Example
RPTS          repeat the next instruction   PC +1   PC +1     count -1   RPTS 15        ;16 * next instruction
RPTB          repeat block of code          PC +1   EndLoop   count -1   RPTB EndLoop   ;RC must be set before

2. Standard Branches - 4 cycles (empty the pipeline)

BR src            PC <= src (long immediate address)
Bcond src         if cond=true then
                    src=Rn      PC <= Rn (register addressing)
                    src=displ.  PC <= PC+1+displacement (PC-relative addr.)
DBcond ARn,src    ARn <= ARn - 1 (decrement ARn)
                  if cond=true AND ARn ≥ 0 then
                    src=Rn      PC <= Rn (register addressing)
                    src=displ.  PC <= PC+1+displacement (PC-relative addr.)

3. Delayed Branches - 1 cycle (do not empty the pipeline, the next 3 instructions are executed)

BRD src           PC <= src (long immediate address)
BcondD src        if cond=true then
                    src=Rn      PC <= Rn (register addressing)
                    src=displ.  PC <= PC+1+displacement (PC-relative addr.)
DBcondD ARn,src   ARn <= ARn - 1 (decrement ARn)
                  if cond=true AND ARn ≥ 0 then
                    src=Rn      PC <= Rn (register addressing)
                    src=displ.  PC <= PC+1+displacement (PC-relative addr.)

4. Call and Trap Instructions - 4/5 cycles

CALL src          PC => ++SP (push PC)
                  PC <= src (long immediate address)
CALLcond src      if cond=true then
                    PC => ++SP (push PC)
                    src=Rn      PC <= Rn (register addressing)
                    src=displ.  PC <= PC+1+displacement (PC-relative addr.)
TRAPcond N        if cond=true then
                    PC => ++SP (push PC)
                    0 => ST(GIE) (disable interrupt)
                    Trap vector N => PC (N inside TRAP instruction)

5. Return Instructions - 4 cycles

RETScond          if cond=true then PC <= SP-- (pop PC)
RETIcond          if cond=true then PC <= SP-- (pop PC)
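The DBcond rule above (decrement ARn, then branch while the condition holds and ARn ≥ 0) implies that a loop closed by DBcond runs ARn + 1 times. The following sketch simulates just that rule; `dbcond_iterations` is a hypothetical helper, and the condition is passed in as a callable because the sketch does not model the status flags.

```python
def dbcond_iterations(ar_initial: int, cond=lambda: True):
    """Simulate a loop whose last instruction is DBcond ARn,loop_start.

    Returns (number of loop body executions, final ARn value).
    """
    ar = ar_initial
    executions = 0
    while True:
        executions += 1              # loop body runs once per pass
        ar -= 1                      # ARn <= ARn - 1
        if not (cond() and ar >= 0): # branch only if cond=true AND ARn >= 0
            break
    return executions, ar


if __name__ == "__main__":
    print(dbcond_iterations(3))      # body runs ARn + 1 = 4 times
```

The same "count - 1" convention appears in the repeat modes: RPTS 15 executes the next instruction 16 times.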



TMS320CV33 - Interrupts

• 4 external interrupts and a number of internal interrupts; these can be used to interrupt either the DMA or the CPU
• Global CPU interrupt enable flag GIE (status register bit 13) is reset by an interrupt
• Every interrupt has a separate IF bit and a separate IE bit; the IF bit is cleared by the internal interrupt acknowledge
• Global DMA interrupt enable flag inside the DMA (no software control, synch bits)
• External interrupts are synchronized internally by three flip-flops --> set IFn (Interrupt Flag Register)

Interrupt Logic Functional Diagram

Reset and Interrupt Vector Table

Addr. MP   Addr. BL   Name     Priority   Function
00h                   RESET    0          External reset signal
01h        809FC1     INT0     1          External interrupt on the INT0 pin
02h        809FC2     INT1     2          External interrupt on the INT1 pin
03h        809FC3     INT2     3          External interrupt on the INT2 pin
04h        809FC4     INT3     4          External interrupt on the INT3 pin
05h        809FC5     XINT0    5          Internal interrupt generated when serial port 0 transmit buffer is empty
06h        809FC6     RINT0    6          Internal interrupt generated when serial port 0 receive buffer is full
09h        809FC9     TINT0    9          Internal interrupt generated by timer 0
0Ah        809FCA     TINT1    10         Internal interrupt generated by timer 1
0Bh        809FCB     DINT     11         Internal interrupt generated by DMA controller
20h        809FE0     TRAP0               Internal interrupt generated by TRAP 0 instruction
21h        809FE1     TRAP1               Internal interrupt generated by TRAP 1 instruction
:          :          :                   :
3Bh        809FFB     TRAP27              Internal interrupt generated by TRAP 27 instruction

For a CPU interrupt to occur, two conditions must be met:
• All interrupts must be enabled globally (GIE = 1)
• The individual interrupt must be enabled (IEx = 1)
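The enable conditions above, together with the priority column of the vector table, can be summarized in a few lines. The sketch assumes (as a labeled assumption, not taken from the table) that the bits of the IE and IF registers are ordered like the priorities, with bit 0 = INT0 as the highest priority; `next_interrupt` is a hypothetical helper name.

```python
def next_interrupt(gie: int, ie: int, iflags: int):
    """Return the bit number of the interrupt to service, or None.

    gie    : global interrupt enable bit from the status register
    ie     : interrupt enable register (one bit per source)
    iflags : interrupt flag register (one bit per pending source)
    Assumption: lower bit number = higher priority (bit 0 = INT0).
    """
    if not gie:
        return None                          # all interrupts disabled
    pending = ie & iflags                    # flagged AND individually enabled
    if pending == 0:
        return None
    lowest_set = pending & -pending          # isolate lowest set bit
    return lowest_set.bit_length() - 1


if __name__ == "__main__":
    # INT0 and INT2 flagged, but only INT1 and INT2 enabled -> INT2 is taken
    print(next_interrupt(gie=1, ie=0b0110, iflags=0b0101))
```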



Reset and Boot Loader Mode
• After power-up, reset must be used to place the processor in a known state
• Nonmaskable external RESET signal (more than 10 cycles)
• All internal peripherals are reset; the ST, IF, IE, and IOF registers and the external bus control register are reset
• The reset vector is read from memory location 000000h and loaded into the PC, then execution starts
• In boot loader mode (MCBL/MP = 1), the mode is selected by INT0 .. INT3 (serial loading or 8 / 16 / 32 bit external memory)

Timer
• Two 32-bit general-purpose timer modules with internal or external clocking
• TCLK0 / TCLK1: timer out or clock in (or two general purpose I/O pins)
• A timer counts the cycles of a timer input clock with the incrementing counter register
• In counter mode, the timer can count external events
• In timer mode with an internal clock, it outputs a pulse or clock signal (2^32 / 25 MHz (internal clock / 2) ≈ 171.8 s)
• When counter register = timer period register: the counter is reset to 0, the timer output signal makes a transition, and an interrupt is generated
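The maximum period quoted above follows from the counter width and the timer input clock. A minimal sketch of the arithmetic, with the hypothetical helper `max_timer_period_seconds` and the 25 MHz timer clock from the text as the example value:

```python
def max_timer_period_seconds(timer_clock_hz: float, counter_bits: int = 32) -> float:
    """Longest interval a free-running counter can measure before it wraps."""
    return (2 ** counter_bits) / timer_clock_hz


if __name__ == "__main__":
    # 32-bit counter clocked at 25 MHz (internal clock / 2)
    print(round(max_timer_period_seconds(25e6), 1), "s")
```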

Timer Block Diagram

Memory-Mapped Timer Locations

                           Peripheral Addresses
Register                   Timer 0     Timer 1
Timer Global Control       808020h     808030h
Reserved                   808021h     808031h
Reserved                   808022h     808032h
Reserved                   808023h     808033h
Timer Counter Register     808024h     808034h
Reserved                   808025h     808035h
Reserved                   808026h     808036h
Reserved                   808027h     808037h
Timer Period Register      808028h     808038h



Serial Port
• One bidirectional serial port (two identical serial ports on the ROM version TMS320C30)
• Transfers 8, 16, 24, or 32 bits of data per word simultaneously in both directions
• Clock generated either internally (through the serial port timer) or externally
• Six serial I/O pins for transmitter (CLKX, FSX, DX) and receiver (CLKR, FSR, DR), or 6 general purpose I/O pins

Serial Port Block Diagram

Memory-Mapped Locations for the Serial Port

Register                        Peripheral Address
Serial Global Control           808040h
Reserved                        808041h
FSX / DX / CLKX Port Control    808042h
FSR / DR / CLKR Port Control    808043h
R/X Timer Control               808044h
R/X Timer Counter               808045h
R/X Timer Period                808046h
Reserved                        808047h
Data Transmit                   808048h
Reserved                        808049h
Reserved                        80804Ah
Reserved                        80804Bh
Data Receive                    80804Ch



DMA Controller
• Data transfers to and from anywhere in the processor’s memory map, without interfering with CPU operation
• It can interface to slow external memories and peripherals without reducing throughput to the CPU
• Concurrent CPU and DMA controller operation, with DMA transfers at the same rate as the CPU (separate internal DMA bus)
• Source and destination address registers with or without auto increment / decrement
• Synchronization of data transfers via external and internal interrupts

DMA Basic Operation

Memory-Mapped Locations for DMA Channels

Register                    Peripheral Address
DMA Global Control          808000h
Reserved                    808001h
Reserved                    808002h
Reserved                    808003h
DMA Source Address          808004h
Reserved                    808005h
DMA Destination Address     808006h
Reserved                    808007h
DMA Transfer Counter        808008h

External Bus Operation
• The external interface is controlled by the Primary Bus Control Register (808064h), STRB signal
• External interface timing on the primary bus: all bus cycles comprise an integral number of H1 clock cycles
  - a write takes 2 cycles (without additional wait states)
  - a read takes 1 cycle; a read that follows a write also takes 2 cycles (without additional wait states)

Bits     12-8     7-5     4-3   2     1        0
Field    BnkCmp   WtCnt   SWW   HIZ   NoHold   HoldSt
Access   R/W      R/W     R/W   R/W   R/W      R

• WtCnt: 0 - 7, programmable wait state counter
• SWW:
  00  Wait until external RDY is signaled
  01  Wait until internal wait state generator counts down to 0
  10  Wait until first signal: external RDY or the internal wait state generator (logical OR)
  11  Wait until both external RDY is signaled and wait state generator counts down to 0 (logical AND)
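The bit layout of the primary bus control register described above (BnkCmp in bits 12-8, WtCnt in bits 7-5, SWW in bits 4-3, then HIZ, NoHold, HoldSt) can be unpacked with shifts and masks. The function `decode_bus_control` and the text table `SWW_MODES` are hypothetical names used only for this sketch.

```python
SWW_MODES = {
    0b00: "external RDY only",
    0b01: "internal wait state generator only",
    0b10: "external RDY OR internal generator",
    0b11: "external RDY AND internal generator",
}


def decode_bus_control(value: int) -> dict:
    """Split a primary bus control register value into its fields."""
    return {
        "BnkCmp": (value >> 8) & 0x1F,   # bits 12-8
        "WtCnt":  (value >> 5) & 0x07,   # bits 7-5
        "SWW":    (value >> 3) & 0x03,   # bits 4-3
        "HIZ":    (value >> 2) & 0x01,   # bit 2
        "NoHold": (value >> 1) & 0x01,   # bit 1
        "HoldSt": value & 0x01,          # bit 0 (read only)
    }


if __name__ == "__main__":
    fields = decode_bus_control(0x10F8)  # example value, chosen for illustration
    print(fields, "->", SWW_MODES[fields["SWW"]])
```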

Interlocked Operations
• Sharing global memory by multiple processors
• External flag pins:
  XF0 - output, interlock request, active-low signal
  XF1 - input, interlock acknowledge, active-low signal

Mnemonic   Description                           Operation
LDFI       Load FP into register interlocked     Signal interlocked, src -> dst
LDII       Load Int. into register interlocked   Signal interlocked, src -> dst
STFI       Store FP to memory interlocked        src -> dst, clear interlock
STII       Store Int. to memory interlocked      src -> dst, clear interlock
SIGI       Signal interlocked                    Signal interlocked, clear interlock



Texas Instruments TMS320C3x Assembler

Sections: smallest unit of an object file, separate memory maps of the DSP
.text                               ;executable code
.data                               ;initialized data
.bss label, size                    ;uninitialized words for variables, label = first location
.sect "sect name"                   ;create a named initialized section
label .usect "sect name", size      ;reserve size uninitialized words in a named section

Constants: directives to initialize constants and strings
Types:
Integer - binary:    01b=01h; 1110001b=71h
        - decimal:   1000=3E8h; -32768=8000h
        - hexadec.:  12h; 0abc12h (must start with a decimal digit)
32 bit floating point: 3.14; -0.3; -0.314e-2 (inside of instructions 16 bit short fp)
8-bit ASCII character: 'ab' =00006261h; 'abcd' =64636261h (1-4 characters enclosed)
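The two character-constant examples above ('ab' = 00006261h, 'abcd' = 64636261h) show that the first character lands in the least significant byte of the 32-bit word. A small sketch reproduces that packing; `pack_chars` is a hypothetical helper, not an assembler function.

```python
def pack_chars(text: str) -> int:
    """Pack 1 to 4 ASCII characters into a 32-bit word, first character lowest."""
    if not 1 <= len(text) <= 4:
        raise ValueError("a character constant holds 1 to 4 characters")
    value = 0
    for position, ch in enumerate(text):
        value |= (ord(ch) & 0xFF) << (8 * position)
    return value


if __name__ == "__main__":
    print(f"{pack_chars('ab'):08X}h")
    print(f"{pack_chars('abcd'):08X}h")
```

This is the opposite byte order from the Motorola string constant shown earlier in the document, where ‘ABCD’ = $41424344 places the first character in the most significant byte.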

Assembler Directives: supply program data and control the assembler process (a selection only)
.global sym1 [,sym2, ...]           ;identify symbols that can be referenced externally and assign values
.word val1 [, val2, ...]            ;places one or more 32 bit values
.float val1 [, val2, ...]           ;places one or more f.p. values (right-aligned in a 32 bit word)
.string "str1" [, "str2", ...]      ;initialize one or more strings (4 char. in each 32 bit word)
.space size                         ;reserve size words in the current section
symbol: .set value                  ;assign a value to a symbol (Pi: .set 3.14159)
.asg string, subst.symbol           ;assign a text string to a substitution symbol (.asg R0, Temp)

Running Assembler: needs the source input file, generates the object file (and a list file)
asm30.exe [inp_file.asm [obj_file.obj [list_fil.lst ] ] ] [ -options ]
Options:
-v40          select TMS320C40 (default -v30 for C30/C31)
-l            produces a list file
-ipathname    directory for .copy, .include or .mlib directives
-x            produces a cross-reference table at the end of the listing file
-s            puts all defined symbols in the object file’s symbol table
-c            makes upper and lower case insignificant (abc = ABC)

Texas Instruments TMS320C3x Linker:

Running Linker: needs the object files as input and links them to an executable output file

lnk30.exe [ -options] file1.obj file2.obj .... fileN.obj
Options:
-a                 produce an absolute executable output module (default)
-r                 produce a relocatable unexecutable output module
-ar                produce a relocatable executable output module
-o filename        name of output file (default: a.out)
-m filename        create a map file
-l filename        name of archive library file as input
-x                 forces rereading of libraries
-f value           set the fill value to fill holes (4 byte), default = 0
-e glob_symb       define the entry point (start address of program), default = 0
-h                 make all global symbols static
-u symbol          place an unresolved external symbol into the output module
-s                 strip symbol table and line number entries from the output module
-c                 ROM auto-initialization model for C compiler
-cr                RAM auto-initialization model for C compiler
-stack constant    define stack size (4 byte, default 1K) for C compiler
-heap constant     define heap size (4 byte, default 1K) for C compiler

Another possibility: a special linker control file contains all options (filename.cmd)



Example Linker Command File: (linker options in a special linker control ASCII file = filename.cmd)

/*************************************************************************/
/* TEST.CMD - v4.70 COMMAND FILE FOR LINKING C30 PROGRAMS                */
/* Usage: LNK30.EXE TEST.CMD                                             */
/*************************************************************************/
test1.obj test2.obj      /* Specify input files for linker */
-o test.out              /* Specify linker output file */
-m test.map              /* ask for a map file */
-ar                      /* produce a relocatable executable object */
-e START                 /* specify entry point of code */

MEMORY    /* Describe memory configuration for linker memory allocation scheme */
{
    VECT: org = 0x00000000, len = 0x00000040   /* VECTORS IN SRAM */
    SRAM: org = 0x00000040, len = 0x00003fc0   /* external 16K SRAM MEMORY */
    RAM0: org = 0x00809800, len = 0x00000400   /* internal 1K RAM BLOCK 0 */
    RAM1: org = 0x00809c00, len = 0x00000400   /* internal 1K RAM BLOCK 1 */
}

SECTIONS  /* specify how each output section is to be allocated into memory regions */
{
    init    > START
    .text:  > RAM0                  /* CODE */
    .const: > RAM0                  /* CONSTANTS */
    .stack: > RAM1                  /* SYSTEM STACK */
    .bss:   > RAM0, block 0x10000   /* VARIABLES */
    .data:  > RAM0                  /* DATA - NOT USED FOR C CODE */
    .text:  > RAM1
    .data:  > RAM1
}

TMS320C3x C Source Debugger: uses the real-time in-circuit XDS510 emulator board to debug C and assembler code
Start DOS version: EMU3X.exe [-options] or Windows version: EMU3XW.exe [-options]
Some important commands (a selection, most are also available from the menu bar):

Task                                                   Command                     Example or Hot Key
Debugger working modes and system tasks:
• reset the emulator                                   reset
• exit the debugger                                    quit
Displaying files and loading programs:
• load an object file                                  load filename               load test.out
• display reassembled code at a specific address       dasm address                dasm 0x8000
• create a new additional memory window                mem[#] address [,format]    mem2 0x8000
Managing breakpoints:
• add a software breakpoint                            ba address                  ba 0x8000
• delete a software breakpoint                         bd address
Running and breaking programs:
• run a program                                        run [expression]            F5
• break a running program by a breakpoint, a mouse click, or pressing Esc
• run free, disconnect emulator                        runf
• halt the target system after free run                halt
Single-stepping through code:
• single step through assembler code                   step [expression]           F8
• single step (without stepping into called func.)     next [expression]           F10
• single step through C code                           cstep [expression]
Memory mapping:
• list memory map                                      ml
• add a block to memory map                            ma address, length, type
• delete a block from memory map                       md address
• reset memory map                                     mr
• initialize a memory block with specified value       fill address, length, data




TMS320C40

The TMS320C40 is a 32-bit floating-point DSP that is source-code compatible with the TMS320C3x. Features of the TMS320C40-60:

• 33-ns Instruction Cycle Time, 330 MOPS, 60 MFLOPS
• Six Communications Ports with Six-Channel Direct Memory Access (DMA) Coprocessor
• High Data-Rate, Single-Cycle Transfers: High Port-Data Rate of 120M Bytes/s ('C40-60) (Each Bus)
• Single-Cycle Conversion to and From IEEE-754 Floating-Point Format, Single-Cycle 1/x and 1/√x
• Single-Cycle 40-Bit Floating-Point, 32-Bit Integer Multipliers
• Twelve 40-Bit Registers, Eight Auxiliary Registers, 14 Control Registers, and Two Timers
• IEEE 1149.1 (JTAG) Boundary Scan Compatible
• IDLE2 Clock-Stop Power-Down Mode, 325-Pin Ceramic Grid Array (GF Suffix)
• Fabricated Using 0.72-µm Enhanced Performance Implanted CMOS (EPIC(TM)), 5-V Operation

TMS320C44 a low cost version of C40 • Only 4 bidirectional parallel 8-bit communication ports • 0,72 µm EPIC CMOS-Technology • 304- Lead Plastic Quad Flatpack



7. Overview to other DSPs Zilog Z89175/176

Z89175/176 Functional Block Diagram

Main Features of Mixed signal, dual processor chip system Z89175

• Z8 microcontroller with 24 KB ROM, 256 byte RAM, two 8-bit counter/timers, and up to 47 I/O pins.

• 16-Bit DSP with 24 bit ALU, two DSP timers, realtime clock 8K Word DSP Program ROM, 2*256 Words Data RAM

• 8-Bit A/D Converter with up to 16 kHz Sample Rate, 10-Bit PWM D/A Converter • Z8 and DSP processors are coupled by mailbox registers and an interrupt system. • Clock up to 29,49 MHz, low power consumption - 200mW, 100 pin QFP



Motorola DSP566xx Family Motorola designed the DSP56654 to support the rigorous demands of the cellular subscriber market. Optimized for narrow-band wireless systems such as GSM and TDMA/AMPS.

DSP56654 System Block Diagram

RISC M-CORE MCU
• 32-bit load/store RISC architecture
• Fixed 16-bit instruction length
• 16-entry 32-bit general-purpose register file
• 32-bit internal address and data buses
• Special branch, byte, and bit manipulation instructions
• Support for byte, half-, and word-memory accesses
• Fast interrupt support via vectoring/auto-vectoring and a 16-entry dedicated alternate register file

High-performance DSP56600 core
• One cycle engine (e.g., 70 MHz = 70 MIPS)
• 16 * 16-bit parallel multiplier-accumulator (MAC)
• Two 40-bit accumulators including extension bits
• 40-bit parallel barrel shifter
• Highly parallel instruction set with unique DSP addressing modes
• Nested hardware DO loops; fast auto-return interrupts
• Real-time trace capability via external address bus



Texas Instruments TMS 320C80 Key features of the ’C80:

• More than 2 billion operations per second (BOPS) • Four parallel processing advanced DSPs (PPs) with 64-bit instructions and 32-bit fixed-point data • Each PP is capable of many parallel operations per cycle. • RISC master processor with 120-MFLOPS IEEE-754 floating-point unit • 2.4G bytes/s of data and 1.8G bytes/s of instructions, 32K bytes of RAM can be shared by all processors • Video controller supports any display or capture resolution • 0.5-µm CMOS technology, Efficient packaging: 305-pin PGA / 352-pin BGA

TMS320C20x, C24x Family The combination of advanced Harvard architecture, on-chip peripherals, on-chip memory, and a highly specialized instruction set is the basis of the operational flexibility and speed of the ’C2xx devices. The ’C2xx generation offers these advantages:

• Enhanced TMS320 architectural design for increased performance and versatility

• Source code compatibility with the ’C1x and ’C2x DSPs

• Upward compatibility with C5x generation

• New static-design techniques for minimizing power consumption

Key features of the ’F206 include:

• 50-ns instruction cycle time
• 4.5K words on-chip RAM
• 32K 16-bit words of flash memory
• 192K-word external address reach
• Full-duplex enhanced synchronous serial port (ESSP) with 4-deep FIFO
• Full-duplex asynchronous serial port
• 100-pin TQFP package
• Flash programming utility software



Texas Instruments TMS320C5x Family The TMS320C50 is a highly integrated DSP offering a complete system on a single chip.

• Accepts source code from the ’C1x, ’C2x, and ’C2xx generations. • A parallel logic unit (PLU), zero-overhead context switching, and block repeats • ANSI C compiler, which translates into highly optimized assembly language. • IEEE 1149.1-standard (JTAG) scan-path test bus for system test and emulation • 192K-word external address reach and two indirectly addressed circular buffers • Software wait-state generation and Various phase-locked loop (PLL)

TMS320C51 Key features of the TMS320C51

• 9K-word block of Data/Program RAM
• 20-, 25-, 35-, and 50-ns instruction cycle times
• Boot ROM option
• Full-duplex synchronous serial port
• Time-division multiplexed (TDM) serial port
• 132-pin BQFP and 100-pin TQFP packages (14 × 14 × 1.4 mm)

Texas Instruments TMS320C54x Family The TMS320C54x generation of DSPs combines high performance, a large degree of parallelism, and a specialized instruction set to effectively implement a variety of complex algorithms and applications.

• Advanced multibus architecture with one program bus, three separate 16-bit data buses, and four address buses
• 40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
• 17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for nonpipelined, single-cycle multiply/accumulate (MAC) operation
• Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units (ARAUs)
• Viterbi accelerator

The ’C54x architectural efficiencies — high MIPS and low power dissipation — make it an ideal device for a variety of wireless and wireline communications systems.

• Digital cellular basestations, Wireless local loop • V.34/ISDN modems; Pagers • Personal digital assistants (PDAs) • Cable modems, Networking • Wireless data (CDPD) and handsets (TDMA or CDMA standards) • Digital Cordless (DECT, CT2, PHS) • Set-top boxes and Satellite modems



ADSP-21xx 16-Bit Fixed Point DSPs There are over two hundred ADSP-21xx code-compatible family members, varying in memory integration, operating voltage, operating speed and temperature range.

Analog Devices ADSP-2185L
• 16-bit fixed-point DSP, Super Harvard Architecture
• Independent ALU, Multiplier, Barrel Shifter, 2*DAG
• Single-Cycle Instruction Execution
• Memory bandwidth increased to 1.6 Gbyte/s @ 100MHz
• 3-Bus Architecture, Dual Operand Fetch
• 16-bit DMA Port for high-speed access
• Clock Rate Increased to 33MHz, 30 ns Core Instruction
• Programmable Wait State Generator
• Low Power Consumption: 4W/processor (Maximum)
• Multifunction Instruction
• 52 MIPS
• Extended internal 80 Kbytes: (16K*16 + 16K*24)
• 16-bit Interval Timer
• 4 GB Address range
• Two double-buffered Serial Ports
• Six External Interrupts
• Reduced internal Vdd, 3.3V with 0.8mA/MIP
• Package: 100-Lead TQFP, 144-miniBGA

ADSP-2185L Functional Block Diagram



Analog Devices' 32-Bit SHARC DSP Architectural Overview (Super Harvard Architecture) Common Architectural Features

• 32/40-Bit IEEE Floating-Point Math • 32-Bit Fixed-Point MACS with 64-Bit Product & 80-Bit Accumulation • No Arithmetic Pipeline; All Computations Are Single-Cycle • Circular Buffer Addressing Supported in Hardware • 32 Address Pointers Support 32 Circular Buffers • Six Nested Levels of Zero-Overhead Looping in Hardware • Instruction Set Supports Conditional Arithmetic, Bit Manipulation, Divide & Square Root, Bit Field Deposit • DMA Allows Zero-Overhead Background Transfers at Full Clock Rate Without Processor Intervention • Large internal dual-ported SRAM

ADSP-2106x SHARC Block Diagram



8. New DSP Processor Architectures and Alternatives Common attributes of conventional DSPs

• 16- or 24-bit fixed-point (fractional), or 32-bit floating-point arithmetic • 16-, 24-, or 32-bit instructions, one instruction per cycle ("single issue") • Complex, "compound" instructions encoding many operations • Highly constrained, non-orthogonal architectures • Dedicated addressing hardware with specialized addressing modes • Multiple-access on-chip memory architecture • Dedicated hardware for loops and other execution control • Specialized on-chip peripherals and I/O interfaces • Low cost, low power, low memory usage

Increasing Parallelism: Superscalar Architectures Boosting performance beyond the increases afforded by faster clock speeds requires more parallelism in one of the following ways:

• Increase the number of processors that perform digital signal processing: enhanced DSPs, co-processors, ...

• Increase the number of operations that can be performed in each instruction: SIMD technology

• Increase the number of instructions that can be issued and executed in every cycle: VLIW or superscalar DSPs

1. Multiprocessors on a chip

• DSP with co-processor systems (enhanced conventional DSPs)
  - DSP with additional specialized co-processors
  - FIR filter, Viterbi decoding, MPEG decoding, ...
  - Example: Motorola DSP56305/307

• Hybrid DSP/Microcontroller
  - DSP integrated into an existing µC or GPP core
  - Wide variety of approaches to combine DSP and microcontroller functionality
  - Example: Zilog Z89175/176

• Multiple processors on a die
  - RISC core and one or more DSPs on the chip
  - Entirely different instruction sets
  - Example: Motorola 56652, Texas Instruments C80

2. More Operations Per Instruction Inside DSP (enhanced conventional DSPs) How to increase the number of operations that can be performed in each instruction?

• Add execution units (multiplier, adder, etc.)
• Use more and/or wider buses to keep the processor fed with data
• Enhance the instruction set to use the additional hardware; possibly increase the instruction word width
• Add SIMD (single instruction, multiple data) capabilities
• Example: ADSP 2116xx



3. More Instructions Per Clock Cycle How to increase the number of instructions that are issued and executed in every clock cycle?

• Add execution units (multiplier, adder, etc.) • Use more buses for data load and store • Use VLIW techniques or superscalar architecture

VLIW and superscalar architectures typically use simple, RISC-based instructions

• Advantages:
  o Increased performance
  o More orthogonal than the complex, compound instructions traditionally used in DSP processors
  o Potentially easier to program, better compiler targets
  o Scalable (?)

• Disadvantages:
  o New kinds of programmer/compiler complexity
  o Programmer (or code-generation tool) must keep track of instruction scheduling
  o Some VLIW processors have deep pipelines and long latencies
  o Code size bloat (overhead) and high program memory bandwidth requirements
  o High power consumption

• Example: TI C62xx, C64xx, C67xx

Alternatives to DSP Processors

High-Performance GPPs with SIMD

• Most high-performance GPPs targeting desktop applications are superscalar architectures like: Pentium, Athlon, PowerPC

• These processors can often execute DSP tasks faster than DSP processors

• Most offer SIMD extensions to increase multimedia performance: MMX, SSE, AltiVec

• Disadvantages:
  - Price and power consumption
  - Little DSP-oriented on-chip integration
  - Poor execution time predictability

• They are an alternative only for desktop DSP applications

High-Performance Microcontroller with DSP elements

• High-Performance RISC-Controller with additional DSP capabilities MAC – Unit, AGU

• Example TI MSP430



Texas Instruments TMS320C62x/C64x/C67x

TMS320C62x/C67x Block Diagram Features:

• Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units
  - Executes up to eight instructions per cycle for up to ten times the performance of typical DSPs
  - Allows designers to develop highly effective RISC-like code for fast development time
• All instructions execute conditionally
  - Increases parallelism for higher sustained performance
• Instruction packing
  - Gives code size equivalence for eight instructions executed serially or in parallel
  - Reduces code size, program fetches, power consumption
• Industry’s most efficient C compiler on DSP benchmark suite
• Industry’s first assembly optimizer for fast development and improved parallelization
• Saturation and normalization provide support for key arithmetic operations
• Peak 1336 MIPS at 167 MHz
• Peak 1G FLOPS at 167 MHz for single-precision operations
• Peak 250M FLOPS at 167 MHz for double-precision operations
• Peak 688M FLOPS at 167 MHz for multiply and accumulate operations
• Hardware support for single-precision (32-bit) and double-precision (64-bit) IEEE floating-point
• 32*32-bit integer multiply, 32- or 64-bit result
• A variety of memory and peripheral options are available for the ’C62x/C67x:
  - Large on-chip RAM for fast algorithm execution
  - 32-bit external memory interface supports SDRAM, SBSRAM, SRAM
  - 16-bit host port for access to ’C62x/C67x memory and peripherals
  - Multichannel DMA, multichannel serial ports
  - 32-bit timer(s)



Analog Devices ADSP-21160

• 32-bit DSP, Super Harvard Architecture
• Assembly source code compatible
• Single-Instruction-Multiple-Data (SIMD)
• 600 MFLOPS peak, 400 MFLOPS sustained
• Double: ALU, Multiplier, Barrel Shifter
• Extended internal 4 Mbits (2 * 32 * 64KW)
• Memory bandwidth up to 1.6 GByte/sec @ 100MHz
• Double-word transfers each cycle
• Data bus widths increased to 64 bits
• 4 GB Address range
• 14 DMA channels with zero overhead, 700 Mbyte/s
• Two 50 Mbit/s synchronous serial ports
• Clock rate up to 100MHz, 10 ns core instruction
• Multiprocessor support up to six DSPs
• IEEE 1149.1 JTAG standard test access port
• Reduced Voltage: 2.5V internal Vdd, 3.3V I/O
• Low Power Consumption: 4W/processor (Maximum)
• Package: 27mm x 27mm Plastic BGA

ADSP-21160 Functional Block Diagram ADSP-21160 Benchmarks (@ 100 MHz)

Digital Signal Processors: Exercises

Signal Sampling

1. The Aliasing Problem
What is the aliasing problem? Examples of sampling failures. Examples of subsampling without failures.

2. Anti-Aliasing Low-Pass Filter
Damping of the input signal as a function of frequency behind the anti-aliasing low pass:
fg = cutoff frequency of the LP
S = steepness of the LP for f > fg [dB / octave]

$$D(f) = S \cdot \operatorname{ld}\!\left(\frac{f}{f_g}\right) = S \cdot \frac{\lg(f/f_g)}{\lg 2}$$

Shannon’s sampling theorem: $V = f_s / f_g \ge 2$, where fs = sampling frequency

The low pass must damp aliasing components to below the ADC resolution. Task: Calculate the necessary oversampling V as a function of the low-pass steepness S for ADCs with 8, 12 and 16 bits resolution. Result:

[Figure: required oversampling V(S, 8), V(S, 12) and V(S, 16) plotted against the low-pass steepness S from 20 to 100 dB/octave; V axis from 0 to 30]
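One possible way to approach the task can be sketched as follows. It rests on an assumption that the notes do not state: the first spectral component that folds back onto fg lies at fs − fg, and it has to be damped by at least the ADC's dynamic range of 20·lg(2^n) dB. With D(f) = S·ld(f/fg) this leads to V = 1 + 2^(20·lg(2^n)/S). The function names are illustrative only.

```python
import math


def required_damping_db(bits):
    """Dynamic range of an ideal n-bit ADC in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)


def oversampling(steepness_db_per_octave, bits):
    """Oversampling V = fs/fg so that the component at fs - fg is damped
    below the ADC resolution (assumed criterion, see lead-in)."""
    octaves = required_damping_db(bits) / steepness_db_per_octave
    return 1.0 + 2.0 ** octaves


if __name__ == "__main__":
    for bits in (8, 12, 16):
        row = [round(oversampling(s, bits), 2) for s in (20, 40, 60, 80, 100)]
        print(bits, "bit:", row)
```

Under this criterion the values stay within the 0 to 30 range of the result plot for S between 20 and 100 dB/octave, which suggests, but does not prove, that the plot uses the same criterion.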

University Rostock Institute of Automation December 2002


3. Anti-Aliasing LP Filter with Clock Decimation Filter: Oversampling of the input signal, followed by decimation of the sampling frequency.

• Goal?

• Advantages?

• Function?

[Block diagram: x(t) → LP → S&H → ADC (n * Clock) → digital LP → DSP (Clock)]

4. Clock Interpolation with Reconstruction LP-Filter Interpolation of output signals with increasing sample frequency (oversampling).

• Goal?

• Advantages?

• Function?

[Block diagram: DSP (Clock) → interpolation (n * Clock) → DAC → LP → y(t)]



FIR - Filter with a Simple 16 Bit Microcontroller

1. Write an assembler program that realizes a simple FIR filter with 20 coefficients (N = 0..19). Use the prearranged simple microcontroller with its instruction set and the defined memory map. Organize the data shift of Xn, Xn-1, ... with a rotating buffer, that is, without moving the data.

2. How many instruction cycles does this FIR filter process need? Calculate the maximum sample frequency for a processor with a 10 MHz instruction cycle.

3. Analyze the main part of the program that realizes one filter tap. Which 6 operations are necessary to realize one tap within one instruction cycle of a DSP?

4. Which elements can help to increase the processing speed of DSPs further?

[Block diagram: FIR filter for 20 taps, N = 19. The input Xn passes a chain of delay elements z-1 (Xn, Xn-1, ..., Xn-N); the taps are weighted with a0, a1, ..., aN-1, aN and summed to yn]

Memory Map for a 20 Tap FIR-Filter

Inp:    input data      ; new data

Data:   FIR data 1      ; D - pointer (P1)
        data 2
        :
        data 19
Dend:   data 20

Coff:   coeff.: a19     ; C - pointer (P2)
        a18
        :
        a1
Cend:   a0

outp:   output data     ; output result



[Block diagram: Block Structure of a Simple 16 Bit Microcontroller. Program memory and data memory, an ALU with multiplier (MUL), data registers R1 to R4 and pointer registers P1, P2, all connected by the internal bus system]

Load src , dest              Arithmetic src , dest     Branch cond , dest

ld reg , reg                 inc reg                   jp abs.adr
ld # immed.value , reg       dec reg                   jz cond , abs.adr.
ld (abs.adr) , reg           cp # immed.value , reg    jnz cond , abs.adr.
ld reg , (abs.adr) (st)      add reg , reg             jc cond , abs.adr.
ld (reg.pnt) , reg           sub reg , reg             jnc cond , abs.adr.
ld reg , (reg.pnt) (st)      mul reg , reg             djnz reg , abs.adr.

Instruction Set of the Simple Microcontroller

;----------------------------------------------------------------------
; Assembler Frame Program for a FIR Filter
;----------------------------------------------------------------------
Start:   ld #Data, P1      ; load data pointer
;----------------------------------------------------------------------
Next:    ld (Inp), R1      ; read new data from input
         ld R1, (P1)       ; write data into the filter data array
         inc P1            ; pointer to the next place
         cp #DEnd+1, P1    ; compare pointer
         jnz NoRelo1       ; check end address of pointer
         ld #Data, P1      ; reload pointer to begin
NoRelo1: ld #Coff, P2      ; new load of coefficient pointer
         ld #0, R3         ; sum register R3 = 0
         ld #20, R4        ; loop counter R4 = 20
;----------------------------------------------------------------------
; Write the program loop!
Step:                      ; R1 load with data value (1. oldest)
                           ; pointer to the next place
                           ; compare pointer
                           ; check end address of pointer
                           ; reload pointer to begin
                           ; R2 load with coefficient (first is a19)
                           ; coeff.-pointer to the next
                           ; R2 := R2 * R1
                           ; R3 := R3 + R2
                           ; decrement loop, repeat for next step
;----------------------------------------------------------------------
         ld R3, (Outp)     ; write result to output
         jp Next           ; jump -> Start Mainloop, with next data
;----------------------------------------------------------------------
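The rotating-buffer idea behind the frame program can be sketched in Python before writing the assembler loop. This is a behavioural model only, not the solution in the microcontroller's instruction set; the class and attribute names are made up for illustration. As in the memory map, the coefficient pointer starts at the last coefficient and the data pointer starts at the oldest sample.

```python
class RotatingFir:
    """FIR filter whose delay line is a circular buffer (no data shifting)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)           # a0 ... aN
        self.buf = [0.0] * len(self.coeffs)  # filter data array
        self.pos = 0                         # plays the role of pointer P1

    def step(self, x):
        n = len(self.buf)
        self.buf[self.pos] = x               # new sample overwrites the oldest
        self.pos = (self.pos + 1) % n        # wrap around: now points to oldest
        acc = 0.0                            # sum register (R3)
        p = self.pos
        for c in reversed(self.coeffs):      # aN first, paired with the oldest sample
            acc += c * self.buf[p]           # multiply and accumulate one tap
            p = (p + 1) % n                  # pointer to the next place, with wrap
        return acc


if __name__ == "__main__":
    fir = RotatingFir([1.0, 2.0, 3.0])
    print([fir.step(x) for x in (1.0, 0.0, 0.0, 0.0)])  # impulse response
```

Feeding a unit impulse returns the coefficients in order, which is a quick check that the pointer wrap-around pairs each coefficient with the right sample.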



Data Format DSP56xxx Family

• General: $$z = \sum_{i=-n}^{n} z_i \cdot B^i$$

• Fractional integer: two's complement like a normal integer, but with a binary point (a normal integer has the point at the right end). The digits to the right of the point carry negative powers of 2. The location of the point is not significant for ADD and SUB, only for MUL.

                                              Hex        Decimal
Biggest positive number for word operands
Smallest negative number for word operands
Biggest negative number for word operands
Dynamic range for word operand                           dB
Dynamic range for accumulator                            dB

Coefficients for an IIR Butterworth low-pass filter, 2nd order; fs=40*fg

Coeff.   decimal (fs=40*fg)   hex   hex/4   decimal (fs=400*fg)
a0 =      0.005543                           0.000061
a1 =      0.011085                           0.000122
a2 =      0.005543                           0.000061
b0 =      1.000000                           1.000000
b1 =     -1.778640                          -1.977780
b2 =      0.800813                           0.978032

• Error of a0 (hex/4) compared with the decimal coefficient = ___ ‰
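The conversion between decimal coefficients and 24-bit fractional words can be sketched as below. This assumes the usual 24-bit fractional format with 23 bits to the right of the binary point; the helper names are illustrative. Because b1 lies outside the range [-1, 1), the sketch scales all coefficients by 1/4, as the "hex/4" column of the table does.

```python
def to_frac24(x):
    """Convert x to a 24-bit two's-complement fractional word (saturating)."""
    scaled = round(x * (1 << 23))
    scaled = max(-(1 << 23), min((1 << 23) - 1, scaled))  # clamp to [-1, 1)
    return scaled & 0xFFFFFF


def from_frac24(word):
    """Convert a 24-bit fractional word back to a Python float."""
    if word & 0x800000:          # sign bit set: negative value
        word -= 1 << 24
    return word / (1 << 23)


if __name__ == "__main__":
    coeffs = {"a0": 0.005543, "a1": 0.011085, "a2": 0.005543,
              "b0": 1.000000, "b1": -1.778640, "b2": 0.800813}
    for name, value in coeffs.items():
        word = to_frac24(value / 4)
        back = 4 * from_frac24(word)
        print(f"{name}: ${word:06X}  quantized value {back:+.7f}")
```

Comparing the quantized value with the decimal coefficient gives the quantization error asked for in the last line of the worksheet.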



Addressing Modes DSP56300 Fill in addressing mode and results after operations

1. Move Y0,Y:(R3)-     Addressing Mode = ____

Register   Before      After
Y1         12.31.23
Y0         45.64.56
R3         00.47.35
N3         00.00.02
M3         FF.FF.FF

Addr.      Y-Memory before   Y-Memory after
004734     12.45.67
004735     89.AB.00
004736     67.45.00
004737     AB.12.33

2. Move B0,X:(R2)+     Addressing Mode = ____

Register   Before      After
B2         AF
B1         65.43.21
B0         FE.DC.BA
R2         00.25.00
N2         00.00.02
M2         FF.FF.FF

Addr.      X-Memory before   X-Memory after
00.24.FF   12.45.67
00.25.00   89.AB.00
00.25.01   67.45.00
00.25.02   AB.12.33



3. Move X:(R4)-N4,A0     Addressing Mode = ____

Register   Before      After
A2         0F
A1         65.43.21
A0         FE.DC.BA
R4         00.25.02
N4         00.00.03
M4         FF.FF.FF

Addr.      X-Memory before   X-Memory after
00.24.FF   12.45.67
00.25.00   89.AB.00
00.25.01   67.45.00
00.25.02   55.66.77

4. Move Y:(R4)+N4,X0     Addressing Mode = ____

Register   Before      After
X1         65.43.21
X0         FE.DC.BA
R4         00.25.02
N4         00.00.03
M4         00.00.03

Addr.      Y-Memory before   Y-Memory after
00.24.FF   12.45.67
00.25.00   89.AB.00
00.25.01   67.45.00
00.25.02   55.66.77
00.25.03   11.22.33
00.25.04   44.56.78
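The address-register update that follows each move can be checked with a small model. It is a simplified sketch covering only two cases: linear addressing (M = $FFFFFF) and plain modulo addressing with modulus M + 1, where the buffer base is the address with the lowest k bits cleared and 2^k is the smallest power of two not less than the modulus. Bit-reversed addressing (M = 0) and other special modifier values are left out. The function name is hypothetical.

```python
def post_update(r, n, m):
    """Return the new address register value after a post-update by n.

    r: address register, n: signed offset (+1, -1, +N, -N), m: modifier register.
    Simplified model: linear and simple modulo addressing only.
    """
    if m == 0xFFFFFF:                        # linear addressing
        return (r + n) & 0xFFFFFF
    modulus = m + 1                          # circular buffer length
    k = (modulus - 1).bit_length()           # smallest k with 2**k >= modulus
    base = r & ~((1 << k) - 1)               # lower boundary of the buffer
    return base + ((r - base + n) % modulus)  # wrap inside the buffer


if __name__ == "__main__":
    print(f"{post_update(0x004735, -1, 0xFFFFFF):06X}")  # (R3)-   linear
    print(f"{post_update(0x002502, +3, 0x000003):06X}")  # (R4)+N4 modulo 4
```

In the modulo case the register wraps inside a four-word buffer starting at $002500 instead of running past its end.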



Function Generator with DSP56001 LSI - DSP 56001 PC System Board

• PC-AT compatible ISA-Board with Motorola 56001

• 64 K SRAM Program memory

• 64 K SRAM X - data memory

• 64 K SRAM Y - data memory

• 2 * 16 bit ADC’s maximum sample frequency 200 KHz

• 2 * 16 bit DAC’s maximum sample frequency 200 KHz

• external or internal programmable sample clock

Input / Output Ports and Interrupt for the LSI - DSP56001 PC System Board

• Port0 = Y:$FFC0 CONTRO: 8 bit I/O control (bit 6 switches the analogue channel select)
• Port1 = Y:$FFC1 ADCDAC: read data from ADC (for the selected analogue channel), write data to DAC
• Port2 = Y:$FFC2 SAMPLE: provides one immediate sample clock (to ADC when written, or to DAC when read from)
• Port3 = Y:$FFC3 SAMCLK: 16 bit forward counter to provide sample pulses (10 MHz clock); Value = $FFFF - (10MHz / Sample frequency)
• Interrupt - IRQB every sample pulse

Init for the LSI - DSP56001 PC System Board

movec #$0006,OMR ; DE=1 enable the internal data ROMs

movep #$0020,Y:$FFC0 ; select analogue ADC and DAC channel 1

movep #$FFCE,Y:$FFC3 ; program the counter for 200KHz sample frequency (e.g.)

movep #$FC3C,X:$FFFF ; enable IRQB on negative edge in the interrupt priority register

; the program then uses interrupt IRQB on every sample pulse

Tasks

1. Write an assembler program that generates a ramp function at the DAC output.

Describe every assembler line.

Fill in the missing instructions.

2. Modify the previous program to generate a sine wave.

(start address of internal 4-quadrant sine-wave table: Y:$100, length: $100)

3. Modify the previous programs to generate a ramp-modulated sine signal.



;********************************************************************
; Ramp Generator
;********************************************************************
        PAGE 132,89,0,0
        LSTCOL 7,6,8,11,11
;--------------------------------------------------------------------
; Define
;--------------------------------------------------------------------
Start   equ $60         ;Start address of program
ContA   equ $FFC0       ;Address of AD/DA Control port
DAC     equ $FFC1       ;Address of output signal port
SamF    equ $FFC3       ;Counter for sample frequency
Chan1   equ $20         ;Code Select AD/DA Channel 1
K200    equ $FFCE       ;Code for 200KHz samples
DeltR   equ $002000     ;Delta for ramp
IPL     equ $FFFF       ;Interrupt priority register
EIMB    equ $00FC3C     ;Enable interrupt MON, IRQB
;--------------------------------------------------------------------
; Interrupt-Service-Routine
;--------------------------------------------------------------------
        org p:$000A     ;interrupt vector address for IRQB
                        ;add delta to sum
                        ;output of ramp to DAC
;--------------------------------------------------------------------
; Main Program
;--------------------------------------------------------------------
        org p:Start     ;define begin of program memory
Ramp    movep           ;set sample frequency 200kHz
        movep           ;select AD/DA channel 1
        movep           ;load of Inter.-Prior. Reg.
        move            ;load delta for one ramp step
main    jmp             ;endless main loop
;--------------------------------------------------------------------
        end             ;end for assembler
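What the interrupt service routine is meant to produce can be previewed with a short simulation. This is a behavioural sketch, not DSP56001 code: it assumes that the running sum is kept as a 24-bit word that simply wraps around on overflow, which simplifies how the real accumulator behaves. The function name is illustrative.

```python
def ramp_samples(delta, count, start=0):
    """Simulate 'add delta to sum, output sum' once per sample interrupt."""
    acc = start
    out = []
    for _ in range(count):
        acc = (acc + delta) & 0xFFFFFF   # assumed 24-bit wrap-around
        out.append(acc)                  # value written to the DAC port
    return out


if __name__ == "__main__":
    delta = 0x002000                     # DeltR from the listing
    period = (1 << 24) // delta          # samples per ramp
    print("samples per ramp:", period)
    print("ramp frequency at 200 kHz sampling:", 200_000 / period, "Hz")
    print([f"${v:06X}" for v in ramp_samples(delta, 4)])
```

With DeltR = $002000 one ramp takes 2048 samples under this assumption; a larger delta gives a proportionally faster ramp.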



FIR Filter Program for DSP56xxx Family FIR-Filter (finite impulse response)

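[Block diagram: Real Correlation-, Convolution- or FIR-Filter. The input Xn passes a chain of delay elements z-1 (Xn, Xn-1, ..., Xn-N); the taps are weighted with a0, a1, ..., aN-1, aN and summed to yn]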

Exercise: Write a DSP 56301 assembler program for a FIR-Filter, fill in the missing instructions

• filter length N=20

• get input values from external ADC (address y:input)

• put output values to external DAC (address y:output)

• store input values into X-memory

• store filter coefficients into Y-memory

Steps to do:

• get sample from ADC

• multiply samples and coefficients and accumulate the result

• write result to DAC

               filter          input samples     input samples
               coefficients    (before)          (after)
low address    R4 → a0         R0 → x(n-19)      x(n)
               a1              x(n)              x(n-1)
               :               :                 :
               a17             x(n-16)           x(n-17)
               a18             x(n-17)           x(n-18)
high address   a19             x(n-18)           R0 → x(n-19)


20 Tap FIR Filter Example

;*********************************************************************

; Definition

;*********************************************************************

n equ 20 ; number of taps

start equ $40 ; start address of program

data equ $20 ; start address for data (x)

coeff equ $20 ; start address for coefficients (y)

input equ $ffc1 ; input address

output equ $ffc1 ; output address

;*********************************************************************
; Initialization
;*********************************************************************
        org p:start ; set counter to start

start: move #data, r0 ; start address data to pointer r0

move #coeff, r4 ; start address coeff. to pointer r4

; for circular buffer of r0

; for circular buffer of r4

;********************************************************************* ; Filter loop (can be repeated by a loop instruction) ;********************************************************************* movep y:input, x:(r0) ;input sample into x-memory

; clear accumul., load data and coeff.

; repeat next instr. 19 times

; accumul., load for every tap

; last mac, set data pointer to oldest

movep a, y:output ; output filtered sample

;*********************************************************************

end



IIR Filter Program for DSP56xxx Family

Design of an IIR Low-Pass

Filter order N = 2
Sample frequency [Hz] = 200
Upper cutoff frequency [Hz] = 5
Lower cutoff frequency [Hz] = 0
Filter type (pass band), one of: Low-Pass, High-Pass, Band-Pass, Band-Stop
Filter type (characteristic), one of: Butterworth, Bessel, Chebyshev, Cauer

Filter coefficients

a0         a1         a2         b0   b1         b2
0.005543   0.011085   0.005543   1    -1.77864   0.800813

Z-Function

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{b_0 + b_1 z^{-1} + b_2 z^{-2}}$$

$$y_n = \sum_{k=0}^{N} a_k\, x_{n-k} - \sum_{k=1}^{M} b_k\, y_{n-k}$$

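Second Order Real Biquad IIR Filter ( 1. direct form )

[Block diagram: direct form 1 biquad. The input chain Xn, Xn-1, Xn-2 (two delays z-1) is weighted with a0, a1, a2; the output chain yn-1, yn-2 (two delays z-1) is weighted with -b1, -b2; all five products are summed to yn]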

Equation for filter order N=2

y(n) = a0*x(n) + a1*x(n-1) + a2*x(n-2) - b1*y(n-1) - b2*y(n-2)
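The difference equation can be tried out directly in floating point before it is mapped to fixed-point assembler. The sketch below uses the coefficients from the design table; the function name is illustrative and the code is a reference model, not the DSP implementation.

```python
def biquad_df1(samples, a, b):
    """Direct form 1 biquad: y(n) = a0*x(n) + a1*x(n-1) + a2*x(n-2)
                                    - b1*y(n-1) - b2*y(n-2)."""
    a0, a1, a2 = a
    b1, b2 = b
    x1 = x2 = y1 = y2 = 0.0          # delayed inputs and outputs
    out = []
    for x in samples:
        y = a0 * x + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, x               # x(n-2) := x(n-1), x(n-1) := x(n)
        y2, y1 = y1, y               # y(n-2) := y(n-1), y(n-1) := y(n)
        out.append(y)
    return out


if __name__ == "__main__":
    a = (0.005543, 0.011085, 0.005543)
    b = (-1.778640, 0.800813)
    step = biquad_df1([1.0] * 400, a, b)
    print("first output:", step[0])
    print("settled step response:", step[-1])
```

For a unit step the output settles close to 1, as expected for a low pass with unity DC gain (a0 + a1 + a2 is almost equal to 1 + b1 + b2).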



Second Order Real Biquad IIR Filter Memory Map:

Pointer   X-memory             Y-memory
r0        a0, a1, a2, b1, b2
r4                             x(n-1), x(n-2)
r5                             y(n-1), y(n-2)

Second Order Real Biquad IIR Filter Assembler Program:

• prepare register set for the biquad
• repeat:
  - get input values from external ADC (Y:input)
  - calculate the filter result
  - put the result values to external DAC (Y:output)

;-----------------------------------------------------------------
;define variables
;-----------------------------------------------------------------
coeff equ $0050 ;start address of filter coefficients

xdata equ $0080 ;start address of x-data inside Y-Memory

ydata equ $0090 ;start address of y-data inside Y-Memory

input equ $ffe0 ;ADC-address

output equ $ffc1 ;DAC-address

;-----------------------------------------------------------------
;init register for iir-biquad
;-----------------------------------------------------------------
Init move #coeff,r0 ;load r0 pointer to coefficients

move #xdata,r4 ;load r4 pointer to x-data

move #ydata,r5 ;load r5 pointer to y-data

move ;m0 – circular for 5 coefficients

move ;m4 – circular for 2 old x-data

move ;m5 – circular for 2 old y-data

;-----------------------------------------------------------------
;IIR-biquad-main-loop
;-----------------------------------------------------------------
IIR movep y:input,y1 x:(r0)+,x0 ;read ADC x0:=a0 y1:=x(n)

mpy ;a0*x(n) x0:=a1 y0:=x(n-1)

mac ;a1*x(n-1) x0:=a2 y0:=x(n-2)

mac ;a2*x(n-2) x0:=b1 y0:=y(n-1)

mac ;-b1*y(n-1) x0:=b2 y0:=y(n-2)

macr ;-b2*y(n-2) x(n-2):=x(n)

movep a1,y:output a1,y:(r5) ;write DAC y(n-2):=y(n)

;-----------------------------------------------------------------

jmp IIR ;repeat main loop or return from subroutine or ...

end ;end for assembler



Second Order Real Biquad IIR Filter ( 2. canonic form )

[Block diagram: second-order biquad in canonic form with two delay elements z-1 and the coefficients a0*, a1*, a2*, b1, b2]

a0* = 1 / a0
a1* = a1 / a0
a2* = a2 / a0

Equation:
d(n) = x(n) - (a1)*d(n-1) - (a2)*d(n-2)
y(n) = d(n) + (b1)*d(n-1) + (b2)*d(n-2)

Second Order Real Biquad IIR Filter Memory Map:

Pointer   X memory          Y memory
r0        d(n-2), d(n-1)
r4                          a2, a1, b2, b1

Second Order Real Biquad IIR Filter Assembler Programm ( 2. canonic form ):



FFT Implementation: Assembler Program for DSP 56xxx

Fourier transform: $$F(j\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

DFT: $$F(k\Omega) = \sum_{n=0}^{N-1} f(nT)\, e^{-j 2\pi k n / N}$$

Approx. 4N² MAC operations; reduction by decomposition (½ N → ¼ operations). FFT: decomposition until only 2 elements per sequence are left ("butterfly"). The subsequences consist of even or odd sample numbers only.

Therefore the total number of samples must be a power of two. Approx. N ld(N) MAC operations.

Decomposition using even and odd sample numbers causes a new element order ("bit reversal"):

index   sequence   Step 1   Step 2   index (bit-reversed)
000     0          0        0        000
001     1          2        4        100
010     2          4        2        010
011     3          6        6        110
100     4          1        1        001
101     5          3        5        101
110     6          5        3        011
111     7          7        7        111
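The reordering in the table amounts to reading each index with its bits in reverse order. A minimal sketch, with an illustrative function name:

```python
def bit_reverse(index, bits):
    """Return index with its lowest 'bits' bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit to the result
        index >>= 1
    return result


if __name__ == "__main__":
    for i in range(8):
        r = bit_reverse(i, 3)
        print(f"{i:03b} ({i}) -> {r:03b} ({r})")
```

For 8 points this reproduces the last two columns of the table: 0, 4, 2, 6, 1, 5, 3, 7.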

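[Figure: signal flow graph of an 8-point FFT with Pass 1, Pass 2 and Pass 3 and the groups marked (Group 1, Group 2). Inputs x(0) ... x(7) in natural order; outputs in bit-reversed order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7)]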



DIT-FFT Motorola DSP 56000 FFT core: "Butterfly"

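$$G = E + F \cdot e^{-j\Phi}, \qquad H = E - F \cdot e^{-j\Phi}$$

$$\Rightarrow \quad
\begin{aligned}
G_r &= E_r - F_r(-\cos\Phi) - F_i(-\sin\Phi) \\
G_i &= E_i - F_i(-\cos\Phi) + F_r(-\sin\Phi) \\
H_r &= 2E_r - G_r \\
H_i &= 2E_i - G_i
\end{aligned}$$

[Figure: butterfly with inputs E and F, twiddle factor exp(-jΦ), and outputs G and H]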
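The butterfly can be written once with complex numbers and once split into real and imaginary parts, the way the assembler program computes it. The sketch below is a reference model with illustrative function names; it does not reproduce the register usage of the DSP code.

```python
import cmath
import math


def butterfly(e, f, phi):
    """Complex form: G = E + F*exp(-j*phi), H = E - F*exp(-j*phi)."""
    t = f * cmath.exp(-1j * phi)
    return e + t, e - t


def butterfly_real(er, ei, fr, fi, phi):
    """Same butterfly with separate real and imaginary parts."""
    c, s = math.cos(phi), math.sin(phi)
    gr = er + fr * c + fi * s        # Gr = Er - Fr(-cos) - Fi(-sin)
    gi = ei + fi * c - fr * s        # Gi = Ei - Fi(-cos) + Fr(-sin)
    hr = 2.0 * er - gr               # Hr = 2Er - Gr
    hi = 2.0 * ei - gi               # Hi = 2Ei - Gi
    return gr, gi, hr, hi


if __name__ == "__main__":
    g, h = butterfly(1 + 2j, 3 - 1j, math.pi / 4)
    print(g, h)
    print(butterfly_real(1.0, 2.0, 3.0, -1.0, math.pi / 4))
```

Computing H as 2E − G saves the second set of multiplications, which is why the listing uses the subl instruction after each pair of multiply-accumulates.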

; Radix-2 Decimation-In-Time FFT
; Complex input and output data (Real data in X memory, Imaginary data in Y memory)
; input data in normal order, output data in bit reversed order
;
; points   number of points (power of 2)
; passes   number of passes (passes:=ld(points))
; data     start address data buffer
; coeff    start address Sine/cosine table
;
; r0 -> E   r4 -> G   X: real   r6 -> T            X: -cos()
; r1 -> F   r5 -> H   Y: imag   (twiddle factor)   Y: -sin()
        move  #points/2,n0                 ; butterflies per group
        move  #1,n2                        ; groups per pass
        move  #points/4,n6                 ; T-pointer offset
        move  #-1,m0                       ; linear addressing mode E
        move  m0,m1                        ; - " - F
        move  m0,m4                        ; - " - G
        move  m0,m5                        ; - " - H
        move  #0,m6                        ; bit reversed addressing T
        do    #passes,end_pass             ; passes: ld(points)
        move  #data,r0                     ; E-pointer
        move  r0,r4                        ; G-pointer
        lua   (r0)+n0,r1                   ; F-pointer
        move  #coeff,r6                    ; T-pointer
        lua   (r1)-,r5                     ; H-pointer
        move  n0,n1                        ; offset F
        move  n0,n4                        ; offset G
        move  n0,n5                        ; offset H
        do    n2,end_grp                   ; groups: 1,2,4,...
        move  X:(r1),x1 Y:(r6),y0          ; Fr->x1, -sin()->y0
        move  X:(r5),a Y:(r0),b            ; Fr->a , Ei->b
        move  X:(r6)+n6,x0                 ; -cos()->x0, update r6
        do    n0,end_bfy                   ; butterflies: N/2, N/4,...
        mac   x1,y0,b Y:(r1)+,y1           ; Ei+Fr(-sin())->b, Fi->y1
        macr  -x0,y1,b a,X:(r5)+ Y:(r0),a  ; b-Fi(-cos())->b=Gi, Ei->a
        subl  b,a X:(r0),b b,Y:(r4)        ; 2Ei-b->a=Hi, Er->b, B->Gi
        mac   -x1,x0,b X:(r0)+,a a,Y:(r5)  ; Er-Fr(-cos())->b, Er->a, a->Hi
        macr  -y1,y0,b X:(r1),x1           ; b-Fi(-sin())->b=Gr, Fr->x1
        subl  b,a b,X:(r4)+ Y:(r0),b       ; 2Er-b->a=Hr, Ei->b, b->Gr
end_bfy
        move  a,X:(r5)+n5 Y:(r1)+n1,y1     ; update F&H
        move  X:(r0)+n0,x1 Y:(r4)+n4,y1    ; update E&G
end_grp
        move  n0,b1
        lsr   b n2,a1                      ; butterflies /= 2
        lsl   a b1,n0                      ; groups *= 2
        move  a1,n2
end_pass



Floating-Point Data Format TMS320C3x Family

General: N = M · B^E, where N = number, M = mantissa, E = exponent, B = base (2)

Extended-precision floating point (R0 – R7 registers):
  bits 39 ... 32: exponent E    bit 31: sign S    bits 30 ... 0: fraction F

Single-precision floating point (AR0 – AR7 registers, memory):
  bits 31 ... 24: exponent E    bit 23: sign S    bits 22 ... 0: fraction F

Mantissa and exponent: two's complement, mantissa with hidden-bit technique

Number X is given by:
X := 01.F * 2^E if S = 0 ( = positive number)
X := 10.F * 2^E if S = 1 ( = negative number)
X := 0 if E = 80h ( = -128 )
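The three rules for X can be turned into a small encoder and decoder for the 32-bit single-precision layout (exponent in bits 31 to 24, sign in bit 23, fraction in bits 22 to 0). This is a sketch with illustrative function names; it does not check the exponent range and is not a TI tool or library call.

```python
import math


def c3x_decode(word):
    """Decode a 32-bit TMS320C3x single-precision word to a Python float."""
    exp = (word >> 24) & 0xFF
    if exp >= 0x80:
        exp -= 0x100                       # two's-complement exponent
    if exp == -128:                        # E = 80h encodes zero
        return 0.0
    sign = (word >> 23) & 1
    frac = (word & 0x7FFFFF) / (1 << 23)
    mant = (-2.0 + frac) if sign else (1.0 + frac)   # 10.F or 01.F
    return mant * 2.0 ** exp


def c3x_encode(x):
    """Encode a float as a 32-bit word (exponent range is not checked)."""
    if x == 0.0:
        return 0x80000000
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= |m| < 1
    mant, exp = 2.0 * m, e - 1             # now 1 <= |mant| < 2
    if mant == -1.0:                       # negative mantissa range is [-2, -1)
        mant, exp = -2.0, exp - 1
    sign = 1 if mant < 0 else 0
    frac = round((mant + 2.0 if sign else mant - 1.0) * (1 << 23))
    if frac == 1 << 23:                    # rounding carried out of the fraction
        frac = 0
        exp += -1 if sign else 1
    return ((exp & 0xFF) << 24) | (sign << 23) | frac


if __name__ == "__main__":
    for value in (1.0, -1.0, 0.005543, -1.778640, math.pi):
        word = c3x_encode(value)
        print(f"{value:+.6f} -> {word:08X} -> {c3x_decode(word):+.6f}")
```

Note that 1.0 is encoded as 00000000h in this format, while the value zero needs the special exponent 80h.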

                                               Hex        Decimal
Biggest positive number for single precision
Smallest positive number for single precision
Dynamic range for single precision                        dB

Coefficients for an IIR Butterworth low-pass filter, 2nd order; fs=40*fg

Coeff.   decimal (fs=40*fg)   hex (bit 31 ... bit 0)

a0 = 0.005543

a1 = 0.011085

a2 = 0.005543

b0 = 1.000000

b1 = -1.778640

b2 = 0.800813

Coefficients for Constants

Constant decimal 31 hexa 0

π = 3.141592654

2*π = 6.283185307

e = 2.718281828
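The hexadecimal column of the two tables can be filled in by encoding each decimal value in the single-precision format. The following Python function is a sketch under assumptions: the name `float_to_c3x_single` is hypothetical, and the fraction is rounded to the nearest value, which may differ in the last bit from what a particular assembler or conversion tool produces.

```python
import math


def float_to_c3x_single(x):
    """Encode a Python float as a 32-bit TMS320C3x single-precision word."""
    if x == 0.0:
        return 0x80000000                # E = 80h encodes zero
    m, ex = math.frexp(x)                # x = m * 2**ex with 0.5 <= |m| < 1
    mantissa = 2.0 * m                   # now 1 <= |mantissa| < 2
    e = ex - 1
    if mantissa == -1.0:                 # negative power of two:
        mantissa, e = -2.0, e - 1        # the format needs -2 <= mantissa < -1
    if mantissa > 0:
        s = 0
        f = math.floor((mantissa - 1.0) * 2**23 + 0.5)
        if f == 1 << 23:                 # rounding reached 2.0
            f, e = 0, e + 1
    else:
        s = 1
        f = math.floor((mantissa + 2.0) * 2**23 + 0.5)
        if f == 1 << 23:                 # rounding reached -1.0 = -2.0 * 2**-1
            f, e = 0, e - 1
    if not -127 <= e <= 127:
        raise ValueError("value outside the single-precision range")
    return ((e & 0xFF) << 24) | (s << 23) | f


if __name__ == "__main__":
    for name, value in [("a0", 0.005543), ("b1", -1.778640),
                        ("pi", math.pi), ("e", math.e)]:
        print(f"{name:3s} {value: .9f}  {float_to_c3x_single(value):08X}h")
```

The remaining coefficients and constants can be added to the list in the same way.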



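FIR Filter Program for TMS320C3x Family

TMS320C3x assembler program for a FIR filter

• FIR filter length = 20
• Floating-point calculation
• FIR filter structure

[Figure: Real correlation, convolution or FIR filter. The input x(n) passes through a chain of z^-1 delay elements giving x(n-1) ... x(n-N); the taps are weighted with the coefficients a0, a1, ..., aN-1, aN and summed to give the output y(n).]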

Data Memory Organization for the FIR-Filter (one state inside the cycles)

             filter          input samples      input samples
             coefficients    (before)           (after)

low addr.    ar0 → a0        x(n-18)            x(n-19)
                   a1        x(n-19)            ar1 → x(n)
                   a2        ar1 → x(n)         x(n-1)
                   a3        x(n-1)             x(n-2)
                   :         :                  :          (circular queue)
                   a17       x(n-15)            x(n-16)
                   a18       x(n-16)            x(n-17)
high addr.         a19       x(n-17)            x(n-18)



FIR-Filter: Assembler Program for TMS320C3x

***********************************************************************
*** Subroutine: FIR-Filter                                          ***
***********************************************************************
;
;----------------------------------------------------------------------
; fir-algorithm: y(n)= a(0)*x(n) + a(1)*x(n-1) + a(2)*x(n-2) + ..
;                      .. + a(N-2)*x(n-(N-2)) + a(N-1)*x(n-(N-1))
;----------------------------------------------------------------------
; typical call sequence:
; PrepFIR load    DaSec,dp        ;addr. of data-section
;         load    Coeff,ar0       ;addr. of a(0) (first coeff.)
;         load    NewDa,ar1       ;addr. of x(n) (newest data)
;         load    FiLng-2,rc      ;length of filter-2 (N-2)
;         load    FiLng,bk        ;length of filter (N)
;         call    FirFil          ;call subroutine
;----------------------------------------------------------------------
        .data                     ;pointer to data section
AdcDat  .word   0000              ;current integer ADC data
;----------------------------------------------------------------------
        .text                     ;pointer to program section
;-------------------------------------
FirFil  ldf     0.0,r2            ;r2:=0, reset sum register
                                  ;load & float new ADC data
                                  ;new data overwrites oldest
                                  ;r0:=a(0)*x(n) 1.product
;-------------------------------------
                                  ;setup the repeat cycle (-1)
;-------------------------------------
                                  ;accumulation of products
                                  ;multiply next coeff. & data
;-------------------------------------
        addf    r2,r0             ;add the last product
;-------------------------------------
        rets                      ;return from subroutine
;----------------------------------------------------------------------
; results: r0  = filter output Y(n)
;          ar0 = to 1. filter coeff. a(0)
;          ar1 = to newest data value x(n)
***********************************************************************
*** End FIR-Filter                                                  ***
***********************************************************************

University Rostock Institute of Automation December 2002
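The circular queue from the data memory organization and the sum y(n) = a(0)*x(n) + ... + a(N-1)*x(n-(N-1)) can be sketched in Python. This is an illustration under assumptions, not the missing assembler instructions: the class name `CircularFir` and its attributes are hypothetical, a Python list stands in for the data memory, and the index arithmetic modulo N stands in for the circular addressing of the C3x. As in the memory table, the new sample overwrites the oldest one at the next lower address, and older samples follow at increasing addresses.

```python
class CircularFir:
    """FIR filter with a circular sample queue (illustrative sketch)."""

    def __init__(self, coeffs):
        self.coeffs = [float(a) for a in coeffs]   # a(0) ... a(N-1)
        self.n = len(self.coeffs)                  # filter length N
        self.samples = [0.0] * self.n              # circular queue
        self.newest = 0                            # index of x(n)

    def step(self, x):
        """Store a new input sample and return the filter output y(n)."""
        # The new sample overwrites the oldest one, one address lower.
        self.newest = (self.newest - 1) % self.n
        self.samples[self.newest] = float(x)
        acc = 0.0                                  # sum register
        idx = self.newest
        for a in self.coeffs:                      # multiply and accumulate
            acc += a * self.samples[idx]
            idx = (idx + 1) % self.n               # circular increment
        return acc


if __name__ == "__main__":
    fir = CircularFir([1.0, 2.0, 3.0])
    # An impulse at the input reproduces the coefficients at the output.
    print([fir.step(v) for v in [1, 0, 0, 0]])
```

After N multiply-accumulate steps the running index has wrapped around once and points at the newest sample again, which matches the result comment of the subroutine for ar1.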


32*32 Bit Multiplication: Assembler Program for TMS320C3x

• The integer multiplier of the C3x processors uses only 24-bit operands and calculates only a 32-bit result!

• Multiply two 32-bit values X, Y to a 64-bit result: W = X * Y

• Split into 4 steps of 16 * 16 bit = 32-bit partial products P1 … P4

• Write a subroutine with register saving

• Pay attention to the correct sign of the result

  X = X1 , X0
  Y = Y1 , Y0

  P1 = X0 * Y0
  P2 = X0 * Y1
  P3 = X1 * Y0
  P4 = X1 * Y1

  W = W1 , W0

*********************************************************************
*** Subroutine: 32 * 32 Bit Multiplication With 64 Bit Product    ***
*********************************************************************
;--------------------------------------------------------------------
; Algorithm W := X * Y
;--------------------------------------------------------------------
; multiplier     r0 = X = X1 X0     32 bit X, split to 2*16 bit
; multiplicand   r1 = Y = Y1 Y0     32 bit Y, split to 2*16 bit
;                         -------
;                    p1 = X0*Y0
;                    p2 = X0*Y1     32 bit partial products
;                    p3 = X1*Y0
;                    p4 = X1*Y1
;                    -----------
; product     r1,r0 = W = W1 W0     64 bit W, split to 2*32 bit
;--------------------------------------------------------------------
        .text                     ;set text section for program
;------------------------------
Mpy32   push    st                ;program start
        push    r2
        push    r3                ;save registers
        push    r4
        push    ar0
        push    ar1
;--------------------------------------------------------------------
; calculate & save result sign, continue with positive numbers
;--------------------------------------------------------------------
        xor3    r0,r1,ar0         ;store result sign
        absi    r0                ;absolute value of X
        absi    r1                ;absolute value of Y
;--------------------------------------------------------------------
; split numbers into 16 bit parts
;--------------------------------------------------------------------
        ldi     -16,ar1           ;ar1 = -16
        lsh3    ar1,r0,r2         ;r2 = upper 16 bit of X (X1)
        and     0ffffh,r0         ;r0 = lower 16 bit of X (X0)
        lsh3    ar1,r1,r3         ;r3 = upper 16 bit of Y (Y1)
        and     0ffffh,r1         ;r1 = lower 16 bit of Y (Y0)
;--------------------------------------------------------------------
; carry out the multiplication
;--------------------------------------------------------------------
                                  ;r4 = p1 = X0*Y0
                                  ;r0 = p2 = X0*Y1
                                  ;r1 = p3 = X1*Y0
                                  ;r1 = p2+p3
                                  ;r3 = p4 = X1*Y1
                                  ;r2 = p2+p3
                                  ;r2 = lower 16 bit of p2+p3
;--------------------------------------------------------------------
; check whether to negate the result (operands had opposite signs)
;--------------------------------------------------------------------
        cmpi    0,ar0             ;check result sign
        bged    EndMpy            ;if positive then delayed jump
                                  ;r1 = upper 16 bits of p2+p3
                                  ;W0 = lower word of product
                                  ;W1 = higher word of product
;------------------------------
        not     r0                ;invert r0
        not     r1                ;invert r1
        addi    1,r0              ;2's complement r0
        addc    0,r1              ;2's complement r1
;------------------------------
EndMpy  pop     ar1               ;end multiplication
        pop     ar0
        pop     r4                ;restore registers
        pop     r3
        pop     r2
        pop     st
        rets                      ;return from subroutine
*********************************************************************
*** End Subroutine: 32 * 32 Bit = 64 Bit Multiplication           ***
*********************************************************************
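The partial-product scheme of the subroutine can be followed step by step in Python. This is a sketch under assumptions, not the missing assembler instructions: the function names `mpy32` and `to_signed64` are hypothetical, and Python integers with explicit 32-bit masks stand in for the C3x registers. The steps mirror the listing: save the result sign, work with absolute values, split into 16-bit halves, form p1 … p4, combine them into W1, W0 and negate the 64-bit result (invert, add 1, add carry) if the signs differ.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mpy32(x, y):
    """Multiply two signed 32-bit integers, return (W1, W0) as 32-bit words."""
    negative = (x < 0) != (y < 0)        # result sign
    ux, uy = abs(x), abs(y)              # continue with positive numbers
    x1, x0 = ux >> 16, ux & MASK16       # X = X1, X0
    y1, y0 = uy >> 16, uy & MASK16       # Y = Y1, Y0
    p1 = x0 * y0                         # 32-bit partial products
    p2 = x0 * y1
    p3 = x1 * y0
    p4 = x1 * y1
    mid = p2 + p3
    w0 = p1 + ((mid & MASK16) << 16)     # lower word plus lower half of p2+p3
    carry = w0 >> 32
    w0 &= MASK32
    w1 = (p4 + (mid >> 16) + carry) & MASK32
    if negative:                         # 64-bit two's complement
        w0 = (~w0 & MASK32) + 1          # invert, add 1
        carry = w0 >> 32
        w0 &= MASK32
        w1 = ((~w1 & MASK32) + carry) & MASK32   # invert, add carry
    return w1, w0


def to_signed64(w1, w0):
    """Interpret the word pair (W1, W0) as a signed 64-bit integer."""
    value = (w1 << 32) | w0
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    w1, w0 = mpy32(-123456789, 987654321)
    print(f"W1 = {w1:08X}h, W0 = {w0:08X}h")
    print(to_signed64(w1, w0) == -123456789 * 987654321)
```

Unlike a 32-bit register, Python's `abs` does not overflow for the most negative input, so that corner case behaves more gracefully here than it would in an assembler implementation.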
