ch. 11 digital signal processing using general-purpose processors

Ch. 11 Digital Signal Processing Using General-Purpose Processors
Kathy Grimes

Signals

• Signals can be:
  • Electrical
  • Mechanical
  • Acoustic

• Most real-world signals are analog: they vary continuously over time

• Analog processing has many limitations:
  • Repeatability
  • Tolerances
  • Difficulty storing information or implementing certain operations

• These limitations lead us to DSP…

Digital Signal Processing (DSP)

• Represents signals as sequences of numbers

• Pros
  • Repeatable
  • Accuracy can be controlled
  • Time-varying operations are easier to implement

• Cons
  • Sampling can cause loss of information (aliasing, if the signal contains frequencies above half the sampling rate)
  • Round-off errors
  • Requires A/D and D/A mixed-signal hardware

Digital Signal Processing (DSP)

• Analog-to-digital converter (ADC)
  • Converts a continuous-time signal into a discrete-time signal
  • Figure 11.1 shows the sampling of a signal

• Common signals
  • Step discontinuity (Figure 11.2)
  • Impulse (Figure 11.3)

FIGURE 11.1 Discrete Time Signals.

FIGURE 11.2 Step Function. FIGURE 11.3 Impulse Function.

DSP Building Blocks

• Based on three basic functions:
  • Delay
  • Add
  • Multiply

• Raw performance of a DSP algorithm is usually measured by the number of operations needed to execute it

FIGURE 11.4 Add Function. FIGURE 11.5 Multiply Function.

FIGURE 11.6 Delay Function.

DSP Building Blocks

• In combination, these two systems (Figures 11.7 and 11.8) can implement any linear, constant-coefficient difference equation

FIGURE 11.7 Feedforward System.

FIGURE 11.8 Feedback System.

Fixed-Point and Floating-Point Implementations

• Fixed-point DSPs perform integer operations
  • Fixed range: 16 bits give 2^16 = 65,536 distinct values

• Floating-point DSPs perform both integer and floating-point operations
  • Wide dynamic operating range

• Real-world analog signals have effectively infinite precision
  • Floating-point mimics this "infinite" range better
  • Easier to program: scaling and overflow are largely handled by the number format, although rounding still occurs

• Why not always use floating-point?
  • Cost, availability, and performance
  • Precision: with the same number of bits, floating-point is finer than fixed-point for small values but coarser for large values

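Single Instruction Multiple Data

• SIMD microarchitecture and instructions
  • One instruction operates on four data values at once (for example, four 32-bit floats in a 128-bit register) in one clock cycle
  • Increases performance for low-level DSP functions such as multiply-accumulate (MAC)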

FIGURE 11.10 SIMD Instruction.

Microarchitecture Considerations

• Processor clock speed

• Cache size
  • DSP architectures usually partition the memory space manually to reduce the number of accesses to external memory
  • Latency is costly in terms of time and resources
  • Intel architectures have large caches that bridge fast and slow memory automatically; however, all data starts out in the "far" levels of the memory hierarchy, so first accesses still pay the full latency

• Output data should be generated sequentially; accessing memory in a scattered pattern (especially when using threads) should be avoided

Implementation Options for Intel

• Intrinsics
• Vectorization
• Intel Integrated Performance Primitives (IPP)

Intrinsics and Data Types

• C functions that invoke special built-in compiler capabilities mapping closely to the underlying SSE instruction set

• Added data types
  • __m64 (64-bit MMX), __m128 (four floats), __m128d (two doubles), __m128i (packed integers)

• Intrinsic operation types
  • Arithmetic (fixed- and floating-point)
  • Shift
  • Logical
  • Compare
  • Set
  • Shuffle
  • Concatenation

Adds the four single-precision FP values packed in a and b, performing four additions in one instruction

Vectorization

• Use the compiler to apply vectorization techniques to the loops in the data-processing code: it looks for opportunities to convert a loop from a scalar (one element per iteration) implementation to a vector-based one, so that multiple operands are processed at the same time

• As with GCC, the generated code targets the SIMD instruction set

• Use #pragma directives to guide the compiler and avoid overheads such as checks for assumed data dependences

Listing 11.4 Explicitly Don’t Vectorize Loop.

Listing 11.7 Memory Alignment Property and Discarding Assumed Data Dependences.

Vectorization

• Performance comparison

• These results would be very different if the data were not already aligned in memory

Performance Primitives

• Intel libraries provide highly optimized implementations for many different applications (including audio codecs, image processing, data compression, etc.)

• The libraries take full advantage of the CPU and its SIMD units; most are written specifically for performance

• The libraries are threaded and can gain performance by parallelizing the algorithm

• Domains that take advantage of the libraries include:
  • Signal processing: convolution and correlation, finite impulse response (FIR) filters, FIR coefficient generation functions, infinite impulse response (IIR) filters, transforms
  • Image processing
  • Small matrices and realistic rendering
  • Cryptography

Finite Impulse Response Filter

• FIR filter equation (three taps):
  • y[n] = a·x[n] + b·x[n-1] + c·x[n-2]

Listing 11.8 FIR Filter C Code Example

Listing 11.9 FIR Using Intel Performance Primitives.

FIR Ex: Intel SSE

• Loop unrolling to remove data dependences between iterations

• By rearranging how data elements are reused, we can reduce the number of times we need to read data

Medical Ultrasound Imaging

• Computation intensive

• Needs a significant amount of embedded computational performance

• Systems differ in physical configuration, parameters, and functionality, but share the same basic algorithmic pattern:
  • Beamforming
  • Envelope extraction
  • Polar-to-Cartesian coordinate translation

FIGURE 11.12 Block Diagram of a Typical Ultrasound Imaging Application.

Envelope Detector

FIGURE 11.15 Block Diagram of the Envelope Detector.

Envelope Detector

FIGURE 11.16 Polar-to-Cartesian Conversion of a Hypothetically Scanned Rectangular Object.

Listing 11.11 Code Sample for Envelope Detector.

Performance Results

• Why such a large difference?

Summary

• Digital signal processing on general-purpose processors
  • Extends processing capabilities
  • Simplifies the overall application when a platform requires control, communications, and general-purpose processing along with DSP

• There are many ways to improve DSP performance on an Intel system: special C code (intrinsics), compiler vectorization, and dedicated libraries

• Performance is greatly enhanced when DSP is implemented properly
