on the integration of opencl into a software defined radio...
TRANSCRIPT
On the Integration of OpenCL into a
Software Defined RadioMichael L Dickens and J Nicholas Laneman
!""#$#%& '(#)*+,#-. )#,'%& #/*(-#-. ,-%(/%+/,— "*0+'%+. 1223
GPUTC’12’US 2012-May-17
Software Defined Radio
2 / 23 GPUTC’12’US 2012-May-17
➡ Real-time signal processing, using a class of radios that can be reprogrammed and reconfigured primarily via software
➡ Digitize as close to the antenna as possible
➡ Still significantly dependent on hardware
SDR Runtime
3 / 23 GPUTC’12’US 2012-May-17
SDRRuntime
Processing Devices
(CPU, GPU, DSP)
WaveformGraph
QuadratureDemodulator
Source
1:L
Demux
Sink
Subfilter #1
Subfilter #L
L-WaySync
Adder...
...
...
Mapping
Buffering
Scheduling
Processing
➡ Surfer Overview
➡ Key Features
➡ NBFM Example
➡ OFDM Demo
➡ Conclusions
➡ Future Work
Overview
4 / 23 GPUTC’12’US 2012-May-17
SDR Layers Model
5 / 23
1
Shark / naCLSalt / naCL
SalineSQuid
Application
23456
Operating System
7
Hardware
Block ProcessingRuntime / Block
ScriptGraphical
Application
Operating SystemHardware
Spf / naCL
GPUTC’12’US 2012-May-17
Surfer Overview
6 / 23
Hardware
Salt: Surfer Operating System and Hardware Abstraction LayernaCL: OpenCL Abstraction Layer
Host Hardware Abstraction Layer
Operating System API, ABI, and Kernel
Surfer
Runtime Kernel(Shark)
SupervisorAPI
Script API(Saline)
Signal Processing
(Spf)Application
API
Graphics API(SQuid)
GPUTC’12’US 2012-May-17
naCL Class Hierarchy
7 / 23 GPUTC’12’US 2012-May-17
naCLPlatform
...
CommandQueue
Context
...
...
......
ProgramSource ProgramSource
MemoryObject
Event ...
Platform
DeviceDevice
Context Context...
CommandQueue
CommandQueue
Program
Program
MemoryObject
Context
Program
...
MemoryObject
MemoryMap ... MemoryMap...CommandQueue
Device
DeviceDevice
Device
Legend...Owns
References
Vector
Class
Platform
Context
Device ... DeviceKernelKernel
Event
1 Signal processing flavors, providing a new form of dynamic runtime reconfiguration and heterogeneous processing
2 Use of thresholds for scheduling, to allow direct control over throughput and latency
2 Use of runtime statistics for modifying system runtime behavior, using a Supervisor
Key Features
8 / 23 GPUTC’12’US 2012-May-17
NBFM Example
9 / 23
Quadrature Demodulator
RateLimiter
Sink Downsampler
Source
➡ Demodulate narrowband FM signal and downsample to typical audio rate
GPUTC’12’US 2012-May-17
Flavors
10 / 23
1 Signal processing flavors, providing a new form of dynamic runtime reconfiguration
Block
FlavorTable
Processing Flavors
GPUVariant 1
GPUVariant 2
State
Glue
Local CPU Generic
PPC CPUOptimized
Intel CPUOptimized
ARM CPUOptimized
GPUTC’12’US 2012-May-17
Thresholds
11 / 23
2 Use of thresholds for scheduling, to allow explicit control over throughput and latency
Block 1
Threshold
Block 2
Threshold
Buffer
To iT
GPUTC’12’US 2012-May-17
Throughput / Latency
12 / 23
1000 2000 3000 40006
7
8
9
10
Proc
essi
ng T
hrou
ghpu
t (M
S/s)
Proc
essi
ng L
aten
cy (m
s)
2000 3000 40000
0.1
0.2
0.3
0.4
Threshold (Items)1000
GPUTC’12’US 2012-May-17
Supervisor / Statistics
13 / 23
3 Use of runtime statistics for modifying system runtime behavior, using a Supervisor
➡ Load
• Host Processor
• Surfer
• Any Surfer Thread
➡ Block Timing
• Queued
• Dequeued
• Processing Start
• Processing End
• Finished
➡ User Defined
• Energy UseGPUTC’12’US 2012-May-17
Data Flow by Thread
14 / 23
Queued X
Legend
Preprocessing
Processing
Postprocessing
Sleeping
Notification o
0
GPU
CPU
0.40.350.3Relative Time (ms)
0.20.150.10.05 0.25
Thread
0
1
GPUTC’12’US 2012-May-17
Data Flow by Thread
14 / 23
Queued X
Legend
Preprocessing
Processing
Postprocessing
Sleeping
Notification o
0
GPU
CPU
0.40.350.3Relative Time (ms)
0.20.150.10.05 0.25
Source to Sink LatencyThread
0
1
GPUTC’12’US 2012-May-17
Data Flow by Block
15 / 23
Legend
Preprocessing
Processing
Postprocessing
Waiting
Notification o
0
Sink
DS
QD
Source0.40.35
Relative Time (ms)0.250.20.150.10.05 0.3
0
1
0
1
Thread
GPUTC’12’US 2012-May-17
Data Flow by Block
15 / 23
Legend
Preprocessing
Processing
Postprocessing
Waiting
Notification o
0
Sink
DS
QD
Source0.40.35
Relative Time (ms)0.250.20.150.10.05 0.3
Source to Sink Latency
Throughput
Latency 331 µs
3 MS/s
0
1
0
1
Thread
GPUTC’12’US 2012-May-17
OFDM Demo
16 / 23
➡ Extended audio loopback
➡ Keyboard-based Supervisor runtime system control
• Rotate through flavors
• Increase and decrease processing threshold
GPUTC’12’US 2012-May-17
OFDM Demo Tx
17 / 23
GNU Radio Style Packet
Encoder
UHD Tx
Surfer Style Packet
EncoderAudio
SourceGNU Radio
Style MapperGNU Radio
Style Preamble Inserter
Inverse FFT
GainCyclic Prefix
Data
Flag
FlavorTable
FFTW3 Optimized
Local CPU Optimized
GPU Variant 1 Optimized
FlavorTable
GPU Generic
Local CPU Generic
FlavorTable
GPU Generic
Local CPU Generic
FFT ShiftFlavorTable
GPU Generic
Local CPU Generic
Local CPU Optimized
GPU Generic
Local CPU Optimized
Local CPU Optimized
GPUTC’12’US 2012-May-17
OFDM Demo Tx
17 / 23
GNU Radio Style Packet
Encoder
UHD Tx
Surfer Style Packet
EncoderAudio
SourceGNU Radio
Style MapperGNU Radio
Style Preamble Inserter
Inverse FFT
GainCyclic Prefix
Data
Flag
FlavorTable
FFTW3 Optimized
Local CPU Optimized
GPU Variant 1 Optimized
FlavorTable
GPU Generic
Local CPU Generic
FlavorTable
GPU Generic
Local CPU Generic
FFT ShiftFlavorTable
GPU Generic
Local CPU Generic
Local CPU Optimized
GPU Generic
Local CPU Optimized
Local CPU Optimized
GPUTC’12’US 2012-May-17
OFDM Demo Rx
18 / 23 GPUTC’12’US 2012-May-17
Default Settings
19 / 23
Legend
Preprocessing
Processing
Postprocessing
Waiting
Notification o
Source to Sink Latency
8.8 MS/sInstantaneous
Throughput
0
Sink
CP
Gain
iFFT
FFT Shift
GR Preamble
GR Mapper
S Packet
Relative Time (ms)
Source 1.41.210.80.60.40.2
GR Packet Latency 1.16 ms
GPUTC’12’US 2012-May-17
GR Packet Encoder
20 / 23
Legend
Preprocessing
Processing
Postprocessing
Waiting
Notification o
Source to Sink Latency
InstantaneousThroughput
Latency
1.4 MS/s
0
Sink
CP
Gain
iFFT
FFT Shift
GR Preamble
GR Mapper
S Packet
Relative Time (ms)
Source 14012010080604020
GR Packet 109 ms
GPUTC’12’US 2012-May-17
FFT Shift
21 / 23
Legend
Preprocessing
Processing
Postprocessing
Waiting
Notification o
1.2 MS/sInstantaneous
Throughput
Latency 1 ms
0
Sink
CP
Gain
iFFT
FFT Shift
GR Preamble
GR Mapper
S Packet
Relative Time (ms)
Source 12108642
GR Packet
Source to Sink Latency
GPUTC’12’US 2012-May-17
➡ Evolving software defined radio runtime to exploit heterogeneous processing
➡ Key Features: Flavors, Thresholds, Statistics and Supervisor
➡ Supervisor can use statistics to set thresholds and flavors, providing runtime system control of throughput / latency, and mapping
Conclusions
22 / 23 GPUTC’12’US 2012-May-17
➡ GUI using Qt (SQuid)
➡ Flavors for CUDA and other processors
➡ Better optimized GPU flavors
➡ Interval / cluster mapping
➡ QoS-based adaptive runtime
➡ Using more embedded graph information
Future Work
23 / 23 GPUTC’12’US 2012-May-17
Thank you!
Questions?
GPUTC’12’US 2012-May-17