Post on 21-Dec-2015
An FPGA Based Adaptive Viterbi Decoder
Sriram Swaminathan
Russell Tessier
Department of ECE, University of Massachusetts Amherst
04/18/23
Overview
Introduction
Objectives
Background
Adaptive Viterbi Algorithm
Architecture and Implementation Issues
Results
Related Work
Summary and Future Work
Introduction: A Digital Data Communication System
[Figure: block diagram of the transmission chain. Source, Source Encoder, Channel Encoder (convolutional encoder), Modulator, channel with Noise, Demodulator, Channel Decoder (Viterbi), Source Decoder, Sink. Signal labels: information bitstream; bitstream with redundancy.]
Goals
Implement the Adaptive Viterbi Algorithm in hardware
Constraints
Data rate (or throughput): 20 Kbps
Probability of error, or Bit Error Rate (BER): < 10^-5
BER = number of errors / length of sequence
Minimize: design time, area
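As a small illustration of the BER definition above (number of errors / length of sequence), the following sketch compares a transmitted and a decoded bit sequence. The function name and the sample sequences are illustrative assumptions, not part of the original slides.

```python
def bit_error_rate(sent, decoded):
    """Return (number of bit errors) / (length of sequence)."""
    if len(sent) != len(decoded) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(s != d for s, d in zip(sent, decoded))
    return errors / len(sent)


# Hypothetical sequences: one error in eight bits.
sent = [0, 1, 0, 1, 1, 0, 0, 1]
decoded = [0, 1, 0, 0, 1, 0, 0, 1]
print(bit_error_rate(sent, decoded))  # -> 0.125
```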
Convolutional Encoder
Accepts information bits as a continuous stream
Operates on the current b-bit input, where b ranges from 1 to 6, and some number of immediately preceding b-bit inputs to produce V output bits, V > b
[Figure: encoder built from two flip-flops (FF) and two modulo-2 adders; b = 1, V = 2]
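A minimal sketch of such an encoder for b = 1, V = 2 with two flip-flops follows. The adder connections are not readable from the transcript, so the taps used here (first output = input XOR both stored bits, second output = input XOR the older stored bit) are an assumption. They were chosen because they reproduce the branch labels of the state diagram and the ENC IN / ENC OUT example shown later in these slides. Function names are illustrative.

```python
def encode_step(state, bit):
    """One encoder step. state = (newest stored bit, oldest stored bit)."""
    s1, s0 = state
    out = (bit ^ s1 ^ s0, bit ^ s0)  # assumed taps, see lead-in
    return (bit, s1), out            # shift the new bit into the register


def encode(bits):
    """Encode a bit list starting from the all-zero state."""
    state = (0, 0)
    outputs = []
    for bit in bits:
        state, out = encode_step(state, bit)
        outputs.append(out)
    return outputs


print(encode([0, 1, 0]))  # -> [(0, 0), (1, 1), (1, 0)]
```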
Definitions
Constraint length: number of successive b-bit groups of information bits for each encoding operation; denoted by K
Code rate (or rate): b/V
Typical values: K = 7; rate = 1/2, 1/3
The Viterbi Algorithm
Finds the bit sequence, in the set of all possible transmitted bit sequences, that most closely resembles the received data.
Maximum likelihood algorithm
Each bit received by the decoder is associated with a measure of correctness.
Practical for convolutional codes with short constraint lengths
State diagram
[Figure: state diagram with four states (00, 10, 11, 01) and branches labeled 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00, 0/10]
State: encoder memory
Branch: k/ij, where i and j represent the output bits associated with input bit k
Trellis Diagram
[Figure: trellis with states 00, 01, 10, 11 over time steps T=0 to T=3; branches are labeled with encoder output bits and per-state metrics]
ENC IN:   0  1  0
ENC OUT:  00 11 10
RECEIVED: 00 11 11
Accumulated metric
2+2, 3+0 : 3
0+1, 3+1 : 1
2+0, 3+1 : 2
0+1, 3+1 : 1
K = 3, rate 1/2
Total number of states = 2^(K-1)
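The trellis example can be replayed with a short hard-decision Viterbi sketch. It assumes a Hamming-distance branch metric and the same encoder taps that reproduce the state-diagram branch labels (first output = input XOR both stored bits, second output = input XOR the older stored bit). Both are assumptions, since the transcript does not spell them out, and all names are illustrative.

```python
def step(state, bit):
    """Encoder transition: returns (next state, output bit pair)."""
    s1, s0 = state
    return (bit, s1), (bit ^ s1 ^ s0, bit ^ s0)


def viterbi(received):
    """Return {state: (path metric, decoded bits)} after the last symbol."""
    paths = {(0, 0): (0, [])}  # decoding starts in state 00
    for r in received:
        new_paths = {}
        for state, (metric, bits) in paths.items():
            for bit in (0, 1):
                nxt, out = step(state, bit)
                # Add: accumulate the Hamming distance of this branch.
                cost = metric + (out[0] != r[0]) + (out[1] != r[1])
                # Compare/select: keep the cheaper path into each state.
                if nxt not in new_paths or cost < new_paths[nxt][0]:
                    new_paths[nxt] = (cost, bits + [bit])
        paths = new_paths
    return paths


result = viterbi([(0, 0), (1, 1), (1, 1)])  # RECEIVED: 00 11 11
for state in sorted(result):
    print(state, result[state])
```

With these assumptions the final metrics for states 00, 01, 10, 11 are 3, 1, 2, 1, matching the accumulated metrics listed above, and the survivor ending in state 01 decodes to 0 1 0, the original encoder input. States 01 and 11 tie at metric 1, so the final decision depends on the tie-breaking rule, which this sketch leaves open.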
Adaptive Viterbi Algorithm
Motivation
The Viterbi algorithm requires extremely large memory and logic
Retaining fewer paths reduces memory and computation
Definitions
Path: bit sequence
Path metric (or cost): accumulated error metric of a path
Survivor: path that is retained for the subsequent time step
Adaptive Viterbi Algorithm: Criterion for path survival
1. A threshold T is introduced such that a path is retained if and only if its current path metric is less than dm + T, where dm is the minimum cost among all survivors of the previous time step.
2. The total number of survivors per time step is limited to a critical number, Nmax, selected by the user.
Only the best Nmax paths have to be retained at any time.
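The two criteria above can be sketched as a single selection step. This is a straightforward reading that applies the threshold first and then sorts to keep the best Nmax paths; it is not necessarily how the hardware does it. The function name, the dictionary layout and the sample metrics are illustrative assumptions.

```python
def select_survivors(candidates, d_m, threshold, n_max):
    """Apply the two survival criteria to one time step.

    candidates: {state: path metric} for the current time step
    d_m:        minimum cost among the survivors of the previous time step
    """
    # Criterion 1: keep a path only if its metric is less than dm + T.
    kept = {s: d for s, d in candidates.items() if d < d_m + threshold}
    # Criterion 2: keep at most the best Nmax of those paths.
    best = sorted(kept.items(), key=lambda item: item[1])[:n_max]
    return dict(best)


# Hypothetical metrics for four states.
metrics = {"00": 3, "01": 1, "10": 2, "11": 1}
print(select_survivors(metrics, d_m=0, threshold=3, n_max=2))
# -> {'01': 1, '11': 1}
```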
Parameters in the algorithm
Constraint length, K
Truncation length, TL
Rate, R
Threshold, T
Maximum number of paths per time step, Nmax
Influence of Threshold T and Nmax
Threshold T
Smaller T: lower average number of survivors, increased BER
Larger T: higher average number of survivors, reduced BER
Nmax
Smaller Nmax: possibility of discarding the best path (high BER), smaller area
Larger Nmax: reduced BER, larger area
Selection of Nmax and T is crucial
Variation of BER with T and Nmax for K = 9 and K = 14
[Figure: two plots of BER versus T and Nmax]
K = 9, SNR = 3.1 dB, TL = 45; annotation: T = 18, Nmax = 9
K = 14, SNR = 2.5 dB, TL = 70; annotation: T = 24, Nmax = 41
Optimal values of Nmax, T and TL for different values of K
K    TL   Nmax   T
4    20   4      14
5    25   7      14
6    30   8      18
7    35   8      17
8    40   8      17
9    45   9      18
10   50   21     20
11   55   25     23
12   60   25     23
14   70   41     24
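Simplified View of Adaptive Viterbi Decoder
[Figure: block diagram. Symbols from channel enter the Branch metric generator; branch metrics feed the Add-Compare-Select unit, which works with the logic for di < dm + T; decision bits go to the Survivor Memory, which produces the decoded output.]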
Survivor Memory
[Figure: memory array with Nmax rows and truncation-length columns]
Stores all possible bit sequences (paths) before making a decision
Size of memory for the adaptive Viterbi decoder
Rows: Nmax
Columns: truncation length, typically (3 to 5) * K
Two schemes
Traceback: large latency, small area, low power
Register exchange: fast, large area, high power
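To get a feel for the saving, the sketch below counts survivor-memory entries (rows times columns) for a full Viterbi decoder and for the adaptive decoder, using the TL and Nmax values from the table of optimal values in these slides. The full-decoder row count of 2^(K-1) is an assumption that follows from keeping one path per trellis state. The counts are entries, not bits: how many bits each entry needs in either design is not stated in the slides.

```python
# K: (TL, Nmax), taken from the table of optimal values in these slides.
PARAMS = {4: (20, 4), 5: (25, 7), 6: (30, 8), 7: (35, 8), 8: (40, 8),
          9: (45, 9), 10: (50, 21), 11: (55, 25), 12: (60, 25), 14: (70, 41)}


def survivor_entries(k):
    """Return (full Viterbi entries, adaptive Viterbi entries) for a given K."""
    tl, n_max = PARAMS[k]
    full = 2 ** (k - 1) * tl  # one row per trellis state (assumption)
    adaptive = n_max * tl     # one row per retained path
    return full, adaptive


for k in sorted(PARAMS):
    full, adaptive = survivor_entries(k)
    print(f"K={k:2d}  full={full:7d}  adaptive={adaptive:5d}  ratio={full / adaptive:7.1f}")
```

For K = 9 this gives 11520 entries against 405, and for K = 14 it gives 573440 against 2870.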
Practical Considerations
Serial implementation
Same ACS unit repeatedly used for all states
Small area, inexpensive
Slow, low throughput (data rate)
Parallel implementation
Each state has its own ACS unit (2^(K-1) ACS units)
Fast, high throughput (data rate)
Large area, a bottleneck for large values of K
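Architecture (contd.)
[Figure: flowchart. Two adders form sum1 and sum2 from branch metrics b1 and b2. Each sum is tested against di < dm + T. Paths that pass are counted. If Count < Nmax, the survivor memory is updated; otherwise T = T-2 and the test is repeated.]
Elimination of sorting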
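The flowchart suggests that sorting is avoided by tightening the threshold until few enough paths pass. A minimal software sketch of that idea follows. It is an interpretation of the flowchart, not the authors' hardware description: the names are illustrative, the acceptance test uses "at most Nmax" in line with the survival criterion stated earlier (the flowchart itself writes Count < Nmax), and whether T is restored for the next time step is left to the caller because the slides do not say.

```python
def select_without_sorting(candidates, d_m, threshold, n_max, step=2):
    """Lower T in steps of 2 until at most n_max paths satisfy d < dm + T.

    Returns (survivors, threshold actually used).
    """
    t = threshold
    while True:
        kept = {s: d for s, d in candidates.items() if d < d_m + t}
        if len(kept) <= n_max:  # assumption: "at most Nmax" survivors
            return kept, t
        t -= step               # T = T-2, then repeat the comparison


# Hypothetical path metrics for four candidate paths.
metrics = {"a": 1, "b": 2, "c": 3, "d": 4}
print(select_without_sorting(metrics, d_m=0, threshold=6, n_max=2))
# -> ({'a': 1}, 2)
```

The loop always terminates, because a low enough T leaves no path above the bound. The result may hold fewer than Nmax paths, which is the price of skipping the sort.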
FPGA Implementation
An FPGA can exploit the parallelism
Dynamic reconfiguration for performance enhancement
Implementation platform
WildOne-XL FPGA board from Annapolis Microsystems Inc.
2 XC4036 FPGAs, one for the user application
Simulation on Virtex XCV1000
Hardware implementation
RTL description in VHDL
HDL simulation: Cadence Affirma tools
Synthesis: Synplicity Synplify Pro
FPGA mapping, place and route: Xilinx Foundation 2.1i
Target FPGA: XC4036XL-08
XC4036XL FPGA Resource utilization
K   TL   Nmax   T    CLBs   4-input LUTs   3-input LUTs   FFs
4   20   4      14   553    978            196            278
5   25   7      14   1194   2046           340            540
6   30   8      18   1206   2081           482            724
7   35   8      17   1215   2087           537            756
8   40   8      17   1284   2119           654            788
9   45   9      18   1296   2213           615            820
Decoding rate on XC4036 FPGA
Overheads
32-bit, 33 MHz PCI bus
Execution of WildOne API using VC++
Slowdown: 1.5-2 times
[Figure: bar chart of decoding rate in Kbps (axis 0 to 400) versus constraint length K, with data table]
K                      4         5         6         7         8         9
No overhead (Kbps)     333.743   164.168   162.273   160.773   143.632   141.141
With overhead (Kbps)   185.994   117.689   116.28    114.231   109.392   107.775
FPGA freq. (MHz)       40.455    20.089    19.857    19.674    17.576    17.316
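The figures in the data table can be combined into two derived quantities: the ratio of the decoding rate without overhead to the rate with overhead, and the number of FPGA clock cycles spent per decoded bit. The sketch below does only that arithmetic. The function names are illustrative, and 1 Kbps is assumed to mean 1000 bits per second.

```python
K_VALUES = [4, 5, 6, 7, 8, 9]
NO_OVERHEAD_KBPS = [333.743, 164.168, 162.273, 160.773, 143.632, 141.141]
WITH_OVERHEAD_KBPS = [185.994, 117.689, 116.28, 114.231, 109.392, 107.775]
FPGA_FREQ_MHZ = [40.455, 20.089, 19.857, 19.674, 17.576, 17.316]


def overhead_ratio():
    """Rate without overhead divided by rate with overhead, per K."""
    return {k: fast / slow for k, fast, slow
            in zip(K_VALUES, NO_OVERHEAD_KBPS, WITH_OVERHEAD_KBPS)}


def cycles_per_bit():
    """FPGA clock cycles per decoded bit (no overhead), per K."""
    # MHz * 1000 gives kHz; kHz divided by Kbps gives cycles per bit.
    return {k: freq * 1000 / rate for k, freq, rate
            in zip(K_VALUES, FPGA_FREQ_MHZ, NO_OVERHEAD_KBPS)}


for k in K_VALUES:
    print(f"K={k}  ratio={overhead_ratio()[k]:.2f}  cycles/bit={cycles_per_bit()[k]:.1f}")
```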
Issues in Reconfiguration
Reconfigurable units
Number of ACS units (depends on the number of survivors)
Run-time survivor memory
Reconfiguration types
Fine-grained: infeasible
Coarse-grained: feasible
Motivation
Performance improvement
Tradeoff
Small SNR (noisy channel): large K, slow decoding
Large SNR (less noisy channel): small K, fast decoding
Maintain approximately the same BER
Coarse-timescale reconfiguration
20.9% performance improvement over static
[Figure: decoding rates in Kbps (axis 100 to 200) versus constraint length K (3 to 9), annotated "Less noisy channel" and "Noisy channel". Series: individual decoding rates without reconfiguration; average decoding rate with reconfiguration.]
Coarse-timescale reconfiguration: Experimental Approach
Vary channel noise during transmission
Noise changes every ~250,000 bits, or ~1.5 to 2.5 seconds
If a noise change is detected, download new decoder configuration content to the FPGA on the WildOne board
Reconfiguration overhead ~40 ms: PCI bus transfer + noise change detection + bitstream download
Comparison with microprocessor
Intel Celeron 366 MHz, 128 MB RAM
Speed-up: up to 7.5X for XC4036 (incl. overheads)
[Figure: decoding rate in Kbps with PCI overhead (axis ticks 10, 100, 1000) versus constraint length K (4 to 9). Series: FPGA coprocessor; Celeron processor.]