
An FPGA Based Adaptive Viterbi Decoder

Sriram Swaminathan

Russell Tessier

Department of ECE, University of Massachusetts Amherst

04/18/23

Overview
Introduction
Objectives
Background
Adaptive Viterbi Algorithm
Architecture and Implementation
Issues
Results
Related Work
Summary and Future Work


Introduction: A Digital Data Communication System

Figure: Source → Source Encoder → Channel Encoder (convolutional encoder) → Modulator → channel with Noise → Demodulator → Channel Decoder (Viterbi) → Source Decoder → Sink. The source encoder produces the information bitstream; the channel encoder produces a bitstream with redundancy.


Goals

Implement the Adaptive Viterbi Algorithm in hardware

Constraints:
Data rate (throughput): 20 Kbps
Probability of error, or bit error rate (BER) = number of errors / length of sequence: BER < $10^{-5}$
Minimize design-time area
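The BER in the constraints above is the fraction of decoded bits that differ from the transmitted bits. Below is a minimal Python sketch of that ratio. The function name `bit_error_rate` is illustrative and does not come from the design. A target of $10^{-5}$ corresponds, on average, to no more than one bit error per 100,000 decoded bits.

```python
def bit_error_rate(sent, received):
    """BER = number of bit errors / length of the sequence."""
    if len(sent) != len(received) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the decoded bit differs from the sent bit.
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# One error in eight bits.
print(bit_error_rate([0, 1, 1, 0, 1, 0, 0, 1], [0, 1, 0, 0, 1, 0, 0, 1]))  # 0.125
```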


Convolutional Encoder

Accepts information bits as a continuous stream. It operates on the current b-bit input, where b ranges from 1 to 6, together with some number of immediately preceding b-bit inputs, and produces V output bits, where V > b.

Figure: example encoder with two flip-flops (FF) and two modulo-2 adders (+), b = 1, V = 2.
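Below is a minimal Python sketch of a b = 1, V = 2, K = 3 encoder like the one pictured. The generator polynomials 111 and 101 (octal 7 and 5) are an assumption, because the slide does not name them. They do reproduce the branch labels of the state diagram and the trellis example (input 0 1 0 gives output 00 11 10). The name `conv_encode` is hypothetical.

```python
def conv_encode(bits, generators=(0b111, 0b101), k=3):
    """Rate 1/len(generators) convolutional encoder with b = 1.

    Assumed generators 111 and 101 (octal 7, 5). The register holds the
    current input bit (high bit) followed by the k-1 previous inputs; each
    output bit is the parity of the register bits selected by a generator.
    """
    state = 0  # k-1 previous input bits, most recent in the high bit
    out = []
    for u in bits:
        reg = (u << (k - 1)) | state  # current bit, then encoder memory
        for g in generators:
            out.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of taps
        state = reg >> 1  # shift: the oldest bit drops out
    return out


print(conv_encode([0, 1, 0]))  # [0, 0, 1, 1, 1, 0], i.e. 00 11 10
```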


Definitions

Constraint length, K: number of successive b-bit groups of information bits involved in each encoding operation

Code rate (or rate): b/V

Typical values: K = 7; rate 1/2 or 1/3


The Viterbi Algorithm

Finds the bit sequence, among all possible transmitted bit sequences, that most closely resembles the received data.

Maximum-likelihood algorithm: each bit received by the decoder is associated with a measure of correctness.

Practical for convolutional codes with short constraint lengths.


State Diagram

Figure: state diagram with states 00, 10, 11 and 01; branches labeled 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00 and 0/10.

State: contents of the encoder memory.

Branch label k/ij: i and j represent the output bits associated with input bit k.


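Trellis Diagram (K = 3, rate 1/2)

Figure: trellis from T = 0 to T = 3 over states 00, 01, 10 and 11, with each branch labeled by its encoder output bits.

Encoder input: 0 1 0
Encoder output: 00 11 10
Received: 00 11 11

Accumulated (Hamming) path metric at each state:

State   T=0   T=1   T=2   T=3
00      0     0     2     min(2+2, 3+0) = 3
01      -     -     3     min(0+1, 3+1) = 1
10      -     2     0     min(2+0, 3+2) = 2
11      -     -     3     min(0+1, 3+1) = 1

Total number of states = $2^{K-1}$ (4 for K = 3)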
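The trellis computation can be reproduced with a small hard-decision Viterbi decoder. This Python sketch makes several assumptions:
- It uses the same hypothetical K = 3 generators (111 and 101).
- Hamming distance is the branch metric.
- The encoder starts in state 00.

The sketch stores one full survivor path per state (register-exchange style) purely for readability. It does not model any particular hardware. On the received sequence 00 11 11 it reaches the T = 3 metrics 3, 1, 2, 1 for states 00, 01, 10 and 11.

```python
def viterbi_decode(received, generators=(0b111, 0b101), k=3):
    """Hard-decision Viterbi decoder with a Hamming branch metric.

    received: list of V-bit tuples, one per trellis step.
    Returns (decoded_bits, final_path_metric).
    """
    n_states = 1 << (k - 1)  # 2^(K-1) states
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)  # assume start in state 00
    paths = [[] for _ in range(n_states)]
    for symbol in received:
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue  # state not yet reachable
            for u in (0, 1):
                reg = (u << (k - 1)) | state
                expected = [bin(reg & g).count("1") % 2 for g in generators]
                branch = sum(e != r for e, r in zip(expected, symbol))
                nxt = reg >> 1
                cand = metrics[state] + branch  # add
                if cand < new_metrics[nxt]:     # compare and select
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])  # first minimum wins
    return paths[best], metrics[best]


print(viterbi_decode([(0, 0), (1, 1), (1, 1)]))  # ([0, 1, 0], 1)
```

After T = 3, states 01 and 11 tie at metric 1. The sketch breaks the tie toward the lower-numbered state, which recovers the transmitted 0 1 0 despite the bit error in the last symbol. A longer received sequence, or a known termination state, is what normally resolves such ties.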


Adaptive Viterbi Algorithm

Motivation: the Viterbi algorithm requires extremely large memory and logic. Retaining fewer paths reduces memory and computation.

Definitions:
Path: a bit sequence
Path metric (or cost): accumulated error metric of a path
Survivor: a path that is retained for the subsequent time step


Adaptive Viterbi Algorithm: Criteria for Path Survival

1. A threshold T is introduced such that a path is retained if and only if its current path metric is less than dm + T, where dm is the minimum cost among all survivors of the previous time step.

2. The total number of survivors per time step is limited to a critical number, Nmax, selected by the user. Only the best Nmax paths have to be retained at any time.
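The two survival criteria can be written as a filter followed by a cap. This Python sketch represents each candidate path as a hypothetical `(metric, bits)` tuple. It sorts to find the best Nmax paths, but only as a software convenience: the architecture slides list the elimination of sorting for the hardware.

```python
from typing import List, Tuple

Path = Tuple[int, str]  # hypothetical (path metric, bit sequence) pair


def ava_select(paths: List[Path], d_m: int, t: int, n_max: int) -> List[Path]:
    """Apply the two AVA survival criteria to one time step's candidates.

    1. Keep a path only if its metric is less than d_m + t.
    2. Of those, keep at most the n_max paths with the lowest metrics.
    """
    within = [p for p in paths if p[0] < d_m + t]  # criterion 1
    within.sort(key=lambda p: p[0])                # software-only shortcut
    return within[:n_max]                          # criterion 2


print(ava_select([(5, "a"), (3, "b"), (9, "c"), (4, "d")], d_m=3, t=4, n_max=2))
```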


Trellis Diagram for AVA


Parameters in the algorithm

Constraint length, K; truncation length, TL; rate, R; threshold, T; maximum number of paths per time step, Nmax


Influence of Threshold T and Nmax

Threshold T: a smaller T gives a lower average number of survivors but increased BER. A larger T gives a higher average number of survivors and reduced BER.

Nmax: a smaller Nmax gives a smaller area but risks discarding the best path, which raises BER. A larger Nmax reduces BER at the cost of a larger area.

The selection of Nmax and T is crucial.


Variation of BER with T and Nmax for K = 9 and K = 14

Figure: BER versus T and Nmax.
K = 9 (SNR = 3.1 dB, TL = 45): marked point T = 18, Nmax = 9.
K = 14 (SNR = 2.5 dB, TL = 70): marked point T = 24, Nmax = 41.


Optimal values of Nmax, T and TL for different K's

K    TL   Nmax   T
4    20   4      14
5    25   7      14
6    30   8      18
7    35   8      17
8    40   8      17
9    45   9      18
10   50   21     20
11   55   25     23
12   60   25     23
14   70   41     24


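Simplified View of Adaptive Viterbi Decoder

Figure: symbols from the channel enter the branch metric generator. The branch metrics feed the add-compare-select unit, together with the logic for di < dm + T. The resulting decision bits go to the survivor memory, which produces the decoded output.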


Survivor Memory

Stores all candidate bit sequences (paths) before a decision is made.

Size of survivor memory for the adaptive Viterbi decoder:
Rows: Nmax
Columns: truncation length, typically (3 to 5) × K

Two schemes:
Traceback: large latency, small area, low power
Register exchange: fast, large area, large power
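The sketch below puts the memory sizes in perspective. It compares a conventional Viterbi survivor memory of $2^{K-1}$ rows with the adaptive decoder's Nmax rows. The K, TL and Nmax values come from the optimal-parameter table, and every row of that table uses TL = 5K, the upper end of the (3 to 5) × K range. The sketch assumes one stored decision bit per path per trellis step and ignores path metrics and other bookkeeping. The numbers are therefore rough lower bounds, not the design's actual memory size.

```python
# (K, TL, Nmax, T) from the optimal-parameter table
OPTIMAL = [
    (4, 20, 4, 14), (5, 25, 7, 14), (6, 30, 8, 18), (7, 35, 8, 17),
    (8, 40, 8, 17), (9, 45, 9, 18), (10, 50, 21, 20), (11, 55, 25, 23),
    (12, 60, 25, 23), (14, 70, 41, 24),
]


def survivor_memory_bits(k, tl, n_max):
    """Return (conventional Viterbi bits, adaptive Viterbi bits).

    Assumes b = 1, i.e. one stored bit per path per trellis step.
    """
    return (1 << (k - 1)) * tl, n_max * tl


for k, tl, n_max, _t in OPTIMAL:
    full, adaptive = survivor_memory_bits(k, tl, n_max)
    print(f"K={k:2d}: {full:7d} bits vs {adaptive:5d} bits "
          f"({full / adaptive:.0f}x smaller)")
```

For K = 9 this gives 11,520 bits versus 405 bits. For K = 14 it gives 573,440 bits versus 2,870 bits.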


Practical Considerations

Serial implementation: the same ACS unit is used repeatedly for all states. Small area and inexpensive, but slow, with low throughput (data rate).

Parallel implementation: each state has its own ACS unit ($2^{K-1}$ ACS units). Fast, with high throughput (data rate), but a large area, which becomes a bottleneck for large K.
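The ACS unit that the serial and parallel options trade off can be sketched as a single function. A serial design would evaluate it once per state per time step on one unit, while a parallel design instantiates $2^{K-1}$ copies. The function and argument names are illustrative, and the tie-breaking rule is an assumption.

```python
def acs(metric_a, branch_a, metric_b, branch_b):
    """One add-compare-select step for a state with two predecessors.

    Returns (new path metric, decision bit). The decision bit records which
    predecessor won (0 = a, 1 = b); it is what goes to survivor memory.
    """
    sum_a = metric_a + branch_a  # add
    sum_b = metric_b + branch_b
    if sum_a <= sum_b:           # compare (assumed: ties go to predecessor a)
        return sum_a, 0          # select
    return sum_b, 1


# State 00 at T = 3 in the trellis example: min(2+2, 3+0) = 3 via predecessor b.
print(acs(2, 2, 3, 0))  # (3, 1)
```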


Architecture


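Architecture (contd.)

Figure: two adders form the candidate path metrics sum1 and sum2 from the branch metrics b1 and b2. Each sum is checked against di < dm + T, and the passing paths are counted. If Count < Nmax, the survivor memory is updated; otherwise the threshold is lowered (T = T - 2) and the check is repeated.

Elimination of sorting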
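One reading of the flowchart above, sketched in Python: instead of sorting the candidate metrics to find the best Nmax, the decoder counts the paths that pass di < dm + T. While there are too many, it lowers the threshold by 2 and counts again. Two details are assumptions rather than statements from the slide:
- The sketch accepts Count <= Nmax, because criterion 2 allows exactly Nmax survivors, while the flowchart is labeled Count < Nmax.
- If no path passes the threshold, the sketch falls back to keeping the single best path, found by a linear scan rather than a sort.

The function name `prune_without_sort` is hypothetical.

```python
def prune_without_sort(metrics, d_m, t, n_max, step=2):
    """Return (indices of surviving paths, threshold actually used).

    metrics: current path metrics di; d_m: minimum survivor cost of the
    previous time step; t: threshold T; n_max: survivor limit Nmax.
    """
    while True:
        survivors = [i for i, d in enumerate(metrics) if d < d_m + t]
        if len(survivors) <= n_max:  # assumed comparison (slide: Count < Nmax)
            break
        t -= step  # tighten the threshold instead of sorting
    if not survivors and metrics:
        # Hypothetical fallback: keep the single best path (linear scan).
        survivors = [min(range(len(metrics)), key=metrics.__getitem__)]
    return survivors, t


print(prune_without_sort([3, 4, 5, 6, 7], d_m=3, t=5, n_max=2))  # ([0], 1)
```

Because T moves in steps of 2, the result can hold fewer than Nmax survivors. With metrics 3, 4, 5, 6, 7, dm = 3, T = 5 and Nmax = 2, the threshold drops to 1 and only the path with metric 3 survives.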


System Model

Test-bench


FPGA Implementation

FPGAs can exploit the parallelism of the algorithm, and dynamic reconfiguration can be used for performance enhancement.

Implementation platform: WildOne-XL FPGA board from Annapolis Microsystems Inc., with 2 XC4036 FPGAs, one of which is used for the user application

Simulation on a Virtex XCV1000


Hardware implementation

RTL description in VHDL
HDL simulation: Cadence Affirma tools
Synthesis: Synplicity Synplify Pro
FPGA mapping, place and route: Xilinx Foundation 2.1i
Target device: FPGA XC4036XL-08


XC4036XL FPGA Resource utilization


K   CLBs   LUTs (4-input)   LUTs (3-input)   FFs
4   553    978              196              278
5   1194   2046             340              540
6   1206   2081             482              724
7   1215   2087             537              756
8   1284   2119             654              788
9   1296   2213             615              820


Decoding rate on XC4036 FPGA

Overheads: 32-bit, 33 MHz PCI bus; execution of the WildOne API using VC++

Slowdown: 1.5-2 times

Decoding rate versus constraint length K:

K   FPGA freq. (MHz)   Decoding rate, no overhead (Kbps)   Decoding rate, with overhead (Kbps)
4   40.455             333.743                             185.994
5   20.089             164.168                             117.689
6   19.857             162.273                             116.28
7   19.674             160.773                             114.231
8   17.576             143.632                             109.392
9   17.316             141.141                             107.775
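Two quantities follow from the table above by plain arithmetic. Dividing each clock frequency by the corresponding decoding rate gives a rough figure for clock cycles per decoded bit. Dividing the two decoding rates gives the PCI/API slowdown for each K. The variable names are illustrative.

```python
# (K, FPGA clock in MHz, Kbps without overhead, Kbps with overhead), from the table
RESULTS = [
    (4, 40.455, 333.743, 185.994),
    (5, 20.089, 164.168, 117.689),
    (6, 19.857, 162.273, 116.28),
    (7, 19.674, 160.773, 114.231),
    (8, 17.576, 143.632, 109.392),
    (9, 17.316, 141.141, 107.775),
]

for k, f_mhz, raw_kbps, pci_kbps in RESULTS:
    # MHz * 1000 = kHz, and kHz / Kbps = clock cycles per decoded bit
    cycles_per_bit = f_mhz * 1000 / raw_kbps
    slowdown = raw_kbps / pci_kbps  # cost of PCI transfers and host API calls
    print(f"K={k}: {cycles_per_bit:.1f} cycles/bit, slowdown {slowdown:.2f}x")
```

On these numbers the design spends roughly 121 to 123 clock cycles per decoded bit for every K, so the decoding rate tracks the achievable clock frequency. The measured slowdown ratios range from about 1.3 (K = 8, 9) to about 1.8 (K = 4).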


Issues in Reconfiguration

Reconfigurable units: number of ACS units (depends on the number of survivors); run-time survivor memory

Reconfiguration types: fine-grained (infeasible); coarse-grained (feasible)

Motivation: performance improvement

Tradeoff:
Small SNR (noisy channel): use a large K, with slower decoding.
Large SNR (less noisy channel): use a small K, with faster decoding.
In both cases, maintain approximately the same BER.


Coarse-timescale reconfiguration

20.9% performance improvement over a static implementation

Figure: decoding rate (Kbps) versus constraint length K (3 to 9), from the less noisy channel to the noisy channel. It shows individual decoding rates without reconfiguration and the average decoding rate with reconfiguration.


Coarse-timescale reconfiguration: Experimental Approach

Vary the channel noise during transmission: the noise changes every ~250,000 bits, or every ~1.5 to 2.5 seconds.

If a noise change is detected, download the new decoder configuration to the FPGA on the WildOne board.

Reconfiguration overhead: ~40 ms (PCI bus transfer + noise-change detection + bitstream download)
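Below is a rough estimate of what the ~40 ms reconfiguration overhead costs. It assumes one reconfiguration per noise-change interval of 1.5 to 2.5 seconds and no decoding while the new bitstream loads. Both are assumptions, not measurements from the slides. The helper name is hypothetical.

```python
NOISE_INTERVAL_BITS = 250_000  # bits between noise changes (from the slide)
RECONFIG_OVERHEAD_S = 0.040    # ~40 ms per reconfiguration (from the slide)


def reconfig_overhead_fraction(interval_s, overhead_s=RECONFIG_OVERHEAD_S):
    """Fraction of wall-clock time spent reconfiguring, assuming decoding
    stalls during the download and one reconfiguration per interval."""
    return overhead_s / (interval_s + overhead_s)


for interval_s in (1.5, 2.5):
    implied_kbps = NOISE_INTERVAL_BITS / interval_s / 1000
    share = 100 * reconfig_overhead_fraction(interval_s)
    print(f"{interval_s} s interval: ~{implied_kbps:.0f} Kbps, {share:.1f}% reconfiguring")
```

Under those assumptions reconfiguration takes about 1.6% to 2.6% of the time. Also, 250,000 bits per 1.5 to 2.5 seconds corresponds to roughly 100 to 167 Kbps, in the same range as the measured decoding rates.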


Comparison with microprocessor

Intel Celeron 366 MHz, 128 MB RAM

Speed-up: up to 7.5X for XC4036 (including overheads)

Figure: decoding rate in Kbps with PCI overhead (log scale, 10 to 1000) versus constraint length K (4 to 9), for the FPGA coprocessor and the Celeron processor.


Conclusions and future work

A new, dynamically reconfigurable adaptive Viterbi decoder: ~21% improvement over a static implementation; scales linearly

Speed-up of up to 7.5X over a microprocessor

Future research: extend the present concept to power-aware dynamic reconfiguration
