An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst

Post on 21-Dec-2015



04/18/23

Overview

- Introduction
- Objectives
- Background
- Adaptive Viterbi Algorithm
- Architecture and Implementation Issues
- Results
- Related Work
- Summary and Future Work

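Introduction: A Digital Data Communication System

[Figure: block diagram of a digital data communication system. Transmit path: Source -> Source Encoder -> Channel Encoder (convolutional encoder) -> Modulator. Noise is added between the Modulator and the Demodulator. Receive path: Demodulator -> Channel Decoder (Viterbi) -> Source Decoder -> Sink. Signal labels: information, bitstream, bitstream with redundancy.]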

Goals

- Implement the Adaptive Viterbi Algorithm in hardware
- Constraints
  - Data rate (throughput): 20 Kbps
  - Probability of error, or bit error rate (BER) < 10^-5
    - BER = number of errors / length of sequence
- Minimize
  - Design time
  - Area
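The BER definition above (number of errors divided by sequence length) can be written as a short Python sketch. The function name and the list-of-bits representation are illustrative assumptions, not part of the original design.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the received bits differ from the sent bits."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    if not sent:
        raise ValueError("sequences must be non-empty")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


# One mismatch in four bits gives a BER of 0.25.
example_ber = bit_error_rate([0, 1, 0, 1], [0, 1, 1, 1])
```

A target of BER < 10^-5 corresponds to fewer than one error per 100,000 decoded bits on average.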

Convolutional Encoder

- Accepts information bits as a continuous stream
- Operates on the current b-bit input (b ranges from 1 to 6) and some number of immediately preceding b-bit inputs to produce V output bits, V > b

[Figure: example encoder with b = 1, V = 2, built from two flip-flops (FF) and two modulo-2 adders (+), annotated with sample bit values.]
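A minimal Python sketch of a b = 1, V = 2 encoder with two flip-flops (K = 3) may help here. The generator taps 111 and 101 are an assumption: they are not stated on the slide, but they reproduce the ENC IN / ENC OUT example given with the trellis diagram. The function and parameter names are illustrative.

```python
def conv_encode(bits, generators=(0b111, 0b101), K=3):
    """Encode a list of input bits; returns a flat list of output bits.

    The state holds the K-1 previous input bits, most recent bit in the
    most significant position. Each generator selects which register bits
    are XORed together to form one output bit.
    """
    state = 0
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state          # current bit followed by memory
        for g in generators:
            out.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of tapped bits
        state = reg >> 1                       # shift: drop the oldest bit
    return out


# Input 0 1 0 encodes to 00 11 10 under the assumed generators.
encoded = conv_encode([0, 1, 0])
```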

Definitions

- Constraint length
  - Number of successive b-bit groups of information bits used in each encoding operation
  - Denoted by K
- Code rate (or rate)
  - b/V
- Typical values
  - K: 7
  - Rate: 1/2, 1/3

The Viterbi Algorithm

- Finds the bit sequence, among all possible transmitted bit sequences, that most closely resembles the received data
- Maximum likelihood algorithm
- Each bit received by the decoder is associated with a measure of correctness
- Practical for convolutional codes with short constraint lengths

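State Diagram

[Figure: state diagram with four states (00, 10, 11, 01) and eight branches labelled 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00, 0/10.]

- State: contents of the encoder memory
- Branch: labelled k/ij, where i and j represent the output bits associated with input bit k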

Trellis Diagram

[Figure: trellis over time steps T=0 to T=3 for states 00, 01, 10, 11, with branches labelled by encoder output bits and nodes labelled by accumulated metrics.]

K = 3, rate 1/2
Total number of states = 2^(K-1)

ENC IN:   0  1  0
ENC OUT:  00 11 10
RECEIVED: 00 11 11

Accumulated metric (candidate path sums : surviving minimum)
  2+2, 3+0 : 3
  0+1, 3+1 : 1
  2+0, 3+1 : 2
  0+1, 3+1 : 1
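The add-compare-select step behind the trellis can be sketched in Python as follows. This is a hard-decision sketch using Hamming distance as the branch metric, and it assumes generator taps 111 and 101 for the K = 3, rate 1/2 code; neither choice is stated on the slide. States are numbered so that index 0, 1, 2, 3 corresponds to 00, 01, 10, 11. All names are illustrative.

```python
def branch_table(K=3, generators=(0b111, 0b101)):
    """Map (state, input_bit) -> (next_state, output_bits)."""
    table = {}
    for state in range(1 << (K - 1)):
        for bit in (0, 1):
            reg = (bit << (K - 1)) | state
            out = tuple(bin(reg & g).count("1") % 2 for g in generators)
            table[(state, bit)] = (reg >> 1, out)
    return table


def viterbi_decode(received, K=3, generators=(0b111, 0b101)):
    """received: list of V-bit tuples. Returns (decoded_bits, final_metrics)."""
    table = branch_table(K, generators)
    n_states = 1 << (K - 1)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)   # encoder assumed to start in state 00
    paths = [[] for _ in range(n_states)]
    for symbol in received:
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue                      # state not yet reachable
            for bit in (0, 1):
                nxt, out = table[(state, bit)]
                # Add: previous path metric plus Hamming distance of this branch.
                cost = metrics[state] + sum(a != b for a, b in zip(out, symbol))
                # Compare and select: keep the cheaper path into each state.
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best], metrics


decoded, final_metrics = viterbi_decode([(0, 0), (1, 1), (1, 1)])
```

For the received sequence 00 11 11, the sketch ends with metrics 3, 1, 2, 1 for states 00, 01, 10, 11, the same surviving values listed for the trellis. Two states tie at metric 1; this sketch breaks the tie by taking the lower-numbered state, which is an arbitrary choice.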

Adaptive Viterbi Algorithm

- Motivation
  - The Viterbi algorithm requires extremely large memory and logic
  - Retaining fewer paths reduces memory and computation
- Definitions
  - Path: a bit sequence
  - Path metric (or cost): the accumulated error metric of a path
  - Survivor: a path that is retained for the subsequent time step

Adaptive Viterbi Algorithm: Criteria for path survival

1. A threshold T is introduced such that a path is retained if and only if its current path metric is less than dm + T, where dm is the minimum cost among all survivors of the previous time step.

2. The total number of survivors per time step is limited to a maximum, Nmax, selected by the user.

Only the best Nmax paths have to be retained at any time.
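The two survival criteria can be sketched directly in Python. This version sorts the candidates for clarity; it is a conceptual illustration rather than the hardware method, and the function name and the (metric, path) tuple layout are assumptions.

```python
def prune_survivors(candidates, d_min_prev, T, n_max):
    """Apply both criteria to a list of (metric, path) candidates.

    1. Keep a path only if its metric is below d_min_prev + T.
    2. Keep at most n_max paths, preferring the lowest metrics.
    """
    kept = [c for c in candidates if c[0] < d_min_prev + T]
    kept.sort(key=lambda c: c[0])   # stable sort on the metric only
    return kept[:n_max]


candidates = [(3, "a"), (1, "b"), (2, "c"), (1, "d")]
survivors = prune_survivors(candidates, d_min_prev=0, T=3, n_max=2)
```

With dm = 0 and T = 3, the path with metric 3 is dropped by the threshold test, and the limit Nmax = 2 then keeps the two paths with metric 1.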

Trellis Diagram for AVA


Parameters in the algorithm

- Constraint length, K
- Truncation length, TL
- Rate, R
- Threshold, T
- Maximum number of paths per time step, Nmax

Influence of Threshold T and Nmax

- Threshold T
  - Smaller T: lower average number of survivors, increased BER
  - Larger T: higher average number of survivors, reduced BER
- Nmax
  - Smaller Nmax
    - Possibility of discarding the best path => high BER
    - Smaller area
  - Larger Nmax
    - Reduced BER
    - Larger area
- Selection of Nmax and T is crucial

Variation of BER with T and Nmax for K = 9 and K = 14

[Figure: two plots of BER against T and Nmax.
 K = 9 plot: SNR = 3.1 dB, TL = 45; marked point T = 18, Nmax = 9.
 K = 14 plot: SNR = 2.5 dB, TL = 70; marked point T = 24, Nmax = 41.]

Optimal values of Nmax, T and TL for different values of K

K    TL   Nmax  T
4    20   4     14
5    25   7     14
6    30   8     18
7    35   8     17
8    40   8     17
9    45   9     18
10   50   21    20
11   55   25    23
12   60   25    23
14   70   41    24

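Simplified View of Adaptive Viterbi Decoder

[Figure: block diagram. Symbols from the channel enter the branch metric generator, which passes branch metrics to the add-compare-select (ACS) unit. The ACS unit works together with the logic for di < dm + T and sends decision bits to the survivor memory, which produces the decoded output.]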

Survivor Memory

- Stores all possible bit sequences (paths) before making a decision
- Size of memory for Viterbi
  - Rows: Nmax
  - Columns: truncation length, 3 to 5 times K
- Two schemes
  - Traceback: large latency, small area, low power
  - Register exchange: fast, large area, large power

[Figure: survivor memory array with Nmax rows and truncation-length columns.]
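As a rough illustration of the saving, the Python sketch below compares a survivor memory with one row per state (2^(K-1) rows) against one with Nmax rows, using the K, TL and Nmax values from the table of optimal values. It assumes one bit per (path, time step) cell and ignores any extra storage for path metrics or state labels, so the numbers are indicative only. The function names are illustrative.

```python
def survivor_memory_bits(rows, truncation_length):
    """Memory size assuming one bit per (path, time step) cell."""
    return rows * truncation_length


def compare_memory(K, TL, n_max):
    """Return (full Viterbi bits, adaptive Viterbi bits)."""
    full = survivor_memory_bits(2 ** (K - 1), TL)   # one row per trellis state
    adaptive = survivor_memory_bits(n_max, TL)      # one row per retained path
    return full, adaptive


k9 = compare_memory(K=9, TL=45, n_max=9)      # (11520, 405)
k14 = compare_memory(K=14, TL=70, n_max=41)   # (573440, 2870)
```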

Practical Considerations

- Serial implementation
  - The same ACS unit is used repeatedly for all states
  - Small area, inexpensive
  - Slow, low throughput (data rate)
- Parallel implementation
  - Each state has its own ACS unit (2^(K-1) ACS units)
  - Fast, high throughput (data rate)
  - Large area, a bottleneck for large K values

Architecture


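Architecture (contd.)

[Figure: flowchart of path selection. Two adders form sum1 and sum2 from branch metrics b1 and b2. Each sum is tested against di < dm + T, and the paths that pass are counted. If Count < Nmax, the survivor memory is updated; otherwise T is reduced (T = T - 2) and the test is repeated.]

- Elimination of sorting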
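The flowchart replaces sorting with repeated threshold tests. A minimal Python sketch of that idea follows, assuming the loop structure read from the flowchart labels (test di < dm + T, count, then T = T - 2 if too many paths pass). The termination guard and all names are assumptions added for illustration.

```python
def select_survivors(metrics, d_min_prev, T, n_max, step=2):
    """Return (indices of surviving paths, threshold actually used)."""
    t = T
    while True:
        kept = [i for i, d in enumerate(metrics) if d < d_min_prev + t]
        # Accept once fewer than n_max paths pass the test. The t <= step
        # guard is an assumption so that the loop always terminates.
        if len(kept) < n_max or t <= step:
            return kept, t
        t -= step   # tighten the threshold instead of sorting the paths


kept, used_t = select_survivors([3, 1, 2, 1, 4, 2], d_min_prev=0, T=6, n_max=4)
```

In this example six paths pass at T = 6 and five pass at T = 4, so the threshold drops to 2, where only the two paths with metric 1 remain. Each pass needs only comparators and a counter, which is why no sorting network is required.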

System Model

[Figure: system model, including the test bench.]

FPGA Implementation

- An FPGA can exploit the parallelism
- Dynamic reconfiguration for performance enhancement
- Implementation platform
  - WildOne-XL FPGA board from Annapolis Microsystems Inc.
  - 2 XC4036 FPGAs, one for the user application
  - Simulation on Virtex XCV1000

Hardware implementation

Design flow, with the tool used at each step:

- RTL description in VHDL
- HDL simulation: Cadence Affirma tools
- Synthesis: Synplicity Synplify Pro
- FPGA mapping, place and route: Xilinx Foundation 2.1i
- Target FPGA: XC4036XL-08

XC4036XL FPGA Resource utilization

K    CLBs   4-input LUTs   3-input LUTs   FFs
4    553    978            196            278
5    1194   2046           340            540
6    1206   2081           482            724
7    1215   2087           537            756
8    1284   2119           654            788
9    1296   2213           615            820

Decoding rate on XC4036 FPGA

- Overheads
  - 32-bit, 33 MHz PCI bus
  - Execution of the WildOne API using VC++
- Slowdown: 1.5-2 times

[Figure: bar chart of decoding rate in Kbps against constraint length K, with the data below.]

K    FPGA freq. (MHz)   No overhead (Kbps)   With overhead (Kbps)
4    40.455             333.743              185.994
5    20.089             164.168              117.689
6    19.857             162.273              116.28
7    19.674             160.773              114.231
8    17.576             143.632              109.392
9    17.316             141.141              107.775

Issues in Reconfiguration

- Reconfigurable units
  - Number of ACS units (depends on the number of survivors)
  - Run-time survivor memory
- Reconfiguration types
  - Fine-grained: infeasible
  - Coarse-grained: feasible
- Motivation
  - Performance improvement
- Tradeoff
  - Small SNR (noisy channel): large K, slow decoding
  - Large SNR (less noisy channel): small K, fast decoding
  - Maintain approximately the same BER

Coarse-timescale reconfiguration

- 20.9% performance improvement over static

[Figure: decoding rate (Kbps, axis from 100 to 200) against constraint length K (3 to 9), from a less noisy channel at small K to a noisy channel at large K. Series: individual decoding rates without reconfiguration; average decoding rate with reconfiguration.]

Coarse-timescale reconfiguration: Experimental Approach

- Vary channel noise during transmission
  - Noise changes every ~250,000 bits, or ~1.5 to 2.5 seconds
- If a noise change is detected
  - Download the new decoder configuration to the FPGA on the WildOne board
- Reconfiguration overhead: ~40 ms
  - PCI bus transfer + noise change detection + bitstream download
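To put the ~40 ms overhead in context, the Python sketch below relates it to the 1.5 to 2.5 second interval between noise changes, and derives the bit rate implied by 250,000 bits per interval. It uses only the figures quoted above; the function names are illustrative, and treating the overhead as a simple fraction of the interval is a simplifying assumption.

```python
def reconfig_overhead_fraction(overhead_s, interval_s):
    """Reconfiguration time as a fraction of the time between noise changes."""
    return overhead_s / interval_s


def implied_rate_kbps(bits, interval_s):
    """Average bit rate implied by a block of bits decoded in one interval."""
    return bits / interval_s / 1000.0


short_interval = reconfig_overhead_fraction(0.040, 1.5)   # about 2.7%
long_interval = reconfig_overhead_fraction(0.040, 2.5)    # 1.6%
fast_rate = implied_rate_kbps(250_000, 1.5)               # about 166.7 Kbps
slow_rate = implied_rate_kbps(250_000, 2.5)               # 100 Kbps
```

Under these assumptions the reconfiguration overhead stays below 3% of each interval.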

Comparison with microprocessor

- Intel Celeron, 366 MHz, 128 MB RAM
- Speed-up: up to 7.5X for XC4036 (including overheads)

[Figure: decoding rate in Kbps with PCI overhead (log scale, 10 to 1000) against constraint length K (4 to 9). Series: FPGA coprocessor; Celeron processor.]

Conclusions and future work

- A new adaptive Viterbi decoder
  - Dynamically reconfigurable
  - ~21% improvement over static
  - Scales linearly
- Speed-up of up to 7.5X over a microprocessor
- Future research
  - Extend the present concept to power-aware dynamic reconfiguration