fir filter in fpga

Design & Implementation of anArea Efficient FIR Filter Using FPGA & VHDL

Presented By:

K.Praveen

V.Sathish kumar

A.Yasar Arafath

R.Venkateshwaran

Outline Introduction FIR filter implementation MAG algorithm RSG Multiplier block synthesis algorithm Generation & Presentation of results Overview Advantages Applications Conclusion

Introduction High-speed digital signal processor (DSP) systems are increasingly being implemented on field programmable gate array (FPGA) hardware platforms. The techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. The FPGA platform provides high performance and flexibility with the option to reconfigure and is the technology focused on for the remainder of this paper. Finite impulse response (FIR) digital filters are common DSP functions and are widely used in FPGA implementations. If very high sampling rates are required, full-parallel hardware must be used where every clock edge feeds a new input sample and produces a new output sample.

continued……. Such filters can be implemented on FPGAs using combinations of the

general purpose logic fabric, on-board RAM and embedded arithmetic hardware.

Full-parallel filters cannot share hardware over multiple clock cycles and so tend to occupy large amounts of resource. Hence, efficient implementation of such filters is important to minimize hardware requirement.

When implementing a DSP system on a platform containing dedicated arithmetic blocks, it is normal practice to utilize such blocks as far as possible in preference to any general purpose logic fabric.

FIR Filter A finite impulse response (FIR) filter is a filter structure that

can be used to implement almost any sort of frequency response digitally. An FIR filter is usually implemented by using a series of delays, multipliers, and adders to create the filter's output.

The general formula of FIR filter is

Y(n) =∑ x(k) h(n-k) where k = 0,1,2…..L-1

FIR Filter Add/Shift ImplementationReplacing Constant Multiplication by Multiplier Block

X[n]

+ +

+

+

+ +

y0 y1 yL-1

+

y2

Z-1

+

Z-1 + Z-1 Z-1 + y[n]

MultiplierBlock

DelayBlock

+

x

z-1 +

x

z-1 +

x

z-1+

x

z-1

x

z-1

X [n]

y [n]

h0hL-1 hL-2 hL-3 h1

. . .

FIR Filter Add/Shift ImplementationRegistered Adder at no Additional Cost

Shift/Add Operation

RSG Algorithm Reduced Slice Graph Algorithm algorithm performing

synthesis for low FPGA area. This can be achieved by synthesising low-logic depth multiplier blocks.

Considering above multiplier block diagram for RSG synthesis, that multiplies the input by 3, 13, 21,37 in 2 clock cycles. input is fed to the ‘3’ adder untouched and after being left shifted once (multiplied by 2). Hence, the output of the ‘3’ adder is 2x + x = 3x required. This product can be used as a graph output be fed to the filter summation chain.

Contd.. It generate +ve, odd integers since negative filter

coefficient weightings can be restored at the summation chain by subtracting the +ve equivalent generated by the multiplier block.

Once a multiplier has been synthesised, the multiplier & any additional values used to generate it are placed in a set called ‘Graph Set’. RSG stores data that describes the synthesised graph topology & the mathematical association between the vertices and edges (shifts)

Contd… RSG requires the least FPGA area with the slice and

flip-flop cost. RSG takes the approach , starts with the highest

cost values and simply inserts the ‘best graph’ required for each, ensuring no duplicate adders are created and that adder outputs are shared as far as possible.

As the number of slices reduced compared to MAG algorithm, RSG synthesis is followed to reduce filter complexity in verl large scale integration (VLSI) design.

RSG Multiplier Block Synthesis Algorithm

RSG coding:library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

use IEEE.STD_LOGIC_ARITH.ALL;

use IEEE.STD_LOGIC_SIGNED.ALL;

entity test is

Port ( clk,reset : in std_logic;

xin : in std_logic_vector(3 downto 0);

yout3 : out std_logic_vector(5 downto 0);

yout13: out std_logic_vector(7 downto 0);


yout37: out std_logic_vector(9 downto 0)

);

end test;

architecture bhv of test is

signal add_in,s1 :std_logic_vector(5 downto 0):=(others=>'0');

Contd…

signal add_3,add_21,mul3:std_logic_vector(8 downto 0):=(others=>'0');

signal s:std_logic_vector(8 downto 0):=(others=>'0');

signal sh_5,add_5_0,add_37 ,xin_5 :std_logic_vector(9 downto 0):=(others=>'0');

signal sh_2,add_5:std_logic_vector(6 downto 0):=(others=>'0');

signal sh_22, xin_0 :std_logic_vector(7 downto 0):=(others=>'0');

begin

process(clk,reset)

begin

if reset='1' then

s1<=(others=>'0');

s<=(others=>'0');

yout3<=(others=>'0');

add_5<=(others=>'0');

add_5_0<=(others=>'0');

Contd…elsif clk'event and clk='1' then

s1<= '0' & xin(3 downto 0) & '0' + ("00" & xin );

yout3<=s1;

s<="000" & s1;

add_5<=('0' & xin(3 downto 0) & "00") + ("000" & xin);--7

add_5_0<="000" & add_5;--10

end if;

end process;

process(clk,reset)

begin

if reset='1' then


xin_5<=(others=>'0');


sh_22<=(others=>'0');


Contd..xin_0<=(others=>'0');


elsif clk'event and clk='1' then

yout21<=( s(5 downto 0) & "000") - s;

xin_5<="000000" & xin;

yout37<= xin_5(4 downto 0) & "00000" + add_5_0;--10

sh_22<= s1(5 downto 0) & "00";--8

xin_0<="0000" & xin;--8

yout13<=sh_22 + xin_0;--8

end if;

end process;

end bhv;

MAG algorithm The minimised adder graph (MAG) algorithm was

defined byDempster and Macleod. It generates minimum adder graphs consisting of

shifts,adds&subtracts for implementing constant integer multiplication of individual values up to12-bits.

MAG is required for recent FPGA architectures to further reduce the area requirement of conventional constant coefficient multipliers.

MAG finds numerous graphs for each value that implement the required multiplication with the number of adders/vertices.

Contd..

This algorithm attempts to synthesise all required multiplications by initially placing coefficients of adder cost 1 into the multiplier block

the ‘3’ adder is synthesised first and all other coefficients are built from it. This approach leads to high logic depth blocks and hence pushes up FPGA area requirement.

calculating the number of single-bit full adders required to implement each graph and selecting the graph requiring the least. Minimise VLSI area consumed when implementing the multiplication. MAG also generates a table containing the adder cost of each.

Multiplier block with 4 adders,4 pipeline stages costing 44 slices

MAG Coding:library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

use IEEE.STD_LOGIC_ARITH.ALL;

use IEEE.STD_LOGIC_SIGNED.ALL;

entity mag is

Port ( clk,reset : in std_logic;

xin : in std_logic_vector(3 downto 0);

yout3 : out std_logic_vector(5 downto 0);



yout37: out std_logic_vector(9 downto 0)

);

end mag;

architecture bhv of mag is

signal add_in,s1 :std_logic_vector(5 downto 0):=(others=>'0');

signal add_3,add_21,mul3:std_logic_vector(8 downto 0):=(others=>'0');

Contd..signal s:std_logic_vector(8 downto 0):=(others=>'0');

signal sh_5,add_5_0,add_37 ,xin_5 :std_logic_vector(9 downto 0):=(others=>'0');

signal sh_2,add_5:std_logic_vector(6 downto 0):=(others=>'0');

signal sh_22, xin_0 :std_logic_vector(7 downto 0):=(others=>'0');

begin

process(clk,reset)-----3

begin

if reset='1' then

s1<=(others=>'0');



s1<= '0' & xin(3 downto 0) & '0' + ("00" & xin );

yout3<=s1;

end if;

end process;

Contd..process(clk,reset)----13

begin

if reset='1' then

sh_22<=(others=>'0');



sh_22<= s1(5 downto 0) & "00";--8

xin_0<="0000" & xin;--8

yout13<=sh_22 + xin_0;--8

end if;

end process;

process(clk,reset)----21

begin

if reset='1' then

s<=(others=>'0');

yout21 <=(others=>'0');


Contd..s<="000" & s1;

yout21<=( s(5 downto 0) & "000") - s;

end if;

end process;

process(clk,reset)---37

begin

if reset='1' then

add_5<=(others=>'0');

add_5_0<=(others=>'0');



add_5<=('0' & xin(3 downto 0) & "00") + ("000" & xin);--7

add_5_0<="000" & add_5;--10

xin_5<="000000" & xin;

yout37<= xin_5(4 downto 0) & "00000" + add_5_0;--10

end if;

end process;

end bhv;

Generation & presentation of Results VHDL filters were generated to compare RSG, RAG

and C1, coefficient bitwidth varied from 1 to 20 and filter length fixed at 51taps.

Fig 4a demonstrates that RSG requires less FPGA area in general, although fig 4b shows that RSG uses more adders in all cases,fig 4c confirms that FPGA area is correlated with flip-flop usage,fig 4d RSG is lowest as expected and shift-register usage.

Fig 4e LUT data is correlated with adder usage with RSG using fractionally more LUTs in general.

Contd..Device Utilization of RSG synthesis multiplier block

---------------------------------------------------------------------

Resource Used Available Utilization

----------------------------------------------------------------------

IOs 39 86 45.35%

Function Generators 27 384 7.03%

CLB Slices 25 192 13.02%

Dffs or Latches 50 672 7.44%

Overview…. Minimising multiplier block adder cost has been demonstrated

not to minimise FPGA hardware for full-parallel pipelined FIR filters.

Reducing flip-flop count through minimising multiplier logic depth has instead been shown to yield the lowest area solutions.

The results presented establish a clear area advantage of RSG over prior algorithms for typical filter parameters with comparablemaximum clock rates.

In addition, the industrial relevance of the transposed FIR with multiplier block architecture & the RSG algorithm has been established through comparison with filters implemented using the DA technique.

Contd.. RSG is shown to consume more LUTs which is the primary

cause of reducing RSG area advantage with increasing bit-width.

Maximumclock rates were generally well into the full-parallelrange although for bit-widths 18 and 20, RSG filter performance drops as filter length increases.

RSG filters provide lowest area implementations capable of being clocked at full-parallel rates.

Advantages There is a constant requirement for efficient use of FPGA

resources where occupying less hardware for a given system can yield significant cost-related benefits:

1.Reduced power consumption.

2. Area for additional application functionality.

3. Potential to use a smaller, cheaper FPGA.

(i)No input sample shift registers are required since each sample is fed to each tap simultaneously.

(ii) the pipelined addition chain maps efficiently.

(iii) filter latency is reduced.

(iv) identical tap coefficient magnitudes can share multiplication

hardware because taps receive the input sample

simultaneously.

Applications: Extensive use of FPGAs in computationally intensive

applications such as DSP More available logic resources in current FPGAs Broad applications of FIR filters in multimedia and

Audio communication. Need to efficient design methods to save area/power. Parameterizing FIR filters in VHDL by using required

multiplier block synthesis algorithms. Minimise VLSI area consumed when implementing

the synthesis of multiplier blocks.

Conclusion Presented a multiplierless technique, based on the add and shift

method.

And Common sub expression elimination for low area, low power and high speed implementations of FIR filters.

FIR filters can perform at sample rates greatly exceeding those of a state-of-the-art programmable DSP.

The FPGA solution offers complete flexibility in the design and, by reducing chip count, improves the overall reliability of the system.

fir filter in fpga

Documents