circular buffering-5 (1)

Upload: rushi-desai

Post on 06-Apr-2018


  • 8/3/2019 Circular Buffering-5 (1)


    Alignment Issue in Circular Buffering:

    This document targets Texas Instruments processors for signal processing, specifically the TMS320VC5416. The issue is explained in detail in the following paragraphs.

    When FIR filtering is implemented in C on the TMS320VC5416, some DSPLIB function calls require that both the coefficients and the data be aligned in memory. The alignment of the data is understood; what is not understood is the alignment required of the filter coefficients. This document explains why the filter coefficients must be aligned in memory for the DSPLIB functions to work. The question can be rephrased: why was it chosen to put the filter coefficients in a circular buffer instead of a linear buffer? Is it more efficient?

    First, we clarify what exactly is meant by alignment. Alignment in memory here means that, for a given filter length nh, the coefficients are placed in memory at a starting address whose K lowest bits are zero, where K = log2(L) and L is the smallest power of two with L >= nh. For example, for a filter of length 5 (L = 8, K = 3), the starting address of the filter coefficients must have the form:

    x x x x x x x x x x x x x 0 0 0
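    The rule above can be checked with a few lines of C. This is only a sketch of the definition as stated here (K lowest address bits zero, with 2^K >= nh); the function names alignment_bits and is_aligned are made up for illustration and are not part of DSPLIB.

```c
#include <stdint.h>

/* Hypothetical helper: smallest K such that 2^K >= nh. */
unsigned alignment_bits(unsigned nh)
{
    unsigned k = 0;
    while ((1u << k) < nh)
        k++;
    return k;
}

/* Hypothetical helper: nonzero if the K lowest bits of addr are zero. */
int is_aligned(uint16_t addr, unsigned nh)
{
    uint16_t mask = (uint16_t)((1u << alignment_bits(nh)) - 1u);
    return (addr & mask) == 0;
}
```

    For a filter of length 5 this gives K = 3, so an address such as 0x1008 passes the check while 0x1004 does not.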

    The issues are as follows:

    1. Why do the filter coefficients need circular buffering for the DSPLIB functions to work?

    2. Why do the filter coefficients need memory alignment for the DSPLIB functions to work?

    Issues:

    The approach is as follows. We answer the following questions.

    a) Why is circular buffering more efficient than linear buffering for the data?

        a.1) How is FIR filtering implemented?

        a.2) How is it implemented in the hardware?

    b) Why is it efficient for the filter coefficients?

    c) Why is alignment needed for the DSPLIB functions to work?

    a.1 :

    We first explain FIR processing algorithms and the problems that arise in practical implementations; these shed light on our issues.

    For an FIR filter the impulse response is of the form:

    h = [h0, h1, ..., hM]

    Here M is the filter order.

    The length of the filter is nh = M+1.

    The output can be obtained from the equation:

    y(n) = h0 x(n) + h1 x(n-1) + ... + hM x(n-M)    (Equation 1)

    length of output = ny

    length of input = nx

    There are two types of methods for implementing the above equation:

    1. Sample processing: The input samples are taken one at a time, and each incoming sample produces one output sample y. Here the filter is implemented as a state system. Some sample processing techniques are:

        - Direct form 1
        - Direct form 2
        - Canonical form
        - Cascade form

    2. Block processing: A block of input samples is taken and many output samples are produced. Some block processing methods are:

        - Convolution
        - Matrix form
        - LTI form
        - Overlap-add block convolution method

    We start with the sample processing methods, as they are much easier to use in real-time applications. We focus on direct form 1.

    The structure that implements the filter in direct form is the following.

    fig 1 : direct form 1

    The above structure implements an Mth-order filter.

    To get a higher-level understanding of what is happening, we first look at the C implementation of the above structure and then at the processor-level implementation.

    The following code would implement the above structure:

    double fir(int M, double *h, double *w, double x)   // usage: y = fir(M, h, w, x)
    {
        int i;
        double y = 0;   // output sample
        ...             // the rest of this listing is missing from the source
    }

    1. One call of the above function returns just one sample of the output, i.e. the first call gives y0, the second call gives y1, and so on. However, with every call we must give the function the proper input samples for it to produce the correct output.

    2. The picture here suggests the input samples sitting still and the filter sliding along. This implements Equation 1, but with a change of variables. If we picture the opposite, i.e. the filter sitting still and the input samples sliding through, then it replicates Equation 1 and the function.
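    As an illustration of the sample-processing routine discussed above, the following is a minimal sketch of a direct-form FIR with a linear delay line. It is not the original listing; it assumes that w holds the M+1 most recent input samples (w[0] newest) and that the caller zeroes w before the first call. The name fir_sample is hypothetical.

```c
/* Sketch only: one output sample per call, with a linear (shifted) delay line.
 * Assumption: w has M+1 entries, initialised to zero by the caller. */
double fir_sample(int M, const double *h, double *w, double x)
{
    double y = 0.0;
    int i;

    w[0] = x;                      /* newest input enters the delay line */

    for (i = 0; i <= M; i++)
        y += h[i] * w[i];          /* y(n) = sum of h[i] * x(n-i) */

    for (i = M; i >= 1; i--)
        w[i] = w[i - 1];           /* shift every stored sample by one place */

    return y;
}
```

    The second loop is the data movement that linear buffering needs on every call: all M stored samples are copied one location along before the next input arrives.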

    a.2 :

    Now we look more closely at how this can be implemented in the hardware (controller). A hardware implementation is closely tied to the assembly language, or instruction set, available for a particular processor. Here we use the instruction set of the TMS320VC to understand the implementation.

    All the filter coefficients are located at a particular place in memory. Here we assume a filter of order 3, but this can be generalized to a filter of any order M.

    [Figure: memory layout, with the filter coefficients h0, h1, h2, h3 at addresses 0000 to 0003, the input samples x0, x1, ..., xn passing through addresses 1000 to 1003, and the outputs y0, y1, ..., ym.]


    Above is a pictorial representation of the convolution procedure. Every output sample y is the result of one fir function call.

    Now, in assembly, we can use the MAC and RPT instructions.

    *********

    Here we look at specific locations in memory, in our case 0000 to 0003 for the filter and 1000 to 1003 for the data, and perform the filtering over those locations. The data therefore has to be moved into those memory locations constantly, so the input data must be shifted continuously.

    This is clearly an overhead. The following mechanism can be used to do the same thing more efficiently.

    [Figure: pointer p into the data and pointer p1 into the filter coefficients]

    The pseudocode is as follows:

    cfir:

    repeat : n


    fig 3 : Contents of circular buffer at successive time instants

    address   n=0   n=1
    1001       .     p
    1000       p     .
    999        0     0
    998        0     0
    997        0     0

    fig4: updating of pointer p for one cfir call (n=0, n=1 etc.) and successive cfir calls

    The data samples x0, x1, ..., xn occupy successive addresses from 1000 upward; locations 997 to 999 hold zeros.


    In this case the data is assumed to sit at a static location in memory. The pointer p shown in fig 3 points at the location of x0. For the first call of cfir (n=0 in fig 3), p points at x0. The function successively decrements the pointer (so that it points to memory locations 999, 998, 997), which results in multiplications by 0. When the repeat is executed for the last time (n=nh), the if condition becomes true and the pointer p (pointing to the data) is first reset to the position it started from (memory location 1000, i.e. x0 in this case) and then incremented (so that it now points to x1, memory location 1001). For the next call (n=1 in fig 3) the same thing happens, but now, when n equals nh and the function enters the if, the pointer p is reset to its original position, in this case x1, and then incremented. Thus the pointer wraps around, emulating a circular buffer. This is clearly more efficient than moving all the data every time a filtering operation is done. Putting the data in the circular buffer means that we do not need to implement statement 2 and statement 3. We still have to implement statement 1 because of the pointer p1.

    This clarifies all parts of a.
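    The same idea can be sketched in C with an index that wraps instead of a shifted delay line. This is an illustration under stated assumptions, not the cfir routine referred to above: the buffer w has M+1 entries, the index *p marks where the newest sample is written, and the name cfir_sample is hypothetical.

```c
/* Sketch only: one output sample per call, data kept in a circular buffer.
 * Assumptions: w has M+1 entries (zeroed by the caller), *p starts at 0. */
double cfir_sample(int M, const double *h, double *w, int *p, double x)
{
    double y = 0.0;
    int k = *p;
    int i;

    w[k] = x;                          /* overwrite the oldest sample */

    for (i = 0; i <= M; i++) {
        y += h[i] * w[k];              /* h[i] meets x(n-i) */
        k = (k == 0) ? M : k - 1;      /* step back through the data, wrapping */
    }

    /* After M+1 steps k is back where it started; advance by one for the next call. */
    *p = (k == M) ? 0 : k + 1;

    return y;
}
```

    No sample is ever copied: each call writes one new value and moves one index, which is the saving over the linear-buffer version.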

    Some comments:

    1. Although we started with sample processing, if we call the above cfir function more than once we are effectively doing block processing.

    2. For TMS320VCxxxx controllers, the resetting of the data pointer and its increment for the next filtering operation are done in hardware.

    3. The function would be called ny times to get all the output samples. After ny calls, the pointer to the data values is reset to the first location.

    This explains the need for circular buffering of the data samples. Now we look at part b.

    b :

    In the above discussion we looked only at what happens to the data samples. Now we concentrate on the filter coefficients. For the first iteration (i.e. n=0) the pointer p1 points at h0. For the successive iterations the pointer is stepped so that it points to h1, h2 and h3 in turn. This completes one filtering operation (i.e. one call of cfir). For the next call of the function, i.e. the filtering operation for the next output sample, the pointer p1 must point to h0 again. Thus it can be seen that the filter coefficients would also benefit from being put in a circular buffer. What we mean here by putting them in a circular buffer is that we do not have to check for statement 4. So if we put both the data and the filter coefficients in circular buffers, we do not need to execute statements 1, 2, 3 and 4 in software.

    This clarifies b.

    c :

    To understand why memory alignment is required specifically for the filter coefficients when circular buffering is used, we take a closer look at how circular buffering would be implemented in the hardware. We assume that the filter coefficients are stored in order in contiguous locations in memory.

    Before we look at the two ways of doing this, we note what a hardware implementation of circular buffering for the filter coefficients requires in order to be successful.

        - The filter buffer pointer must reset, i.e. come back to its original position (the starting address, or the location of the first filter coefficient), once it has gone through nh iterations.

    The information available to the hardware is:

    1. nh, i.e. the length of the filter.

    2. The starting address of the filter coefficients.

    There are two ways to implement the requirement stated above.

    1. The starting address of the buffer is stored in a register, say R1, and is added to the length of the filter, stored in R2. The result (the end address) is stored in register R3. The contents of R1 are copied to R4, which acts as the pointer and is incremented in cfir. After every increment, the contents of R4 are XORed with R3. If they match, R1 is copied to R4, resetting the pointer to the start. Thus a circular buffer is implemented.

    2. For the second method we must ensure the following:

        a. The size of the buffer must be a power of two (2^n > nh). The filter length can be any size.

        b. However, the buffer must be aligned so that the starting address of the buffer has its n LSBs equal to zero.

       In this case a register L1 contains the length of the filter. After every iteration of the loop in cfir the register is decremented. As soon as it reaches zero, the n LSBs of the pointer are set to zero.

    The TMS320 processors implement the circular buffer in the above-mentioned

    way. Hence we need alignment in the memory.
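    The two pointer-reset schemes described above can be modelled in C to see where the alignment requirement comes from. This is a software sketch of the description in this document, not a model of the actual TMS320 address-generation hardware; the names next_by_compare, circ_ptr and circ_step are hypothetical, and addresses are treated as plain 16-bit numbers.

```c
#include <stdint.h>

/* Method 1 (sketch): compare the pointer with the end address, reload the start. */
uint16_t next_by_compare(uint16_t p, uint16_t start, uint16_t nh)
{
    uint16_t end = (uint16_t)(start + nh);   /* R3 = R1 + R2 */
    p = (uint16_t)(p + 1);
    if ((p ^ end) == 0)                      /* XOR is zero only on a match */
        p = start;                           /* reload the start address */
    return p;
}

/* Method 2 (sketch): count down the length; at zero, clear the n LSBs. */
typedef struct {
    uint16_t ptr;     /* current address */
    uint16_t count;   /* iterations left before the wrap */
} circ_ptr;

void circ_step(circ_ptr *c, uint16_t nh, unsigned n)
{
    uint16_t mask = (uint16_t)((1u << n) - 1u);
    c->ptr = (uint16_t)(c->ptr + 1);
    c->count = (uint16_t)(c->count - 1);
    if (c->count == 0) {
        c->ptr = (uint16_t)(c->ptr & (uint16_t)~mask);  /* n LSBs forced to zero */
        c->count = nh;                                  /* reload the length */
    }
}
```

    With nh = 5 and n = 3, a pointer that starts at the aligned address 0x1008 returns to 0x1008 after five steps under either method. If the buffer started at 0x1005 instead, method 1 would still return to 0x1005, but method 2 would land on 0x1008: clearing the low bits only recovers the start address when that address already has its n LSBs equal to zero. Method 2 needs no stored start address and no comparison, at the cost of the alignment constraint.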

    Here we do not need to move the data in memory. The data stays in one place; only the pointer to it changes. This is definitely more efficient than the previous method. The former method is linear buffering, in which the data was assumed to be in a linear buffer, and the latter method is circular buffering, in which the data is put in a circular buffer.

    As we see, one filtering operation needs only as many input samples as the length of the filter.

    So every time fir is called, we first have to move the input data by one memory location and then perform the filtering.

    In order for the above code to be implemented in assembly, the following things need to be done.

    This clarifies c

    Arthur Butz.

    Rushi Desai.