
Floating-point to fixed-point code conversion with variable trade-off between computational complexity and accuracy loss

Alexandru Bârleanu, Vadim Băitoiu and Andrei Stan

Technical University “Gh. Asachi”, Iaşi, Romania

15th International Conference on System Theory, Control and Computing (Joint conference of SINTES15, SACCS11, SIMSIS15)

October 14-16, 2011, Sinaia, Romania


Motivation

• Embedded microprocessors:
  – No hardware dedicated to floating-point
  – Limited processing capabilities

• Emulated floating-point arithmetic:
  – Unnecessarily high accuracy
  – Long execution time

• Fixed-point code written manually:
  – Error-prone
  – Significant accuracy loss


Existing work

• For FPGA
  – The main problem is fractional word-length optimization
  – The search space grows exponentially with the number of fixed-point variables
  – Search techniques (often sophisticated) are necessary:
    • Greedy algorithms
    • Genetic algorithms
    • Simulated annealing
  – Optimization objectives: accuracy loss, area

• For microcontrollers, C language
  – Existing solutions:
    • Fixed-point format is supplied by the user (in annotations, for example)
    • Fixed-point format is determined through simulations, taking into consideration, for example, some accuracy constraints
  – Available integer types in C: only 16/32/64-bit signed/unsigned
  – Optimization objectives: accuracy loss, number of (scaling) operations


Problem formulation

The problem is constructed from practical considerations:

• Input – a digital filter:
  – Filter structure: Direct-Form I
  – Constant floating-point coefficients
  – Known input bounds (low/high values)

• Output – ANSI-C integer code:
  – Ideally, the result must be the same as if floating-point code had been used


$$y = \sum_{i=0}^{n} a_i x_i$$
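As a rough illustration of the input and output of the conversion, the sketch below puts a floating-point sum of products next to a hand-written integer equivalent. It is not output of the tool described in these slides. The names `fir_float`, `fir_fixed`, `FIR_TAPS` and `COEFF_FWL`, the four-tap length, the 12-bit input range and the single fractional word-length of 15 for all coefficients are assumptions made for the example, and `<stdint.h>` (C99) is used instead of strict ANSI C so that the integer widths are explicit.

```c
#include <assert.h>
#include <stdint.h>

#define FIR_TAPS  4    /* n + 1 terms; hypothetical small filter            */
#define COEFF_FWL 15   /* assumed fractional word-length of every a_i       */

/* Floating-point reference: y = sum of a_i * x_i. */
static double fir_float(const double a[FIR_TAPS], const double x[FIR_TAPS])
{
    double y = 0.0;
    int i;
    for (i = 0; i < FIR_TAPS; i++)
        y += a[i] * x[i];
    return y;
}

/* Integer version: a_q[i] holds a_i * 2^COEFF_FWL, rounded at design time.
 * With |a_q[i]| <= 32767 and 0 <= x[i] <= 4095, four products cannot
 * overflow a 32-bit accumulator. */
static int32_t fir_fixed(const int16_t a_q[FIR_TAPS], const int16_t x[FIR_TAPS])
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < FIR_TAPS; i++) {
        assert(x[i] >= 0 && x[i] <= 4095);   /* known input bounds */
        acc += (int32_t)a_q[i] * x[i];
    }
    /* Drop the fractional part. Right-shifting a negative value is
     * implementation-defined in C, so this sketch is only portable
     * for non-negative sums. */
    return acc >> COEFF_FWL;
}
```

With four coefficients equal to 0.25 (8192 in this format) and inputs 100, 200, 300 and 400, both functions return 250.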

Building the dataflow

• Initial state – very long fractional parts
  – Multiply operators overflow
  – Add operators have unaligned terms

• Changing the dataflow – making nodes representable in C
  – Resolving overflows in any operator
  – Aligning summation terms

Recursive method calls – bottom-up action
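To make "aligning summation terms" concrete, here is a minimal sketch of an addition whose operands have different fractional word-lengths. The `fx_t` type and the `fx_add_aligned` function are hypothetical, and aligning to the smaller word-length is only one possible policy, not necessarily the one the conversion tool selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point value: real value = raw / 2^fwl. */
typedef struct {
    uint32_t raw;
    unsigned fwl;   /* fractional word-length */
} fx_t;

/* Add two terms after aligning both to the smaller fractional word-length.
 * The term with the longer fractional part loses its extra low bits. */
static fx_t fx_add_aligned(fx_t a, fx_t b)
{
    fx_t r;
    unsigned fwl = a.fwl < b.fwl ? a.fwl : b.fwl;
    uint32_t ra, rb;

    assert(a.fwl < 32 && b.fwl < 32);   /* keep shift counts valid       */
    ra = a.raw >> (a.fwl - fwl);
    rb = b.raw >> (b.fwl - fwl);
    assert(ra <= UINT32_MAX - rb);      /* the sum must fit in 32 bits   */

    r.raw = ra + rb;
    r.fwl = fwl;
    return r;
}
```

For example, adding 3.0 stored with 4 fractional bits (raw 48) to 2.5 stored with 1 fractional bit (raw 5) gives raw 11 with 1 fractional bit, that is 5.5.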


Example: making a node's run-time integer interval smaller (scaling)

Before scaling:
  – Run-time integer interval: [0; 4 400 000 000]
  – Fractional word-length: 27
  – Datatype: none (using only 16/32-bit integers)
  – Floating-point interval: [0; 32.782...]

After scaling:
  – Run-time integer interval: [0; 2 200 000 000]
  – Fractional word-length: 26
  – Datatype: unsigned long
  – Floating-point interval: [0; 32.782...]
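The numbers in this example can be checked with a small design-time helper. The sketch below is an assumption about how such a check could look, not the tool's code: `fits_u32` and `scale_down` are hypothetical names, and 64-bit host arithmetic is used only to reason about interval bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every value in [0; max_raw] fits an unsigned 32-bit integer. */
static int fits_u32(uint64_t max_raw)
{
    return max_raw <= UINT32_MAX;
}

/* Discard `bits` least significant bits of a node: the upper bound of the
 * run-time integer interval and the fractional word-length shrink together,
 * so the floating-point interval stays the same. */
static uint64_t scale_down(uint64_t max_raw, unsigned *fwl, unsigned bits)
{
    assert(bits < 64 && bits <= *fwl);
    *fwl -= bits;
    return max_raw >> bits;
}
```

Starting from an upper bound of 4 400 000 000 with a fractional word-length of 27, one call with `bits = 1` yields 2 200 000 000 with a fractional word-length of 26, which fits in 32 unsigned bits.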

Dataflow transformation philosophy

• At design-time (scaling coefficients):
  – Loss of accuracy: large, because scaling occurs at dataflow sources
  – Run-time operations: 0

• At run-time (scaling operators):
  – Loss of accuracy: small, because scaling occurs close to the dataflow root
  – Run-time operations: > 0

• Overflow avoidance (not optional!)
• Run-time integer interval reduction (together with FWL)
• Discarding of least significant bits (multiple ways)
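The difference between the two scaling styles can be seen on a single multiplication. In the sketch below, the coefficient 0.3, the target fractional word-length of 7, the 12-bit input bound and the function names are all assumptions chosen for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define A_Q15 9830   /* 0.3 quantized with FWL 15 (0.3 * 2^15 = 9830.4) */
#define A_Q7  38     /* 0.3 quantized with FWL 7  (0.3 * 2^7  = 38.4)   */

/* Design-time scaling: the coefficient itself is stored with fewer
 * fractional bits, so no scaling operator is needed at run time. */
static uint32_t mul_design_time(uint16_t x)
{
    assert(x <= 4095);
    return (uint32_t)A_Q7 * x;           /* result has FWL 7 */
}

/* Run-time scaling: the precise coefficient is kept and one shift is
 * applied to the product, closer to the dataflow root. */
static uint32_t mul_run_time(uint16_t x)
{
    assert(x <= 4095);
    return ((uint32_t)A_Q15 * x) >> 8;   /* FWL 15 -> FWL 7 */
}
```

For x = 1000 the ideal result at a fractional word-length of 7 is 38400. The design-time variant returns 38000 and the run-time variant returns 38398, at the price of one extra shift.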


Selecting the optimal dataflow transformation

• Increase or decrease node run-time integer interval
• Construct multiple dataflow transformation variants (alternative dataflow fragments)
• Compare candidate dataflow transformation variants using a linear cost function:

$$cost = k_1 \cdot complexity + k_2 \cdot error$$

• Analytically computed values:
  – complexity: number of operators
  – error: size of error interval

• Ideal values:
  – complexity: number of cycles
  – error: SQNR loss, error distribution...
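The comparison step can be sketched as a minimum search over candidate variants. The `variant_t` structure and both function names are hypothetical. The metrics are assumed to be already available for each candidate, for instance as the number of operators and the size of the error interval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metrics of one candidate dataflow transformation variant. */
typedef struct {
    double complexity;   /* e.g. number of operators    */
    double error;        /* e.g. size of error interval */
} variant_t;

/* Linear cost function: cost = k1 * complexity + k2 * error. */
static double variant_cost(const variant_t *v, double k1, double k2)
{
    return k1 * v->complexity + k2 * v->error;
}

/* Return the index of the variant with the lowest cost
 * (the first one wins on ties). */
static size_t select_variant(const variant_t *v, size_t count,
                             double k1, double k2)
{
    size_t best = 0, i;
    assert(count > 0);
    for (i = 1; i < count; i++) {
        if (variant_cost(&v[i], k1, k2) < variant_cost(&v[best], k1, k2))
            best = i;
    }
    return best;
}
```

With two candidates (complexity 2, error 8) and (complexity 3, error 1), the weights k1 = 1, k2 = 0 select the first candidate, while k1 = 0, k2 = 1 select the second.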


Varying the cost function coefficients (example)

Filter
  – Response type: bandpass
  – Type: FIR
  – Order: 40

Target/Compilation
  – Processor: ARM Cortex-M3
  – Compiler: IAR C/C++ 5.41 for ARM (Kickstart)
  – Optimizations: medium

SQNR loss (dB)    0.010513    0.000243    0.000025    0.000004
Time (cycles)     198         220         239         260

4 dataflows shown from 18 total found.

For comparison, the floating-point code takes 3984-4078 cycles.


Implementation insights

• Language: Java SE 1.6
• Techniques: OOP, polymorphism
• Analytical estimation of run-time integer intervals, dataflow complexity, and node error intervals
• Dataflows are transformed using Change instances (not by copying large dataflow portions and modifying them).
  – Change instances are invertible (apply/undo)
  – Change instances can be combined in logical AND and OR
• Dataflow visualization: dot (graph description language)
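The apply/undo idea behind Change instances can be illustrated in a few lines. The original implementation is in Java and its classes are not shown in these slides, so the sketch below is a C analogue with hypothetical names (`node_t`, `change_t` and the four functions). The AND combination is assumed to mean "apply all changes together". The OR combination is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned fwl; } node_t;   /* hypothetical dataflow node */

/* Hypothetical invertible change: sets a node's fractional word-length
 * and remembers the previous value so that it can be restored. */
typedef struct {
    node_t  *node;
    unsigned new_fwl;
    unsigned old_fwl;
    int      applied;
} change_t;

static void change_apply(change_t *c)
{
    assert(!c->applied);
    c->old_fwl = c->node->fwl;
    c->node->fwl = c->new_fwl;
    c->applied = 1;
}

static void change_undo(change_t *c)
{
    assert(c->applied);
    c->node->fwl = c->old_fwl;
    c->applied = 0;
}

/* Assumed "AND" combination: apply every change in order. */
static void changes_apply_all(change_t *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        change_apply(&c[i]);
}

/* Undo in reverse order so that the dataflow returns to its prior state. */
static void changes_undo_all(change_t *c, size_t n)
{
    size_t i;
    for (i = n; i > 0; i--)
        change_undo(&c[i - 1]);
}
```

Because only the modified fields are recorded, a candidate transformation can be tried, measured and rolled back without copying the dataflow.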


Usage example

Filter properties
  – Response type: highpass
  – Type: FIR
  – Order: 30
  – Designed with: Matlab FDATool

Conversion information
  – Number of dataflows produced by varying the cost function coefficients: 158 (18 different)
  – Total transformation time: 2.44 s

Performance of fixed-point function #7
  – Distortion (SQNR loss): 3.1e-05 dB
  – Speed test:
    • Device: MSP430F149
    • Compiler: IAR 5.10 (Kickstart)
    • Compiler opt.: High speed
    • Factor: 11.5


Testing

• Compiler:
  – Accuracy tests: Microsoft C++ (Visual Studio 2010)
  – Speed tests: IAR, gcc
• Compiler settings: optimizations disabled / enabled (low, high, ...)
• Processor variant:
  – 8-bit (AVR)
  – 16-bit (MSP430)
  – 32-bit (ARM Cortex-M3)
• Filter properties:
  – Type: FIR, IIR (work in progress)
  – Order: 4-80 (FIR)
  – Input interval: [0; 4095], [-4096; 4095], and others
  – Design method: random coefficients, Matlab FDATool
• Cost function: from "low-complexity-low-accuracy" to "high-complexity-high-accuracy"
• Code generation: from "everything in one expression" (inline) to "every operator variable declared"


Results

• Number of cycles: speed factor 3...15 relative to the floating-point code (or more if compiler optimizations are applied)
• Accuracy loss: SQNR loss 1e-5...1e-1 dB
• Variable trade-off between complexity and accuracy
• Constant execution time (no jitter, more determinism)

Conclusions

An innovative floating-point to fixed-point conversion method for the C language is proposed:

– A very good speed factor is obtained (integer code compared with floating-point code).
– Very good accuracy is obtained for FIR filters.
– The conversion algorithm is designed to use variable cost functions. It is possible to specify, for example, that complexity is important and accuracy loss is unimportant when building the integer dataflow.
– The conversion time is very short. This happens because:
  • Dataflow metrics are estimated analytically
  • Dataflow nodes cache information (run-time integer interval, error interval)
  • The automatic dataflow search algorithm uses a heuristic to generate as few identical dataflows as possible
