University of Malaga

CORDIC Based Parallel/Pipelined Architecture for the Hough Transform

J.D. Bruguera, N. Guil, T. Lang, J. Villalba, E.L. Zapata

January 1996

Technical Report No: UMA-DAC-96/02

Published in:

To appear in the Journal of VLSI Signal Processing

University of Malaga, Department of Computer Architecture, C. Tecnologico • PO Box 4114 • E-29080 Malaga • Spain

CORDIC BASED PARALLEL/PIPELINED ARCHITECTURE

FOR THE HOUGH TRANSFORM*

by

J.D. Bruguera¹, N. Guil, T. Lang², J. Villalba and E.L. Zapata

Dept. Arquitectura de Computadores

University of Malaga

Plaza El Ejido. 29013 Málaga. SPAIN

¹ Dept. Electrónica. Facultad de Física

Univ. Santiago de Compostela

15706 Santiago de Compostela. SPAIN

² Dept. Elect. and Comp. Eng.

University of California, Irvine

CA 92717. U.S.A.

Mailing address: Emilio L. Zapata

Dept. Arquitectura de Computadores

University of Málaga

Plaza El Ejido s/n

29013 Málaga

SPAIN

* This work was supported by the Ministry of Education and Science (CICYT) of Spain under project TIC-92-0942.


Abstract

We present the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. The design of the circular CORDIC in rotation mode is simplified by the a priori knowledge of the angles participating in the transform, and a high throughput is obtained through a pipelined design combined with the use of redundant arithmetic (carry-save adders in this paper). Saving area is essential to the design of a pipelined CORDIC and can be achieved by reducing the number of microrotations and/or the size of the coefficient ROM. To reduce the number of microrotations we use radix 4 when possible, or mixed radix (radix 2 and radix 4), in the design of the processor, which reduces the number of microrotations by 50% and 25%, respectively, with respect to a purely radix-2 implementation. Furthermore, if we allocate two circular CORDIC rotators to one processor, the size of the shared coefficient ROM is only 50% of that of a design based on two separate rotators. Finally, we have also incorporated additional microrotations in order to reduce the scale factor to one. The result is a pipelined architecture that can be easily integrated in VLSI technology due to its regularity and modularity.

1.- Introduction

The Hough transform is a powerful technique for the detection of patterns in images [27]. With the Hough transform the image space is mapped onto a parameter space so that the detection of a specific pattern in the image space becomes the detection of peaks in the parameter space. This transform has been used with different variants in a wide range of applications such as straight line recognition [16], extraction of straight line segments [31], object recognition [25], extraction of planar figures [42], [43], vectorization of aerial photographs [46], etc.

The high computational cost of the Hough transform has motivated a considerable research effort to reduce its calculation time. We can identify three directions for accelerating the execution of the Hough transform: design of fast Hough transform algorithms, use of parallel architectures, and design of specific architectures for the Hough transform. The goal of fast Hough transform algorithms is to reduce the calculation time and memory requirements. Examples of this approach are the Piecewise Linear (PLHT) [33], Combinatorial (CHT) [6], Binary (BHT) [21], Randomized (RHT) [54] and Fast (FHT) [38], [22] Hough Transforms. The most important problem of these fast algorithms is their low regularity and parallelism [22], whereas the traditional algorithm has a strong implicit parallelism which can be conveniently exploited in shared memory [11] or distributed memory multiprocessors (linear array [20]; mesh [44], [5], [19], [3], [13]; hypercube [7], [45]; binary tree [9]).

The algorithm associated with the Hough transform presents a high level of regularity and locality. These properties make it appropriate for the VLSI design of a special-purpose architecture which permits high speed computation. Systolic solutions along this line are the designs by Chuang and Li [12], Li et al. [39] and Fontoura and Sandler [21].

One of the main problems of the Hough transform is the evaluation of the trigonometric functions implicit in the algorithm (sines and cosines), which are difficult to implement with conventional arithmetic units. Some authors have proposed the use of tables [5], [24], [40] or a formulation with only addition and shift operations, at the cost of a loss of precision in the results [21]. Recently, Timmermann et al. [48] have proposed an architecture based on two general CORDIC processors. The alternative we propose in this work consists of formulating the Hough transform in terms of rotations, which permits the design of a special-purpose CORDIC (Coordinate Rotation DIgital Computer) processor.

The CORDIC algorithm was developed by Volder [51] in 1959 for the calculation of rotations and the conversion from rectangular to polar coordinates. The only operations necessary for the implementation of the algorithm are additions, subtractions and shifts. In 1971 Walther [53] generalized the algorithm to operate in three coordinate systems (linear, circular and hyperbolic), extending its applications to the evaluation of trigonometric, hyperbolic and arithmetic functions. Several VLSI designs based on the CORDIC algorithm have been developed for applications in linear algebra, matrix algebra, digital signal processing and image processing [1], [2], [10], [15], [26], [49], [30]. When the application requires high computation speeds, it is necessary to incorporate pipelining and/or redundant arithmetic in the design [32], [34], [35], [41].

In this article we present the design of a special-purpose processor based on the CORDIC algorithm for the calculation of the Hough transform. The CORDIC processor includes the following features: a pipelined design and the use of redundant arithmetic to increase the speed, the use of mixed radix (2 and 4) to reduce the number of microrotations, and the use of additional microrotations to reduce the scale factor to one. In this way the output of the CORDIC directly gives the memory address at which the vote for the angle being analyzed is located.

The paper is structured as follows. In section 2 we formulate the Hough transform by means of CORDIC rotations. The design of an application-specific processor for the computation of the transform is presented in section 3: we analyze the implementation of the radix-2 and radix-4 microrotations of the CORDIC algorithm, determine the size of the coefficient ROM and generalize the design to images of arbitrary size. In section 4 we present a parallel implementation of the Hough transform, starting from a partition of the angle space into several subspaces which can be processed simultaneously. Finally, in section 5 we summarize the main features of the processor and compare it to other CORDIC-based designs found in the literature.

2.- Hough Transform Processor

We can distinguish three stages in the process of detecting lines by means of the Hough transform: 1) extraction of the contours of the image using an edge detector (a Sobel operator, for instance); 2) application of the Hough transform to each point of the image; and 3) voting in the parameter space in order to extract the lines. A fourth stage of edge linking can additionally be included. We concentrate here on the second stage, which is also the one with the highest algorithmic complexity.

Without loss of generality we consider an image space of dimension NxN. The normal equation of a line that passes through a point (x,y) is

$$ \rho = x\cos\theta + y\sin\theta \tag{1} $$

where ρ is the perpendicular distance from the line to the coordinate origin and θ is the angle between the abscissa axis and the normal to the line. The Hough transform maps each point of a line onto a curve of the parameter space, and all these curves cross at a point of coordinates (ρ,θ). Equivalently, for a fixed point (x,y), equation (1) represents the set of values (ρ,θ) of all the possible straight lines passing through that point of the image space.

In digital image processing both the image space (image array) and the parameter space (Hough array, H(m,j)) are discretized. Each illuminated point of the image space "votes" for a set of points of the parameter space, so that collinear points vote for a common point in the parameter space. Detecting the crossing point of the curves produced by the points of a straight line therefore translates into finding peaks in the parameter space (Hough space).

Hough transform algorithm

1. Initialize the Hough array to zero.

2. For each pixel (x,y) with gray level equal to one:

a) for j = 0, 1, ..., N-1 compute

$$ \rho_j = x\cos\theta_j + y\sin\theta_j \tag{2} $$

where θj = πj/N, so that 0 ≤ θj < π, and x, y ∈ {0, 1, ..., N-1};

b) increment the Hough array

$$ H(m,j) = H(m,j) + 1 \tag{3} $$

where m = [ρj] and [ ] denotes the nearest integer to its argument.

3. Look for peaks in the Hough array.
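As a point of reference for the hardware that follows, the algorithm above can be written directly in software. The Python sketch below is a minimal reference model under our own assumptions: the image is an N x N list of 0/1 values indexed as image[x][y], and the row index m is offset by N because ρj becomes negative for θj > π/2. Python's round() resolves exact halves to the even neighbour, which we accept as a reading of "nearest integer" for a reference model. The function name hough_transform is ours.

```python
import math


def hough_transform(image):
    """Straight-line Hough transform of a binary N x N image (reference model).

    image[x][y] is 1 for an edge pixel. Angles are theta_j = pi*j/N (eq. (2));
    rho is rounded to the nearest integer and H(m, j) is incremented (eq. (3)).
    Row index m + N is used because rho lies in [-N, N*sqrt(2)].
    """
    n = len(image)
    n_rows = n + math.ceil(n * math.sqrt(2)) + 1
    hough = [[0] * n for _ in range(n_rows)]          # step 1: H = 0
    for x in range(n):
        for y in range(n):
            if image[x][y] != 1:                      # only pixels equal to one vote
                continue
            for j in range(n):
                theta = math.pi * j / n
                rho = x * math.cos(theta) + y * math.sin(theta)   # eq. (2)
                m = round(rho)                                     # nearest integer
                hough[m + n][j] += 1                               # eq. (3)
    return hough
```

For a vertical line of eight pixels at x = 3 in an 8 x 8 image, all eight votes for θ0 = 0 fall into the single cell with m = 3.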

The number of angles into which the Hough space is divided depends on the desired precision for the slope of the straight lines in the image space. The image dimension N is usually taken as the number of angles of the parameter space (j = 0, 1, ..., N-1). Figure 1 shows an image space (a) and its Hough transformed space (b).

2.1.- CORDIC based Hough Transform

In order to obtain a CORDIC formulation of the Hough transform algorithm we rewrite equation (2) as follows:

$$ \begin{aligned} \rho_j^a &= x\cos\theta_j + y\sin\theta_j \\ \rho_j^b &= -x\sin\theta_j + y\cos\theta_j \end{aligned} \tag{4} $$

where 0 ≤ θj < π/2, j = 0, 1, ..., N/2-1, and we have used the symmetry between angles that differ by π/2. $\rho_j^a$ represents the set of values of ρ for angles less than π/2 (θj), whereas $\rho_j^b$ represents the set of values of ρ for the angles that differ by π/2 from the previous ones (θj+π/2). This divides the Hough space into two independent subspaces.

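Expression (4) can be directly implemented using a circular CORDIC module in rotation mode [51], [53]. This module performs iterations of the form

$$ \begin{aligned} x_{i+1} &= x_i + \sigma_i 2^{-i} y_i \\ y_{i+1} &= y_i - \sigma_i 2^{-i} x_i \\ z_{i+1} &= z_i - \sigma_i \tan^{-1} 2^{-i} \end{aligned} \tag{5} $$

where $\theta_j = \sum_{i=0}^{n-1} \sigma_i \tan^{-1} 2^{-i}$ and σi ∈ {-1, 1}. The initial conditions are (x0, y0) = (x, y) and z0 = θ. Since the angles are known in advance, the values of σi can be chosen beforehand and the z-equation can be eliminated. Therefore,

$$ \begin{aligned} x_n &= K(x_0\cos\theta + y_0\sin\theta) \\ y_n &= K(y_0\cos\theta - x_0\sin\theta) \end{aligned} \tag{6} $$

where

$$ K = \prod_{i=0}^{n-1} \left(1 + \sigma_i^2\, 2^{-2i}\right)^{1/2} \tag{7} $$

which is a constant because σi ∈ {-1, 1}.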

The convergence and precision of the CORDIC algorithm have been widely analyzed in the literature [17], [28], [29], [34]. Comparing equations (4) with the CORDIC equations (6), we see that the only difference is the scale factor K. Consequently,

$$ \rho^a = x_n / K, \qquad \rho^b = y_n / K \tag{8} $$
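The elimination of the z-equation can be checked numerically. The sketch below is a plain floating-point Python model, not the fixed-point pipeline: decompose runs the z recurrence of (5) once per angle (the work that moves offline into the ROM), cordic_rotate applies only the x and y recurrences, and dividing by K of (7) recovers ρa and ρb as in (8). The greedy sign rule σi = sign(zi) and all function names are assumptions of the sketch.

```python
import math


def decompose(theta, n):
    """Offline choice of sigma_i in {-1, +1} for a rotation by theta (radians).

    This is the z-recurrence of eq. (5), run once per angle; the resulting
    list is what a coefficient ROM would hold for that angle.
    """
    z, sigmas = theta, []
    for i in range(n):
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * math.atan(2.0 ** -i)
    return sigmas


def scale_factor(n):
    """K of eq. (7) for n radix-2 microrotations (sigma_i**2 == 1)."""
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


def cordic_rotate(x, y, sigmas):
    """Only the x/y recurrences of eq. (5): additions and scalings by 2**-i
    (hardwired shifts in hardware). Returns (x_n, y_n) = K * (rho_a, rho_b)."""
    for i, s in enumerate(sigmas):
        x, y = x + s * 2.0 ** -i * y, y - s * 2.0 ** -i * x
    return x, y


n = 16
theta = math.pi / 6
xn, yn = cordic_rotate(3.0, 4.0, decompose(theta, n))
rho_a, rho_b = xn / scale_factor(n), yn / scale_factor(n)   # eq. (8)
```

With six radix-2 microrotations, scale_factor(6) evaluates to about 1.64649, which matches the K = 1.646492 quoted in section 3 for 12-bit precision, where (n+2)/2 = 6 of the microrotations are radix 2.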

A scan of the angle θ from 0 to π/2 in the CORDIC algorithm corresponds to a scan from 0 to π in the Hough transform: the CORDIC processor performs the scan from 0 to π/2 (equation for ρa) and the scan from π/2 to π (equation for ρb) of the transform in parallel. Consequently, the Hough transform processor is a CORDIC rotator whose scale factor is corrected to one. An alternative CORDIC formulation has been proposed in [48] (see section 5).

This CORDIC operation can be implemented sequentially, with all iterations sharing the same adders and shifters, or in unfolded form, with the adders and shifters replicated. The unfolded form can be pipelined for high throughput; moreover, its variable shifters reduce to hardwired connections. Several such pipelined implementations have been reported [47], [50], and we also use this pipelined approach here.

Several techniques can be used to correct the scale factor K. For this purpose we incorporate additional iterations (repetitions of CORDIC iterations and scaling iterations) [14].

The CORDIC-based scheme described so far uses a standard CORDIC module in circular mode. However, for this particular application the following simplifications can be introduced:

1) Because the rotation angles (jπ/N) are known a priori, their decomposition in terms of σi can be precomputed and stored in a ROM. Consequently, the z recurrence is not required.

2) Because of 1), no sign detection is needed to force zn to 0, so redundant (carry-free) adders can be used in a simple manner. These adders significantly reduce the stage delay. The use of redundant adders in CORDIC modules has been proposed before [34], [50], but it introduces significant complications when sign detection is required.

3) Also because there is no sign detection, it is simple to use radix-4 stages to reduce the total number of stages. However, radix 4 presents an important drawback with respect to radix 2: while in radix 2 the scale factor K is the same for all angles, which facilitates its correction, in radix 4 the scale factor varies with the angle. To maintain a constant scale factor, two approaches are possible:

i) Restrict the radix-4 stages to the second half of the iterations, where the coefficients σi do not affect the scale factor because their contribution to the product in equation (7) falls below the working precision. This approach has been used for the SVD and Hartley transforms [4]; we adapt it to the Hough transform in section 3.

ii) Classify the angles into groups with the same scale factor and design a specific radix-4 CORDIC processor for each group (see section 4).

3.- Mixed-radix CORDIC processor for the Hough Transform

We now describe a pipelined mixed-radix CORDIC processor for the Hough Transform.

In figure 2 we represent the general structure of a radix-4 ($\delta = 4^{-i}$) or radix-2 ($\delta = 2^{-i}$) microrotation using carry-save adders (CSAs). For radix-4 microrotations the coefficients σi can take the values {-2,-1,0,+1,+2}, which can be coded with three bits (a,b,c). Bit a acts as the sign bit, controlling the addition/subtraction operations of the CSAs. Because of the carry-save representation, each coordinate is represented by two words, the sum word and the carry word. In block δ we carry out the wired shift of the operands, whereas in blocks 2 we perform the left shift of the carry words inherent in carry-save addition; in blocks 2, bit a is the input LSB to the shifters. In a radix-2 microrotation the coefficients σi are coded with only one bit, a, so that a=1 if σi=+1 and a=0 if σi=-1 (with (b,c)=(0,1) in figure 2, which selects inputs xi and yi). Since bits b and c are constant, the multiplexers and these control bits/signals of figure 2 can be eliminated for radix-2 microrotations. The scaling microrotations required to correct the scale factor have the same structure as a radix-2 microrotation.
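The sum/carry representation can be illustrated in software. The Python sketch below is a minimal model, not the circuit of figure 2: it assumes W-bit two's-complement words (W = 16 is our choice), uses a bitwise 3:2 carry-save adder, and shows one way the LSB freed by the left shift of a carry word can absorb the "+1" needed for two's-complement subtraction, in the spirit of feeding bit a into the blocks 2. The function to_int plays the role of the carry-propagate conversion stage placed after the last microrotation. All names are ours.

```python
W = 16                       # word length in bits (an assumption of this sketch)
MASK = (1 << W) - 1


def csa(a, b, c, cin=0):
    """3:2 carry-save adder on W-bit words.

    Returns (s, cy) with s + cy == a + b + c + cin (mod 2**W). The carry word
    is shifted left by one position; the LSB freed by that shift receives cin.
    """
    s = (a ^ b ^ c) & MASK
    cy = ((((a & b) | (a & c) | (b & c)) << 1) | cin) & MASK
    return s, cy


def redundant_addsub(xs, xc, ys, yc, subtract=False):
    """(xs + xc) +/- (ys + yc) with two CSA levels and no carry propagation.

    For subtraction both words of the second operand are inverted, and the two
    '+1' corrections of two's complement enter through the free carry LSBs.
    """
    t = 1 if subtract else 0
    if subtract:
        ys, yc = ~ys & MASK, ~yc & MASK
    s1, c1 = csa(xs, xc, ys, t)
    return csa(s1, c1, yc, t)


def to_int(s, c):
    """Final carry-propagate addition: redundant pair -> signed integer."""
    v = (s + c) & MASK
    return v - (1 << W) if v >> (W - 1) else v
```

For example, csa(7, 20, 0) yields the redundant pair (19, 8), whose value is 27; subtracting it from 100 with redundant_addsub and resolving with to_int gives 73.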

For an image of dimensions NxN, the range of values (in pixel units) that ρ can take in (4) is

$$ 0 \le \rho^a \le N\sqrt{2}, \qquad -N \le \rho^b \le +N \tag{9} $$

If $n = \log_2 N$, we need n+1 bits to code the different values of ρ in each of the Hough subspaces. We can design the CORDIC processor with n+2 bit precision, generating ρa and ρb in parallel. Therefore, a pure radix-2 CORDIC processor would need n+2 microrotations, whereas the mixed-radix (2 and 4) CORDIC processor has a total of 3(n+2)/4 microrotations, of which (n+2)/2 are radix 2 and (n+2)/4 are radix 4.
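The stage counts just given can be tabulated with a few lines of Python. The function name is ours, and it simply restates the formulas above under the assumption that n+2 is a multiple of 4 (true for N = 1024, n = 10).

```python
def microrotation_counts(n):
    """Microrotation counts for an (n+2)-bit design, n = log2(N).

    Returns (pure radix 2, mixed: radix-2 part, mixed: radix-4 part, mixed total).
    Assumes n + 2 is a multiple of 4, as for N = 1024 (n = 10).
    """
    bits = n + 2
    if bits % 4:
        raise ValueError("this sketch assumes n + 2 divisible by 4")
    r2 = bits // 2          # first half of the bits: one per radix-2 stage
    r4 = bits // 4          # second half: two bits per radix-4 stage
    return bits, r2, r4, r2 + r4
```

For n = 10 this gives 12 radix-2 microrotations against 6 + 3 = 9 mixed-radix ones, the 25% reduction stated in the abstract.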

In figure 3 we present the global diagram of the unit that computes the transform of a pixel of coordinates (x,y). The CORDIC processor generates the N/2 values of $\rho_i^a$ and the N/2 values of $\rho_i^b$ corresponding to the two independent subspaces of the Hough space. The coefficients of the decomposition of angle θi (i=0,1,...,N/2-1) are read from the coefficient ROM. The address generation module calculates the absolute address at which the vote in each subspace is cast (see equation (3)).

The number of rotation angles depends on the size of the image. For NxN images N angles are used, 0 ≤ θ < π, but only the decomposition of N/2 angles, 0 ≤ θ < π/2, needs to be stored in the ROM (see equation (4)). In principle, we need (n+2)/2 + 3(n+2)/4 bits to encode an angle: one bit for each radix-2 coefficient (σi=±1) and three bits for each radix-4 coefficient (σi=±2,±1,0). However, by a suitable selection of the decompositions of the transform angles, most radix-4 coefficients can be restricted to four values, which can be coded with only two bits, a and d. The coding chosen in this case is the following:

a  d  |  σi  |  b  c

0  0  |   0  |  0  0

0  1  |   1  |  0  1

1  1  |  -1  |  0  1

1  0  |  -2  |  1  0
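The two-bit code can be expanded back to the (b, c) multiplexer controls of figure 2 with two gates. The Python sketch below checks one such expansion against the table. The Boolean equations b = a AND NOT d and c = d, and the reading that c selects the unshifted operand and b the doubled one (suggested by (b,c) = (0,1) selecting xi and yi), are our derivation, not taken from the text.

```python
# Coding table from the text: (a, d) -> (sigma_i, b, c)
TABLE = {
    (0, 0): (0, 0, 0),
    (0, 1): (1, 0, 1),
    (1, 1): (-1, 0, 1),
    (1, 0): (-2, 1, 0),
}


def decode(a, d):
    """Expand the stored bits (a, d) into (sigma, b, c).

    Hypothetical gate-level reading: b = a AND NOT d, c = d, a is the sign.
    c selects the operand times 1, b the operand times 2, neither selects 0.
    """
    b = a & (1 - d)
    c = d
    magnitude = 1 if c else (2 if b else 0)
    sigma = -magnitude if a else magnitude
    return sigma, b, c
```

With this reading the ROM stores only (a, d), and b and c are regenerated in each radix-4 stage.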

Apart from the microrotations of the CORDIC algorithm, additional microrotations must be introduced to reduce the scale factor of the CORDIC algorithm to 1: positive ("+") and/or negative ("-") scalings, and repetitions ("R"). Table I shows the additional microrotations (+, -, R) for 1024x1024 images (K=1.646492 for 12-bit precision). The table shows several possibilities and specifies, for each, the final scale factor and the maximum angle that can be reached. In every case, the maximum errors for θ and ρ are 0.028 and 0.000488, respectively. On the other hand, when a repetition is introduced in microrotation 1, microrotation 0 can be eliminated without affecting the precision and convergence of the CORDIC algorithm. Therefore, the number of radix-2 microrotations is reduced to five, with shifts from $2^{-1}$ to $2^{-5}$.

We observe that only three of the options in Table I (options 2, 3 and 4) cover the range of angles of interest for the CORDIC formulation of the Hough transform, from 0 to π/2. The first option requires fewer additional microrotations to compensate the scale factor, but it cannot reach all the required angles.

In summary, the CORDIC processor for images of size NxN (N ≤ 1024) consists of thirteen microrotations: five radix-2 microrotations, three radix-4 microrotations and the five additional microrotations specified in row three of Table I (repetitions of microrotations 1 and 2, and negative scalings in 2, 8 and 9). The shift associated with the i-th microrotation is $2^{-i}$ (radix 2) or $4^{-j}$ (radix 4), with j = (n+2)/4 + (i mod (n+2)/2). Since each microrotation is associated with a different stage of the pipeline and has a fixed shift, the shifts are constant and can be hardwired, which eliminates one of the main difficulties of non-pipelined CORDIC implementations.
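The bookkeeping behind this configuration can be reproduced approximately. The Python sketch below computes the scale factor left after radix-2 microrotations 1 to 5, repetitions of 1 and 2 and negative scalings 2, 8 and 9, under assumptions we state explicitly: a rotation or repetition of stage i contributes sqrt(1 + 2^-2i) as in (7), a negative scaling labelled i multiplies by (1 - 2^-i), and the radix-4 stages are ignored. It also lists the shift exponent of each microrotation from the formula for j. Table I itself is not reproduced, and the function names are ours.

```python
import math


def compensated_scale_factor(rotations, repetitions=(), neg_scalings=(), pos_scalings=()):
    """Scale factor remaining after the additional microrotations.

    Assumptions of this sketch:
      * a radix-2 microrotation or a repetition of stage i contributes
        sqrt(1 + 2**(-2*i)), as in eq. (7);
      * a negative scaling labelled i multiplies by (1 - 2**-i), a positive
        one by (1 + 2**-i);
      * the radix-4 stages are ignored (second half of the iterations).
    """
    k = 1.0
    for i in list(rotations) + list(repetitions):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    for i in neg_scalings:
        k *= 1.0 - 2.0 ** -i
    for i in pos_scalings:
        k *= 1.0 + 2.0 ** -i
    return k


def shift_exponent(i, n):
    """e such that microrotation i shifts by 2**-e: 2**-i for the radix-2 part,
    4**-j with j = (n+2)/4 + (i mod (n+2)/2) for the radix-4 part."""
    half = (n + 2) // 2
    if i < half:
        return i
    return 2 * ((n + 2) // 4 + i % half)


# Configuration described in the text for N = 1024 (n = 10).
k_final = compensated_scale_factor(range(1, 6), repetitions=(1, 2), neg_scalings=(2, 8, 9))
shifts = [shift_exponent(i, 10) for i in range(1, 9)]   # radix 2: i=1..5, radix 4: i=6..8
```

Under these assumptions the residual factor comes out at about 1.0004, and the shifts run 2^-1 to 2^-5 for the radix-2 stages and 4^-3, 4^-4, 4^-5 for the radix-4 stages.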

The coefficients into which the angles of the Hough space are decomposed are stored in the ROM. Each memory word therefore contains the coefficients σi of all the microrotations (radix 2, radix 4 and additional) applied to input pixel (x,y). For each coefficient to act in the appropriate cycle, the memory word must propagate through the pipeline together with pixel (x,y), which requires dedicated propagation hardware: a set of latches whose number decreases along the pipeline, as illustrated in figure 3. As an example, Table II shows the decomposition of the first 16 rotation angles for a Hough space of 1024 angles and 12-bit precision.

Taking these considerations into account together with the rest of the design conditions (elimination of microrotation 0 and selection of the radix-4 coefficients), a 1024x1024 image requires a 512-word coefficient ROM with a word width of 14 bits: 7 bits code the radix-2 microrotations (including the two repetition stages) and 7 bits code the three radix-4 microrotations.

After the last microrotation, a stage that converts from redundant to conventional representation is needed to obtain the values ρa and ρb at the output of the CORDIC (see figure 3). This stage consists of two carry-propagate adders, which add the sum and carry words (x and cx, and y and cy) to obtain a non-redundant representation. These adders must be pipelined so as not to affect the global throughput of the processor.

Finally, the architecture developed for processing NxN images with $N = 2^n$ is applicable to smaller MxM images with $M = 2^{n-k}$, because the angles to be processed for an MxM image are a subset of those needed for an NxN image and the precision required for parameter ρ is lower. To this end, it is only necessary to include a stepwise addressing mechanism for the ROM [4].
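One simple reading of this stepwise addressing, sketched in Python under our own assumptions (the mechanism of [4] may differ in detail): angle j' of the MxM image, πj'/M, coincides with angle j = j'·2^k of the NxN grid, so the ROM address only has to advance in steps of 2^k. The function names are hypothetical.

```python
import math


def rom_addresses(n, k):
    """Addresses read from the N/2-word angle ROM (N = 2**n) when processing
    an M x M image, M = 2**(n-k): every 2**k-th entry."""
    N = 2 ** n
    return list(range(0, N // 2, 2 ** k))


def same_angle(n, k, j_small):
    """Check that pi*j'/M equals pi*(j' * 2**k)/N."""
    M, N = 2 ** (n - k), 2 ** n
    return math.isclose(math.pi * j_small / M, math.pi * (j_small * 2 ** k) / N)
```

For N = 1024 and a 256x256 image (k = 2), the processor reads 128 of the 512 ROM words.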

4.- Parallel Hough Transform

The computation of the Hough transform of an NxN image with a single CORDIC processor requires $N^3/2$ cycles, since for each of the $N^2$ pixels of the image equations (4) must be evaluated sequentially for N/2 angles. Each evaluation of equations (4) yields two values of parameter ρ, which fall into the two subspaces of the Hough space, so the votes are cast in independent spaces: $\rho_j^a$ votes over the Hough subspace H(m,j) with j=0,1,...,N/2-1 and $\rho_j^b$ over the subspace with j=N/2,...,N-1, so no voting conflict exists.

The computation time of the transform can be reduced by means of parallelism. There are three possible approaches: parallelizing over the pixels of the image, over the angle θ, or over pixels and angles simultaneously. The latter requires $N^3/2$ CORDIC processors, one per pixel and angle. The evaluation of the transform then takes only the time of one CORDIC operation (n cycles for radix 2, n/2+n/4 cycles for mixed radix 2-4 and n/2 for radix 4), but the necessary hardware increases considerably. Also, conflicts occur in the voting process, because processors with the same angle θi can vote for the same element of the Hough space (common Hough space).

Parallelizing only over the pixels requires $N^2$ processors, one per pixel. The number of cycles is N/2 + latency (the latency depends on the radix, see the previous paragraph). In this case there are also voting conflicts, because the processors share a common Hough space.

Moreover, in both cases, since the pixels of the image are parallelized and only pixels with gray level equal to one are processed, the workload among processors may be unbalanced.

Finally, a solution that does not produce voting conflicts is to parallelize over the angles. In this case we need one CORDIC processor for each angle, and all the pixels of the image are processed sequentially. The total number of processors is N/2 and the number of cycles for the evaluation of the transform is $N^2$ + latency, since one pixel is processed in each cycle. Moreover, the implementation of the processor is simplified because of the following factors:

i) As each CORDIC processor has to compensate only one scale factor, it can be implemented as a full radix-4 CORDIC, which reduces the required number of microrotations.

ii) Since only one angle is processed per processor, its decomposition can be hardwired, eliminating the need for a decomposition table and for the multiplexers.
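The trade-offs of the three parallelization options can be tabulated directly from the counts given above. The Python sketch restates those formulas; latency is a free parameter (13 below, the number of microrotations of the section 3 processor, is only an illustrative value that ignores the conversion stage), and the dictionary keys are our labels.

```python
def hough_costs(N, latency):
    """(processors, cycles) for an N x N image under each option of section 4.

    Restates the counts given in the text; 'latency' is the pipeline depth of
    one CORDIC processor and is left as a parameter.
    """
    return {
        "single processor": (1, N ** 3 // 2),
        "pixels and angles": (N ** 3 // 2, latency),
        "pixels only": (N ** 2, N // 2 + latency),
        "angles only": (N // 2, N ** 2 + latency),
    }


for option, (procs, cycles) in hough_costs(1024, latency=13).items():
    print(f"{option:18s} processors={procs:>14,d} cycles={cycles:>14,d}")
```

The product of processors and cycles stays close to N^3/2 in every case; what changes is whether the processors share a Hough space and how evenly the work is spread.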

In order to reduce the number of processors required in this solution, the angles can be grouped so that angles producing the same scale factor are assigned to the same processor. In this way the processors can be implemented fully in radix 4, and a small ROM is needed to store the decomposition of the assigned angles. The number of processors is then the number of different scale factors. We develop this scheme below.

We complement this with the design of a double CORDIC processor, in which the values of ρ corresponding to four angles are evaluated simultaneously. If we divide the set of angles of the parameter space into four subsets, {θ, 0 ≤ θ < π/4}, {π/2-θ, 0 ≤ θ < π/4}, {π/2+θ, 0 ≤ θ < π/4} and {π-θ, 0 ≤ θ < π/4}, and apply equations (4), we obtain the following relations:

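$$ \begin{aligned} \rho_j^a &= x\cos\theta_j + y\sin\theta_j \\ \rho_j^b &= -x\sin\theta_j + y\cos\theta_j \\ \rho_j^c &= x\sin\theta_j + y\cos\theta_j \\ \rho_j^d &= -x\cos\theta_j + y\sin\theta_j \end{aligned} \tag{10} $$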

with 0 ≤ θj < π/4, j=0,1,...,N/4-1. In this case ρa represents the set of values taken by ρ for the angles between 0 and π/4, ρc the values of ρ between π/4 and π/2, ρb between π/2 and 3π/4, and ρd between 3π/4 and π. Equations (10) can be implemented by means of two circular CORDICs in rotation mode. Furthermore, this halves the size of the ROM, since the scanned angles go from 0 to π/4 and both pairs of equations in (10) share the same ROM.
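The symmetry behind (10) can be verified numerically: one (cos θ, sin θ) pair yields ρ for the four angles θ, π/2+θ, π/2-θ and π-θ. The Python sketch below (names ours) computes the four values as in (10) and compares them with a direct evaluation of (1) at the corresponding angles.

```python
import math


def four_rhos(x, y, theta):
    """The four values of eq. (10) from one (cos, sin) pair, 0 <= theta < pi/4."""
    c, s = math.cos(theta), math.sin(theta)
    return (
        x * c + y * s,    # rho_a: angle theta
        -x * s + y * c,   # rho_b: angle pi/2 + theta
        x * s + y * c,    # rho_c: angle pi/2 - theta
        -x * c + y * s,   # rho_d: angle pi - theta
    )


def rho(x, y, angle):
    """Direct evaluation of eq. (1)."""
    return x * math.cos(angle) + y * math.sin(angle)
```

Since only θ in [0, π/4) is ever decomposed, a single ROM of N/4 words serves both rotators.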

4.1.- Angle Parallelization with radix-4 CORDIC

A possible way to design the angle parallelization scheme with radix-4 processors is to have as many processors as scale factors and to assign to each processor the angles that produce that particular scale factor. However, if σi ∈ {0,±1,±2}, for a precision of n bits the number of scale factors is $2\cdot 3^{n/4}$, which is large and therefore requires many processors. The number of scale factors and the number of scaling stages needed for their compensation can be reduced by combining the following approaches:

1) Compensate individual stages with one scaling stage each. This consists of approximating the contribution of the microrotation to the scale factor by a linear function and compensating it with a single scaling [50]. The contribution of microrotation i to the scale factor is

$$ K_i^{-1} = \frac{1}{(1+X)^{1/2}} = 1 - \frac{1}{2}X + \frac{3}{8}X^2 - \cdots \tag{11} $$

with $X = (\sigma_i 4^{-i})^2$. This can be approximated by the linear function $K_i^{-1} = 1 - \frac{1}{2}X$ when

$$ \frac{3}{8}\left[(\sigma_i 4^{-i})^2\right]^2 < 2^{-n+1} \tag{12} $$

In the worst case (σi = ±2) this gives

$$ i > \frac{1}{8}\left(n + \log_2 3\right) \tag{13} $$

As an example, for a precision of n=12 bits, the compensation of individual stages by scaling may be introduced for i ≥ 2, that is, in microrotations 2 and 3. The remaining compensation is determined by σ0 and σ1, and the number of scale factors is reduced to 9.
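Approach 1 can be checked numerically for n = 12. The Python sketch below (names ours) evaluates the threshold of (13), compares the linear approximation 1 - X/2 of (11) with the exact factor in the worst case σi = ±2, and enumerates the scale factors left once stages i ≥ 2 are compensated, which depend only on σ0² and σ1².

```python
import math


def min_compensable_stage(n):
    """Smallest integer i with i > (n + log2 3) / 8, eq. (13)."""
    return math.floor((n + math.log2(3)) / 8) + 1


def linearization_error(i, sigma=2):
    """|K_i**-1 - (1 - X/2)| with K_i**-1 = (1 + X)**-0.5, X = (sigma * 4**-i)**2."""
    X = (sigma * 4.0 ** -i) ** 2
    return abs((1.0 + X) ** -0.5 - (1.0 - X / 2.0))


SIGMAS = (-2, -1, 0, 1, 2)
# After compensating stages i >= 2, K depends only on (sigma_0**2, sigma_1**2).
remaining_factors = {(s0 ** 2, s1 ** 2) for s0 in SIGMAS for s1 in SIGMAS}
```

For i = 2 the approximation error is about 9·10^-5, below 2^-11, while for i = 1 it is about 0.02, which is why individual compensation starts at i = 2.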

2) Use the scheme that divides the space into four subspaces, as described by expressions (10). The corresponding iterations are

$$ \begin{aligned} x_{i+1} &= x_i + \sigma_i 4^{-i} y_i \\ y_{i+1} &= y_i - \sigma_i 4^{-i} x_i \\ x'_{i+1} &= y'_i + \sigma_i 4^{-i} x'_i \\ y'_{i+1} &= -x'_i + \sigma_i 4^{-i} y'_i \end{aligned} \tag{14} $$

The division of the space of angles into four subspaces (eqs. (10) and (14)) limits the range of angles to the subset [0, π/4). As seen below, this facilitates the selection of a reduced number of scale factors.

Moreover, the division of the space into four subspaces facilitates the use of the elementary angles $\tan^{-1}(\frac{1}{2}\sigma_i 4^{-i})$ instead of $\tan^{-1}(\sigma_i 4^{-i})$. This is advantageous because it changes the angles of the first iteration from 45° (σ0=1) and 63.4° (σ0=2) to 26.5° and 45°, which gives a better utilization when the range of angles is π/4. Now stage i=n/4 does not influence the scale factor, so it does not need compensation. This approach is only possible if the angles of the transform range from 0 to π/4, since when the angles range from 0 to π/2 the whole range is not covered by the new elementary angles. The iterations now are:

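$$ \begin{aligned} x_{i+1} &= x_i + \tfrac{1}{2}\sigma_i 4^{-i} y_i \\ y_{i+1} &= y_i - \tfrac{1}{2}\sigma_i 4^{-i} x_i \\ x'_{i+1} &= y'_i + \tfrac{1}{2}\sigma_i 4^{-i} x'_i \\ y'_{i+1} &= -x'_i + \tfrac{1}{2}\sigma_i 4^{-i} y'_i \end{aligned} \tag{15} $$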

Using these elementary angles, equation (13) is modified: compensation of individual stages may now be introduced for

$$ i > \frac{1}{8}\left(n + \log_2 \frac{3}{16}\right) \tag{16} $$

3) Since the angles have several possible decompositions, we can choose the decompositions that minimize the number of scale factors.

We now describe an example of how these approaches reduce the number of scale factors and the number of scaling stages needed for their compensation. For a precision of n=12 bits, the initial number of scale factors, $2\cdot 3^{n/4}$, is 54. Using the elementary angles $\tan^{-1}(\frac{1}{2}\sigma_i 4^{-i})$, the compensation of individual stages by scaling may be introduced for i ≥ 2 (see eq. (16)). With this precision and these elementary angles, only stages 0, 1 and 2 influence the scale factor, so only stage 2 needs to be compensated (stage 3 does not influence the scale factor because of the new elementary angles). The compensation for stage 2 can be performed by

$$ \left(\left(\tfrac{1}{2}\sigma_2 4^{-2}\right)^2 + 1\right)^{-1/2} \approx 1 - \frac{1}{2}\left(\tfrac{1}{2}\sigma_2 4^{-2}\right)^2 = 1 - \sigma_2^2\, 2^{-11} \tag{17} $$

and the remaining compensation corresponds to 9 scale factors.
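The claims about the modified elementary angles can be checked numerically: the first-iteration angles, the threshold of (16), how closely the linear compensation of (17) matches the exact stage-2 factor, and why stage 3 no longer matters at 12 bits. The Python sketch below is ours; it uses 2^-12 as the 12-bit resolution, which is an assumption about how "does not influence" is measured.

```python
import math


def first_iteration_angles_deg(half):
    """Iteration-0 elementary angles for sigma_0 = 1 and 2, in degrees:
    tan^-1(sigma * 4**-0), or tan^-1(sigma/2 * 4**-0) if half is True."""
    f = 0.5 if half else 1.0
    return [math.degrees(math.atan(f * s)) for s in (1, 2)]


def min_stage_half(n):
    """Smallest integer i with i > (n + log2(3/16)) / 8, eq. (16)."""
    return math.floor((n + math.log2(3 / 16)) / 8) + 1


def stage_scale(i, sigma):
    """Exact factor ((sigma/2 * 4**-i)**2 + 1)**-1/2 of stage i with the new angles."""
    t = 0.5 * sigma * 4.0 ** -i
    return (1.0 + t * t) ** -0.5


def stage2_compensation(sigma2):
    """Linear compensation of eq. (17): 1 - sigma_2**2 * 2**-11."""
    return 1.0 - sigma2 ** 2 * 2.0 ** -11
```

The first-iteration angles move from 45° and about 63.43° to about 26.57° and 45°. For σ2 = ±2 the linear compensation differs from the exact factor by roughly 6·10^-6, and stage 3 with σ3 = ±2 changes the scale by about 1.2·10^-4, both below 2^-12.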

Table III displays the scaling stages necessary for each of the nine scale factors. We specify the values of σ0 and σ1, which determine the scale factor, together with the type of scaling applied and the resulting scale factor. We have not included repetitions because they do not reduce the resulting number of stages. On the other hand, in order to maintain a full radix-4 CORDIC design and to reduce the number of processors in a parallel implementation, we have to group the angles so that all the angles in the same subset have decompositions producing the same scale factor. In this way, we can choose the angle decompositions that generate the smallest number of scale factors.

Table IV displays the angular coverings obtained with each scale factor. Analyzing Table IV and the scale factors of Table III, we can choose the factors that lead to the minimum number of scaling stages (three): those characterized by σ0σ1 = (00, 01, 10, 11, 12). As a consequence, it is possible to select just five scale factors that together cover the entire range of angles. Note that in fact only four of these factors differ from unity, because the first one (σ0σ1 = 00) is always unity.

Therefore, the implementation of the Hough transform with the angle parallelization scheme needs only five CORDIC processors, each assigned the angles producing the same scale factor. With a full radix-4 design, the number of microrotations is n/2 plus the microrotations and scaling stages needed to compensate the assigned scale factor.

However, this kind of solution presents several problems. First, a parallel system with as many processors as different scale factors suffers from a significantly unbalanced workload, because each processor is assigned a different number of angles. Examining the angular coverings of the scale factors in Table IV, it is easy to verify that the angle subsets associated with the scale factors differ in size.

Second, the total number of stages (standard radix-4 microrotations and scaling stages) differs from processor to processor. All processors have the same number of radix-4 microrotations, but the number of scaling stages for scale factor compensation is different (see Table III). With n=12, the number of scaling stages ranges from 0, for σ0σ1 = (00), to 3, for σ0σ1 = (10, 11, 12). Each processor therefore has a different operation time, which makes synchronization among processors difficult.

Finally, this solution is not flexible: changing the number of processors is not straightforward. To increase the parallelism by adding processors, the angles generating a given scale factor must be distributed over several processors, and each new processor must include the scaling stages that compensate that scale factor. To implement the Hough transform with fewer than five processors, some processors must compensate several scale factors.

With this solution, the resulting parallel system is non-homogeneous, with processors having different latencies and hardware structures, and it is not flexible.


Splitting of the angle space

The solution we consider most adequate consists of using identical radix-4 CORDIC processors, each incorporating the scaling stages necessary to compensate the five different scale factors. In a parallel system with m processors, the set of angles of the Hough transform is split into m disjoint subsets of N/(4m) angles each, and each processor is associated with a different subset. In this case, the computation of the Hough transform of an NxN image requires $N^3/(4m)$ + latency cycles, since for each pixel of the image equations (15) are evaluated sequentially for N/(4m) angles. No voting conflict exists because, as the processors are assigned different angles, voting is performed over different elements of the Hough space.
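The splitting just described can be sketched in a few lines of Python. We assume contiguous subsets of angle indices and that m divides N/4; any disjoint partition would do, and the function names and the latency value in the example are ours.

```python
def split_angles(N, m):
    """Split the N/4 angle indices (0 <= theta < pi/4) into m disjoint,
    contiguous subsets of N/(4m) angles each (assumes m divides N/4)."""
    per = N // (4 * m)
    if per * 4 * m != N:
        raise ValueError("this sketch assumes m divides N/4")
    return [list(range(p * per, (p + 1) * per)) for p in range(m)]


def total_cycles(N, m, latency):
    """Cycles for an N x N image: each pixel visits the N/(4m) angles of a processor."""
    return N ** 3 // (4 * m) + latency
```

For N = 1024 and m = 4, each processor holds 64 angles, and adding processors only shortens the per-processor subset.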

Increasing the parallelism of the system only requires adding more processors; the processors differ only in the ROM where the coefficients σi are stored. With five processors, the difference with respect to the previous solution (assigning all the angles that generate the same scale factor to one processor) is that all processors now have the same number of stages, which is larger than the number of stages of the processors in the previous solution. In exchange, the problems of unbalanced workload and different operation times are avoided.

The flexibility and high regularity of this solution make its VLSI implementation easy.

Considering 12-bit precision, the resulting radix-4 CORDIC processor has ten stages: six standard radix-4 iteration stages, one stage that compensates the scale factor of stage i=2, and three scaling stages. The standard radix-4 iteration stages follow equations (15) and need 3 bits to code σi = (σi1, σi2, σi3), which implies 18 ROM bits per angle. The compensation of the scale factor of stage i=2 looks like a scaling stage: it multiplies each coordinate by $(1-|\sigma_2|^2\,2^{-11})$ (see equation (17)). This compensation adds no bits to the ROM, because it uses the same coefficient σ2 as the stage. The scaling needed to compensate the four selected scale factors appears in Table III. Each of the three scaling stages can be characterized by just 2 bits, σi = (σi1, σi2) (see the right side of Table III). This adds 6 bits, for a total of 24 bits per angle.
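The ROM budget and the stage-2 compensation can be made concrete with integer arithmetic. The sketch below assumes a fixed-point datapath held in Python integers; the field order inside the 24-bit word and the helper names are hypothetical, since the text fixes only the field widths (six 3-bit codes and three 2-bit codes). Because |σ2|² is either 1 or 4, multiplying by $(1-|\sigma_2|^2\,2^{-11})$ reduces to one shift by 11 or 9 bits and one subtraction, which matches the shifts listed for the compensation stage in Table V.

```python
STD_STAGES, STD_BITS = 6, 3      # standard radix-4 stages, bits per coefficient
SCALE_STAGES, SCALE_BITS = 3, 2  # scaling stages, bits per coefficient
ROM_BITS = STD_STAGES * STD_BITS + SCALE_STAGES * SCALE_BITS  # 18 + 6 = 24


def pack_rom_word(std_codes: list[int], scale_codes: list[int]) -> int:
    """Concatenate the coefficient codes of one angle into a ROM word.

    The field order (standard stages in the high bits) is a hypothetical choice."""
    if len(std_codes) != STD_STAGES or len(scale_codes) != SCALE_STAGES:
        raise ValueError("wrong number of coefficient codes")
    fields = [(c, STD_BITS) for c in std_codes] + [(c, SCALE_BITS) for c in scale_codes]
    word = 0
    for code, width in fields:
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def compensate_stage2(x: int, sigma2: int) -> int:
    """Multiply fixed-point x by (1 - |sigma2|^2 * 2^-11) using shift-and-subtract."""
    mag = abs(sigma2)
    if mag == 0:
        return x  # no microrotation in stage 2, nothing to compensate
    if mag not in (1, 2):
        raise ValueError("radix-4 digits satisfy |sigma| <= 2")
    shift = 11 if mag == 1 else 9  # 1 * 2^-11, or 4 * 2^-11 = 2^-9
    return x - (x >> shift)
```

Since the compensation reuses σ2, the word built by `pack_rom_word` needs no extra field for it, which is why the total stays at 24 bits.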

Figures 4.a and 4.b show the general design of a radix-4 stage and of a scaling stage, respectively. We have specified three control signals (c1, c2, c3). Table V gives the equivalence between the control signals and the coding of the coefficients in each stage, together with the shifts (A and B in Figure 4) of each stage. c3 determines the type of operation (addition or subtraction), while c1 and c2 select the shift to be applied. The scaling is always negative for the stages identified by i=6,7 in the table; we code this by setting c3=1 (subtraction). In the last scaling stage (i=8), the control signals c2 and c3 take the same value, so that this stage can also perform a positive scaling. The right side of Table III shows one possible coding (not the only one) of the coefficients σij, with i=6,7,8 and j=1,2.
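The control-signal coding of the scaling stages can be read back from Tables III and V. The decoding rule below is our interpretation, inferred from those two tables rather than stated in the text: c1 = σi1 selects shift A, otherwise c2 = σi2 selects shift B, and the operation is a subtraction except in stage i=8 when c3 = σ82 = 0. The names `decode_scaling` and `apply_scaling` are hypothetical, and coordinates are modeled as fixed-point Python integers.

```python
# Shift exponents (A, B) of the scaling stages, taken from Table V.
SCALING_SHIFTS = {6: (3, 4), 7: (5, 10), 8: (6, 7)}


def decode_scaling(stage: int, s1: int, s2: int):
    """Return (shift, sign) performed by a scaling stage, or None if it is idle.

    Interpretation inferred from Tables III and V (an assumption):
    c1 = s1 selects shift A, otherwise c2 = s2 selects shift B;
    c3 = 1 (subtract) in stages 6 and 7, and c3 = s2 in stage 8."""
    a, b = SCALING_SHIFTS[stage]
    c3 = 1 if stage in (6, 7) else s2
    sign = -1 if c3 else 1
    if s1:
        return a, sign
    if s2:
        return b, sign
    return None


def apply_scaling(x: int, stage: int, s1: int, s2: int) -> int:
    """x <- x + sign * (x >> shift) on a fixed-point integer coordinate."""
    op = decode_scaling(stage, s1, s2)
    if op is None:
        return x
    shift, sign = op
    return x + sign * (x >> shift)


# Right side of Table III: codes (sigma_i1, sigma_i2) for stages 6, 7, 8.
TABLE_III_CODES = {
    "00": [(0, 0), (0, 0), (0, 0)],
    "01": [(0, 0), (0, 0), (0, 1)],
    "10": [(0, 1), (1, 0), (1, 1)],
    "11": [(1, 0), (0, 1), (1, 0)],
    "12": [(1, 0), (0, 1), (0, 1)],
}

for sf, codes in TABLE_III_CODES.items():
    ops = [decode_scaling(st, *c) for st, c in zip((6, 7, 8), codes)]
    print(sf, [f"{shift}{'+' if sign > 0 else '-'}" for shift, sign in filter(None, ops)])
```

Under this interpretation the loop prints, for σ0σ1 = 01, 10, 11 and 12, the shift sequences 7-; 4- 5- 6-; 3- 10- 6+; and 3- 10- 7-, which are the entries of the Scaling column of Table III.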

The latency can be reduced by compensating the scale factor in parallel with the double rotation method proposed in [52]. This method works in radix 2 and mixed radix and does not require additional scaling iterations. Basically, to rotate by an angle θ, two rotations are carried out in parallel, one by θ+β and one by θ−β, where $\beta=\cos^{-1}(K^{-1})$; the x and y results of the two rotators are then added in parallel. To apply this method to full radix-4 CORDIC, a code identifying the scale factor involved can be added to each angle stored in the ROM. In this way, the total number of iterations can be reduced to n/4+2, at the cost of doubling the hardware.
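The identity behind the double rotation method can be checked numerically. In this sketch each CORDIC rotator is replaced by an ideal model, `rotate`, that rotates by the requested angle and multiplies by the scale factor K; this is a simplification for illustration, not the hardware of [52]. Since cos(θ+β) + cos(θ−β) = 2 cos θ cos β, and likewise for the sines, the sum of the two K-scaled rotations equals 2K cos β R(θ), which is 2R(θ) when $\beta=\cos^{-1}(K^{-1})$; the remaining factor 1/2 is a one-bit shift. This choice of β requires K ≥ 1.

```python
import math


def rotate(x: float, y: float, angle: float, gain: float = 1.0):
    """Ideal stand-in for a CORDIC rotator: rotation by `angle`, scaled by `gain`."""
    c, s = math.cos(angle), math.sin(angle)
    return gain * (x * c - y * s), gain * (x * s + y * c)


def double_rotation(x: float, y: float, theta: float, K: float):
    """Unscaled rotation by theta from two parallel K-scaled rotations by theta +/- beta."""
    if K < 1.0:
        raise ValueError("beta = acos(1/K) needs K >= 1")
    beta = math.acos(1.0 / K)
    x1, y1 = rotate(x, y, theta + beta, K)
    x2, y2 = rotate(x, y, theta - beta, K)
    # K * (R(theta+beta) + R(theta-beta)) = 2 * K * cos(beta) * R(theta) = 2 * R(theta)
    return (x1 + x2) / 2, (y1 + y2) / 2  # the halving is a 1-bit shift in hardware


# Scale factor of a 16-microrotation radix-2 CORDIC, used only as an example value.
K_RADIX2 = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(16))
print(double_rotation(1.0, 0.0, 0.5, K_RADIX2))  # approximately (cos 0.5, sin 0.5)
```

The price of removing the scaling iterations is visible in the sketch: two rotators run on the same input, which is the doubling of hardware mentioned above.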

Finally, given the high degree of symmetry and independence in Figure 4, it is possible to save 50% of the hardware by multiplexing the inputs to the processor. Obviously, the number of angles processed per unit time is then halved.

5. Comparison and conclusions

In this paper we have presented the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. Although this transform has significant natural parallelism, only parallel architectures that maintain independent Hough subspaces are of interest, because in them the voting process is free of conflicts.

The parallel architecture we propose is based on dividing the Hough space into independent angle subspaces, each assigned to a different processor. In this way we obtain a parallel system that is free of conflicts in the voting process and highly regular, because it is based only on CORDIC rotators, while maintaining the Duda and Hart parametrization (equations (4) and (10)). This regularity simplifies the design and facilitates VLSI integration.

The application-specific CORDIC processors have the following characteristics: pipelining, redundant arithmetic, compensation of the scale factor, decomposition of the angles, and use of full radix 4 or mixed radix.

We now concentrate on evaluating the proposed parallel system, comparing it with other solutions that have recently appeared in the literature [39], [48], [49].

Li et al. [39] present a systolic architecture that processes one row (or column) of pixels concurrently. By reformulating equation (1), they make parameter ρ depend only on cos θ (not on sin θ). They use a table to store the cosines of the angles, so that, for a given value of θ, ρ can be computed by simple addition. The basic systolic architecture consists of a linear array of N compute cells, where ρ is obtained, a routing network, and a linear array of 2N accumulator cells. The routing network, composed of a 2NxN array of routing cells, sends each pixel with a given ρ to the corresponding accumulator. The systolic array processor computes the Hough transform for one particular value of θ.

The hardware complexity of the systolic array processor grows linearly with the image size N, so for large N it becomes very high. In that case the image must be partitioned into blocks, each processed in a different array. Moreover, computing the Hough transform with this systolic array requires processing all the pixels of the image, whereas the solution proposed in this paper processes only the pixels with gray level equal to one.

Timmermann et al. [48] propose another CORDIC-based processor design. In this solution, equation (1) is reformulated as

$$\rho = \sqrt{x^2+y^2}\,\sin\left(\theta \pm \tan^{-1}(x/y)\right) \qquad (18)$$

with $\sqrt{x^2+y^2}$ being the amplitude of sinusoidal curves with phase shifts $\tan^{-1}(x/y)$ and $0 \le \theta < 2\pi$. They also place the origin of the coordinate system in the middle of the image space (x,y) and subdivide it into eight octants, which allows eight points of the Hough space to be evaluated simultaneously for each pixel of the image space.
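For y > 0, equation (18) with the + sign follows directly from the normal form of the line. The short derivation below assumes that equation (1) is the Duda and Hart normal form ρ = x cos θ + y sin θ and writes φ = tan⁻¹(x/y), so that sin φ = x/r and cos φ = y/r with r = √(x²+y²). It does not cover the ± sign or the octant mapping used in [48].

```latex
\begin{aligned}
\rho &= x\cos\theta + y\sin\theta
      = r\left(\tfrac{x}{r}\cos\theta + \tfrac{y}{r}\sin\theta\right) \\
     &= r\,(\sin\varphi\cos\theta + \cos\varphi\sin\theta)
      = r\sin(\theta + \varphi),
\qquad r = \sqrt{x^2+y^2},\quad \varphi = \tan^{-1}(x/y).
\end{aligned}
```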

With this reformulation of equation (1), they propose an architecture based on two general radix-2 CORDIC processors, a rotator and a vectorizer. The vectorizer computes $\tan^{-1}(x/y)$ and the rotator computes equation (18). Its most significant features are: a) with some extra adders, this architecture simultaneously evaluates up to eight different angles for each pixel; b) it is suitable for on-line CORDIC implementation; c) a parallel system based on equation (18) does not maintain independent Hough subspaces.
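A floating-point model of a vectorizer followed by a rotator shows how equation (18) maps onto two radix-2 CORDIC operations. This is a generic textbook CORDIC written for illustration, not the design of [48]: it uses 32 iterations, removes the gain by a division, pre-rotates by exact multiples of π/2 to stay inside the convergence range, assumes y > 0 so that the vectorizer returns tan⁻¹(x/y) directly, and ignores the octant mapping and the on-line arithmetic of the original. The function names are hypothetical.

```python
import math

ITER = 32
ANGLES = [math.atan(2.0 ** -i) for i in range(ITER)]
GAIN = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(ITER))


def cordic_vectoring(a: float, b: float):
    """Vectoring mode: drive b to zero; returns (sqrt(a^2+b^2), atan(b/a)) for a > 0."""
    z = 0.0
    for i, ang in enumerate(ANGLES):
        d = -1.0 if b > 0 else 1.0  # rotate towards the positive a axis
        a, b = a - d * b * 2.0 ** -i, b + d * a * 2.0 ** -i
        z -= d * ang                # accumulate the angle rotated away
    return a / GAIN, z


def cordic_rotation(x: float, y: float, z: float):
    """Rotation mode: rotate (x, y) by z, gain-compensated."""
    # Exact pre-rotations by +/- pi/2 keep z inside the CORDIC convergence range.
    while z > math.pi / 2:
        x, y, z = -y, x, z - math.pi / 2
    while z < -math.pi / 2:
        x, y, z = y, -x, z + math.pi / 2
    for i, ang in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ang
    return x / GAIN, y / GAIN


def rho(x: float, y: float, theta: float) -> float:
    """Equation (18) via a vectorizer followed by a rotator (sketch, y > 0)."""
    r, phi = cordic_vectoring(y, x)           # r = sqrt(x^2+y^2), phi = atan(x/y)
    _, s = cordic_rotation(r, 0.0, theta + phi)
    return s                                  # r * sin(theta + phi)


print(rho(3.0, 5.0, 0.4), 3.0 * math.cos(0.4) + 5.0 * math.sin(0.4))
```

Note that the vectorizer depends only on the pixel, so its result can be shared by every θ evaluated for that pixel; only the rotator is exercised once per angle.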

In a full radix-2 implementation, the design proposed in [48] (eight angles in the interval 0≤θ<2π) and our design based on equations (10) (four angles in the interval 0≤θ<π) require the same number of iterations. The use of mixed radix or full radix 4 reduces the number of iterations. The architecture proposed in [48] does not exploit the advantages of application-specific pipelined CORDIC processors, in which the z-equation (see equation (5)) can be eliminated if the angle decomposition is known a priori. Eliminating the z-equation, combined with a radix-4 or mixed-radix design, reduces the area significantly.

The use of a full radix-4 or a mixed radix-2/radix-4 CORDIC rotator reduces the total hardware cost of the pipelined CORDIC processor by reducing the total number of microrotations. The reduction in the number of stages obtained with mixed radix depends directly on the number of precision bits required for the values of ρ. Table VI shows the reduction obtained with the mixed-radix CORDIC processor with respect to radix-2 implementations for several image sizes (NxN, $n=\log_2 N$). To obtain a constant scale factor, the number of radix-2 stages must be greater than or equal to half the number of precision bits; this ensures that the radix-4 coefficients do not contribute to the final value of the scale factor. Also, whenever a radix-4 stage provides the same precision in the final result as a radix-2 stage, the latter is preferable because its hardware cost is lower. As can be observed in the table, the largest percentage reductions in the number of stages occur for the image sizes whose number of precision bits for ρ is a multiple of 4, since this permits the maximum use of radix-4 stages.
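The two design rules above are enough to regenerate the stage counts of Table VI. The sketch assumes that ρ needs p = n + 2 precision bits, which is what the radix-2 column of Table VI implies, and that a radix-2 stage contributes one bit and a radix-4 stage two bits. Among the splits with at least ⌈p/2⌉ radix-2 stages, it keeps the one with the fewest stages and, on a tie, the one with more radix-2 stages. `stage_counts` is a hypothetical name.

```python
import math


def stage_counts(n: int):
    """Return (radix-2 stages, mixed radix-2, mixed radix-4, mixed total, % reduction)."""
    p = n + 2  # precision bits for rho; assumption matching Table VI's radix-2 column
    best = None
    for r2 in range(math.ceil(p / 2), p + 1):  # constant scale factor: r2 >= p/2
        r4 = math.ceil((p - r2) / 2)           # each radix-4 stage resolves 2 bits
        key = (r2 + r4, -r2)                   # fewest stages, then prefer radix 2
        if best is None or key < best[0]:
            best = (key, r2, r4)
    _, r2, r4 = best
    total = r2 + r4
    return p, r2, r4, total, round(100 * (p - total) / p, 1)


for n in range(14, 6, -1):
    print(n, *stage_counts(n))
```

Running it gives the same stage counts as Table VI for n = 7, ..., 14; the last value is the exact ratio (p − total)/p rounded to one decimal.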

Because the angles to be processed in the CORDIC (mixed radix and radix 4) are known a priori, their decomposition in terms of the elementary angles is also known. This allows us to eliminate the recurrence associated with the z coordinate, which simplifies the design of the control section of the processor. This section consists of a ROM storing the decomposition coefficients of the angles and the hardware needed to address this memory. It also facilitates the use of redundant arithmetic (CSA or signed-digit) in the microrotations, since the sign of the operands does not have to be checked to determine the coefficients. Replacing the ripple-carry or carry-lookahead adders with redundant (CSA) adders speeds up the pipeline stages.
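How a priori knowledge of the angles removes the z recurrence can be illustrated with a plain radix-2 CORDIC, used here only because it is the simplest case; the processors in this paper use radix 4 or mixed radix and redundant adders. In the sketch, the sign selection that would normally be driven by the z datapath runs once, offline, while the ROM is filled; the rotation itself only reads the stored coefficients. The iteration count, N=256, the angle range [0, π/4) and all names are illustrative assumptions.

```python
import math

N_ITER = 14
ELEMENTARY = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_ITER))


def decompose(theta: float) -> list[int]:
    """Offline: choose sigma_i in {-1, +1} so that sum(sigma_i * atan(2^-i)) ~ theta.

    This is the z recurrence; it runs once per angle when the ROM is filled."""
    z, sigmas = theta, []
    for ang in ELEMENTARY:
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * ang
    return sigmas


def rotate_from_rom(x: float, y: float, sigmas: list[int]):
    """Datapath: only the x and y recurrences, driven by stored coefficients."""
    for i, s in enumerate(sigmas):
        x, y = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i
    return x / GAIN, y / GAIN  # gain removed here for readability only


N = 256
ROM = [decompose(k * math.pi / N) for k in range(N // 4)]  # angles in [0, pi/4)
```

With the coefficients in ROM, no stage has to inspect a sign before it can proceed, which is what allows carry-save adders to be used throughout the datapath.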

Finally, we have presented a parallel architecture based on full radix-4 CORDIC processors. The radix-4 CORDIC processor halves the number of microrotations of the radix-2 CORDIC design. To simplify the compensation of the scale factor, the number of radix-4 scale factors has been reduced by reducing the range of the transform angles, modifying the elementary microrotation angles, compensating individual microrotations and selecting the decompositions of the transform angles. In this way, each processor can compensate all the radix-4 scale factors with some additional hardware.

An additional reduction in area, together with more parallelism, is obtained by combining two circular radix-4 CORDIC rotators in the same processor (see equation (15)), because both rotators share the same ROM. The size of this shared ROM is half the total ROM size of a design based on two separate rotators.


References

[1] H.M. Ahmed. "Efficient Elementary Function Generation with Multipliers". 8th IEEE Int’l Symp. on Computer Arithmetic, pp. 52-59, 1989.

[2] H.M. Ahmed. "Signal Processing Algorithms and Architectures". Ph.D. Dissertation, Dept. of Electrical Engineering, Stanford University, Stanford, CA, 1982.

[3] M.G. Albanesi. "Time complexity evaluation of algorithms for the Hough transform on mesh connected computers". IEEE Int’l Conf. CompEuro’91, pp. 253-257, 1991.

[4] F. Argüello, J.D. Bruguera, R. Doallo, T. Lang and E.L. Zapata. "CORDIC Based Application Specific Processor for Orthogonal Transforms". (submitted)

[5] D. Ben-Tzvi, A. Naqvi and M. Sandler. "Synchronous Multiprocessor Implementation of the Hough Transform". J. Computer Vision, Graphics and Image Processing, Vol. 52, pp. 437-446, 1990.

[6] D. Ben-Tzvi and M. Sandler. "A combinatorial Hough transform". J. Pattern Recognition Letters, Vol. 11, pp. 167-174, 1990.

[7] K.W. Bowyer. "Computing the Hough transform on an MIMD hypercube". 6th Scandinavian Conf. on Image Analysis, Vol. 2, pp. 1172-1181, 1989.

[8] J. Bruguera, E. Antelo and E.L. Zapata. "Design of a pipelined radix 4 CORDIC processor". J. Parallel Computing, Vol. 19, pp. 729-744, 1993.

[9] X. Cao, F. Deravi and M.G. Rodd. "Parallel implementation of the tuned generalized Hough transform on transputer networks". In Application of Transputers I (IOS Press, London), pp. 113-121, 1990.

[10] J.R. Cavallaro and F.T. Luk. "CORDIC Arithmetic for an SVD Processor". J. Parallel and Distributed Computing, Vol. 5, pp. 271-290, 1988.

[11] A.N. Choudhary and R. Ponnusamy. "Implementation and Evaluation of Hough Transform Algorithms on a Shared-Memory Multiprocessor". J. Parallel and Distributed Computing, Vol. 12, pp. 178-188, 1991.

[12] H.Y.H. Chuang and C.C. Li. "A systolic processor for straight line detection by modified Hough transform". IEEE Conf. on Computer Architecture for Pattern Analysis and Image Database Management, pp. 300-304, 1985.

[13] R.E. Cypher, J.L.C. Sanz and L. Snyder. "The Hough transform has O(N) complexity on NxN mesh connected computers". SIAM Journal on Computing, Vol. 19, No. 5, pp. 805-820, 1990.

[14] J.M. Delosme. "VLSI Implementation of Rotations in Pseudo Euclidean Spaces". IEEE Int’l Conf. Acoustics, Speech, and Signal Processing, pp. 927-930, 1983.

[15] A. Despain. "Fourier Transform Computers using CORDIC Iterations". IEEE Trans. on Computers, Vol. C-23, No. 10, pp. 993-1001, 1974.

[16] R.O. Duda and P.E. Hart. "Use of the Hough transform to detect lines and curves in pictures". J. Communications of the ACM, Vol. 15, pp. 11-15, 1972.

[17] J. Duprat and J.M. Muller. "The CORDIC algorithm: new results for fast VLSI implementation". Report No. 90-04, Ecole Normale Superieure de Lyon (France), 1990.

[18] M.D. Ercegovac and T. Lang. "Redundant and On-Line CORDIC: Application to Matrix Triangularization and SVD". IEEE Trans. on Computers, Vol. C-39, No. 6, pp. 725-740, 1990.

[19] M. Ferretti. "Mapping the generalized Hough transform on a mesh connected computer". IEEE Int’l Conf. CompEuro’91, pp. 248-252, 1991.

[20] A.L. Fisher and P.T. Highnam. "Computing the Hough transform on a scan line array processor". IEEE Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, pp. 83-87, 1987.

[21] L. da Fontoura and M.B. Sandler. "A binary Hough transform and its efficient implementation in a systolic array architecture". J. Pattern Recognition Letters, Vol. 10, pp. 329-334, 1989.

[22] N. Guil, J. Villalba and E.L. Zapata. "A fast Hough transform for segment detection". (submitted)

[23] J. Harding, T. Lang and J. Lee. "A Comparison of Redundant CORDIC Rotator Engines". IEEE Int’l Conf. Computer Design, pp. 556-559, 1991.

[24] K. Hanahara, T. Maruyama and T. Uchiyama. "A real time processor for the Hough transform". IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 10, No. 1, pp. 121-125, 1988.

[25] D.D. Haule and A.S. Malowany. "Object recognition using fast adaptive Hough transform". IEEE Computer Pacific Conf. on Communication, Computer and Signal Processing, pp. 91-94, 1989.

[26] G.L. Haviland and A.A. Tuszynski. "A CORDIC Arithmetic Processor Chip". IEEE Trans. on Computers, Vol. C-29, No. 2, pp. 68-79, 1980.

[27] P.V.C. Hough. "Method and Means for Recognizing Complex Patterns". U.S. Patent 3069654, 1962.

[28] X. Hu, R.G. Harber and S.C. Bass. "Expanding the Range of Convergence of the CORDIC Algorithm". IEEE Trans. on Computers, Vol. C-40, No. 1, pp. 13-21, 1991.

[29] Y.H. Hu. "The quantization effects of the CORDIC algorithm". IEEE Trans. on Signal Processing, Vol. 40, No. 4, pp. 834-844, 1992.

[30] Y.H. Hu. "CORDIC-Based VLSI Architectures for Digital Signal Processing". IEEE Signal Processing Magazine, Vol. 9, No. 3, pp. 16-35, 1992.

[31] C.W. Kang, R.H. Park and K.H. Lee. "Extraction of straight line segments using rotation transformation: generalized Hough Transformation". J. Pattern Recognition, Vol. 24, No. 7, pp. 663-641, 1991.

[32] D. König and J.F. Böhme. "Optimizing the CORDIC Algorithm for Processors with Pipeline Architecture". In Signal Processing V: Theories and Applications (Elsevier Science Publishers), pp. 1391-1394, 1990.

[33] H. Koshimizu and M. Numada. "On a fast Hough transform method PLHT based on piecewise-linear Hough function". J. System Computer in Japan, Vol. 21, No. 5, pp. 62-73, 1990.

[34] R. Künemund, H. Sölder, S. Wohlleben and T. Noll. "CORDIC processor with carry save architecture". 16th European Solid State Circuits Conference, pp. 193-196, 1990.

[35] A.A. de Lange and E.F. Deprettere. "Design and Implementation of a Floating-Point Quasi-Systolic General Purpose CORDIC Rotator for High-Rate Parallel Data and Signal Processing". IEEE 10th Int’l Symp. on Computer Arithmetic, pp. 272-281, 1991.

[36] J. Lee and T. Lang. "SVD by Constant-Factor-Redundant-CORDIC". IEEE 10th Int’l Symp. on Computer Arithmetic, pp. 264-271, 1991.

[37] J. Lee and T. Lang. "Constant-Factor Redundant CORDIC for Angle Calculation and Rotation". IEEE Trans. on Computers, Vol. 41, No. 8, pp. 1016-1025, 1992.

[38] H. Li, M.A. Lavin and R.J. Le Master. "Fast Hough transform: a hierarchical approach". J. Computer Vision Graphics Image Processing, Vol. 36, pp. 139-161, 1986.

[39] H.F. Li, D. Pao and R. Jayakumar. "Improvements and systolic implementation of the Hough transformation for straight line detection". J. Pattern Recognition, Vol. 22, No. 6, pp. 697-706, 1989.

[40] "L64250 Histogram/Hough transform processor (HHP)". LSI Logic, 1989.

[41] D.E. Metafas and C.E. Goutis. "A DSP Processor with a Powerful Set of Elementary Arithmetic Operations Based on CORDIC and CCM Algorithms". J. Microprocessing and Microprogramming, Vol. 30, pp. 51-58, 1990.

[42] H.K. Muammar and M. Nixon. "Tristage Hough transform for multiple ellipse extraction". IEE Proc. Part E: Computers and Digital Techniques, Vol. 138, No. 1, 1991.

[43] H. Nishino and Y. Kobayashi. "Extraction of Planar surfaces from a set of line segments using the 3-dimensional Hough transform". J. System Computer in Japan, Vol. 21, No. 12, pp. 78-87, 1990.

[44] A. Rosenfeld, J. Ornelas and Y. Hung. "Hough transform algorithms for mesh-connected SIMD parallel processors". J. Computer Vision Graphics Image Processing, Vol. 41, pp. 293-305, 1988.

[45] R.V. Shankar and N. Asokan. "A parallel implementation of the Hough transform method to detect lines and curves in pictures". IEEE 32nd Midwest Symp. on Circuits & Systems, pp. 321-324, 1990.

[46] I. da Silva. "Vectorization from aerial photographs applying the Hough transform method". Proc. SPIE, Vol. 1395, Pt. 2, pp. 956-963, 1990.

[47] N. Takagi, T. Asada and S. Yajima. "Redundant CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation". IEEE Trans. on Computers, Vol. C-40, No. 9, pp. 989-995, 1991.

[48] D. Timmermann, H. Hahn and B.J. Hosticka. "Hough transform using CORDIC method". J. Electronics Letters, Vol. 25, No. 3, pp. 205-206, 1989.

[49] D. Timmermann, H. Hahn, B.J. Hosticka and G. Schmidt. "A programmable CORDIC chip for digital signal processing applications". IEEE Journal of Solid-State Circuits, Vol. 26, No. 9, pp. 1317-1321, 1991.

[50] D. Timmermann, H. Hahn and B.J. Hosticka. "Low Latency Time CORDIC Algorithms". IEEE Trans. on Computers, Vol. 41, No. 8, pp. 1010-1015, 1992.

[51] J.E. Volder. "The CORDIC Trigonometric Computing Technique". IRE Transactions on Electronic Computers, Vol. EC-8, No. 3, pp. 330-334, 1959.

[52] J. Villalba, J.A. Hidalgo, E. Antelo, J.D. Bruguera and E.L. Zapata. "CORDIC Architecture with Parallel Compensation of the Scale Factor". Proc. Int. Conf. on Application Specific Array Processors (ASAP’95), pp. 258-269, July 1995.

[53] J.S. Walther. "A Unified Algorithm for Elementary Functions". Proc. Spring Joint Computer Conference, pp. 379-385, 1971.

[54] L. Xu, E. Oja and P. Kultanen. "A new curve detection method: Randomized Hough transform (RHT)". J. Pattern Recognition Letters, Vol. 11, pp. 331-338, 1990.


Captions for tables and figures

Table I: Scale factor correction stages, final scale factor and maximum angle achieved.
Table II: Mixed radix decomposition of the first 16 angles (N=1024).
Table III: Coefficients (σ0,σ1) and their associated scaling stages.
Table IV: Angular covering for each scale factor.
Table V: Equivalence between the control signals and the coding of the coefficients.
Table VI: Number of stages of the radix-2 and mixed-radix CORDIC processors for different sizes of the image space. The last column gives the reduction achieved with the mixed-radix design.

Figure 1: a) Image space and b) Hough transformed space.
Figure 2: Radix 2 (δ=2^{-i}) or radix 4 (δ=4^{-i}) microrotation.
Figure 3: CORDIC based Hough transform processor for NxN images (N≤1024).
Figure 4: a) Standard radix-4 microrotation and b) scaling microrotation.


Table I

Option   Stages                   K          θmax (degrees)
1        1R 3- 3- 8+              1.000482   81.44
2        1R 2R 3- 4- 4- 5R 5-     1.000088   96.71
3        1R 2R 2- 8- 9-           0.999600   94.92
4        1R 2R 3- 4- 4- 5-        0.999600   94.92
5        1R 4R 4R 3- 3-           1.000481   88.04
6        1R 2- 3+ 4R 5- 5- 5-     1.000181   84.46
7        1R 3R 3- 3- 4+ 4-        1.000421   88.01


Table II

Angle θ = Kπ/N; decomposition coefficients per microrotation index

          radix 2                            radix 4
K      1   1   2   2   3   4   5         6   7   8
0     -1   1  -1   1   1  -1  -1        -2   0   1
1     -1   1  -1   1   1  -1  -1        -2   1   0
2     -1   1  -1   1   1  -1  -1        -1  -2  -1
3     -1   1  -1   1   1  -1  -1        -1  -2   2
3     -1   1  -1   1   1  -1  -1        -1  -1  -2
4     -1   1  -1   1   1  -1  -1        -1  -1   1
5     -1   1  -1   1   1  -1  -1        -1   0   0
6     -1   1  -1   1   1  -1  -1        -1   1  -1
7     -1   1  -1   1   1  -1  -1         0  -2  -1
8     -1   1  -1   1   1  -1  -1         0  -2   2
8     -1   1  -1   1   1  -1  -1         0  -1  -2
9     -1   1  -1   1   1  -1  -1         0  -1   1
10    -1   1  -1   1   1  -1  -1         0   0   0
11    -1   1  -1   1   1  -1  -1         0   1  -1
12    -1   1  -1   1   1  -1  -1         0   1   2
12    -1   1  -1   1   1  -1  -1         1  -2  -2
13    -1   1  -1   1   1  -1  -1         1  -2   1
14    -1   1  -1   1   1  -1  -1         1  -1   1
15    -1   1  -1   1   1  -1  -1         1   0   0


Table III

σ0σ1   Scaling            K          σ61σ62   σ71σ72   σ81σ82
00                        1.00000    00       00       00
01     7–                 0.99990    00       00       01
02     5– 10+             0.99954
10     4– 5– 6–           0.99955    01       10       11
11     3– 10– 6+          1.00031    10       01       10
12     3– 10– 7–          1.00046    10       01       01
20     2– 4– 7+ 9–        1.00018
21     2– 4– 9–           1.00015
22     1– 2+ 2+ 3– 8+     1.00039


Table IV

Ranges (degrees); columns σ0σ1 = 00 01 02 10 11 12 20 21 22

0.000 --> 2.373
2.373 --> 4.833
4.833 --> 7.822
7.822 --> 9.316
9.316 --> 11.953
11.953 --> 14.677
14.677 --> 17.314
17.314 --> 18.896
18.896 --> 21.796
21.796 --> 24.257
24.257 --> 26.191
26.191 --> 28.916
28.918 --> 31.376
31.376 --> 35.771
35.771 --> 35.859
35.859 --> 40.253
40.253 --> 42.714
42.714 --> 45.000

xx x

xxx x

xxx

xxxx

xxxx

xxxx

xxxx

xxx

xx

xxx

xxx


Table V

Stage                   c1    c2    c3    A          B
Standard i=0,1,...,5    σi1   σi2   σi3   2^{-2i}    2^{-2i+1}
Compensation i=2        σ21   σ22   σ23   2^{-9}     2^{-11}
Scaling i=6             σ61   σ62   1     2^{-3}     2^{-4}
Scaling i=7             σ71   σ72   1     2^{-5}     2^{-10}
Scaling i=8             σ81   σ82   σ82   2^{-6}     2^{-7}


Table VI

Image size   Radix-2   Mixed-radix stages           % stage
n            stages    radix 2   radix 4   total    reduction
14           16        8         4         12       25 %
13           15        9         3         12       20 %
12           14        8         3         11       21.4 %
11           13        7         3         10       23 %
10           12        6         3         9        25 %
9            11        7         2         9        18 %
8            10        6         2         8        20 %
7            9         5         2         7        22.2 %


Figure 1


Figure 2


Figure 3


Figure 4
