
International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 3, Issue 1, January-June (2012), pp. 98-110, © IAEME

FAST DCT ALGORITHM USING WINOGRAD’S METHOD

Ch. Ramesh¹, Dr. N.B. Venkateswarlu², Dr. J.V.R. Murthy³

¹Professor, Dept. of CSE, AITAM, Tekkali, A.P., India, [email protected]

²Professor, Dept. of CSE, AITAM, Tekkali, A.P., India, [email protected]

³Professor, Dept. of CSE, College of Engineering, JNTUK, A.P., India, [email protected]

ABSTRACT

Applications of digital image communication have grown rapidly in recent years, and discrete cosine transform (DCT) based algorithms are widely used to reduce communication cost. Forward DCT and inverse DCT (IDCT) computations are reported to be time-consuming, which can impede real-time response in some applications. In this paper, we apply Winograd’s matrix multiplication approach to forward DCT and inverse DCT computation to reduce their CPU time. Experiments are carried out with standard images and synthetic images.

Key Words: DCT, IDCT, Winograd’s, JPEG, MATLAB

I. INTRODUCTION

Discrete cosine transform (DCT) based algorithms such as JPEG and MP3 are among the most widely used in audio, image, and video data compression. The DCT was originally developed by Ahmed, Natarajan, and Rao (1974). Its application to image compression was pioneered by Chen and Pratt (1984). The DCT is a technique for converting a signal into elementary frequency components: it represents an image as a sum of sinusoids of varying magnitudes and frequencies.

The DCT has the property that, for a typical image, most of the visually significant information is concentrated in just a few DCT coefficients. For this reason, the DCT is often used in image compression applications [3]. The cosine transform converts each block of spatial information into a frequency space representation that is better suited for compression. Specifically, the transform produces an array of coefficients for real-valued basis functions that represent each block of data in frequency space. The magnitudes of the DCT coefficients exhibit a distinct pattern within the array, where transform coefficients corresponding to the


lowest frequency basis functions usually have the highest magnitude and are the most perceptually significant. Similarly, coefficients corresponding to the highest frequency basis functions usually have the lowest magnitude and are the least perceptually significant. DCT-based compression methods retain only the important DCT coefficients, and this is how the data is compressed.

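The 2D-DCT equation (Eq-1) computes the (u, v)th entry of the DCT of an image [5]:

$$C(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \quad u, v = 0, 1, 2, \ldots, N-1 \qquad (1)$$

$$\alpha(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0 \\ \sqrt{2/N} & \text{for } u > 0 \end{cases} \qquad (2)$$

$$\alpha(v) = \begin{cases} \sqrt{1/N} & \text{for } v = 0 \\ \sqrt{2/N} & \text{for } v > 0 \end{cases} \qquad (3)$$

f(x, y) is the (x, y)th element of the image represented by the matrix f. N is the size of the block on which the DCT is computed. The equation calculates one entry, the (u, v)th, of the transformed image from the pixel values of the original image matrix.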
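As an illustration of Eq-1, the following minimal Python sketch evaluates the double sum directly for one block. The function names `alpha` and `dct2_direct` are hypothetical, chosen only for this example, and the square-root form of the normalization factors is assumed.

```python
import math

def alpha(k, n):
    """Normalization factor of Eq-2 / Eq-3 (square-root form assumed)."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2_direct(f):
    """Evaluate Eq-1 term by term on a square block f (list of lists)."""
    n = len(f)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (f[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u, n) * alpha(v, n) * total
    return out

# A flat 8x8 block: all of its energy should land in the DC coefficient.
block = [[1.0] * 8 for _ in range(8)]
coeffs = dct2_direct(block)
```

For the flat block used here, only `coeffs[0][0]` is non-zero (up to rounding), which matches the description of the DC coefficient as the lowest-frequency term.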

The first coefficient C(0, 0) is termed the “DC coefficient” and the remaining coefficients are called the “AC coefficients”. After the DCT, the remaining operations at the sender side are quantization, zigzag ordering and encoding. The reverse operations at the receiving side are decoding, inverse zigzag ordering, de-quantization and IDCT. As these concepts are widely reported elsewhere, we do not discuss them here.

The IDCT is a transform that converts a set of frequency coefficients back to an image signal. It is performed on a 2-dimensional array of coefficients and results in a 2-dimensional array of samples.

The 2D-IDCT equation (Eq-4) computes the (x, y)th entry of an image [5]:

$$f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u,v)\, \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \quad x, y = 0, 1, 2, \ldots, N-1 \qquad (4)$$

C(u, v) is the (u, v)th DCT coefficient of the image, represented by the matrix C. N is the size of the block on which the IDCT is computed. The equation calculates one entry, the (x, y)th, of the image from the matrix of DCT coefficients.
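The relationship between Eq-1 and Eq-4 can be checked numerically. The sketch below, which assumes the square-root normalization factors and uses hypothetical helper names (`cos_term`, `dct2_direct`, `idct2_direct`), transforms a small test block and then inverts it.

```python
import math

def alpha(k, n):
    """Normalization factor (square-root form assumed)."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def cos_term(i, k, n):
    """cos[(2i + 1) k pi / 2N], the cosine factor shared by Eq-1 and Eq-4."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))

def dct2_direct(f):
    """Forward transform, Eq-1."""
    n = len(f)
    return [[alpha(u, n) * alpha(v, n)
             * sum(f[x][y] * cos_term(x, u, n) * cos_term(y, v, n)
                   for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2_direct(c):
    """Inverse transform, Eq-4 (normalization factors inside the sums)."""
    n = len(c)
    return [[sum(alpha(u, n) * alpha(v, n) * c[u][v]
                 * cos_term(x, u, n) * cos_term(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

# Synthetic 8x8 test block (values are arbitrary, for illustration only).
block = [[(3 * x + 5 * y) % 17 for y in range(8)] for x in range(8)]
restored = idct2_direct(dct2_direct(block))
max_error = max(abs(restored[x][y] - block[x][y])
                for x in range(8) for y in range(8))
```

With these definitions the round trip reproduces the input block up to floating-point rounding, so `max_error` stays very small.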

This paper is organized as follows. Section II gives a brief overview of the computational cost of the DCT and IDCT algorithms under the conventional approach. The proposed Winograd-based DCT and IDCT algorithms are described in Section III. Experimental results are presented in Section IV. Finally, concluding remarks are given in Section V.

II. Computational Complexity of Conventional DCT/IDCT




In the 2D-DCT (Eq-1), the cosine functions $\cos\left[\frac{(2x+1)u\pi}{2N}\right]$ and $\cos\left[\frac{(2y+1)v\pi}{2N}\right]$ are computationally very expensive. Viewed as N x N matrices, $\cos\left[\frac{(2y+1)v\pi}{2N}\right]$ is the transpose of $\cos\left[\frac{(2x+1)u\pi}{2N}\right]$. Calculating $\cos\left[\frac{(2x+1)u\pi}{2N}\right]$ requires 4 multiplications, 1 addition, and 1 division. For each element of the DCT matrix of an 8x8 block, the double sum in Eq-1 iterates 64 times, so this cosine term requires 256 multiplications, 64 additions and 64 divisions per element, and 16384 multiplications, 4096 additions and 4096 divisions for all 64 elements. The two cosine functions together therefore require 32768 multiplications, 8192 additions and 8192 divisions. The way to improve performance is to pre-compute the cosine coefficients and read them during the DCT algorithm. In this way, each element of the DCT matrix requires 130 multiplications, 63 additions and 2 divisions under Eq-1, and all the elements of an 8x8 DCT matrix require 8320 multiplications, 4032 additions and 2 divisions. Each element of the IDCT matrix requires 256 multiplications, 63 additions and 2 divisions under Eq-4, and all the elements of an 8x8 IDCT matrix require 16384 multiplications, 4032 additions and 2 divisions. The IDCT thus requires more arithmetic operations than the DCT.
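The operation counts in this section follow from simple products, which the short Python sketch below reproduces for an 8x8 block. The variable names are hypothetical, and the per-term costs are the ones stated in the text (for instance, 63 additions per element times 64 elements gives 4032 additions).

```python
N = 8
terms = N * N        # 64 terms in the double sum for one coefficient
elements = N * N     # 64 coefficients in an 8x8 block

# One cosine evaluation costs 4 multiplications, 1 addition and 1 division.
cos_per_element = (4 * terms, 1 * terms, 1 * terms)        # (256, 64, 64)
cos_all = tuple(elements * c for c in cos_per_element)     # (16384, 4096, 4096)
both_cos = tuple(2 * c for c in cos_all)                   # (32768, 8192, 8192)

# With pre-computed cosines: 2 multiplications per term plus 2 for the
# alpha factors, and 63 additions to accumulate 64 terms.
dct_mults_per_element = 2 * terms + 2                      # 130
adds_per_element = terms - 1                               # 63
dct_total = (elements * dct_mults_per_element,
             elements * adds_per_element)                  # (8320, 4032)

# IDCT: the alpha factors sit inside the sum, so 4 multiplications per term.
idct_mults_per_element = 4 * terms                         # 256
idct_total = (elements * idct_mults_per_element,
              elements * adds_per_element)                 # (16384, 4032)
```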

III. Winograd’s Approach

Consider the calculation of the scalar (dot) product of two vectors X and Y:

$$X = [x_1, x_2, \ldots, x_N] \qquad (5)$$

$$Y = [y_1, y_2, \ldots, y_N] \qquad (6)$$

$$X^T Y = x_1 y_1 + x_2 y_2 + \cdots + x_N y_N \qquad (7)$$

This calculation usually requires N multiplications and N − 1 additions. Winograd’s algorithm [2, 5] is used in the literature to reduce these computations in applications such as classification. According to [2],

$$X^T Y = \left[(x_1 + y_2)(x_2 + y_1) + (x_3 + y_4)(x_4 + y_3) + \cdots + (x_{N-1} + y_N)(x_N + y_{N-1})\right] - \left[x_1 x_2 + x_3 x_4 + \cdots + x_{N-1} x_N\right] - \left[y_1 y_2 + y_3 y_4 + \cdots + y_{N-1} y_N\right] \qquad (8)$$

This can also be written as follows, with k = N/2:

$$X^T Y = \sum_{i=1}^{2k} x_i y_i = \sum_{u=1}^{k} (x_{2u-1} + y_{2u})(x_{2u} + y_{2u-1}) - \sum_{u=1}^{k} x_{2u-1}\, x_{2u} - \sum_{u=1}^{k} y_{2u-1}\, y_{2u} \qquad (9)$$

In this form, if the last two terms are assumed to be pre-calculated, the dot product can be obtained with only N/2 multiplications. In some applications the last two terms can be re-used across many dot products, which is where the computational benefit comes from. We propose to use this idea in our DCT/IDCT algorithms by extending it to matrix multiplication. Here we have assumed that N is even, so that N/2 pairs are available. If N is odd, X and Y can simply be extended to even length by appending one 0 at the end.
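A minimal Python sketch of the dot product in Eq-8 and Eq-9 is given below. The function names `pair_products` and `winograd_dot` are hypothetical, and the optional arguments stand in for the pre-calculated last two terms.

```python
def pair_products(v):
    """Return v[0]*v[1] + v[2]*v[3] + ... for an even-length sequence."""
    return sum(v[i] * v[i + 1] for i in range(0, len(v), 2))

def winograd_dot(x, y, x_term=None, y_term=None):
    """Dot product via Eq-9; x_term and y_term may be supplied pre-computed."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if len(x) % 2:                      # odd length: append one zero to each
        x, y = list(x) + [0], list(y) + [0]
    if x_term is None:
        x_term = pair_products(x)
    if y_term is None:
        y_term = pair_products(y)
    cross = sum((x[i] + y[i + 1]) * (x[i + 1] + y[i])
                for i in range(0, len(x), 2))   # N/2 multiplications
    return cross - x_term - y_term

even_result = winograd_dot([1, 2, 3, 4], [5, 6, 7, 8])
odd_result = winograd_dot([1, 2, 3], [4, 5, 6])
```

When the same vector takes part in many dot products, its pair-product term can be computed once and passed in, so that each further dot product costs only the N/2 multiplications of the cross term.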

Although Winograd’s algorithm [2, 5] reduces the number of multiplications actually performed, its asymptotic computational complexity is the same as that of the naïve matrix multiplication algorithm. Our DCT algorithm requires a series of matrix multiplications, and we propose to reduce its CPU time by applying Winograd’s method to each of them.

For the multiplication of two N x N square matrices A and B, Winograd’s algorithm is defined by equations (11), (12) and (13) below.

$C_{i,j}$ = product of the ith row of A and the jth column of B (10)

$$C_{i,j} = \sum_{K=1}^{N/2} (a_{i,2K-1} + b_{2K,j})(a_{i,2K} + b_{2K-1,j}) - A_i - B_j \qquad (11)$$

$$A_i = \sum_{K=1}^{N/2} a_{i,2K-1} \cdot a_{i,2K} \qquad (12)$$

$$B_j = \sum_{K=1}^{N/2} b_{2K-1,j} \cdot b_{2K,j} \qquad (13)$$

$A_i$ → sum of the pairwise products of the couples in the ith row of A.

$B_j$ → sum of the pairwise products of the couples in the jth column of B.

$C_{i,j}$ → ith row, jth column element of matrix C.

Since $A_i$ and $B_j$ are pre-computed once for each row of A and each column of B, together they require only $N^2$ multiplications: the pairwise products of any one row or column of an N x N matrix need N/2 multiplications, so N rows or N columns need N x N/2 multiplications, and the rows of A and columns of B together need $N^2$. The total number of multiplications needed to calculate the matrix product becomes $\frac{1}{2}N^3 + N^2$. However, the number of additions and subtractions increases to $\frac{3}{2}N^3 + 2N^2 - 2N$. Winograd’s algorithm is theoretically faster than the naïve matrix multiplication algorithm when additions take much less CPU time than multiplications. In DCT and IDCT computations the matrices are of size 8 x 8. For an 8x8 matrix, each $A_i$ calculation requires 4 multiplications and 3 additions, each $B_j$ calculation requires 4 multiplications and 3 additions, and each $C_{i,j}$ calculation requires 4 multiplications, 11 additions (8 inside the four products and 3 to sum them) and 2 subtractions. The multiplication of two 8x8 matrices therefore requires 320 multiplications, 752 additions and 128 subtractions.
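The matrix form in Eq-11 to Eq-13 can be sketched in Python as follows. This is an illustrative implementation, not the authors' C code; the function name `winograd_matmul` and the multiplication counter are assumptions added to make the count of 320 multiplications for 8x8 matrices checkable.

```python
def winograd_matmul(a, b):
    """Multiply two N x N matrices (N even) following Eq-11 to Eq-13.

    Returns the product and the number of scalar multiplications performed.
    """
    n = len(a)
    if n % 2:
        raise ValueError("this sketch assumes an even matrix size")
    half = n // 2
    mults = 0

    # Eq-12: pairwise products along each row of A.
    row_terms = []
    for i in range(n):
        row_terms.append(sum(a[i][2 * k] * a[i][2 * k + 1] for k in range(half)))
        mults += half

    # Eq-13: pairwise products down each column of B.
    col_terms = []
    for j in range(n):
        col_terms.append(sum(b[2 * k][j] * b[2 * k + 1][j] for k in range(half)))
        mults += half

    # Eq-11: N/2 multiplications per output element, then two subtractions.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cross = sum((a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                        for k in range(half))
            mults += half
            c[i][j] = cross - row_terms[i] - col_terms[j]
    return c, mults

# Small integer test matrices (arbitrary values, for illustration only).
a = [[(3 * i + j) % 7 for j in range(8)] for i in range(8)]
b = [[(i + 2 * j) % 5 for j in range(8)] for i in range(8)]
product, mults = winograd_matmul(a, b)
naive = [[sum(a[i][k] * b[k][j] for k in range(8)) for j in range(8)]
         for i in range(8)]
```

For N = 8 the counter reaches 32 + 32 + 256 = 320, compared with 512 multiplications for the naïve triple loop.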

Now let us discuss how Winograd’s matrix multiplication method can be used with the DCT. Eq-1 can be written in matrix notation as [12]

$$C(u,v) = \alpha(u)\,\alpha(v) * \left(c_1 * f(x,y) * c_1^T\right) \qquad (14)$$

f(x, y) → 8 x 8 image block

$c_1$ → 8 x 8 matrix of the first cosine function in Eq-1; $c_1$ is constant for all the blocks.

$c_1^T$ → 8 x 8 matrix of the second cosine function in Eq-1; $c_1^T$ is the transpose of $c_1$ and is likewise constant for all the blocks.

We propose the following steps to calculate C(u, v).

1. Calculate the matrix product of $c_1$ and f(x, y).

2. Calculate the product of the result of Step 1 and $c_1^T$.

3. Multiply each entry (u, v) of the resulting matrix from Step 2 by the scalar α(u)α(v).

For the multiplication of the two matrices $c_1$ and f(x, y), the $A_i$ calculations require 32 multiplications and 24 additions, the $B_j$ calculations require 32 multiplications and 24 additions, and the $C_{i,j}$ calculations require 256 multiplications, 704 additions and 128 subtractions. In total, therefore, 320 multiplications, 752 additions and 128 subtractions are required. For the multiplication of the resulting matrix ($c_1$ * f(x, y)) with $c_1^T$, the $A_i$ calculations require 32 multiplications and 24 additions; the $B_j$ require no calculation, because $B_j$ for $c_1^T$ is the same as $A_i$ for $c_1$; and the $C_{i,j}$ calculations require 256 multiplications, 704 additions and 128 subtractions. The term $c_1$ * f(x, y) * $c_1^T$ therefore requires 608 multiplications, 1480 additions and 256 subtractions. $A_i$ for $c_1$ (equivalently, $B_j$ for $c_1^T$) is constant irrespective of the image block. Calculating all the elements of the 8 x 8 DCT matrix according to Eq-14 requires 736 multiplications, 1480 additions, 256 subtractions and 4 divisions. The number of arithmetic operations is thus lower than in the conventional approach.
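The three steps of Eq-14 can be sketched in Python as shown below. This is a hedged illustration rather than the authors' implementation: the helper names (`row_pairs`, `col_pairs`, `winograd_matmul`, `dct2_winograd`) are hypothetical, the square-root normalization factors are assumed, and the pair-product terms of the constant cosine matrix are computed once and re-used for both products.

```python
import math

N = 8

def row_pairs(m):
    """A_i of Eq-12 for every row of m."""
    return [sum(r[2 * k] * r[2 * k + 1] for k in range(len(r) // 2)) for r in m]

def col_pairs(m):
    """B_j of Eq-13 for every column of m."""
    n = len(m)
    return [sum(m[2 * k][j] * m[2 * k + 1][j] for k in range(n // 2)) for j in range(n)]

def winograd_matmul(a, b, a_terms=None, b_terms=None):
    """Eq-11, optionally re-using pre-computed pair-product terms."""
    n = len(a)
    a_terms = row_pairs(a) if a_terms is None else a_terms
    b_terms = col_pairs(b) if b_terms is None else b_terms
    return [[sum((a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                 for k in range(n // 2)) - a_terms[i] - b_terms[j]
             for j in range(n)] for i in range(n)]

def alpha(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# c1[u][x] = cos[(2x + 1) u pi / 2N]; constant for every block.
C1 = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)] for u in range(N)]
C1T = [list(col) for col in zip(*C1)]
C1_TERMS = row_pairs(C1)   # A_i of c1, which equals B_j of its transpose

def dct2_winograd(f):
    step1 = winograd_matmul(C1, f, a_terms=C1_TERMS)        # Step 1: c1 * f
    step2 = winograd_matmul(step1, C1T, b_terms=C1_TERMS)   # Step 2: (c1 * f) * c1^T
    return [[alpha(u) * alpha(v) * step2[u][v] for v in range(N)]
            for u in range(N)]                              # Step 3: scaling

def dct2_direct(f):
    """Reference evaluation of Eq-1 using the same cosine table."""
    return [[alpha(u) * alpha(v) * sum(f[x][y] * C1[u][x] * C1[v][y]
                                       for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

block = [[(3 * x + 5 * y) % 17 for y in range(N)] for x in range(N)]
fast, slow = dct2_winograd(block), dct2_direct(block)
max_diff = max(abs(fast[u][v] - slow[u][v]) for u in range(N) for v in range(N))
```

The two routes agree up to floating-point rounding. Winograd's rearrangement adds and subtracts larger intermediate values than the direct sum, so small rounding differences between the two results are expected.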

Now let us discuss how Winograd’s method can be used with the IDCT. Eq-4 in matrix notation is

$$f(x,y) = c_1^T * \left(\left(\alpha(u)\,\alpha(v) * C(u,v)\right) * c_1\right) \qquad (15)$$

$c_1^T$ → 8 x 8 matrix of the first cosine function in Eq-4.

$c_1$ → 8 x 8 matrix of the second cosine function in Eq-4.

$c_1^T$ is the transpose of $c_1$.

We propose the following steps to calculate f(x, y).

1. Multiply each entry (u, v) of the matrix C(u, v) by the scalar α(u)α(v).

2. Calculate the product of the resulting matrix from Step 1 with $c_1$.

3. Calculate the product of $c_1^T$ with the resulting matrix from Step 2.

The calculation of α(u)α(v) * C(u, v) requires 4 divisions and 128 multiplications. For the multiplication of the two matrices $c_1^T$ and (α(u)α(v) * C(u, v)), the $A_i$ calculations require 32 multiplications and 24 additions, the $B_j$ calculations require 32 multiplications and 24 additions, and the $C_{i,j}$ calculations require 256 multiplications, 704 additions and 128 subtractions. In total, therefore, 320 multiplications, 752 additions and 128 subtractions are required. For the multiplication of the resulting matrix $c_1^T$ * (α(u)α(v) * C(u, v)) with $c_1$, the $A_i$ calculations require 32 multiplications and 24 additions; the $B_j$ require no calculation, because $B_j$ for $c_1$ is the same as $A_i$ for $c_1^T$; and the $C_{i,j}$ calculations require 256 multiplications, 704 additions and 128 subtractions. Calculating all the elements of the 8 x 8 image block according to Eq-15 requires 736 multiplications, 1480 additions, 256 subtractions and 4 divisions. The total number of arithmetic operations is lower than in the conventional approach.
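The IDCT steps of Eq-15 can be sketched in the same way. The Python below is an illustrative sketch with hypothetical helper names and assumed square-root normalization factors; it checks that the inverse recovers a block that was first transformed with the matrix form of the forward DCT.

```python
import math

N = 8

def winograd_matmul(a, b):
    """Winograd-style product of two N x N matrices (N even), per Eq-11 to Eq-13."""
    n, half = len(a), len(a) // 2
    row_terms = [sum(a[i][2 * k] * a[i][2 * k + 1] for k in range(half)) for i in range(n)]
    col_terms = [sum(b[2 * k][j] * b[2 * k + 1][j] for k in range(half)) for j in range(n)]
    return [[sum((a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                 for k in range(half)) - row_terms[i] - col_terms[j]
             for j in range(n)] for i in range(n)]

def alpha(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# c1[u][x] = cos[(2x + 1) u pi / 2N] and its transpose.
C1 = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)] for u in range(N)]
C1T = [list(col) for col in zip(*C1)]

def scale(m):
    """Multiply entry (u, v) by alpha(u) * alpha(v)."""
    return [[alpha(u) * alpha(v) * m[u][v] for v in range(N)] for u in range(N)]

def dct2(f):
    """Forward transform in matrix form: scale(c1 * f * c1^T)."""
    return scale(winograd_matmul(winograd_matmul(C1, f), C1T))

def idct2(coeffs):
    scaled = scale(coeffs)                  # Step 1: apply alpha(u) alpha(v)
    step2 = winograd_matmul(scaled, C1)     # Step 2: multiply by c1
    return winograd_matmul(C1T, step2)      # Step 3: multiply by c1^T on the left

block = [[(7 * x + 3 * y) % 17 for y in range(N)] for x in range(N)]
restored = idct2(dct2(block))
max_error = max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N))
```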

IV. EXPERIMENTAL WORK

In this study a number of images in TIFF format are used, including the widely used Lena, Mandrill and Peppers images. Table 1 gives the complete details of the images used in our study.

Table 1: Details of images used in our study

S. No  Fig No  Image         Size       Type
1      1(a)    Chess         128x128    Gray
2      1(b)    Helmet        128x128    Gray
3      1(c)    X-ray         128x128    Gray
4      1(d)    Clock         256x256    Gray
5      1(e)    Moon surface  256x256    Gray
6      1(f)    Cameraman     256x256    Gray
7      1(g)    Lena          512x512    Gray
8      1(h)    Mandrill      512x512    Gray
9      1(i)    Peppers       512x512    Gray
10     1(j)    Man           1024x1024  Gray
11     1(k)    Airplane2     1024x1024  Gray
12     1(l)    Airport       1024x1024  Gray
13     1(m)    Flowers       2048x2048  Gray
14     1(n)    Flowers1      2048x2048  Gray
15     1(o)    City          2048x2048  Gray
16     2(a)    Couple        128x128    Color
17     2(b)    House         128x128    Color
18     2(c)    Jennybeans1   128x128    Color
19     2(d)    Girl1         256x256    Color
20     2(e)    Jennybeans    256x256    Color
21     2(f)    Tree          256x256    Color
22     2(g)    Girl2         512x512    Color
23     2(h)    Sailboat      512x512    Color
24     2(i)    Splash        512x512    Color
25     2(j)    Oakland       1024x1024  Color
26     2(k)    Richmond      1024x1024  Color
27     2(l)    Shreport      1024x1024  Color
28     2(m)    Flowers       2048x2048  Color
29     2(n)    Flowers1      2048x2048  Color
30     2(o)    City          2048x2048  Color

All the above images are taken from the USC-SIPI image database “http://sipi.usc.edu/database” [6]


Experiments are carried out on MS Windows XP version 2002, SP3 edition, and on Fedora 10, Linux kernel 2.6.27.5-117.fc10.i686. The system is equipped with an Intel Core 2 Duo 2.60 GHz processor and 1 GB RAM. Under Windows XP, programs are written in C under Microsoft Visual Studio 2005 version 8.0. Under Linux, we have used GNU g++ 4.32. MATLAB is a popular numerical computing environment and fourth-generation programming language developed by MathWorks. The dct2() function in the Image Processing Toolbox computes the two-dimensional discrete cosine transform (DCT) of an image, and the idct2() function in the same toolbox computes the two-dimensional inverse discrete cosine transform (IDCT). We have used these functions as a baseline for the performance of our algorithms. In the Windows environment, the CPU time for DCT and IDCT is measured using the function GetSystemTime(). In the UNIX environment, it is measured using the function gettimeofday(). In MATLAB, it is measured using the function cputime().
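The measurement procedure can be illustrated with a small Python sketch. It uses `time.process_time()` from the Python standard library as a stand-in, which is an assumption of this example and not one of the calls named above; `cpu_seconds` and `dummy_transform` are hypothetical names, and the workload is a placeholder for a DCT routine.

```python
import time

def cpu_seconds(func, *args, repeats=5):
    """Smallest process CPU time, in seconds, over several runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()
        func(*args)
        best = min(best, time.process_time() - start)
    return best

def dummy_transform(n):
    """Placeholder workload standing in for a DCT or IDCT over an image."""
    return sum(i * i for i in range(n))

elapsed = cpu_seconds(dummy_transform, 100_000)
```

Taking the minimum over a few repetitions is one common way to reduce the influence of other processes on short measurements. Note that GetSystemTime() and gettimeofday() report wall-clock time, whereas `time.process_time()` counts only the CPU time of the current process.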

S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      1(a)    0.0313    0.0167        0.0035      8.942                4.771
2      1(b)    0.0313    0.0168        0.0035      8.942                4.8
3      1(c)    0.0312    0.0167        0.0035      8.914                4.771
4      1(d)    0.1250    0.0686        0.0140      8.928                4.9
5      1(e)    0.1248    0.0685        0.0141      8.851                4.858
6      1(f)    0.1250    0.0689        0.0141      8.865                4.886
7      1(g)    0.4688    0.2715        0.0550      8.523                4.936
8      1(h)    0.4683    0.2714        0.0530      8.835                5.120
9      1(i)    0.4688    0.2716        0.0550      8.523                4.938
10     1(j)    1.8288    1.0949        0.2250      8.128                4.866
11     1(k)    1.8290    1.0949        0.2250      8.128                4.866
12     1(l)    1.8281    1.0948        0.2240      8.161                4.8875
13     1(m)    7.1242    4.4053        0.9030      7.889                4.878
14     1(n)    7.1240    4.4051        0.9030      7.889                4.878
15     1(o)    7.1250    4.4055        0.9040      7.881                4.873

Table 2: CPU time in seconds for DCT calculation (Windows environment), gray-level images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.




S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      2(a)    0.0939    0.051         0.0105      8.942                4.857
2      2(b)    0.0939    0.0512        0.0105      8.942                4.876
3      2(c)    0.0939    0.0512        0.0105      8.942                4.876
4      2(d)    0.3273    0.2035        0.0406      8.061                5.012
5      2(e)    0.3280    0.2049        0.0407      8.058                5.034
6      2(f)    0.3282    0.205         0.042       7.814                4.880
7      2(g)    1.3124    0.8124        0.1672      7.849                4.858
8      2(h)    1.3125    0.8124        0.1673      7.845                4.855
9      2(i)    1.3125    0.8126        0.1673      7.845                4.857
10     2(j)    5.2521    3.3           0.68        7.723                4.852
11     2(k)    5.2560    3.301         0.68        7.729                4.854
12     2(l)    5.2500    3.2846        0.672       7.8125               4.887
13     2(m)    21.842    12.3595       2.7037      8.078                4.571
14     2(n)    21.6093   12.3592       2.7036      7.992                4.571
15     2(o)    21.8439   12.3633       2.7037      8.079                4.572

Table 3: CPU time in seconds for DCT calculation (Windows environment), color images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.


S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      1(a)    0.0313    0.014         0.0029      10.793               4.827
2      1(b)    0.0313    0.014         0.0029      10.793               4.827
3      1(c)    0.0312    0.014         0.0029      10.793               4.827
4      1(d)    0.125     0.0573        0.0117      10.683               4.897
5      1(e)    0.1248    0.0568        0.0116      10.758               4.896
6      1(f)    0.1250    0.0575        0.0117      10.683               4.914
7      1(g)    0.4688    0.2267        0.0466      10.060               4.864
8      1(h)    0.4683    0.2262        0.0465      10.070               4.864
9      1(i)    0.4688    0.2268        0.0468      10.017               4.846
10     1(j)    1.8288    0.9142        0.1871      9.774                4.886
11     1(k)    1.8290    0.9153        0.1873      9.765                4.886
12     1(l)    1.8281    0.9139        0.1870      9.775                4.887
13     1(m)    7.1242    3.6676        0.7500      9.498                4.890
14     1(n)    7.1240    3.6515        0.7479      9.525                4.882
15     1(o)    7.1250    3.6774        0.7546      9.442                4.873

Table 4: CPU time in seconds for DCT calculation (UNIX environment), gray-level images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.





S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      2(a)    0.0939    0.0426        0.0087      10.793               4.896
2      2(b)    0.0939    0.0424        0.0087      10.793               4.873
3      2(c)    0.0939    0.0423        0.0086      10.918               4.918
4      2(d)    0.3273    0.1699        0.0339      9.654                5.011
5      2(e)    0.3280    0.1707        0.0349      9.398                4.891
6      2(f)    0.3282    0.1708        0.0350      9.377                4.88
7      2(g)    1.3124    0.6782        0.1382      9.496                4.907
8      2(h)    1.3125    0.6798        0.1396      9.401                4.869
9      2(i)    1.3125    0.6803        0.1412      9.295                4.817
10     2(j)    5.2521    2.7466        0.5618      9.348                4.888
11     2(k)    5.2560    2.7561        0.5620      9.352                4.904
12     2(l)    5.2500    2.7418        0.5610      9.358                4.887
13     2(m)    21.842    11.00         2.2393      9.753                4.912
14     2(n)    21.6093   10.9607       2.1900      9.867                5.004
15     2(o)    21.8439   11.0320       2.2569      9.678                4.888

Table 5: CPU time in seconds for DCT calculation (UNIX environment), color images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.


S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      1(a)    0.0313    0.0203        0.0031      10.09                6.54
2      1(b)    0.0313    0.0202        0.0032      9.78                 6.31
3      1(c)    0.0313    0.0204        0.0032      9.78                 6.37
4      1(d)    0.1406    0.0837        0.0129      10.89                6.48
5      1(e)    0.1407    0.0838        0.0129      10.90                6.48
6      1(f)    0.1406    0.0837        0.0128      10.98                6.53
7      1(g)    0.5314    0.3278        0.049       10.84                6.82
8      1(h)    0.5469    0.3279        0.050       10.93                6.55
9      1(i)    0.5313    0.3276        0.049       10.84                6.68
10     1(j)    2.1716    1.3184        0.2076      10.46                6.35
11     1(k)    2.1710    1.3183        0.2076      10.45                6.35
12     1(l)    2.1719    1.3185        0.2078      10.45                6.34
13     1(m)    8.6406    5.24          0.8292      10.42                6.31
14     1(n)    8.6094    5.241         0.8292      10.38                6.32
15     1(o)    8.5       5.239         0.8291      10.25                6.31

Table 6: CPU time in seconds for IDCT calculation (Windows environment), gray-level images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.





S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      2(a)    0.0938    0.0608        0.0092      10.19                6.60
2      2(b)    0.0938    0.0609        0.0091      10.30                6.69
3      2(c)    0.0938    0.0608        0.0092      10.19                6.60
4      2(d)    0.4218    0.2511        0.0388      10.87                6.47
5      2(e)    0.4216    0.251         0.0387      10.89                6.48
6      2(f)    0.424     0.253         0.0388      10.92                6.52
7      2(g)    1.5942    0.9833        0.147       10.84                6.68
8      2(h)    1.5944    0.9835        0.148       10.77                6.64
9      2(i)    1.5939    0.9857        0.148       10.76                6.66
10     2(j)    6.6578    3.9552        0.6227      10.69                6.35
11     2(k)    6.6589    3.9573        0.6229      10.69                6.35
12     2(l)    6.6564    3.9570        0.6226      10.69                6.35
13     2(m)    25.926    15.720        2.4876      10.42                6.31
14     2(n)    25.7641   15.692        2.4869      10.35                6.30
15     2(o)    25.7814   15.696        2.4871      10.36                6.31

Table 7: CPU time in seconds for IDCT calculation (Windows environment), color images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.

Figure 4: CPU time in seconds for DCT calculation (Windows environment), color images.

Table 8: CPU time in seconds for IDCT calculation (UNIX environment), gray-level images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.

S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      1(a)    0.0313    0.0177        0.0028      11.17                6.32
2      1(b)    0.0313    0.0180        0.0028      11.17                6.42
3      1(c)    0.0313    0.0179        0.0028      11.17                6.35
4      1(d)    0.1406    0.0735        0.0114      12.33                6.44
5      1(e)    0.1407    0.0736        0.0115      12.23                6.40
6      1(f)    0.1406    0.0735        0.0115      12.22                6.39
7      1(g)    0.5314    0.2852        0.041       12.96                6.95
8      1(h)    0.5469    0.2856        0.044       12.42                6.94
9      1(i)    0.5313    0.2851        0.041       12.95                6.95
10     1(j)    2.1716    1.1466        0.1807      12.02                6.34
11     1(k)    2.1710    1.1462        0.1806      12.02                6.34
12     1(l)    2.1719    1.1467        0.1809      12.00                6.33
13     1(m)    8.6406    4.5770        0.7211      11.98                6.34
14     1(n)    8.6094    4.5640        0.7210      11.94                6.33
15     1(o)    8.5       4.5636        0.7210      11.78                6.32




S. No  Fig No  MATLAB    Conventional  Winograd’s  Speed-up vs. MATLAB  Speed-up vs. Conventional
1      2(a)    0.0938    0.0532        0.0084      11.16                6.33
2      2(b)    0.0938    0.0531        0.0083      11.30                6.39
3      2(c)    0.0938    0.0531        0.0083      11.30                6.39
4      2(d)    0.4218    0.2206        0.0343      12.29                6.43
5      2(e)    0.4216    0.2205        0.0342      12.32                6.44
6      2(f)    0.424     0.2205        0.0342      12.39                6.44
7      2(g)    1.5942    0.8557        0.124       12.85                6.90
8      2(h)    1.5944    0.8558        0.124       12.45                6.90
9      2(i)    1.5939    0.8560        0.132       12.95                6.95
10     2(j)    6.6578    3.4397        0.5422      12.27                6.34
11     2(k)    6.6589    3.4399        0.5421      12.28                6.34
12     2(l)    6.6564    3.4392        0.5320      12.59                6.46
13     2(m)    25.926    13.692        2.163       11.98                6.33
14     2(n)    25.7641   13.679        2.162       11.91                6.32
15     2(o)    25.7814   13.679        2.163       11.91                6.32

Table 9: CPU time in seconds for IDCT calculation (UNIX environment), color images. The speed-up columns give the speed-up of Winograd’s approach compared to MATLAB and to the conventional approach.

Figure 3: CPU time in seconds for DCT calculation (Windows environment), gray-level images. [Chart: CPU time in seconds against image size, from 128x128 to 2048x2048, for MATLAB, Conventional and Winograd.]

Figure 5: CPU time in seconds for DCT calculation (UNIX environment), gray-level images

Figure 6: CPU time in seconds for DCT calculation (UNIX environment), color images

Figure 7: CPU time in seconds for IDCT calculation (Windows environment), gray-level images

Figure 8: CPU time in seconds for IDCT calculation (Windows environment), color images

Figure 9: CPU time in seconds for IDCT calculation (UNIX environment), gray-level images

Figure 10: CPU time in seconds for IDCT calculation (UNIX environment), color images

Our Winograd-based DCT algorithm consistently takes less CPU time than the conventional algorithm and the MATLAB function. Likewise, the CPU time of our IDCT algorithm is much lower than that of the MATLAB function and the conventional approach.

Tables 2 and 3 give the CPU time for the DCT under Windows XP with gray and color images. Our algorithm consistently gives better results than the MATLAB routines and the conventional algorithm, with a speed-up of roughly 7.7 to 8.9 over MATLAB and roughly 4.6 to 5.1 over the conventional algorithm.

Tables 4 and 5 give the CPU time for the DCT under UNIX with gray and color images. Our algorithm consistently gives better results than the MATLAB routines and the conventional algorithm, with a speed-up of roughly 9.3 to 10.9 over MATLAB and roughly 4.8 to 5.0 over the conventional algorithm.

Tables 6 and 7 give the CPU time for the IDCT under Windows XP with gray and color images. Our algorithm consistently gives better results than the MATLAB routines and the conventional algorithm, with a speed-up of roughly 9.8 to 11.0 over MATLAB and roughly 6.3 to 6.8 over the conventional algorithm.

Tables 8 and 9 give the CPU time for the IDCT under UNIX with gray and color images. Our algorithm consistently gives better results than the MATLAB routines and the conventional algorithm, with a speed-up of roughly 11.2 to 13.0 over MATLAB and roughly 6.3 to 7.0 over the conventional algorithm.

The CPU time for DCT and IDCT calculations on a color image is around 3 times that of the gray-level image of the same size.

The DCT and IDCT calculations run around 15% faster in the UNIX environment than in the Windows environment.
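The speed-up columns in the tables are ratios of CPU times. The short Python sketch below recomputes the two speed-ups for the first row of Table 2; the function name `speedup` is hypothetical.

```python
def speedup(baseline_seconds, winograd_seconds):
    """Ratio of a baseline CPU time to the Winograd CPU time."""
    return baseline_seconds / winograd_seconds

# Row 1 of Table 2: 128x128 gray-level image, DCT, Windows environment.
matlab, conventional, winograd = 0.0313, 0.0167, 0.0035
vs_matlab = speedup(matlab, winograd)              # about 8.94
vs_conventional = speedup(conventional, winograd)  # about 4.77
```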

V. CONCLUSIONS

In this paper, Winograd-based fast DCT and IDCT algorithms were proposed. We compared our DCT and IDCT algorithms with their conventional and MATLAB counterparts. Our experiments show that the Winograd-based DCT and IDCT algorithms consume considerably less CPU time than the conventional implementation and MATLAB, which makes them the preferred choice among the three. Our approach can also be employed to compress video sequences.

REFERENCES

[1] N.B. Venkateswarlu and P.S.V.S.K. Raju, “Winograd’s method: A perspective for some pattern recognition problems”, Pattern Recognition Letters, Vol. 15, No. 2, 1994, pp. 105-109.

[2] N.B. Venkateswarlu and P.S.V.S.K. Raju, “Winograd’s Inequality: A perspective for some PR problems”, Pattern Recognition Letters, 1991.

[3] R.C. Gonzalez and R.E. Woods, “Digital Image Processing”, 2nd Edition, Addison Wesley, USA, ISBN: 0-201-60078, 1993.

[4] The USC-SIPI image database (http://sipi.usc.edu/database), Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering.

[5] Rudra Pratap, “Getting Started with MATLAB: A Quick Introduction for Scientists and Engineers”, Version 6, Oxford University Press, 2003.

[6] Andrew B. Watson, “Image Compression Using the Discrete Cosine Transform”, Mathematica Journal, 4(1), 1994, pp. 81-88.

[7] D.L. Lee and M.A. Aboelaze, “Linear speedup of Winograd’s matrix multiplication algorithm using an array processor”, Distributed Memory Computing Conference, Proceedings of IEEE, 1991, pp. 427-430.

[8] R.P. Brent, “Algorithms for matrix multiplication”, Technical Report TR-CS-70-157, DCS, Stanford University, March 1970.

[9] Boyko Kakaradov, “Ultra-fast Matrix Multiplication: An Empirical Analysis of Highly Optimized Vector Algorithms”, Stanford Undergraduate Research Journal, 2004.

[10] Ken Cabeen and Peter Gent, “Image Compression and the Discrete Cosine Transform”, Math 45, College of the Redwoods.
