International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464 (Print), ISSN 0976 – 6472 (Online), Volume 3, Issue 1, January-June (2012), pp. 98-110, © IAEME: www.iaeme.com/ijecet.html
FAST DCT ALGORITHM USING WINOGRAD’S METHOD
Ch. Ramesh¹, Dr. N.B. Venkateswarlu², Dr. J.V.R. Murthy³
¹Professor, Dept. of CSE, AITAM, Tekkali, A.P., India
[email protected]
²Professor, Dept. of CSE, AITAM, Tekkali, A.P., India
[email protected]
³Professor, Dept. of CSE, College of Engineering, JNTUK, A.P., India
ABSTRACT
Applications of digital image communication have grown rapidly in recent years,
and discrete cosine transform (DCT) based algorithms are widely used to reduce
communication cost. Forward DCT and inverse DCT (IDCT) computations are reported
to take a long time, which can impede real-time response in some applications.
In this paper, we present Winograd’s matrix multiplication approach for forward
DCT and inverse DCT computation to reduce their CPU time. Experiments are made
with standard images and synthetic images.
Key Words: DCT, IDCT, Winograd’s, JPEG, MATLAB
I. INTRODUCTION
Discrete cosine transform (DCT) based algorithms such as JPEG and MP3 (which uses a
modified DCT) are among the most widely used in audio, image, and video data
compression. The DCT was originally developed by Ahmed, Natarajan, and Rao (1974).
Its application to image compression was pioneered by Chen and Pratt (1984). The
DCT is a technique for converting a signal into elementary frequency components; it
represents an image as a sum of sinusoids of varying magnitudes and frequencies.
The DCT has the property that, for a typical image, most of the visually significant
information about the image is concentrated in just a few coefficients of the DCT.
For this reason the DCT is often used in image compression applications [3]. The
cosine transform converts each block of spatial information into an efficient frequency space
representation that is better suited for compression. Specifically, the transform
produces an array of coefficients for real-valued basis functions that represent each
block of data in frequency space. The magnitude of the DCT coefficients exhibits a
distinct pattern within the array, where transform coefficients corresponding to the
lowest frequency basis functions usually have the highest magnitude and are the most
perceptually significant. Similarly, discrete cosine transform coefficients
corresponding to the highest frequency basis functions usually have the lowest
magnitude and are the least perceptually significant. In DCT based compression
methods, only important DCT coefficients are retained. Thus, we achieve
compression of data.
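The 2D-DCT equation (Eq-1) computes the (u, v)th entry of the DCT of an image [5]:

$$C(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \quad u, v = 0, 1, 2, \ldots, N-1 \tag{1}$$

$$\alpha(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0 \\ \sqrt{2/N} & \text{for } u > 0 \end{cases} \tag{2}$$

$$\alpha(v) = \begin{cases} \sqrt{1/N} & \text{for } v = 0 \\ \sqrt{2/N} & \text{for } v > 0 \end{cases} \tag{3}$$

f(x, y) is the (x, y)th element of the image represented by the matrix f. N is the size of
the block that the DCT is done on. The equation calculates one entry, the (u, v)th, of the
transformed image from the pixel values of the original image matrix.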
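As an illustration of Eq-1, the following is a minimal Python sketch that evaluates the double sum directly. The function names (`alpha`, `dct2_direct`) are hypothetical and chosen only for this example; the programs used in the experiments were written in C and are not reproduced here.

```python
import math

def alpha(k, n):
    """Normalization factor of Eq-2 / Eq-3."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2_direct(f):
    """2D DCT of a square block f (list of lists), evaluated term by term as in Eq-1."""
    n = len(f)
    c = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (f[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            c[u][v] = alpha(u, n) * alpha(v, n) * total
    return c

# A constant 8 x 8 block: all of its energy ends up in the coefficient C(0, 0).
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2_direct(flat)
```

For the constant block, C(0, 0) equals (1/8) x 64 x 10 = 80 and every other coefficient is zero up to rounding error, which matches the energy-compaction behaviour described in the introduction.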
The first coefficient C(0, 0) is termed the “DC coefficient” and the remaining coefficients
are called the “AC coefficients”. After performing the DCT, the remaining operations at
the sender side are quantization, zigzag scanning and encoding. The reverse operations at
the receiving side are decoding, inverse zigzag scanning, de-quantization and IDCT. As
these concepts are widely reported elsewhere, we skip their discussion for brevity.
The IDCT is a transform that converts a set of frequency coefficients back to an image
signal. It is performed on a 2-dimensional array of coefficients and results in a
2-dimensional array of samples.
The 2D-IDCT equation (Eq-4) computes the (x, y)th entry of an image [5]:

$$f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u,v)\, \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \quad x, y = 0, 1, 2, \ldots, N-1 \tag{4}$$

C(u, v) is the (u, v)th DCT coefficient of the image represented by the matrix C. N is the
size of the block that the IDCT is done on. The equation calculates one entry, the (x, y)th,
of the image from the coefficients of the DCT matrix.
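A short Python sketch can make the relationship between Eq-1 and Eq-4 concrete: applying the inverse transform to the forward transform should return the original block. All function names here are hypothetical, and a small 4 x 4 test block is assumed purely to keep the example fast.

```python
import math

def alpha(k, n):
    """Normalization factor of Eq-2 / Eq-3."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def cos_term(p, k, n):
    """cos((2p + 1) k pi / 2N), the cosine factor shared by Eq-1 and Eq-4."""
    return math.cos((2 * p + 1) * k * math.pi / (2 * n))

def dct2_direct(f):
    """Forward transform, Eq-1."""
    n = len(f)
    return [[alpha(u, n) * alpha(v, n)
             * sum(f[x][y] * cos_term(x, u, n) * cos_term(y, v, n)
                   for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2_direct(c):
    """Inverse transform, Eq-4; alpha(u) alpha(v) sits inside the double sum."""
    n = len(c)
    return [[sum(alpha(u, n) * alpha(v, n) * c[u][v]
                 * cos_term(x, u, n) * cos_term(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

# Round trip on an arbitrary 4 x 4 block (values chosen only for illustration).
block = [[(3 * x + 5 * y) % 17 for y in range(4)] for x in range(4)]
restored = idct2_direct(dct2_direct(block))
max_error = max(abs(restored[x][y] - block[x][y])
                for x in range(4) for y in range(4))
```

Here `max_error` is expected to be at the level of floating-point rounding, since Eq-4 is the exact inverse of Eq-1.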
This paper is organized as follows. Section II gives a brief overview of the DCT and
IDCT algorithms by the conventional approach. The proposed Winograd’s based DCT and
IDCT algorithms are described in section III. Experimental results are presented in
section IV. Finally, concluding remarks are given in section V.
II. Computational Complexity of Conventional DCT/IDCT
In the 2D-DCT (Eq-1), the cosine functions $\cos\left[\frac{(2x+1)u\pi}{2N}\right]$ and
$\cos\left[\frac{(2y+1)v\pi}{2N}\right]$ are computationally very expensive. Viewed as
N x N matrices, the second cosine term is the transpose of the first. Calculation of
$\cos\left[\frac{(2x+1)u\pi}{2N}\right]$ requires 4 multiplications, 1 addition, and 1
division to form its argument. For an 8 x 8 block (N = 8), the double sum in Eq-1 iterates
64 times for each element of the DCT matrix. Therefore the first cosine term requires 256
multiplications, 64 additions and 64 divisions per element, and 16384 multiplications,
4096 additions and 4096 divisions for all the elements of the DCT matrix. Both cosine
terms together require 32768 multiplications, 8192 additions and 8192 divisions. The way
to improve performance is therefore to pre-compute the cosine coefficients and read them
during the DCT algorithm. In this way, the calculation of each element of the DCT matrix
by Eq-1 requires 130 multiplications, 63 additions and 2 divisions. Similarly, the
calculation of all the elements of the 8x8 DCT matrix by Eq-1 requires 8320
multiplications, 4032 additions and 2 divisions. For the calculation of each element of
the IDCT matrix, Eq-4 requires 256 multiplications, 63 additions and 2 divisions, and for
all the elements of the 8x8 IDCT matrix it requires 16384 multiplications, 4032 additions
and 2 divisions. The IDCT thus requires more arithmetic operations than the DCT.
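The operation counts in this section follow from simple arithmetic on the 8 x 8 block size. The sketch below tallies them in Python; the variable names are hypothetical, and the per-term costs are the ones stated in the text above.

```python
N = 8
terms = N * N        # iterations of the double sum per output element
elements = N * N     # output elements per block

# Cost of forming one cosine argument (2x+1)*u*pi/(2N), as stated in the text.
cos_cost = {"mul": 4, "add": 1, "div": 1}

one_cos_per_element = {op: n * terms for op, n in cos_cost.items()}
one_cos_per_block = {op: n * elements for op, n in one_cos_per_element.items()}
both_cos_per_block = {op: 2 * n for op, n in one_cos_per_block.items()}

# With pre-computed cosines, Eq-1 needs two multiplications per term
# (f * cos * cos) plus two more for alpha(u) * alpha(v) * sum.
dct_mul_per_element = 2 * terms + 2
dct_add_per_element = terms - 1
dct_mul_per_block = dct_mul_per_element * elements
dct_add_per_block = dct_add_per_element * elements

# Eq-4 keeps alpha(u) * alpha(v) inside the sum: four multiplications per term.
idct_mul_per_element = 4 * terms
idct_mul_per_block = idct_mul_per_element * elements

print(both_cos_per_block)                      # multiplications, additions, divisions
print(dct_mul_per_block, dct_add_per_block)    # DCT with pre-computed cosines
print(idct_mul_per_block)                      # IDCT with pre-computed cosines
```

The tally gives 63 x 64 = 4032 additions per block for the pre-computed case, alongside the 8320 and 16384 multiplication counts quoted for the DCT and IDCT.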
III. Winograd’s Approach
Consider the calculation of the scalar (dot) product of two vectors X and Y:

$$X = [x_1, x_2, \ldots, x_N] \tag{5}$$

$$Y = [y_1, y_2, \ldots, y_N] \tag{6}$$

$$X^T Y = x_1 y_1 + x_2 y_2 + \cdots + x_N y_N \tag{7}$$

This calculation usually requires N multiplications and N − 1 additions. Winograd’s
algorithm [2, 5] is used in the literature to reduce these computations in applications
such as classification. According to [2],

$$X^T Y = \left[(x_1 + y_2)(x_2 + y_1) + (x_3 + y_4)(x_4 + y_3) + \cdots + (x_{N-1} + y_N)(x_N + y_{N-1})\right] - \left[x_1 x_2 + x_3 x_4 + \cdots + x_{N-1} x_N\right] - \left[y_1 y_2 + y_3 y_4 + \cdots + y_{N-1} y_N\right] \tag{8}$$

This can also be represented as below, assuming k = N/2:

$$X^T Y = \sum_{i=1}^{2k} x_i y_i = \sum_{u=1}^{k} (x_{2u-1} + y_{2u})(x_{2u} + y_{2u-1}) - \sum_{u=1}^{k} x_{2u-1}\, x_{2u} - \sum_{u=1}^{k} y_{2u-1}\, y_{2u} \tag{9}$$
In this form, if the last two terms are assumed to be pre-calculated, the dot product
needs only N/2 multiplications. In some applications the last two terms can be re-used,
which gives a computational benefit. We propose to use this idea in our DCT/IDCT
algorithms by extending it to matrix multiplication. Here we have assumed that N is
even, so that N/2 pairs are available. If N is odd, X and Y can simply be extended to
even length by appending one 0 at the end.
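The identity in Eq-8 and Eq-9 can be checked with a small Python sketch. The names `pair_sum` and `winograd_dot` are hypothetical; the sketch recomputes the two pair-sum terms on every call, whereas the saving described above assumes they are cached and re-used.

```python
def pair_sum(v):
    """v1*v2 + v3*v4 + ... for an even-length sequence (last two terms of Eq-9)."""
    return sum(v[i] * v[i + 1] for i in range(0, len(v), 2))

def winograd_dot(x, y):
    """Dot product via Eq-9; odd-length inputs are padded with one 0."""
    x, y = list(x), list(y)
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if len(x) % 2:
        x.append(0)
        y.append(0)
    # N/2 multiplications: one per pair of elements.
    cross = sum((x[i] + y[i + 1]) * (x[i + 1] + y[i])
                for i in range(0, len(x), 2))
    # In a real use these two terms would be pre-computed once per vector.
    return cross - pair_sum(x) - pair_sum(y)

even_result = winograd_dot([1, 2, 3, 4], [5, 6, 7, 8])
odd_result = winograd_dot([1, 2, 3], [4, 5, 6])
```

For the first pair of vectors the cross term is 49 + 121 = 170 and the two pair sums are 14 and 86, so the result is 70, the same as 1x5 + 2x6 + 3x7 + 4x8.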
Though Winograd’s algorithm [2, 5] reduces the actual computations involved, its
asymptotic computational complexity is the same as that of the naïve matrix
multiplication algorithm. In our DCT algorithm, we are required to carry out a series of
matrix multiplications. We propose to reduce CPU time requirements by carefully applying
Winograd’s method.
For the multiplication of two N x N square matrices A and B, Winograd’s algorithm is
defined as shown in equations (10) to (13) below.

$$C_{i,j} = (\text{row } i \text{ of } A) \cdot (\text{column } j \text{ of } B) \tag{10}$$

$$C_{i,j} = \sum_{K=1}^{N/2} (a_{i,2K-1} + b_{2K,j})(a_{i,2K} + b_{2K-1,j}) - A_i - B_j \tag{11}$$

$$A_i = \sum_{K=1}^{N/2} a_{i,2K-1} \cdot a_{i,2K} \tag{12}$$

$$B_j = \sum_{K=1}^{N/2} b_{2K-1,j} \cdot b_{2K,j} \tag{13}$$

Ai → sum of the pairwise products of consecutive elements in the ith row of A.
Bj → sum of the pairwise products of consecutive elements in the jth column of B.
Ci,j → ith row, jth column element of matrix C.
Since Ai and Bj are pre-computed once for each row of A and each column of B, they
require only N² multiplications in total. That is, calculating the pair-wise products of
any row or column of an N x N matrix needs N/2 multiplications, so N rows (or N columns)
need N x N/2 multiplications, and the rows of A and the columns of B together need N²
multiplications. The total number of multiplications needed to calculate the matrix
product becomes $\frac{1}{2}N^3 + N^2$, compared with $N^3$ for the naïve algorithm.
However, the number of additions and subtractions increases to
$\frac{3}{2}N^3 + 2N^2 - 2N$. Winograd’s algorithm can therefore be faster than the naïve
matrix multiplication algorithm when additions take much less CPU time than
multiplications. In DCT or IDCT computations the matrices are of size 8 x 8. For an 8x8
matrix, each Ai calculation requires 4 multiplications and 3 additions, and each Bj
calculation requires 4 multiplications and 3 additions. Each Ci,j calculation requires 4
multiplications, 11 additions (8 to form the bracketed sums and 3 to accumulate the four
products) and 2 subtractions. For the multiplication of two 8x8 matrices, 320
multiplications, 752 additions and 128 subtractions are required.
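Equations (11) to (13) can be sketched in Python as follows, with a counter for the multiplications. The function name `winograd_matmul` and the test matrices are hypothetical; square matrices of even size are assumed, and integer entries are used so that the comparison with the naïve product is exact.

```python
def winograd_matmul(a, b):
    """Product of two n x n matrices (n even) via Eq-11 to Eq-13.

    Returns the product and the number of multiplications performed.
    """
    n = len(a)
    half = n // 2
    # Eq-12: one term per row of A (n * n/2 multiplications).
    row_terms = [sum(a[i][2 * k] * a[i][2 * k + 1] for k in range(half))
                 for i in range(n)]
    # Eq-13: one term per column of B (n * n/2 multiplications).
    col_terms = [sum(b[2 * k][j] * b[2 * k + 1][j] for k in range(half))
                 for j in range(n)]
    mults = 2 * n * half
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Eq-11: n/2 multiplications per output element.
            cross = sum((a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                        for k in range(half))
            c[i][j] = cross - row_terms[i] - col_terms[j]
            mults += half
    return c, mults

n = 8
a = [[(i * 3 + j * 7) % 11 for j in range(n)] for i in range(n)]
b = [[(i * 5 + j * 2) % 13 for j in range(n)] for i in range(n)]
product, mult_count = winograd_matmul(a, b)
naive = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
```

For n = 8 the counter reaches 64 + 256 = 320, against 512 multiplications for the naïve triple loop.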
Now let us discuss how Winograd’s matrix multiplication method can be used with the
DCT. Eq-1 can be represented in matrix notation as [12]

$$C(u,v) = \alpha(u)\,\alpha(v) * (c_1 * f(x,y) * c_1^T) \tag{14}$$

f(x, y) → 8 x 8 image block.
c1 → 8 x 8 matrix belonging to the 1st cos function in Eq-1; c1 is constant for all the blocks.
c1T → 8 x 8 matrix belonging to the 2nd cos function in Eq-1; c1T is constant for all the blocks.
c1T is the transpose of c1.
We propose the following steps to calculate C(u, v):
1. Calculate the matrix product of c1 and f(x, y).
2. Calculate the product of the result of Step 1 and c1T.
3. Multiply each element (u, v) of the resultant matrix of Step 2 by the scalar α(u)α(v).
For the multiplication of the two matrices c1 and f(x, y), the Ai calculation requires 32
multiplications and 24 additions, the Bj calculation requires 32 multiplications and 24
additions, and the cij calculation requires 256 multiplications, 704 additions and 128
subtractions. In total, 320 multiplications, 752 additions and 128 subtractions are
required. For the multiplication of the resultant matrix (c1*f(x, y)) with c1T, the Ai
calculation requires 32 multiplications and 24 additions; Bj requires no calculation,
because Bj of c1T is the same as Ai of c1; and the cij calculation requires 256
multiplications, 704 additions and 128 subtractions. Therefore the calculation of the
term c1 * f(x, y) * c1T requires 608 multiplications, 1480 additions and 256
subtractions (128 for each of the two products). Ai for c1, or Bj for c1T, is constant
irrespective of the image block. Including the 128 multiplications and 4 divisions needed
for the α(u)α(v) scaling, the calculation of all the elements of the 8 x 8 DCT matrix
according to Eq-14 requires 736 multiplications, 1480 additions, 256 subtractions and 4
divisions. Thus, the number of arithmetic operations required is less than in the
conventional approach, which needs 8320 multiplications.
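The three steps above can be sketched in Python as below. This is only an illustrative sketch under stated assumptions: the names `C1`, `C1T`, `winograd_matmul` and `dct2_winograd` are hypothetical, and for brevity the sketch recomputes the Ai and Bj terms in every product rather than re-using the constant terms of c1 and c1T as the text describes.

```python
import math

N = 8  # block size used throughout the paper

def alpha(k):
    """Normalization factor of Eq-2 / Eq-3."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# c1[u][x] = cos((2x + 1) u pi / 2N); constant for every image block.
C1 = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
      for u in range(N)]
C1T = [list(col) for col in zip(*C1)]

def winograd_matmul(a, b):
    """N x N matrix product via Eq-11 to Eq-13 (N even)."""
    half = N // 2
    row_terms = [sum(a[i][2 * k] * a[i][2 * k + 1] for k in range(half))
                 for i in range(N)]
    col_terms = [sum(b[2 * k][j] * b[2 * k + 1][j] for k in range(half))
                 for j in range(N)]
    return [[sum((a[i][2 * k] + b[2 * k + 1][j]) * (a[i][2 * k + 1] + b[2 * k][j])
                 for k in range(half)) - row_terms[i] - col_terms[j]
             for j in range(N)]
            for i in range(N)]

def dct2_winograd(f):
    """Eq-14: scale c1 * f * c1T element-wise by alpha(u) * alpha(v)."""
    step1 = winograd_matmul(C1, f)        # Step 1: c1 * f
    step2 = winograd_matmul(step1, C1T)   # Step 2: (c1 * f) * c1T
    return [[alpha(u) * alpha(v) * step2[u][v] for v in range(N)]
            for u in range(N)]            # Step 3: scaling

# An arbitrary 8 x 8 block of pixel-like values, for illustration only.
block = [[(7 * x + 3 * y) % 256 for y in range(N)] for x in range(N)]
coeffs = dct2_winograd(block)
```

The result should agree with a direct evaluation of Eq-1 up to floating-point rounding, because Eq-14 is only a regrouping of the same double sum.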
Now let us discuss how Winograd’s method can be used with the IDCT. Eq-4 in matrix
notation is

$$f(x,y) = c_1^T * ((\alpha(u)\,\alpha(v) * C(u,v)) * c_1) \tag{15}$$

c1T → 8 x 8 matrix belonging to the 1st cos function in Eq-4; c1T is constant for all the blocks.
c1 → 8 x 8 matrix belonging to the 2nd cos function in Eq-4; c1 is constant for all the blocks.
c1T is the transpose of c1.
We propose the following steps to calculate f(x, y):
1. Multiply each element (u, v) of the matrix C(u, v) by the scalar α(u)α(v).
2. Calculate the product of the resultant matrix of Step 1 with c1.
3. Calculate the product of c1T with the resultant matrix of Step 2.
The α(u)α(v) * C(u, v) calculation requires 4 divisions and 128 multiplications. For
the multiplication of the two matrices c1T and (α(u)α(v) * C(u, v)), the Ai calculation
requires 32 multiplications and 24 additions, the Bj calculation requires 32
multiplications and 24 additions, and the cij calculation requires 256 multiplications,
704 additions and 128 subtractions. In total, 320 multiplications, 752 additions and 128
subtractions are required. For the multiplication of the resultant matrix
c1T * (α(u)α(v) * C(u, v)) with c1, the Ai calculation requires 32 multiplications and 24
additions; Bj requires no calculation, because Bj of c1 is the same as Ai of c1T; and the
cij calculation requires 256 multiplications, 704 additions and 128 subtractions. The
calculation of all the elements of the 8 x 8 image block according to Eq-15 therefore
requires 736 multiplications, 1480 additions, 256 subtractions and 4 divisions. The total
number of arithmetic operations required is less than in the conventional approach.
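The matrix form of Eq-15 can be sketched in Python in the same way. To keep the example short, a plain triple-loop product (the hypothetical helper `matmul`) stands in for Winograd’s multiplication here; it yields the same matrix products, so only the operation counts differ. All names are assumptions made for this illustration.

```python
import math

N = 8
ALPHA = [math.sqrt(1.0 / N)] + [math.sqrt(2.0 / N)] * (N - 1)

# c1[u][x] = cos((2x + 1) u pi / 2N) and its transpose.
C1 = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
      for u in range(N)]
C1T = [list(col) for col in zip(*C1)]

def matmul(a, b):
    """Plain N x N matrix product (stand-in for Winograd's method)."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def dct2_matrix(f):
    """Eq-14: element-wise alpha scaling of c1 * f * c1T."""
    core = matmul(matmul(C1, f), C1T)
    return [[ALPHA[u] * ALPHA[v] * core[u][v] for v in range(N)] for u in range(N)]

def idct2_matrix(c):
    """Eq-15: scale C(u, v) by alpha(u) * alpha(v), then form c1T * scaled * c1."""
    scaled = [[ALPHA[u] * ALPHA[v] * c[u][v] for v in range(N)] for u in range(N)]
    return matmul(C1T, matmul(scaled, c1 := C1))

# Round trip on an arbitrary block (values chosen only for illustration).
block = [[(11 * x + 5 * y) % 256 for y in range(N)] for x in range(N)]
restored = idct2_matrix(dct2_matrix(block))
max_error = max(abs(restored[x][y] - block[x][y])
                for x in range(N) for y in range(N))
```

The round trip returns the original block up to rounding error, which confirms that the scaling in Step 1 must be applied to the coefficients before the two matrix products.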
IV. EXPERIMENTAL WORK
In this study a number of images in TIFF format are used, including the widely used
Lena, Mandrill and Peppers images. Table 1 shows the complete details of the images used
in our study. All the images are taken from the USC-SIPI image database,
“http://sipi.usc.edu/database” [6].
Table 1: Details of images used in our study
S. No  Fig No  Image         Size       Type
1      1(a)    Chess         128x128    Gray
2      1(b)    Helmet        128x128    Gray
3      1(c)    X-ray         128x128    Gray
4      1(d)    Clock         256x256    Gray
5      1(e)    Moon surface  256x256    Gray
6      1(f)    Cameraman     256x256    Gray
7      1(g)    Lena          512x512    Gray
8      1(h)    Mandrill      512x512    Gray
9      1(i)    Peppers       512x512    Gray
10     1(j)    Man           1024x1024  Gray
11     1(k)    Airplane2     1024x1024  Gray
12     1(l)    Airport       1024x1024  Gray
13     1(m)    Flowers       2048x2048  Gray
14     1(n)    Flowers1      2048x2048  Gray
15     1(o)    City          2048x2048  Gray
16     2(a)    Couple        128x128    Color
17     2(b)    House         128x128    Color
18     2(c)    Jellybeans1   128x128    Color
19     2(d)    Girl1         256x256    Color
20     2(e)    Jellybeans    256x256    Color
21     2(f)    Tree          256x256    Color
22     2(g)    Girl2         512x512    Color
23     2(h)    Sailboat      512x512    Color
24     2(i)    Splash        512x512    Color
25     2(j)    Oakland       1024x1024  Color
26     2(k)    Richmond      1024x1024  Color
27     2(l)    Shreveport    1024x1024  Color
28     2(m)    Flowers       2048x2048  Color
29     2(n)    Flowers1      2048x2048  Color
30     2(o)    City          2048x2048  Color
Experiments are carried out on MS Windows XP Version 2002, SP3, and on Fedora 10 with
Linux kernel 2.6.27.5-117.fc10.i686. The system is equipped with an Intel Core 2 Duo 2.60
GHz processor and 1 GB RAM. Under Windows XP, programs are written in the C language
using Microsoft Visual Studio 2005 version 8.0. Under Linux, we have used GNU g++ 4.3.2.
MATLAB is a popular numerical computing environment and fourth-generation programming
language developed by MathWorks. The dct2() function in the Image Processing Toolbox
computes the two-dimensional discrete cosine transform (DCT) of an image. The idct2()
function in the same toolbox computes the two-dimensional inverse discrete cosine
transform (IDCT). We have used these functions as a baseline for our algorithms’
performance. In the Windows environment, the CPU time for DCT and IDCT is measured using
the function GetSystemTime(). In the UNIX environment, it is measured using the function
gettimeofday(). In MATLAB, it is measured using the function cputime.
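The measurement procedure can be illustrated with a small timing harness. The sketch below is in Python rather than the C used in the experiments, and the names `time_call` and `busy_work` are hypothetical. It records both process CPU time and wall-clock time, since GetSystemTime() and gettimeofday() report wall-clock time while MATLAB’s cputime reports CPU time, and the two can differ on a loaded machine.

```python
import time

def time_call(func, *args, repeats=5):
    """Return (best CPU seconds, best wall-clock seconds) over several runs."""
    best_cpu = best_wall = float("inf")
    for _ in range(repeats):
        cpu_start, wall_start = time.process_time(), time.perf_counter()
        func(*args)
        cpu_end, wall_end = time.process_time(), time.perf_counter()
        best_cpu = min(best_cpu, cpu_end - cpu_start)
        best_wall = min(best_wall, wall_end - wall_start)
    return best_cpu, best_wall

def busy_work(n):
    """Stand-in workload; a DCT routine would be timed the same way."""
    return sum(i * i for i in range(n))

cpu_seconds, wall_seconds = time_call(busy_work, 200_000)
```

Taking the best of several repeats is one common way to reduce the influence of other processes on short measurements; the paper does not state how many runs were averaged.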
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      1(a)    0.0313   0.0167        0.0035      8.942               4.771
2      1(b)    0.0313   0.0168        0.0035      8.942               4.8
3      1(c)    0.0312   0.0167        0.0035      8.914               4.771
4      1(d)    0.1250   0.0686        0.0140      8.928               4.9
5      1(e)    0.1248   0.0685        0.0141      8.851               4.858
6      1(f)    0.1250   0.0689        0.0141      8.865               4.886
7      1(g)    0.4688   0.2715        0.0550      8.523               4.936
8      1(h)    0.4683   0.2714        0.0530      8.835               5.120
9      1(i)    0.4688   0.2716        0.0550      8.523               4.938
10     1(j)    1.8288   1.0949        0.2250      8.128               4.866
11     1(k)    1.8290   1.0949        0.2250      8.128               4.866
12     1(l)    1.8281   1.0948        0.2240      8.161               4.8875
13     1(m)    7.1242   4.4053        0.9030      7.889               4.878
14     1(n)    7.1240   4.4051        0.9030      7.889               4.878
15     1(o)    7.1250   4.4055        0.9040      7.881               4.873
Table 2: CPU time in Secs for DCT Calculation (Windows Environment)-gray level images
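The speed-up columns are the ratio of a baseline time to the Winograd time. The following Python sketch recomputes them for three rows of Table 2; the helper name `speedup` and the choice of rows are only for illustration.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Speed-up factor: how many times faster the optimized run is."""
    return baseline_seconds / optimized_seconds

# (Fig No, MATLAB, Conventional, Winograd's) taken from Table 2.
rows = [
    ("1(a)", 0.0313, 0.0167, 0.0035),
    ("1(g)", 0.4688, 0.2715, 0.0550),
    ("1(m)", 7.1242, 4.4053, 0.9030),
]

for fig, matlab, conventional, winograd in rows:
    print(fig,
          round(speedup(matlab, winograd), 3),
          round(speedup(conventional, winograd), 3))
```

The recomputed ratios agree with the tabulated speed-up values to within rounding of the last digit.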
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      2(a)    0.0939   0.051         0.0105      8.942               4.857
2      2(b)    0.0939   0.0512        0.0105      8.942               4.876
3      2(c)    0.0939   0.0512        0.0105      8.942               4.876
4      2(d)    0.3273   0.2035        0.0406      8.061               5.012
5      2(e)    0.3280   0.2049        0.0407      8.058               5.034
6      2(f)    0.3282   0.205         0.042       7.814               4.880
7      2(g)    1.3124   0.8124        0.1672      7.849               4.858
8      2(h)    1.3125   0.8124        0.1673      7.845               4.855
9      2(i)    1.3125   0.8126        0.1673      7.845               4.857
10     2(j)    5.2521   3.3           0.68        7.723               4.852
11     2(k)    5.2560   3.301         0.68        7.729               4.854
12     2(l)    5.2500   3.2846        0.672       7.8125              4.887
13     2(m)    21.842   12.3595       2.7037      8.078               4.571
14     2(n)    21.6093  12.3592       2.7036      7.992               4.571
15     2(o)    21.8439  12.3633       2.7037      8.079               4.572
Table 3: CPU time in Secs for DCT Calculation (Windows Environment)-color images
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      1(a)    0.0313   0.014         0.0029      10.793              4.827
2      1(b)    0.0313   0.014         0.0029      10.793              4.827
3      1(c)    0.0312   0.014         0.0029      10.793              4.827
4      1(d)    0.125    0.0573        0.0117      10.683              4.897
5      1(e)    0.1248   0.0568        0.0116      10.758              4.896
6      1(f)    0.1250   0.0575        0.0117      10.683              4.914
7      1(g)    0.4688   0.2267        0.0466      10.060              4.864
8      1(h)    0.4683   0.2262        0.0465      10.070              4.864
9      1(i)    0.4688   0.2268        0.0468      10.017              4.846
10     1(j)    1.8288   0.9142        0.1871      9.774               4.886
11     1(k)    1.8290   0.9153        0.1873      9.765               4.886
12     1(l)    1.8281   0.9139        0.1870      9.775               4.887
13     1(m)    7.1242   3.6676        0.7500      9.498               4.890
14     1(n)    7.1240   3.6515        0.7479      9.525               4.882
15     1(o)    7.1250   3.6774        0.7546      9.442               4.873
Table 4: CPU time in Secs for DCT Calculation (UNIX Environment)-gray level images
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      2(a)    0.0939   0.0426        0.0087      10.793              4.896
2      2(b)    0.0939   0.0424        0.0087      10.793              4.873
3      2(c)    0.0939   0.0423        0.0086      10.918              4.918
4      2(d)    0.3273   0.1699        0.0339      9.654               5.011
5      2(e)    0.3280   0.1707        0.0349      9.398               4.891
6      2(f)    0.3282   0.1708        0.0350      9.377               4.88
7      2(g)    1.3124   0.6782        0.1382      9.496               4.907
8      2(h)    1.3125   0.6798        0.1396      9.401               4.869
9      2(i)    1.3125   0.6803        0.1412      9.295               4.817
10     2(j)    5.2521   2.7466        0.5618      9.348               4.888
11     2(k)    5.2560   2.7561        0.5620      9.352               4.904
12     2(l)    5.2500   2.7418        0.5610      9.358               4.887
13     2(m)    21.842   11.00         2.2393      9.753               4.912
14     2(n)    21.6093  10.9607       2.1900      9.867               5.004
15     2(o)    21.8439  11.0320       2.2569      9.678               4.888
Table 5: CPU time in Secs for DCT Calculation (UNIX Environment)-color images
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      1(a)    0.0313   0.0203        0.0031      10.09               6.54
2      1(b)    0.0313   0.0202        0.0032      9.78                6.31
3      1(c)    0.0313   0.0204        0.0032      9.78                6.37
4      1(d)    0.1406   0.0837        0.0129      10.89               6.48
5      1(e)    0.1407   0.0838        0.0129      10.90               6.48
6      1(f)    0.1406   0.0837        0.0128      10.98               6.53
7      1(g)    0.5314   0.3278        0.049       10.84               6.82
8      1(h)    0.5469   0.3279        0.050       10.93               6.55
9      1(i)    0.5313   0.3276        0.049       10.84               6.68
10     1(j)    2.1716   1.3184        0.2076      10.46               6.35
11     1(k)    2.1710   1.3183        0.2076      10.45               6.35
12     1(l)    2.1719   1.3185        0.2078      10.45               6.34
13     1(m)    8.6406   5.24          0.8292      10.42               6.31
14     1(n)    8.6094   5.241         0.8292      10.38               6.32
15     1(o)    8.5      5.239         0.8291      10.25               6.31
Table 6: CPU time in Seconds for IDCT Calculation (Windows Environment)-gray level images
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      2(a)    0.0938   0.0608        0.0092      10.19               6.60
2      2(b)    0.0938   0.0609        0.0091      10.30               6.69
3      2(c)    0.0938   0.0608        0.0092      10.19               6.60
4      2(d)    0.4218   0.2511        0.0388      10.87               6.47
5      2(e)    0.4216   0.251         0.0387      10.89               6.48
6      2(f)    0.424    0.253         0.0388      10.92               6.52
7      2(g)    1.5942   0.9833        0.147       10.84               6.68
8      2(h)    1.5944   0.9835        0.148       10.77               6.64
9      2(i)    1.5939   0.9857        0.148       10.76               6.66
10     2(j)    6.6578   3.9552        0.6227      10.69               6.35
11     2(k)    6.6589   3.9573        0.6229      10.69               6.35
12     2(l)    6.6564   3.9570        0.6226      10.69               6.35
13     2(m)    25.926   15.720        2.4876      10.42               6.31
14     2(n)    25.7641  15.692        2.4869      10.35               6.30
15     2(o)    25.7814  15.696        2.4871      10.36               6.31
Table 7: CPU time in Secs for IDCT Calculation (Windows Environment)-color images
Figure 4: CPU time in Secs for DCT Calculation
(Windows Environment)-color images.
Table 8: CPU time in Secs for IDCT Calculation (UNIX Environment)-gray level images
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      1(a)    0.0313   0.0177        0.0028      11.17               6.32
2      1(b)    0.0313   0.0180        0.0028      11.17               6.42
3      1(c)    0.0313   0.0179        0.0028      11.17               6.35
4      1(d)    0.1406   0.0735        0.0114      12.33               6.44
5      1(e)    0.1407   0.0736        0.0115      12.23               6.40
6      1(f)    0.1406   0.0735        0.0115      12.22               6.39
7      1(g)    0.5314   0.2852        0.041       12.96               6.95
8      1(h)    0.5469   0.2856        0.044       12.42               6.94
9      1(i)    0.5313   0.2851        0.041       12.95               6.95
10     1(j)    2.1716   1.1466        0.1807      12.02               6.34
11     1(k)    2.1710   1.1462        0.1806      12.02               6.34
12     1(l)    2.1719   1.1467        0.1809      12.00               6.33
13     1(m)    8.6406   4.5770        0.7211      11.98               6.34
14     1(n)    8.6094   4.5640        0.7210      11.94               6.33
15     1(o)    8.5      4.5636        0.7210      11.78               6.32
S. No  Fig No  MATLAB   Conventional  Winograd’s  Speed-up vs MATLAB  Speed-up vs Conventional
1      2(a)    0.0938   0.0532        0.0084      11.16               6.33
2      2(b)    0.0938   0.0531        0.0083      11.30               6.39
3      2(c)    0.0938   0.0531        0.0083      11.30               6.39
4      2(d)    0.4218   0.2206        0.0343      12.29               6.43
5      2(e)    0.4216   0.2205        0.0342      12.32               6.44
6      2(f)    0.424    0.2205        0.0342      12.39               6.44
7      2(g)    1.5942   0.8557        0.124       12.85               6.90
8      2(h)    1.5944   0.8558        0.124       12.45               6.90
9      2(i)    1.5939   0.8560        0.132       12.95               6.95
10     2(j)    6.6578   3.4397        0.5422      12.27               6.34
11     2(k)    6.6589   3.4399        0.5421      12.28               6.34
12     2(l)    6.6564   3.4392        0.5320      12.59               6.46
13     2(m)    25.926   13.692        2.163       11.98               6.33
14     2(n)    25.7641  13.679        2.162       11.91               6.32
15     2(o)    25.7814  13.679        2.163       11.91               6.32
Table 9: CPU time in Secs for IDCT Calculation (UNIX Environment)-color images
Figure 3: CPU time in Secs for DCT Calculation (Windows Environment)-gray level images
[Bar chart: CPU time in seconds (axis 0 to 8) versus image size (128x128, 256x256, 512x512, 1024x1024, 2048x2048); series: Matlab, Conventional, Winograd]
Figure 5: CPU time in Secs for DCT Calculation (UNIX Environment)-gray level images
Figure 6: CPU time in Secs for DCT Calculation (UNIX Environment)-color images
Figure 7: CPU time in Secs for IDCT Calculation (Windows Environment)-gray level images
Figure 8: CPU time in Secs for IDCT Calculation (Windows Environment)-Color images
Figure 9: CPU time in Secs for IDCT Calculation (UNIX Environment)-gray level images
Figure 10: CPU time in Secs for IDCT Calculation (UNIX Environment)-color images
Our Winograd’s based DCT algorithm consistently takes less CPU time than the
conventional algorithm and the MATLAB function. The CPU time for our IDCT algorithm is
also much lower than that of the MATLAB function and the conventional approach.
Tables 2 and 3 show the CPU time for DCT in Windows XP with gray and color images. Our
algorithm consistently gives better results than the MATLAB routines and the
conventional algorithm. We get a speed-up of about 7.7 to 8.9 compared to MATLAB and
about 4.6 to 5.1 compared to the conventional algorithm.
Tables 4 and 5 show the CPU time for DCT in UNIX with gray and color images. Our
algorithm consistently gives better results than the MATLAB routines and the
conventional algorithm. We get a speed-up of about 9.3 to 10.9 compared to MATLAB and
about 4.8 to 5.0 compared to the conventional algorithm.
Tables 6 and 7 show the CPU time for IDCT in Windows XP with gray and color images. Our
algorithm consistently gives better results than the MATLAB routines and the
conventional algorithm. We get a speed-up of about 9.8 to 11.0 compared to MATLAB and
about 6.3 to 6.8 compared to the conventional algorithm.
Tables 8 and 9 show the CPU time for IDCT in UNIX with gray and color images. Our
algorithm consistently gives better results than the MATLAB routines and the
conventional algorithm. We get a speed-up of about 11.2 to 13.0 compared to MATLAB and
about 6.3 to 7.0 compared to the conventional algorithm.
The CPU time for DCT and IDCT calculations for a color image is around 3 times that for
a gray-level image of the same size.
The DCT and IDCT calculations run around 15% faster in the UNIX environment than in the
Windows environment.
V. CONCLUSIONS
In this paper, Winograd based fast DCT and IDCT algorithms were proposed. We have
compared our DCT and IDCT algorithms with their conventional and MATLAB counterparts.
From our experiments it is evident that our Winograd’s based DCT and IDCT algorithms
consume considerably less CPU time than the conventional implementation and MATLAB. Our
approach can also be employed to compress video sequences.
REFERENCES
[1] N.B. Venkateswarlu and P.S.V.S.K. Raju, “Winograd’s method: A perspective for some pattern recognition problems”, Pattern Recognition Letters, Vol. 15, No. 2, pp. 105-109, 1994.
[2] N.B. Venkateswarlu and P.S.V.S.K. Raju, “Winograd’s Inequality: A perspective for some PR problems”, Pattern Recognition Letters, 1991.
[3] R.C. Gonzalez and R.E. Woods, “Digital Image Processing”, 2nd Edition, Addison Wesley, USA, ISBN: 0-201-60078, 1993.
[4] The USC-SIPI image database (http://sipi.usc.edu/database), Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering.
[5] Rudra Pratap, “Getting Started with MATLAB: A Quick Introduction for Scientists and Engineers”, Version 6, Oxford University Press, 2003.
[6] Andrew B. Watson, “Image Compression Using the Discrete Cosine Transform”, Mathematica Journal, 4(1), 1994, pp. 81-88.
[7] D.L. Lee and M.A. Aboelaze, “Linear speedup of Winograd’s matrix multiplication algorithm using an array processor”, Distributed Memory Computing Conference, 1991, Proceedings of IEEE, pp. 427-430.
[8] R.P. Brent, “Algorithms for matrix multiplication”, Technical Report TR-CS-70-157, DCS, Stanford University (March 1970).
[9] Boyko Kakaradov, “Ultra-fast Matrix Multiplication: An Empirical Analysis of Highly Optimized Vector Algorithms”, Stanford Undergraduate Research Journal, 2004.
[10] Ken Cabeen and Peter Gent, “Image Compression and the Discrete Cosine Transform”, Math 45, College of the Redwoods.