© 2016, IJARCSSE All Rights Reserved Page | 441
Volume 6, Issue 9, September 2016, pp. 441-448 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com
Implementation of 8-Point Approximate DCT for Image Compression
D. Vaithiyanathan*, Dheepansundaravelu P.
Department of Electronics and Communication Engineering, College of Engineering Guindy, Anna University, Chennai,
Tamil Nadu, India
Abstract— The Discrete Cosine Transform (DCT) is the basic operation used by video and image compression standards such as JPEG, MPEG-1/2/4, H.261/H.263 and others. In this paper, an 8-point approximate DCT compression scheme is implemented. Low computational complexity is achieved by eliminating the need for multiplication and shift operations, so the transform can be implemented with additions alone. At the same cost of 14 additions, this approximate DCT achieves a higher Peak Signal to Noise Ratio (PSNR) than the modified Cintra-Bayer approximation (CB-2012), with a PSNR close to that of approximations that require more additions.
Keywords— Discrete Cosine Transform, DCT approximation, Low Complexity, Image Compression, Hardware
Implementation, Pipelining, Field Programmable Gate Arrays (FPGAs)
I. INTRODUCTION
Multimedia data processing, which touches almost every aspect of daily life, such as communication, broadcasting, data search, advertisement and video games, has become an integral part of our lifestyle. The most significant part of multimedia systems is applications involving images or video, which require computationally intensive data processing. Moreover, as the use of mobile devices grows rapidly, there is a growing demand for multimedia applications that run on these portable devices. Compression techniques are widely used to reduce the volume of multimedia data sent over wireless channels. The efficiency of a transformation scheme can be gauged directly by its ability to pack input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing noticeable visual distortion in the reconstructed image.
In image compression, the image data is divided up into 8x8 blocks of pixels. From this point on, each colour
component is processed independently, so a "pixel" means a single value, even in a colour image. A DCT is applied to
each 8x8 block. DCT converts the spatial image representation into a frequency map: the low-order or "DC" term
represents the average value in the block, while successive higher-order "AC" terms represent the strength of more and
more rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave
alternating from maximum to minimum at adjacent pixels.
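To make the DC/AC description concrete, here is a minimal sketch, in pure Python, of the exact orthonormal 8-point DCT-II applied to an 8x8 block. The helper names (`dct_matrix`, `dct2`) are ours, not the paper's. With this orthonormal scaling, the DC coefficient of a block equals 8 times the block average, and a flat block carries no AC energy at all.

```python
import math

N = 8  # block size used throughout the paper


def dct_matrix(n=N):
    """Exact orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """2-D DCT of an N x N block: C . A . C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A flat block: all of its energy lands in the DC term (8 x the average).
flat = [[100.0] * N for _ in range(N)]
coeffs = dct2(flat)
dc = coeffs[0][0]
largest_ac = max(abs(coeffs[u][v]) for u in range(N) for v in range(N)
                 if (u, v) != (0, 0))
print(round(dc, 6), largest_ac < 1e-9)   # 800.0 True
```

Any non-flat block spreads some energy into the AC terms; the closer a block is to flat, the more of its energy stays in the low-order coefficients.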
Many efficient DCT algorithms have been developed, and the literature on them is extensive. Although fast algorithms can significantly reduce the computational complexity of the DCT, they still require floating-point operations. Despite their accuracy, floating-point operations are expensive in terms of circuit complexity and power consumption. Therefore, minimizing the number of floating-point operations is a desirable property of a fast algorithm. One way of circumventing this issue is to use approximate transforms. The approximate DCT methods under consideration are (i) the transform in [1]; (ii) the 2008 Bouguezel-Ahmad-Swamy (BAS) DCT approximation [2]; (iii) the BAS parametric transform for image compression [3]; (iv) the Cintra-Bayer (CB) approximate DCT based on the rounding-off function [4]; and (v) the modified CB approximate DCT [5].
II. APPROXIMATE DCT TRANSFORM
The reduction in computational complexity comes at the cost of PSNR. Another important parameter is the number of retained coefficients; following Bouguezel et al., this paper retains 10 coefficients. All the methods considered here use a transformation matrix that can be written in the following format:
[Diagonal matrix] x [Low complexity matrix]
The diagonal matrix usually contains irrational numbers of the form 1/√m, where m is a small positive integer. Because this diagonal scaling can typically be merged into the quantization step, the complexity of the approximation is bounded by the complexity of the low-complexity matrix [1]. Since the entries of the low-complexity matrix are restricted to {0, ±1/2, ±1, ±2}, i.e., zero or powers of two, no multiplications are required [1]. Many papers [1-12] derive new low-complexity approximate DCTs. To this end, these works search the space of 8x8 matrices for candidate matrices with low computational cost. Let us define the cost of a transformation matrix as the number of arithmetic operations required to apply it. One way to guarantee good candidates is to restrict the search to matrices whose entries do not require multiplication operations.
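A minimal sketch of this cost measure, assuming direct row-by-row evaluation of a matrix whose entries are 0, +1 or -1: a row with k nonzero entries needs k - 1 additions or subtractions and no multiplications. As an illustration, it is applied to the signed DCT (SDCT), i.e. the sign of every entry of the exact DCT matrix. The function names are hypothetical.

```python
import math


def naive_addition_count(t):
    """Additions/subtractions for computing y = T x row by row, assuming
    every entry of T is 0, +1 or -1, so no multiplications are needed.
    A row with k nonzero entries costs k - 1 additions."""
    total = 0
    for row in t:
        nonzeros = sum(1 for entry in row if entry != 0)
        total += max(nonzeros - 1, 0)
    return total


def signed_dct(n=8):
    """Signed DCT (SDCT): the sign of each entry of the exact DCT-II matrix.
    For n = 8 no exact DCT entry is zero, so every entry becomes +1 or -1."""
    return [[1 if math.cos(math.pi * (2 * x + 1) * k / (2 * n)) > 0 else -1
             for x in range(n)] for k in range(n)]


identity = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
print(naive_addition_count(signed_dct()))  # 56
print(naive_addition_count(identity))      # 0
```

Direct evaluation does not share partial sums between rows; factoring a matrix into butterfly stages does, which is why an approximation can be evaluated with fewer additions than this direct count.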
An important parameter in the image compression routine is the number of retained coefficients in the transform
domain. In several applications, the number of retained coefficients is very low. For instance, considering 8x8 image
blocks,
1. In image compression using support vector machines, only the first 8-16 coefficients were considered.
2. Mandyam et al. proposed a method for image reconstruction based on only three coefficients.
3. Bouguezel et al. employed only 10 DCT coefficients when assessing image compression methods.
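Hence, the low-complexity DCT approximation used in this work is given by [1]:
$\mathbf{C}^{*} = \mathbf{D}^{*}\,\mathbf{T}^{*}$ …(1)
where $\mathbf{T}^{*}$ is the 8x8 low-complexity matrix of [1], whose entries are restricted to {0, ±1}. The diagonal matrix is given by
$\mathbf{D}^{*} = \operatorname{diag}\left(\tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2}, \tfrac{1}{\sqrt{2}}\right)$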
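Because the entries of T* are not printed in this text, the sketch below uses our transcription of the 14-addition matrix published in [1]; treat those entries as an assumption and check them against [1] before reuse. It derives each diagonal entry of D* as 1/√m, where m is the squared norm of the corresponding row of T*, and checks that C* = D* T* is orthonormal, so that its transpose can serve as the inverse.

```python
import math

# ASSUMPTION: 0/+1/-1 matrix transcribed from [1]; it is not printed in
# this paper, so verify the entries against [1] before reuse.
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [0,  1,  0,  0,  0,  0, -1,  0],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0,  0,  0,  1, -1,  0,  0,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  1,  0,  0, -1,  0,  0],
]


def squared_row_norms(t):
    """m_k = squared norm of row k, so that D* = diag(1 / sqrt(m_k))."""
    return [sum(v * v for v in row) for row in t]


M = squared_row_norms(T_STAR)
D_STAR = [1.0 / math.sqrt(m) for m in M]

# C* = D* . T*: scale row k of T* by the k-th diagonal entry of D*.
C_STAR = [[D_STAR[k] * v for v in T_STAR[k]] for k in range(8)]


def is_orthonormal(c, tol=1e-12):
    """Check C . C^T == I, so the inverse transform is simply C^T."""
    n = len(c)
    for i in range(n):
        for j in range(n):
            dot = sum(c[i][k] * c[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


print(M)                       # [8, 2, 4, 2, 8, 2, 4, 2]
print(is_orthonormal(C_STAR))  # True
```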
The transform matrix T* requires addition operations only, so multiplications and shift operations are eliminated entirely, reducing the arithmetic complexity. The transform is defined for 8x8 blocks, and the same approach can be extended to 16x16, 32x32 and larger sizes. Compared with other approximate DCTs such as CB-2011 and BAS-2011, the transform of equation (1) [1] needs only 14 additions (versus 22 and 16, respectively), at the cost of a PSNR that is about 0.4 to 2.2 dB lower in the tests of Table III. The arithmetic complexity of the approximate methods is compared in Table I.
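The 14-addition count can be checked with a butterfly factorization. The sketch below assumes a transcription of T* from [1] (an assumption; verify the entries against [1]). A first stage of 8 symmetric sums and differences is followed by 6 further additions for the even rows, while each odd row simply selects one stage-1 difference. The addition counter is a hand tally kept next to each stage, and the result is compared with a direct matrix-vector product.

```python
# ASSUMPTION: 0/+1/-1 matrix transcribed from [1]; it is not printed in
# this paper, so verify the entries against [1] before reuse.
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [0,  1,  0,  0,  0,  0, -1,  0],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0,  0,  0,  1, -1,  0,  0,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  1,  0,  0, -1,  0,  0],
]


def direct(t, x):
    """Reference result: plain matrix-vector product T x."""
    return [sum(c * v for c, v in zip(row, x)) for row in t]


def fast_t_star(x):
    """Compute T* x with butterflies; returns (y, number of additions).
    The counter is a hand tally of the +/- operations written below."""
    adds = 0
    # Stage 1: symmetric sums and antisymmetric differences (8 operations).
    a = [x[i] + x[7 - i] for i in range(4)]
    b = [x[i] - x[7 - i] for i in range(4)]
    adds += 8
    # Stage 2, even rows 0, 2, 4, 6 (6 operations).
    s03 = a[0] + a[3]
    s12 = a[1] + a[2]
    y0 = s03 + s12
    y4 = s03 - s12
    y2 = a[0] - a[3]
    y6 = a[2] - a[1]
    adds += 6
    # Odd rows 1, 3, 5, 7 reuse stage-1 differences directly (0 operations).
    y1, y3, y5, y7 = b[1], b[0], b[3], b[2]
    return [y0, y1, y2, y3, y4, y5, y6, y7], adds


x = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of pixel values
y, adds = fast_t_star(x)
print(y == direct(T_STAR, x), adds)    # True 14
```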
Table I. Comparison of Computational Complexity
Transform           Additions   Multiplications   Shifts   Total
BAS-2008 [2]        21          0                 0        21
BAS-2011 [3]        16          0                 0        16
CB-2011 [4]         22          0                 0        22
CB-2012 [5]         14          0                 0        14
Transform in [1]    14          0                 0        14
Transform in [10]   14          0                 0        14
III. IMPLEMENTATION OF DCT PROCESS
For quality analysis, images were submitted to a JPEG-like image compression technique. The resulting compressed images are then assessed for degradation relative to the original input image. Thus, 2D versions of the discussed methods are required. The 2D transform of an 8x8 image block A is given by [1]:
$\mathbf{T} \cdot \mathbf{A} \cdot \mathbf{T}^{\top}$ …(2)
where T is the transformation under consideration. The input images were divided into 8x8 sub-blocks, which were submitted to the 2D transform. For each block, this computation yields 64 coefficients in the approximate transform domain. Following the standard zigzag sequence, only the first few coefficients of each block were retained and used for image reconstruction, which corresponds to high compression. All the remaining coefficients were set to zero. The inverse procedure was then applied to reconstruct the image [1-12].
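A minimal sketch of equation (2) and its inverse, using the normalized matrix C* = D* T* as the transformation T. Since C* is orthonormal, the inverse 2D transform is C*ᵀ · Y · C*. The 0/±1 entries are our transcription of T* from [1] and are an assumption; the test block is hypothetical.

```python
import math

# ASSUMPTION: 0/+1/-1 matrix transcribed from [1]; verify against [1].
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [0,  1,  0,  0,  0,  0, -1,  0],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0,  0,  0,  1, -1,  0,  0,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  1,  0,  0, -1,  0,  0],
]

# C* = D* T*: dividing each row by its norm applies the diagonal scaling.
C = [[v / math.sqrt(sum(e * e for e in row)) for v in row] for row in T_STAR]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward_2d(block):
    """Equation (2) with T = C*: Y = C . A . C^T."""
    return matmul(matmul(C, block), transpose(C))


def inverse_2d(coeffs):
    """C* is orthonormal, so A = C^T . Y . C."""
    return matmul(matmul(transpose(C), coeffs), C)


# Hypothetical 8x8 block of level-shifted pixel values.
block = [[(3 * i + 5 * j) % 17 - 8 for j in range(8)] for i in range(8)]
restored = inverse_2d(forward_2d(block))
err = max(abs(restored[i][j] - block[i][j]) for i in range(8) for j in range(8))
print(err < 1e-9)   # True: without trimming, the round trip is exact
```

Any loss in the scheme described above therefore comes from quantization and from discarding coefficients, not from the transform pair itself.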
A. Overall process of the DCT implementation
The process of compressing and reconstructing an image with the approximate DCT is shown in Fig. 1 and Fig. 2. The input image, e.g. the 256x256 Lena image, is divided into 8x8 sub-blocks. Each sub-block is then transformed with the low-complexity approximate DCT matrix of equation (1) [1].
Fig. 1. Image compression process Flow
Fig. 2. Image Reconstruction process Flow
The transform yields 64 DCT coefficients per block. These coefficients are quantized using the standard JPEG quantization matrix given in equation (3). The standard quantization matrix is preferred because it offers a good balance between image quality and compression. The quantized coefficients are then trimmed using the zigzag sequence, keeping only the first 10 DCT coefficients, as proposed by Bouguezel et al.
Q = … (3)
These 10 trimmed, quantized DCT coefficients are used to reconstruct the image. De-quantization uses the same standard quantization matrix of equation (3). The inverse DCT is then performed using the same low-complexity transform matrix, and all the 8x8 sub-blocks are merged to form the reconstructed image.
B. Verilog Implementation Process flow:
The basic flow of the Verilog implementation is shown in Fig. 3. The 256x256 grayscale input image is converted to .coe file format using MATLAB and loaded into the block RAM of the hardware core described in Verilog HDL. The Read/Write controller reads the block RAM into an internal 256x256 array. To process the image one 8x8 block at a time, an 8x8 block read controller reads the blocks of the internal array one by one until all 1024 blocks have been read [13-18].
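The block read and write controllers amount to iterating over block origins. The sketch below is hypothetical: it assumes raster order (the text only says the blocks are read one by one) and models the internal array as a Python list of rows; the function names are ours.

```python
def block_origins(height=256, width=256, size=8):
    """Top-left (row, col) of every size x size block, in raster order.
    Raster order is an assumption; the paper only says blocks are read
    one by one."""
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield r, c


def read_block(image, r, c, size=8):
    """Copy one block out of the internal array (a list of rows)."""
    return [row[c:c + size] for row in image[r:r + size]]


def write_block(image, block, r, c):
    """Write a processed block back into the internal array in place."""
    for i, row in enumerate(block):
        image[r + i][c:c + len(row)] = row


origins = list(block_origins())
print(len(origins), origins[:3])   # 1024 [(0, 0), (0, 8), (0, 16)]

# Round trip on a hypothetical 256x256 image: reading and writing every
# block unchanged reproduces the original array.
image = [[(7 * r + 3 * c) % 256 for c in range(256)] for r in range(256)]
copy = [[0] * 256 for _ in range(256)]
for r, c in origins:
    write_block(copy, read_block(image, r, c), r, c)
print(copy == image)   # True
```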
The output of the 8x8 block read controller is given to the block-to-parallel converter, which turns each 8x8 block into 64 parallel inputs. During this conversion, the pixel values are shifted from the range 0 to 255 to the range -128 to 127, because the DCT core operates on signed inputs in this range. The 64 parallel inputs are then sent to the DCT core together, in a single step.
The 64 outputs of the DCT core are fed to the IDCT core, which produces the reconstructed 8x8 block. The 64 parallel outputs of the IDCT core are converted back to 8x8 block format by the parallel-to-block converter, which also shifts the values back to the range 0 to 255. The 8x8 block write controller then writes each block from the parallel-to-block converter into the internal array. After all 1024 blocks have been processed, the Read/Write controller writes the internal array back to the block RAM.
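A sketch of the two level shifts performed by the converters. Rounding and clamping after the IDCT are assumptions on our part, since the text does not say how fractional or out-of-range values are handled; note that Python's built-in `round` breaks ties to the even integer. The function names are hypothetical.

```python
def to_signed(pixels):
    """Level shift before the DCT core: 0..255 -> -128..127."""
    return [p - 128 for p in pixels]


def to_unsigned(values):
    """Inverse shift after the IDCT core: round, add 128, clamp to 0..255.
    The clamp is an assumption; IDCT outputs can overshoot the pixel range."""
    return [min(255, max(0, int(round(v)) + 128)) for v in values]


print(to_signed([0, 128, 255]))           # [-128, 0, 127]
print(to_unsigned([-128.4, 0.2, 130.0]))  # [0, 128, 255]
```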
IV. DCT & IDCT CORE ARCHITECTURE
The DCT and IDCT cores used in the compression and reconstruction process are designed for 64 inputs, i.e., one 8x8 block, because the approximate DCT matrix is 8x8.
A. Discrete Cosine Transform (DCT) CORE architecture
The architecture of the DCT core is shown in Fig. 4. The DCT core receives 64 parallel input lines, each a 13-bit signed value (the 13th bit is the sign bit) in the range -128 to 127. The 2D DCT is implemented with the row-column decomposition. The data is first transformed row by row. The row-transformed data is then passed to a transpose buffer, which turns rows into columns, and the column transformation produces the DCT coefficients. The same 1D approximate DCT of equation (1) is used for both the row and the column transformation, since the 2D transform of equation (2) applies the same matrix to the rows and to the columns of the block.
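The row-column scheme with a transpose buffer computes the same result as equation (2). The sketch below checks this in exact integer arithmetic on a hypothetical block, using only the unscaled 0/±1 matrix T*; the D* scaling is left out, assuming it is applied elsewhere (for example, folded into quantization), which the text does not state. The entries of T* are our transcription from [1] and should be verified.

```python
# ASSUMPTION: 0/+1/-1 matrix transcribed from [1]; verify against [1].
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [0,  1,  0,  0,  0,  0, -1,  0],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0,  0,  0,  1, -1,  0,  0,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  1,  0,  0, -1,  0,  0],
]


def apply_t(vector):
    """1-D transform y = T* x, without the D* scaling."""
    return [sum(t * v for t, v in zip(row, vector)) for row in T_STAR]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_column_2d(block):
    """Row pass, transpose buffer, column pass, as in the DCT core."""
    after_rows = [apply_t(row) for row in block]      # A . T^T
    buffered = transpose(after_rows)                  # transpose buffer
    after_cols = [apply_t(col) for col in buffered]   # (T . A . T^T)^T
    return transpose(after_cols)                      # T . A . T^T


# Hypothetical block of level-shifted pixels in -128..127.
block = [[(5 * i + 3 * j) % 23 - 11 for j in range(8)] for i in range(8)]
reference = matmul(matmul(T_STAR, block), transpose(T_STAR))
print(row_column_2d(block) == reference)   # True
```

In hardware the final transpose is usually implicit in how the column outputs are read out; it is written explicitly here only to compare against the matrix form.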
Fig.3. Verilog Implementation Process flow
Fig.4. DCT core Architecture
Fig. 5 IDCT core Architecture
After the DCT, the coefficients are quantized using the standard quantization matrix given in equation (3). Quantization divides each element of the transformed block D by the corresponding element of the quantization matrix and rounds the result to the nearest integer:
$C_{i,j} = \operatorname{round}\left( D_{i,j} / Q_{i,j} \right)$ …(4)
where D is the matrix of DCT coefficients of the transformed block and Q is the quantization matrix.
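A sketch of equation (4). Equation (3) is not reproduced in this text, so the table below assumes that the "standard JPEG quantization matrix" is the luminance table of ITU-T T.81, Annex K (Table K.1); rounding ties away from zero is also our assumption. The coefficient values are hypothetical.

```python
import math

# ASSUMPTION: equation (3) is taken to be the JPEG luminance table of
# ITU-T T.81, Annex K (Table K.1).
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    """Round to the nearest integer; ties go away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(d, q=Q):
    """Equation (4): C[i][j] = round(D[i][j] / Q[i][j])."""
    return [[round_half_away(d[i][j] / q[i][j]) for j in range(8)]
            for i in range(8)]


# Hypothetical DCT output: a few low-frequency values, zeros elsewhere.
d = [[0.0] * 8 for _ in range(8)]
d[0][0], d[0][1], d[1][0] = -415.0, -30.0, 6.0
c = quantize(d)
print(c[0][0], c[0][1], c[1][0])   # -26 -3 1
```

The large divisors toward the bottom-right of the table push high-frequency coefficients to zero, which is what makes the subsequent trimming cheap in quality terms.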
The quantized DCT coefficients are then trimmed: only a few coefficients are kept and the rest are set to zero. Trimming follows the zigzag order; in our case, only the first 10 DCT coefficients are kept.
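A sketch of zigzag trimming: the zigzag order visits the anti-diagonals r + c = 0, 1, 2, … in turn, alternating direction, and trimming keeps only the first `count` positions. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal s = r + c. On odd diagonals the scan moves
    # down-left (row increasing); on even diagonals it moves up-right.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))


def keep_first(coeffs, count=10):
    """Zero every coefficient except the first `count` in zigzag order."""
    n = len(coeffs)
    kept = set(zigzag_order(n)[:count])
    return [[coeffs[r][c] if (r, c) in kept else 0 for c in range(n)]
            for r in range(n)]


order = zigzag_order()
print([8 * r + c for r, c in order[:10]])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]

ones = [[1] * 8 for _ in range(8)]
trimmed = keep_first(ones)
print(sum(map(sum, trimmed)))               # 10
```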
B. Inverse Discrete Cosine Transform (IDCT) CORE Architecture
The architecture of the IDCT core is shown in Fig. 5. The input to the IDCT core is the 64 trimmed, quantized DCT coefficients from the DCT core. De-quantization, the first step of decompression, is performed first using the same quantization matrix:
$R_{i,j} = Q_{i,j} \times C_{i,j}$ …(5)
The 2D inverse DCT is then performed with the same row-column decomposition as the forward DCT.
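A sketch of equation (5) together with equation (4), showing that quantization followed by de-quantization changes each coefficient by at most half a quantization step. The uniform step of 16 and the coefficient block are hypothetical values chosen for illustration, not the paper's matrix, and the tie-breaking rule is an assumption.

```python
import math


def round_half_away(x):
    """Round to nearest integer, ties away from zero (assumed rounding rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(d, q):
    """Equation (4): C[i][j] = round(D[i][j] / Q[i][j])."""
    return [[round_half_away(d[i][j] / q[i][j]) for j in range(len(d[0]))]
            for i in range(len(d))]


def dequantize(c, q):
    """Equation (5): R[i][j] = Q[i][j] * C[i][j]."""
    return [[q[i][j] * c[i][j] for j in range(len(c[0]))]
            for i in range(len(c))]


# Hypothetical uniform step of 16 and an arbitrary coefficient block.
q = [[16] * 8 for _ in range(8)]
d = [[(37 * i - 11 * j) * 1.5 for j in range(8)] for i in range(8)]
r = dequantize(quantize(d, q), q)
worst = max(abs(d[i][j] - r[i][j]) for i in range(8) for j in range(8))
print(worst <= 8)   # True: the error is at most half a step (16 / 2)
```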
V. MATLAB RESULTS AND COMPARISONS
The compression performance is evaluated for grayscale images that can be grouped into three image types. Cameraman and Lena are low-frequency (LF) images, Barbara and Parrots are medium-frequency (MF) images, and Mandrill and Satellite are high-frequency (HF) images. The fast approximate DCT [1] was implemented in MATLAB, and the Peak Signal to Noise Ratio (PSNR) was measured.
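The text does not spell out the PSNR formula; the sketch below assumes the usual definition for 8-bit images, PSNR = 10·log10(255² / MSE), where MSE is the mean squared error between the original and reconstructed pixels.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared = [(a - b) ** 2
               for row_a, row_b in zip(original, reconstructed)
               for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)


original = [[10, 20], [30, 40]]
reconstructed = [[11, 21], [31, 41]]   # every pixel off by one -> MSE = 1
print(round(psnr(original, reconstructed), 4))   # 48.1308
```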
Fig. 6. Lena image reconstructed after retaining various numbers of DCT coefficients
In this section, the transforms were applied to compress benchmark images of all three types (LF, MF and HF) obtained from a standard public image bank. These are 256x256, 8 bits per pixel (8bpp) images in bitmap (BMP) format.
Table II. PSNR (dB) for various numbers of retained DCT coefficients
Image        Number of retained DCT coefficients
             2        4        8        12       16       20       24       32
Lena         21.3883  21.7113  23.878   24.8005  25.4371  25.8079  25.9145  27.8466
Mandrill     19.3425  19.7101  20.3928  21.0034  21.2657  21.5615  21.8873  22.1853
Cameraman    19.0805  19.441   21.0208  21.8392  22.6134  23.0769  23.333   25.4581
Woman        21.6987  22.222   23.556   24.7004  25.4849  25.9076  26.0328  27.7164
Fig. 7. Variation of PSNR with the number of DCT coefficients retained
Table III. PSNR (dB) of various DCT approximation methods on various images (10 retained coefficients)
Transform           Lena      Mandrill   Cameraman
Transform in [1]    24.6016   20.8263    21.6314
CB-2012             23.8056   20.6442    21.2611
CB-2011             26.8345   21.2786    23.0571
BAS-2011 (a=0)      26.349    21.2051    22.7376
BAS-2008            26.6948   21.2817    22.9872
A. Image Results for Various levels of DCT Coefficients
The degree of compression is set by the number of DCT coefficients retained for reconstructing the image. The 256x256 Lena image (BMP) was compressed while retaining various numbers of DCT coefficients, as shown in Fig. 6. The more coefficients are retained, the higher the quality of the decompressed image. Table II compares the PSNR values of several standard test images for different numbers of retained coefficients, and Fig. 7 plots the variation of PSNR with the number of retained coefficients.
Fig. 8. Various DCT approximation methods applied to the Cameraman and Lena images with only 10 DCT coefficients retained
Cameraman panels: (a) BAS-2008 (PSNR=22.9872), (b) BAS-2011 (PSNR=22.7376), (c) CB-2011 (PSNR=23.0571), (d) CB-2012 (PSNR=21.2611), (e) Transform in [1] (PSNR=21.6314)
Lena panels: (a) BAS-2008 (PSNR=26.6949), (b) BAS-2011 (PSNR=26.349), (c) CB-2011 (PSNR=26.8345), (d) CB-2012 (PSNR=23.8056), (e) Transform in [1] (PSNR=24.6016)
B. Comparison of other DCT approximation methods
The standard test images were compressed, decompressed and reconstructed using only 10 DCT coefficients. The images reconstructed with the fast approximate DCT [1] and with the existing methods are shown in Fig. 8, and the PSNR comparisons are presented in Table III. In terms of PSNR, the approximate transform of our work is positioned in the SDCT family. Fig. 8 and Table III show that, with 10 retained coefficients, the transform in [1] achieves a higher PSNR than CB-2012 on all three test images, while BAS-2008, BAS-2011 and CB-2011 reach a higher PSNR at a cost of 16 to 22 additions (Table I). Hence, the transform of our work outperforms CB-2012, which also requires only 14 additions, with a PSNR improvement of up to 0.796 dB (Lena).
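The PSNR gains of the transform in [1] over CB-2012 follow directly from Table III; this short computation reproduces them per image.

```python
# PSNR values in dB, copied from Table III (10 retained coefficients).
TABLE_III = {
    "Transform in [1]": {"Lena": 24.6016, "Mandrill": 20.8263, "Cameraman": 21.6314},
    "CB-2012":          {"Lena": 23.8056, "Mandrill": 20.6442, "Cameraman": 21.2611},
}

gains = {image: TABLE_III["Transform in [1]"][image] - TABLE_III["CB-2012"][image]
         for image in ("Lena", "Mandrill", "Cameraman")}
for image, gain in gains.items():
    print(f"{image}: {gain:+.4f} dB")
# Lena: +0.7960 dB, Mandrill: +0.1821 dB, Cameraman: +0.3703 dB
```

The gain is positive on all three images but varies by image content, from about 0.18 dB on Mandrill to about 0.80 dB on Lena.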
VI. VERILOG SIMULATION RESULTS AND ANALYSIS
The entire process of image compression and reconstruction has been implemented in Verilog HDL and simulated with the Xilinx ISE simulator. The simulation results are shown in Fig. 9; once the pipeline is fully loaded, one output is produced every clock cycle.
Fig. 9. Verilog Simulation output results using Xilinx ISE simulator
VII. CONCLUSIONS
In this work, we implemented an 8x8 approximate DCT that requires only 14 additions, avoiding multiplication and shift operations. The resulting hardware architecture for image compression is simple and efficient, has low computational complexity, and achieves a PSNR close to that of existing approximate transforms and higher than that of CB-2012, which has the same 14-addition cost.
REFERENCES
[1] Uma Sadhvi Potluri, Arjuna Madanayake, Renato J. Cintra, Fábio M. Bayer, Sunera Kulasekera, and Amila Edirisuriya, “Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions,” IEEE Transactions on Circuits and Systems, vol. 61, no. 6, June 2014.
[2] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “Low-complexity 8x8 transform for image compression,” Electron. Lett., vol. 44, no. 21, pp. 1249–1250, 2008.
[3] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A low-complexity parametric transform for image compression,” in Proc. ISCAS, May 2011, pp. 2145–2148.
[4] R. J. Cintra and F. M. Bayer, “A DCT approximation for image compression,” IEEE Signal Processing Lett.,
vol. 18, no. 10, pp. 579–582, Oct. 2011.
[5] F. M. Bayer and R. J. Cintra, “DCT-like transform for image compression requires 14 additions only,” Electron.
Lett., vol. 48, no. 15, pp. 919–921, 2012.
[6] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A multiplication-free transform for image compression,” in Proc. 2nd Int. Conf. Signals, Circuits Systems, Nov. 2008, pp. 1–4.
[7] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A novel transform for image compression,” in Proc. 53rd IEEE Int. Midwest Symp. Circuits Systems (MWSCAS), Aug. 2010, pp. 509–512.
[8] A. Edirisuriya, A. Madanayake, R. J. Cintra, and F. M. Bayer, “A multiplication-free digital architecture for 16×16 2-D DCT/DST transform for HEVC,” in Proc. IEEE 27th Conv. Electr. Electron. Eng. Israel, Nov. 2012, pp. 1–5.
[9] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A multiplication-free transform for image compression,” in Proc. 2nd Int. Conf. Signals, Circuits Systems, Nov. 2008, pp. 1–4.
[10] D. Vaithiyanathan, R. Seshasayanan, S. Anith, and K. Kunaraj, “A low-complexity DCT approximation for image compression with 14 additions only,” in Proc. Int. Conf. Green Computing, Communication and Conservation of Energy (ICGCE), Chennai, India, Dec. 2013, pp. 303–307.
[11] D. Vaithiyanathan and R. Seshasayanan, “Low power DCT architecture for image compression,” in Proc. Int. Conf. Advanced Computing and Communication Systems, Coimbatore, India, Dec. 19–21, 2013, pp. 1–6.
[12] D. Vaithiyanathan, K. Kunaraj, S. Anith, and R. Seshasayanan, “Multiplierless 8-Point DCT Architecture for Fast Image Compression,” International Journal of Applied Engineering Research, vol. 9, no. 20, pp. 4533–4538, 2014.
[13] A. M. Vijaya Prakash and K. S. Gurumurthy, “A Novel VLSI Architecture for Image Compression Model using Low Power Discrete Cosine Transform,” International Scholarly and Scientific Research & Innovation, vol. 4, no. 12, Dec. 2010.
[14] Venkata Pavan Kumar and Ch. Venkateswarlu, “Design of Low Power 2-D DCT Architecture using Reconfigurable Architecture,” IOSR Journal of Electronics and Communication Engineering (IOSRJECE), ISSN 2278-2834, ISBN 2278-8735, vol. 3, no. 1, pp. 20–25, Sep.–Oct. 2012.
[15] T. Pradeepthi and Addanki Purna Ramesh, “Pipelined Architecture of 2D DCT, Quantization and Zigzag Processor for JPEG Image Compression using VHDL,” International Journal of VLSI Design & Communication Systems (VLSICS), vol. 2, no. 3, Sep. 2011.
[16] Papiya Chakraborty, “A Survey Analysis for Lossy Image Compression using Discrete Cosine Transform,” International Journal of Scientific & Engineering Research, vol. 3, no. 9, Sep. 2012.
[17] Vikram Angala and Harika T., “Low Power DCT Architecture for Image/Video Coders,” IPASJ International Journal of Electronics & Communication (IIJEC), vol. 2, no. 10, Oct. 2014.
[18] B. Raghu Kanth, S. R. Sastry Kalavakolanu, M. Aravind Kumar, and D. N. Bhushan Babu, “JPEG Image Compression using Verilog,” International Journal of Engineering Science & Advanced Technology (IJESAT), vol. 2, no. 5, pp. 1410–1415.