
© 2016, IJARCSSE All Rights Reserved Page | 441

Volume 6, Issue 9, September 2016 ISSN: 2277 128X

International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com

Implementation of 8-Point Approximate DCT for Image Compression

D. Vaithiyanathan*, Dheepansundaravelu. P

Department of Electronics and Communication Engineering, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India

Abstract— The Discrete Cosine Transform (DCT) is the basic operation used by video and image compression standards such as JPEG, MPEG-1/2/4, H.261/H.263 and others. In this paper, an 8-point approximate DCT compression scheme is implemented. Lower computational complexity is achieved by eliminating the need for multiplication and shift operations, so that the transform can be implemented with addition operations only. At the same arithmetic cost of 14 additions, this approximate DCT method outperforms the CB-2012 approximate DCT in Peak Signal to Noise Ratio (PSNR).

Keywords— Discrete Cosine Transform, DCT approximation, Low Complexity, Image Compression, Hardware Implementation, Pipelining, Field Programmable Gate Arrays (FPGAs)

I. INTRODUCTION

Multimedia data processing, which encompasses almost every aspect of our daily life, such as communication, broadcasting, data search, advertisement and video games, has become an integral part of our lifestyle. The most significant parts of multimedia systems are applications involving image or video, which require computationally intensive data processing. Moreover, as the use of mobile devices increases exponentially, there is a growing demand for multimedia applications to run on these portable devices. To reduce the volume of multimedia data sent over wireless channels, compression techniques are widely used. The efficiency of a transformation scheme can be directly gauged by its ability to pack the input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing visual distortion in the reconstructed image.

In image compression, the image data is divided into 8x8 blocks of pixels. From this point on, each colour component is processed independently, so a "pixel" means a single value, even in a colour image. A DCT is applied to each 8x8 block. The DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the average value in the block, while successive higher-order "AC" terms represent the strength of increasingly rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels.
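As a minimal illustration of the DC/AC description above, the sketch below builds the exact orthonormal 8-point DCT-II matrix in plain Python and applies it as C·A·C^T to a flat block. The orthonormal scaling is an assumption of this sketch (other scalings are in use); under it, the DC term equals 8 times the block mean and every AC term of a flat block is zero. The helper names are illustrative only.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)]
            for k in range(n)]


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]


def dct2(block):
    """2D DCT of a square block, computed as C . A . C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A flat block (every pixel 10): all energy goes to the DC term,
# which equals 8 x the block mean under this orthonormal scaling.
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))   # 80.0
```

Any non-flat block spreads energy into the AC terms; the higher the row and column index, the faster the corresponding cosine oscillates across the block.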

Several efficient DCT algorithms have been developed, and a substantial literature is available. Although fast algorithms can significantly reduce the computational complexity of the DCT, floating-point operations are still required. Despite their accuracy, floating-point operations are expensive in terms of circuit complexity and power consumption. Therefore, minimizing the number of floating-point operations is a desirable property of a fast algorithm. One way of circumventing this issue is to use approximate transforms. The approximate DCT methods under consideration are (i) the transform in [1]; (ii) the 2008 Bouguezel-Ahmad-Swamy (BAS) DCT approximation [2]; (iii) the parametric transform for image compression [3]; (iv) the Cintra-Bayer (CB) approximate DCT based on the rounding-off function [4]; and (v) the modified CB approximate DCT [5].

II. APPROXIMATE DCT TRANSFORM

The reduction in computational complexity comes at the cost of PSNR. An important parameter in this trade-off is the number of retained coefficients; this paper retains 10 coefficients, as suggested by Bouguezel et al. All the methods considered here use a transformation matrix that can be put in the following format:

[Diagonal matrix] x [Low complexity matrix]

The diagonal matrix usually contains irrational numbers of the form 1/√m, where m is a small positive integer. Because this diagonal scaling can be merged into the quantization step, the complexity of the approximation is determined by the complexity of the low-complexity matrix [1]. Since the entries of the low-complexity matrix are restricted to {0, ±1/2, ±1, ±2}, that is, zero or powers of two, null multiplicative complexity is achieved [1]. Many reported works [1-12] derive novel low-complexity approximate DCTs. To that end, these works search the 8x8 matrix space for candidate matrices with low computational cost. Let us define the cost of a transformation matrix as the number of arithmetic operations required for its computation. One way to guarantee good candidates is to restrict the search to matrices whose entries do not require multiplication operations.

http://www.ijarcsse.com/

Vaithiyanathan et al., International Journal of Advanced Research in Computer Science and Software Engg. 6(9), September - 2016, pp. 441-448

An important parameter in the image compression routine is the number of retained coefficients in the transform domain. In several applications, the number of retained coefficients is very low. For instance, considering 8x8 image blocks:

1. In image compression using support vector machines, only the first 8-16 coefficients were considered.

2. Mandyam et al. proposed a method for image reconstruction based on only three coefficients.

3. Bouguezel et al. employed only 10 DCT coefficients when assessing image compression methods.
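Which coefficients count as the "first" ones is decided by the standard JPEG zigzag scan. A minimal sketch of that scan and of keeping only the first k coefficients follows; `zigzag_order` and `keep_first` are hypothetical helper names, and the default k = 10 mirrors the choice of Bouguezel et al.

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in the standard JPEG zigzag order."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col: one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # diag runs top-right -> bottom-left; even anti-diagonals are walked the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def keep_first(coeffs, k=10):
    """Keep the first k coefficients in zigzag order and set the rest to zero."""
    n = len(coeffs)
    kept = [[0] * n for _ in range(n)]
    for i, j in zigzag_order(n)[:k]:
        kept[i][j] = coeffs[i][j]
    return kept


print(zigzag_order()[:10])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]
```

Because low-frequency coefficients sit near the top-left corner, the zigzag scan visits them first, so truncating the scan discards the high-frequency content first.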

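Hence, the low-complexity transform matrix approximating the DCT is given by [1]:

$$C^{*} = D^{*}\,T^{*} = D^{*}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1\\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1\\
0 & 0 & -1 & 0 & 0 & 1 & 0 & 0\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
0 & -1 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0\\
0 & 0 & 0 & -1 & 1 & 0 & 0 & 0
\end{bmatrix} \qquad (1)$$

The diagonal matrix is given by

$$D^{*} = \operatorname{diag}\left(\tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{8}}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2}, \tfrac{1}{\sqrt{2}}\right)$$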
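A quick numerical check of the matrix form can be sketched in plain Python. The T* entries below are those of the 14-addition transform in [1] as we read that paper; the sketch derives each D* entry as 1/√m from the squared row norm m of T*, confirms that C* = D*T* has orthonormal rows, and counts the additions a direct row-by-row evaluation of T* would need (24), for comparison with the 14 additions reported in [1] for the factorized form. All helper names are illustrative.

```python
import math

# Low-complexity matrix T* of the transform in [1] (entries as we read them).
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [0,  0, -1,  0,  0,  1,  0,  0],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0, -1,  0,  0,  0,  0,  1,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  0, -1,  1,  0,  0,  0],
]


def diagonal_scaling(t):
    """D* entries: 1/sqrt(m), where m is the squared Euclidean norm of a row of T*."""
    return [1.0 / math.sqrt(sum(v * v for v in row)) for row in t]


def naive_additions(t):
    """Additions for a direct row-by-row evaluation: (nonzeros - 1) per row."""
    return sum(sum(1 for v in row if v != 0) - 1 for row in t)


D_STAR = diagonal_scaling(T_STAR)
C_STAR = [[d * v for v in row] for d, row in zip(D_STAR, T_STAR)]

# Rows of C* = D* T* are orthonormal, so this Gram matrix is the identity.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C_STAR] for r1 in C_STAR]

print(naive_additions(T_STAR))   # 24
```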

The above transform matrix requires addition operations only. Hence, multiplications and shift operations are completely eliminated, reducing the arithmetic complexity. The transform is defined for an 8x8 structure, and the same approach can be replicated for 16x16, 32x32 and larger sizes. Compared with other approximate DCT models such as CB-2011 and BAS-2011, the transform of equation (1) [1] provides a comparable peak signal to noise ratio with a considerable reduction in arithmetic complexity, requiring only 14 additions. The arithmetic complexity of the approximate methods is compared in Table I.

Table I. Comparison of Computational Complexity

Transform            Add.   Mult.   Shifts   Total
BAS-2008 [2]          21      0       0       21
BAS-2011 [3]          16      0       0       16
CB-2011 [4]           22      0       0       22
CB-2012 [5]           14      0       0       14
Transform in [1]      14      0       0       14
Transform in [10]     14      0       0       14
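To make the 14-addition figure for the transform in [1] concrete, the sketch below factors T* into one butterfly stage (8 additions/subtractions) and an even-part stage (6 more); the odd outputs are butterfly results with at most a sign change, which is not counted as an addition here. This factorization is our own reading of the structure of T*, not necessarily the exact signal-flow graph of [1], and `fast_t_star` is a hypothetical name.

```python
def fast_t_star(x):
    """Compute y = T* x for an 8-point input with 14 additions/subtractions.

    Returns (y, additions) so that the operation count can be checked.
    """
    count = 0

    def add(a, b):
        nonlocal count
        count += 1
        return a + b

    def sub(a, b):
        nonlocal count
        count += 1
        return a - b

    # Stage 1: butterflies (8 operations).
    a0, a1, a2, a3 = add(x[0], x[7]), add(x[1], x[6]), add(x[2], x[5]), add(x[3], x[4])
    b0, b1, b2, b3 = sub(x[0], x[7]), sub(x[1], x[6]), sub(x[2], x[5]), sub(x[3], x[4])
    # Stage 2: even outputs (6 operations).
    c0, c1 = add(a0, a3), add(a1, a2)
    y0, y4 = add(c0, c1), sub(c0, c1)
    y2, y6 = sub(a0, a3), sub(a2, a1)
    # Odd outputs: butterfly results, possibly negated; no additions needed.
    y1, y3, y5, y7 = b0, -b2, -b1, -b3
    return [y0, y1, y2, y3, y4, y5, y6, y7], count


print(fast_t_star([1, 2, 3, 4, 5, 6, 7, 8]))   # ([36, -7, 0, 3, 0, 5, 0, 1], 14)
```

Sharing the butterfly sums a0..a3 between y0, y2, y4 and y6 is what brings the cost below the 24 additions of a direct row-by-row evaluation.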

III. IMPLEMENTATION OF DCT PROCESS

For quality analysis, images were submitted to a JPEG-like image compression technique. The resulting compressed images were then assessed for degradation in comparison with the original input image. Thus, 2D versions of the discussed methods are required. The 2D transform of an 8x8 image block A is mathematically expressed by [1]:

$$T \cdot A \cdot T^{\mathsf{T}} \qquad (2)$$

where T is the transformation under consideration. The input images were divided into 8x8 sub-blocks, which were submitted to the 2D transforms. For each block, this computation furnished 64 coefficients in the approximate transform domain for a particular transformation. Following the standard zigzag sequence, only the initial coefficients in each block were retained and employed for image reconstruction; retaining so few coefficients corresponds to high compression. All the remaining coefficients were set to zero. The inverse procedure was then applied to reconstruct the image [1-12].
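The 2D transform and its inverse can be sketched for one 8x8 block using the orthonormal C* = D*T* of [1] (the T* entries below are reproduced from [1] as we read them). Because the rows of C* are orthonormal, the inverse is simply C*^T·Y·C*; zeroing all but the first retained zigzag coefficients between the two calls would model the truncation described above. Names are illustrative.

```python
import math

# Unscaled T* of [1]; C* = D* T* divides each row by its Euclidean norm.
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [0,  0, -1,  0,  0,  1,  0,  0],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0, -1,  0,  0,  0,  0,  1,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  0, -1,  1,  0,  0,  0],
]
C_STAR = [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in T_STAR]


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]


def forward_2d(block, t=C_STAR):
    """2D transform T . A . T^T of one 8x8 block."""
    return matmul(matmul(t, block), transpose(t))


def inverse_2d(coeffs, t=C_STAR):
    """Inverse 2D transform T^T . Y . T, valid because C* is orthonormal."""
    return matmul(matmul(transpose(t), coeffs), t)
```

With all 64 coefficients kept, `inverse_2d(forward_2d(A))` returns A up to floating-point rounding; any loss in the experiments therefore comes from quantization and coefficient truncation, not from the transform pair itself.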

A. Overall process of the DCT implementation

The compression and reconstruction process described above, using the approximate DCT, is shown in Fig. 1 and Fig. 2. Here an input image, e.g. the 256x256 Lena image, is divided into 8x8 sub-blocks. Each sub-block is then transformed by the low-complexity approximate DCT matrix given in equation (1) [1].




Fig. 1. Image compression process Flow

Fig. 2. Image Reconstruction process Flow

After the DCT, 64 DCT coefficients are obtained for each block. These 64 coefficients are quantized using the standard JPEG quantization matrix given in equation (3). The standard quantization matrix is preferred because it provides better quality with balanced compression. The quantized coefficients are then trimmed using the zigzag sequence, keeping only the first 10 DCT coefficients, as proposed by Bouguezel et al.

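$$Q = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61\\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55\\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56\\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62\\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77\\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92\\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101\\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix} \qquad (3)$$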

These trimmed, quantized DCT coefficients (10 per block) are used to reconstruct the image. De-quantization uses the same standard quantization matrix of equation (3). The inverse DCT is then performed using the same low-complexity transform matrix. Finally, all the 8x8 sub-blocks are merged to obtain the reconstructed image.

B. Verilog Implementation Process Flow

The basic flow of the Verilog implementation is shown in Fig. 3. The input grayscale image of size 256x256 is converted to the .coe file format using MATLAB functions and loaded into the block RAM of the hardware core using Verilog HDL. The Read/Write controller reads the block RAM and loads the image into an internal array of size 256x256. To process the image one 8x8 block at a time, an 8x8 block read controller is used. This controller reads the 8x8 blocks of the internal array one by one until all 1024 blocks have been read [13-18].
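The block read controller's addressing can be modelled in software as a scan over block origins; a 256x256 image yields (256/8)² = 1024 blocks. The raster (row-major) scan order and the helper names are assumptions of this sketch, since the text only states that the blocks are read one by one.

```python
def block_origins(height=256, width=256, b=8):
    """Top-left (row, col) of every b x b block, scanned in raster order."""
    return [(r, c) for r in range(0, height, b) for c in range(0, width, b)]


def read_block(image, r0, c0, b=8):
    """Copy one b x b block out of the internal image array (a list of rows)."""
    return [row[c0:c0 + b] for row in image[r0:r0 + b]]


origins = block_origins()
print(len(origins))   # 1024
```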

The output of the 8x8 block read controller is given to the block-to-parallel input converter. This module takes the 8x8 block and converts it into 64 parallel inputs. During this conversion, the input values in the range 0 to 255 are converted to the range -128 to 127, since the DCT core operates only on values in the range -128 to 127. These 64 parallel inputs are then sent to the DCT core at once for processing.

The 64 parallel outputs of the DCT core are given to the IDCT core, which produces the reconstructed 8x8 block. The 64 parallel outputs of the IDCT core are converted back to 8x8 block format by the parallel-to-block converter, which also converts them back to the range 0 to 255. The 8x8 block write controller then receives the output of the parallel-to-block converter and writes it into the internal array. After all 1024 8x8 blocks have been processed, the Read/Write controller writes the internal array back to the block RAM.
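The level shift performed by the block-to-parallel and parallel-to-block converters amounts to subtracting and adding 128. The rounding and clamping in `to_unsigned` are assumptions of this sketch (reconstructed samples can be fractional or overshoot after lossy coding); the text does not say how the hardware handles them, and both function names are hypothetical.

```python
import math


def to_signed(pixel):
    """Level shift an 8-bit pixel (0..255) to the signed range -128..127."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be in 0..255")
    return pixel - 128


def to_unsigned(value):
    """Shift a reconstructed sample back to 0..255.

    Assumed behaviour: round half up, then clamp, because after lossy coding the
    reconstructed value may be fractional or fall outside -128..127.
    """
    shifted = math.floor(value + 0.5) + 128
    return min(255, max(0, shifted))
```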

IV. DCT & IDCT CORE ARCHITECTURE

The DCT and IDCT cores used in the compression and reconstruction process are designed for an input size of 64, i.e. one 8x8 block, because the approximate DCT transform matrix is of size 8x8.




A. Discrete Cosine Transform (DCT) CORE architecture

The architecture of the DCT core is shown in Fig. 4. The DCT core receives 64 parallel input lines, each a 13-bit value in the range -128 to 127 (the 13th bit being the sign bit). The 2D DCT is implemented using the row-column transformation model. The data is first processed row-wise. After the row transformation, the data is passed to a transpose buffer, which converts the row elements into column elements. The column transformation is then applied to produce the DCT coefficients. The same approximate DCT transform of equation (1) is used for both the row and column transformations, since the 2D transform of equation (2) is separable into row and column passes.
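The row-column scheme with a transpose buffer can be checked against the matrix form T·A·T^T with a small software model. This sketch uses the unscaled T* of [1] (entries reproduced as we read them) with integer inputs and leaves out D*; it is a functional model with hypothetical names, not the Verilog datapath.

```python
# Unscaled T* of [1]; the D* scaling is omitted in this integer model.
T_STAR = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [1,  0,  0,  0,  0,  0,  0, -1],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [0,  0, -1,  0,  0,  1,  0,  0],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [0, -1,  0,  0,  0,  0,  1,  0],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0,  0,  0, -1,  1,  0,  0,  0],
]


def t_star_1d(x):
    """One 8-point pass y = T* x (entries are 0/+-1: adds/subtracts only in hardware)."""
    return [sum(t * v for t, v in zip(row, x)) for row in T_STAR]


def transpose(block):
    """Model of the transpose buffer: rows become columns."""
    return [list(col) for col in zip(*block)]


def row_column_2d(block):
    """Row pass, transpose buffer, second row pass, transpose back."""
    after_rows = [t_star_1d(r) for r in block]                  # A T*^T
    after_cols = [t_star_1d(r) for r in transpose(after_rows)]  # T* A^T T*^T
    return transpose(after_cols)                                # T* A T*^T
```

The final transpose only restores the natural coefficient layout; a hardware design may instead keep the transposed order and index it accordingly.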

Fig.3. Verilog Implementation Process flow

Fig.4. DCT core Architecture




Fig. 5 IDCT core Architecture

After the DCT, the DCT coefficients are quantized using the standard quantization matrix given in equation (3). Quantization is achieved by dividing each element of the transformed image matrix D by the corresponding element of the quantization matrix and rounding to the nearest integer. The quantization is mathematically expressed as

$$C_{i,j} = \operatorname{round}\left(\frac{D_{i,j}}{Q_{i,j}}\right) \qquad (4)$$

where D holds the DCT coefficients of the transformed image and Q is the quantization matrix.
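The quantization step can be sketched as below, using the standard JPEG luminance quantization table as Q. The tie-breaking rule (halves rounded away from zero) is an assumption of this sketch; the text only says "nearest integer", and Python's built-in `round()` would instead round halves to even. Function names are hypothetical.

```python
import math

# Standard JPEG luminance quantization table.
JPEG_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(v):
    """Nearest integer, with ties rounded away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize(d, q=JPEG_Q):
    """C[i][j] = round(D[i][j] / Q[i][j]) for one 8x8 block of coefficients."""
    return [[round_half_away(d[i][j] / q[i][j]) for j in range(len(d[0]))]
            for i in range(len(d))]
```

Large divisors in the lower-right part of Q drive most high-frequency coefficients to zero, which is what makes the subsequent zigzag truncation cheap in terms of quality.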

The quantized DCT coefficients are then trimmed: only a few DCT coefficients are kept and the rest are set to zero. This trimming follows the zigzag order. In our case, only the first 10 DCT coefficients are kept.

B. Inverse Discrete Cosine Transform (IDCT) CORE Architecture

The architecture of the IDCT core is shown in Fig. 5. The input to the IDCT core is the set of 64 trimmed, quantized DCT coefficients from the DCT core module. De-quantization is first performed using the same quantization matrix; this de-quantization step is also referred to as decompression. De-quantization is mathematically expressed as

$$R_{i,j} = Q_{i,j} \times C_{i,j} \qquad (5)$$

The 2D inverse DCT is then performed, like the DCT, using the row-column transform model.
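For a single coefficient, quantization followed by de-quantization is a round trip whose error is at most Q/2, because rounding moves D/Q by at most one half. The scalar helpers below are hypothetical and use the same assumed tie-breaking rule (halves away from zero).

```python
import math


def quantize_coeff(d, q):
    """Quantize one coefficient: nearest integer to d / q, ties away from zero."""
    r = d / q
    return int(math.copysign(math.floor(abs(r) + 0.5), r))


def dequantize_coeff(c, q):
    """De-quantize one coefficient: R = Q x C."""
    return q * c


# Example with Q = 11: D = -17 quantizes to -2 and is reconstructed as -22.
print(dequantize_coeff(quantize_coeff(-17, 11), 11))   # -22
```

The reconstruction error of 5 in this example stays within the bound Q/2 = 5.5.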

V. MATLAB RESULTS AND COMPARISONS

The compression performance is evaluated on grayscale images that can be grouped into three image types: Cameraman and Lena are low-frequency (LF) images, Barbara and Parrots are medium-frequency (MF) images, and Mandrill and Satellite are high-frequency (HF) images. The fast DCT transform [1] has been implemented in MATLAB, and performance parameters such as the Peak Signal to Noise Ratio (PSNR) are determined.
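The text does not state the PSNR formula; the sketch below assumes the usual definition for 8-bit images, PSNR = 10·log10(255²/MSE), where MSE is the mean squared error between the original and reconstructed images. The function name is illustrative.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    sq_err = [(a - b) ** 2
              for row_a, row_b in zip(original, reconstructed)
              for a, b in zip(row_a, row_b)]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)


# Every pixel off by 10 -> MSE = 100 -> PSNR = 10*log10(255^2 / 100).
print(round(psnr([[0, 0], [0, 0]], [[10, 10], [10, 10]]), 2))   # 28.13
```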

Fig. 6. Results for the Lena image after retaining various numbers of DCT coefficients




In this section, the transforms were applied to compress benchmark images of all three types (LF, MF and HF) obtained from a standard public image bank. These are 256x256, 8 bits per pixel (8 bpp) images in bitmap (BMP) format.

Table II. PSNR values (dB) for various numbers of retained DCT coefficients

Image        Number of retained DCT coefficients
             2         4         8         12        16        20        24        32
Lena         21.3883   21.7113   23.878    24.8005   25.4371   25.8079   25.9145   27.8466
Mandrill     19.3425   19.7101   20.3928   21.0034   21.2657   21.5615   21.8873   22.1853
Cameraman    19.0805   19.441    21.0208   21.8392   22.6134   23.0769   23.333    25.4581
Woman        21.6987   22.222    23.556    24.7004   25.4849   25.9076   26.0328   27.7164

Fig. 7. Plot showing the variation of PSNR with the number of DCT coefficients retained

Table III. PSNR values (dB) for various DCT approximation methods applied to various images

Transform          Lena      Mandrill   Cameraman
Transform in [1]   24.6016   20.8263    21.6314
CB-2012            23.8056   20.6442    21.2611
CB-2011            26.8345   21.2786    23.0571
BAS-2011 (a=0)     26.349    21.2051    22.7376
BAS-2008           26.6948   21.2817    22.9872

A. Image Results for Various Levels of DCT Coefficients

Compression is assessed in terms of the number of DCT coefficients retained for reconstructing the image. Here, the 256x256 Lena image (BMP) is compressed while retaining various numbers of DCT coefficients, as shown in Fig. 6. The more DCT coefficients are retained, the higher the quality of the decompressed image. The PSNR values for various standard test images with varying numbers of retained DCT coefficients are given in Table II, and the plot in Fig. 7 shows the variation of PSNR with the number of retained coefficients.

Fig. 8. Various DCT approximation methods applied to the Cameraman and Lena images with only 10 DCT coefficients retained
Cameraman: (a) BAS-2008 (PSNR=22.9872) (b) BAS-2011 (PSNR=22.7376) (c) CB-2011 (PSNR=23.0571) (d) CB-2012 (PSNR=21.2611) (e) Transform in [1] (PSNR=21.6314)
Lena: (a) BAS-2008 (PSNR=26.6949) (b) BAS-2011 (PSNR=26.349) (c) CB-2011 (PSNR=26.8345) (d) CB-2012 (PSNR=23.8056) (e) Transform in [1] (PSNR=24.6016)

B. Comparison with Other DCT Approximation Methods

The standard test images are compressed, decompressed and reconstructed using only 10 DCT coefficients. The images reconstructed using the fast approximate DCT [1] and the existing methods are illustrated in Fig. 8, and the PSNR comparisons are presented in Table III. In terms of PSNR, the approximate transform of our work is positioned in the SDCT family. Fig. 8 and Table III show that the transform in [1] achieves a higher PSNR than CB-2012 on all three test images, while BAS-2008, BAS-2011 and CB-2011 reach higher PSNR values at a higher arithmetic cost (21, 16 and 22 additions, respectively, according to Table I). Hence, the results show that the transform of our work has better PSNR than CB-2012, which also requires only 14 additions, and outperforms it by 0.796 dB on the Lena image.

VI. VERILOG SIMULATION RESULTS AND ANALYSIS

The entire compression and reconstruction process has been implemented in Verilog HDL and simulated with the Xilinx ISE simulator. The simulation results are shown in Fig. 9: once the pipeline is completely loaded, an output is obtained every clock cycle.
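The claim that one output appears per clock cycle once the pipeline is full can be illustrated with a toy cycle-level model. The number of pipeline stages is not given in the text, so the stage count below is hypothetical; for S stages, the first block reaches the last stage after S cycles and the last of 1024 blocks after S + 1023 cycles.

```python
def simulate_pipeline(num_blocks, stages):
    """Toy cycle-level model of a linear pipeline with `stages` register stages.

    One block enters per clock cycle; returns the clock cycle in which each block
    reaches the last stage (i.e. its output becomes valid).
    """
    if stages < 1:
        raise ValueError("need at least one pipeline stage")
    pipe = [None] * stages            # pipe[k] = index of the block held in stage k
    exits = []
    cycle = 0
    while len(exits) < num_blocks:
        cycle += 1
        incoming = cycle - 1 if cycle - 1 < num_blocks else None
        pipe = [incoming] + pipe[:-1]  # every register advances by one stage
        if pipe[-1] is not None:
            exits.append(cycle)
    return exits


# With a hypothetical 5-stage core, 1024 blocks finish at cycle 5 + 1023.
print(simulate_pipeline(1024, 5)[-1])   # 1028
```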

Fig. 9. Verilog Simulation output results using Xilinx ISE simulator

VII. CONCLUSIONS

In this work, an 8x8 transformation matrix requiring only 14 additions has been implemented, avoiding the need for multiplication and shift operations. The resulting hardware implementation of the approximate DCT for image compression is a simple, efficient architecture with lower computational complexity and performance close to or better than that of existing transforms.

REFERENCES

[1] Uma Sadhvi Potluri, Arjuna Madanayake, Renato J. Cintra, Fábio M. Bayer, Sunera Kulasekera, and Amila Edirisuriya, “Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 6, June 2014.

[2] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “Low-complexity 8x8 transform for image compression,”

Electron. Lett., vol. 44, no.21, pp. 1249–1250, 2008.

[3] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A low-complexity parametric transform for image compression,” in Proc. ISCAS, May 2011, pp. 2145–2148.

[4] R. J. Cintra and F. M. Bayer, “A DCT approximation for image compression,” IEEE Signal Processing Lett.,

vol. 18, no. 10, pp. 579–582, Oct. 2011.

[5] F. M. Bayer and R. J. Cintra, “DCT-like transform for image compression requires 14 additions only,” Electron.

Lett., vol. 48, no. 15, pp. 919–921, 2012.

[6] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A multiplication-free transform for image compression,” in Proc. 2nd Int. Conf. Signals, Circuits Systems, Nov. 2008, pp. 1–4.

[7] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A novel transform for image compression,” in Proc. 53rd IEEE Int. Midwest Symp. Circuits Systems (MWSCAS), Aug. 2010, pp. 509–512.

[8] A. Edirisuriya, A. Madanayake, R. J. Cintra, and F. M. Bayer, “A multiplication-free digital architecture for 16x16 2-D DCT/DST transform for HEVC,” in Proc. IEEE 27th Conv. Electr. Electr. Eng. Israel, Nov. 2012, pp. 1–5.

[9] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, “A multiplication-free transform for image compression,” in Proc. 2nd Int. Conf. Signals, Circuits Systems, Nov. 2008, pp. 1–4.

[10] D. Vaithiyanathan, R. Seshasayanan, S. Anith, and K. Kunaraj, “A low-complexity DCT approximation for image compression with 14 additions only,” in Proc. Int. Conf. on Green Computing, Communication and Conservation of Energy (ICGCE), Chennai, India, Dec. 2013, pp. 303–307.

[11] D. Vaithiyanathan and R. Seshasayanan, “Low power DCT architecture for image compression,” in Proc. Int. Conf. on Advanced Computing and Communication Systems, Coimbatore, India, Dec. 19–21, 2013, pp. 1–6.

[12] D. Vaithiyanathan, K. Kunaraj, S. Anith, and R. Seshasayanan, “Multiplierless 8-Point DCT Architecture for Fast Image Compression,” International Journal of Applied Engineering Research, vol. 9, no. 20, 2014, pp. 4533–4538.

[13] Vijaya Prakash A.M., K.S. Gurumurthy “A Novel VLSI Architecture for Image compression Model using low

power Discrete Cosine Transform” in International Scholarly and Scientific Research & Innovation 4(12) 2010,

Vol:4 2010-12-27




[14] Venkata Pavan Kumar, Ch. Venkateswarlu “Design of Low power 2-D DCT architecture using Reconfigurable

Architecture” in IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN : 2278-2834,

ISBN No : 2278-8735 Volume 3, Issue 1 (Sep-Oct 2012), PP 20-25

[15] T. Pradeepthi, Addanki Purna Ramesh “Pipelined Architecture of 2D DCT, Quantization and zig zag processor

for JPEG Image compression using VHDL” in International Journal of VLSI design & Communication Systems

(VLSICS) Vol.2, No.3, September 2011

[16] Papiya Chakraborty “A survey analysis for Lossy image compression using Discrete Cosine Transform” in

International Journal of Scientific & Engineering Research Volume 3, Issue 9, September-2012.

[17] Vikram Angala, Harika. T, “Low power DCT architecture for Image/video coders” in IPASJ International

Journal of Electronics & Communication (IIJEC), Volume 2, Issue 10, October 2014.

[18] B. Raghu Kanth, S. R. Sastry Kalavakolanu, M. Aravind Kumar, and D. N. Bhushan Babu, “JPEG image compression using Verilog,” International Journal of Engineering Science & Advanced Technology (IJESAT), vol. 2, no. 5, pp. 1410–1415.
