
Image Super-resolution Based on Local Self-similarity

Noriaki SUETAKE*, Morihiko SAKANO¹, and Eiji UCHINO¹

Academic Domain of Natural Science, Graduate School of Science and Engineering, Yamaguchi University, 1677-1 Yoshida, Yamaguchi 753-8512, Japan
¹Division of Natural Science and Symbiosis, Graduate School of Science and Engineering, Yamaguchi University, 1677-1 Yoshida, Yamaguchi 753-8512, Japan

(Received August 23, 2007; Accepted November 17, 2007)

A new image super-resolution method based on codebook mapping is proposed. The codebook mapping represents the internal relationship between low- and high-frequency image components, and is used to estimate the high-frequency image components lost in the sampling process. In the proposed method, codebooks are first generated from the low- and high-frequency image components of the very image that is to be processed. Then, the resulting super-resolution image, that is, an enlarged image, is obtained by combining the estimated high-frequency image components with the low-frequency components obtained by a typical interpolation-based method. The effectiveness of the proposed method is verified by comparing it with several conventional methods. ©2008 The Optical Society of Japan

Key words: super-resolution, image enlargement, codebook, self-similarity

1. Introduction

The goal of image super-resolution (SR) is to reconstruct an image with a higher resolution than its source. Various SR methods, both multi-frame and single-frame, have been proposed.1)

Multi-frame SR combines multiple low-resolution frames of the same scene to reconstruct a single high-resolution image. Many multi-frame SR methods have been proposed, for example, frequency-domain methods,2–4) projection onto convex sets (POCS) methods,5,6) and maximum a posteriori (MAP) methods.7–9)

These methods can obtain good results; however, their computational complexity and memory consumption are tremendous, which makes real-time processing difficult.

On the other hand, single-frame SR reconstructs a high-resolution image from only a single source image. Single-frame SR is also known as image scaling, image interpolation, image zooming, image enlargement, and so on. In general, single-frame SR methods are inferior to multi-frame methods in terms of image quality.

Single-frame SR methods are often based on kernel interpolation, including nearest neighbor interpolation (NNI), bilinear interpolation (BLI), and cubic convolution interpolation (CCI).10,11)

Some of them are used in commercial photo-retouching software. However, they can hardly improve the resolution of the resulting image, and step edges and peaks are not restored clearly, because high-frequency image components beyond the Nyquist frequency cannot be restored by simple kernel-based methods. To cope with these problems, more sophisticated single-frame SR methods have been proposed, for example, directional methods,12,13) a multiple kernel-based method,14) orthogonal transform-based methods,15,16) and neural network-based methods.17–19) Although they perform better than simple kernel-based methods, they are complicated to implement. Also, some of them rely on assumptions that do not always hold in practical situations, and thus they may occasionally fail to produce useful results. A nonlinear extrapolation method20) has been proposed as a superior technique suitable for practical use. However, this method often produces ringing artifacts that degrade the resulting image quality.

We propose a new single-frame SR method that estimates the high-frequency components lost in the sampling process by exploiting the local self-similarity of the image. The high-frequency image components are estimated using codebooks designed from pairs of low- and high-frequency components of the source image itself, and are combined with low-frequency image components obtained by typical interpolation methods to reconstruct the super-resolved image. The effectiveness and validity of the proposed method are verified by applying it to standard images.

The rest of the paper is organized as follows. The details of the proposed method are explained in §2, §3 describes our experimental results, and finally, conclusions are given in §4.

2. Proposed Super-resolution Method

In the proposed method, the internal relationship between low- and high-frequency image components is represented by a codebook mapping based on the local self-similarity of an image. This mapping is used to estimate the high-frequency image components lost in the sampling process and to reconstruct a super-resolved image.

The proposed method consists of the following two procedures: designing codebooks, and image super-resolution using the codebook mapping.

*E-mail address: [email protected]

OPTICAL REVIEW Vol. 15, No. 1 (2008) 26–30

2.1 Designing codebooks

The design of the codebooks is a crucial part of the proposed super-resolution algorithm, and is based on the local self-similarity of the original source image. In designing codebooks, the original source image $f$ ($N \times N$ pixels) is first decomposed in a straightforward way into low- and high-frequency components, as shown in Fig. 1. If the matrix $L$ represents a lowpass operator, $f$ can be written as

$$ f = Lf + (I - L)f = f_l + f_h, \tag{1} $$

where $I$ is the identity matrix, and $f_l$ and $f_h$ are the low- and high-frequency components of $f$, respectively. If $f$ is to be enlarged to $M$ times its original size, that is, to $NM \times NM$ pixels, the cutoff frequency of the lowpass filter is set to $1/M$ of the maximum spatial frequency determined by the image size.
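As a rough illustration of eq. (1), the sketch below splits an image into $f_l$ and $f_h$ with an ideal (brick-wall) lowpass filter applied in the 2-D DFT domain. The paper does not specify the lowpass operator $L$, so the ideal FFT filter, the square passband, the reading of "maximum spatial frequency" as the Nyquist frequency (0.5 cycles/pixel), and the helper name `decompose` are all assumptions made for illustration.

```python
import numpy as np

def decompose(f, M):
    """Split image f into (f_l, f_h) so that f = f_l + f_h, as in eq. (1).

    Assumption: L is an ideal lowpass filter in the DFT domain with a square
    passband whose cutoff is 1/M of the Nyquist frequency (0.5 cycles/pixel).
    """
    f = np.asarray(f, dtype=float)
    fy = np.fft.fftfreq(f.shape[0])[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(f.shape[1])[None, :]   # horizontal frequencies, cycles/pixel
    cutoff = 0.5 / M
    mask = (np.abs(fx) <= cutoff) & (np.abs(fy) <= cutoff)
    # The mask is symmetric, so the inverse transform is real up to rounding.
    f_l = np.real(np.fft.ifft2(np.fft.fft2(f) * mask))   # L f
    f_h = f - f_l                                         # (I - L) f
    return f_l, f_h
```

With $M = 2$, for example, only frequencies up to 0.25 cycles/pixel remain in $f_l$, and everything above that band ends up in $f_h$.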

One of the codebooks consists of low-spatial-frequency component vectors, and the other consists of high-spatial-frequency component vectors. As shown in Fig. 2, to generate the low-spatial-frequency codebook $C_l = \{ l_i = (l_{i1}, \ldots, l_{id}) \mid i = 1, \ldots, (N - w + 1)(N - h + 1) \}$ and the high-spatial-frequency codebook $C_h = \{ h_i = (h_{i1}, \ldots, h_{id}) \mid i = 1, \ldots, (N - w + 1)(N - h + 1) \}$, $f_l$ and $f_h$ are each decomposed into sliding image blocks of $w \times h$ pixels, where $w$ and $h$ are the width and height of each image block, respectively. The dimensions of both $l_i$ and $h_i$ equal the number of pixels in an image block, i.e., $d = w \times h$.
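A minimal NumPy sketch of the codebook generation is shown below. It assumes the blocks are taken at every pixel position (stride 1) in raster order, which yields the $(N - w + 1)(N - h + 1)$ entries stated above; the scan order and the helper name `build_codebooks` are assumptions, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_codebooks(f_l, f_h, w=4, h=4):
    """Return (C_l, C_h), each of shape ((N-h+1)*(N-w+1), d) with d = w*h.

    Row i of C_l and row i of C_h come from the same block position, so each
    pair (l_i, h_i) records the local low/high-frequency relationship.
    Assumption: raster-order scan with a stride of one pixel.
    """
    # Window shape is (height, width); result has shape (rows, cols, h, w).
    blocks_l = sliding_window_view(np.asarray(f_l, dtype=float), (h, w))
    blocks_h = sliding_window_view(np.asarray(f_h, dtype=float), (h, w))
    d = w * h
    C_l = blocks_l.reshape(-1, d)   # reshape copies the overlapping windows
    C_h = blocks_h.reshape(-1, d)
    return C_l, C_h
```

For the 128 × 128 source images and 4 × 4 blocks used in §3, each codebook would hold 125 × 125 = 15,625 vectors of dimension 16.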

2.2 Image super-resolution via the codebook mapping

Two main processing steps are involved in the image super-resolution procedure.

The first step is to enlarge the original image $f$ to an image $F_l$ ($NM \times NM$ pixels) using one of the typical interpolation-based SR methods, such as CCI, as shown in Fig. 3. This enlarged image, which contains no high-spatial-frequency components beyond the Nyquist frequency of the original image, is used as the input image for the super-resolution algorithm employing the codebook mapping.
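The paper uses CCI (refs. 10, 11) for this step without giving implementation details. The sketch below is a from-scratch separable cubic convolution using Keys' kernel; the parameter $a = -0.5$, the pixel-center alignment between input and output grids, the edge replication at the borders, and the function names are all assumptions rather than the authors' settings.

```python
import numpy as np

def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the commonly used value."""
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    out[near] = ((a + 2) * x[near] - (a + 3)) * x[near] ** 2 + 1
    out[far] = a * (((x[far] - 5) * x[far] + 8) * x[far] - 4)
    return out

def upsample_axis(img, M, axis):
    """Enlarge img by an integer factor M along one axis with cubic convolution."""
    n = img.shape[axis]
    # Map each output pixel center to a fractional input coordinate (assumption).
    x = (np.arange(n * M) + 0.5) / M - 0.5
    base = np.floor(x).astype(int)
    shape = [1] * img.ndim
    shape[axis] = -1
    out = np.zeros(img.shape[:axis] + (n * M,) + img.shape[axis + 1:])
    for k in range(-1, 3):                    # four taps: base-1 .. base+2
        idx = np.clip(base + k, 0, n - 1)     # replicate edge pixels
        weights = keys_kernel(x - (base + k)).reshape(shape)
        out += np.take(img, idx, axis=axis) * weights
    return out

def cci_enlarge(f, M):
    """F_l: separable cubic convolution enlargement of an N x N image to NM x NM."""
    f = np.asarray(f, dtype=float)
    return upsample_axis(upsample_axis(f, M, axis=0), M, axis=1)
```

Because the four kernel weights always sum to one, flat regions stay flat, and away from the borders linear ramps are reproduced exactly; no detail beyond the original Nyquist frequency is created, which is exactly the gap the codebook mapping is meant to fill.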

The second step is the estimation of the high-spatial-frequency components that cannot be restored by typical interpolation methods, followed by the reconstruction of the super-resolution image. The estimation is achieved using $F_l$ and the codebook mapping, as shown in Fig. 4. $F_l$ is decomposed into sliding image blocks of $w \times h$ pixels, each giving a $d$-dimensional input vector $r_i = (r_{i1}, \ldots, r_{id})$. Then, $r_i$ is compared with each $l_i$ in $C_l$ using the matching criterion

$$ \sum_{k=1}^{d} w_k \left( l_{ik} - \frac{1}{d}\sum_{l=1}^{d} l_{il} - r_{ik} + \frac{1}{d}\sum_{l=1}^{d} r_{il} \right)^2, \tag{2} $$

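where $w_k$ is a weight parameter that emphasizes the center of the image block. The $h^*$ corresponding to the best-matched $l^*$, that is, the codebook entry minimizing criterion (2), is then used to locally reconstruct the high-frequency components of the image. Finally, the enlarged super-resolution image $F$ is obtained by adding to $F_l$ the values $h^*/d$ obtained at every position of the sliding block; since each interior pixel is covered by $d$ overlapping blocks, this averages the overlapping high-frequency estimates.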
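The brute-force sketch below puts criterion (2) and the overlap-add step together. It assumes the Gaussian weights $w_k$ used in §3 ($\sigma = 1$), centred on the geometric centre of the block (our assumption), an exhaustive nearest-neighbour search over $C_l$, and that $h^*$ is added with no normalization other than the division by $d$. The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_block_weights(w=4, h=4, sigma=1.0):
    """w_k: Gaussian weights centred on the block centre (assumption), length d."""
    ys = np.arange(h) - (h - 1) / 2
    xs = np.arange(w) - (w - 1) / 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    return g.ravel()

def codebook_sr(F_l, C_l, C_h, w=4, h=4, sigma=1.0):
    """Add estimated high-frequency blocks h*/d to F_l using criterion (2)."""
    F_l = np.asarray(F_l, dtype=float)
    C_l = np.asarray(C_l, dtype=float)
    C_h = np.asarray(C_h, dtype=float)
    d = w * h
    wk = gaussian_block_weights(w, h, sigma)
    C_l0 = C_l - C_l.mean(axis=1, keepdims=True)    # l_ik - (1/d) sum_l l_il
    blocks = sliding_window_view(F_l, (h, w))       # (rows, cols, h, w), view of F_l
    F = F_l.copy()
    for y in range(blocks.shape[0]):
        for x in range(blocks.shape[1]):
            r = blocks[y, x].ravel()
            r0 = r - r.mean()                       # r_ik - (1/d) sum_l r_il
            cost = ((C_l0 - r0) ** 2 * wk).sum(axis=1)   # criterion (2) for every entry
            h_star = C_h[np.argmin(cost)]           # high-frequency partner of l*
            F[y:y + h, x:x + w] += h_star.reshape(h, w) / d
    return F
```

Removing the block means before matching lets a block match a codebook entry that differs only in its average brightness. The exhaustive search is costly: for the 256 × 256 $F_l$ of §3, about 64,009 query blocks are each compared with 15,625 codebook entries, which is consistent with the paper listing codebook-size reduction as future work.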

Fig. 1. Decomposition of an original image $f$ ($N \times N$ pixels), by a lowpass operator, into low-frequency image components $f_l$ and high-frequency components $f_h$.

Fig. 2. Generation of the low-spatial-frequency codebook $C_l$ and the high-spatial-frequency codebook $C_h$ from $w \times h$ blocks of $f_l$ and $f_h$.

Fig. 3. Generation of the enlarged low-resolution image $F_l$ ($NM \times NM$ pixels) from $f$ ($N \times N$ pixels) by interpolation.

Fig. 4. Estimation of high-spatial-frequency image components using the codebook mapping: a $w \times h$ block $r_i$ of $F_l$ is matched to $l^*$ in $C_l$, and the paired $h^*$ in $C_h$ is used.


3. Experimental Results

We verify the effectiveness and validity of the proposed method by applying it to several grayscale images.

In our experiment, the six well-known standard images shown in Fig. 5, "Airplane", "Building", "Cameraman", "Lenna", "Lighthouse", and "Text", are employed as the original images. Each image consists of 256 × 256 pixels with 8-bit gray levels. The parameters $w$ and $h$ are both set to 4, and $w_k$ is given by a Gaussian distribution with $\sigma = 1$. For comparison, NNI, BLI, CCI, and the nonlinear extrapolation method (NE) of ref. 20) are also employed; NE is a typical method that estimates the high-frequency components of images. The original images are reduced to 1/2 scale, that is, 128 × 128 pixels, after antialiasing low-pass filtering, and the resulting images are input to each SR method as the images to be enlarged. In this case, the up-sampling scale $M$ is set to 2, that is, the images are enlarged to 256 × 256 pixels. To evaluate the super-resolution performance quantitatively, Table 1 lists the PSNRs of the results obtained by the conventional methods and by the proposed method. PSNR is defined by

$$ \mathrm{PSNR} = 20 \log_{10} \left( \frac{255}{\sqrt{\frac{1}{N^2 M^2} \sum (F_{\mathrm{ideal}} - F)^2}} \right). \tag{3} $$
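Equation (3) is 20 log10 of the peak value 255 divided by the root-mean-square error over all $N^2 M^2$ pixels. A minimal sketch, assuming 8-bit images and a hypothetical helper name `psnr`:

```python
import numpy as np

def psnr(F_ideal, F, peak=255.0):
    """PSNR in dB as in eq. (3): 20 log10(peak / RMSE) over all pixels."""
    diff = np.asarray(F_ideal, dtype=float) - np.asarray(F, dtype=float)
    rmse = np.sqrt(np.mean(diff ** 2))   # mean over the N^2 M^2 pixels
    return 20.0 * np.log10(peak / rmse)
```

For instance, a uniform error of 2.55 gray levels gives 20 log10(255/2.55) = 40 dB, and each tenfold increase in RMSE lowers the PSNR by 20 dB.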

Table 1 confirms the superiority of the proposed method. As an example, Fig. 6 shows the super-resolution results for "Cameraman", from which it is clear that the proposed method outperforms the traditional interpolation methods tested and also yields better image quality than NE. Figure 7 shows cross sections of the images obtained by NE and by the proposed method; the ringing that appears with NE does not occur with the proposed method. Further, Fig. 8 shows the high-frequency image components of "Cameraman" estimated by NE and by the proposed method; the components estimated by the proposed method are close to the ideal ones.

To compare the proposed method with the others subjectively, a questionnaire evaluation was conducted with 20 subjects (15 men and 5 women, aged 21 to 37). In the questionnaire, the images produced by the conventional methods and by the proposed method were displayed simultaneously, with their positions chosen randomly. The subjects viewed the images and were asked to rank them in order of preference. The average rank is shown in Table 2, which suggests the superiority of the proposed method over the conventional methods tested. This agrees with the PSNRs shown in Table 1.

Furthermore, Fig. 9 shows partial super-resolution results for "Cameraman" [Fig. 6(a)] obtained by CCI, NE, and the proposed method when M is 3. The figure confirms that, as in the case of M = 2, the proposed method does not cause ringing and performs better than the other methods.

Fig. 5. Original images employed in the experiment: (a) "Airplane", (b) "Building", (c) "Cameraman", (d) "Lenna", (e) "Lighthouse", (f) "Text".

Table 1. PSNRs (dB) of the results obtained by the conventional methods and the proposed method when M is 2.

Image        NNI    BLI    CCI    NE     Proposed
Airplane     24.26  25.59  26.54  27.46  28.11
Building     24.92  26.21  27.22  28.39  28.86
Cameraman    23.38  24.50  25.24  26.15  26.55
Lenna        25.74  27.53  28.62  29.72  30.41
Lighthouse   22.04  22.82  23.37  24.05  24.29
Text         21.92  23.69  24.82  26.13  26.37

Table 2. Average rank in the questionnaire evaluation for the conventional methods and the proposed method when M is 2 (a lower value means a higher preference).

Image        NNI  BLI  CCI  NE   Proposed
Airplane     4.6  4.3  3.1  1.6  1.4
Building     4.8  4.3  3.0  1.7  1.4
Cameraman    5.0  4.1  2.9  1.8  1.4
Lenna        4.8  4.2  3.2  1.8  1.2
Lighthouse   4.8  4.1  3.1  1.9  1.2
Text         4.6  4.2  3.2  1.7  1.3

4. Conclusions

We proposed a new image SR method that estimates the high-frequency components inherent in an original image based on self-produced codebooks. Experiments confirmed the superiority of the proposed method over conventional methods. Furthermore, the concept of the proposed technique is quite simple and its implementation does not require a complicated algorithm; it may therefore be useful in a wide range of practical image processing applications.

Future work will include a detailed analysis of the relationship between the up-sampling scale M and the size of the image to be enlarged, reduction of the codebook size to save processing time, and development of an efficient algorithm for color images.

Fig. 6. Results of super-resolution for "Cameraman" when M is 2: (a) the original image to be enlarged, (b) NNI, (c) BLI, (d) CCI, (e) NE, (f) the proposed method.

Fig. 7. An example of cross sections of the resultant image for "Cameraman" (intensity versus pixel position, with the ideal profile overlaid): (a) NE, (b) the proposed method.


References

1) J. D. van Ouwerkerk: Image Vision Comput. 24 (2006) 24.
2) R. Y. Tsai and T. S. Huang: Adv. Comput. Vision Image Process. 1 (1984) 317.
3) E. Kaltenbacher and R. C. Hardie: Proc. IEEE National Aerospace Electron. Conf., 1996, p. 702.
4) S. P. Kim, N. K. Bose, and H. M. Valenzuela: IEEE Trans. Acoust. Speech Signal Process. 38 (1990) 1013.
5) P. Oskoui-Fard and H. Stark: IEEE Trans. Med. Imaging 7 (1988) 45.
6) A. J. Patti and Y. Altunbasak: IEEE Trans. Image Process. 10 (2001) 179.
7) R. R. Schultz and R. L. Stevenson: IEEE Trans. Image Process. 5 (1996) 996.
8) R. C. Hardie, K. J. Barnard, and E. E. Armstrong: IEEE Trans. Image Process. 6 (1997) 1621.
9) N. Nguyen, P. Milanfar, and G. Golub: IEEE Trans. Image Process. 10 (2001) 573.
10) J. S. Lim: Two-dimensional Signal Processing and Image Processing (Prentice-Hall, London, U.K., 1990).
11) R. G. Keys: IEEE Trans. Acoust. Speech Signal Process. 29 (1981) 1153.
12) K. Jensen and D. Anastassiou: IEEE Trans. Image Process. 4 (1995) 285.
13) X. Li and M. T. Orchard: IEEE Trans. Image Process. 10 (2001) 1521.
14) A. M. Darwish and M. S. Bedair: Proc. SPIE 2466 (1996) 131.
15) S. A. Martucci: Proc. IEEE Int. Conf. Image Processing, 1995, p. 244.
16) S. G. Chang, Z. Cvetkovic, and M. Vetterli: Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 1995, p. 2379.
17) F. M. Candocia and J. C. Principe: IEEE Trans. Neural Netw. 10 (1999) 372.
18) N. Plaziac: IEEE Trans. Image Process. 8 (1999) 1647.
19) F. Pan and L. Zhang: Opt. Eng. 42 (2003) 3038.
20) H. Greenspan, C. H. Anderson, and S. Akber: IEEE Trans. Image Process. 9 (2000) 1035.

Fig. 8. Estimation results of high-spatial-frequency image components: (a) an ideal image, (b) NE, (c) the proposed method.

Fig. 9. Results of super-resolution for "Cameraman" when M is 3: (a) CCI, (b) NE, (c) the proposed method.

