Research Article

Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images

Yuqing Zhao, Guangyuan Fu, Hongqiao Wang, Shaolei Zhang, and Min Yue

Xi'an Research Institute of High-Tech, Shaanxi 710025, China

Correspondence should be addressed to Yuqing Zhao: [email protected]

Received 7 February 2021; Revised 2 June 2021; Accepted 1 July 2021; Published 21 July 2021

Academic Editor: Quoc-Tuan Vien

Copyright © 2021 Yuqing Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The convolutional neural network has achieved good results in the superresolution reconstruction of single-frame images. However, due to the shortcomings of infrared images, such as lack of details, poor contrast, and blurred edges, superresolution reconstruction of infrared images that preserves the edge structure and offers better visual quality remains challenging. Aiming at the problems of low resolution and unclear edges of infrared images, this work proposes a two-stage generative adversarial network model to reconstruct realistic superresolution images from four-times-downsampled infrared images. The first stage of the generative adversarial network focuses on recovering the overall contour information of the image to obtain clear image edges; the second stage focuses on recovering the detailed feature information of the image and has a stronger ability to express details. The infrared image superresolution reconstruction method proposed in this work achieves highly realistic visual effects and good objective quality evaluation results.

1. Introduction

Infrared imaging technology, used for passive noncontact detection and identification, has the advantages of good concealment, strong transmission ability, immunity to electromagnetic interference, and good low-light and night-vision capability [1]. In addition to its main military applications, this technology is also widely used in civilian fields such as industry, agriculture, medicine, and public security reconnaissance. However, infrared images have many disadvantages, such as low resolution, low contrast, and blurred edges. Although it is possible to improve the hardware performance of the infrared imaging system by improving the manufacturing process of the infrared detector, doing so requires tremendous human and financial resources and is difficult to achieve in the short term. Therefore, digital signal processing is an economical and effective way to improve the quality of infrared images [2–4].

Superresolution reconstruction refers to the reconstruction of a high-resolution image or sequence from single-frame or multiframe low-resolution images and includes three main types: interpolation-based methods, reconstruction-based methods, and instance-based learning methods. Instance-based learning methods are flexible in algorithm structure and can provide more details under high magnification, and they have thus become a research hotspot of superresolution reconstruction in recent years. Chao et al. [5] proposed using a convolutional neural network (CNN) to achieve superresolution reconstruction of visible-light images, learning the mapping relationship between low-resolution and high-resolution images by training on a large dataset. Other researchers [6] used the perceptual loss to replace the minimum mean square error and learned upsampling to replace bicubic interpolation, achieving better results. Subsequently, network structures with more layers, such as the deeply recursive convolutional network (DRCN) [7] and the efficient subpixel convolutional neural network (ESPCN) [8], were proposed to achieve better outcomes.

In the field of machine learning, generative models have always been a difficult problem. The proposal of the generative adversarial network (GAN) [9] meets the demands of generative models for research and application in many fields. GAN uses only backpropagation and thus avoids complex Markov chains. At the same time, GAN uses unsupervised learning, enabling the production of clearer and more realistic samples. In recent research, GAN has been widely used [10–12]. Among them, the SaliencyGAN model proposed by Wang et al. [10] is a semisupervised salient object detection method for the Internet of Things. Yang et al. [11] proposed a new model based on a conditional generative adversarial network (DAGAN) to reconstruct compressed sensing magnetic resonance imaging (CS-MRI), and the literature [12] proposed a novel fast CS-MRI deep learning architecture based on a conditional generative adversarial network. SRGAN [13] is a GAN for image superresolution (SR) that can recover a photorealistic natural image from a 4x downsampled input. However, the amplified details of images generated by this method usually show unsightly artifacts. To further improve the visual quality, researchers [14] have proposed a residual-in-residual dense block (RRDB) network unit and made improvements in the perceptual loss. Researchers [15] have also proposed a novel generative adversarial framework to improve the edge structure and texture information in compressed images. ESRGAN+ [16] designed a network architecture with novel basic blocks to replace the basic structure used by the original ESRGAN. In the field of infrared imaging, SR reconstruction has mostly been achieved using sparse coding methods [17–21].

Figure 1: Original image (left) and the image recovered from the 4x downsampled image using the proposed method.

[Figure 2: Model architecture of the proposed method. A 128×128 LR input is mapped by the Stage I generator G0 to a 256×256 fake image, which the Stage I discriminator D0 compares against 256×256 real images (0/1 output); the Stage II generator G1 then maps the result to a 512×512 fake image, which the Stage II discriminator D1 compares against 512×512 real images (0/1 output).]


In this study, we propose a new GAN framework to improve the perceived quality of infrared images. We design a multiconstraint loss function by combining image fidelity loss, adversarial loss, feature fidelity loss, and edge fidelity loss. By iteratively updating the networks to minimize these loss functions, the proposed method yields reconstructed images with high resolution and sharp edges (Figure 1).

The main contributions of this paper are as follows:

(i) We propose a GAN-based SR reconstruction framework for edge preservation of infrared images that enhances the GAN's ability to restore the edge structure of the infrared image while maintaining detailed information.

(ii) To preserve the features and the edge information in the image, we propose a multiple-constraint loss function applicable to SR reconstruction.

(iii) We validate the proposed method on images from publicly available datasets and compare its performance with that of other mainstream methods. The results confirm that, compared with other methods, the proposed method obtains more realistic reconstructed infrared SR images with sharper edges.

[Figure 3: Network structure of the generator G0. Input → Conv (k9n64s1) + PReLU → stack of 6 residual blocks (Conv k3n64s1 + BN + PReLU, Conv k3n64s1 + BN, elementwise sum) → Conv (k3n64s1) + BN + skip-connection sum → Conv (k3n256s1) + PixelShuffler ×2 + PReLU → Conv (k9n3s1) + Tanh → restoration.]

[Figure 4: Network structure of the discriminator D0. Input → Conv (k3n64s1) + LeakyReLU → Conv + BN + LeakyReLU blocks (k3n64s2, k3n128s1, k3n128s2, k3n256s1, k3n256s2, k3n512s1, k3n512s2, and three k3n1024s1 blocks) → AdaptiveAvgPool → Conv + LeakyReLU → Conv → Sigmoid → 0/1 output.]

[Figure 5: Network structure of the generator G1. Input → Conv (k3n64s1) + ReLU → feature extraction stack of 16 residual blocks (Conv k3n64s1 + BN + ReLU, Conv k3n64s1 + BN, elementwise sum) → BN + skip-connection sum → two Conv (k3n256s1) + ReLU stages → Conv (k1n3s1) + Tanh → restoration.]


We describe the proposed network framework and loss function in Section 2 and their quantitative evaluation on the public datasets in Section 3, followed by recommended future work on the proposed method and the conclusion.

2. Method

To generate a high-resolution image with photograph-grade realistic details, and inspired by the literature [22, 23], we propose a simple and effective two-stage GAN in which the image generation process is divided into two stages (Figure 2).

[Figure 6: Comparison of reconstructed images from the FLIR_ADAS_1_3 validation set. Rows, from top to bottom: the original (ground-truth) images and the reconstruction results of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and our proposed method.]

[Figure 7: Comparison of reconstructed images from the Itir_v1_0 validation set. Rows, from top to bottom: the original (ground-truth) images and the reconstruction results of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and our proposed method.]

2.1. Stage 1 GAN. In Stage 1, we use the 128 × 128 low-resolution image I_LR0 as input and feed it to the Stage 1 generator G0 to generate a fake 256 × 256 image (I_SR0), which, together with a real 256 × 256 image (I_HR0), is passed to the Stage 1 discriminator D0 for identification. The core of the network structure of the generator G0 in the proposed method is shown in Figure 3. We adopt the network design of Ledig et al. [13] and introduce the skip connection, which has been proven effective in training deep neural networks. We adopt the residual block proposed in the literature [24] to construct a neural network with six residual blocks used as a stack to extract features from the image. Recent work on single-image superresolution [25–28] pointed out that, as the network deepens, hallucination artifacts often appear during the training and testing phases under the GAN framework. To address this problem, we follow the method in [29] and add batch normalization (BN) layers to the network. Each residual block contains two convolutional layers with a kernel size of 3 × 3 and 64 feature maps, two BN layers, and one parametric rectified linear unit (PReLU) layer [30]. The specific settings of each layer of the generative model are as follows:

C(PR, 64) → [C(64)BN(PR)C(64)BN(PR)SC] ×6 → C(64)BN(PR)SC → C(PR, 256) → C(t, 3)    (1)

Here, C(PR, 64) denotes a set of convolutional layers with 64 feature maps and activation function PReLU; C(64)BN(PR)C(64)BN(PR)SC represents a residual block; BN(PR) is a batch normalization layer with activation function PReLU; and SC denotes a skip connection. There are in total 6 residual blocks. C(t, 3) represents a convolutional layer with 3 feature maps and activation function tanh.
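To make the notation of Eq. (1) concrete, the following is a minimal PyTorch sketch of the Stage 1 generator as we read it from Figure 3 and Eq. (1). The paper does not release code, so the module layout, the ×2 PixelShuffle step, and all names below are our assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """C(64)BN(PR)C(64)BN(PR)SC: two 3x3 convs with BN/PReLU and a skip sum."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, 1, 1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, 1, 1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # SC: elementwise sum with the block input

class GeneratorG0(nn.Module):
    def __init__(self, in_ch: int = 3, channels: int = 64, n_blocks: int = 6):
        super().__init__()
        # C(PR,64): k9n64s1 conv + PReLU
        self.head = nn.Sequential(nn.Conv2d(in_ch, channels, 9, 1, 4), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        # Post-stack conv + BN, summed with the head output (long skip connection)
        self.mid = nn.Sequential(nn.Conv2d(channels, channels, 3, 1, 1),
                                 nn.BatchNorm2d(channels))
        # C(PR,256) + PixelShuffle x2: 256 = 64 * 2**2 channels before the shuffle
        self.up = nn.Sequential(nn.Conv2d(channels, channels * 4, 3, 1, 1),
                                nn.PixelShuffle(2), nn.PReLU())
        # C(t,3): k9n3s1 conv + Tanh restoration layer
        self.tail = nn.Sequential(nn.Conv2d(channels, in_ch, 9, 1, 4), nn.Tanh())

    def forward(self, x):
        h = self.head(x)
        h = h + self.mid(self.blocks(h))
        return self.tail(self.up(h))

# A 128x128 LR input yields a 256x256 output:
# GeneratorG0()(torch.randn(1, 3, 128, 128)).shape == (1, 3, 256, 256)
```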

To distinguish the generated SR samples from real high-resolution (HR) samples, we train the discriminator network D0, whose overall framework is shown in Figure 4 and which is adopted from the architectural framework summarized by Radford et al. [31], in which LeakyReLU [32] activation is used and max pooling is avoided throughout the network. The discriminator network includes ten convolutional blocks. All blocks except the first contain a convolutional layer, a BN layer, and a LeakyReLU layer. The number of kernels increases continually from 64, as in the Visual Geometry Group (VGG) network [33], to 1024; each time the number of kernels increases, strided convolution is used to lower the image resolution. After that, a special residual block containing two convolutional layers and a LeakyReLU layer is connected. The output of the last convolution unit is sent to a dense layer with a sigmoid activation function to produce a real-or-fake decision. The structure and parameters of each layer of the discriminator D0 network are as follows:

C(LR, 64) → C(64)BN(LR) → C(128)BN(LR) → C(256)BN(LR) → C(512)BN(LR) → C(1024)BN(LR) → C(LR, 1024) → D    (2)

Here, LR denotes the activation function LeakyReLU; C(LR, 64) denotes a set of convolutional layers with 64 feature maps and activation function LeakyReLU; C(64)BN(LR) denotes a set of convolutional layers with 64 feature maps followed by batch normalization with activation function LeakyReLU; and D is the dense output layer. The number of feature maps increases from 64 to 1024.
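Likewise, a hedged PyTorch sketch of a discriminator in the spirit of Eq. (2) and Figure 4. The stride schedule (k3n64s2 → k3n128s1 → k3n128s2 → … → k3n1024s1) and the pooled sigmoid head are our reading of the figure, not released code.

```python
import torch
import torch.nn as nn

def conv_bn_lrelu(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    """C(n)BN(LR): 3x3 conv -> batch norm -> LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride, 1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class DiscriminatorD0(nn.Module):
    def __init__(self, in_ch: int = 3):
        super().__init__()
        # First block: conv + LeakyReLU, no BN
        layers = [nn.Conv2d(in_ch, 64, 3, 1, 1), nn.LeakyReLU(0.2, inplace=True)]
        # Channels grow from 64 to 1024; stride-2 convs halve the resolution
        cfg = [(64, 64, 2), (64, 128, 1), (128, 128, 2), (128, 256, 1),
               (256, 256, 2), (256, 512, 1), (512, 512, 2), (512, 1024, 1)]
        layers += [conv_bn_lrelu(i, o, s) for i, o, s in cfg]
        self.features = nn.Sequential(*layers)
        # Pooled dense head with a sigmoid, giving a real/fake probability
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(1024, 1024, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(1024, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.head(self.features(x)).view(x.size(0))  # one probability per image

# DiscriminatorD0()(torch.randn(2, 3, 256, 256)).shape == (2,)
```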

Stage 1 Reconstruction Loss Function. The loss function contains three parts: adversarial loss, image fidelity loss, and edge fidelity loss:

$L_1 = \lambda_1 L_{adv} + \lambda_2 L_{mse} + \lambda_3 L_{edge}$    (3)

where the weights $\{\lambda_i\}$ are trade-off parameters used to balance the loss components. These parts each capture different perceptual characteristics of the reconstructed image, so as to obtain a more visually satisfactory result. The first part is the adversarial loss between the generator G0 and the discriminator D0 of the GAN. This part encourages the generator to trick the discriminator network into producing a more realistic HR image, as follows:

$L_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}(G_{\theta_G}(I_{LR0}))$    (4)

where $D_{\theta_D}(G_{\theta_G}(I_{LR0}))$ is the estimated probability that the reconstructed image $G_{\theta_G}(I_{LR0})$ is a true HR image. To obtain a better gradient, we minimize $-\log D_{\theta_D}(G_{\theta_G}(I_{LR0}))$ rather than $\log[1 - D_{\theta_D}(G_{\theta_G}(I_{LR0}))]$.
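As a concrete illustration of Eq. (4) and the $-\log D(G(\cdot))$ substitution, a minimal sketch (the function and variable names are ours):

```python
import torch

def adversarial_loss(d_fake_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Generator-side adversarial loss: mean of -log D(G(I_LR0)) over the batch.

    Minimizing -log D(G(x)) instead of log(1 - D(G(x))) gives stronger gradients
    early in training, when the discriminator easily rejects generated samples.
    """
    return -torch.log(d_fake_probs + eps).mean()
```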

Table 1: Quantitative comparison of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and the proposed method on the FLIR_ADAS_1_3 validation dataset.

Method   | CC     | PSNR (dB) | SSIM   | VIF    | UIQI   | Time (s)
---------|--------|-----------|--------|--------|--------|---------
SRCNN    | 0.8160 | 18.3963   | 0.8655 | 0.7942 | 0.8770 | 2.7112
ESPCN    | 0.8627 | 27.7171   | 0.9349 | 0.8269 | 0.9398 | 1.0206
SRGAN    | 0.8463 | 28.0445   | 0.9497 | 0.8120 | 0.8503 | 0.9202
ESRGAN   | 0.9547 | 28.5453   | 0.9506 | 0.7918 | 0.9382 | 0.1093
ESRGAN+  | 0.9806 | 33.7018   | 0.9855 | 0.9119 | 0.9864 | 0.2713
Ours     | 0.9964 | 37.9865   | 0.9980 | 0.9083 | 0.9965 | 0.1145

Table 2: Quantitative comparison of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and the proposed method on the Itir_v1_0 dataset.

Method   | CC     | PSNR (dB) | SSIM   | VIF    | UIQI   | Time (s)
---------|--------|-----------|--------|--------|--------|---------
SRCNN    | 0.7937 | 20.2460   | 0.8966 | 0.7079 | 0.8746 | 2.5706
ESPCN    | 0.8464 | 22.1708   | 0.9185 | 0.8194 | 0.9877 | 1.0020
SRGAN    | 0.8152 | 22.6732   | 0.9154 | 0.7944 | 0.8610 | 0.6642
ESRGAN   | 0.9146 | 30.3205   | 0.9608 | 0.7752 | 0.9166 | 0.0937
ESRGAN+  | 0.9547 | 34.1024   | 0.9844 | 0.8497 | 0.9541 | 0.2558
Ours     | 0.9962 | 35.2865   | 0.9943 | 0.9064 | 0.9936 | 0.1059


The second term of Eq. (3), L_mse, ensures the fidelity of the restored image using the pixel-level mean square error (MSE) loss:

$L_{mse} = \frac{1}{WHC} \lVert I_{HR0} - I_{SR0} \rVert_2^2$    (5)

where W, H, and C are the width, the height, and the number of channels, respectively, of the image.

The third term of Eq. (3), L_edge, the edge fidelity loss, is intended to reproduce sharp edge information:

$L_{edge} = \frac{1}{WH} \lVert \hat{I}_E - I_E \rVert_2^2$    (6)

where W and H are the width and the height, respectively, of the image. The labeled edge map I_E is extracted by a specific edge filter from the real 256 × 256 image I_HR0, while Î_E is extracted by the same filter from the 256 × 256 image I_SR0 generated by the generator G0. In our experiments, we chose the Canny edge detection operator. By minimizing the edge fidelity loss, the network continuously guides edge recovery.
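Eq. (6) can be sketched in PyTorch as follows. Note that the Canny operator used in the paper is not differentiable, so this sketch substitutes a fixed Sobel gradient magnitude as a differentiable stand-in; that substitution, and all names, are ours.

```python
import torch
import torch.nn.functional as F

# Fixed Sobel kernels, used here as a differentiable stand-in for Canny.
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def edge_map(img: torch.Tensor) -> torch.Tensor:
    """Gradient-magnitude edge map for a (N, 1, H, W) grayscale batch in [0, 1]."""
    gx = F.conv2d(img, SOBEL_X.to(img.device), padding=1)
    gy = F.conv2d(img, SOBEL_Y.to(img.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_fidelity_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Eq. (6): mean squared distance between the edge maps of I_SR0 and I_HR0."""
    return F.mse_loss(edge_map(sr), edge_map(hr))
```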

[Figure 8: Comparison of edge detection results on the superresolution reconstructions. Rows: SRCNN, ESPCN, SRGAN, ESRGAN, ESRGAN+, and our method.]

2.2. Stage 2 GAN. In the second stage, we use the 256 × 256 image I_SR0 generated in Stage 1 as input and feed it to the Stage 2 generator G1 to generate a 512 × 512 image I_SR1, which, together with a real 512 × 512 image I_HR1, is passed to the Stage 2 discriminator D1 to judge whether it is real or fake. We adopt the network design described in the literature [15], in which a generator model G1 (Figure 5) containing 16 residual blocks is constructed. The output format and parameters of each layer of the generator G1 are as follows:

C(R, 64) → [C(64)BN(R)C(64)BN(R)SC] ×16 → C(R, 64)SC → C(256) → C(t, 3)    (7)

where C(R, 64) denotes a set of convolutional layers with 64 feature maps and activation function ReLU, and C(64)BN(R)C(64)BN(R)SC represents a residual block.

The discriminator network D1 in the second stage adopts a network structure similar to that of the discriminator D0. The structure and network parameters of each layer of D1 are as follows:

C(LR, 64) → C(64)BN(LR) → C(128)BN(LR) → C(256)BN(LR) → C(512)BN(LR) → C(1024)BN(LR) → C(LR, 1024) → D    (8)

Stage 2 Reconstruction Loss Function. The loss function contains three parts: adversarial loss, image fidelity loss, and feature fidelity loss:

$L_2 = \lambda'_1 L_{adv1} + \lambda'_2 L_{mse1} + \lambda'_3 L_{feature}$    (9)

where the weights $\{\lambda'_i\}$ are trade-off parameters used to balance the loss components. The first term, L_adv1, is the adversarial loss between the generator G1 and the discriminator D1 of the GAN. The second term, L_mse1, is the image fidelity loss. The third term, L_feature, is the feature fidelity loss, defined from the feature space distance in the literature [34], which encourages the reconstructed image to preserve a feature representation similar to that of the original image:

$L_{feature} = \frac{1}{WHC} \lVert \phi(I_{HR1}) - \phi(I_{SR1}) \rVert_2^2$    (10)

where W, H, and C are the width, the height, and the number of channels, respectively, of the image, and φ(·) is the feature space function: a pretrained VGG-19 [33] network that maps images to feature space. The fourth pooling layer is used to calculate the L2 distance between feature activations, which serves as the feature fidelity loss.
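A hedged sketch of the feature fidelity loss of Eq. (10) using torchvision's pretrained VGG-19 truncated after the fourth pooling layer; the `features[:28]` indexing (pooling layer at index 27) is our assumption about where that layer sits in torchvision's layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class FeatureFidelityLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Truncate pretrained VGG-19 after the 4th pooling layer (features[27]).
        self.phi = vgg19(weights="IMAGENET1K_V1").features[:28].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        # sr, hr: (N, 3, H, W); a 1-channel IR image can be repeated to 3 channels
        return F.mse_loss(self.phi(sr), self.phi(hr))
```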

3. Experiments

[Figure 9: Superresolution reconstruction image matching results (the left image in each pair is the original high-resolution image): (a) SRCNN, (b) ESPCN, (c) SRGAN, (d) ESRGAN, (e) ESRGAN+, (f) our method.]

3.1. Experimental Details. All experiments are performed on a desktop computer with a 2.20 GHz ×40 Intel Xeon(R) Silver 4114 CPU, a GeForce GTX 1080Ti GPU, and 64 GB of memory. We use the PyTorch deep learning framework to implement the method proposed in this article and all comparison methods. All methods are trained with the FLIR training dataset. During the training process, we choose 8862 images from the FLIR_ADAS_1_3 thermal sensor training dataset released by FLIR Systems Inc., a sensor system developer, in 2018. First, we perform 4x downsampling on all experimental data to obtain low-resolution (LR) images by lowering the resolution of the HR images. We set the batch size to 4 and use Adam [35] with a momentum term of β = 0.9 as the optimizer. To bring the loss terms to the same order of magnitude and better balance the loss components, we set λ'_1, λ'_2, and λ'_3 of Eq. (9) to 10^-3, 1, and 10^-6, respectively. When training the Stage 1 GAN, we set the learning rate to 10^-4, which is lowered to 10^-5 when training the Stage 2 GAN.
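For concreteness, a minimal sketch wiring up the reported hyperparameters; `g0`, `g1`, and `feature_loss` stand for the hypothetical modules sketched earlier, and everything else is standard PyTorch.

```python
import torch
import torch.nn.functional as F

# Reported settings: batch size 4, Adam with beta1 = 0.9, learning rate 1e-4
# for Stage 1 and 1e-5 for Stage 2, and Eq. (9) weights (1e-3, 1, 1e-6).
LAMBDA_ADV, LAMBDA_MSE, LAMBDA_FEAT = 1e-3, 1.0, 1e-6

def make_optimizers(g0: torch.nn.Module, g1: torch.nn.Module):
    opt_g0 = torch.optim.Adam(g0.parameters(), lr=1e-4, betas=(0.9, 0.999))
    opt_g1 = torch.optim.Adam(g1.parameters(), lr=1e-5, betas=(0.9, 0.999))
    return opt_g0, opt_g1

def stage2_generator_loss(sr, hr, d_fake_probs, feature_loss):
    """Eq. (9): L2 = lambda1'*L_adv1 + lambda2'*L_mse1 + lambda3'*L_feature."""
    adv = -torch.log(d_fake_probs + 1e-8).mean()
    return (LAMBDA_ADV * adv
            + LAMBDA_MSE * F.mse_loss(sr, hr)
            + LAMBDA_FEAT * feature_loss(sr, hr))
```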

[Figure 10: Target detection results on superresolution reconstructed images. Rows: LR, SRCNN, ESPCN, SRGAN, ESRGAN, ESRGAN+, and our method.]

3.2. Experimental Evaluation. To verify the effectiveness of the proposed method, we conduct validation on two public datasets: the FLIR_ADAS_1_3 validation set (1366 images) and the Itir_v1_0 dataset (11262 images). We compare the proposed method with state-of-the-art methods, including the superresolution using deep convolutional networks (SRCNN) [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods. Three images are selected from the validation set of FLIR_ADAS_1_3 and from the Itir_v1_0 dataset, and the subjective results of the methods are shown in Figures 6 and 7. From the reconstruction results, it is not difficult to see that our proposed method produces finer texture and edge details.

To facilitate a fair quantitative comparison, we use six objective indicators to evaluate the quality of the reconstructed images and the SR methods: the correlation coefficient (CC) [36], the peak signal-to-noise ratio (PSNR) [37], the Structural Similarity Index Measure (SSIM) [38], the visual information fidelity (VIF) [39], the Universal Image Quality Index (UIQI), and the time consumption. The quantitative comparison of the different reconstruction methods is shown in Table 1, which shows that our method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods in CC, PSNR, SSIM, and UIQI on the FLIR dataset. Only the VIF index is slightly lower than that of the ESRGAN+ method, and the time consumption is greater than that of the ESRGAN method.
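As an illustration, the CC, PSNR, and SSIM indicators are commonly computed as follows (using scikit-image, our tooling choice; the paper does not state its implementation, and VIF and UIQI need separate implementations):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(hr: np.ndarray, sr: np.ndarray) -> dict:
    """CC, PSNR, and SSIM for uint8 arrays of identical shape (H, W[, C])."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255,
                                 channel_axis=-1 if hr.ndim == 3 else None)
    cc = np.corrcoef(hr.ravel().astype(np.float64),
                     sr.ravel().astype(np.float64))[0, 1]
    return {"CC": cc, "PSNR": psnr, "SSIM": ssim}
```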

The quantitative comparison of the different reconstruction methods on the Itir_v1_0 dataset is shown in Table 2, indicating that the proposed method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods on this dataset. Only the time consumption is slightly greater than that of the ESRGAN method.

To more intuitively illustrate the effectiveness of the proposed superresolution method in improving the edge features of infrared images, Figure 8 compares the edge detection results of the superresolution reconstructions of the various methods. The figure shows that the images reconstructed by our method contain more, and clearer, edge information, which is meaningful for the application of infrared images.

3.3. Using Advanced Vision Tasks to Compare Superresolution Results. Low-level vision tasks, including image superresolution reconstruction, ultimately serve high-level vision tasks. Infrared images are widely used in target detection and target matching, but their low resolution and unclear edges degrade the accuracy of these tasks. Therefore, whether infrared image superresolution reconstruction can improve the accuracy of such tasks has become an evaluation criterion for superresolution results. To further verify our method, we match the superresolution images generated by the several methods against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) represents the statistics of Gaussian image gradients in the neighborhood of feature points and is a commonly used local feature extraction algorithm. In the matching result, the number of matched points can be used as a criterion for matching quality, and the corresponding matched points also indicate the similarity of the local features of the two images. Figure 9 shows the results of matching the superresolution reconstructed images with the original high-resolution images using the SIFT algorithm. The match counts show that the reconstructed image produced by our proposed method obtains more correct matching pairs than the other methods.
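A hedged OpenCV sketch of the SIFT matching procedure described here; the 0.75 ratio-test threshold is the conventional choice, not a value stated in the paper.

```python
import cv2

def count_sift_matches(img_a, img_b, ratio: float = 0.75) -> int:
    """Count ratio-test SIFT matches between two grayscale uint8 images."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Lowe's ratio test: keep matches clearly better than the second candidate
    good = [m for m, n in matcher.knnMatch(des_a, des_b, k=2)
            if m.distance < ratio * n.distance]
    return len(good)
```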

In this experiment, we use the classic YOLO [40] method for image target detection (Figure 10). It can be seen that the superresolution reconstructed images generated by our proposed method yield better detection results and allow more targets to be detected.

4. Discussion and Future Work

Through experiments, we demonstrate that, compared with other methods, the proposed method has better perceptual performance. However, during the experiments, we also found that for some images the reconstruction results are not satisfactory, as shown in Figure 11. By analyzing these images, we found that they share a common characteristic: when the imaging device and the imaged object move at a high relative speed, the captured image may contain motion blur. For such images, ordinary SR reconstruction methods cannot achieve effective edge recovery. We will address these problems in future studies.

5. Conclusion

In this study, we propose a two-stage GAN framework that is able to reconstruct SR images by recovering edge structure information and retaining feature information. In the first stage, the image fidelity loss, the adversarial loss, and the edge fidelity loss are combined to preserve the edges of the image. In the second stage, the adversarial loss, the image fidelity loss, and the feature fidelity loss are combined to mine the visual features of the image. By iteratively updating the generator network and the discriminator network, SR reconstruction of infrared images with preserved edges is achieved. Experimental results show that the proposed method outperforms several other image reconstruction methods in reconstructing SR infrared images.

Figure 11: Images with motion blur cannot obtain clear edge information using ordinary SR reconstruction methods.


Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

[1] L. Xiang-di et al., "Infrared imaging system and applications," Laser & Infrared, vol. 3, 2014.

[2] X. Sui, Q. Chen, G. Gu, and X. Shen, "Infrared super-resolution imaging based on compressed sensing," Infrared Physics & Technology, vol. 63, pp. 119–124, 2014.

[3] X. Sui, H. Gao, Y. Sun, Q. Chen, and G. Gu, "Infrared super-resolution imaging method based on retina micro-motion," Infrared Physics & Technology, vol. 60, pp. 340–345, 2013.

[4] J. L. Suo, J. I. Xiangyang, and Q. H. Dai, "An overview of computational photography," Science China (Information Sciences), vol. 6, pp. 5–24, 2012.

[5] D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision, Springer, 2014.

[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2014.

[7] J. Kim, J. Lee, and K. Lee, "Deeply-recursive convolutional network for image super-resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 1637–1645, 2016.

[8] W. Shi, J. Caballero, F. Huszar, et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative adversarial networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.

[10] C. Wang, S. Dong, X. Zhao, G. Papanastasiou, H. Zhang, and G. Yang, "SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2667–2676, 2020.

[11] G. Yang, S. Yu, H. Dong, et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.

[12] S. Yu, D. Hao, G. Yang, G. Slabaugh, and Y. Guo, "Deep de-aliasing for fast compressive sensing MRI," 2017, https://arxiv.org/abs/1705.07137.

[13] C. Ledig, L. Theis, F. Huszar, et al., "Photo-realistic single image super-resolution using a generative adversarial network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2016.

[14] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: enhanced super-resolution generative adversarial networks," in Lecture Notes in Computer Science, Springer, 2018.

[15] Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced image decoding via edge-preserving generative adversarial networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018.

[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: further improving enhanced super-resolution generative adversarial network," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3637–3641, Barcelona, Spain, 2020.

[17] C. Kraichan and S. Pumrin, "Performance analysis on multi-frame image super-resolution via sparse representation," in 2014 International Electrical Engineering Congress (iEECON), Chonburi, Thailand, 2014.

[18] S. Yu-bao, W. Zhi-hui, X. Liang, Z. Zhen-rong, and L. Zhan-Qiang, "Multimorphology sparsity regularized image super-resolution," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898–2903, 2010.

[19] S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016–4028, 2012.

[20] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148–159, 2013.

[21] L. Qiu-sheng and Z. Wei, "Image super-resolution algorithms based on sparse representation of classified image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920–925, 2012.

[22] H. Zhang, T. Xu, H. Li, et al., "StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.

[23] H. Zhang, T. Xu, H. Li, et al., "StackGAN++: realistic image synthesis with stacked generative adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1947–1962, 2018.

[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.

[25] J. Zhu, G. Yang, P. Ferreira, et al., "A ROI focused multi-scale super-resolution method for the diffusion tensor cardiac magnetic resonance," in Proceedings of the 27th Annual Meeting (ISMRM), pp. 11–16, Montréal, QC, Canada, 2019.

[26] J. Zhu, G. Yang, and P. Lio, "Lesion focused super resolution," in Medical Imaging: Image Processing 2019, San Diego, CA, USA, 2019.

[27] J. Zhu, G. Yang, and P. Lio, "How can we make GAN perform better in single medical image super-resolution? A lesion focused multi-scale approach," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 2019.

[28] M. Seitzer, G. Yang, J. Schlemper, et al., "Adversarial and perceptual refinement for compressed sensing MRI reconstruction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer, 2018.

[29] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, PMLR, 2015.

[30] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.

[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2015, https://arxiv.org/abs/1511.06434.

[32] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proceedings of the 30th International Conference on Machine Learning, p. 3, Atlanta, Georgia, USA, 2013.

[33] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014, https://arxiv.org/abs/1409.1556.

[34] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Computer Vision – ECCV 2016, Springer, 2016.

[35] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," Computer Science, 2014, https://arxiv.org/abs/1412.6980.

[36] M. Deshmukh and U. Bhosale, "Image fusion and image quality assessment of fused images," International Journal of Image Processing (IJIP), vol. 4, no. 5, p. 484, 2010.

[37] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.

[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[39] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.

[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2015.



5 Conclusion

In this study we propose a two-stage GAN framework that isable to reconstruct SR image by recovering edge structureinformation and retaining feature information In the firststage the image fidelity loss the adversarial loss and the edgefidelity loss are combined to preserve the edges of the imageIn the second stage the image adversarial loss the imagefidelity loss and the feature fidelity loss are combined to mineimage visual features By iteratively updating the generativenetwork and the discriminator network the SR reconstruc-tion of infrared image with preserved edges is achievedExperimental verification results show that the proposedmethod outperforms several other image reconstructionmethods in reconstructing SR infrared images

Figure 11 Images with motion blur cannot obtain clear edge information using the ordinary SR reconstruction methods

10 International Journal of Digital Multimedia Broadcasting

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

References

[1] L Xiang-di et al ldquoInfrared imaging system and applicationsrdquoLaser amp Infrared vol 3 2014

[2] X Sui Q Chen G Gu and X Shen ldquoInfrared super-resolution imaging based on compressed sensingrdquo InfraredPhysics amp Technology vol 63 pp 119ndash124 2014

[3] X Sui H Gao Y Sun Q Chen and G Gu ldquoInfrared super-resolution imaging method based on retina micro-motionrdquoInfrared Physics amp Technology vol 60 pp 340ndash345 2013

[4] J L Suo J I Xiangyang and Q H Dai ldquoAn overview of com-putational photographyrdquo Science China(Information Sciences)vol 6 pp 5ndash24 2012

[5] D Chao C L Chen K He and X Tang ldquoLearning a DeepConvolutional Network for Image Super-Resolutionrdquo in Euro-pean Conference on Computer Vision Springer 2014

[6] C Dong C C Loy K He and X Tang ldquoImage super-resolution using deep convolutional networksrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 38no 2 pp 295ndash307 2014

[7] J Kim J Lee and K Lee Deeply-Recursive Convolutional Net-work for Image Super-Resolution 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEE pp1637ndash1645 2016

[8] W Shi J Caballero F Huszar et al ldquoReal-Time Single Imageand Video Super-Resolution Using an Efficient Sub-Pixel Con-volutional Neural Networkrdquo in 2016 IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR) Las Vegas NVUSA 2016

[9] I Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeAdversarial Networksrdquo Advances in Neural Information Pro-cessing Systems vol 3 pp 2672ndash2680 2014

[10] C Wang S Dong X Zhao G Papanastasiou H Zhang andG Yang ldquoSaliencyGAN deep learning semisupervised salientobject detection in the fog of IoTrdquo IEEE Transactions onIndustrial Informatics vol 16 no 4 pp 2667ndash2676 2020

[11] G Yang S Yu H Dong et al ldquoDAGAN deep de-aliasing gen-erative adversarial networks for fast compressed sensing MRIreconstructionrdquo IEEE Transactions on Medical Imagingvol 37 no 6 pp 1310ndash1321 2018

[12] S Yu D Hao G Yang G Slabaugh and Y Guo ldquoDeep De-Aliasing for Fast Compressive Sensing MRIrdquo 2017 httpsarxivorgabs170507137

[13] C Ledig L Theis F Huszar et al ldquoPhoto-Realistic SingleImage Super-Resolution Using a Generative Adversarial Net-workrdquo in 2017 IEEE Conference on Computer Vision and Pat-tern Recognition (CVPR) Honolulu HI USA 2016

[14] X Wang K Yu S Wu et al ldquoESRGAN Enhanced Super-Resolution Generative Adversarial Networksrdquo in LectureNotes in Computer Science Springer 2018

[15] Q Mao S Wang S Wang X Zhang and S Ma ldquoEnhancedImage Decoding Via Edge-Preserving Generative AdversarialNetworksrdquo in 2018 IEEE International Conference on Multi-media and Expo (ICME) San Diego CA USA 2018

[16] N C Rakotonirina and A Rasoanaivo ldquoESRGAN+ furtherimproving enhanced super-resolution generative adversarialnetworkrdquo in ICASSP 2020 - 2020 IEEE international confer-ence on acoustics Speech and Signal Processing (ICASSP)pp 3637ndash3641 Barcelona Spain 2020

[17] C Kraichan and S Pumrin ldquoPerformance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representationrdquo in2014 International Electrical Engineering Congress (iEECON)Chonburi Thailand 2014

[18] S Yu-bao W Zhi-hui X Liang Z Zhen-rong and L Zhan-Qiang ldquoMultimorphology Sparsity regularized image super-resolutionrdquo Acta Electronica Sinica vol 38 no 12 pp 2898ndash2903 2010

[19] S Yang M Wang Y Chen and Y Sun ldquoSingle-image super-resolution reconstruction via learned geometric dictionariesand clustered sparse codingrdquo IEEE Transactions on Image Pro-cessing vol 21 no 9 pp 4016ndash4028 2012

[20] Y Tang Y Yuan P Yan and X Li ldquoGreedy regression insparse coding space for single-image super-resolutionrdquo Jour-nal of Visual Communication amp Image Representationvol 24 no 2 pp 148ndash159 2013

[21] L Qiu-sheng and Z Wei ldquoImage super-resolution algorithmsbased on sparse representation of classified image patchesrdquoActa Electronica Sinica vol 40 no 5 pp 920ndash925 2012

[22] H Zhang T Xu H Li et al ldquoStackGAN Text to Photo-Realistic Image Synthesis with Stacked Generative AdversarialNetworksrdquo in 2017 IEEE International Conference on Com-puter Vision (ICCV) Venice Italy 2017

[23] H Zhang T Xu H Li et al ldquoStackGAN++ realistic imagesynthesis with stacked generative adversarial networksrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 41 pp 1947ndash1962 2018

[24] K He X Zhang S Ren and J Sun ldquoDeep Residual Learningfor Image Recognitionrdquo in 2016 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) pp 770ndash778 LasVegas NV USA 2016

[25] J Zhu G Yang P Ferreira et al ldquoA Roi Focused Multi-Scale Super-Resolution Method for the Diffusion TensorCardiac Magnetic Resonancerdquo in Proceedings of the 27thAnnual Meeting (ISMRM) pp 11ndash16 Montreacuteal QC Can-ada 2019

[26] J Zhu G Yang and P Lio ldquoLesion Focused Super Resolu-tionrdquo in Medical Imaging Image Processing 2019 San DiegoCA USA 2019

[27] J Zhu G Yang and P Lio ldquoHow Can We Make GAN Per-form Better in Single Medical Image Super-Resolution ALesion Focused Multi-Scale Approachrdquo in 2019 IEEE 16thInternational Symposium on Biomedical Imaging (ISBI 2019)Venice Italy 2019

[28] M Seitzer G Yang J Schlemper et al ldquoAdversarial and Per-ceptual Refinement for Compressed Sensing MRI Reconstruc-tionrdquo in Medical Image Computing and Computer AssistedIntervention ndash MICCAI 2018 Springer 2018

[29] S Ioffe and C Szegedy ldquoBatch Normalization AcceleratingDeep Network Training by Reducing Internal CovariateShiftrdquo in International conference on machine learningPMLR 2015

11International Journal of Digital Multimedia Broadcasting

[30] A Krizhevsky I Sutskever and G Hinton ldquoImageNet classi-fication with deep convolutional neural networksrdquo Advancesin Neural Information Processing Systems vol 25 no 2 2012

[31] A Radford L Metz and S Chintala ldquoUnsupervised Represen-tation Learning with Deep Convolutional Generative Adver-sarial Networksrdquo 2015 httpsarxivorgabs151106434

[32] A L Maas A Y Hannun and A Y Ng ldquoRectifier nonlinear-ities improve neural network acoustic modelsrdquo in Proceedingsof the 30 th International Conference on Machine Learningp 3 Atlanta Georgia USA 2013

[33] K Simonyan and A Zisserman ldquoVery Deep ConvolutionalNetworks for Large-Scale Image Recognitionrdquo Computer Sci-ence 2014 httpsarxivorgabs14091556

[34] J Johnson A Alahi and L Fei-Fei ldquoPerceptual Losses forReal-Time Style Transfer and Super-Resolutionrdquo in ComputerVision ndash ECCV 2016 Springer 2016

[35] D P Kingma and J Ba ldquoAdam AMethod for Stochastic Opti-mizationrdquo Computer Science 2014 httpsarxivorgabs14126980

[36] M Deshmukh and U Bhosale ldquoImage fusion and image qual-ity assessment of fused imagesrdquo International Journal of ImageProcessing (IJIP) vol 4 no 5 p 484 2010

[37] C Yim and A C Bovik ldquoQuality assessment of deblockedimagesrdquo IEEE Transactions on Image Processing vol 20no 1 pp 88ndash98 2011

[38] Z Wang A C Bovik H R Sheikh and E P SimoncellildquoImage quality assessment from error visibility to structuralsimilarityrdquo IEEE Transactions on Image Processing vol 13no 4 pp 600ndash612 2004

[39] Y Han Y Cai Y Cao and X Xu ldquoA new image fusion perfor-mance metric based on visual information fidelityrdquo Informa-tion fusion vol 14 no 2 pp 127ndash135 2013

[40] J Redmon S Divvala R Girshick and A Farhadi ldquoYou OnlyLook Once Unified Real-Time Object Detectionrdquo in 2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR) Las Vegas NV USA 2015

12 International Journal of Digital Multimedia Broadcasting

  • Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images
  • 1 Introduction
  • 2 Method
    • 21 Stage 1 GAN
    • 22 Stage 2 GAN
      • 3 Experiments
        • 31 Experimental Details
        • 32 Experimental Evaluation
        • 33 Use Advanced Vision Tasks to Compare Superresolution Results
          • 4 Discussion and Future Work
          • 5 Conclusion
          • Data Availability
          • Conflicts of Interest
Page 3: Generative Adversarial Network-Based Edge-Preserving ......2.1. Stage 1 GAN. In Stage 1, we use the 128×128 low-resolution image ILR0 as input and export it to the Stage 1 generator

In this study, we propose a new GAN framework to improve the perceived quality of infrared images. We design a multiconstraint loss function that combines image fidelity loss, adversarial loss, feature fidelity loss, and edge fidelity loss. By iteratively minimizing these loss functions, the proposed method yields reconstructed images with high resolution and sharp edges (Figure 1).

The main contributions of this paper are as follows:

(i) We propose a GAN-based SR reconstruction framework for edge preservation of infrared images that enhances the GAN's ability to restore the edge structure of the infrared image while maintaining the detailed information.

(ii) To preserve the characteristics and the edge information in the image, we propose a multiple-constraint loss function applicable to SR reconstruction.

(iii) We validate the proposed method on images from publicly available datasets and compare its performance with that of other mainstream methods. The results confirm that, compared with other methods, the proposed method obtains more realistic reconstructed infrared SR images with sharper edges.

Figure 3: Network structure of the generator G0: input → Conv(k9n64s1) + PReLU → 6 residual blocks [Conv(k3n64s1) + BN + PReLU → Conv(k3n64s1) + BN → elementwise sum] → Conv(k3n64s1) + BN → elementwise sum via skip connection → Conv(k3n256s1) + PixelShuffler ×2 + PReLU → Conv(k9n3s1) + Tanh → restoration.

Figure 4: Network structure of the discriminator D0: input → Conv(k3n64s1) + LeakyReLU → Conv + BN + LeakyReLU blocks (k3n64s2, k3n128s1, k3n128s2, k3n256s1, k3n256s2, k3n512s1, k3n512s2, followed by k3n1024s1 blocks) → AdaptiveAvgPool → Conv + LeakyReLU → Conv → Sigmoid → output in [0, 1].

Figure 5: Network structure of the generator G1: input → Conv(k3n64s1) + ReLU → feature extraction stack of 16 residual blocks [Conv(k3n64s1) + BN + ReLU → Conv(k3n64s1) + BN → elementwise sum] → BN + elementwise sum via skip connection → two Conv(k3n256s1) + ReLU stages → Conv(k1n3s1) + Tanh → restoration.

We describe the proposed network framework and loss function in Section 2 and their quantitative evaluation on the public datasets in Section 3, followed by recommended future study of the proposed method and the conclusion.

2. Method

To generate a high-resolution image with photograph-grade realistic details, and inspired by the literature [22, 23], we propose a simple and effective two-stage GAN in which the image generation process is divided into two stages (Figure 2).

Figure 6: Comparison of the reconstruction results for images from the FLIR_ADAS_1_3 validation set. The rows show, from top to bottom, the original images and the reconstruction results of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and our proposed method.

2.1. Stage 1 GAN. In Stage 1, we use the 128 × 128 low-resolution image $I^{LR}_0$ as input to the Stage 1 generator G0, which generates a fake 256 × 256 image ($I^{SR}_0$); this image, together with a real 256 × 256 image ($I^{HR}_0$), is fed to the Stage 1 discriminator D0 to be identified as real or generated. The core network structure of the generator G0 in the proposed method is shown in Figure 3.

Figure 7: Comparison of the reconstruction results for images from the Itir_v1_0 validation set. The rows show, from top to bottom, the original images and the reconstruction results of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and our proposed method.

We adopt the network design of Ledig et al. [13] and introduce the skip connection, which has been proven effective in training deep neural networks. We adopt the residual block proposed in the literature [24] to construct a neural network in which six residual blocks are stacked to extract features from the image. Recent work on single-image superresolution [25–28] pointed out that, as the network deepens, hallucination artifacts often appear in the training and testing phases under the GAN framework. To alleviate this problem, we follow the method in [29] and add batch normalization (BN) layers to the network. Each residual block contains two convolutional layers with a kernel size of 3 × 3 and 64 feature maps, two BN layers, and one parametric rectified linear unit (PReLU) layer [30]. The specific settings of each layer of the generative model are as follows:

$$C(PR,64) \rightarrow \underbrace{C(64)BN(PR)\,C(64)BN(PR)\,SC \rightarrow \cdots \rightarrow C(64)BN(PR)\,SC}_{6\ \text{residual blocks}} \rightarrow C(PR,256) \rightarrow C(t,3) \tag{1}$$

Here, C(PR, 64) denotes a convolutional layer with 64 feature maps and the activation function PReLU; C(64)BN(PR)C(64)BN(PR)SC represents a residual block, where BN(PR) is a batch normalization layer with the activation function PReLU and SC denotes a skip connection; there are in total 6 residual blocks. C(t, 3) represents a convolutional layer with 3 feature maps and the activation function tanh.
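To make the notation concrete, the following is a minimal PyTorch sketch of a generator assembled from Eq. (1) and the layer labels in Figure 3 (k9n64s1, k3n64s1, k3n256s1, PixelShuffler ×2, k9n3s1). It is our own illustration, assuming 3-channel input images, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv(3x3, 64) -> BN -> PReLU -> Conv(3x3, 64) -> BN -> elementwise sum."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

class GeneratorG0(nn.Module):
    """Stage 1 generator sketch: k9n64s1 head, 6 residual blocks,
    PixelShuffle x2 upsampling (128x128 -> 256x256), k9n3s1 tanh tail."""
    def __init__(self, num_blocks: int = 6):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, kernel_size=9, padding=4), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock(64) for _ in range(num_blocks)])
        self.mid = nn.Sequential(nn.Conv2d(64, 64, kernel_size=3, padding=1),
                                 nn.BatchNorm2d(64))
        self.upsample = nn.Sequential(
            nn.Conv2d(64, 256, kernel_size=3, padding=1),
            nn.PixelShuffle(2),  # 256 channels -> 64 channels, 2x spatial size
            nn.PReLU(),
        )
        self.tail = nn.Sequential(nn.Conv2d(64, 3, kernel_size=9, padding=4), nn.Tanh())

    def forward(self, x):
        h = self.head(x)
        h = h + self.mid(self.blocks(h))  # long skip connection (SC)
        return self.tail(self.upsample(h))
```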

To distinguish the generated SR samples from real high-resolution (HR) samples, we train the discriminator network D0, whose overall framework is shown in Figure 4. It is adapted from the architectural framework summarized by Radford et al. [31], in which LeakyReLU [32] activation is used and max pooling is avoided throughout the network. The discriminator network includes ten convolutional blocks; all blocks except the first contain a convolutional layer, a BN layer, and a LeakyReLU layer. The number of kernels increases continually, from 64 as in the Visual Geometry Group (VGG) network [33] to 1024, and each time the number of kernels increases, strided convolution is used to lower the image resolution. After that, a special residual block containing two convolutional layers and a LeakyReLU layer is connected. The output of the last convolutional unit is sent to a dense layer with a sigmoid activation function to produce a real-or-fake decision. The structure and parameters of each layer of the discriminator D0 network are as follows:

$$C(LR,64) \rightarrow C(64)BN(LR) \rightarrow C(128)BN(LR) \rightarrow C(256)BN(LR) \rightarrow C(512)BN(LR) \rightarrow C(1024)BN(LR) \rightarrow C(LR,1024) \rightarrow D \tag{2}$$

Here, LR denotes the activation function LeakyReLU; C(LR, 64) denotes a convolutional layer with 64 feature maps and the activation function LeakyReLU; C(64)BN(LR) denotes a convolutional layer with 64 feature maps followed by batch normalization with the activation function LeakyReLU; and D is the dense output layer. The number of feature maps increases from 64 to 1024.
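A plausible PyTorch sketch of this discriminator, assembled from Eq. (2) and the layer labels in Figure 4, is shown below; the exact stride placement and the k3n1024s1 head convolutions are our reading of the figure, not code from the paper:

```python
import torch
import torch.nn as nn

def d_block(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    """Conv -> BN -> LeakyReLU block; stride 2 halves the resolution instead of pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class DiscriminatorD0(nn.Module):
    """Stage 1 discriminator sketch: widths 64 -> 1024, strided convs, sigmoid output."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            d_block(64, 64, 2), d_block(64, 128, 1), d_block(128, 128, 2),
            d_block(128, 256, 1), d_block(256, 256, 2),
            d_block(256, 512, 1), d_block(512, 512, 2),
            d_block(512, 1024, 1),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(1024, 1024, kernel_size=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(1024, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Returns the estimated probability that x is a real HR image.
        return self.head(self.features(x)).flatten(1)
```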

Stage 1 Reconstruction Loss Function. The loss function contains three parts: adversarial loss, image fidelity loss, and edge fidelity loss (Eq. (3)). These parts each capture different perceptual characteristics of the reconstructed image, yielding a more visually satisfactory reconstruction:

$$L_1 = \lambda_1 L_{adv} + \lambda_2 L_{mse} + \lambda_3 L_{edge} \tag{3}$$

where the weights $\{\lambda_i\}$ are trade-off parameters used to balance the loss components. The first part is the adversarial loss between the generator G0 and the discriminator D0 of the GAN. This part encourages the generator to fool the discriminator network and thus produce more realistic HR images:

$$L_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I^{LR}_0\right)\right) \tag{4}$$

where $D_{\theta_D}(G_{\theta_G}(I^{LR}_0))$ is the estimated probability that the reconstructed image $G_{\theta_G}(I^{LR}_0)$ is a true HR image. To obtain better gradients, we minimize $-\log D_{\theta_D}(G_{\theta_G}(I^{LR}_0))$ rather than $\log[1 - D_{\theta_D}(G_{\theta_G}(I^{LR}_0))]$.
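Putting Eqs. (3)–(6) together, a minimal sketch of the Stage 1 generator objective might look as follows. The default weights are illustrative placeholders: the paper reports λ′ values only for the Stage 2 loss of Eq. (9), not for Eq. (3).

```python
import torch
import torch.nn.functional as F

def stage1_generator_loss(d_fake, sr, hr, sr_edges, hr_edges,
                          lam_adv=1e-3, lam_mse=1.0, lam_edge=1e-2):
    """L1 = lam1 * L_adv + lam2 * L_mse + lam3 * L_edge (Eq. (3)).

    d_fake:   discriminator probabilities D(G(I_LR)) for the generated batch
    sr, hr:   reconstructed and ground-truth 256x256 images
    *_edges:  edge maps extracted from sr and hr
    The lam_* defaults are placeholders, not values from the paper.
    """
    l_adv = -torch.log(d_fake + 1e-8).mean()  # non-saturating form of Eq. (4)
    l_mse = F.mse_loss(sr, hr)                # pixel fidelity, Eq. (5)
    l_edge = F.mse_loss(sr_edges, hr_edges)   # edge fidelity, Eq. (6)
    return lam_adv * l_adv + lam_mse * l_mse + lam_edge * l_edge
```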

Table 1: Quantitative comparison of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and the proposed method on the FLIR_ADAS_1_3 validation dataset.

Method     CC       PSNR (dB)   SSIM     VIF      UIQI     Time (s)
SRCNN      0.8160   18.3963     0.8655   0.7942   0.8770   2.7112
ESPCN      0.8627   27.7171     0.9349   0.8269   0.9398   1.0206
SRGAN      0.8463   28.0445     0.9497   0.8120   0.8503   0.9202
ESRGAN     0.9547   28.5453     0.9506   0.7918   0.9382   0.1093
ESRGAN+    0.9806   33.7018     0.9855   0.9119   0.9864   0.2713
Ours       0.9964   37.9865     0.9980   0.9083   0.9965   0.1145

Table 2: Quantitative comparison of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and the proposed method on the Itir_v1_0 dataset.

Method     CC       PSNR (dB)   SSIM     VIF      UIQI     Time (s)
SRCNN      0.7937   20.2460     0.8966   0.7079   0.8746   2.5706
ESPCN      0.8464   22.1708     0.9185   0.8194   0.9877   1.0020
SRGAN      0.8152   22.6732     0.9154   0.7944   0.8610   0.6642
ESRGAN     0.9146   30.3205     0.9608   0.7752   0.9166   0.0937
ESRGAN+    0.9547   34.1024     0.9844   0.8497   0.9541   0.2558
Ours       0.9962   35.2865     0.9943   0.9064   0.9936   0.1059


The second term of Eq. (3), $L_{mse}$, ensures the fidelity of the restored image using the pixel-level mean square error (MSE) loss:

$$L_{mse} = \frac{1}{WHC} \left\| I^{HR}_0 - I^{SR}_0 \right\|_2^2 \tag{5}$$

where W, H, and C are the width, the height, and the number of channels, respectively, of the image.

The third term of Eq. (3), $L_{edge}$, the edge fidelity loss, is intended to reproduce sharp edge information:

$$L_{edge} = \frac{1}{WH} \left\| \hat{I}_E - I_E \right\|_2^2 \tag{6}$$

where W and H are the width and the height, respectively, of the image. The reference edge map $I_E$ is extracted by a specific edge filter from the real 256 × 256 image $I^{HR}_0$, while $\hat{I}_E$ is extracted by the same filter from the 256 × 256 image $I^{SR}_0$ generated by the generator G0. In our experiments, we chose the Canny edge detection operator. By minimizing the edge fidelity loss, the network is continuously guided toward edge recovery.
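The Canny operator used here is not differentiable, so a training implementation needs either precomputed edge targets or a differentiable edge extractor. The sketch below substitutes a Sobel gradient-magnitude edge map as a differentiable stand-in for Eq. (6); this substitution is our simplification, not the paper's choice of filter.

```python
import torch
import torch.nn.functional as F

# Fixed Sobel kernels for horizontal and vertical gradients, shape (2, 1, 3, 3).
SOBEL = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]],
                      [[[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]]])

def edge_map(img: torch.Tensor) -> torch.Tensor:
    """Gradient-magnitude edge map of a batch of images shaped (N, C, H, W)."""
    gray = img.mean(dim=1, keepdim=True)               # collapse channels to one
    g = F.conv2d(gray, SOBEL.to(img.device), padding=1)
    return torch.sqrt(g[:, :1] ** 2 + g[:, 1:] ** 2 + 1e-8)

def edge_fidelity_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Eq. (6): mean squared distance between the edge maps of SR and HR images."""
    return F.mse_loss(edge_map(sr), edge_map(hr))
```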

2.2. Stage 2 GAN. In the second stage, we use the generated 256 × 256 image $I^{SR}_0$ as input to the Stage 2 generator G1, which generates a 512 × 512 image $I^{SR}_1$; this image, together with a real 512 × 512 image $I^{HR}_1$, is fed to the Stage 2 discriminator D1, which judges whether it is real or fake. We adopt the network design described in the literature [15], in which a generator model G1 containing 16 residual blocks is constructed (Figure 5).

Figure 8: Comparison of the edge detection results on the superresolution reconstructions of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and ours.

The output format and parameters of each layer of the generator G1 are as follows:

$$C(R,64) \rightarrow \underbrace{C(64)BN(R)\,C(64)BN(R)\,SC \rightarrow \cdots}_{16\ \text{residual blocks}} \rightarrow C(R,64)\,SC \rightarrow C(256) \rightarrow C(t,3) \tag{7}$$

where C(R, 64) denotes a convolutional layer with 64 feature maps and the activation function ReLU, and C(64)BN(R)C(64)BN(R)SC represents a residual block.

The discriminator network D1 in the second stage adopts a network structure similar to that of the discriminator D0. The structure and network parameters of each layer of D1 are as follows:

$$C(LR,64) \rightarrow C(64)BN(LR) \rightarrow C(128)BN(LR) \rightarrow C(256)BN(LR) \rightarrow C(512)BN(LR) \rightarrow C(1024)BN(LR) \rightarrow C(LR,1024) \rightarrow D \tag{8}$$

Stage 2 Reconstruction Loss Function. The loss function contains three parts: adversarial loss, image fidelity loss, and feature fidelity loss:

$$L_2 = \lambda_1' L_{adv_1} + \lambda_2' L_{mse_1} + \lambda_3' L_{feature} \tag{9}$$

where the weights $\{\lambda_i'\}$ are trade-off parameters that balance the loss components. The first term, $L_{adv_1}$, is the adversarial loss between the generator G1 and the discriminator D1 of the GAN. The second term, $L_{mse_1}$, is the image fidelity loss. The third term, $L_{feature}$, is the feature fidelity loss, defined on the feature space distance introduced in the literature [34] to keep the feature representation of the reconstructed image similar to that of the original image:

$$L_{feature} = \frac{1}{WHC} \left\| \phi\!\left(I^{HR}_0\right) - \phi\!\left(I^{SR}_0\right) \right\|_2^2 \tag{10}$$

where W, H, and C are the width, the height, and the number of channels, respectively, of the image, and $\phi(\cdot)$ is the feature space mapping, implemented by a pretrained VGG-19 [33] network that maps images to feature space. The activations at the fourth pooling layer are used to compute the L2 distance that serves as the feature fidelity loss.
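A compact sketch of this feature fidelity loss, using torchvision's pretrained VGG-19 truncated after the fourth pooling layer, could look as follows; the slice index assumes torchvision's layer ordering, and input normalization to VGG statistics is omitted for brevity:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class FeatureFidelityLoss(torch.nn.Module):
    """Eq. (10): L2 distance between VGG-19 activations after the fourth pooling layer."""
    def __init__(self):
        super().__init__()
        # features[:28] ends right after the fourth max-pool of torchvision's VGG-19.
        self.phi = vgg19(weights="IMAGENET1K_V1").features[:28].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        return F.mse_loss(self.phi(sr), self.phi(hr))
```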

3. Experiments

3.1. Experimental Details. All experiments are performed on a desktop computer with a 2.20 GHz × 40 Intel Xeon(R) Silver 4114 CPU, a GeForce GTX 1080 Ti GPU, and 64 GB of memory. We use the PyTorch deep learning framework to implement the method proposed in this article and all comparison methods. All methods are trained with the FLIR training dataset: during the training process, we choose 8862 images from the FLIR_ADAS_1_3 thermal sensor training dataset released by FLIR Systems, Inc. (a sensor system developer) in 2018.

Figure 9: Superresolution reconstruction image matching results for (a) SRCNN, (b) ESPCN, (c) SRGAN, (d) ESRGAN, (e) ESRGAN+, and (f) our method (in each pair, the left image is the original high-resolution image).

First, we downsample all experimental data by a factor of 4, lowering the resolution of the HR images to obtain low-resolution (LR) images. We set the batch size to 4 and use the Adam optimizer [35] with a momentum term of β = 0.9. To bring the loss terms to the same order of magnitude and better balance the loss components, we set $\lambda_1'$, $\lambda_2'$, and $\lambda_3'$ of Eq. (9) to $10^{-3}$, 1, and $10^{-6}$, respectively. When training the Stage 1 GAN, we set the learning rate to $10^{-4}$, which is lowered to $10^{-5}$ when training the Stage 2 GAN.
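The data preparation and optimizer settings above can be summarized in a few lines; the bicubic downsampling kernel and the placeholder model are our assumptions (the paper states only the 4× factor, the batch size, β, and the learning rates):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

def make_lr(hr: torch.Tensor) -> torch.Tensor:
    """Synthesize LR inputs by 4x downsampling of HR images (bicubic is an assumption)."""
    return F.interpolate(hr, scale_factor=0.25, mode="bicubic", align_corners=False)

# Placeholder generator standing in for G0/G1; swap in the real networks.
model = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))
optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))  # beta1 = 0.9, Stage 1 lr

hr_batch = torch.rand(4, 3, 512, 512)  # batch size 4, as in Section 3.1
lr_batch = make_lr(hr_batch)           # -> (4, 3, 128, 128)
```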

3.2. Experimental Evaluation. To verify the effectiveness of the proposed method, we conduct validation on two public datasets: the FLIR_ADAS_1_3 validation set (1366 images) and the Itir_v1_0 dataset (11262 images). We compare the proposed method with state-of-the-art methods, including the superresolution using deep convolutional networks (SRCNN) [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods.

Figure 10: Target detection results on the LR image and on the superresolution reconstructions of the SRCNN, ESPCN, SRGAN, ESRGAN, and ESRGAN+ methods and our method.

Three images are selected from the validation set of FLIR_ADAS_1_3 and from the Itir_v1_0 dataset, and the subjective results of the several methods are shown in Figures 6 and 7. The reconstruction results show that our proposed method produces finer texture and edge details.

To facilitate a fair quantitative comparison, we use six objective indicators to evaluate the quality of the reconstructed images and the SR methods: the correlation coefficient (CC) [36], the peak signal-to-noise ratio (PSNR) [37], the Structural Similarity Index Measure (SSIM) [38], the visual information fidelity (VIF) [39], the Universal Image Quality Index (UIQI), and the time consumption. The quantitative comparison of the different reconstruction methods is shown in Table 1, which shows that our method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods in CC, PSNR, SSIM, and UIQI on the FLIR dataset. Our VIF is slightly lower than that of the ESRGAN+ method, and our time consumption is greater than that of the ESRGAN method.
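For reference, two of these indicators can be computed as follows (a generic sketch, not the paper's evaluation code):

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(hr: np.ndarray, sr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of identical shape."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim(hr: np.ndarray, sr: np.ndarray) -> float:
    """Structural similarity for 8-bit grayscale images via scikit-image."""
    return structural_similarity(hr, sr, data_range=255)
```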

The quantitative comparison of the different reconstruction methods on the Itir_v1_0 dataset is shown in Table 2, indicating that the proposed method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods on this dataset; only the time consumption is slightly greater than that of the ESRGAN method.

To illustrate more intuitively the effectiveness of the proposed superresolution method in improving the edge features of infrared images, Figure 8 compares the edge detection results obtained on the superresolution reconstructions of the various methods. The figure shows that the images reconstructed by our method contain more, and clearer, edge information, which is meaningful for the application of infrared images.

3.3. Use Advanced Vision Tasks to Compare Superresolution Results. Basic vision tasks, including image superresolution reconstruction, ultimately serve advanced vision tasks. Infrared images are widely used in target detection and target matching, but their low resolution and unclear edges reduce the accuracy of these tasks; whether infrared image superresolution reconstruction improves this accuracy has therefore become an evaluation criterion for reconstruction results. To further verify our method, we match the superresolution images generated by the several methods against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) describes the statistics of Gaussian image gradients in the neighborhood of feature points and is a commonly used local feature extraction algorithm. In the matching result, the number of matched points serves as a criterion of matching quality, and the corresponding matched points also indicate the similarity of the local features of the two images. Figure 9 shows the result of matching each superresolution reconstructed image with the original high-resolution image using the SIFT algorithm. The counts show that the image reconstructed by our proposed method obtains more correct matching pairs than the other methods.
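A typical OpenCV realization of this matching experiment, with Lowe's ratio test deciding which SIFT matches count as correct (the 0.75 threshold is our assumption; the paper does not state its matching criterion), is:

```python
import cv2

def sift_match_count(hr_path: str, sr_path: str) -> int:
    """Count SIFT matches surviving Lowe's ratio test between an HR image
    and a superresolution reconstruction (a generic sketch, not the paper's code)."""
    hr = cv2.imread(hr_path, cv2.IMREAD_GRAYSCALE)
    sr = cv2.imread(sr_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, des_hr = sift.detectAndCompute(hr, None)
    _, des_sr = sift.detectAndCompute(sr, None)
    matches = cv2.BFMatcher().knnMatch(des_hr, des_sr, k=2)
    # Keep a match only if it is clearly better than the second-best candidate.
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < 0.75 * n.distance]
    return len(good)
```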

In this experiment, we use the classic YOLO [40] method for image target detection (Figure 10). The superresolution reconstructed images generated by our proposed method yield better detection results and allow more targets to be detected.

4. Discussion and Future Work

Through experiments, we demonstrate that the proposed method achieves better perceptual performance than the other methods. However, during the experiments we also found that for some images the reconstruction results are not satisfactory, as shown in Figure 11. Analyzing these images, we found a common characteristic: when the imaging device and the imaged object move at a high relative speed, the captured image may contain motion blur. For such images, ordinary SR reconstruction methods cannot achieve effective edge recovery. We will address these problems in future studies.

5. Conclusion

In this study, we propose a two-stage GAN framework that reconstructs SR images by recovering edge structure information and retaining feature information. In the first stage, the image fidelity loss, the adversarial loss, and the edge fidelity loss are combined to preserve the edges of the image. In the second stage, the adversarial loss, the image fidelity loss, and the feature fidelity loss are combined to mine the visual features of the image. By iteratively updating the generator and discriminator networks, SR reconstruction of infrared images with preserved edges is achieved. Experimental results show that the proposed method outperforms several other image reconstruction methods in reconstructing SR infrared images.

Figure 11: Images with motion blur cannot yield clear edge information using ordinary SR reconstruction methods.


Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

[1] L. Xiang-di et al., "Infrared imaging system and applications," Laser & Infrared, vol. 3, 2014.

[2] X. Sui, Q. Chen, G. Gu, and X. Shen, "Infrared super-resolution imaging based on compressed sensing," Infrared Physics & Technology, vol. 63, pp. 119–124, 2014.

[3] X. Sui, H. Gao, Y. Sun, Q. Chen, and G. Gu, "Infrared super-resolution imaging method based on retina micro-motion," Infrared Physics & Technology, vol. 60, pp. 340–345, 2013.

[4] J. L. Suo, J. I. Xiangyang, and Q. H. Dai, "An overview of computational photography," Science China (Information Sciences), vol. 6, pp. 5–24, 2012.

[5] D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in European Conference on Computer Vision, Springer, 2014.

[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2014.

[7] J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 1637–1645, 2016.

[8] W. Shi, J. Caballero, F. Huszar, et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative Adversarial Networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.

[10] C. Wang, S. Dong, X. Zhao, G. Papanastasiou, H. Zhang, and G. Yang, "SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2667–2676, 2020.

[11] G. Yang, S. Yu, H. Dong, et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.

[12] S. Yu, D. Hao, G. Yang, G. Slabaugh, and Y. Guo, "Deep De-Aliasing for Fast Compressive Sensing MRI," 2017, https://arxiv.org/abs/1705.07137.

[13] C. Ledig, L. Theis, F. Huszar, et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.

[14] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in Lecture Notes in Computer Science, Springer, 2018.

[15] Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018.

[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: further improving enhanced super-resolution generative adversarial network," in ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3637–3641, Barcelona, Spain, 2020.

[17] C. Kraichan and S. Pumrin, "Performance Analysis on Multi-Frame Image Super-Resolution via Sparse Representation," in 2014 International Electrical Engineering Congress (iEECON), Chonburi, Thailand, 2014.

[18] S. Yu-bao, W. Zhi-hui, X. Liang, Z. Zhen-rong, and L. Zhan-Qiang, "Multimorphology sparsity regularized image super-resolution," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898–2903, 2010.

[19] S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016–4028, 2012.

[20] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148–159, 2013.

[21] L. Qiu-sheng and Z. Wei, "Image super-resolution algorithms based on sparse representation of classified image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920–925, 2012.

[22] H. Zhang, T. Xu, H. Li, et al., "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.

[23] H. Zhang, T. Xu, H. Li, et al., "StackGAN++: realistic image synthesis with stacked generative adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1947–1962, 2018.

[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.

[25] J. Zhu, G. Yang, P. Ferreira, et al., "A ROI Focused Multi-Scale Super-Resolution Method for the Diffusion Tensor Cardiac Magnetic Resonance," in Proceedings of the 27th Annual Meeting (ISMRM), pp. 11–16, Montréal, QC, Canada, 2019.

[26] J. Zhu, G. Yang, and P. Lio, "Lesion Focused Super Resolution," in Medical Imaging: Image Processing 2019, San Diego, CA, USA, 2019.

[27] J. Zhu, G. Yang, and P. Lio, "How Can We Make GAN Perform Better in Single Medical Image Super-Resolution? A Lesion Focused Multi-Scale Approach," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 2019.

[28] M. Seitzer, G. Yang, J. Schlemper, et al., "Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer, 2018.

[29] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, PMLR, 2015.

[30] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.

[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015, https://arxiv.org/abs/1511.06434.

[32] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proceedings of the 30th International Conference on Machine Learning, p. 3, Atlanta, Georgia, USA, 2013.

[33] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Computer Science, 2014, https://arxiv.org/abs/1409.1556.

[34] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Computer Vision – ECCV 2016, Springer, 2016.

[35] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," Computer Science, 2014, https://arxiv.org/abs/1412.6980.

[36] M. Deshmukh and U. Bhosale, "Image fusion and image quality assessment of fused images," International Journal of Image Processing (IJIP), vol. 4, no. 5, p. 484, 2010.

[37] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.

[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[39] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.

[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

12 International Journal of Digital Multimedia Broadcasting

  • Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images
  • 1 Introduction
  • 2 Method
    • 21 Stage 1 GAN
    • 22 Stage 2 GAN
      • 3 Experiments
        • 31 Experimental Details
        • 32 Experimental Evaluation
        • 33 Use Advanced Vision Tasks to Compare Superresolution Results
          • 4 Discussion and Future Work
          • 5 Conclusion
          • Data Availability
          • Conflicts of Interest
Page 4: Generative Adversarial Network-Based Edge-Preserving ......2.1. Stage 1 GAN. In Stage 1, we use the 128×128 low-resolution image ILR0 as input and export it to the Stage 1 generator

We describe the proposed network framework and lossfunction in Section 2 and their quantitative evaluations usingthe public datasets in Section 3 followed by recommendedfuture study on the proposed method and the conclusion

2 Method

To generate a high-resolution image with photograph-grade realistic details and inspired by the literature [2223] we propose a simple and effective two-layer GAN in

Ground-truth

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Ours

Figure 6 Comparison of the reconstructed results of images from the FLIR_ ADAS_1_3 validation set The images in the first row areoriginal images those in the second row are the reconstruction results using the SRCNN method those in the third row are thereconstruction results using the ESPCN method those in the fourth row are the reconstruction results using the SRGAN method those inthe fifth row are the reconstruction results using the ESRGAN method those in the sixth row are the reconstruction results using theESRGAN+ method and those in the last row are the reconstruction results using our proposed method

4 International Journal of Digital Multimedia Broadcasting

which the image generation process is divided into twostages (Figure 2)

21 Stage 1 GAN In Stage 1 we use the 128 times 128 low-resolution image ILR0 as input and export it to the Stage 1

generator G0 to generate a false 256 times 256 image (ISR0)which together with a real 256 times 256 image (IHR0) isimported to the Stage 1 discriminator D0 to identify theimage The core of the network structure of the generatorG0 in the proposed method is shown in Figure 3 We adopt

Ground-truth

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Ours

Figure 7 Comparison of the reconstructed results of images from the Itir_v1_0 validation set The images in the first row are original imagesthose in the second row are the reconstruction results using the SRCNNmethod those in the third row are the reconstruction results using theESPCNmethod those in the fourth row are the reconstruction results using the SRGANmethod those in the fifth row are the reconstructionresults using the ESRGANmethod those in the sixth row are the reconstruction results using the ESRGAN+method and those in the last roware the reconstruction results using our proposed method

5International Journal of Digital Multimedia Broadcasting

the network design of Ledig et al [10] and introduce the skipconnection which has been proven effective in training deepneural networks We adopt the residual block proposed inthe literature [24] to construct a neural network with sixresidual blocks used as a stack to extract features from theimage Recent work on single-image superresolution [25ndash28] pointed out that with the deepening of the network andthe training and testing phases under the GAN frameworkhallucination artifacts often appear In order to solve thisproblem we are following the method in [29] and adding abatch normalization layer (BN) to the network Each residualblock contains two convolutional layers with a kernel size of3 times 3 and 64 feature maps two BN layers and one parametricrectified linear unit (PReLU) layer [30] The specific settingsof each layer of the generative model are as follows

C PR 64eth THORN⟶ C 64eth THORNBN PReth THORNC 64eth THORNBN PReth THORNSC⟶ 6⋯⟶ C 64eth THORNBN PReth THORNSC⟶ C PR 256eth THORN⟶ C t 3eth THORN

eth1THORN

Here C(PR64) denotes a set of convolutional layers with64 feature maps and activation function PReLUC(64)BN(PR)C(64)BN(PR)SC represents a residual blockBN(PR) is a batch normalization layer with activation func-tion PReLU and SC denotes a skip connection There arein total 6 residual blocks C(t3) represents a convolutionallayer with 3 feature maps and activation function tanh

To distinguish the generated SR samples from real high-resolution (HR) samples we train the discriminator networkD0 whose overall framework is shown in Figure 4 and whichis adopted from the architectural framework summarized byRadford et al [31] in which LeakyReLU [32] activation isused to avoid maximum pooling of the entire network Thediscriminator network includes ten convolutional blocksAll blocks except for the first block contain a convolutionallayer a BN layer and a LeakyReLU layer The number of ker-nels within the filter increases continually from 64 in theVisual Geometry Group (VGG) network [33] to 1024 theneach time the number of kernels increases segmented convo-lution is used to lower the image resolution After that a spe-cial residual block containing two convolution layers and aLeakyReLU layer is connected The output of the last convo-lution unit is sent to a dense layer with an S-type activationfunction to get a true and false result The structure andparameters of each layer of the discriminator D0 networkare as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth2THORN

Here LR denotes the activation function LeakyReLU CethLR 64THORN denotes a set of convolutional layers with 64 featuremaps and activation function LeakyReLU Ceth64THORNBNethLRTHORNdenotes a set of convolutional layers with 64 feature mapsfollowed by batch-normalization with activation function

LeakyReLU and D is the dense layer for outputting The fea-ture maps were increased from 64 to 1024

L1 = λ1Ladv + λ2Lmse + λ3Ledge eth3THORN

Stage 1 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand edge fidelity loss (Eq (3)) These parts each capture dif-ferent perceptual characteristics of the reconstructed imageto obtain a more visually satisfactory reconstructed image

where the weight fλig is a trade-off parameter that isused to balance multiple loss components The first part isthe adversarial loss between the generator G0 and the dis-criminator D0 of GAN This part encourages the generatorto trick the discriminator network to produce a more realisticHR image as follows

Ladv = N

n=1minus log DθD

GθGILR0

eth4THORN

whereDθDethGθG

ethILR0THORNTHORN is the estimated probability of the recon-

structed image GθGethILR0THORN to be a true HR image To obtain a

better gradient we use the minimization minuslog DθDethGθG

ethILR0THORNTHORNto replace the minimizationlog frac121 minusDθD

ethGθGethILR0THORNTHORN

Table 1 Method comparison quantitative results of the SRCNNESPCN SRGAN ESRGAN and ESRGAN+ methods and theproposed method on the FLIR_ ADAS_1_3 validation dataset

CCPSNR(dB)

SSIM VIF UIQI TIME (s)

SRCNN 08160 183963 08655 07942 08770 27112

ESPCN 08627 277171 09349 08269 09398 10206

SRGAN 08463 280445 09497 08120 08503 09202

ESRGAN 09547 285453 09506 07918 09382 01093

ESRGAN+ 09806 337018 09855 09119 09864 02713

Ours 09964 379865 09980 09083 09965 01145

Table 2 Method comparison quantitative results of the SRCNNESPCN SRGAN ESRGAN and ESRGAN+ methods and theproposed method on the Itir_v1_0 dataset

CCPSNR(dB)

SSIM VIF UIQI TIME (s)

SRCNN 07937 202460 08966 07079 08746 25706

ESPCN 08464 221708 09185 08194 09877 10020

SRGAN 08152 226732 09154 07944 08610 06642

ESRGAN 09146 303205 09608 07752 09166 00937

ESRGAN+ 09547 341024 09844 08497 09541 02558

Ours 09962 352865 09943 09064 09936 01059

6 International Journal of Digital Multimedia Broadcasting

The second term of Eq (3) Lmse ensures the fidelity ofthe restored image using the pixel-level mean square error(MSE) loss as follows

Lmse = 1WHC IHR0 minus ISR0

22

eth5THORN

whereWH and C are the height the width and the numberof channels respectively of the image

The third term of Eq (3) Ledge the edge fidelity loss ispurported to reproduce sharp edge information as follows

Ledge = 1WH IandE minus IE

22

eth6THORN

whereW and H are the width and the height respectively ofthe image The labeled edge map IE is extracted by a specificedge filter on the real 256 times 256 image IHR0 while IandE isextracted by a specific edge filter on the 256 times 256 imageISR0 generated by the generator G0 In our experiments wechose the Canny edge detection operator By minimizingthe edge fidelity loss the network continuously guides edgerecovery

22 Stage 2 GAN In the second stage we use the generated256 times 256 low-resolution image ISR0 as input and export itto the Stage 2 generator G1 to generate a 512 times 512 imageISR1 which together with a real 512 times 512 image IHR1 isexported to the Stage 2 discriminator D1 to identify if it isfalse or real We adopt the network design described in theliterature [15] in which a generator model G1 (Figure 5)

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Ours

Figure 8 Comparison of edge detection results of superresolution reconstruction results

7International Journal of Digital Multimedia Broadcasting

containing 16 residual blocks is constructed The output for-mat and parameters of each layer of the generator G1 are asfollows

C R 64eth THORN⟶ C 64eth THORNBN Reth THORNC 64eth THORNBN Reth THORNSC⟶ 16⋯⟶ C R 64eth THORNSC⟶ C 256eth THORN⟶ C t 3eth THORN

eth7THORN

where C(R64) denotes a set of convolutional layers with64 feature maps and activation function ReLUC(64)BN(R)C(64)BN(R)SC represents a residual block

The network structure of the discriminator networkD1 inthe second stage adopts a network structure similar to that ofthe discriminator D0 Each layer structure and networkparameters of D1 are as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth8THORN

L2 = λ1primeLadv1 + λ2primeLmse1 + λ3primeL feature eth9THORNwhere the weight fλiprimeg is a trade-off parameter that is used tobalance multiple loss components The first term Ladv1 isthe adversarial loss between the generator G1 and the dis-criminator D1 of GAN The second termLmse1 is the imagefidelity loss The third term L feature is the feature fidelity

loss which is defined based on the feature space distancedefined in the literature [34] to facilitate the preservationof feature representation in the reconstructed image similarto that of the original image as follows

L feature = 1WHC ϕ IHR0 minus ISR0

22

eth10THORN

whereWH and C are the height the width and the numberof channels respectively of the image and ϕethsdotTHORN represents thefeature space function which is a pretrained VGG-19 [33]network that maps images to feature space The fourth pool-ing layer is used to calculate the feature-activated L2 distanceto be used as the feature fidelity loss function

Stage 2 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand feature fidelity loss (Eq (9))

3 Experiments

31 Experimental Details All experiments are performed ona desktop computer with 220GHz times40 Intel Xeon (R) Silver4114 CPU GeForce GTX 1080Ti and 64GB of memory Weuse the PyTorch deep learning framework to implement themethod proposed in this article and all comparison methodsAll methods are trained with the FLIR training data set Dur-ing the training process we choose 8862 images from theFLIR_ADAS_1_3 thermal sensor training dataset released

(a) SRCNN super-resolution reconstruction image matching results (b) ESPCN super-resolution reconstruction image matching results

(c) SRGAN super-resolution reconstruction image matching results (d) ESRGAN super-resolution reconstruction image matching results

(e) ESRGAN+ super-resolution reconstruction image matching results (f) Our method super-resolution reconstruction image matching results

Figure 9 Superresolution reconstruction image matching results (the left image is the original high-resolution image)

8 International Journal of Digital Multimedia Broadcasting

by FLIR Systems Inc a sensor system developer in 2018First we perform the 4x factor downsampling on all experi-mental data to obtain low-resolution (LR) images by lower-ing the resolution of the HR images We set the batch sizeto 4 and used the Adam [35] with a momentum term of β= 09 as the optimization procedure To enable the loss func-tion in the same order of magnitude to better balance losscomponents we set λ1λ2prime and λ3prime of Eq (9) to 10-3 1 and10-6 respectively When training the Stage 1 GAN we set

the learning rate to 10-4 which is lowered to 10-5 when train-ing the Stage 2 GAN

32 Experimental Evaluation To verify the effectiveness of theproposed method we conduct the validation on two publicdata sets the FLIR_ ADAS_1_3 (1366 images) validation setand the Itir_v1_0 dataset (11262 images) We compared theproposed method with the most advanced methods includingsuperresolution using deep convolutional networks (SRCNN)

LR

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Our method

Figure 10 Target detection result of superresolution reconstructed image

9International Journal of Digital Multimedia Broadcasting

[5] ESPCN [8] SRGAN [10] ESRGAN [15] and ESRGAN+[16] methods Three images are selected from the verificationset of FLIR_ADAS_1_3 and the data set of Itir_v1_0 and thesubjective results of several methods are shown in Figures 6and 7 From the reconstruction results it is not difficult tosee that the reconstruction results of our proposed methodproduce finer texture and edge details

To facilitate a fair quantitative comparison we used thecorrelation coefficient (CC) [36] the peak-signal-to-noiseratio (PSNR) [37] the Structural Similarity Index Measure(SSIM) [38] the visual information fidelity (VIF) [39] theUniversal Image Quality Index (UIQI) and the time con-sumption six objective indicators to evaluate the quality ofthe reconstructed images and the SR methods The quantita-tive results of the comparison of different reconstructionmethods are shown in Table 1 which shows that our methodis superior to the SRCNN [5] ESPCN [8] SRGAN [13] ESR-GAN [15] and ESRGAN+ [16] methods in the indicators ofCC PSNR SSIM and UIQI on the FLIR data set The VIFindex is slightly lower than the ESRGAN+ method and thetime consumption is greater than the ESRGAN method

The quantitative results of comparison of different recon-struction methods on the Itir_v1_0 dataset are shown inTable 2 indicating that the proposed method is superior tothe SRCNN [5] ESPCN [8] SRGAN [13] ESRGAN [15]and ESRGAN+ [16] methods on the Itir_v1_0 dataset Onlythe time consumption is slightly greater than the ESRGANmethod

To illustrate more intuitively the effectiveness of the proposed superresolution method in improving the edge features of infrared images, Figure 8 compares the edge detection results obtained from the superresolution reconstructions of the various methods. The figure shows that the images reconstructed by our method contain more, and clearer, edge information, which is meaningful for the application of infrared images.
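A comparison of this kind can be reproduced by running the same Canny operator used for the edge fidelity loss over each reconstruction; the following OpenCV sketch uses illustrative file names and thresholds.

```python
# Sketch: extract Canny edge maps from the HR image and an SR reconstruction
# and compare the number of detected edge pixels.
import cv2

hr_edges = cv2.Canny(cv2.imread("hr.png", cv2.IMREAD_GRAYSCALE), 100, 200)
sr_edges = cv2.Canny(cv2.imread("sr_ours.png", cv2.IMREAD_GRAYSCALE), 100, 200)
print("edge pixels, HR:", int((hr_edges > 0).sum()),
      "| SR:", int((sr_edges > 0).sum()))
```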

3.3. Using Advanced Vision Tasks to Compare Superresolution Results. Basic vision tasks, including image superresolution reconstruction, ultimately serve advanced vision tasks. Infrared images are widely used in target detection and target matching, but their low resolution and unclear edges reduce the accuracy of these tasks. Therefore, whether the result of infrared image superresolution reconstruction can improve the accuracy of such tasks has become an evaluation criterion for superresolution reconstruction. To further verify our method, we match the superresolution images generated by several methods against the real high-resolution images.

The Scale-Invariant Feature Transform (SIFT) represents the statistics of Gaussian image gradients in the neighborhood of feature points and is a commonly used local feature extraction algorithm. In the matching result, the number of matching points can be used as a criterion for matching quality, and the corresponding matching points also indicate the similarity of the local features of the two images. Figure 9 shows the results of matching each superresolution reconstructed image with the original high-resolution image via the SIFT algorithm. The match counts show that the reconstructed image produced by our proposed method obtains more correct matching pairs than the other methods.
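The matching experiment can be sketched with OpenCV's SIFT implementation, using Lowe's ratio test to keep reliable pairs; the file names here are illustrative.

```python
# Sketch: SIFT keypoint matching between the HR image and an SR reconstruction.
import cv2

hr = cv2.imread("hr.png", cv2.IMREAD_GRAYSCALE)
sr = cv2.imread("sr_ours.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_hr, des_hr = sift.detectAndCompute(hr, None)
kp_sr, des_sr = sift.detectAndCompute(sr, None)

matches = cv2.BFMatcher().knnMatch(des_hr, des_sr, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
print("matched pairs:", len(good))
```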

In this experiment, we use the classic YOLO [40] method for image target detection (Figure 10). The superresolution reconstructed images generated by our proposed method yield better detection results, and more targets are detected.
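As a rough stand-in for this check (the paper uses the classic YOLO detector, whereas the pretrained YOLOv5 model loaded below via torch.hub is only an assumed, illustrative substitute), detections on a reconstructed image can be counted as follows.

```python
# Sketch: count detected targets on an SR reconstruction with a YOLO-family model.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("sr_ours.png")          # illustrative file name
print(len(results.xyxy[0]), "targets detected")
```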

4. Discussion and Future Work

Through experiments, we demonstrate that the proposed method achieves better perceptual performance than the other methods. However, during the experiments we also found that, for some images, the reconstruction results are not satisfactory, as shown in Figure 11. Analyzing these images, we found a common characteristic: when the imaging device and the imaged object move at a high relative speed, the captured image may contain motion blur. For such images, ordinary SR reconstruction methods cannot achieve effective edge recovery. We will address these problems in future studies.

5. Conclusion

In this study, we propose a two-stage GAN framework that reconstructs SR images by recovering edge structure information and retaining feature information. In the first stage, the image fidelity loss, the adversarial loss, and the edge fidelity loss are combined to preserve the edges of the image. In the second stage, the adversarial loss, the image fidelity loss, and the feature fidelity loss are combined to mine the visual features of the image. By iteratively updating the generator and discriminator networks, edge-preserving SR reconstruction of infrared images is achieved. Experimental results show that the proposed method outperforms several other image reconstruction methods in reconstructing SR infrared images.

Figure 11: Images with motion blur cannot obtain clear edge information using ordinary SR reconstruction methods.


Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

[1] L. Xiang-di et al., "Infrared imaging system and applications," Laser & Infrared, vol. 3, 2014.
[2] X. Sui, Q. Chen, G. Gu, and X. Shen, "Infrared super-resolution imaging based on compressed sensing," Infrared Physics & Technology, vol. 63, pp. 119–124, 2014.
[3] X. Sui, H. Gao, Y. Sun, Q. Chen, and G. Gu, "Infrared super-resolution imaging method based on retina micro-motion," Infrared Physics & Technology, vol. 60, pp. 340–345, 2013.
[4] J. L. Suo, J. I. Xiangyang, and Q. H. Dai, "An overview of computational photography," Science China (Information Sciences), vol. 6, pp. 5–24, 2012.
[5] D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in European Conference on Computer Vision, Springer, 2014.
[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2014.
[7] J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1637–1645, 2016.
[8] W. Shi, J. Caballero, F. Huszar, et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.
[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative Adversarial Networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.
[10] C. Wang, S. Dong, X. Zhao, G. Papanastasiou, H. Zhang, and G. Yang, "SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2667–2676, 2020.
[11] G. Yang, S. Yu, H. Dong, et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
[12] S. Yu, D. Hao, G. Yang, G. Slabaugh, and Y. Guo, "Deep De-Aliasing for Fast Compressive Sensing MRI," 2017, https://arxiv.org/abs/1705.07137.
[13] C. Ledig, L. Theis, F. Huszar, et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2016.
[14] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in Lecture Notes in Computer Science, Springer, 2018.
[15] Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding Via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018.
[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: further improving enhanced super-resolution generative adversarial network," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3637–3641, Barcelona, Spain, 2020.
[17] C. Kraichan and S. Pumrin, "Performance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representation," in 2014 International Electrical Engineering Congress (iEECON), Chonburi, Thailand, 2014.
[18] S. Yu-bao, W. Zhi-hui, X. Liang, Z. Zhen-rong, and L. Zhan-Qiang, "Multimorphology Sparsity regularized image super-resolution," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898–2903, 2010.
[19] S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016–4028, 2012.
[20] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148–159, 2013.
[21] L. Qiu-sheng and Z. Wei, "Image super-resolution algorithms based on sparse representation of classified image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920–925, 2012.
[22] H. Zhang, T. Xu, H. Li, et al., "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.
[23] H. Zhang, T. Xu, H. Li, et al., "StackGAN++: realistic image synthesis with stacked generative adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1947–1962, 2018.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.
[25] J. Zhu, G. Yang, P. Ferreira, et al., "A ROI Focused Multi-Scale Super-Resolution Method for the Diffusion Tensor Cardiac Magnetic Resonance," in Proceedings of the 27th Annual Meeting (ISMRM), pp. 11–16, Montréal, QC, Canada, 2019.
[26] J. Zhu, G. Yang, and P. Lio, "Lesion Focused Super Resolution," in Medical Imaging 2019: Image Processing, San Diego, CA, USA, 2019.
[27] J. Zhu, G. Yang, and P. Lio, "How Can We Make GAN Perform Better in Single Medical Image Super-Resolution? A Lesion Focused Multi-Scale Approach," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 2019.
[28] M. Seitzer, G. Yang, J. Schlemper, et al., "Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer, 2018.
[29] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, PMLR, 2015.
[30] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.
[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015, https://arxiv.org/abs/1511.06434.
[32] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proceedings of the 30th International Conference on Machine Learning, p. 3, Atlanta, Georgia, USA, 2013.
[33] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Computer Science, 2014, https://arxiv.org/abs/1409.1556.
[34] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Computer Vision – ECCV 2016, Springer, 2016.
[35] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," Computer Science, 2014, https://arxiv.org/abs/1412.6980.
[36] M. Deshmukh and U. Bhosale, "Image fusion and image quality assessment of fused images," International Journal of Image Processing (IJIP), vol. 4, no. 5, p. 484, 2010.
[37] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.
[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[39] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.
[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2015.



the network design of Ledig et al [10] and introduce the skipconnection which has been proven effective in training deepneural networks We adopt the residual block proposed inthe literature [24] to construct a neural network with sixresidual blocks used as a stack to extract features from theimage Recent work on single-image superresolution [25ndash28] pointed out that with the deepening of the network andthe training and testing phases under the GAN frameworkhallucination artifacts often appear In order to solve thisproblem we are following the method in [29] and adding abatch normalization layer (BN) to the network Each residualblock contains two convolutional layers with a kernel size of3 times 3 and 64 feature maps two BN layers and one parametricrectified linear unit (PReLU) layer [30] The specific settingsof each layer of the generative model are as follows

C PR 64eth THORN⟶ C 64eth THORNBN PReth THORNC 64eth THORNBN PReth THORNSC⟶ 6⋯⟶ C 64eth THORNBN PReth THORNSC⟶ C PR 256eth THORN⟶ C t 3eth THORN

eth1THORN

Here C(PR64) denotes a set of convolutional layers with64 feature maps and activation function PReLUC(64)BN(PR)C(64)BN(PR)SC represents a residual blockBN(PR) is a batch normalization layer with activation func-tion PReLU and SC denotes a skip connection There arein total 6 residual blocks C(t3) represents a convolutionallayer with 3 feature maps and activation function tanh

To distinguish the generated SR samples from real high-resolution (HR) samples we train the discriminator networkD0 whose overall framework is shown in Figure 4 and whichis adopted from the architectural framework summarized byRadford et al [31] in which LeakyReLU [32] activation isused to avoid maximum pooling of the entire network Thediscriminator network includes ten convolutional blocksAll blocks except for the first block contain a convolutionallayer a BN layer and a LeakyReLU layer The number of ker-nels within the filter increases continually from 64 in theVisual Geometry Group (VGG) network [33] to 1024 theneach time the number of kernels increases segmented convo-lution is used to lower the image resolution After that a spe-cial residual block containing two convolution layers and aLeakyReLU layer is connected The output of the last convo-lution unit is sent to a dense layer with an S-type activationfunction to get a true and false result The structure andparameters of each layer of the discriminator D0 networkare as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth2THORN

Here LR denotes the activation function LeakyReLU CethLR 64THORN denotes a set of convolutional layers with 64 featuremaps and activation function LeakyReLU Ceth64THORNBNethLRTHORNdenotes a set of convolutional layers with 64 feature mapsfollowed by batch-normalization with activation function

LeakyReLU and D is the dense layer for outputting The fea-ture maps were increased from 64 to 1024

L1 = λ1Ladv + λ2Lmse + λ3Ledge eth3THORN

Stage 1 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand edge fidelity loss (Eq (3)) These parts each capture dif-ferent perceptual characteristics of the reconstructed imageto obtain a more visually satisfactory reconstructed image

where the weight fλig is a trade-off parameter that isused to balance multiple loss components The first part isthe adversarial loss between the generator G0 and the dis-criminator D0 of GAN This part encourages the generatorto trick the discriminator network to produce a more realisticHR image as follows

Ladv = N

n=1minus log DθD

GθGILR0

eth4THORN

whereDθDethGθG

ethILR0THORNTHORN is the estimated probability of the recon-

structed image GθGethILR0THORN to be a true HR image To obtain a

better gradient we use the minimization minuslog DθDethGθG

ethILR0THORNTHORNto replace the minimizationlog frac121 minusDθD

ethGθGethILR0THORNTHORN

Table 1 Method comparison quantitative results of the SRCNNESPCN SRGAN ESRGAN and ESRGAN+ methods and theproposed method on the FLIR_ ADAS_1_3 validation dataset

CCPSNR(dB)

SSIM VIF UIQI TIME (s)

SRCNN 08160 183963 08655 07942 08770 27112

ESPCN 08627 277171 09349 08269 09398 10206

SRGAN 08463 280445 09497 08120 08503 09202

ESRGAN 09547 285453 09506 07918 09382 01093

ESRGAN+ 09806 337018 09855 09119 09864 02713

Ours 09964 379865 09980 09083 09965 01145

Table 2 Method comparison quantitative results of the SRCNNESPCN SRGAN ESRGAN and ESRGAN+ methods and theproposed method on the Itir_v1_0 dataset

CCPSNR(dB)

SSIM VIF UIQI TIME (s)

SRCNN 07937 202460 08966 07079 08746 25706

ESPCN 08464 221708 09185 08194 09877 10020

SRGAN 08152 226732 09154 07944 08610 06642

ESRGAN 09146 303205 09608 07752 09166 00937

ESRGAN+ 09547 341024 09844 08497 09541 02558

Ours 09962 352865 09943 09064 09936 01059

6 International Journal of Digital Multimedia Broadcasting

The second term of Eq (3) Lmse ensures the fidelity ofthe restored image using the pixel-level mean square error(MSE) loss as follows

Lmse = 1WHC IHR0 minus ISR0

22

eth5THORN

whereWH and C are the height the width and the numberof channels respectively of the image

The third term of Eq (3) Ledge the edge fidelity loss ispurported to reproduce sharp edge information as follows

Ledge = 1WH IandE minus IE

22

eth6THORN

whereW and H are the width and the height respectively ofthe image The labeled edge map IE is extracted by a specificedge filter on the real 256 times 256 image IHR0 while IandE isextracted by a specific edge filter on the 256 times 256 imageISR0 generated by the generator G0 In our experiments wechose the Canny edge detection operator By minimizingthe edge fidelity loss the network continuously guides edgerecovery

22 Stage 2 GAN In the second stage we use the generated256 times 256 low-resolution image ISR0 as input and export itto the Stage 2 generator G1 to generate a 512 times 512 imageISR1 which together with a real 512 times 512 image IHR1 isexported to the Stage 2 discriminator D1 to identify if it isfalse or real We adopt the network design described in theliterature [15] in which a generator model G1 (Figure 5)

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Ours

Figure 8 Comparison of edge detection results of superresolution reconstruction results

7International Journal of Digital Multimedia Broadcasting

containing 16 residual blocks is constructed The output for-mat and parameters of each layer of the generator G1 are asfollows

C R 64eth THORN⟶ C 64eth THORNBN Reth THORNC 64eth THORNBN Reth THORNSC⟶ 16⋯⟶ C R 64eth THORNSC⟶ C 256eth THORN⟶ C t 3eth THORN

eth7THORN

where C(R64) denotes a set of convolutional layers with64 feature maps and activation function ReLUC(64)BN(R)C(64)BN(R)SC represents a residual block

The network structure of the discriminator networkD1 inthe second stage adopts a network structure similar to that ofthe discriminator D0 Each layer structure and networkparameters of D1 are as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth8THORN

L2 = λ1primeLadv1 + λ2primeLmse1 + λ3primeL feature eth9THORNwhere the weight fλiprimeg is a trade-off parameter that is used tobalance multiple loss components The first term Ladv1 isthe adversarial loss between the generator G1 and the dis-criminator D1 of GAN The second termLmse1 is the imagefidelity loss The third term L feature is the feature fidelity

loss which is defined based on the feature space distancedefined in the literature [34] to facilitate the preservationof feature representation in the reconstructed image similarto that of the original image as follows

L feature = 1WHC ϕ IHR0 minus ISR0

22

eth10THORN

whereWH and C are the height the width and the numberof channels respectively of the image and ϕethsdotTHORN represents thefeature space function which is a pretrained VGG-19 [33]network that maps images to feature space The fourth pool-ing layer is used to calculate the feature-activated L2 distanceto be used as the feature fidelity loss function

Stage 2 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand feature fidelity loss (Eq (9))

3 Experiments

31 Experimental Details All experiments are performed ona desktop computer with 220GHz times40 Intel Xeon (R) Silver4114 CPU GeForce GTX 1080Ti and 64GB of memory Weuse the PyTorch deep learning framework to implement themethod proposed in this article and all comparison methodsAll methods are trained with the FLIR training data set Dur-ing the training process we choose 8862 images from theFLIR_ADAS_1_3 thermal sensor training dataset released

(a) SRCNN super-resolution reconstruction image matching results (b) ESPCN super-resolution reconstruction image matching results

(c) SRGAN super-resolution reconstruction image matching results (d) ESRGAN super-resolution reconstruction image matching results

(e) ESRGAN+ super-resolution reconstruction image matching results (f) Our method super-resolution reconstruction image matching results

Figure 9 Superresolution reconstruction image matching results (the left image is the original high-resolution image)

8 International Journal of Digital Multimedia Broadcasting

by FLIR Systems Inc a sensor system developer in 2018First we perform the 4x factor downsampling on all experi-mental data to obtain low-resolution (LR) images by lower-ing the resolution of the HR images We set the batch sizeto 4 and used the Adam [35] with a momentum term of β= 09 as the optimization procedure To enable the loss func-tion in the same order of magnitude to better balance losscomponents we set λ1λ2prime and λ3prime of Eq (9) to 10-3 1 and10-6 respectively When training the Stage 1 GAN we set

the learning rate to 10-4 which is lowered to 10-5 when train-ing the Stage 2 GAN

32 Experimental Evaluation To verify the effectiveness of theproposed method we conduct the validation on two publicdata sets the FLIR_ ADAS_1_3 (1366 images) validation setand the Itir_v1_0 dataset (11262 images) We compared theproposed method with the most advanced methods includingsuperresolution using deep convolutional networks (SRCNN)

LR

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Our method

Figure 10 Target detection result of superresolution reconstructed image

9International Journal of Digital Multimedia Broadcasting

[5] ESPCN [8] SRGAN [10] ESRGAN [15] and ESRGAN+[16] methods Three images are selected from the verificationset of FLIR_ADAS_1_3 and the data set of Itir_v1_0 and thesubjective results of several methods are shown in Figures 6and 7 From the reconstruction results it is not difficult tosee that the reconstruction results of our proposed methodproduce finer texture and edge details

To facilitate a fair quantitative comparison we used thecorrelation coefficient (CC) [36] the peak-signal-to-noiseratio (PSNR) [37] the Structural Similarity Index Measure(SSIM) [38] the visual information fidelity (VIF) [39] theUniversal Image Quality Index (UIQI) and the time con-sumption six objective indicators to evaluate the quality ofthe reconstructed images and the SR methods The quantita-tive results of the comparison of different reconstructionmethods are shown in Table 1 which shows that our methodis superior to the SRCNN [5] ESPCN [8] SRGAN [13] ESR-GAN [15] and ESRGAN+ [16] methods in the indicators ofCC PSNR SSIM and UIQI on the FLIR data set The VIFindex is slightly lower than the ESRGAN+ method and thetime consumption is greater than the ESRGAN method

The quantitative results of comparison of different recon-struction methods on the Itir_v1_0 dataset are shown inTable 2 indicating that the proposed method is superior tothe SRCNN [5] ESPCN [8] SRGAN [13] ESRGAN [15]and ESRGAN+ [16] methods on the Itir_v1_0 dataset Onlythe time consumption is slightly greater than the ESRGANmethod

In order to more intuitively illustrate the effectiveness ofthe superresolution method proposed in this work forimproving the edge features of infrared images we show inFigure 8 the comparison of the edge detection results of thesuperresolution reconstruction results of various methodsIt can be seen from the figure that the image reconstructedby our method has more and clearer edge information whichis meaningful for the application of infrared images

33 Use Advanced Vision Tasks to Compare SuperresolutionResults Basic vision tasks including image superresolutionreconstruction are all for advanced vision tasks Infraredimages are widely used in target detection and target match-ing tasks but the shortcomings of low resolution and unclearedges of infrared images affect the accuracy of the abovetasks Therefore whether the result of infrared image super-resolution reconstruction can improve the accuracy of theabove tasks has become an evaluation index of the result ofsuperresolution reconstruction In order to further verifyour method we match the superresolution images generatedby several methods with real high-resolution images Scale

Invariant Feature Transform (SIFT) is a representation ofthe statistical results of Gaussian image gradients in the fieldof feature points and is a commonly used image local featureextraction algorithm In the matching result the number ofmatching points can be used as a criterion for matching qual-ity and the corresponding matching points can also deter-mine the similarity of the local features of the two imagesFigure 9 shows the result of matching the superresolutionreconstructed image with the original high-resolution imagethrough the SIFT algorithm It can be seen from the quantitythat the reconstructed image produced by our proposedmethod obtains more correct matching pairs than othermethods

In this experiment we use the classic YOLO [40] methodfor image target detection (Figure 10) It can be seen that thesuperresolution reconstructed image generated by our pro-posed method has better detection results and can detectmore targets

4 Discussion and Future Work

Through experiments we demonstrate that compared withother methods the proposed method has a better perceptualperformance However during the experiment we alsofound that for some images the reconstruction results arenot satisfactory as shown in Figure 11 By analyzing theseimages we found that they have common characteristicsie when the imaging device and the imaging object are mov-ing at high relative speeds the captured image may containmotion blur For such images ordinary SR reconstructionmethods cannot achieve effective edge recovery Thereforein future studies we will address these problems

5 Conclusion

In this study we propose a two-stage GAN framework that isable to reconstruct SR image by recovering edge structureinformation and retaining feature information In the firststage the image fidelity loss the adversarial loss and the edgefidelity loss are combined to preserve the edges of the imageIn the second stage the image adversarial loss the imagefidelity loss and the feature fidelity loss are combined to mineimage visual features By iteratively updating the generativenetwork and the discriminator network the SR reconstruc-tion of infrared image with preserved edges is achievedExperimental verification results show that the proposedmethod outperforms several other image reconstructionmethods in reconstructing SR infrared images

Figure 11 Images with motion blur cannot obtain clear edge information using the ordinary SR reconstruction methods

10 International Journal of Digital Multimedia Broadcasting

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

References

[1] L. Xiang-di et al., "Infrared imaging system and applications," Laser & Infrared, vol. 3, 2014.

[2] X. Sui, Q. Chen, G. Gu, and X. Shen, "Infrared super-resolution imaging based on compressed sensing," Infrared Physics & Technology, vol. 63, pp. 119–124, 2014.

[3] X. Sui, H. Gao, Y. Sun, Q. Chen, and G. Gu, "Infrared super-resolution imaging method based on retina micro-motion," Infrared Physics & Technology, vol. 60, pp. 340–345, 2013.

[4] J. L. Suo, J. I. Xiangyang, and Q. H. Dai, "An overview of computational photography," Science China (Information Sciences), vol. 6, pp. 5–24, 2012.

[5] D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in European Conference on Computer Vision, Springer, 2014.

[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2014.

[7] J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 1637–1645, 2016.

[8] W. Shi, J. Caballero, F. Huszar, et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative Adversarial Networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.

[10] C. Wang, S. Dong, X. Zhao, G. Papanastasiou, H. Zhang, and G. Yang, "SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2667–2676, 2020.

[11] G. Yang, S. Yu, H. Dong, et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.

[12] S. Yu, D. Hao, G. Yang, G. Slabaugh, and Y. Guo, "Deep De-Aliasing for Fast Compressive Sensing MRI," 2017, https://arxiv.org/abs/1705.07137.

[13] C. Ledig, L. Theis, F. Huszar, et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.

[14] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in Lecture Notes in Computer Science, Springer, 2018.

[15] Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding Via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018.

[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: further improving enhanced super-resolution generative adversarial network," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3637–3641, Barcelona, Spain, 2020.

[17] C. Kraichan and S. Pumrin, "Performance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representation," in 2014 International Electrical Engineering Congress (iEECON), Chonburi, Thailand, 2014.

[18] S. Yu-bao, W. Zhi-hui, X. Liang, Z. Zhen-rong, and L. Zhan-Qiang, "Multimorphology sparsity regularized image super-resolution," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898–2903, 2010.

[19] S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016–4028, 2012.

[20] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148–159, 2013.

[21] L. Qiu-sheng and Z. Wei, "Image super-resolution algorithms based on sparse representation of classified image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920–925, 2012.

[22] H. Zhang, T. Xu, H. Li, et al., "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.

[23] H. Zhang, T. Xu, H. Li, et al., "StackGAN++: realistic image synthesis with stacked generative adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1947–1962, 2018.

[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.

[25] J. Zhu, G. Yang, P. Ferreira, et al., "A ROI Focused Multi-Scale Super-Resolution Method for the Diffusion Tensor Cardiac Magnetic Resonance," in Proceedings of the 27th Annual Meeting (ISMRM), pp. 11–16, Montréal, QC, Canada, 2019.

[26] J. Zhu, G. Yang, and P. Lio, "Lesion Focused Super Resolution," in Medical Imaging 2019: Image Processing, San Diego, CA, USA, 2019.

[27] J. Zhu, G. Yang, and P. Lio, "How Can We Make GAN Perform Better in Single Medical Image Super-Resolution? A Lesion Focused Multi-Scale Approach," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 2019.

[28] M. Seitzer, G. Yang, J. Schlemper, et al., "Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer, 2018.

[29] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, PMLR, 2015.

[30] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.

[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015, https://arxiv.org/abs/1511.06434.

[32] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proceedings of the 30th International Conference on Machine Learning, p. 3, Atlanta, Georgia, USA, 2013.

[33] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Computer Science, 2014, https://arxiv.org/abs/1409.1556.

[34] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Computer Vision – ECCV 2016, Springer, 2016.

[35] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," Computer Science, 2014, https://arxiv.org/abs/1412.6980.

[36] M. Deshmukh and U. Bhosale, "Image fusion and image quality assessment of fused images," International Journal of Image Processing (IJIP), vol. 4, no. 5, p. 484, 2010.

[37] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.

[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[39] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.

[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

12 International Journal of Digital Multimedia Broadcasting

  • Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images
  • 1 Introduction
  • 2 Method
    • 21 Stage 1 GAN
    • 22 Stage 2 GAN
      • 3 Experiments
        • 31 Experimental Details
        • 32 Experimental Evaluation
        • 33 Use Advanced Vision Tasks to Compare Superresolution Results
          • 4 Discussion and Future Work
          • 5 Conclusion
          • Data Availability
          • Conflicts of Interest
Page 7: Generative Adversarial Network-Based Edge-Preserving ......2.1. Stage 1 GAN. In Stage 1, we use the 128×128 low-resolution image ILR0 as input and export it to the Stage 1 generator

The second term of Eq (3) Lmse ensures the fidelity ofthe restored image using the pixel-level mean square error(MSE) loss as follows

Lmse = 1WHC IHR0 minus ISR0

22

eth5THORN

whereWH and C are the height the width and the numberof channels respectively of the image

The third term of Eq (3) Ledge the edge fidelity loss ispurported to reproduce sharp edge information as follows

Ledge = 1WH IandE minus IE

22

eth6THORN

whereW and H are the width and the height respectively ofthe image The labeled edge map IE is extracted by a specificedge filter on the real 256 times 256 image IHR0 while IandE isextracted by a specific edge filter on the 256 times 256 imageISR0 generated by the generator G0 In our experiments wechose the Canny edge detection operator By minimizingthe edge fidelity loss the network continuously guides edgerecovery

22 Stage 2 GAN In the second stage we use the generated256 times 256 low-resolution image ISR0 as input and export itto the Stage 2 generator G1 to generate a 512 times 512 imageISR1 which together with a real 512 times 512 image IHR1 isexported to the Stage 2 discriminator D1 to identify if it isfalse or real We adopt the network design described in theliterature [15] in which a generator model G1 (Figure 5)

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Ours

Figure 8 Comparison of edge detection results of superresolution reconstruction results

7International Journal of Digital Multimedia Broadcasting

containing 16 residual blocks is constructed The output for-mat and parameters of each layer of the generator G1 are asfollows

C R 64eth THORN⟶ C 64eth THORNBN Reth THORNC 64eth THORNBN Reth THORNSC⟶ 16⋯⟶ C R 64eth THORNSC⟶ C 256eth THORN⟶ C t 3eth THORN

eth7THORN

where C(R64) denotes a set of convolutional layers with64 feature maps and activation function ReLUC(64)BN(R)C(64)BN(R)SC represents a residual block

The network structure of the discriminator networkD1 inthe second stage adopts a network structure similar to that ofthe discriminator D0 Each layer structure and networkparameters of D1 are as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth8THORN

L2 = λ1primeLadv1 + λ2primeLmse1 + λ3primeL feature eth9THORNwhere the weight fλiprimeg is a trade-off parameter that is used tobalance multiple loss components The first term Ladv1 isthe adversarial loss between the generator G1 and the dis-criminator D1 of GAN The second termLmse1 is the imagefidelity loss The third term L feature is the feature fidelity

loss which is defined based on the feature space distancedefined in the literature [34] to facilitate the preservationof feature representation in the reconstructed image similarto that of the original image as follows

L feature = 1WHC ϕ IHR0 minus ISR0

22

eth10THORN

whereWH and C are the height the width and the numberof channels respectively of the image and ϕethsdotTHORN represents thefeature space function which is a pretrained VGG-19 [33]network that maps images to feature space The fourth pool-ing layer is used to calculate the feature-activated L2 distanceto be used as the feature fidelity loss function

Stage 2 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand feature fidelity loss (Eq (9))

3 Experiments

31 Experimental Details All experiments are performed ona desktop computer with 220GHz times40 Intel Xeon (R) Silver4114 CPU GeForce GTX 1080Ti and 64GB of memory Weuse the PyTorch deep learning framework to implement themethod proposed in this article and all comparison methodsAll methods are trained with the FLIR training data set Dur-ing the training process we choose 8862 images from theFLIR_ADAS_1_3 thermal sensor training dataset released

(a) SRCNN super-resolution reconstruction image matching results (b) ESPCN super-resolution reconstruction image matching results

(c) SRGAN super-resolution reconstruction image matching results (d) ESRGAN super-resolution reconstruction image matching results

(e) ESRGAN+ super-resolution reconstruction image matching results (f) Our method super-resolution reconstruction image matching results

Figure 9 Superresolution reconstruction image matching results (the left image is the original high-resolution image)

8 International Journal of Digital Multimedia Broadcasting

by FLIR Systems Inc a sensor system developer in 2018First we perform the 4x factor downsampling on all experi-mental data to obtain low-resolution (LR) images by lower-ing the resolution of the HR images We set the batch sizeto 4 and used the Adam [35] with a momentum term of β= 09 as the optimization procedure To enable the loss func-tion in the same order of magnitude to better balance losscomponents we set λ1λ2prime and λ3prime of Eq (9) to 10-3 1 and10-6 respectively When training the Stage 1 GAN we set

the learning rate to 10-4 which is lowered to 10-5 when train-ing the Stage 2 GAN

32 Experimental Evaluation To verify the effectiveness of theproposed method we conduct the validation on two publicdata sets the FLIR_ ADAS_1_3 (1366 images) validation setand the Itir_v1_0 dataset (11262 images) We compared theproposed method with the most advanced methods includingsuperresolution using deep convolutional networks (SRCNN)

LR

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Our method

Figure 10 Target detection result of superresolution reconstructed image

9International Journal of Digital Multimedia Broadcasting

[5] ESPCN [8] SRGAN [10] ESRGAN [15] and ESRGAN+[16] methods Three images are selected from the verificationset of FLIR_ADAS_1_3 and the data set of Itir_v1_0 and thesubjective results of several methods are shown in Figures 6and 7 From the reconstruction results it is not difficult tosee that the reconstruction results of our proposed methodproduce finer texture and edge details

To facilitate a fair quantitative comparison we used thecorrelation coefficient (CC) [36] the peak-signal-to-noiseratio (PSNR) [37] the Structural Similarity Index Measure(SSIM) [38] the visual information fidelity (VIF) [39] theUniversal Image Quality Index (UIQI) and the time con-sumption six objective indicators to evaluate the quality ofthe reconstructed images and the SR methods The quantita-tive results of the comparison of different reconstructionmethods are shown in Table 1 which shows that our methodis superior to the SRCNN [5] ESPCN [8] SRGAN [13] ESR-GAN [15] and ESRGAN+ [16] methods in the indicators ofCC PSNR SSIM and UIQI on the FLIR data set The VIFindex is slightly lower than the ESRGAN+ method and thetime consumption is greater than the ESRGAN method

The quantitative results of comparison of different recon-struction methods on the Itir_v1_0 dataset are shown inTable 2 indicating that the proposed method is superior tothe SRCNN [5] ESPCN [8] SRGAN [13] ESRGAN [15]and ESRGAN+ [16] methods on the Itir_v1_0 dataset Onlythe time consumption is slightly greater than the ESRGANmethod

In order to more intuitively illustrate the effectiveness ofthe superresolution method proposed in this work forimproving the edge features of infrared images we show inFigure 8 the comparison of the edge detection results of thesuperresolution reconstruction results of various methodsIt can be seen from the figure that the image reconstructedby our method has more and clearer edge information whichis meaningful for the application of infrared images

33 Use Advanced Vision Tasks to Compare SuperresolutionResults Basic vision tasks including image superresolutionreconstruction are all for advanced vision tasks Infraredimages are widely used in target detection and target match-ing tasks but the shortcomings of low resolution and unclearedges of infrared images affect the accuracy of the abovetasks Therefore whether the result of infrared image super-resolution reconstruction can improve the accuracy of theabove tasks has become an evaluation index of the result ofsuperresolution reconstruction In order to further verifyour method we match the superresolution images generatedby several methods with real high-resolution images Scale

Invariant Feature Transform (SIFT) is a representation ofthe statistical results of Gaussian image gradients in the fieldof feature points and is a commonly used image local featureextraction algorithm In the matching result the number ofmatching points can be used as a criterion for matching qual-ity and the corresponding matching points can also deter-mine the similarity of the local features of the two imagesFigure 9 shows the result of matching the superresolutionreconstructed image with the original high-resolution imagethrough the SIFT algorithm It can be seen from the quantitythat the reconstructed image produced by our proposedmethod obtains more correct matching pairs than othermethods

In this experiment we use the classic YOLO [40] methodfor image target detection (Figure 10) It can be seen that thesuperresolution reconstructed image generated by our pro-posed method has better detection results and can detectmore targets

4 Discussion and Future Work

Through experiments we demonstrate that compared withother methods the proposed method has a better perceptualperformance However during the experiment we alsofound that for some images the reconstruction results arenot satisfactory as shown in Figure 11 By analyzing theseimages we found that they have common characteristicsie when the imaging device and the imaging object are mov-ing at high relative speeds the captured image may containmotion blur For such images ordinary SR reconstructionmethods cannot achieve effective edge recovery Thereforein future studies we will address these problems

5 Conclusion

In this study we propose a two-stage GAN framework that isable to reconstruct SR image by recovering edge structureinformation and retaining feature information In the firststage the image fidelity loss the adversarial loss and the edgefidelity loss are combined to preserve the edges of the imageIn the second stage the image adversarial loss the imagefidelity loss and the feature fidelity loss are combined to mineimage visual features By iteratively updating the generativenetwork and the discriminator network the SR reconstruc-tion of infrared image with preserved edges is achievedExperimental verification results show that the proposedmethod outperforms several other image reconstructionmethods in reconstructing SR infrared images

Figure 11 Images with motion blur cannot obtain clear edge information using the ordinary SR reconstruction methods

10 International Journal of Digital Multimedia Broadcasting

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

References

[1] L Xiang-di et al ldquoInfrared imaging system and applicationsrdquoLaser amp Infrared vol 3 2014

[2] X Sui Q Chen G Gu and X Shen ldquoInfrared super-resolution imaging based on compressed sensingrdquo InfraredPhysics amp Technology vol 63 pp 119ndash124 2014

[3] X Sui H Gao Y Sun Q Chen and G Gu ldquoInfrared super-resolution imaging method based on retina micro-motionrdquoInfrared Physics amp Technology vol 60 pp 340ndash345 2013

[4] J L Suo J I Xiangyang and Q H Dai ldquoAn overview of com-putational photographyrdquo Science China(Information Sciences)vol 6 pp 5ndash24 2012

[5] D Chao C L Chen K He and X Tang ldquoLearning a DeepConvolutional Network for Image Super-Resolutionrdquo in Euro-pean Conference on Computer Vision Springer 2014

[6] C Dong C C Loy K He and X Tang ldquoImage super-resolution using deep convolutional networksrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 38no 2 pp 295ndash307 2014

[7] J Kim J Lee and K Lee Deeply-Recursive Convolutional Net-work for Image Super-Resolution 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEE pp1637ndash1645 2016

[8] W Shi J Caballero F Huszar et al ldquoReal-Time Single Imageand Video Super-Resolution Using an Efficient Sub-Pixel Con-volutional Neural Networkrdquo in 2016 IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR) Las Vegas NVUSA 2016

[9] I Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeAdversarial Networksrdquo Advances in Neural Information Pro-cessing Systems vol 3 pp 2672ndash2680 2014

[10] C Wang S Dong X Zhao G Papanastasiou H Zhang andG Yang ldquoSaliencyGAN deep learning semisupervised salientobject detection in the fog of IoTrdquo IEEE Transactions onIndustrial Informatics vol 16 no 4 pp 2667ndash2676 2020

[11] G Yang S Yu H Dong et al ldquoDAGAN deep de-aliasing gen-erative adversarial networks for fast compressed sensing MRIreconstructionrdquo IEEE Transactions on Medical Imagingvol 37 no 6 pp 1310ndash1321 2018

[12] S Yu D Hao G Yang G Slabaugh and Y Guo ldquoDeep De-Aliasing for Fast Compressive Sensing MRIrdquo 2017 httpsarxivorgabs170507137

[13] C Ledig L Theis F Huszar et al ldquoPhoto-Realistic SingleImage Super-Resolution Using a Generative Adversarial Net-workrdquo in 2017 IEEE Conference on Computer Vision and Pat-tern Recognition (CVPR) Honolulu HI USA 2016

[14] X Wang K Yu S Wu et al ldquoESRGAN Enhanced Super-Resolution Generative Adversarial Networksrdquo in LectureNotes in Computer Science Springer 2018

[15] Q Mao S Wang S Wang X Zhang and S Ma ldquoEnhancedImage Decoding Via Edge-Preserving Generative AdversarialNetworksrdquo in 2018 IEEE International Conference on Multi-media and Expo (ICME) San Diego CA USA 2018

[16] N C Rakotonirina and A Rasoanaivo ldquoESRGAN+ furtherimproving enhanced super-resolution generative adversarialnetworkrdquo in ICASSP 2020 - 2020 IEEE international confer-ence on acoustics Speech and Signal Processing (ICASSP)pp 3637ndash3641 Barcelona Spain 2020

[17] C Kraichan and S Pumrin ldquoPerformance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representationrdquo in2014 International Electrical Engineering Congress (iEECON)Chonburi Thailand 2014

[18] S Yu-bao W Zhi-hui X Liang Z Zhen-rong and L Zhan-Qiang ldquoMultimorphology Sparsity regularized image super-resolutionrdquo Acta Electronica Sinica vol 38 no 12 pp 2898ndash2903 2010

[19] S Yang M Wang Y Chen and Y Sun ldquoSingle-image super-resolution reconstruction via learned geometric dictionariesand clustered sparse codingrdquo IEEE Transactions on Image Pro-cessing vol 21 no 9 pp 4016ndash4028 2012

[20] Y Tang Y Yuan P Yan and X Li ldquoGreedy regression insparse coding space for single-image super-resolutionrdquo Jour-nal of Visual Communication amp Image Representationvol 24 no 2 pp 148ndash159 2013

[21] L Qiu-sheng and Z Wei ldquoImage super-resolution algorithmsbased on sparse representation of classified image patchesrdquoActa Electronica Sinica vol 40 no 5 pp 920ndash925 2012

[22] H Zhang T Xu H Li et al ldquoStackGAN Text to Photo-Realistic Image Synthesis with Stacked Generative AdversarialNetworksrdquo in 2017 IEEE International Conference on Com-puter Vision (ICCV) Venice Italy 2017

[23] H Zhang T Xu H Li et al ldquoStackGAN++ realistic imagesynthesis with stacked generative adversarial networksrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 41 pp 1947ndash1962 2018

[24] K He X Zhang S Ren and J Sun ldquoDeep Residual Learningfor Image Recognitionrdquo in 2016 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) pp 770ndash778 LasVegas NV USA 2016

[25] J Zhu G Yang P Ferreira et al ldquoA Roi Focused Multi-Scale Super-Resolution Method for the Diffusion TensorCardiac Magnetic Resonancerdquo in Proceedings of the 27thAnnual Meeting (ISMRM) pp 11ndash16 Montreacuteal QC Can-ada 2019

[26] J Zhu G Yang and P Lio ldquoLesion Focused Super Resolu-tionrdquo in Medical Imaging Image Processing 2019 San DiegoCA USA 2019

[27] J Zhu G Yang and P Lio ldquoHow Can We Make GAN Per-form Better in Single Medical Image Super-Resolution ALesion Focused Multi-Scale Approachrdquo in 2019 IEEE 16thInternational Symposium on Biomedical Imaging (ISBI 2019)Venice Italy 2019

[28] M Seitzer G Yang J Schlemper et al ldquoAdversarial and Per-ceptual Refinement for Compressed Sensing MRI Reconstruc-tionrdquo in Medical Image Computing and Computer AssistedIntervention ndash MICCAI 2018 Springer 2018

[29] S Ioffe and C Szegedy ldquoBatch Normalization AcceleratingDeep Network Training by Reducing Internal CovariateShiftrdquo in International conference on machine learningPMLR 2015

11International Journal of Digital Multimedia Broadcasting

[30] A Krizhevsky I Sutskever and G Hinton ldquoImageNet classi-fication with deep convolutional neural networksrdquo Advancesin Neural Information Processing Systems vol 25 no 2 2012

[31] A Radford L Metz and S Chintala ldquoUnsupervised Represen-tation Learning with Deep Convolutional Generative Adver-sarial Networksrdquo 2015 httpsarxivorgabs151106434

[32] A L Maas A Y Hannun and A Y Ng ldquoRectifier nonlinear-ities improve neural network acoustic modelsrdquo in Proceedingsof the 30 th International Conference on Machine Learningp 3 Atlanta Georgia USA 2013

[33] K Simonyan and A Zisserman ldquoVery Deep ConvolutionalNetworks for Large-Scale Image Recognitionrdquo Computer Sci-ence 2014 httpsarxivorgabs14091556

[34] J Johnson A Alahi and L Fei-Fei ldquoPerceptual Losses forReal-Time Style Transfer and Super-Resolutionrdquo in ComputerVision ndash ECCV 2016 Springer 2016

[35] D P Kingma and J Ba ldquoAdam AMethod for Stochastic Opti-mizationrdquo Computer Science 2014 httpsarxivorgabs14126980

[36] M Deshmukh and U Bhosale ldquoImage fusion and image qual-ity assessment of fused imagesrdquo International Journal of ImageProcessing (IJIP) vol 4 no 5 p 484 2010

[37] C Yim and A C Bovik ldquoQuality assessment of deblockedimagesrdquo IEEE Transactions on Image Processing vol 20no 1 pp 88ndash98 2011

[38] Z Wang A C Bovik H R Sheikh and E P SimoncellildquoImage quality assessment from error visibility to structuralsimilarityrdquo IEEE Transactions on Image Processing vol 13no 4 pp 600ndash612 2004

[39] Y Han Y Cai Y Cao and X Xu ldquoA new image fusion perfor-mance metric based on visual information fidelityrdquo Informa-tion fusion vol 14 no 2 pp 127ndash135 2013

[40] J Redmon S Divvala R Girshick and A Farhadi ldquoYou OnlyLook Once Unified Real-Time Object Detectionrdquo in 2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR) Las Vegas NV USA 2015

12 International Journal of Digital Multimedia Broadcasting

  • Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images
  • 1 Introduction
  • 2 Method
    • 21 Stage 1 GAN
    • 22 Stage 2 GAN
      • 3 Experiments
        • 31 Experimental Details
        • 32 Experimental Evaluation
        • 33 Use Advanced Vision Tasks to Compare Superresolution Results
          • 4 Discussion and Future Work
          • 5 Conclusion
          • Data Availability
          • Conflicts of Interest
Page 8: Generative Adversarial Network-Based Edge-Preserving ......2.1. Stage 1 GAN. In Stage 1, we use the 128×128 low-resolution image ILR0 as input and export it to the Stage 1 generator

containing 16 residual blocks is constructed The output for-mat and parameters of each layer of the generator G1 are asfollows

C R 64eth THORN⟶ C 64eth THORNBN Reth THORNC 64eth THORNBN Reth THORNSC⟶ 16⋯⟶ C R 64eth THORNSC⟶ C 256eth THORN⟶ C t 3eth THORN

eth7THORN

where C(R64) denotes a set of convolutional layers with64 feature maps and activation function ReLUC(64)BN(R)C(64)BN(R)SC represents a residual block

The network structure of the discriminator networkD1 inthe second stage adopts a network structure similar to that ofthe discriminator D0 Each layer structure and networkparameters of D1 are as follows

C LR 64eth THORN⟶ C 64eth THORNBN LReth THORN⟶ C 128eth THORNBN LReth THORN⟶ C 256eth THORNBN LReth THORN⟶ C 512eth THORNBN LReth THORN⟶ C 1024eth THORNBN LReth THORN⟶ C LR 1024eth THORN⟶D

eth8THORN

L2 = λ1primeLadv1 + λ2primeLmse1 + λ3primeL feature eth9THORNwhere the weight fλiprimeg is a trade-off parameter that is used tobalance multiple loss components The first term Ladv1 isthe adversarial loss between the generator G1 and the dis-criminator D1 of GAN The second termLmse1 is the imagefidelity loss The third term L feature is the feature fidelity

loss which is defined based on the feature space distancedefined in the literature [34] to facilitate the preservationof feature representation in the reconstructed image similarto that of the original image as follows

L feature = 1WHC ϕ IHR0 minus ISR0

22

eth10THORN

whereWH and C are the height the width and the numberof channels respectively of the image and ϕethsdotTHORN represents thefeature space function which is a pretrained VGG-19 [33]network that maps images to feature space The fourth pool-ing layer is used to calculate the feature-activated L2 distanceto be used as the feature fidelity loss function

Stage 2 Reconstruction Loss Function The loss functioncontains three parts adversarial loss image fidelity lossand feature fidelity loss (Eq (9))

3 Experiments

31 Experimental Details All experiments are performed ona desktop computer with 220GHz times40 Intel Xeon (R) Silver4114 CPU GeForce GTX 1080Ti and 64GB of memory Weuse the PyTorch deep learning framework to implement themethod proposed in this article and all comparison methodsAll methods are trained with the FLIR training data set Dur-ing the training process we choose 8862 images from theFLIR_ADAS_1_3 thermal sensor training dataset released

(a) SRCNN super-resolution reconstruction image matching results (b) ESPCN super-resolution reconstruction image matching results

(c) SRGAN super-resolution reconstruction image matching results (d) ESRGAN super-resolution reconstruction image matching results

(e) ESRGAN+ super-resolution reconstruction image matching results (f) Our method super-resolution reconstruction image matching results

Figure 9 Superresolution reconstruction image matching results (the left image is the original high-resolution image)

8 International Journal of Digital Multimedia Broadcasting

by FLIR Systems Inc a sensor system developer in 2018First we perform the 4x factor downsampling on all experi-mental data to obtain low-resolution (LR) images by lower-ing the resolution of the HR images We set the batch sizeto 4 and used the Adam [35] with a momentum term of β= 09 as the optimization procedure To enable the loss func-tion in the same order of magnitude to better balance losscomponents we set λ1λ2prime and λ3prime of Eq (9) to 10-3 1 and10-6 respectively When training the Stage 1 GAN we set

the learning rate to 10-4 which is lowered to 10-5 when train-ing the Stage 2 GAN

32 Experimental Evaluation To verify the effectiveness of theproposed method we conduct the validation on two publicdata sets the FLIR_ ADAS_1_3 (1366 images) validation setand the Itir_v1_0 dataset (11262 images) We compared theproposed method with the most advanced methods includingsuperresolution using deep convolutional networks (SRCNN)

LR

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Our method

Figure 10 Target detection result of superresolution reconstructed image

9International Journal of Digital Multimedia Broadcasting

[5] ESPCN [8] SRGAN [10] ESRGAN [15] and ESRGAN+[16] methods Three images are selected from the verificationset of FLIR_ADAS_1_3 and the data set of Itir_v1_0 and thesubjective results of several methods are shown in Figures 6and 7 From the reconstruction results it is not difficult tosee that the reconstruction results of our proposed methodproduce finer texture and edge details

To facilitate a fair quantitative comparison we used thecorrelation coefficient (CC) [36] the peak-signal-to-noiseratio (PSNR) [37] the Structural Similarity Index Measure(SSIM) [38] the visual information fidelity (VIF) [39] theUniversal Image Quality Index (UIQI) and the time con-sumption six objective indicators to evaluate the quality ofthe reconstructed images and the SR methods The quantita-tive results of the comparison of different reconstructionmethods are shown in Table 1 which shows that our methodis superior to the SRCNN [5] ESPCN [8] SRGAN [13] ESR-GAN [15] and ESRGAN+ [16] methods in the indicators ofCC PSNR SSIM and UIQI on the FLIR data set The VIFindex is slightly lower than the ESRGAN+ method and thetime consumption is greater than the ESRGAN method

The quantitative results of comparison of different recon-struction methods on the Itir_v1_0 dataset are shown inTable 2 indicating that the proposed method is superior tothe SRCNN [5] ESPCN [8] SRGAN [13] ESRGAN [15]and ESRGAN+ [16] methods on the Itir_v1_0 dataset Onlythe time consumption is slightly greater than the ESRGANmethod

In order to more intuitively illustrate the effectiveness ofthe superresolution method proposed in this work forimproving the edge features of infrared images we show inFigure 8 the comparison of the edge detection results of thesuperresolution reconstruction results of various methodsIt can be seen from the figure that the image reconstructedby our method has more and clearer edge information whichis meaningful for the application of infrared images

33 Use Advanced Vision Tasks to Compare SuperresolutionResults Basic vision tasks including image superresolutionreconstruction are all for advanced vision tasks Infraredimages are widely used in target detection and target match-ing tasks but the shortcomings of low resolution and unclearedges of infrared images affect the accuracy of the abovetasks Therefore whether the result of infrared image super-resolution reconstruction can improve the accuracy of theabove tasks has become an evaluation index of the result ofsuperresolution reconstruction In order to further verifyour method we match the superresolution images generatedby several methods with real high-resolution images Scale

Invariant Feature Transform (SIFT) is a representation ofthe statistical results of Gaussian image gradients in the fieldof feature points and is a commonly used image local featureextraction algorithm In the matching result the number ofmatching points can be used as a criterion for matching qual-ity and the corresponding matching points can also deter-mine the similarity of the local features of the two imagesFigure 9 shows the result of matching the superresolutionreconstructed image with the original high-resolution imagethrough the SIFT algorithm It can be seen from the quantitythat the reconstructed image produced by our proposedmethod obtains more correct matching pairs than othermethods

In this experiment we use the classic YOLO [40] methodfor image target detection (Figure 10) It can be seen that thesuperresolution reconstructed image generated by our pro-posed method has better detection results and can detectmore targets

4 Discussion and Future Work

Through experiments we demonstrate that compared withother methods the proposed method has a better perceptualperformance However during the experiment we alsofound that for some images the reconstruction results arenot satisfactory as shown in Figure 11 By analyzing theseimages we found that they have common characteristicsie when the imaging device and the imaging object are mov-ing at high relative speeds the captured image may containmotion blur For such images ordinary SR reconstructionmethods cannot achieve effective edge recovery Thereforein future studies we will address these problems

5 Conclusion

In this study we propose a two-stage GAN framework that isable to reconstruct SR image by recovering edge structureinformation and retaining feature information In the firststage the image fidelity loss the adversarial loss and the edgefidelity loss are combined to preserve the edges of the imageIn the second stage the image adversarial loss the imagefidelity loss and the feature fidelity loss are combined to mineimage visual features By iteratively updating the generativenetwork and the discriminator network the SR reconstruc-tion of infrared image with preserved edges is achievedExperimental verification results show that the proposedmethod outperforms several other image reconstructionmethods in reconstructing SR infrared images

Figure 11 Images with motion blur cannot obtain clear edge information using the ordinary SR reconstruction methods

10 International Journal of Digital Multimedia Broadcasting

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

References

[1] L Xiang-di et al ldquoInfrared imaging system and applicationsrdquoLaser amp Infrared vol 3 2014

[2] X Sui Q Chen G Gu and X Shen ldquoInfrared super-resolution imaging based on compressed sensingrdquo InfraredPhysics amp Technology vol 63 pp 119ndash124 2014

[3] X Sui H Gao Y Sun Q Chen and G Gu ldquoInfrared super-resolution imaging method based on retina micro-motionrdquoInfrared Physics amp Technology vol 60 pp 340ndash345 2013

[4] J L Suo J I Xiangyang and Q H Dai ldquoAn overview of com-putational photographyrdquo Science China(Information Sciences)vol 6 pp 5ndash24 2012

[5] D Chao C L Chen K He and X Tang ldquoLearning a DeepConvolutional Network for Image Super-Resolutionrdquo in Euro-pean Conference on Computer Vision Springer 2014

[6] C Dong C C Loy K He and X Tang ldquoImage super-resolution using deep convolutional networksrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 38no 2 pp 295ndash307 2014

[7] J Kim J Lee and K Lee Deeply-Recursive Convolutional Net-work for Image Super-Resolution 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEE pp1637ndash1645 2016

[8] W Shi J Caballero F Huszar et al ldquoReal-Time Single Imageand Video Super-Resolution Using an Efficient Sub-Pixel Con-volutional Neural Networkrdquo in 2016 IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR) Las Vegas NVUSA 2016

[9] I Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeAdversarial Networksrdquo Advances in Neural Information Pro-cessing Systems vol 3 pp 2672ndash2680 2014

[10] C Wang S Dong X Zhao G Papanastasiou H Zhang andG Yang ldquoSaliencyGAN deep learning semisupervised salientobject detection in the fog of IoTrdquo IEEE Transactions onIndustrial Informatics vol 16 no 4 pp 2667ndash2676 2020

[11] G Yang S Yu H Dong et al ldquoDAGAN deep de-aliasing gen-erative adversarial networks for fast compressed sensing MRIreconstructionrdquo IEEE Transactions on Medical Imagingvol 37 no 6 pp 1310ndash1321 2018

[12] S Yu D Hao G Yang G Slabaugh and Y Guo ldquoDeep De-Aliasing for Fast Compressive Sensing MRIrdquo 2017 httpsarxivorgabs170507137

[13] C Ledig L Theis F Huszar et al ldquoPhoto-Realistic SingleImage Super-Resolution Using a Generative Adversarial Net-workrdquo in 2017 IEEE Conference on Computer Vision and Pat-tern Recognition (CVPR) Honolulu HI USA 2016

[14] X Wang K Yu S Wu et al ldquoESRGAN Enhanced Super-Resolution Generative Adversarial Networksrdquo in LectureNotes in Computer Science Springer 2018

[15] Q Mao S Wang S Wang X Zhang and S Ma ldquoEnhancedImage Decoding Via Edge-Preserving Generative AdversarialNetworksrdquo in 2018 IEEE International Conference on Multi-media and Expo (ICME) San Diego CA USA 2018

[16] N C Rakotonirina and A Rasoanaivo ldquoESRGAN+ furtherimproving enhanced super-resolution generative adversarialnetworkrdquo in ICASSP 2020 - 2020 IEEE international confer-ence on acoustics Speech and Signal Processing (ICASSP)pp 3637ndash3641 Barcelona Spain 2020

[17] C Kraichan and S Pumrin ldquoPerformance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representationrdquo in2014 International Electrical Engineering Congress (iEECON)Chonburi Thailand 2014

[18] S Yu-bao W Zhi-hui X Liang Z Zhen-rong and L Zhan-Qiang ldquoMultimorphology Sparsity regularized image super-resolutionrdquo Acta Electronica Sinica vol 38 no 12 pp 2898ndash2903 2010

[19] S Yang M Wang Y Chen and Y Sun ldquoSingle-image super-resolution reconstruction via learned geometric dictionariesand clustered sparse codingrdquo IEEE Transactions on Image Pro-cessing vol 21 no 9 pp 4016ndash4028 2012

[20] Y Tang Y Yuan P Yan and X Li ldquoGreedy regression insparse coding space for single-image super-resolutionrdquo Jour-nal of Visual Communication amp Image Representationvol 24 no 2 pp 148ndash159 2013

[21] L Qiu-sheng and Z Wei ldquoImage super-resolution algorithmsbased on sparse representation of classified image patchesrdquoActa Electronica Sinica vol 40 no 5 pp 920ndash925 2012

[22] H Zhang T Xu H Li et al ldquoStackGAN Text to Photo-Realistic Image Synthesis with Stacked Generative AdversarialNetworksrdquo in 2017 IEEE International Conference on Com-puter Vision (ICCV) Venice Italy 2017

[23] H Zhang T Xu H Li et al ldquoStackGAN++ realistic imagesynthesis with stacked generative adversarial networksrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 41 pp 1947ndash1962 2018

[24] K He X Zhang S Ren and J Sun ldquoDeep Residual Learningfor Image Recognitionrdquo in 2016 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) pp 770ndash778 LasVegas NV USA 2016

[25] J Zhu G Yang P Ferreira et al ldquoA Roi Focused Multi-Scale Super-Resolution Method for the Diffusion TensorCardiac Magnetic Resonancerdquo in Proceedings of the 27thAnnual Meeting (ISMRM) pp 11ndash16 Montreacuteal QC Can-ada 2019

[26] J Zhu G Yang and P Lio ldquoLesion Focused Super Resolu-tionrdquo in Medical Imaging Image Processing 2019 San DiegoCA USA 2019

[27] J Zhu G Yang and P Lio ldquoHow Can We Make GAN Per-form Better in Single Medical Image Super-Resolution ALesion Focused Multi-Scale Approachrdquo in 2019 IEEE 16thInternational Symposium on Biomedical Imaging (ISBI 2019)Venice Italy 2019

[28] M Seitzer G Yang J Schlemper et al ldquoAdversarial and Per-ceptual Refinement for Compressed Sensing MRI Reconstruc-tionrdquo in Medical Image Computing and Computer AssistedIntervention ndash MICCAI 2018 Springer 2018

[29] S Ioffe and C Szegedy ldquoBatch Normalization AcceleratingDeep Network Training by Reducing Internal CovariateShiftrdquo in International conference on machine learningPMLR 2015

11International Journal of Digital Multimedia Broadcasting

[30] A Krizhevsky I Sutskever and G Hinton ldquoImageNet classi-fication with deep convolutional neural networksrdquo Advancesin Neural Information Processing Systems vol 25 no 2 2012

[31] A Radford L Metz and S Chintala ldquoUnsupervised Represen-tation Learning with Deep Convolutional Generative Adver-sarial Networksrdquo 2015 httpsarxivorgabs151106434

[32] A L Maas A Y Hannun and A Y Ng ldquoRectifier nonlinear-ities improve neural network acoustic modelsrdquo in Proceedingsof the 30 th International Conference on Machine Learningp 3 Atlanta Georgia USA 2013

[33] K Simonyan and A Zisserman ldquoVery Deep ConvolutionalNetworks for Large-Scale Image Recognitionrdquo Computer Sci-ence 2014 httpsarxivorgabs14091556

[34] J Johnson A Alahi and L Fei-Fei ldquoPerceptual Losses forReal-Time Style Transfer and Super-Resolutionrdquo in ComputerVision ndash ECCV 2016 Springer 2016

[35] D P Kingma and J Ba ldquoAdam AMethod for Stochastic Opti-mizationrdquo Computer Science 2014 httpsarxivorgabs14126980

[36] M Deshmukh and U Bhosale ldquoImage fusion and image qual-ity assessment of fused imagesrdquo International Journal of ImageProcessing (IJIP) vol 4 no 5 p 484 2010

[37] C Yim and A C Bovik ldquoQuality assessment of deblockedimagesrdquo IEEE Transactions on Image Processing vol 20no 1 pp 88ndash98 2011

[38] Z Wang A C Bovik H R Sheikh and E P SimoncellildquoImage quality assessment from error visibility to structuralsimilarityrdquo IEEE Transactions on Image Processing vol 13no 4 pp 600ndash612 2004

[39] Y Han Y Cai Y Cao and X Xu ldquoA new image fusion perfor-mance metric based on visual information fidelityrdquo Informa-tion fusion vol 14 no 2 pp 127ndash135 2013

[40] J Redmon S Divvala R Girshick and A Farhadi ldquoYou OnlyLook Once Unified Real-Time Object Detectionrdquo in 2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR) Las Vegas NV USA 2015

12 International Journal of Digital Multimedia Broadcasting

  • Generative Adversarial Network-Based Edge-Preserving Superresolution Reconstruction of Infrared Images
  • 1 Introduction
  • 2 Method
    • 21 Stage 1 GAN
    • 22 Stage 2 GAN
      • 3 Experiments
        • 31 Experimental Details
        • 32 Experimental Evaluation
        • 33 Use Advanced Vision Tasks to Compare Superresolution Results
          • 4 Discussion and Future Work
          • 5 Conclusion
          • Data Availability
          • Conflicts of Interest
Page 9: Generative Adversarial Network-Based Edge-Preserving ......2.1. Stage 1 GAN. In Stage 1, we use the 128×128 low-resolution image ILR0 as input and export it to the Stage 1 generator

by FLIR Systems Inc a sensor system developer in 2018First we perform the 4x factor downsampling on all experi-mental data to obtain low-resolution (LR) images by lower-ing the resolution of the HR images We set the batch sizeto 4 and used the Adam [35] with a momentum term of β= 09 as the optimization procedure To enable the loss func-tion in the same order of magnitude to better balance losscomponents we set λ1λ2prime and λ3prime of Eq (9) to 10-3 1 and10-6 respectively When training the Stage 1 GAN we set

the learning rate to 10-4 which is lowered to 10-5 when train-ing the Stage 2 GAN

32 Experimental Evaluation To verify the effectiveness of theproposed method we conduct the validation on two publicdata sets the FLIR_ ADAS_1_3 (1366 images) validation setand the Itir_v1_0 dataset (11262 images) We compared theproposed method with the most advanced methods includingsuperresolution using deep convolutional networks (SRCNN)

LR

SRCNN

ESPCN

SRGAN

ESRGAN

ESRGAN+

Our method

Figure 10 Target detection result of superresolution reconstructed image

9International Journal of Digital Multimedia Broadcasting

[5] ESPCN [8] SRGAN [10] ESRGAN [15] and ESRGAN+[16] methods Three images are selected from the verificationset of FLIR_ADAS_1_3 and the data set of Itir_v1_0 and thesubjective results of several methods are shown in Figures 6and 7 From the reconstruction results it is not difficult tosee that the reconstruction results of our proposed methodproduce finer texture and edge details

To facilitate a fair quantitative comparison we used thecorrelation coefficient (CC) [36] the peak-signal-to-noiseratio (PSNR) [37] the Structural Similarity Index Measure(SSIM) [38] the visual information fidelity (VIF) [39] theUniversal Image Quality Index (UIQI) and the time con-sumption six objective indicators to evaluate the quality ofthe reconstructed images and the SR methods The quantita-tive results of the comparison of different reconstructionmethods are shown in Table 1 which shows that our methodis superior to the SRCNN [5] ESPCN [8] SRGAN [13] ESR-GAN [15] and ESRGAN+ [16] methods in the indicators ofCC PSNR SSIM and UIQI on the FLIR data set The VIFindex is slightly lower than the ESRGAN+ method and thetime consumption is greater than the ESRGAN method

The quantitative results of comparison of different recon-struction methods on the Itir_v1_0 dataset are shown inTable 2 indicating that the proposed method is superior tothe SRCNN [5] ESPCN [8] SRGAN [13] ESRGAN [15]and ESRGAN+ [16] methods on the Itir_v1_0 dataset Onlythe time consumption is slightly greater than the ESRGANmethod

In order to more intuitively illustrate the effectiveness ofthe superresolution method proposed in this work forimproving the edge features of infrared images we show inFigure 8 the comparison of the edge detection results of thesuperresolution reconstruction results of various methodsIt can be seen from the figure that the image reconstructedby our method has more and clearer edge information whichis meaningful for the application of infrared images

33 Use Advanced Vision Tasks to Compare SuperresolutionResults Basic vision tasks including image superresolutionreconstruction are all for advanced vision tasks Infraredimages are widely used in target detection and target match-ing tasks but the shortcomings of low resolution and unclearedges of infrared images affect the accuracy of the abovetasks Therefore whether the result of infrared image super-resolution reconstruction can improve the accuracy of theabove tasks has become an evaluation index of the result ofsuperresolution reconstruction In order to further verifyour method we match the superresolution images generatedby several methods with real high-resolution images Scale

Invariant Feature Transform (SIFT) is a representation ofthe statistical results of Gaussian image gradients in the fieldof feature points and is a commonly used image local featureextraction algorithm In the matching result the number ofmatching points can be used as a criterion for matching qual-ity and the corresponding matching points can also deter-mine the similarity of the local features of the two imagesFigure 9 shows the result of matching the superresolutionreconstructed image with the original high-resolution imagethrough the SIFT algorithm It can be seen from the quantitythat the reconstructed image produced by our proposedmethod obtains more correct matching pairs than othermethods

In this experiment we use the classic YOLO [40] methodfor image target detection (Figure 10) It can be seen that thesuperresolution reconstructed image generated by our pro-posed method has better detection results and can detectmore targets

4 Discussion and Future Work

Through experiments we demonstrate that compared withother methods the proposed method has a better perceptualperformance However during the experiment we alsofound that for some images the reconstruction results arenot satisfactory as shown in Figure 11 By analyzing theseimages we found that they have common characteristicsie when the imaging device and the imaging object are mov-ing at high relative speeds the captured image may containmotion blur For such images ordinary SR reconstructionmethods cannot achieve effective edge recovery Thereforein future studies we will address these problems

5 Conclusion

In this study we propose a two-stage GAN framework that isable to reconstruct SR image by recovering edge structureinformation and retaining feature information In the firststage the image fidelity loss the adversarial loss and the edgefidelity loss are combined to preserve the edges of the imageIn the second stage the image adversarial loss the imagefidelity loss and the feature fidelity loss are combined to mineimage visual features By iteratively updating the generativenetwork and the discriminator network the SR reconstruc-tion of infrared image with preserved edges is achievedExperimental verification results show that the proposedmethod outperforms several other image reconstructionmethods in reconstructing SR infrared images

Figure 11 Images with motion blur cannot obtain clear edge information using the ordinary SR reconstruction methods

10 International Journal of Digital Multimedia Broadcasting

Data Availability

The data used to support the findings of this study are avail-able from the corresponding author upon request

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

References

[1] L Xiang-di et al ldquoInfrared imaging system and applicationsrdquoLaser amp Infrared vol 3 2014

[2] X Sui Q Chen G Gu and X Shen ldquoInfrared super-resolution imaging based on compressed sensingrdquo InfraredPhysics amp Technology vol 63 pp 119ndash124 2014

[3] X Sui H Gao Y Sun Q Chen and G Gu ldquoInfrared super-resolution imaging method based on retina micro-motionrdquoInfrared Physics amp Technology vol 60 pp 340ndash345 2013

[4] J L Suo J I Xiangyang and Q H Dai ldquoAn overview of com-putational photographyrdquo Science China(Information Sciences)vol 6 pp 5ndash24 2012

[5] D Chao C L Chen K He and X Tang ldquoLearning a DeepConvolutional Network for Image Super-Resolutionrdquo in Euro-pean Conference on Computer Vision Springer 2014

[6] C Dong C C Loy K He and X Tang ldquoImage super-resolution using deep convolutional networksrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 38no 2 pp 295ndash307 2014

[7] J Kim J Lee and K Lee Deeply-Recursive Convolutional Net-work for Image Super-Resolution 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEE pp1637ndash1645 2016

[8] W Shi J Caballero F Huszar et al ldquoReal-Time Single Imageand Video Super-Resolution Using an Efficient Sub-Pixel Con-volutional Neural Networkrdquo in 2016 IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR) Las Vegas NVUSA 2016

[9] I Goodfellow J Pouget-Abadie M Mirza et al ldquoGenerativeAdversarial Networksrdquo Advances in Neural Information Pro-cessing Systems vol 3 pp 2672ndash2680 2014

[10] C Wang S Dong X Zhao G Papanastasiou H Zhang andG Yang ldquoSaliencyGAN deep learning semisupervised salientobject detection in the fog of IoTrdquo IEEE Transactions onIndustrial Informatics vol 16 no 4 pp 2667ndash2676 2020

[11] G Yang S Yu H Dong et al ldquoDAGAN deep de-aliasing gen-erative adversarial networks for fast compressed sensing MRIreconstructionrdquo IEEE Transactions on Medical Imagingvol 37 no 6 pp 1310ndash1321 2018

[12] S Yu D Hao G Yang G Slabaugh and Y Guo ldquoDeep De-Aliasing for Fast Compressive Sensing MRIrdquo 2017 httpsarxivorgabs170507137

[13] C Ledig L Theis F Huszar et al ldquoPhoto-Realistic SingleImage Super-Resolution Using a Generative Adversarial Net-workrdquo in 2017 IEEE Conference on Computer Vision and Pat-tern Recognition (CVPR) Honolulu HI USA 2016

[14] X Wang K Yu S Wu et al ldquoESRGAN Enhanced Super-Resolution Generative Adversarial Networksrdquo in LectureNotes in Computer Science Springer 2018

[15] Q Mao S Wang S Wang X Zhang and S Ma ldquoEnhancedImage Decoding Via Edge-Preserving Generative AdversarialNetworksrdquo in 2018 IEEE International Conference on Multi-media and Expo (ICME) San Diego CA USA 2018

[16] N C Rakotonirina and A Rasoanaivo ldquoESRGAN+ furtherimproving enhanced super-resolution generative adversarialnetworkrdquo in ICASSP 2020 - 2020 IEEE international confer-ence on acoustics Speech and Signal Processing (ICASSP)pp 3637ndash3641 Barcelona Spain 2020

[17] C Kraichan and S Pumrin ldquoPerformance Analysis on Multi-Frame Image Super-Resolution Via Sparse Representationrdquo in2014 International Electrical Engineering Congress (iEECON)Chonburi Thailand 2014

[18] S Yu-bao W Zhi-hui X Liang Z Zhen-rong and L Zhan-Qiang ldquoMultimorphology Sparsity regularized image super-resolutionrdquo Acta Electronica Sinica vol 38 no 12 pp 2898ndash2903 2010

[19] S Yang M Wang Y Chen and Y Sun ldquoSingle-image super-resolution reconstruction via learned geometric dictionariesand clustered sparse codingrdquo IEEE Transactions on Image Pro-cessing vol 21 no 9 pp 4016ndash4028 2012

[20] Y Tang Y Yuan P Yan and X Li ldquoGreedy regression insparse coding space for single-image super-resolutionrdquo Jour-nal of Visual Communication amp Image Representationvol 24 no 2 pp 148ndash159 2013

[21] L Qiu-sheng and Z Wei ldquoImage super-resolution algorithmsbased on sparse representation of classified image patchesrdquoActa Electronica Sinica vol 40 no 5 pp 920ndash925 2012


[5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods. Three images are selected from the verification set of FLIR_ADAS_1_3 and from the Itir_v1_0 dataset, and the subjective results of the several methods are shown in Figures 6 and 7. The reconstruction results show that our proposed method produces finer texture and edge details than the other methods.

To facilitate a fair quantitative comparison, we used six objective indicators to evaluate the quality of the reconstructed images and the SR methods: the correlation coefficient (CC) [36], the peak signal-to-noise ratio (PSNR) [37], the Structural Similarity Index Measure (SSIM) [38], the visual information fidelity (VIF) [39], the Universal Image Quality Index (UIQI), and the time consumption. The quantitative results of the different reconstruction methods are shown in Table 1, which indicates that our method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods in CC, PSNR, SSIM, and UIQI on the FLIR dataset. The VIF index is slightly lower than that of the ESRGAN+ method, and the time consumption is greater than that of the ESRGAN method.
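As a concrete illustration of how the full-reference metrics above can be computed, the following sketch uses scikit-image for PSNR and SSIM and NumPy for the correlation coefficient. The file names and the single-channel 8-bit input assumption are hypothetical, and UIQI and VIF would require additional implementations not shown here.

```python
import numpy as np
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Hypothetical file names; both images are assumed to be grayscale
# infrared frames of identical size. as_gray=True yields float64 in [0, 1].
hr = io.imread("ground_truth_hr.png", as_gray=True)
sr = io.imread("reconstructed_sr.png", as_gray=True)

# Correlation coefficient (CC): Pearson correlation of the
# flattened pixel intensities of the two images.
cc = np.corrcoef(hr.ravel(), sr.ravel())[0, 1]

# PSNR and SSIM from scikit-image; data_range matches the [0, 1] scale.
psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
ssim = structural_similarity(hr, sr, data_range=1.0)

print(f"CC={cc:.4f}  PSNR={psnr:.2f} dB  SSIM={ssim:.4f}")
```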

The quantitative results of the different reconstruction methods on the Itir_v1_0 dataset are shown in Table 2, indicating that the proposed method is superior to the SRCNN [5], ESPCN [8], SRGAN [13], ESRGAN [15], and ESRGAN+ [16] methods on this dataset; only the time consumption is slightly greater than that of the ESRGAN method.

To illustrate more intuitively the effectiveness of the proposed superresolution method in improving the edge features of infrared images, Figure 8 compares the edge detection results obtained from the superresolution reconstructions of the various methods. The image reconstructed by our method contains more, and clearer, edge information, which is meaningful for practical applications of infrared images.
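The paper does not state which edge detector is used for Figure 8; a minimal sketch of such a comparison, assuming the standard Canny detector from OpenCV and hypothetical file names, might look as follows.

```python
import cv2

# Hypothetical file names for two reconstructions of the same scene.
images = {
    "ours": cv2.imread("sr_ours.png", cv2.IMREAD_GRAYSCALE),
    "esrgan": cv2.imread("sr_esrgan.png", cv2.IMREAD_GRAYSCALE),
}

for name, img in images.items():
    # Canny thresholds are illustrative; in practice they would be
    # tuned to the dynamic range of the infrared data.
    edges = cv2.Canny(img, threshold1=50, threshold2=150)
    cv2.imwrite(f"edges_{name}.png", edges)
```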

3.3. Use Advanced Vision Tasks to Compare Superresolution Results. Basic vision tasks, including image superresolution reconstruction, ultimately serve advanced vision tasks. Infrared images are widely used in target detection and target matching, but their low resolution and unclear edges reduce the accuracy of these tasks. Therefore, whether the result of infrared image superresolution reconstruction can improve the accuracy of such tasks has become an evaluation criterion for the reconstruction itself. To further verify our method, we match the superresolution images generated by several methods against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) represents the statistical distribution of Gaussian image gradients in the neighborhood of feature points and is a commonly used local feature extraction algorithm. In the matching result, the number of matching points serves as a criterion for matching quality, and the corresponding matching points also indicate the similarity of the local features of the two images. Figure 9 shows the result of matching each superresolution reconstructed image with the original high-resolution image using the SIFT algorithm. Judging from the number of matching pairs, the reconstructed image produced by our proposed method obtains more correct matches than the other methods.
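The exact matching protocol is not specified in the paper; a minimal sketch of such a SIFT-based comparison, assuming OpenCV's SIFT implementation, brute-force matching with Lowe's ratio test, and hypothetical file names, is shown below.

```python
import cv2

# Hypothetical file names: a reconstructed SR image and the real HR image.
sr = cv2.imread("sr_ours.png", cv2.IMREAD_GRAYSCALE)
hr = cv2.imread("ground_truth_hr.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_sr, des_sr = sift.detectAndCompute(sr, None)
kp_hr, des_hr = sift.detectAndCompute(hr, None)

# Brute-force matcher with k=2 neighbours, followed by Lowe's ratio
# test to keep only distinctive correspondences.
bf = cv2.BFMatcher()
matches = bf.knnMatch(des_sr, des_hr, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# The number of surviving matches is the matching-quality criterion.
print(f"matches after ratio test: {len(good)}")

vis = cv2.drawMatches(sr, kp_sr, hr, kp_hr, good, None,
                      flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv2.imwrite("sift_matches.png", vis)
```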

In this experiment, we use the classic YOLO [40] method for target detection (Figure 10). The superresolution reconstructed image generated by our proposed method yields better detection results and allows more targets to be detected.
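The paper uses the original YOLO [40]; as an easily reproducible stand-in for such a detection check, the sketch below loads a pretrained YOLOv5 model via torch.hub, which is not the authors' exact setup, and the file name is hypothetical.

```python
import torch

# Illustration only: a pretrained YOLOv5 model stands in for the
# original YOLO [40] used in the paper.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("sr_ours.png")   # run detection on the SR image
results.print()                  # per-class detection summary
boxes = results.xyxy[0]          # tensor: [x1, y1, x2, y2, conf, class]
print(f"targets detected: {len(boxes)}")
```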

4. Discussion and Future Work

Our experiments demonstrate that, compared with other methods, the proposed method has better perceptual performance. However, during the experiments we also found that for some images the reconstruction results are not satisfactory, as shown in Figure 11. Analyzing these images, we found a common characteristic: when the imaging device and the imaged object move at a high relative speed, the captured image may contain motion blur. For such images, ordinary SR reconstruction methods cannot achieve effective edge recovery. We will address this problem in future studies.

5. Conclusion

In this study, we propose a two-stage GAN framework that reconstructs SR images by recovering edge structure information and retaining feature information. In the first stage, the image fidelity loss, the adversarial loss, and the edge fidelity loss are combined to preserve the edges of the image; in the second stage, the image adversarial loss, the image fidelity loss, and the feature fidelity loss are combined to mine the visual features of the image. By iteratively updating the generator and discriminator networks, SR reconstruction of infrared images with preserved edges is achieved. Experimental results show that the proposed method outperforms several other image reconstruction methods in reconstructing SR infrared images.
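To illustrate how such a weighted combination of losses can be assembled for the Stage 1 generator, the following PyTorch sketch uses an L1 term for image fidelity, a Sobel-gradient L1 term as a stand-in for the edge fidelity loss, and a non-saturating adversarial term. The weights and the Sobel formulation are assumptions for illustration, not the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def sobel_edges(x):
    """Approximate edge maps with fixed Sobel kernels (illustrative).
    Assumes single-channel infrared input of shape (N, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def stage1_generator_loss(sr, hr, d_fake, w_img=1.0, w_edge=0.1, w_adv=1e-3):
    """Weighted sum of image fidelity, edge fidelity, and adversarial
    losses; the weights are hypothetical placeholders."""
    loss_img = F.l1_loss(sr, hr)                             # image fidelity
    loss_edge = F.l1_loss(sobel_edges(sr), sobel_edges(hr))  # edge fidelity
    # Non-saturating adversarial loss on the discriminator's logits
    # for the generated (fake) images.
    loss_adv = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))
    return w_img * loss_img + w_edge * loss_edge + w_adv * loss_adv
```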

Figure 11: Images with motion blur cannot obtain clear edge information using ordinary SR reconstruction methods.


Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

[1] L. Xiang-di et al., "Infrared imaging system and applications," Laser & Infrared, vol. 3, 2014.

[2] X. Sui, Q. Chen, G. Gu, and X. Shen, "Infrared super-resolution imaging based on compressed sensing," Infrared Physics & Technology, vol. 63, pp. 119–124, 2014.

[3] X. Sui, H. Gao, Y. Sun, Q. Chen, and G. Gu, "Infrared super-resolution imaging method based on retina micro-motion," Infrared Physics & Technology, vol. 60, pp. 340–345, 2013.

[4] J. L. Suo, X. Ji, and Q. H. Dai, "An overview of computational photography," Science China (Information Sciences), vol. 6, pp. 5–24, 2012.

[5] D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in European Conference on Computer Vision, Springer, 2014.

[6] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2014.

[7] J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 1637–1645, 2016.

[8] W. Shi, J. Caballero, F. Huszar, et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative Adversarial Networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.

[10] C. Wang, S. Dong, X. Zhao, G. Papanastasiou, H. Zhang, and G. Yang, "SaliencyGAN: deep learning semisupervised salient object detection in the fog of IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2667–2676, 2020.

[11] G. Yang, S. Yu, H. Dong, et al., "DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.

[12] S. Yu, D. Hao, G. Yang, G. Slabaugh, and Y. Guo, "Deep De-Aliasing for Fast Compressive Sensing MRI," 2017, https://arxiv.org/abs/1705.07137.

[13] C. Ledig, L. Theis, F. Huszar, et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.

[14] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in Lecture Notes in Computer Science, Springer, 2018.

[15] Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018.

[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: further improving enhanced super-resolution generative adversarial network," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3637–3641, Barcelona, Spain, 2020.

[17] C. Kraichan and S. Pumrin, "Performance Analysis on Multi-Frame Image Super-Resolution via Sparse Representation," in 2014 International Electrical Engineering Congress (iEECON), Chonburi, Thailand, 2014.

[18] S. Yu-bao, W. Zhi-hui, X. Liang, Z. Zhen-rong, and L. Zhan-qiang, "Multimorphology sparsity regularized image super-resolution," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898–2903, 2010.

[19] S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016–4028, 2012.

[20] Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148–159, 2013.

[21] L. Qiu-sheng and Z. Wei, "Image super-resolution algorithms based on sparse representation of classified image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920–925, 2012.

[22] H. Zhang, T. Xu, H. Li, et al., "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.

[23] H. Zhang, T. Xu, H. Li, et al., "StackGAN++: realistic image synthesis with stacked generative adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1947–1962, 2018.

[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.

[25] J. Zhu, G. Yang, P. Ferreira, et al., "A ROI Focused Multi-Scale Super-Resolution Method for the Diffusion Tensor Cardiac Magnetic Resonance," in Proceedings of the 27th Annual Meeting (ISMRM), pp. 11–16, Montréal, QC, Canada, 2019.

[26] J. Zhu, G. Yang, and P. Lio, "Lesion Focused Super Resolution," in Medical Imaging: Image Processing, San Diego, CA, USA, 2019.

[27] J. Zhu, G. Yang, and P. Lio, "How Can We Make GAN Perform Better in Single Medical Image Super-Resolution? A Lesion Focused Multi-Scale Approach," in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 2019.

[28] M. Seitzer, G. Yang, J. Schlemper, et al., "Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer, 2018.

[29] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, PMLR, 2015.

[30] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, no. 2, 2012.

[31] A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015, https://arxiv.org/abs/1511.06434.

[32] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proceedings of the 30th International Conference on Machine Learning, p. 3, Atlanta, Georgia, USA, 2013.

[33] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," 2014, https://arxiv.org/abs/1409.1556.

[34] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in Computer Vision – ECCV 2016, Springer, 2016.

[35] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," 2014, https://arxiv.org/abs/1412.6980.

[36] M. Deshmukh and U. Bhosale, "Image fusion and image quality assessment of fused images," International Journal of Image Processing (IJIP), vol. 4, no. 5, p. 484, 2010.

[37] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.

[38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[39] Y. Han, Y. Cai, Y. Cao, and X. Xu, "A new image fusion performance metric based on visual information fidelity," Information Fusion, vol. 14, no. 2, pp. 127–135, 2013.

[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.
