

Received September 4, 2019, accepted September 18, 2019, date of publication September 30, 2019, date of current version October 11, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2944765

Computationally Efficient Light Field Image Compression Using a Multiview HEVC Framework
WAQAS AHMAD1, MUBEEN GHAFOOR2, SYED ALI TARIQ2, ALI HASSAN2, MÅRTEN SJÖSTRÖM1, AND ROGER OLSSON1
1Department of Information Systems and Technology, Mid Sweden University, 851 70 Sundsvall, Sweden
2Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan

Corresponding author: Waqas Ahmad ([email protected])

The work was supported in part by the Swedish Foundation for International Cooperation in Research, and in part by the European Union Horizon 2020 Research and Innovation Program, European Training Network on Full Parallax Imaging, through the Marie Sklodowska-Curie Grant under Agreement 676401.

ABSTRACT The acquisition of the spatial and angular information of a scene using light field (LF) technologies supplements a wide range of post-processing applications, such as scene reconstruction, refocusing, virtual view synthesis, and so forth. The additional angular information possessed by LF data increases the size of the overall data captured while offering the same spatial resolution. The main contributor to the size of captured data (i.e., angular information) contains a high correlation that is exploited by state-of-the-art video encoders by treating the LF as a pseudo video sequence (PVS). The interpretation of LF as a single PVS restricts the encoding scheme to only utilize a single-dimensional angular correlation present in the LF data. In this paper, we present an LF compression framework that efficiently exploits the spatial and angular correlation using a multiview extension of high-efficiency video coding (MV-HEVC). The input LF views are converted into multiple PVSs and are organized hierarchically. The rate-allocation scheme takes into account the assigned organization of frames and distributes quality/bits among them accordingly. Subsequently, the reference picture selection scheme prioritizes the reference frames based on the assigned quality. The proposed compression scheme is evaluated by following the common test conditions set by JPEG Pleno. The proposed scheme performs 0.75 dB better compared to state-of-the-art compression schemes and 2.5 dB better compared to the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion-search scheme is proposed in the framework that reduces the computational complexity (in terms of the sum of absolute difference [SAD] computations) of motion estimation by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).

INDEX TERMS Compression, light field, MV-HEVC, plenoptic.

I. INTRODUCTION
In the recent past, capturing the spatial and angular information of a scene [1], [2] has influenced numerous research problems, namely reconstructing 3D scenes [3], [4], processing [5]–[8], rendering novel views [9], [10], and refocusing [11], [12]. G. Lippmann [13] introduced the concept of recording spatial and angular information in 1908. In 1991, E. H. Adelson [14] characterized the light information of a scene in observable space by using a seven-dimensional plenoptic function that represents each ray by considering its

The associate editor coordinating the review of this manuscript and approving it for publication was Gangyi Jiang.

direction, spatial position, time, and wavelength information. In 1996, M. Levoy [15] proposed to parameterize each light ray in a scene by recording its intersection with two parallel planes using four parameters and referred to it as a light field (LF).

With the availability of computing resources, initially, multiple camera systems (MCSs) were used to capture the LF information of the scene [15]. Each camera records a single perspective of the scene, and hence the number of cameras defines the angular resolution of the captured LF. The physical limitations of the capturing setup result in sparse LF sampling. The advances in optical instruments have led to the development of plenoptic cameras [16] that capture

143002 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ VOLUME 7, 2019

W. Ahmad et al.: Computationally Efficient LF Image Compression Using a Multiview HEVC Framework

the spatial and angular information of a scene onto a single image referred to as a plenoptic image. The multiplexing of spatial and angular information is achieved by placing a lenslet array between the main lens and the image sensor, which enables the dense LF sampling. The optical configuration of the lenslet array relative to the main lens and image sensor results in two different types of plenoptic cameras (i.e., a conventional plenoptic camera and a focused plenoptic camera). The conventional plenoptic camera was initially introduced for the consumer market [16], and later on, a focused plenoptic camera targeting commercial applications was also proposed [17].

The scene information, whether captured by an MCS or by a plenoptic camera, requires high storage, processing, and transmission resources. The increase in the size of the data captured is a direct consequence of the increase in sensor resolution for plenoptic cameras and the number of cameras in MCSs. The Lytro Illum [16] and Raytrix R29 plenoptic cameras capture scene information by consuming 50 MB and 30 MB of storage space for a single plenoptic image, respectively. On the contrary, the MCS-based LF acquisition systems, namely the Fraunhofer High-Density Camera Array (HDCA) [1] and the Stanford HDCA [18], require 20 GB and 400 MB of storage, respectively, for a single capture. The recent initiative to develop a standard framework for representing and signaling LF modalities, also referred to as JPEG Pleno [19], underscores the requirement of efficient compression solutions that address the correlation present in LF data.

Standard image and video encoders can be used to compress the LF data with limited compression efficiency, since these encoders are built on the assumptions of image and video signals. The image and video signals contain spatial and temporal correlations that are efficiently exploited by intra-prediction and inter-prediction tools available in high-efficiency video coding (HEVC). As LF data possess spatial and angular correlations, they require additional improvements in image- or video-compression schemes. Initially, the HEVC intra-prediction scheme was modified for plenoptic image compression [20]–[24] by enabling each micro-lens image to make predictions from already encoded micro-lens images. The encoding scheme exploits the limited correlations in the plenoptic image that are mainly defined by the search window around the current micro-lens image. An alternative scheme uses a sub-aperture image (SAI) representation [25] of the plenoptic image. The SAIs are transferred as a pseudo video sequence (PVS) to an HEVC video-coding scheme that only exploits a 1D inter-view correlation. The same methodology was also applied to LF data captured using an MCS by interpreting each view as one frame of multiple PVSs. To exploit the 2D inter-view correlation present in the LF data, the multiview extension of HEVC (MV-HEVC) was proposed [26].
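The single-PVS interpretation of [25] presupposes some linearization of the 2D SAI grid into one frame sequence. As a minimal illustrative sketch (the specific scan order is an assumption here, since several orderings have been studied), a serpentine scan keeps consecutive PVS frames adjacent in the view grid:

```python
def serpentine_order(rows, cols):
    """Linearize a rows x cols SAI grid into a single PVS frame order.

    Rows are scanned alternately left-to-right and right-to-left, so
    each frame neighbors the previous one in the view grid, which is
    what lets a 1D video encoder exploit inter-view correlation.
    """
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order
```

Whatever the scan, any such 1D ordering exploits only one angular dimension at a time, which is precisely the limitation that motivates the MV-HEVC-based interpretation as multiple PVSs.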

The MV-HEVC scheme was proposed with the assumption of a multiple video camera system [27] as an input, which exploits the temporal and inter-view correlation.

Due to the nature of the MV-HEVC input, various modules of HEVC were directly incorporated into MV-HEVC. The single-layer HEVC prediction structures are used to define the group of pictures (GOP) of each video layer of the MCS. The rate allocation of HEVC is directly applied to individual video layers of the MCS. In MV-HEVC, the inter-prediction is performed by using temporal motion vectors (TMVs) and disparity motion vectors (DMVs). Due to the high temporal correlations in multiple video camera systems, MV-HEVC uses an HEVC-based motion estimation (ME) strategy to compute TMVs and DMVs.

A. MOTIVATION AND CONTRIBUTIONS
In this paper, we present a framework for the compression of LF data by proposing modifications to MV-HEVC. The proposed compression framework introduces a hierarchical organization of the input LF views that assigns a specific level to each frame based on its location in the 2D grid. The rate-allocation scheme uses the hierarchical view organization and distributes better quality to a sparse set of views relative to the remaining views. A 2D prediction structure is devised that utilizes better-quality frames as references to improve the overall compression efficiency. An optimized ME method is proposed that relies on a rectified LF assumption and reduces the computational complexity. The contributions of this paper are as follows.
• The paper presents a detailed description of our initial LF compression proposal [26].

• The experimentation is performed by following the JPEG Pleno common test conditions [28].

• The variation of quality among views is defined as a free parameter in the proposed coding scheme that can be adapted according to the intended application.

• A ranking based on the quality and distance of each predictor frame relative to the current frame is used to select the best references. The maximum number of references is defined as a free parameter in the proposed coding scheme to address different memory constraints.

• The proposed compression scheme exploits the rectified LF assumption and restricts the 2D motion-search space to a single dimension. Moreover, a dynamic motion-search range estimation scheme is also proposed to further reduce the computational complexity.

• The proposed compression scheme presents a generalized solution for plenoptic images and MCSs.
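To make the rectified-LF restriction above concrete, the sketch below implements a horizontal-only full SAD search. It is an illustrative stand-in under stated assumptions, not the encoder's actual ME module; the block layout and range handling are ours.

```python
import numpy as np

def sad_1d_search(cur_block, ref_frame, x, y, search_range):
    """Full-search block matching restricted to the horizontal axis.

    Under the rectified-LF assumption, inter-view displacement is
    (approximately) one-dimensional, so the 2D search window collapses
    to a single row of candidates, sharply cutting SAD computations.
    """
    h, w = cur_block.shape
    best_sad, best_dx = None, 0
    for dx in range(-search_range, search_range + 1):
        xs = x + dx
        if xs < 0 or xs + w > ref_frame.shape[1]:
            continue  # candidate window falls outside the reference frame
        cand = ref_frame[y:y + h, xs:xs + w]
        sad = int(np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum())
        if best_sad is None or sad < best_sad:
            best_sad, best_dx = sad, dx
    return best_dx, best_sad
```

A 2D full search over a (2R+1) x (2R+1) window evaluates (2R+1)^2 candidates; the 1D restriction evaluates only 2R+1, which is consistent in spirit with the SAD-count reductions reported in the paper.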

The proposed compression scheme performs, on average, 0.75 dB better than the state-of-the-art plenoptic image scheme [29] and 2.5 dB better than the JPEG Pleno reference anchor. Similarly, in the case of MCSs, the proposed scheme performs 2.3 dB better than the JPEG Pleno reference anchor. On average, the proposed motion optimization reduces the computational complexity, in terms of the number of times the sum of absolute differences (SAD) is computed, by 87% with respect to the reference MV-HEVC. The presented scheme is based on a previously developed MV-HEVC, and it will benefit from further improvements to

the reference MV-HEVC. Numerous methods [30]–[32] use a sparse set of views to reconstruct the input LF views, and such methods rely on video-coding tools to compress the sparse set of views. In addition, MV-HEVC enables each view to exploit 2D inter-view correlations, and hence it can serve as a test bench to develop novel prediction tools for LF data.

The remainder of this paper is organized as follows: Section II presents a detailed account of state-of-the-art LF compression schemes; Section III describes the proposed compression framework (i.e., the hierarchical organization of views, reference picture selection, rate allocation, and motion-search optimization); Section IV presents the test conditions and experimental results; and Section V presents the conclusion of the paper.

II. LITERATURE REVIEW
In this section, we present a literature review of compression schemes for LF data captured by conventional plenoptic cameras and MCSs. The presented schemes can be classified with different labels, including disparity-based methods, PVS-based methods, deep-learning-based methods, view-synthesis-based methods, and so forth. To better explain the state-of-the-art schemes in the context of this paper, we categorize the prior schemes into two groups: plenoptic image compression and MCS compression. The plenoptic image compression schemes are further divided into two subgroups: lenslet coding (schemes directly coding plenoptic images) and SAI coding (schemes converting plenoptic images into SAIs prior to coding).

A. PLENOPTIC IMAGE COMPRESSION
The plenoptic contents from conventional plenoptic cameras have received significant attention from the research community, mainly due to the availability of benchmark datasets [18], [33] and their usage in various competitions [34], [35]. Initial solutions proposed to directly compress the plenoptic image; later on, its conversion into SAIs [12] was also used to exploit the state-of-the-art video-coding tools for LF data.

1) LENSLET COMPRESSION
Li et al. [20] have extended their previously proposed method [36] to compress conventional plenoptic images. A block-based bi-prediction capability is used in the HEVC intra-coding scheme to better exploit the correlation present in neighboring micro-lens images. Related to similar block-matching strategies, Conti et al. [21] added a self-similarity (SS) operator, and Monteiro et al. [22] proposed integrating SS and local linear embedding (LLE) operators in the HEVC intra-coding scheme. Monteiro et al. [23] further proposed a high-order prediction mode in the HEVC intra-coding structure. The prediction scheme exploits the non-local spatial redundancy present in plenoptic images using block matching with up to eight degrees of freedom. Zhong et al. [24] have proposed two prediction tools in the HEVC intra-coding scheme. The first prediction tool modifies the directional modes of HEVC intra-coding by updating the prediction samples with the pixel values of already encoded neighboring micro-lens images. The second prediction tool enables a block-matching scheme to predict the current micro-lens image by estimating a linear combination of already encoded neighboring micro-lens images. Chao et al. [37] proposed a graph lifting transform for the coding of plenoptic images prior to pre-processing and demosaicing. Conti et al. [38] proposed a scalable coding framework that provides field-of-view scalability with regions-of-interest (ROI) support. The input plenoptic image with a limited angular resolution is compressed in the base layer of scalable HEVC (SHVC), and additional angular information of selected ROIs is progressively encoded in the enhancement layers. Perra et al. [39] partitioned the plenoptic image into equally sized tiles and compressed the tiles as a PVS using HEVC video-coding tools.

2) SUB-APERTURE IMAGE COMPRESSION
Instead of providing the video-coding scheme with a non-natural image, following the initial idea to provide integral images as PVSs to a video encoder [40], Liu et al. [25] converted plenoptic images into SAIs and compressed them as a single PVS using HEVC video-coding tools. Jia et al. [41] introduced an optimized ordering of SAIs that takes into account the correlation present in the SAIs in order to improve the compression efficiency of the coding scheme. Zhao et al. [42] divided the input SAIs into two groups. The first group of SAIs is compressed as a PVS using HEVC, and the second group of SAIs is estimated by taking the weighted average from the neighboring SAIs of the first group. Li et al. [43] presented a 2D hierarchical coding structure that partitions SAIs into four quadrants and restricts the prediction structure within each quadrant to address the reference picture buffer management in HEVC. Ahmad et al. [26] demonstrated the application of MV-HEVC for LF data by interpreting the SAIs as multiple PVSs prior to compression. In this way, SAIs are allowed to utilize the 2D inter-view correlation using the temporal and inter-view prediction tools of MV-HEVC. Tabus et al. [44] segmented the central SAI into regions and estimated the displacement of each region in the side SAIs relative to the central SAI. The researchers proposed a prediction scheme that uses the segmentation information of the central SAI, the region displacement of the side SAIs, a JPEG2000-decoded version of selected SAIs, and a JPEG2000-decoded version of the lenslet image to reconstruct the SAIs. Tran et al. [45] proposed motion-compensated wavelet decomposition and estimated the disparity map using a variational optimization scheme. An invertible motion-compensated wavelet decomposition scheme is applied to effectively remove redundant information across multiple SAIs. JPEG2000 is used to encode a sub-band of SAIs and the estimated disparity map. Chen et al. [46] introduced a disparity-guided sparse coding scheme for plenoptic image compression. A set of sparse SAIs is used to train the sparse dictionary that is later used to reconstruct

the non-selected SAIs. Bakir et al. [32] have compressed a sparse set of SAIs as a PVS using HEVC and reconstructed the remaining SAIs using a convolutional neural network (CNN)-based view synthesis scheme.

B. MULTIPLE CAMERA SYSTEM COMPRESSION
Hawary et al. [31] exploited the sparsity present in the angular domain of LF using a sparse Fourier-transform-based method. A set of views at pre-defined positions is compressed in the base layer of SHVC, and the corresponding decoded set is used to reconstruct the remaining views. The remaining views are compressed in SHVC enhancement layers by taking the prediction from previously reconstructed views. Ahmad et al. [47] extended their previous method [26], interpreted views of MCSs as frames of multiple PVSs, and compressed them using MV-HEVC to demonstrate the applicability of the scheme for a sparsely sampled LF. Ahmad et al. [30] presented a shearlet-transform-based prediction tool for compression of MCS data. A sparse set of views is encoded as a PVS using HEVC, and at the decoder side, missing views are reconstructed using the shearlet-transform-based prediction (STBP) scheme. Xian et al. [48] proposed a homography-based LF compression solution. The homographies between the side views/SAIs and the central view/SAI are estimated by minimizing the error in a low-rank approximation model. The scheme is further refined by using the specific homography corresponding to each depth layer in the scene. Senoh et al. [49] proposed a depth-based compression solution for MCSs and plenoptic images. A subset of views/SAIs is encoded using MV-HEVC along with depth maps, and on the decoder side, view synthesis is performed to estimate the non-selected views/SAIs. Carvalho et al. [50] have created a 4D discrete cosine transform (DCT)-based compression scheme for MCSs and plenoptic images. Views/SAIs are divided into 4D blocks, and the DCT is applied on each block. The generated coefficients are grouped using a hexadeca-tree structure on a bit-plane-by-bit-plane basis, and the generated stream is encoded using an adaptive arithmetic coder.

C. SUMMARY
The following key findings can be concluded regarding state-of-the-art LF compression schemes:
• Plenoptic image compression has received significant attention from the research community compared to MCS compression, mainly due to the availability of benchmark datasets and plenoptic image compression grand challenges [34], [35].

• Plenoptic images captured with Lytro cameras are mainly used in plenoptic image compression grand challenges. The plenoptic image compression is performed either directly on raw lenslet images or on sub-aperture images.

• Researchers have introduced LF reconstruction schemes for LF compression that use video-coding tools for the encoding of sparse LF views [30], [44], [46].

• Special compression requirements have also been addressed in numerous schemes, including ROI-based coding [38], reference picture management [43], and scalable coding [31], [38], [51].

At present, it is hard to avoid video-compression tools for LF data compression, since the compression efficiency of video data has significantly improved with the last two decades of research efforts. The interpretation of LF views as a PVS enables video-coding schemes to exploit the correlation present in LF data [25], [26], [41]. Moreover, LF reconstruction methods are drawing attention in the LF compression field [30]–[32]. These methods reconstruct the input LF views using a sparse set of views that are initially compressed using video-coding tools. The enhancement and customization of video-coding schemes to improve the compression efficiency of LF data can significantly influence the LF compression process.

III. PROPOSED COMPRESSION SCHEME
The block diagram of the proposed framework is displayed in Fig. 1 and consists of LF acquisition, representation, and compression. The proposed compression framework is applicable for both dense and sparse LF data, captured using plenoptic cameras and MCSs, respectively. The LF data acquired using plenoptic cameras are converted into SAIs and then into multiple PVSs (MPVS), whereas the LF views acquired from MCSs are directly converted into MPVS. In the rest of this paper, SAIs from plenoptic cameras and views from MCSs are referred to as "LF views". Fig. 1 illustrates that, after LF acquisition and its representation as MPVS, it is compressed using MV-HEVC. The compression blocks highlighted in gray represent the modules that have been either added or modified from the reference MV-HEVC modules, whereas the non-highlighted (white) blocks represent reference MV-HEVC modules that are not modified in the proposed framework. The compression scheme initially organizes the LF views in hierarchical order, followed by the rate-allocation scheme that assigns the quality to individual LF views in terms of quantization levels. The reference frame management scheme arranges the indexes of frames according to the hierarchical organization used for the prediction of other frames in the LF views. Finally, an optimized motion search is performed by taking advantage of the temporal and inter-view dependencies of LF views to reduce the computational complexity. The following subsections discuss the compression modules in detail.
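As a toy illustration of the rate-allocation idea, a hierarchical level can be mapped to a quantization parameter (QP). The exact offsets are not specified here, so the values below are assumptions; the paper treats the quality variation among views as a free parameter.

```python
def qp_for_level(base_qp, level, delta_qp=2):
    """Map a hierarchical level to an HEVC quantization parameter.

    Level 1 (the central view) gets the base QP, i.e., the best
    quality; each deeper level is quantized delta_qp steps coarser.
    The result is clamped to 51, the maximum QP allowed in HEVC.
    """
    return min(base_qp + (level - 1) * delta_qp, 51)
```

Higher-level (better-quality) views then serve as references for lower-level ones, so the extra bits spent on them pay off across many predicted views.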

A. HIERARCHICAL ORGANIZATION OF LIGHT FIELD VIEWS
The proposed scheme divides the input LF views into hierarchical levels, and later the assigned levels are used by the reference picture management and rate-allocation schemes to improve the compression efficiency. The central LF view (referred to as central-LF) is considered the most essential view, and it is placed in the first hierarchical level. A uniformly sub-sampled sparse representation of the input LF views

FIGURE 1. Block diagram of the overall proposed framework. The input from plenoptic cameras or MCSs is converted into MPVS and transferred as input to the MV-HEVC-based LF coding framework. A hierarchical organization of frames is introduced in MV-HEVC, and modifications are proposed in the reference picture management, rate-allocation, and ME schemes.

FIGURE 2. Hierarchical organization for horizontal-parallax-only LF data (17 views). The central view at index 8 is of high importance, and it is placed in the first level. Views with indexes {0,4,12,16} are placed in the second level, views with indexes {2,6,10,14} are placed in the third level, and the remaining views are placed in the last level. The same hierarchical organization is applied to 2D LF views.

is interpreted as the next important piece of information of the scene and is placed in the second hierarchical level. Without relying on any prior information, it is assumed that the uniformly distributed sparse views contain most of the scene information. Similarly, the remaining LF views are placed in subsequent hierarchical levels. Fig. 2 contains an example of the hierarchical organization for Stanford LF data in a single dimension. The central view at index eight {8} is placed in the highest hierarchical level. Views with indexes {0,4,12,16} are placed in the second level, views with indexes {2,6,10,14} are placed in the third level, and finally, the remaining views

are placed in the last level. The views placed in higher levels are assigned the best available quality and are used for the prediction of same- and lower-level views. The lowest-level views are not used for the prediction of any other view. The presented hierarchical organization scheme is extended to 2D LF views.
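The 17-view example of Fig. 2 can be reproduced with a short sketch. `assign_levels` is a hypothetical helper written for illustration, not the paper's Algorithm 1 (which instead builds the HL/HR lists); it starts from the central view and repeatedly halves the sampling step.

```python
def assign_levels(M):
    """Assign hierarchical levels to M views along one dimension (M >= 5).

    Level 1 holds the central view; level 2 a uniformly sub-sampled
    sparse set (step (M-1)//4, including the border views); deeper
    levels halve the step until every view has a level.
    """
    levels = {(M - 1) // 2: 1}       # central view -> level 1
    step, level = (M - 1) // 4, 2
    while step >= 1:
        for v in range(0, M, step):
            levels.setdefault(v, level)   # keep earlier (higher) levels
        step //= 2
        level += 1
    return levels
```

For 2D grids the same assignment would be applied per dimension, matching the paper's extension to M x N views.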

The hierarchical organization of a single layer with M views is presented in Algorithm 1. The central-LF view is assigned to the highest level. The algorithm then successively partitions the input LF views into two parts and assigns subsequent levels until further division is not possible. Finally,

the remaining LF views are assigned to the lowest level. The left and right sides of the hierarchical structure are stored in two separate lists (i.e., HL and HR) and are provided as an output by Algorithm 1. The same methodology is extended for 2D input LF views of size M×N.

Algorithm 1 Hierarchical Organization of Input LF Views
Input: total number of views M

1: HL(1,1) = ⌊(M−1)/2⌋
2: HR(1,1) = ⌊(M−1)/2⌋
3: C = {0, L0, M−1}
4: i = 1
5: while ⌊(M−1)/2^(i+1)⌋ ≥ 2 do
6:   for k = 1 : numel(C)−1 do
7:     F = C(k) + ⌊(M−1)/2^(i+1)⌋
8:     if F ≤ ⌊(M−1)/2⌋ then
9:       HL(i,1) = F
10:    else
11:      HR(i,1) = F
12:  C = updatelist(C, Li)
13:  i = i + 1
14: for m = 1 : M do
15:   Lm = m, ∀ m ∉ C
16: HL(1, numel(HL(2,:)) + 1) = 0
17: HR(1, numel(HR(2,:)) + 1) = M − 1
18: Output: HL, HR

B. REFERENCE PICTURE SELECTION/MANAGEMENT
The reference picture list maintains the indexes of frames used for the prediction of other frames in the video data. In video coding, various prediction structures [52]–[55] are proposed by taking into account the properties of image and video signals. The reference MV-HEVC uses the same video signal assumptions for the management of reference picture lists. However, in our proposed compression scheme, we have considered the following important factors for LF data.

1) Block-Level Analysis: In the presence of occlusion, it will be beneficial to select a reference view that is further away from the current view, since nearby views will most likely contain the same occlusion.

2) Frame-Level Analysis: In video data, the residual between consecutive frames is mainly incurred by the motion of objects. However, the residual between consecutive views in LF data is mainly incurred by perspective changes and occlusion due to objects in the scene. Generally, the correlation of the current frame/view decreases as predictions are taken from further away frames/views. However, this behavior is more pronounced in the case of LF data compared to video data. Therefore, encoding efficiency increases if views are encoded from references selected in the close vicinity of the current view.

Algorithm 2 Reference Picture Selection
Input: HL, HR, PMAX

1: RL(1, 1) = NULL            ▷ base frame is intra coded
2: DL(1, 1) = HL(1, 1)        ▷ base frame is encoded first
3: ▷ calculate reference pictures for the left side
4: k = 2
5: for s = 2 : size(HL, 1) do
6:   for t = 1 : size(HL, 2) do
7:     if HL(s, t) ≠ NULL then
8:       DL(1, k) = HL(s, t)
9:       for u = 1 : k − 1 do
10:        if u ≤ PMAX then
11:          RL(k, u) = DL(1, u)
12:        end
13:      end
14:      k = k + 1
15:    end
16:  end
17: end
18: ▷ calculate reference pictures for the right side
19: k = 2
20: for s = 2 : size(HR, 1) do
21:   for t = 1 : size(HR, 2) do
22:     if HR(s, t) ≠ NULL then
23:       DR(1, k) = HR(s, t)
24:       for u = 1 : k − 1 do
25:         if u ≤ PMAX then
26:           RR(k, u) = DR(1, u)
27:         end
28:       end
29:       k = k + 1
30:     end
31:   end
32: end
33: R = [RL; RR]
34: DO = [DL DR]
35: end
36: Output: DO, R

3) GOP Level Analysis: In LF compression, it has been demonstrated [25], [26] that providing better quality to predictor frames improves the compression efficiency. The variable quality of the predictor frame will require a joint analysis of the impact of the distance and quality of the predictor frame on the encoding of the current frame.

Therefore, it is proposed to select a combination of reference views that keeps views near the current view (to exploit the high correlation) as well as views that are relatively farther away from the current view (to handle the occluded regions). However, increasing the number of reference frames improves the compression efficiency at the cost of additional memory and computational complexity in the encoding scheme. Experimentation is performed to

VOLUME 7, 2019 143007


W. Ahmad et al.: Computationally Efficient LF Image Compression Using a Multiview HEVC Framework

quantify the impact of the number of reference frames on overall compression efficiency, and the results are reported in Section IV-D.

Algorithm 2 explains the proposed reference picture selection scheme for a 1D case. The proposed scheme takes the maximum number of predictor frames (PMAX) as an input parameter along with the left (HL) and right (HR) lists from Algorithm 1. References for each list are maintained independently, and both lists only share the central view. The reference list index for the first frame is assigned a null value, since it will be encoded in intra mode. In lines 5–17, the algorithm iterates over all the remaining views and estimates the reference list for each view. Similarly, the reference list for views placed on the right-side list (HR) is estimated in lines 18–32. The algorithm also sets the decoding order (DO) of the input LF views.
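The reference-list construction for one side can be sketched as follows. This is a simplified Python rendering of Algorithm 2 for a single side list (the function name `reference_lists` is ours); each view references at most PMAX of the earliest-decoded views in its list:

```python
def reference_lists(H, PMAX):
    """Build reference lists for one side list H (views in decoding
    order, central view first). The first view is intra coded and gets
    no references; every later view references at most PMAX of the
    earliest-decoded views, mirroring lines 5-17 of Algorithm 2."""
    refs = {H[0]: []}          # base view: intra coded, no references
    decoded = [H[0]]           # decoding order DO
    for view in H[1:]:
        refs[view] = decoded[:PMAX]  # at most PMAX earliest-decoded views
        decoded.append(view)
    return refs, decoded
```

With the left list [4, 2, 1, 3, 0] and PMAX = 2, view 2 references [4] and all later views reference [4, 2], i.e., the highest hierarchical levels serve as predictors.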

C. RATE-ALLOCATION SCHEME
The rate-allocation scheme [47] makes use of the hierarchical structure of the input LF views to assign a specific quantization parameter (QP) to each view. The views placed in higher levels are assigned better quality compared to the remaining views placed in the subsequent levels. The rate-allocation scheme tends to keep better quality, in terms of QP, for views that are used for the prediction of other views; the greater the dependency on a view, the lower its QP will be. The views that are not used in the prediction of any other views are assigned relatively higher QP values. The scheme uses MV-HEVC parameters, including picture order count (POC), view ID (VID), decoding order (DO), and view order index (VOI), to estimate the QP for each view. The parameters POC and VID represent the location of each view in the input LF grid, while the parameters DO and VOI represent the decoding order of each view in the horizontal and vertical axes.

Algorithm 3 presents the rate-allocation scheme used in the proposed compression scheme. The algorithm takes the following parameters as input: the number of views in both dimensions (M × N), the POC and VID of the central frame (cPOC, cVID), the POC array (APOC), DO array (ADO), VID array (AVID), VOI array (AVOI), and the prediction level arrays (PPOC and PVID). Views belonging to the central column or row are assigned a QP offset equal to the maximum of their prediction levels. For each remaining view, the QP offset is estimated by calculating the frame distance and decoding distance between the current frame and the base frame. The proposed rate-allocation scheme limits the extent of the QP offset by normalizing the calculated QP offset with the parameter QMAX. The normalized QP offset of each frame is added to the QP of the central frame (Qc) to determine the QP of each frame, as explained in equation (1).

QP(m, n) = Qc + (Qo(m, n) / max(Qo)) × QMAX,   m = 1, …, M, n = 1, …, N.   (1)

Algorithm 3 Rate-Allocation Scheme
Input: M, N, cPOC, cVID, APOC, ADO, AVID, AVOI, PPOC, PVID

1: for x = 1 : M do
2:   for y = 1 : N do
3:     if x == cPOC ‖ y == cVID then
4:       Qo(x, y) = max(PPOC(x), PVID(y))
5:     else
6:       W = Weightage(sPOC(x), tVID(y))
7:       dPOC = ⌊|APOC(x) − cPOC| / W⌋
8:       dVID = ⌊|AVID(y) − cVID| / W⌋
9:       dDO = remainder(ADO(x), cPOC)
10:      dVOI = remainder(AVOI(y), cVID)
11:      Qo(x, y) = dPOC + dVID + dDO + dVOI
12:    end
13:  end
14: end
15: Output: Qo
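The final QP mapping of equation (1) can be sketched as follows. The function `assign_qp` is our illustrative naming: it takes a grid of raw offsets Qo (as produced by Algorithm 3), normalizes each by the largest offset, scales by QMAX, and adds the central-view QP Qc:

```python
def assign_qp(Qo, Qc, Qmax):
    """Apply equation (1): QP(m, n) = Qc + Qo(m, n) / max(Qo) * Qmax,
    rounded to the nearest integer QP. Illustrative sketch only."""
    peak = max(max(row) for row in Qo)
    if peak == 0:                        # all offsets zero: every view gets Qc
        return [[Qc for _ in row] for row in Qo]
    return [[Qc + round(q / peak * Qmax) for q in row] for row in Qo]
```

For offsets [[0, 2], [4, 2]] with Qc = 30 and Qmax = 6, the assigned QPs are [[30, 33], [36, 33]]: the most-referenced view keeps QP 30 while leaf views are quantized more coarsely, and no view deviates from Qc by more than Qmax.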

D. MOTION-SEARCH OPTIMIZATION
The reference MV-HEVC was initially proposed for MCSs that contain temporal and inter-view correlations. Since MCS data are mainly dominated by temporal correlations, MV-HEVC adopts a conventional 2D ME scheme for finding the best possible match of the current block, namely a motion vector (MV). MV-HEVC mainly relies on the test zone search (TZS) [56] ME scheme, which consists of three stages: 1) initial position selection, 2) coarse search, and 3) refined search. In the first stage, the initial center position of the search window is selected by choosing the MV information of the neighboring coding units (CUs). In the second stage, a coarse diamond-pattern search is performed around the initial center position to find the MV that yields the minimum SAD. In cases where the difference between the obtained MV and the initial position is higher than a specified threshold, an additional raster search is performed to obtain a closer estimate of the MV. In the final stage, refinement is performed by moving the center position to the MV estimated in the second stage, and a diamond-pattern search is performed again to obtain the refined MV. This diamond-pattern-based 2D motion search in MV-HEVC is computationally expensive and contributes significantly to the overall encoding time [56].

In LF data, the motion can be modeled by the disparity between the views. Disparity depends on the position of objects in the scene, and since each camera/view is essentially looking at the same scene, the views contain similar information, which can be exploited to optimize the ME process. In the proposed framework, we have exploited the rectified LF assumption to optimize the ME process in two ways: (i) we restricted the motion search to either the horizontal or the vertical direction, depending on the orientation of the reference and current frame, and (ii) the range of the motion search is dynamically adapted based on the disparity between


cameras/views and the maximum motion encountered in the central LF column.

In the proposed LF compression framework, the central column of the 2D input LF is encoded first. The maximum MV between the current and reference frames is estimated and normalized using the distance between the current and reference frames. For the remaining columns of the LF, whenever inter-view prediction is performed, the motion search is limited to the vertical direction only, with a search range equal to the maximum motion found in the central column. Similarly, for temporal prediction, the motion search is restricted to the horizontal direction only. The maximum horizontal search range is defined based on the maximum vertical search range and the horizontal and vertical distances between adjacent cameras. The motion-search complexity is significantly reduced from 2D to 1D with a negligible loss in rate-distortion (RD) performance.

The pseudo algorithm of the proposed optimization in the ME module is presented in Algorithm 4. The algorithm takes the parameters POCi, POCref, CPOC, nPOC, nView, Viewj, Viewref, stepX, and stepY as input. POCi and POCref represent the POC numbers of the current frame being encoded and the reference frame, respectively. Moreover, CPOC represents the POC number of the central LF column, nPOC represents the number of frames in each view, and nView represents the number of views; Viewj and Viewref represent the VIDs of the current and the reference frames, respectively. The parameters stepX and stepY indicate the horizontal and vertical distances between adjacent cameras.

In the proposed scheme, the first frame belongs to the central LF column (CPOC). The maximum motion between the current (Viewj) and the reference (Viewref) views is found using the ‘‘FindMaxVerticalMV’’ function, which utilizes the reference ME process. The maximum motion vector, V, is normalized by the distance between the reference and the current view. If the normalized MV is larger than the previous maximum vertical motion, Vmax, then Vmax is updated. Subsequently, the maximum horizontal MV, Hmax, is scaled by multiplying Vmax with the ratio of the horizontal and vertical distances between the adjacent views, stepX and stepY, respectively. The process is repeated for all the frames in the central LF column, and after encoding the central LF column, the estimated motion-search ranges Vmax and Hmax are fixed for the rest of the LF. For non-central LF frames, optimized motion estimation is performed. If the VIDs of the current and reference frames (Viewj, Viewref) are the same, then a horizontal motion search is performed with a range limited to the maximum horizontal motion given by Hmax. Similarly, if the VIDs of the reference and current frames do not match, then vertical motion estimation is performed with a search range limited to the maximum vertical motion, Vmax.

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The experiments were performed to analyze the following aspects of the proposed compression framework:

Algorithm 4 Optimized Motion Estimation
Input: POCi, POCref, CPOC, nPOC, nView, Viewi, Viewref, stepX, stepY

1: Vmax = 0                       ▷ initialized once
2: for i = 1 : nPOC do
3:   for j = 1 : nView do
4:     if POCi == CPOC then       ▷ perform reference motion estimation with the default search range, i.e., (x, y) = (±64, ±64)
5:       V = FindMaxVerticalMV(Viewj, Viewref)
6:       Vnorm = V / abs(Viewj − Viewref)
7:       if Vnorm > Vmax then
8:         Vmax = Vnorm
9:         Hmax = (stepX / stepY) × Vmax
10:      end
11:    else                       ▷ perform optimized motion search
12:      if Viewj == Viewref then ▷ same layer (temporal prediction)
13:        Hscaled = abs(POCi − POCref) × Hmax
14:        SetMvSrchRng(±Hscaled, 0)
15:      else                     ▷ same POC (inter-view prediction)
16:        Vscaled = abs(Viewj − Viewref) × Vmax
17:        SetMvSrchRng(0, ±Vscaled)
18:      end if
19:    end if
20:  end for
21: end for
22: Output: MvSrchRng
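The search-range restriction at the heart of Algorithm 4 can be sketched as follows; `motion_search_range` is our illustrative stand-in for the SetMvSrchRng step, returning the (horizontal, vertical) range for one prediction under the rectified LF assumption:

```python
def motion_search_range(view_cur, view_ref, poc_cur, poc_ref, Vmax,
                        stepX=1.0, stepY=1.0):
    """Return the (horizontal, vertical) motion-search range for one
    prediction, assuming a rectified LF. Vmax is the maximum normalized
    vertical motion found while encoding the central LF column; Hmax is
    derived from it via the camera-spacing ratio. Sketch only, not the
    MV-HEVC implementation."""
    Hmax = (stepX / stepY) * Vmax
    if view_cur == view_ref:
        # same view id -> temporal prediction: horizontal 1D search only
        return (abs(poc_cur - poc_ref) * Hmax, 0)
    # same POC -> inter-view prediction: vertical 1D search only
    return (0, abs(view_cur - view_ref) * Vmax)
```

For example, a temporal prediction two columns away with Vmax = 8 searches ±16 pixels horizontally and nothing vertically, instead of a full ±64 × ±64 window.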

• to select the appropriate input format for the proposed coding scheme,

• to analyze the impact of variable quality allocation on rate-distortion improvement,

• to quantify the number of reference pictures for LF data,

• to estimate the reduction in computational complexity using the proposed motion-search optimization, and

• to demonstrate the performance of the proposed compression scheme in comparison with state-of-the-art methods.

A. TEST ARRANGEMENT AND EVALUATION CRITERIA
The proposed LF compression scheme was tested on six LF images selected from three different datasets [1], [18], [57]. Table 1 displays the selected LF images, which contain M × N views in RGB format; their equivalent YUV444 format was used as the reference input signal.

Following the JPEG Pleno common test conditions [28], the weighted peak signal-to-noise ratio (PSNR), as presented in (2), was used to estimate the degradation in reconstruction quality relative to the reference input. The mean PSNR for the Y, U, and V components was evaluated following equations (3) to (5). The proposed scheme is compared with the JPEG Pleno reference scheme (x265) and a state-of-the-art graph-learning based plenoptic image compression


TABLE 1. Selected LF images from JPEG Pleno datasets.

scheme [29].

PSNRYUV = (6 × PSNRY + PSNRU + PSNRV) / 8    (2)

PSNRmean = (1 / MN) Σ_{m=1}^{M} Σ_{n=1}^{N} PSNR(m, n)    (3)

The PSNR of each view is calculated by

PSNR(i, j) = 10 log10( (1023)² / MSE(i, j) ),    (4)

and the mean square error is calculated using

MSE(i, j) = (1 / WH) Σ_{x=1}^{W} Σ_{y=1}^{H} [I(x, y) − I′(x, y)]²    (5)

where I and I′ represent the reference and decoded view, respectively, and W and H represent the width and height of the view, respectively.
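For clarity, equations (2) and (4) combine into a per-view quality measure as sketched below; the helper name `psnr_yuv` and the assumption that the per-component MSEs are already computed are ours:

```python
import math

def psnr_yuv(mse_y, mse_u, mse_v, peak=1023):
    """Weighted YUV PSNR of one decoded view, per equations (2) and (4):
    PSNR_YUV = (6*PSNR_Y + PSNR_U + PSNR_V) / 8, with peak value 1023
    for 10-bit content. Assumes per-component MSEs are precomputed."""
    def psnr(mse):
        return 10 * math.log10(peak ** 2 / mse)
    return (6 * psnr(mse_y) + psnr(mse_u) + psnr(mse_v)) / 8
```

The 6:1:1 weighting reflects the dominance of the luma component in perceived quality, as prescribed by the JPEG Pleno common test conditions.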

B. INPUT FORMAT SELECTION
The Lytro and HDCA LF images are provided in 10 bpp RGB format, and the Stanford LF image is given in 8 bpp RGB format. To find the optimum input format for the proposed encoding scheme, the ‘‘Bikes’’ LF image from the Lytro dataset was tested with three different input formats (10-bit YUV444, 8-bit YUV444, and 8-bit YUV420). Fig. 3 displays the rate-distortion curves for the three different variations of the input LF data. In all the test cases, the comparison was performed with the original input format. The results clearly indicate that the quantization of the input LF image from 10 bpp to 8 bpp and chroma sub-sampling from YUV444 to YUV420 yield better compression efficiency, with an average Bjøntegaard delta PSNR (BD-PSNR) gain of 0.45 dB. Hence, in the proposed compression scheme, the reference input is converted into the 8-bit YUV420 format prior to encoding.

C. FIXED VERSUS VARIABLE QUALITY
To analyze the impact of variable quality allocation among different views of the input LF, an experiment was performed with two different quantization schemes: i) a variable QP scheme, as explained in Section III-C, and ii) a fixed QP scheme for all the frames. The experiment was performed on

FIGURE 3. Rate-distortion comparison for three different input formats (10-bit YUV444, 8-bit YUV444, and 8-bit YUV420).

FIGURE 4. The rate-distortion analysis between the fixed quality compression scheme and the variable quality compression scheme on the ‘‘Bikes’’ LF image.

the ‘‘Bikes’’ LF image, and the RD curves presented in Fig. 4 show that, overall, better compression efficiency is achieved by assigning variable QPs compared to fixed QPs among the input LF frames. The variable QP scheme shows a BD-PSNR improvement of 0.58 dB.

The mean PSNR measure, as suggested by JPEG Pleno, does not represent the variation among the views; hence, in Fig. 4, the variance of PSNR among compressed views is also presented. In the ‘‘Bikes’’ LF image, the fixed QP scheme has an average variance of 0.53 dB for the four selected bitrates, whereas the variable QP scheme has an average variance of 1.13 dB. The variable coding of LF views organized in hierarchical groups can be beneficial for applications such as disparity estimation, view synthesis, and so forth. Moreover, the presented scheme can also benefit novel prediction tools [30] that use a sparse set of views to predict the remaining views. However, some applications (e.g., refocusing) might require consistent quality among views. Hence, in the proposed framework, the maximum variation in QP values is defined as a free parameter to better address the desired requirements.


FIGURE 5. The impact of the number of reference pictures on rate-distortion curves for Lytro, Stanford, and HDCA LF images.

FIGURE 6. The rate-distortion analysis between the reference motion-search (RMS) and the proposed LF motion-search (LMS) for the ‘‘Bikes’’ LF image.

D. REFERENCE PICTURES SELECTION
An analysis was performed to quantify the effect of the number of reference pictures on the RD improvements relative to the computational complexity. The experiment was performed on the ‘‘Bikes’’, ‘‘Set2’’, and ‘‘Tarot Cards’’ LF images taken from the three different datasets mentioned in Table 1. The improvement in the RD curves as a consequence of the increase in the number of reference pictures is illustrated in Fig. 5. A significant improvement in the RD curves can be seen when more than two references in each dimension (horizontal and vertical) are selected. However, beyond a certain number of reference pictures, the improvement in RD performance saturates, and the execution complexity increases unnecessarily. Based on the analysis, a parameter was defined that quantifies the number of reference pictures for each LF dataset. We chose a maximum of five horizontal and five vertical references for the Stanford and HDCA datasets; however, for the Lytro dataset, we decided on a maximum of three reference pictures in each dimension.

E. MOTION-SEARCH OPTIMIZATION
An analysis was performed to estimate the reduction in computational complexity using the proposed motion-search

TABLE 2. The reduction in the number of SAD computations using the proposed motion-search optimization.

optimization. The LF image ‘‘Bikes’’ from the Lytro dataset was encoded using the reference (MV-HEVC TZS) scheme and the optimized motion-search scheme, and the computational complexity was compared in terms of the reduction in the number of SAD computations during the motion-search process. Table 2 presents the motion optimization analysis, which demonstrates that the proposed motion optimization scheme results in more than an 80% reduction in SAD computations for each LF image. Fig. 6 presents an RD comparison between the reference motion search and the proposed motion-search optimization, which demonstrates that the proposed optimization results in a negligible loss in the RD curves, mainly due to the rectification process. The experiment revealed that, when providing a rectified LF as input to MV-HEVC, the search for the best candidate block can be reduced to a single dimension with minimal loss in perceived quality.
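A back-of-envelope comparison shows why restricting the search to one dimension cuts SAD counts so sharply. For an exhaustive window of radius R, a 2D search tests (2R + 1)² candidate positions while a 1D search tests only 2R + 1; TZS is not exhaustive, so the ratio below only illustrates the scaling, not the exact figures of Table 2:

```python
def sad_count_reduction(radius=64):
    """Fraction of SAD evaluations saved by a 1D search of the given
    radius relative to an exhaustive 2D search of the same radius.
    Rough illustration only; TZS prunes most 2D candidates already."""
    full_2d = (2 * radius + 1) ** 2   # (2R+1)^2 candidate positions
    line_1d = 2 * radius + 1          # 2R+1 candidate positions
    return 1 - line_1d / full_2d
```

For the default ±64 window this upper bound exceeds 99%; the measured 80–87% reductions are lower precisely because TZS already discards most 2D candidates.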

F. PROPOSED COMPRESSION SCHEME
The proposed compression scheme was evaluated on all the selected LF images, and the RD curves are presented in Fig. 7. In plenoptic image compression, the proposed compression scheme performs better than [29] and the x265-based scheme, with average BD-PSNR gains of 0.7 dB and 2.5 dB, respectively, as presented in Table 3. Similarly, in the case of MCSs, the proposed scheme is compared with the x265-based scheme, and it reveals a PSNR improvement


FIGURE 7. The rate-distortion comparison of the proposed scheme with the graph-learning scheme [29] and the JPEG Pleno anchor scheme for the selected LF images.

TABLE 3. BD-PSNR gain of the proposed scheme relative to the graph-learning scheme [29] and the JPEG Pleno anchor schemes.

of 2.4 dB for the ‘‘Tarot Cards’’ LF image and 2.2 dB for the ‘‘Set2’’ LF image. In all the RD comparisons, the proposed scheme performs consistently better across all the tested bitrates. The RD information indicates that the gains in compression efficiency for plenoptic images are higher than for MCSs. The SAIs of plenoptic images contain high angular correlation, which is a consequence of the narrow baseline; in addition, the small disparity among SAIs contributes to a lower overhead of MVs. From an alternative perspective, the proposed coding scheme relates the disparity information to the frames-per-second (FPS) analogy of video acquisition systems. A lower disparity in the captured SAIs represents a high FPS, which in turn reflects a high correlation between neighboring SAIs. On the other hand, a high disparity relates to a low FPS and a lower correlation among neighboring views.

V. CONCLUSION
In this paper, we have presented an optimized LF compression framework built on the multiview extension of high-efficiency video coding (MV-HEVC). In this process, the views/SAIs are converted into multiple pseudo video sequences (MPVSs) and are hierarchically organized into groups. Variable quality is distributed among frames based on their assigned hierarchical level. A sparse set of views belonging to the higher hierarchical levels is assigned better quality and used as reference pictures for the remaining views to improve the compression efficiency. An analysis was performed to quantify the number of reference pictures for LF data. Moreover, a motion-search optimization suitable for LF data is proposed to reduce the computational complexity (in terms of the number of SADs computed) by up to 87% with negligible loss in perceived quality. The proposed scheme was compared with the x265-based JPEG Pleno scheme and the graph-learning scheme following the common test conditions set by JPEG Pleno. The proposed scheme leads to a 2.5 dB PSNR improvement compared to the x265-based JPEG Pleno anchor scheme and is 0.75 dB better than the graph-learning scheme. In the future, we intend to investigate various aspects of compression schemes, such as rate control, random access, scalability, and so forth.

ACKNOWLEDGMENT
The software code of the proposed scheme is made available at https://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-37186.

REFERENCES
[1] M. Ziegler, R. O. H. Veld, J. Keinert, and F. Zilly, ‘‘Acquisition system for dense lightfield of large scenes,’’ in Proc. 3DTV Conf., True Vis.-Capture, Transmiss. Display 3D Video (3DTV-CON), Jun. 2017, pp. 1–4.
[2] M. Martínez-Corral, J. C. Barreiro, A. Llavador, E. Sánchez-Ortiga, J. Sola-Pikabea, G. Scrofani, and G. Saavedra, ‘‘Integral imaging with Fourier-plane recording,’’ Proc. SPIE, vol. 10219, May 2017, Art. no. 102190B.


[3] D. Cho, M. Lee, S. Kim, and Y.-W. Tai, ‘‘Modeling the calibration pipeline of the Lytro camera for high quality light-field image reconstruction,’’ in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 3280–3287.
[4] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, ‘‘Scene reconstruction from high spatio-angular resolution light fields,’’ ACM Trans. Graph., vol. 32, no. 4, pp. 1–73, Jul. 2013.
[5] H. Mihara, T. Funatomi, K. Tanaka, H. Kubo, Y. Mukaigawa, and H. Nagahara, ‘‘4D light field segmentation with spatial and angular consistencies,’’ in Proc. IEEE Int. Conf. Comput. Photography (ICCP), May 2016, pp. 1–8.
[6] Z. Yu, J. Yu, A. Lumsdaine, and T. Georgiev, ‘‘An analysis of color demosaicing in plenoptic cameras,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 901–908.
[7] T. E. Bishop, S. Zanetti, and P. Favaro, ‘‘Light field superresolution,’’ in Proc. IEEE Int. Conf. Comput. Photography (ICCP), Apr. 2009, pp. 1–9.
[8] A. Ansari, A. Dorado, G. Saavedra, and M. M. Corral, ‘‘Plenoptic image watermarking to preserve copyright,’’ Proc. SPIE, vol. 10219, May 2017, Art. no. 102190A.
[9] S. Hong, A. Ansari, G. Saavedra, and M. Martinez-Corral, ‘‘Full-parallax 3D display from stereo-hybrid 3D camera system,’’ Opt. Lasers Eng., vol. 103, pp. 46–54, Apr. 2018.
[10] P. A. Kara, A. Cserkaszky, A. Barsi, M. G. Martini, and T. Balogh, ‘‘Towards adaptive light field video streaming,’’ COMSOC MMTC Commun.-Frontiers, vol. 12, no. 4, p. 61, 2017.
[11] D. G. Dansereau, O. Pizarro, and S. B. Williams, ‘‘Linear volumetric focus for light field cameras,’’ ACM Trans. Graph., vol. 34, no. 2, pp. 1–15, 2015.
[12] D. G. Dansereau, O. Pizarro, and S. B. Williams, ‘‘Decoding, calibration and rectification for lenselet-based plenoptic cameras,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1027–1034.

[13] G. Lippmann, ‘‘Épreuves réversibles donnant la sensation du relief,’’ J. Phys. Theor. Appl., vol. 7, no. 1, pp. 821–825, 1908.
[14] E. H. Adelson and J. R. Bergen, ‘‘The plenoptic function and the elements of early vision,’’ in Computational Models of Visual Processing, M. Landy and J. A. Movshon, Eds. Cambridge, MA, USA: MIT Press, 1991, pp. 3–20.
[15] M. Levoy and P. Hanrahan, ‘‘Light field rendering,’’ in Proc. 23rd Annu. Conf. Comput. Graph. Interact. Techn., 1996, pp. 31–42.
[16] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, ‘‘Light field photography with a hand-held plenoptic camera,’’ Comput. Sci. Tech. Rep., vol. 2, no. 11, pp. 1–11, 2005.
[17] C. Perwass and L. Wietzke, ‘‘Single lens 3D-camera with extended depth-of-field,’’ Proc. SPIE, vol. 8291, Feb. 2012, Art. no. 829108.
[18] V. Vaish and A. Adams. The New Stanford Light Field Archive. Accessed: Sep. 27, 2019. [Online]. Available: http://lightfield.stanford.edu/lfs.html
[19] P. Schelkens, Z. Y. Alpaslan, T. Ebrahimi, K.-J. Oh, F. M. B. Pereira, A. M. G. Pinheiro, I. Tabus, and Z. Chen, ‘‘JPEG Pleno: A standard framework for representing and signaling plenoptic modalities,’’ Proc. SPIE, vol. 10752, Sep. 2018, Art. no. 107521P.
[20] Y. Li, M. Sjöström, R. Olsson, and U. Jennehag, ‘‘Coding of focused plenoptic contents by displacement intra prediction,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 7, pp. 1308–1319, Jul. 2016.
[21] C. Conti, P. Nunes, and L. D. Soares, ‘‘HEVC-based light field image coding with bi-predicted self-similarity compensation,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2016, pp. 1–4.
[22] R. Monteiro, L. Lucas, C. Conti, P. Nunes, N. Rodrigues, S. Faria, C. Pagliari, E. da Silva, and L. Soares, ‘‘Light field HEVC-based image coding using locally linear embedding and self-similarity compensated prediction,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2016, pp. 1–4.

[23] R. J. S. Monteiro, P. J. L. Nunes, N. M. M. Rodrigues, and S. M. M. Faria, ‘‘Light field image coding using high-order intrablock prediction,’’ IEEE J. Sel. Topics Signal Process., vol. 11, no. 7, pp. 1120–1131, Oct. 2017.
[24] R. Zhong, S. Wang, B. Cornelis, Y. Zheng, J. Yuan, and A. Munteanu, ‘‘Efficient directional and L1-optimized intra-prediction for light field image compression,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 1172–1176.
[25] D. Liu, L. Wang, L. Li, Z. Xiong, F. Wu, and W. Zeng, ‘‘Pseudo-sequence-based light field image compression,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2016, pp. 1–4.
[26] W. Ahmad, R. Olsson, and M. Sjöström, ‘‘Interpreting plenoptic images as multi-view sequences for improved compression,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 4557–4561.
[27] G. Tech, Y. Chen, K. Müller, J.-R. Ohm, A. Vetro, and Y.-K. Wang, ‘‘Overview of the multiview and 3D extensions of high efficiency video coding,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 35–49, Jan. 2016.
[28] F. Pereira, C. Pagliari, E. da Silva, I. Tabus, H. Amirpour, M. Bernardo, and A. Pinheiro, JPEG Pleno Light Field Coding Common Test Conditions, Standard ISO/IEC JTC 1/SC29/WG1N81022, 81st Meeting, Vancouver, BC, Canada, 2018.
[29] I. Viola, H. P. Maretic, P. Frossard, and T. Ebrahimi, ‘‘A graph learning approach for light field image compression,’’ Proc. SPIE, vol. 10752, Sep. 2018, Art. no. 107520E.
[30] W. Ahmad, S. Vagharshakyan, M. Sjöström, A. Gotchev, R. Bregovic, and R. Olsson, ‘‘Shearlet transform based prediction scheme for light field compression,’’ in Proc. Data Compress. Conf. (DCC), Snowbird, UT, USA, Mar. 2018, p. 396.

[31] F. Hawary, C. Guillemot, D. Thoreau, and G. Boisson, ‘‘Scalable light field compression scheme using sparse reconstruction and restoration,’’ in Proc. ICIP, Sep. 2017, pp. 3250–3254.
[32] N. Bakir, W. Hamidouche, O. Déforges, K. Samrouth, and M. Khalil, ‘‘Light field image compression based on convolutional neural networks and linear approximation,’’ in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Oct. 2018, pp. 1128–1132.
[33] P. Paudyal, R. Olsson, M. Sjöström, F. Battisti, and M. Carli, ‘‘SMART: A light field image quality dataset,’’ in Proc. 7th Int. Conf. Multimedia Syst., 2016, p. 49.
[34] M. Rerabek, T. Bruylants, T. Ebrahimi, F. Pereira, and P. Schelkens, ‘‘ICME 2016 grand challenge: Light-field image compression,’’ Call Proposals Eval. Procedure, to be published.
[35] Call for Proposals on Light Field Coding, JPEG Pleno, Standard ISO/IEC JTC 1/SC29/WG1N74014, 74th Meeting, Geneva, Switzerland, Jan. 2017.
[36] Y. Li, R. Olsson, and M. Sjöström, ‘‘Compression of unfocused plenoptic images using a displacement intra prediction,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2016, pp. 1–4.
[37] Y.-H. Chao, G. Cheung, and A. Ortega, ‘‘Pre-demosaic light field image compression using graph lifting transform,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 3240–3244.
[38] C. Conti, L. D. Soares, and P. Nunes, ‘‘Scalable light field coding with support for region of interest enhancement,’’ in Proc. 26th Eur. Signal Process. Conf. (EUSIPCO), Sep. 2018, pp. 1855–1859.
[39] C. Perra and P. Assuncao, ‘‘High efficiency coding of light field images based on tiling and pseudo-temporal data arrangement,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Feb. 2016, pp. 1–4.
[40] R. Olsson, M. Sjöström, and Y. Xu, ‘‘A combined pre-processing and H.264-compression scheme for 3D integral images,’’ in Proc. Int. Conf. Image Process., Oct. 2006, pp. 513–516.
[41] C. Jia, Y. Yang, X. Zhang, X. Zhang, S. Wang, S. Wang, and S. Ma, ‘‘Optimized inter-view prediction based light field image compression with adaptive reconstruction,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 4572–4576.
[42] S. Zhao and Z. Chen, ‘‘Light field image coding via linear approximation prior,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 4562–4566.
[43] L. Li, Z. Li, B. Li, D. Liu, and H. Li, ‘‘Pseudo-sequence-based 2-D hierarchical coding structure for light-field image compression,’’ IEEE J. Sel. Topics Signal Process., vol. 11, no. 7, pp. 1107–1119, Oct. 2017.
[44] I. Tabus, P. Helin, and P. Astola, ‘‘Lossy compression of lenslet images from plenoptic cameras combining sparse predictive coding and JPEG 2000,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 4567–4571.
[45] T.-H. Tran, Y. Baroud, Z. Wang, S. Simon, and D. Taubman, ‘‘Light-field image compression based on variational disparity estimation and motion-compensated wavelet decomposition,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 3260–3264.
[46] J. Chen, J. Hou, and L.-P. Chau, ‘‘Light field compression with disparity-guided sparse coding based on structural key views,’’ IEEE Trans. Image Process., vol. 27, no. 1, pp. 314–324, Jan. 2018.
[47] W. Ahmad, M. Sjöström, and R. Olsson, ‘‘Compression scheme for sparsely sampled light field data based on pseudo multi-view sequences,’’ Proc. SPIE, vol. 10679, May 2018, Art. no. 106790M.
[48] X. Jiang, M. Le Pendu, R. A. Farrugia, and C. Guillemot, ‘‘Light field compression with homography-based low-rank approximation,’’ IEEE J. Sel. Topics Signal Process., vol. 11, no. 7, pp. 1132–1145, Oct. 2017.

VOLUME 7, 2019 143013

Page 13: Computationally Efficient Light Field Image Compression ...miun.diva-portal.org/smash/get/diva2:1358279/FULLTEXT01.pdf · plenoptic camera targeting commercial applicationswas also


[49] T. Senoh, K. Yamamoto, N. Tetsutani, and H. Yasuda, ‘‘Efficient light field image coding with depth estimation and view synthesis,’’ in Proc. 26th Eur. Signal Process. Conf. (EUSIPCO), Sep. 2018, pp. 1840–1844.

[50] M. B. de Carvalho, M. P. Pereira, G. Alves, E. A. B. da Silva, C. L. Pagliari, F. Pereira, and V. Testoni, ‘‘A 4D DCT-based lenslet light field codec,’’ in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Oct. 2018, pp. 435–439.

[51] K. Komatsu, K. Takahashi, and T. Fujii, ‘‘Scalable light field coding using weighted binary images,’’ in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Oct. 2018, pp. 903–907.

[52] H. Schwarz, D. Marpe, and T. Wiegand, ‘‘Analysis of hierarchical B pictures and MCTF,’’ in Proc. ICME, 2006, pp. 1929–1932.

[53] S. Matsuoka, Y. Morigami, T. Song, and T. Shimamoto, ‘‘Coding efficiency improvement with adaptive GOP size selection for H.264/SVC,’’ in Proc. 3rd Int. Conf. Innov. Comput. Inf. Control, Jun. 2008, p. 356.

[54] H.-W. Chen, C.-H. Yeh, M.-C. Chi, C.-T. Hsu, M.-J. Chen, and M.-J. Chen, ‘‘Adaptive GOP structure determination in hierarchical B picture coding for the extension of H.264/AVC,’’ in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp. 697–701.

[55] B. Zatt, M. Porto, J. Scharcanski, and S. Bampi, ‘‘GOP structure adaptive to the video content for efficient H.264/AVC encoding,’’ in Proc. IEEE Int. Conf. Image Process., Sep. 2010, pp. 3053–3056.

[56] N. Purnachand, L. N. Alves, and A. Navarro, ‘‘Fast motion estimation algorithm for HEVC,’’ in Proc. IEEE 2nd Int. Conf. Consum. Electron.-Berlin (ICCE-Berlin), Sep. 2012, pp. 34–37.

[57] M. Rerabek and T. Ebrahimi, ‘‘New light field image dataset,’’ in Proc. 8th Int. Conf. Qual. Multimedia Exper. (QoMEX), 2016, pp. 1–2.

WAQAS AHMAD received the M.Sc. degree in electronics engineering from Mohammad Ali Jinnah University, in 2012. He is currently pursuing the Ph.D. degree with the Department of Information Systems and Technology (IST), Mid Sweden University. His research interests include light field compression.

MUBEEN GHAFOOR received the Ph.D. degree in image processing from Mohammad Ali Jinnah University, Pakistan. He is currently an Assistant Professor with the Computer Science Department, COMSATS University Islamabad. He has extensive research and industrial experience in the fields of data science, image processing, machine vision systems, biometric systems, signal analysis, GPU-based hardware design, and software system design.

SYED ALI TARIQ received the B.S. degree in computer and information sciences from the Pakistan Institute of Engineering and Applied Sciences (PIEAS), Pakistan, and the M.S. degree in computer science from Abasyn University, Islamabad. He is currently pursuing the Ph.D. degree with COMSATS University Islamabad, where he is also with the Medical Imaging and Diagnostics Lab. His research interests include image processing, deep learning, biometric systems, and GPU-based parallel computing.

ALI HASSAN received the M.S. degree in computer science from COMSATS University Islamabad, where he is currently a Researcher with the Medical Imaging and Diagnostics Lab. His research interests include image processing and deep learning.

MÅRTEN SJÖSTRÖM received the M.Sc. degree in electrical engineering and applied physics from Linköping University, Sweden, in 1992, the Licentiate of Technology degree in signal processing from the KTH Royal Institute of Technology, Stockholm, Sweden, in 1998, and the Ph.D. degree in modeling of nonlinear systems from the École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, in 2001. He was an Electrical Engineer with ABB, Sweden, from 1993 to 1994, and a Fellow with CERN, from 1994 to 1996. In 2001, he joined Mid Sweden University, and he was appointed as an Associate Professor and a Full Professor of signal processing, in 2008 and 2013, respectively. He has been the Head of computer and system science at Mid Sweden University, since 2013. He founded the Realistic 3D Research Group, in 2007. His current research interests include multidimensional signal processing and imaging, and system modeling and identification.

ROGER OLSSON received the M.Sc. degree in electrical engineering and the Ph.D. degree in telecommunication from Mid Sweden University, Sweden, in 1998 and 2010, respectively. He was with the video compression and distribution industry, from 1997 to 2000. He was a Junior Lecturer with Mid Sweden University, from 2000 to 2004, where he taught courses in telecommunication, signals and systems, and signal and image processing. Since 2010, he has been a Researcher with Mid Sweden University. His research interests include plenoptic image capture, processing, and compression, plenoptic system modeling, and depth map capture and processing.
