
Inpainting-based Image Coding: A Patch-driven Approach

Nuno Miguel Ventura Couto

Dissertation submitted for obtaining the degree of Master in Electrical and Computer Engineering

Jury

President: Prof. José Bioucas Dias

Supervisor: Prof. Fernando Pereira

Co-Supervisor: Dr. Matteo Naccari

Members: Prof. Jorge Salvador Marques

October 2010


Acknowledgments

First, I would like to express my deepest gratitude to my family, in particular to the four women of my life, my girlfriend Telma, my mother Leonor, my sister Inês and my grandmother Quinita, who have unconditionally cherished, supported and even inspired me to keep learning and evolving towards becoming a better person, student and professional.

I would also like to thank Professor Fernando Pereira, my advisor, for assigning me this Thesis and giving me the unique opportunity of working under his supervision. The profound devotion, the sense of responsibility and dedication, the pursuit of perfection, the warm and good-humored brainstorming and advice, and also the infinite support that Professor Fernando Pereira has provided me with make this Thesis, by far, the most remarkable academic experience I have ever had.

I am also grateful to the Image Group, especially to my co-advisor Dr. Matteo Naccari, for providing me with insight, advice and technical support when I needed it the most. I would also like to express my gratitude to Catarina Brites and João Ascenso for their motivational and knowledge-sharing talks and companionship over the late working nights and weekends. Big thanks to Instituto de Telecomunicações for providing me with excellent working conditions and a scholarship for research and development.

A special reference goes to Professors Leonel Sousa and António Rodrigues to whom I had the pleasure, as a students’ representative for several years, of reporting and discussing the students’ concerns, aiming at fostering the relations between them, the professors and the faculty.

It has been a pleasure to get to know, work and become friends with Ricardo Cabral, Rui Trindade, André Esteves, Gonçalo Carmo, André Chibeles, Rafael Antunes, Maurício Ramalho, Filipa Henriques, André Neves, Filipe Wiener, Tiago Veiga, Tiago Correia and Alexandre Gomes, who have all directly supported me along my journey at IST.

A reference must be made to Deloitte, notably to Miguel Eiras Antunes, who provided me with some extra days to finish and polish this Thesis, which was a very kind and valuable help.

Last, but not least, a word to my close friends Diogo Alvim, David Simão, André Domingos, Inês Pires and Ricardo Costa, who have always been by my side and believed in me under any circumstance.

Thank you all.


Abstract

The remarkable demand for universal distribution and consumption of image and video content over various networks has pressured the digital consumer electronics industry to launch an almost infinite variety of electronic devices capable of acquiring, processing, editing and storing these very attractive types of content. The combination of this trend with the increasing demand for higher qualities and larger screen resolutions has posed another challenging problem to the image coding research community: to significantly increase the current image compression efficiency for the various, relevant target qualities. In this context, better exploiting the Human Visual System behaviors and characteristics is largely recognized as an appealing way to target further improvements in terms of image compression efficiency. With this purpose in mind, inpainting-based image coding solutions have recently emerged as a novel coding paradigm to further exploit the image visual redundancy, thus allowing the compression efficiency to be increased while assuring the required perceived image quality.

Motivated by this context and these circumstances, the objective of this Thesis is to design, implement and evaluate an advanced inpainting-based image coding solution that should fulfill the following main requirements: 1) low encoding complexity; 2) maximum perceived image quality for a given rate; 3) automatic and adaptive coding; and 4) objective perceptually-driven image quality evaluation.

The conducted performance evaluation shows that the developed inpainting-based image coding solution outperforms the JPEG coding standard in terms of the PSNR metric and achieves similar MS-SSIM indexes for all the selected test images, while significantly reducing the bitrate consumption, which is very encouraging.

The technical novelties brought by this work mainly regard the criterion for the classification of structural and textural areas, the statistics-based classification of the areas to be inpainted and the inpainting adjustment based on the selected image feature.

Keywords: Image inpainting; Image coding; Patch-based data modeling; Perceptual performance evaluation; Low encoding complexity.


Resumo

The growing demand for the distribution and consumption of images and videos over various networks has pressured the consumer electronics industry to launch an almost infinite variety of electronic devices capable of acquiring, processing, editing and storing these types of content. The combination of this trend with the growing demand for higher qualities and larger screen resolutions has posed a challenging problem to the image coding research community: to significantly increase the image compression efficiency for the various target qualities. In this context, better exploiting the behaviors and characteristics of the Human Visual System is considered an attractive path to achieve further gains in terms of image compression efficiency. With this purpose, inpainting-based image coding solutions have emerged as an important coding paradigm to exploit the visual redundancy of images, thus allowing the compression efficiency to be increased while simultaneously assuring a good perceptual image quality.

Motivated by this context and these circumstances, the objective of this Thesis is to design, implement and evaluate the performance of an advanced inpainting-based image coding solution with the following characteristics: 1) low encoding complexity; 2) maximum perceptual image quality for a given rate; 3) automatic and adaptive coding; and 4) objective evaluation of the perceptual quality of the images.

The performance evaluation carried out shows that the developed inpainting-based image coding solution outperforms the JPEG image coding standard for the PSNR metric and achieves similar results for the MS-SSIM metric, which is very encouraging considering that this is accomplished while consuming much less bitrate.

The technical novelties of this work concern the criterion used for the classification of structural and textural areas, the statistics-based classification of the areas to be inpainted, and the inpainting adjustment based on the chosen feature.

Keywords: Image inpainting; Image coding; Patch-based modeling; Perceptual performance evaluation; Low encoding complexity.


Table of Contents

1. Introduction ................................................................................................................ 1

1.1. Context and Motivation ................................................................................................................... 1

1.2. Objectives .......................................................................................................................................... 3

1.3. Technical Novelty ............................................................................................................................. 4

1.4. Report Structure ............................................................................................................................... 4

2. Inpainting-related Tools and Codecs: a Review ..................................................... 7

2.1. Digital Inpainting Paradigm: Two Perspectives .............................................................................. 7

2.2. Clustering Inpainting Tools ............................................................................................................. 9

2.2.1. Image Inpainting Tools ............................................................................................................. 9

2.2.1.1. Geometric Modeling ........................................................................................................... 10

2.2.1.2. Patch-based Modeling ........................................................................................................ 10

2.2.2. Video Inpainting Tools ............................................................................................................ 10

2.2.2.1. Geometric Modeling ........................................................................................................... 11

2.2.2.2. Patch-based Modeling ........................................................................................................ 11

2.3. A Relevant Pure Inpainting Solution ............................................................................................. 11

2.3.1. Objective and Technical Approach ......................................................................................... 11

2.3.2. Architecture and Walkthrough ................................................................................................ 12

2.3.3. Main Tools .............................................................................................................................. 13

2.3.4. Performance Evaluation ......................................................................................................... 15

2.4. A Relevant Inpainting-based Image Coding Solution .................................................................... 17

2.4.1. Objective and Technical Approach ......................................................................................... 18

2.4.2. Architecture and Walkthrough ................................................................................................ 18

2.4.3. Main Tools .............................................................................................................................. 20

2.4.4. Performance Assessment ........................................................................................................ 24

3. High-level Inpainting-based Image Coding Architecture .................................... 27

3.1. High-level Encoder Architecture and Walkthrough ..................................................................... 27

3.2. High-level Decoder Architecture and Walkthrough ...................................................................... 29

4. Describing the Encoder Tools ................................................................................. 33

4.1. Analysis for Classification and Coding .......................................................................................... 35

4.1.1. Edge Extraction ...................................................................................................................... 35

4.1.1.1. RGB-to-YUV Color Space Conversion ............................................................................... 35

4.1.1.2. Edge Extraction .................................................................................................................. 36

4.1.2. Preliminary Block Classification ............................................................................................ 37

4.1.3. Textural Blocks Further Classification ................................................................................... 40


4.1.3.1. Not-to-be-Inpainted Core Textural Blocks Classification .................................................. 41

4.1.3.2. Not-to-be-Inpainted Additional Textural Blocks Classification ......................................... 43

4.1.4. Coding Mode Decision ........................................................................................................... 48

4.2. Feature Extraction.......................................................................................................................... 50

4.2.1. Feature Selection and Extraction ........................................................................................... 50

4.2.2. YUV-to-RGB Color Space Conversion ................................................................................... 51

4.3. Standard-based Image Encoder ..................................................................................................... 52

4.4. Coding Mode Encoder .................................................................................................................... 52

5. Describing the Decoder Tools ................................................................................. 55

5.1. Standard-based Image Decoder ..................................................................................................... 57

5.2. Coding Mode Decoder .................................................................................................................... 57

5.3. Image Inpainting Module ............................................................................................................... 57

5.3.1. Initial Processing .................................................................................................................... 58

5.3.1.1. RGB-to-YUV Color Space Conversion ............................................................................... 58

5.3.1.2. Image Areas to be Inpainted Creation ................................................................................ 58

5.3.2. Neighboring Luminance Inferring .......................................................................................... 59

5.3.2.1. Border Pixels Identification ................................................................................................ 59

5.3.2.2. Image Pixels Confidence Initialization ............................................................................... 61

5.3.2.3. Patch Confidence Computation .......................................................................................... 62

5.3.2.4. Source Patch Determination .............................................................................................. 66

5.3.2.5. Target Patch Filling-in ....................................................................................................... 71

5.3.3. Filled Pixels Confidence Updating ......................................................................................... 71

5.3.4. Block Pixels Luminance Adjustment ....................................................................................... 72

6. Performance Evaluation .......................................................................................... 75

6.1. Test Conditions ............................................................................................................................... 75

6.2. Rate-Distortion Performance Evaluation ...................................................................................... 78

6.2.1. Studying the Not-to-be-Inpainted Additional Textural Blocks Seeding Performance Impact . 78

6.2.2. Studying the Not-to-be-Inpainted Core Textural Blocks Neighborhood Performance Impact 82

6.2.3. Studying the Patch Side Performance Impact ......................................................................... 84

6.2.4. Studying the Block Luminance Averages Adjustment Performance Impact ........................... 87

6.2.5. Best Coding Solution .............................................................................................................. 91

6.3. Conclusions ..................................................................................................................................... 94

7. Conclusions and Future Work ................................................................................ 95

7.1. Summary and Conclusions ............................................................................................................. 95

7.2. Future Work ................................................................................................................................... 96

References ........................................................................................................................... 99


Index of Figures

Figure 1 – Example of image consumption electronic devices and image sharing applications. ............................ 2

Figure 2 – Example of pure inpainting solutions results: (a) original image and (b) circular window grid removal; (c) original damaged image and (d) restored image [2] [3]. .................................................................................... 2

Figure 3 – Illustration of the inpainting problem from the pure inpainting perspective and from the coding perspective at the decoder side. ............................................................................................................. 8

Figure 4 – Proposed clustering for digital inpainting tools...................................................................................... 9

Figure 5 – High-level system architecture. ............................................................................................................ 12

Figure 6 – Evolution of the inpainted image with an increasing number of inpainting iterations [1]. .................. 13

Figure 7 – Propagation direction as the normal to the shrunk version of the hole boundary [1]. .......................... 14

Figure 8 – Digital restoration of a vandalized color image: (a) input image; (b) restored image [1]. ................ 16

Figure 9 – Text removal through image inpainting: (a) input image (superposition of the original image and the text in red); (b) restored image [1]. ...................................................................................................... 16

Figure 10 – Restoration of an old gray-level photograph: (a) original deteriorated image; (b) user-defined mask (in red); (c) restored image [1]. ............................................................................................................ 17

Figure 11 – System architecture [4]. .................................................................................................................... 18

Figure 12 – Example walkthrough for the Lena test image: (a) original image; (b) extracted edge map (blue curves); (c) removed blocks (in black) and edge-map sub-set (blue curves); (d) recovered structural blocks after structure propagation; (e) output image after texture synthesis and recombination; (f) Decoded image by the benchmark (baseline JPEG in this case) [4]. ......................................................................................................... 20

Figure 13 – Illustration of the selection of necessary textural exemplar blocks (dark gray lines denote the thinned edges, light gray blocks denote structural blocks, white and black denote necessary and non-necessary textural blocks, respectively) [4]. ....................................................................................................................................... 22

Figure 14 – Illustration of the selection of necessary structural exemplar blocks (light gray lines are thinned edges, white and black blocks denote necessary and non-necessary structural blocks, respectively) [4]. ............. 23

Figure 15 – Proposed pixel-wise structure propagation method: (a) edge and its influencing region (arrowed dash and dash-dot lines are the propagation directions); (b) restoration of influencing region [4]. .................... 23

Figure 16 – Visual quality comparisons between the restored images obtained using the proposed JPEG-based solution (top row) and the baseline JPEG standard itself (bottom row): (a) kodim02; (b) kodim03; (c) kodim05 [4]. ........................................................................................................................................................................ 26

Figure 17 – Visual quality comparisons between the restored images obtained using the proposed H.264/AVC Intra based coding solution (top row) and using the H.264/AVC Intra coding itself (bottom row): (a) Jet; (b) Lena; (c) Milk; (d) Peppers [4]. ............................................................................................................................. 26

Figure 18 – Adopted high-level inpainting-based image encoder architecture. .................................................... 28

Figure 19 – Illustration of the Analysis for Classification and Coding module input and outputs: (a) input image; (b) areas to be and not to be inpainted (black and non-black areas, respectively); (c) feature for the areas to be inpainted, e.g. edge information (in dark purple) [4]. ............................................................................................ 28

Figure 20 – Adopted high-level inpainting-based image decoder architecture. .................................................... 30


Figure 21 – Illustration of the Image Inpainting and Blending modules: (a) areas to be and not to be inpainted (black and non-black areas, respectively) and feature for the areas to be inpainted, e.g. edge information (in dark purple); (b) output image (black areas have been inpainted); (c) input image for comparison [4]........................ 31

Figure 22 – Adopted encoder architecture............................................................................................................. 34

Figure 23 – Illustration of the RGB-to-YUV Color Space Conversion sub-module: (a) and (c) input images in the RGB color space; (b) and (d) Y components......................................................................................................... 35

Figure 24 – Illustration of the Edge Extraction stage with the Canny detector: (a) and (d) input luminance component; (b) and (e) luminance component edge-map; (c) and (f) superposition between the luminance component and the corresponding edge-map (edges in white). ............................................................................. 37

Figure 25 – Illustration of the Preliminary Block Classification sub-module: (a) and (d) input luminance component at 8×8 block level; (b) and (e) luminance component edge map; (c) and (f) preliminary block classification (textural and structural blocks in blue and non-blue, respectively, with image edges in red). ........ 38

Figure 26 – Preliminary Block Classification sub-module flowchart.................................................................... 39

Figure 27 – Illustration of the Not-to-be-Inpainted Core Textural Blocks Selection stage results considering 8-neighbors: (a) and (c) textural and structural blocks (blue and non-blue areas, respectively); (b) and (d) not-to-be-inpainted core textural blocks and potential to-be-inpainted textural blocks (pink and green areas, respectively). ............................................................................................................................................................................... 42

Figure 28 – Not-to-be-Inpainted Core Textural Blocks Classification stage flowchart. ....................................... 42

Figure 29 – Illustration of the Not-to-be-Inpainted Additional Textural Blocks Classification stage: (a) and (c) not-to-be-inpainted core textural and potential to-be-inpainted textural blocks (in shock pink and shock green, respectively); (b) and (d) not-to-be-inpainted additional textural and to-be-inpainted textural blocks (in yellow and orange, respectively). ...................................................................................................................................... 44

Figure 30 – Not-to-be-Inpainted Additional Textural Blocks Classification stage flowchart. .............................. 45

Figure 31 – Coding Mode Decision sub-module flowchart................................................................................... 49

Figure 32 – Illustration of the Coding Mode Decision sub-module: (a) and (c) structural, not-to-be-inpainted core textural, not-to-be-inpainted additional textural, to-be-inpainted textural blocks (in gray level, shock pink, yellow and orange, respectively); (b) and (d) coding mode matrix labeling the blocks to be and not to be inpainted (in black and white, respectively). .............................................................................................................................. 49

Figure 33 – Feature Selection and Extraction sub-module flowchart. ................................................................... 50

Figure 34 – Illustration of the YUV-to-RGB color space conversion: (a) and (c) Y component with the block luminance averages in the areas to be inpainted; (b) and (d) corresponding image in the RGB color space to be given to the Standard-based Image Encoder. ........................................................................................................ 51

Figure 35 – Adopted decoder architecture............................................................................................................. 56

Figure 36 – Illustration of the Image Areas to be Inpainted Creation stage for 512×512 luminance samples: (a) and (c) decoded luminance for the areas not to be inpainted combined with the selected image feature for the areas to be inpainted; (b) and (d) image to be inpainted. ....................................................................................... 59

Figure 37 – Illustration of the Border Pixels Identification stage: (a) and (c) image to be inpainted (black and non-black areas are to be and not to be inpainted, respectively); (b) and (d) border (white) and non-border (black) pixels considering the areas to be inpainted in (a) and (c), respectively. .............................................................. 60

Figure 38 – Border Pixels Identification stage flowchart. ..................................................................................... 60

Figure 39 – Confidence Values Initialization stage flowchart. .............................................................................. 61

Figure 40 – Patch Confidence Computation stage flowchart. ............................................................................... 63

Figure 41 – Illustration of the patch area for patch sides equal to 3, 5 and 7 image pixels. .................................. 64


Figure 42 – Zoomed illustration of patch confidence computation for the first inpainting iteration: (a) areas to be inpainted (‘unknown’) and standard decoded (‘known’) areas (black and non-black areas); (b) pixel confidence values within the current (size 3) patch (yellow-lined). ........................................................................................ 65

Figure 43 – Zoomed example patches: the green-lined patch is given higher inpainting priority than the red-lined patch; black areas represent the areas to be inpainted and non-black areas represent currently known luminance values. .................................................................................................................................................................... 65

Figure 44 – Illustration of a search window, a target patch, a potential candidate patch, two candidate patches and several unknown luminance values. ............................................................................................................... 68

Figure 45 – Source Patch Determination stage flowchart. .................................................................................... 69

Figure 46 – Example of SSD updating with a target patch and two candidate patches in the search window used in this inpainting iteration. ..................................................................................................................................... 70

Figure 47 – Example illustrating the Target Patch Filling-in process. .................................................................. 71

Figure 48 – Example of the evolution of the neighboring luminance values inferring process along the inpainting iterations for Lena and Peppers. ............................................................................................................................ 72

Figure 49 – Block Pixels Luminance Adjustment sub-module flowchart. ............................................................ 73

Figure 50 – Test images: (a) Lena; (b) Peppers; (c) Jet. ........................................................................................ 76

Figure 51 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Lena. ..................................................................................................................................................... 79

Figure 52 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Peppers. ................................................................................................................................................. 80

Figure 53 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Jet. ......................................................................................................................................................... 80

Figure 54 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Lena. ................................................................................................................................................ 81

Figure 55 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Peppers. ........................................................................................................................................... 81

Figure 56 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Jet. ................................................................................................................................................... 81

Figure 57 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Lena. ................................................................................................................................. 82

Figure 58 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Peppers. ............................................................................................................................. 83

Figure 59 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Jet. ..................................................................................................................................... 83

Figure 60 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Lena. ................................................................................................................................. 83

Figure 61 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Peppers. ............................................................................................................................. 84

Figure 62 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Jet. ..................................................................................................................................... 84

Figure 63 – PSNR RD performance for studying the patch side impact with the image Lena. ............................. 85

Figure 64 – PSNR RD performance for studying the patch side impact with the image Peppers. ........................ 85


Figure 65 – PSNR RD performance for studying the patch side impact with the image Jet. ................................ 85

Figure 66 – MS-SSIM RD performance for studying the patch side impact with the image Lena. ...................... 86

Figure 67 – MS-SSIM RD performance for studying the patch side impact with the image Peppers. .................. 86

Figure 68 – MS-SSIM RD performance for studying the patch side impact with the image Jet. .......................... 87

Figure 69 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Lena. ...................................................................................................................................................................... 87

Figure 70 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Peppers. ................................................................................................................................................................. 88

Figure 71 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Jet. ......................................................................................................................................................................... 88

Figure 72 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Lena. ........................................................................................................................................................... 89

Figure 73 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Peppers........................................................................................................................................................ 89

Figure 74 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Jet. ............................................................................................................................................................... 90

Figure 75 – PSNR RD performance comparison with the selected benchmarks with the image Lena. ................ 91

Figure 76 – PSNR RD performance comparison with the selected benchmarks with the image Peppers............. 91

Figure 77 – PSNR RD performance comparison with the selected benchmarks with the image Jet. .................... 92

Figure 78 – MS-SSIM RD performance comparison with the selected benchmarks with the image Lena. .......... 92

Figure 79 – MS-SSIM RD performance comparison with the selected benchmarks with the image Peppers. ..... 93

Figure 80 – MS-SSIM RD performance comparison with the selected benchmarks with the image Jet. ............. 93


Index of Tables

Table 1 – Bit savings of the proposed solution having the JPEG and H.264/AVC Intra as the benchmarks [1] ..25

Table 2 – Search window side values versus patch side values............................................................................. 67

Table 3 – Number of inpainted and not inpainted blocks for the IST-Inpainting/Chess (4 Neighbors) and IST-Inpainting/LA (4 Neighbors) solutions...................................................................................................................79


List of Acronyms

CIE Commission Internationale de l’Éclairage

DC Direct Current

DCT Discrete Cosine Transform

H.264/AVC Advanced Video Coding

HVS Human Visual System

I/O Input-Output

IJG Independent JPEG Group

IST Instituto Superior Técnico

JBIG Joint Bi-level Image Experts Group

JPEG Joint Photographic Experts Group

MPEG Motion Pictures Experts Group

MS-SSIM Multi-Scale Structural Similarity

MSE Mean Squared Error

PDE Partial Differential Equation

PDF Probability Density Function

PSNR Peak Signal-to-Noise-Ratio

RD Rate Distortion

RGB Red, Green and Blue

RLE Run-Length Encoder

SSD Sum of Squared Differences

USC-SIPI University of Southern California Signal & Image Processing Institute

YUV Luminance and Chrominance Color Space


Chapter 1

1. Introduction

This first chapter is focused on providing the context and motivation for this Thesis, as well as presenting the main objectives and the technical novelty of the developed inpainting-based image coding solution.

1.1. Context and Motivation

With the impressive demand for universal distribution and consumption of image and video content over various networks, the digital consumer electronics industry has been ‘forced’ to put on the market an almost infinite variety of electronic devices capable of acquiring, processing, editing and storing these very attractive types of content (see Figure 1). The combination of this trend with the increasing demand for higher qualities and larger screen resolutions has posed another challenging, hopefully attainable, problem to the image coding research community: to significantly increase the image compression efficiency for the various target qualities.

After significant compression efficiency improvements in the past twenty years, which have essentially been achieved based on a mix of spatial transforms, quantization and entropy coding, image coding seems to have arrived at a saturation point, at least if the currently adopted coding approaches are not significantly changed. In this context, it is largely agreed in the image coding research community that a better exploitation of the characteristics of the Human Visual System (HVS) is a relevant and interesting way to go forward and target further improvements in terms of compression efficiency. An interesting approach in this direction is the so-called digital inpainting paradigm.


Figure 1 – Example of image consumption electronic devices and image sharing applications.

Digital inpainting is currently a very hot image and video processing topic, introduced in the year 2000 by Bertalmio et al. [1] with the goal of ‘translating’ the techniques used by expert art restorers to fill in/retouch artwork, e.g. paintings, to the digital image processing world. The digitalization of analog inpainting techniques had the purpose of providing new tools to ‘boost’ the solution of some classical image and video processing problems, such as image completion/restoration and object removal. In fact, the incorporation of these tools in traditional image and video processing solutions ‘brought to life’ the so-called pure inpainting solutions, which mainly target restoring damaged or removed areas in images in a way unnoticeable to a common viewer (see Figure 2). The basic idea behind this type of solution is to fill in image areas which have to be restored, or simply covered after object removal, with information inferred from their surroundings. An example of image restoration by means of inpainting tools is shown in Figure 2 (a) and (b), where the circular window grid has been selected to be removed and has afterwards been inpainted; another example is included in Figure 2 (c) and (d), where a scratched photo has been restored by image inpainting tools. Note that, in both cases, digital image inpainting tools allow filling in the target areas in a very effective way.

Figure 2 – Example of pure inpainting solutions results: (a) original image and (b) circular window grid removal; (c) original damaged image and (d) restored image [2] [3].


In the past decade, many pure inpainting solutions have emerged in the literature, but only in more recent years have digital inpainting tools been exploited for coding purposes, notably because only now have they become mature and ready to be effectively incorporated in image coding solutions. These solutions are the so-called inpainting-based coding solutions and exploit digital inpainting tools to take an important step further in terms of compression efficiency, avoiding wasting bits to encode information which may very likely be properly restored to the target quality using digital inpainting tools. Therefore, inpainting-based coding solutions may be seen, to some degree, as another synergy between the signal processing and coding worlds, with very high potential for evolution. The idea behind these solutions is to first analyze the original image to determine the best coding mode for each image area: those areas which are believed to be ‘difficult’ to properly restore with digital inpainting tools at the decoder side will be selected as not to be inpainted and, thus, coded with an off-the-shelf image coding solution, e.g. JPEG (Joint Photographic Experts Group); the remaining areas will be coded with the novel inpainting approach, this means, basically, restored at the decoder side without any, or with only very little, help from the encoder side by exploiting the available surrounding information.
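To make this two-branch idea concrete, consider the toy sketch below. It is a deliberately naive stand-in, not the solution developed in this Thesis: the variance test used to decide which blocks are ‘easy’ to restore and the copy-from-above filling at the decoder are illustrative assumptions only, and the function names (toy_encode, toy_decode) are hypothetical.

```python
# Toy illustration of inpainting-based coding: 'easy' (here, low-variance)
# blocks are blanked before JPEG coding and naively filled at the decoder.
# Input is an 8-bit grayscale array with dimensions multiple of B.
import io
import numpy as np
from PIL import Image

B = 8  # block side

def toy_encode(gray, var_thr=50.0):
    h, w = gray.shape
    mode = np.zeros((h // B, w // B), dtype=bool)  # True = to be inpainted
    img = gray.copy()
    for i in range(h // B):
        for j in range(w // B):
            if gray[i*B:(i+1)*B, j*B:(j+1)*B].var() < var_thr:
                mode[i, j] = True                    # spend (almost) no bits
                img[i*B:(i+1)*B, j*B:(j+1)*B] = 0
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format='JPEG', quality=75)
    return buf.getvalue(), mode  # the mode map would also be entropy-coded

def toy_decode(jpeg_bytes, mode):
    img = np.array(Image.open(io.BytesIO(jpeg_bytes)), dtype=np.uint8)
    for i, j in zip(*np.nonzero(mode)):
        if i > 0:  # naive 'inpainting': repeat the block above
            img[i*B:(i+1)*B, j*B:(j+1)*B] = img[(i-1)*B:i*B, j*B:(j+1)*B]
    return img
```

The actual analysis, coding mode decision and inpainting tools are far more elaborate and are the subject of Chapters 4 and 5.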

Very recently, some inpainting-based image coding solutions [4] have succeeded in achieving significant compression ratios in comparison with relevant image coding standard benchmarks, e.g. the JPEG standard, by exploiting the visual redundancy inherent to natural images without compromising the perceived visual quality of the entirely decoded/reconstructed image. The fact that this is nowadays a hot research topic, to which increasing relevance has been given by research groups worldwide, and that it has a very wide range of applications, has encouraged the author to dedicate this Thesis to the development of an advanced inpainting-based image coding solution.

1.2. Objectives

In the context described above, the main objective of this Thesis is the design, implementation and evaluation of an advanced inpainting-based image coding solution which is able to efficiently exploit the visual redundancy inherent to natural images using HVS features and behaviors. Having this target in mind, the developed solution should meet the following requirements:

• Low Encoding Complexity – A fundamental requirement is to reach low encoding complexity; in fact, this should be considered the main constraint when various valid options emerge for designing the codec architecture and exploiting the visual redundancy at the encoder. This is justified by the fact that the developed solution might be applicable, for instance, to smartphones, which impose severe restrictions in terms of processing capacity and battery life when compared to personal computers. In this context, it will be especially relevant to assess if the exploitation of one or more image features (depending on the allowable complexity) extracted at the encoder side for the areas to be inpainted may have a positive impact on the obtained compression performance.

• Maximum Perceived Image Quality for a Given Rate – Naturally, the developed image codec should maximize the perceived quality for a given rate so that it may be competitive with alternative, standard image coding solutions.

• Automatic and Adaptive Coding – The developed codec should be as automatic as possible, avoiding asking for user input while still adapting to the image peculiarities, even if at the cost of some reduction in terms of compression efficiency. Making coding solutions automatic is believed to be increasingly important to the end user, since he/she uses and ‘sees’ the technology and its applications from an Input/Output (I/O) perspective and, thus, is not interested in having to interfere with the required processing.

• Objective Perceptually-Driven Image Quality Evaluation – In the literature, the evaluation of image quality is typically based on objective metrics, many of which are HVS-agnostic, such as the very popular PSNR; hence, to reach more solid conclusions in terms of subjective quality, objective perceptually-driven image quality metrics should also be considered for the codec performance assessment (see the sketch after this list).
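As an illustration of this last requirement, the snippet below computes the HVS-agnostic PSNR together with the single-scale SSIM available in scikit-image, used here as a simplified stand-in for MS-SSIM, which applies SSIM over several dyadic image scales. The two metric functions are real scikit-image APIs; their use here is just a minimal sketch.

```python
# PSNR versus a structural-similarity metric (minimal sketch); SSIM stands
# in, at a single scale, for the MS-SSIM metric adopted in Chapter 6.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality(original, decoded):
    # PSNR = 10*log10(255^2 / MSE) for 8-bit images
    psnr = peak_signal_noise_ratio(original, decoded, data_range=255)
    ssim = structural_similarity(original, decoded, data_range=255)
    return psnr, ssim
```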


To summarize, the solution to be developed should meet the above described requirements so that it may be considered a low complexity, efficient, automatic and adaptive inpainting-based image coding solution. By also taking into consideration the HVS characteristics and behaviors, it is believed that this solution may bring added value to the inpainting technology literature.

1.3. Technical Novelty

The developed advanced inpainting-based image coding solution is able to analyze the input image to ultimately classify each of its areas, either as to be or not to be inpainted, depending on its characteristics and on the inpainting tools available at the decoder side. The not-to-be-inpainted blocks (square image areas) will be coded with an off-the-shelf image coding solution, whereas the blocks to be inpainted will be coded with the novel inpainting-based coding solution.

The main technical novelties proposed in this Thesis with regard to the available literature on inpainting-based image coding are:

• Structural versus Textural Classification Criterion – In a preliminary stage, the image blocks are classified either as structural or textural based on the percentage of connected edge pixels in each block. The novelty in this Thesis is not related to the type of classification, which has already been proposed in [4] [5], but rather to the criterion used for this classification. In fact, the use of the percentage of connected edge pixels as the classification criterion is novel and motivated by the fact that the HVS tends to associate image structure to the connectivity and dependencies the image exhibits (see the sketch after this list).

• Areas to be Inpainted Statistical Classification – The criterion used to classify the areas to be inpainted should be adaptive to the input image and user-independent. In the literature, adaptive solutions are typically manually fine-tuned. In this solution, the developed criterion allows classifying the areas to be inpainted based on statistical tools, notably first on the estimation of the probability density function (PDF) of the block variation metric (this metric has been proposed in [4]) and, second, on a classification threshold defined as a function of the PDF’s centroid, i.e. mass center, and the associated standard deviation. Hence, this novelty will not only allow eliminating the need for any manual fine-tuning, but it will also make the classification of the areas to be inpainted adaptive to the input image characteristics.

• Inpainting Adjustment Based on Selected Feature – The proposed inpainting-based coding solution adopts the block luminance averages as an image feature to be extracted at the encoder side and used at the decoder side to further improve the perceived visual quality of the inpainted image areas. This image feature is exploited after the areas to be inpainted have been inferred from their surroundings. This approach is novel since there is no similar solution reported in the literature.
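A minimal sketch of these three novelties is given below. The connectivity test, the block variation metric and the exact threshold form are simplified assumptions standing in for the precise definitions given in Chapter 4; in particular, ‘centroid plus one standard deviation’ is only one possible instance of the threshold function, and all function names are hypothetical.

```python
# Simplified sketch of the three novelties above; the connectivity test, the
# block variation metric and the threshold form are assumptions standing in
# for the exact definitions given in Chapter 4.
import numpy as np
from scipy import ndimage

B = 8  # 8x8 blocks, as used throughout the Thesis

def connected_edge_ratio(edge_block):
    """Novelty 1 (sketch): fraction of block pixels on connected edge
    segments; an edge pixel counts if its 8-connected component contains
    more than one pixel (assumption)."""
    labels, n = ndimage.label(edge_block, structure=np.ones((3, 3)))
    sizes = ndimage.sum(edge_block, labels, index=range(1, n + 1))
    return float(sum(s for s in sizes if s > 1)) / edge_block.size

def classify(luma, edge_map, edge_thr=0.10):
    """Split blocks into structural/textural, then select the textural
    blocks to be inpainted with a statistical threshold (Novelty 2)."""
    structural, variation = set(), {}
    for i in range(0, luma.shape[0], B):
        for j in range(0, luma.shape[1], B):
            if connected_edge_ratio(edge_map[i:i+B, j:j+B]) > edge_thr:
                structural.add((i, j))
            else:
                # stand-in block variation metric: luminance std (assumption)
                variation[(i, j)] = float(luma[i:i+B, j:j+B].std())
    v = np.array(list(variation.values()))
    thr = v.mean() + v.std() if v.size else 0.0  # PDF centroid + spread
    to_inpaint = {k for k, val in variation.items() if val < thr}
    return structural, to_inpaint

def adjust_block_means(inpainted, block_means, to_inpaint):
    """Novelty 3 (sketch): shift each inpainted block so its mean matches
    the block luminance average transmitted as the selected feature."""
    out = inpainted.astype(np.float64)
    for (i, j) in to_inpaint:
        out[i:i+B, j:j+B] += block_means[(i, j)] - out[i:i+B, j:j+B].mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```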

The technical novelties described above are believed to be interesting for the inpainting research community, since they essentially take the Human Visual System into account and ultimately allow automating this solution, making it user-independent.

1.4. Report Structure

This Thesis is organized in seven chapters, including the current one which introduces the Thesis, and the seventh which presents the main conclusions and future work.

First, Chapter 2 provides the reader with the inpainting problem definition, notably by distinguishing two perspectives: the pure inpainting perspective and the inpainting-based coding perspective. Next, a clustering of the digital inpainting tools is presented so as to allow structuring the tools at hand and better understanding their relationships. Finally, the pure inpainting and the inpainting-based image coding solutions in the literature which have been considered more representative and relevant are reviewed.

Chapter 3 gives a first perspective on the proposed advanced inpainting-based image coding solution, notably by presenting both the encoder and decoder’s high-level architecture and corresponding functional descriptions.


Chapter 4 and Chapter 5 present an in-depth description of the encoder and decoder tools, respectively, starting with the corresponding adopted architecture and describing next in detail the algorithms for the various modules, sub-modules and processing stages.

As for Chapter 6, it reports the performance evaluation of the proposed solution, starting with the test images, test conditions, performance metrics and image coding benchmarks, and presenting afterwards the Rate-Distortion (RD) results for the relevant objective quality metrics and codecs under comparison.

Finally, Chapter 7 presents the conclusions and identifies relevant open questions for future work.


Chapter 2

2. Inpainting-related Tools and Codecs: a Review

The main objective of this chapter is to review the most relevant inpainting-related concepts and tools, as well as some of the most interesting inpainting-based image coding solutions in the literature. In this context, the inpainting problem will be stated first and two different perspectives of the digital inpainting paradigm will be introduced; second, the digital inpainting tools will be clustered. Afterwards, a detailed review of some very relevant pure inpainting and inpainting-based image coding solutions will be provided.

2.1. Digital Inpainting Paradigm: Two Perspectives

Inpainting is the name given by art restorers to the process of manually restoring paintings in a fashion undetectable to observers. Recent breakthroughs in digital technology have made it possible to address inpainting-related problems in a rather efficient, less time-consuming fashion. In this context, the digital inpainting paradigm may be seen from two different perspectives:

• Pure Inpainting – Signal processing solutions whose main goal is to restore damaged or removed areas using information from undamaged areas, e.g. restoring cracks in photographs or recovering occlusions, in the most transparent way, i.e. such that an observer would not be able to notice that an inpainting procedure has taken place and, therefore, would not be able to distinguish the original undamaged image from its restored version.

• Inpainting-based Coding – Signal processing solutions whose main goal is to exploit digital inpainting tools to significantly increase the compression efficiency in comparison with standard-based coding solutions and the standard coding solutions themselves, e.g. JPEG and JPEG 2000 for images, and H.264/AVC (Advanced Video Coding) for video.

From the pure inpainting perspective, the inpainting problem may be stated as follows. Let I be the original image (or a frame in a video sequence), which is composed of a source area, denoted by Φ, whose pixel values are known, and a target area (a.k.a. hole), denoted by Ω, representing the damaged region to be repaired or a region to be filled in, this means inpainted. As shown in Figure 3, these are non-overlapping areas, i.e. Φ ∩ Ω = ∅, and ∂Ω stands for the boundary between the source and target areas.


Figure 3 – Illustration of the inpainting problem from the pure inpainting perspective and from the coding perspective at the decoder side.

From the pure inpainting perspective, digital inpainting consists mainly in repairing objects in images or video sequences by filling-in target areas, eventually according to some previously chosen assistant information extracted from the source areas, which may also include the area to be inpainted if repairing is the problem, or neighboring source areas if filling a hole is the problem; this task has to be performed in the most transparent way, as defined above. To reach this goal, it is crucial that, after the inpainting process takes place, the restored image still preserves global and local properties similar to those of the original image, as well as temporal consistency if video is being considered. Concluding, the pure inpainting perspective is only concerned with assuring similar perceived visual quality between the original undamaged image or video and the corresponding reconstruction in the repairing approach, and realistic filling in the filling approach.

From the coding perspective, inpainting-based coding solutions aim at taking advantage of recent digital pure inpainting breakthroughs, with the intention of exploiting and reducing data redundancy so as to obtain higher compression ratios in image and video coding. The basic idea is to avoid wasting bits sending the decoder too much information for areas which may be effectively recovered by the decoder, with or without some assistant information extracted by the encoder from the original data. After significant compression efficiency improvements over the last two decades, both image and video compression technologies seem to have arrived at a saturation point in terms of compression improvements, at least if the current coding approaches are not significantly reconsidered. In this context, the exploitation of the human visual system's characteristics seems to be a very interesting way forward. When adopting digital inpainting tools for compression purposes, the most relevant questions that need to be addressed are [4]:

• What distinctive information should be extracted from the original data with the aim of representing significant visual information for the areas to be inpainted by the decoder?

• How to reconstruct an image or a video sequence with this very same assistant information, while assuring identical perceived quality between the original and the inpainted reconstruction?

From this coding perspective, the encoder has access to the whole original data, which broadens the gamut of distinctive information that can be extracted and sent to help the decoder inpaint the areas that will be represented using this novel approach to improve the compression efficiency. In contrast, the decoder is given only partial or no assistant information about the original data, notably for the areas to be inpainted, but still has to be 'intelligent' enough to perform inpainting with or without it, at least using the decoded neighboring regions. In this context, the inpainting problem, as defined from the pure inpainting perspective, only emerges at the decoder side, meaning that the decoder is asked to 'do more intelligent work' than in traditional compression schemes, where improvements in the compression efficiency have mostly been based on significant increases of the encoding complexity.

Concluding, inpainting-based coding solutions exploit digital pure inpainting tools, eventually in combination with other coding tools, to achieve greater compression ratios when compared to traditional coding solutions, without compromising the perceived visual quality of the reconstructed image or video data.


2.2. Clustering Inpainting Tools

As for the majority of signal processing problems, the various ways to address the inpainting problem can be organized, clustered and classified depending on the technical approach, concepts and tools used. Based on the literature review made for the purpose of understanding and structuring the problem at hand, this means image and video inpainting, some clustering dimensions emerged as more relevant. While there is no single good clustering approach, having some appropriate organization for inpainting solutions helps in understanding their relationships, notably similarities and differences between available and emerging solutions. In this context, the main dimensions proposed to cluster and classify the technologies and solutions for image and video inpainting, regardless of the inpainting perspective adopted, are depicted in Figure 4.

Figure 4 – Proposed clustering for digital inpainting tools.

As shown in Figure 4, the dimensions adopted to organize and classify digital inpainting tools are:

• Type of Data – This has been considered to be the first dimension for clustering as it clearly distinguishes two related, yet different worlds: image and video. The type of data implicitly defines the nature of the redundancy to be exploited, notably spatial redundancy for image and spatial and temporal redundancies for video.

• Type of Data Modeling – This dimension regards the type of data modeling used for the data to be inpainted. As such, it allows identifying two different, yet complementary approaches that are vastly shared by the inpainting research community: geometric modeling and patch-based modeling. In this context, modeling implies a set of models and underlying assumptions which will allow digitally addressing the inpainting problem. As shown in Figure 4, these types of data modeling are the same for image and video data, reflecting the fact that they are largely agnostic to the type of data. However, for video, they might not be applicable exactly in the same way or with the same computational cost as for images, because video sequences bring out even more challenging requirements to cope with when performing inpainting.

In the following, the various branches and leaves of the inpainting tools clustering will be briefly discussed.

2.2.1. Image Inpainting Tools

In the proposed clustering, digital inpainting tools are divided into two main categories, depending on the type of data being considered, notably images and video sequences. The main differences between these two types of data are the world they live in and the amount of information to be processed. Images live in a 2D world which is spatially constrained by two coordinates, whereas video sequences live in a 3D world in the sense that they are governed by both the spatial image coordinates and time, which is considered to be the third dimension.


2.2.1.1. Geometric Modeling

Regarding image data, digital inpainting tools can be further classified using the second dimension adopted, i.e. the type of data modeling. The types of data modeling considered in image inpainting tools can be either geometric or patch-based, as shown in Figure 4.

Geometric modeling is based on the propagation of structural image properties, e.g. edges, from the source to the target area to inpaint the missing information in the image. Moreover, this type of data modeling works at pixel level and is good at restoring small defects and thin structures.

From the available set of structural image properties, edges are the most widely used features in inpainting tools, as the human visual system strongly relies on them to identify and understand the objects' attributes and their mutual associations. In this context, geometric models can be seen, to some degree, as edge-continuing models, where edge information is first extracted and then diffused inwards the hole to be inpainted.

As for structure propagation, partial differential equations (PDEs) are widely used by the inpainting research community since they allow interpolating structural image properties in a smooth fashion, achieving good results in terms of the restored image perceived visual quality and coding efficiency, despite being computationally demanding. An example of this type of data modeling is the pure inpainting solution proposed by Bertalmio et al. [1], largely adopted in the literature, which will be described in Section 2.3.

2.2.1.2. Patch-based Modeling

Patch-based modeling is considered to be the most relevant alternative to geometric modeling among the digital inpainting tools available in the literature. This is justified by the fact that the manipulation of patch-based models is straightforward, in contrast, for instance, with parametric models for which it is hard to find appropriate mathematical methods to perform inpainting.

Patch-based modeling is supported by texture synthesis procedures which are responsible for searching the most texture-compatible source fragments matching the textural information of the target pixels’ vicinity. Unlike geometric modeling, patch-based modeling does not work at pixel level, but rather at a more global level and, therefore, it is best suited for filling-in large regions. The underlying assumption is that patch-based models consider the patch as the fundamental element in the image instead of the pixels, in order to perform searching and template matching in an efficient and adequate fashion.

Regardless of the modeling adopted, both geometric and patch-based models standing alone allow achieving good results. However, to fully take advantage from the best of both worlds in terms of the restored image perceived visual quality and coding efficiency, hybrid models may be considered by the inpainting research community. This is evidenced, for example, in the solution proposed by Liu et al. [4] that will be described in Section 2.4.

2.2.2. Video Inpainting Tools

As aforementioned, although image inpainting poses challenging difficulties, video inpainting discloses even more demanding problems. First of all, the amount of data in video sequences is much greater than for images and, second, temporal consistency is a must, due to the HVS’s sensitivity to motion. Moreover, when considering videos, both spatial and temporal redundancies must be exploited; this directly impacts on the tools to be used in pure video inpainting and in inpainting-based video compression.

As shown in Figure 4, video inpainting tools can also be further classified by the second dimension adopted, notably using the same categories as for image inpainting tools. However, this does not mean that these types of modeling are applicable exactly in the same way, in the same proportion or even with the same computational cost, independently of the type of data addressed. If that was the case, video inpainting tools would strictly consider the video as a set of uncorrelated images and, in consequence, they would simply be a naïve extension of image inpainting tools, discarding the temporal correlation between frames. What happens instead is that the type of data modeling adopted by video inpainting tools needs to be adapted to cope with temporal consistency (avoiding intense flickering and a noticeable poor sense of motion), hence allowing to effectively exploit both the spatial and temporal redundancies.


2.2.2.1. Geometric Modeling

As far as geometric modeling is concerned, its extension from images to video data, i.e. from one to several images, is far from being straightforward and can be very computationally expensive [6]. Although geometric modeling typically comprises edge extraction and propagation, when dealing with video sequences, the main problem is the computational cost of the edge propagation procedure which is dramatically higher than for images, as many frames need to be inpainted. To tackle this issue, inpainting researchers try to lower the order of the PDEs used, e.g. using a Laplacian operator, or to design simpler, yet less efficient, structure propagation procedures.

2.2.2.2. Patch-based Modeling

As for images, patch-based models are frequently used to perform video inpainting as their complexity is much lower than the complexity of geometric models for video. The main difference between this approach and the equivalent one for images is that, for video, compatible source patches have to be sought over several frames, although they are likely to be found in not-so-distant frames. However, as mentioned in [7], the patch-based modeling already proposed for image inpainting should not just be simply extended to video inpainting by brute force, since just searching for the most compatible source fragment in the whole video would be too time consuming. As for image inpainting tools, combining both geometric and patch-based modeling for video data may allow inpainting areas with rather different properties.

2.3. A Relevant Pure Inpainting Solution

To enrich the reader's experience and knowledge of the image inpainting research literature, a very relevant pure inpainting solution will now be reviewed. Although the selected pure inpainting solution was designed at the beginning of the last decade, it is still largely considered to be one of the best pure image inpainting approaches ever published. This solution was designed by Bertalmio et al. [1] and corresponds to a PDE-based structure propagation algorithm. In the afore-proposed clustering of digital inpainting tools, this solution would fit under the image branch, more specifically under the geometric modeling leaf.

2.3.1. Objective and Technical Approach

The objective of this pure inpainting solution is to digitally reproduce some of the best inpainting practices used by professional restorers, notably:

• Filling-in the damaged area according to the global picture which helps preserving the unity of the artwork;

• Continuing structural properties from the source area into the damaged area which allows achieving better results in terms of the restored image perceived visual quality;

• Filling the distinct areas inside the hole with color so as to match the color distribution at the boundary of the hole which helps restoring the detail in the reconstructed image.

To fulfill the requirements above, the proposers' strategy is to automatically fill-in the regions to be restored by smoothly diffusing information (in this case, gray-values) from their surroundings. The propagation mechanism is modeled by an iterative computation of PDEs and no textural information is used for inpainting.

Unlike other approaches, the designed procedure does not require any user interaction, e.g. manual indication of source information, besides the selection of the regions to be inpainted. This inpainting algorithm can not only restore damaged images, e.g. cracks in old photographs, but can also perform object and text removal. As this is done in an automatic fashion, it is possible to fill-in several regions with distinct structural properties and backgrounds. Moreover, this inpainting solution has been designed for filling-in structured regions, e.g. regions crossing through boundaries, works for any topology of the region to be inpainted and can be applied both to color and gray-level images independently of their type, i.e. natural or synthetic. When considering color images, they are decomposed into their RGB (Red, Green and Blue) components and the still-to-be-described inpainting algorithm is applied to each one separately. Furthermore, a LUV-like color model, which is an easy-to-compute transformation of the 1931 CIE XYZ (Commission Internationale de l'Éclairage) color space (one of the first mathematically defined color spaces) that is typically used by the computer graphics research community to deal with colored lights [8], is adopted to prevent the appearance of spurious colors.

2.3.2. Architecture and Walkthrough

The authors propose a pure inpainting algorithm based on PDEs that is iteratively run to progressively shrink the area to be restored by continuing inwards the lines, i.e. the gray-values arriving at the boundary of the hole, $\partial\Omega$, as labeled in Figure 3. The algorithm is provided with a gray-level input image and creates a set of images that progressively converge to the output image, which should have a target quality similar to the undamaged version of the input image. The evolution of this inpainting procedure can be described by

$I^{n+1}(i,j) = I^{n}(i,j) + \Delta t \, I_t^{n}(i,j), \quad \forall (i,j) \in \Omega$

(2.1)

where the superscript $n$ stands for the $n$-th inpainting iteration, $(i,j)$ denotes the pixel coordinates, $\Delta t$ is the enhancement rate and $I_t^{n}(i,j)$ represents the update of the image $I^{n}(i,j)$. Therefore, the image $I^{n+1}(i,j)$ is an improved version of $I^{n}(i,j)$, with $I_t^{n}(i,j)$ being the image improvement factor.
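As an illustration, a direct NumPy transcription of the update rule (2.1) could look as follows; this is a minimal sketch, with `inpaint_update` and the boolean hole mask being assumptions rather than the authors' code:

```python
import numpy as np

def inpaint_update(I, I_t, omega, dt=0.1):
    """One iteration of Eq. (2.1): I is the current image, I_t the improvement
    factor and omega a boolean mask of the hole; pixels outside Omega are kept."""
    I_next = I.copy()
    I_next[omega] += dt * I_t[omega]
    return I_next
```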

Although an architecture has not been included in [1], for the sake of comprehension, a simplified architecture is provided in Figure 5.

Figure 5 – High-level system architecture.

A short walkthrough of the system is presented next:

• Manual Mask Selection – The original image is provided to the Manual Mask Selection module with the intention of selecting the area to inpaint. The user defines a mask (which can be easily done using software similar to Microsoft Paintbrush) identifying the regions to be inpainted; the motivation may be repairing that area or filling it in after removing some object. The output of this module is an edited version of the original image, i.e. the image to be restored, consisting of the superposition of the user-defined mask and the original image (see Figure 5). As the portions to be reconstructed depend on the subjective choice of the user, having the original image directly as the input would not be sufficient to trigger the inpainting algorithm; therefore, the algorithm's input is considered to be the edited image, now including a hole to fill in.

• Anisotropic Diffusion – The edited image is delivered to the Anisotropic Diffusion module which is responsible for pre-processing it so as to minimize the impact of noise when estimating, at the Image Inpainting module, the direction in which the gray-level information should be propagated, i.e. the propagation direction. Note that anisotropic diffusion does not blur the gray-level information while removing noise, which is essential for assuring the good perceived visual quality of the restored image.

• Image Inpainting – The already pre-processed image is provided to the Image Inpainting module to be iteratively reconstructed, which leads to the loop included in Figure 5. During this process, an anisotropic diffusion is applied every few iterations to cope with noise that may have been generated meanwhile resulting from not-so-accurate estimations of the propagation direction. The loop only ends when the target area is fully inpainted, i.e. when the hole does not exist anymore, after being successively shrunk. The output of the Image Inpainting module is the image resulting from the last iteration of the inpainting loop. In this last generated image, the area that had previously been selected to be inpainted is completely


restored. Figure 6 shows that, as expected, a better reconstruction is obtained as the number of iterations increases, until the algorithm converges.

Figure 6 – Evolution of the inpainted image with an increasing number of inpainting iterations [1].

2.3.3. Main Tools

In this section, an in-depth description of the key tools used in this pure inpainting solution will be given. In particular, this description will be focused on the tools that support the Image Inpainting and the Anisotropic Diffusion modules, notably the Image Inpainting by Structure Propagation and the Minimizing Noise through Anisotropic Diffusion tools, respectively. The order in which they are described reflects their importance in this inpainting solution: the first is the more important tool, as it is in the Image Inpainting module that almost all the 'inpainting action' takes place, whereas the second can be seen as a complementary tool that enhances the first and is, therefore, described here in less detail.

A) Image Inpainting by Structure Propagation

This tool allows propagating gray-values from the source to the target area so as to inpaint the areas that need to be restored. More specifically, this tool essentially regards the design of $I_t^{n}(i,j)$, which is intrinsically related to the gray-level information to be propagated, $L^{n}(i,j)$, and to the propagation direction, $\vec{N}^{n}(i,j)$. In this context, $I_t^{n}(i,j)$ can be written as

$I_t^{n}(i,j) = \overrightarrow{\delta L^{n}}(i,j) \cdot \vec{N}^{n}(i,j)$ (2.2)

where $\overrightarrow{\delta L^{n}}(i,j)$ is a measure of the change in $L^{n}(i,j)$. Equation (2.2) allows estimating the information $L^{n}(i,j)$ from the input image and computing its change along the direction $\vec{N}^{n}$. The design of $I_t^{n}(i,j)$ consists in the following steps, which mainly regard the computation of its various terms, notably $\overrightarrow{\delta L^{n}}(i,j)$ and $\vec{N}^{n}(i,j)$:

• Computing the Measure of Change $\overrightarrow{\delta L^{n}}(i,j)$ – The authors consider that the gray-values' propagation should be smooth so as to be compliant with the goal of this solution, this means to digitally preserve the unity and achieve good perceived visual quality of the reconstructed image; therefore, $L^{n}(i,j)$ should be an image smoothness estimator. In particular, a discrete 2D implementation of the Laplace operator is considered

$L^{n}(i,j) = I_{xx}^{n}(i,j) + I_{yy}^{n}(i,j)$ (2.3)

where the subscripts denote second order derivatives. Although other smoothness estimators could be used, good results have been obtained with this straightforward choice. In this context, the 2D smoothness estimator (2.3) is first computed, followed by $\overrightarrow{\delta L^{n}}(i,j)$, which is given by

$\overrightarrow{\delta L^{n}}(i,j) = \big( L^{n}(i+1,j) - L^{n}(i-1,j),\; L^{n}(i,j+1) - L^{n}(i,j-1) \big)$

(2.4)

• Computing the Direction of Change $\vec{N}(i,j,n)$ – For a better understanding of what computing $\vec{N}^{n}(i,j)$ means, the reader should know which propagation direction, $\vec{N}$, has been adopted to perform image inpainting. For the proposers of this inpainting solution, one possibility would have been to define this direction as the normal to the shrunk version of the hole boundary ($\partial\Omega$) for each point $(i,j)$ inside the hole, as shown in Figure 7. This choice would be motivated by the belief that it would lead to the propagation of the lines of equal luminance in the image, i.e. the isophotes, arriving at the boundary of the hole, $\partial\Omega$.

Figure 7 – Propagation direction as the normal to the shrunk version of the hole boundary [1].

However, this belief did not come true after testing. What was verified instead was that the isophotes arriving at $\partial\Omega$ tend to curve in order to align with $\vec{N}$. In this context, $\vec{N}$ was then considered to be the isophotes' direction, this means the inpainting procedure will be performed along the isophotes. Among the available options to estimate the isophotes' directions, a time varying estimation is used, notably $\vec{N}(i,j,n) = \nabla^{\perp} I^{n}(i,j)$, where $\nabla^{\perp} I^{n}(i,j)$ stands for the direction of minimal spatial change. This allows estimating in a coarser fashion at the beginning and iteratively achieving the sought continuity at $\partial\Omega$, whereas considering a time invariant propagation direction would require knowing the isophotes' directions from the start. In this context, the normalized propagation direction, $\frac{\vec{N}}{|\vec{N}|}$, is computed by

$\dfrac{\vec{N}(i,j,n)}{|\vec{N}(i,j,n)|} = \dfrac{\big( -I_y^{n}(i,j),\; I_x^{n}(i,j) \big)}{\sqrt{\big(I_x^{n}(i,j)\big)^2 + \big(I_y^{n}(i,j)\big)^2}}$

(2.5)

• Computing the Inpainted Image $I_t^{n}(i,j)$ – After the two previous steps, the change of $\overrightarrow{\delta L^{n}}$ along the normalized propagation direction, $\beta^{n}(i,j)$, is computed by

$\beta^{n}(i,j) = \overrightarrow{\delta L^{n}}(i,j) \cdot \dfrac{\vec{N}(i,j,n)}{|\vec{N}(i,j,n)|}$ (2.6)

which is then multiplied by a slope-limited version of the norm of the image gradient, $|\nabla I^{n}(i,j)|$ (note that $|\nabla I| = |\nabla^{\perp} I|$), given by

$|\nabla I^{n}(i,j)| = \begin{cases} \sqrt{(I_{xbm}^{n})^2 + (I_{xfM}^{n})^2 + (I_{ybm}^{n})^2 + (I_{yfM}^{n})^2}, & \text{when } \beta^{n} > 0 \\ \sqrt{(I_{xbM}^{n})^2 + (I_{xfm}^{n})^2 + (I_{ybM}^{n})^2 + (I_{yfm}^{n})^2}, & \text{when } \beta^{n} < 0 \end{cases}$

(2.7)

where the subindexes $m$ and $M$ stand, respectively, for the minimum and the maximum between the derivative and zero, while $b$ and $f$ denote backward and forward differences, respectively. The slope-limiters are used to assure the stability of the algorithm, instead of computing central differences. Concluding, the computation of the inpainted $I_t^{n}(i,j)$ is given by

$I_t^{n}(i,j) = \left[ \overrightarrow{\delta L^{n}}(i,j) \cdot \dfrac{\vec{N}(i,j,n)}{|\vec{N}(i,j,n)|} \right] |\nabla I^{n}(i,j)|$

(2.8)

Conceptually, equations (2.1)-(2.8) are computed in $\Omega^{\epsilon}$, i.e. a dilation of $\Omega$ with a ball of radius $\epsilon$, and the information of both the gray-levels and the isophotes' directions is propagated from the band $\Omega^{\epsilon} - \Omega$ towards the hole, $\Omega$. However, these values are only updated inside $\Omega$, meaning that (2.1) is applied only inside the hole, expressing the fact that only the hole will be iteratively shrunk while the rest of the input image is preserved.
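Putting the reconstructed equations (2.2)-(2.8) together, a single structure propagation evaluation could be sketched in Python/NumPy as below. This is a minimal sketch under stated assumptions: np.roll wraps around at the image border (the original work handles boundaries more carefully) and a small epsilon guards the division; it is not the authors' implementation.

```python
import numpy as np

def structure_propagation_step(I, omega, eps=1e-8):
    """Compute the improvement factor I_t of Eq. (2.8) for image I (2D float
    array), zeroed outside the hole mask omega as dictated by Eq. (2.1)."""
    # Eq. (2.3): discrete Laplacian L = I_xx + I_yy
    L = (np.roll(I, -1, 0) + np.roll(I, 1, 0) +
         np.roll(I, -1, 1) + np.roll(I, 1, 1) - 4.0 * I)
    # Eq. (2.4): change of L along x and y (central differences)
    dLx = np.roll(L, -1, 1) - np.roll(L, 1, 1)
    dLy = np.roll(L, -1, 0) - np.roll(L, 1, 0)
    # Eq. (2.5): normalized isophote direction N/|N| = (-I_y, I_x)/|grad I|
    Ix = 0.5 * (np.roll(I, -1, 1) - np.roll(I, 1, 1))
    Iy = 0.5 * (np.roll(I, -1, 0) - np.roll(I, 1, 0))
    norm = np.sqrt(Ix ** 2 + Iy ** 2) + eps
    Nx, Ny = -Iy / norm, Ix / norm
    # Eq. (2.6): projection of (dLx, dLy) on the isophote direction
    beta = dLx * Nx + dLy * Ny
    # Eq. (2.7): slope-limited norm of the image gradient (upwind scheme)
    Ixb, Ixf = I - np.roll(I, 1, 1), np.roll(I, -1, 1) - I
    Iyb, Iyf = I - np.roll(I, 1, 0), np.roll(I, -1, 0) - I
    mn, mx = np.minimum, np.maximum
    g_pos = np.sqrt(mn(Ixb, 0)**2 + mx(Ixf, 0)**2 + mn(Iyb, 0)**2 + mx(Iyf, 0)**2)
    g_neg = np.sqrt(mx(Ixb, 0)**2 + mn(Ixf, 0)**2 + mx(Iyb, 0)**2 + mn(Iyf, 0)**2)
    grad = np.where(beta > 0, g_pos, g_neg)
    # Eq. (2.8): improvement factor, applied only inside Omega
    I_t = beta * grad
    I_t[~omega] = 0.0
    return I_t
```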

B) Minimizing Noise through Anisotropic Diffusion

The purpose of this tool is to minimize the impact of noise when inpainting the image to be restored, so as to improve the results in terms of the restored image perceived visual quality. In this context, this tool is used by two architecture modules, notably the Anisotropic Diffusion and Image Inpainting modules. On the one hand, when used by the Anisotropic Diffusion module, this tool can be seen as a pre-processing tool, as it is used to remove noise inherent to the provided input image; this means the anisotropic diffusion is applied to the entire input image. On the other hand, when used by the Image Inpainting module, this tool allows curving the isophotes to prevent them from crossing each other, which allows computing more accurate estimations of the propagation direction; this is done only inside $\Omega$, instead of considering the entire image.

As this tool is much less relevant than the Image Inpainting by Structure Propagation tool, the authors only mention one processing step related to the computation of the anisotropic diffusion equation, which will be briefly described next:

• Computing Anisotropic Diffusion – Apparently, the difference between applying anisotropic diffusion at the Image Inpainting and at the Anisotropic Diffusion modules lies in the area where the anisotropic diffusion equation is computed and in the number of diffusion iterations applied. When used by the Anisotropic Diffusion module, this equation is computed for all pixels of the input image and only one diffusion iteration seems to be applied. When used by the Image Inpainting module (the more relevant case), every few iterations until the algorithm converges, the image generated by the Image Inpainting module, $I$, undergoes the anisotropic diffusion method proposed in [9][10], allowing to smooth the information arriving at the boundary of the hole without losing sharpness in the reconstruction. More specifically, a straightforward discrete equivalent of the following continuous-time/continuous-space anisotropic diffusion equation is used

$\dfrac{\partial I}{\partial t}(x,y,t) = g_{\epsilon}(x,y)\, \kappa(x,y,t)\, |\nabla I(x,y,t)|, \quad \forall (x,y) \in \Omega^{\epsilon}$

(2.9)

where $\kappa$ is the Euclidean curvature of the isophotes of $I$ and $g_{\epsilon}(x,y)$ is a smooth function imposing the Dirichlet boundary conditions ($g_{\epsilon}(x,y) = 0$ in $\partial\Omega^{\epsilon}$ and $g_{\epsilon}(x,y) = 1$ in $\Omega$). For more details on the discrete implementation used to perform anisotropic diffusion, please see [9][10].
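A straightforward central-differences discretization of (2.9) could be sketched as follows; this is an illustrative assumption, not the exact discrete scheme of [9][10]. The array g_eps plays the role of $g_{\epsilon}$ (0 on $\partial\Omega^{\epsilon}$, 1 in $\Omega$):

```python
import numpy as np

def diffusion_step(I, g_eps, dt=0.1, eps=1e-8):
    """One discrete anisotropic diffusion step of Eq. (2.9):
    dI/dt = g_eps * kappa * |grad I|, with kappa = div(grad I / |grad I|)."""
    Ix = 0.5 * (np.roll(I, -1, 1) - np.roll(I, 1, 1))
    Iy = 0.5 * (np.roll(I, -1, 0) - np.roll(I, 1, 0))
    grad = np.sqrt(Ix ** 2 + Iy ** 2)
    nx, ny = Ix / (grad + eps), Iy / (grad + eps)
    # isophote curvature as the divergence of the unit gradient field
    kappa = (0.5 * (np.roll(nx, -1, 1) - np.roll(nx, 1, 1)) +
             0.5 * (np.roll(ny, -1, 0) - np.roll(ny, 1, 0)))
    return I + dt * g_eps * kappa * grad
```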

In the inpainting loop, a periodic interleaving of $A$ inpainting steps using (2.1) and $B$ diffusion steps using (2.9) takes place, totalizing a number of $T$ steps which depends on the size of $\Omega$. This total may be defined beforehand, or the algorithm may stop the inpainting process when the changes in the image fall below a specified threshold.
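Combining the three sketches above, the interleaved schedule could then read as follows (A = 15, B = 2 and Δt = 0.1 being the settings reported in the test conditions below; T is the total number of steps and all helper names are assumptions):

```python
def inpaint(I, omega, g_eps, A=15, B=2, T=3000, dt=0.1):
    """Periodic interleaving of A inpainting steps (2.1)/(2.8) and B
    anisotropic diffusion steps (2.9), for T steps in total."""
    for step in range(T):
        if step % (A + B) < A:
            I = inpaint_update(I, structure_propagation_step(I, omega), omega, dt)
        else:
            I = diffusion_step(I, g_eps, dt)
    return I
```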

2.3.4. Performance Evaluation

The performance of the proposed algorithm has only been evaluated regarding the visual quality perceived by the observer, as this work intends to digitally reproduce the effect of the manual inpainting techniques used by professional restorers. Therefore, no objective quality or error metrics, such as the PSNR or the Mean Squared Error (MSE), have been considered. This may not be a great loss, as these widely used metrics have been shown to be misleading and sometimes incoherent when used for assessing the subjective impact of restored images.


Test Conditions

The input images include both gray-level and color images; they are available in [1] and include, among others, vandalized (severely scratched) images, images superimposed with text and old deteriorated photographs (with cracks). The iteration process parameters are set to $A = 15$ and $B = 2$, with the speed set to $\Delta t = 0.1$, whereas the total number of iterations, $T$, may vary depending on the size of the hole to be inpainted.

Results and Analysis

All the considered input color images took less than 5 minutes to inpaint using non-optimized C++ code, running under Linux on a 300 MHz Pentium II PC with 128 MB of RAM.

As shown in Figure 8 and Figure 9, the algorithm performs well both in scratch and text removal, as a common observer would not be able to notice that the output image had been restored by image inpainting. These two cases are typical examples where a texture synthesis approach cannot be used, since there are too many different regions to be inpainted [1].

Figure 8 – Digital restoration of a vandalized color image: (a) input image; (b) restored image [1].

Figure 9 – Text removal through image inpainting: (a) input image (superposition of the original image and the text in red); (b) restored image [1].

Regarding restoration, Figure 10 (a) shows the original deteriorated (with cracks) gray-level photograph, Figure 10 (b) shows (in red) the user-defined mask limiting the area to be inpainted and, finally, Figure 10 (c) shows the restored image. Here, the number of iterations was set to 3000 and two schemes were tested: single-resolution and multi-resolution inpainting. When the size of the hole is relatively large, a multi-resolution approach is used to speed up the inpainting procedure: as classically done in image processing, a converged result at a lower resolution is used to initialize the process at a higher resolution. Using single-resolution inpainting, it took approximately 7 minutes to perform the restoration, whereas only 2 minutes were needed when a 2-level multi-resolution scheme was adopted. Despite the overall visual impact being good, the nose and the right eye of the middle girl have not been successfully restored. This is somewhat justified by the fact that the algorithm is not sensitive to high-level information, e.g. eyes must be preserved, and takes into account only the mask that has been provided by the user to limit the areas where inpainting should take place.


Figure 10 – Restoration of an old gray-level photograph: (a) original deteriorated image; (b) user-defined mask (in red); (c) restored image [1].

Strengths and Weaknesses

The algorithm described in this section is suited for a wide gamut of applications, from the restoration of old photographs to object and text removal. The idea of smoothly propagating structural information from the source area into the hole has shown, in several different scenarios, that the perceived visual quality tends to be high, although limited results are obtained when dealing with large textured regions. Another positive aspect is that the inpainting procedure is almost entirely automatic, as only a user-defined mask is provided as input, and it takes only a few minutes to generate the output image. Moreover, the results can not only be seen as an end point, but can also be taken as the first step for manual restoration, saving a huge amount of working hours for professional restorers.

On the other hand, the major shortcomings of this algorithm are related to the high complexity of a PDE-based propagation method and to the fact that texture is not always well reproduced. This last drawback is particularly relevant when inpainting needs to be performed on very complex images with several different textural properties.

2.4. A Relevant Inpainting-based Image Coding Solution

Inspired by progressive breakthroughs in inpainting technology, researchers in image coding have recently tried more efficient ways of exploiting the visual redundancy in images and video by incorporating some inpainting tools in their coding frameworks. In other words, the main idea behind this coding paradigm is to achieve higher compression ratios by coding and transmitting only the essential information, using inpainting procedures to assist the decoder in reconstructing the non-transmitted data without negatively impacting the perceived visual quality.

As mentioned in [4], it is important to observe that the selection of the assistant information directly impacts on the compression efficiency and on the reconstructed image perceived visual quality. Moreover, since the original image is always available at the encoder, a wide range of features may be used as assistant information, e.g. edges, semantic objects or even texture. In this context, a relevant inpainting-based image coding solution will be described in the following; this solution has been reviewed due to its conceptual richness and efficiency in addressing the problem at hand in this Thesis: inpainting-based image coding.

The inpainting-based image coding solution to be described was proposed by Liu et al. [4] and aims at designing an image compression framework which takes advantage of both structure propagation and texture synthesis methods to restore, at the decoder side, regions that have been removed at the encoder side, thus improving the coding efficiency in comparison with off-the-shelf coding standards. Considering the proposed clustering of digital inpainting tools, this solution would fit under the image branch and would be considered a hybrid solution, as it combines both types of data modeling.


2.4.1. Objective and Technical Approach

The objective of this inpainting-based image coding solution is to design a fully automatic image compression framework which aims at significantly reducing the visual redundancy inherent to natural images while achieving good restored image perceived visual quality. In this context, some distinctive features are extracted from the originals at the encoder side, which help selecting the regions to be and not to be inpainted, therefore allowing the system to choose, for each region, the most suitable coding approach from those available. In the proposed solution, edges have been chosen as the features to be extracted, since the human visual system relies on them to identify and interpret the objects' attributes and their mutual associations; thus, it is expected that their inclusion in this inpainting-based image coding solution will positively impact the restored image perceived visual quality. Furthermore, the performance of this solution has been assessed having baseline JPEG and H.264/AVC Intra as benchmarks and, at similar visual quality levels, bit savings of up to 44% and 33%, respectively, have been shown.

2.4.2. Architecture and Walkthrough

For better understanding the proposed system architecture (see Figure 11), some intuition about inpainting will be provided along both the encoder and decoder walkthroughs. Furthermore, an example walkthrough is illustrated in Figure 12 to ease the reader’s understanding.

Figure 11 – System architecture [4].

Encoder Walkthrough

This system has only one input, which consists of natural color images selected from the University of Southern California Signal & Image Processing Institute (USC-SIPI) database [11] and from the Kodak Image Library [12]. The images selected from the USC-SIPI database and from the Kodak Image Library have 512×512 and 768×512 spatial resolutions, respectively.

• Image Analysis – In this module, the original image (see Figure 12 (a)) is first analyzed with the goal of distinguishing two types of regions:

• Removed regions, corresponding to regions that can be recovered by image inpainting tools at the decoder side eventually under the guidance of distinctive image features, i.e. assistant information, that have been extracted from the original image at the encoder side; therefore, the Removed regions are to be inpainted;

• Exemplar regions, corresponding to regions which cannot be recovered by inpainting tools with the desired target quality or which are not inpainted due to other requirements; therefore, the Exemplar regions are not to be inpainted.

With this purpose, an edge-map is extracted from the original image (indicated by blue curves in Figure 12 (b)), based on which this region distinction, i.e. the exemplar selection, is made. Moreover, this edge-map will allow defining which blocks are considered to be textural or structural. For better understanding, the exemplar selection process is illustrated in Figure 12 (c), where the Removed regions are marked in black and the Exemplar regions (a.k.a. exemplars) keep the colors of the original image.

• Assistant Info Encoder – Besides illustrating both the exemplar and removed regions, Figure 12 (c) shows a sub-set of the edge-map (not the whole one as in Figure 12 (b)) that had previously been extracted at the Image Analysis module. This sub-set is the only assistant information to be coded by the Assistant Info Encoder, which compresses it using a bi-level information coding standard, notably JBIG (Joint Bi-level Image Experts Group).

• Exemplar Encoder – This module is responsible for coding the exemplars selected by the Image Analysis module. This encoder is considered to be standard-based, notably using either JPEG or H.264/AVC Intra coding solutions.

After reading [4], one concludes that the information coded by both the Assistant Info Encoder and the Exemplar Encoder modules is afterwards bundled together and transmitted as a single bitstream; this means that a multiplexer module is missing in Figure 11 between the encoders and the channel.

Decoder Walkthrough

As for the encoder side, a module also seems to be missing in the proposed architecture at the decoder side, which would be responsible for demultiplexing the meanwhile transmitted bitstream. The outputs of this module would be two different streams, given to the Assistant Info Decoder and to the Exemplar Decoder. Considering Figure 11, it may be concluded that the authors probably meant to graphically express that the Assistant Info Decoder and the Exemplar Decoder are given different coded information; however, the way this is expressed may be misleading, which justifies the inclusion of this remark at this stage.

• Assistant Info Decoder – The compressed assistant information which was meanwhile transmitted over the channel, i.e. the edge-map sub-set, is decoded by this module, which naturally has to be standard-compliant with the Assistant Info Encoder; otherwise, the decoding would not be possible.

• Exemplar Decoder – This module is responsible for decoding the exemplars that were coded by the Exemplar Encoder module and have been meanwhile transmitted over the channel. As for the Assistant Info Decoder, this module also has to be standard-compliant with the corresponding encoder for the decoding to be performed correctly.

• Assisted Image Inpainting – Both the decoded assistant and exemplar information are the inputs to this module, which is responsible for recovering the regions that have been removed at the encoder side, notably at the Image Analysis module. In particular, the transmitted assistant edge information, i.e. the edge-map sub-set, guides the structure propagation procedure that allows recovering the structural blocks (see example results in Figure 12 (d)), whereas the remaining blocks, i.e. the textural blocks, are recovered by texture synthesis.

• Recombination – At last, as shown in Figure 11, the regions that have just been inpainted by the Assisted Image Inpainting module are combined with the Exemplar regions that have been decoded by the Exemplar Decoder, so that the Recombination module is able to generate the output image, i.e. the reconstructed image (see Figure 12 (e)). Comparing Figure 12 (e) with Figure 12 (f) (which is the image decoded by the benchmark coding solution), one may conclude that the common observer would not be able to perceive significant differences in terms of visual quality; this is particularly remarkable considering that a bit saving of 20.1% has been achieved for this particular test image, having JPEG as the benchmark.


Figure 12 – Example walkthrough for the Lena test image: (a) original image; (b) extracted edge-map (blue curves); (c) removed blocks (in black) and edge-map sub-set (blue curves); (d) recovered structural blocks after structure propagation; (e) output image after texture synthesis and recombination; (f) decoded image by the benchmark (baseline JPEG in this case) [4].

2.4.3. Main Tools

In this section, the main tools used in the proposed architecture modules will be described in detail. In particular, this description will target the tools that support the Image Analysis and the Assisted Image Inpainting modules, notably the Image Analysis by Edge Extraction and Exemplar Selection and the Edge-assisted Image Inpainting tools, respectively. This choice has been made because it is in these modules that most of the 'inpainting-related action' takes place. Conversely, addressing the remaining architecture modules would not acquaint the reader with the novelties introduced by this solution, as the Exemplar Encoder, the Assistant Info Encoder and their corresponding decoders are off-the-shelf coding solutions and, therefore, have not been designed specifically for inpainting purposes. Moreover, the Recombination module will also not be described, as it is responsible only for putting together the decoded exemplars and the inpainted regions; its function is not directly related to inpainting or compression, but rather to assembling the whole output image.

A) Image Analysis by Edge Extraction and Exemplar Selection

This encoder side tool allows analyzing the original image so that the exemplar selection is performed, i.e. the selection of the regions that are and are not to be inpainted. For each of these regions, the best coding approach is then chosen from those available. In this context, this tool comprises essentially two stages, notably Edge Extraction and Exemplar Selection, for which several processing steps will be identified and described in the following.

Edge Extraction Stage

The first stage refers to the extraction of edge information from the original image which, on the one hand, will assist the encoder in defining the exemplar and removed regions and, on the other, will ease the decoder’s task when inpainting the Removed regions; in particular, the authors adopt the topology-based edge detector [13] to perform edge extraction. The most important processing steps for the Edge Extraction stage will be described in the following:

• Gaussian Filtering – First, the original image, here described by $F(x)$, $x \in S$, where $S$ is a square region in $\mathbb{R}^2$, goes through a two-dimensional isotropic Gaussian filter so that the noise inherent to natural images is removed.

• Edge Thresholding – Second, the normalized image gradient, $|\nabla F(x)|$, and its direction field, $d$, are computed for each pixel $x$ of the resulting image. Then, a thresholding step takes place: if $|\nabla F(x)|$ is the local maximum gradient along the direction $d$ and larger than a given threshold, the pixel $x$ is considered to be part of an edge. Finally, the pixels with non-maximum gradients are verified by spatially-adapted thresholds, which prevent edges from going undetected due to incorrect estimations of $d$.

• Edge-thinning – As for some edge detectors available in the literature, this particular method detects edges that are often more than one pixel wide, which may pose additional difficulties to the decoder when inpainting the Removed regions. Moreover, this also means that there is redundant information in the extracted edge-map which, if promptly coded, would cost a larger number of bits. In this context, after the Edge Thresholding processing step, the edge pixels are thinned using an edge-thinning method designed by the authors. The edge-thinning method proposed in [13] could have been used, as it is part of the topology-based edge detector already described; however, it would not satisfy a requirement of this inpainting-based image coding solution, which is that pixel values on edges are not to be coded but rather inferred from connected surrounding edges [4].

The designed edge-thinning method, which takes into account the consistency of pixel values on edges and the edge smoothness, is provided with the already detected edge pixels, which are then grouped into 8-connected links (to be afterwards thinned separately) with the goal of finding a one-pixel-width line composed of $N$ pixels, this means $F(x_n)$, $n = 1, 2, \ldots, N$, such that the following cost function is minimized

$E = \alpha \sum_{n=1}^{N} \|\Delta F(x_n)\|^{2} + \beta \sum_{n=1}^{N} \sum_{m=1}^{N} d\big(F(x_n), F(x_m)\big) + \gamma \sum_{n=1}^{N} \|\kappa(x_n)\|^{\eta}$

(2.10)

where $\|\Delta F(x_n)\|$ is the normalized Laplacian for each edge pixel, $d\big(F(x_n), F(x_m)\big)$ is a constraint on edge pixel values, $\|\kappa(x_n)\|$ is the normalized curvature of the edge at each pixel and, finally, $\alpha$, $\beta$, $\gamma$ and $\eta$ are positive weighting factors. In particular, the constraint on the edge pixel values expresses the difference only among the eight neighboring pixels and is given by

$d\big(F(x_n), F(x_m)\big) = \begin{cases} |F(x_n) - F(x_m)|, & \text{if } x_m \in N_8(x_n) \\ 0, & \text{otherwise} \end{cases}$

(2.11)

where $N_8(x_n)$ is the 8-neighborhood of $x_n$. Moreover, $\kappa(x_n)$ is computed by

$\kappa(x_n) = \operatorname{div}\left( \dfrac{\nabla F(x_n)}{\|\nabla F(x_n)\|} \right)$

(2.12)

Note that, given a starting point for each edge-link, several paths with smaller energies (computed by (2.10)) are stored in a dynamic programming algorithm. Thus, each stored path is extended by adding the neighbor which leads to a minimal energy solution.

Concluding, this stage is mainly responsible for filtering the original image, detecting its edges and, afterwards, removing edge redundancy by edge-thinning.
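To make the reconstructed cost (2.10)-(2.12) concrete, the sketch below evaluates it for one candidate one-pixel-width path; the discrete Laplacian/curvature helpers and their normalization are assumptions, since the original papers do not fully specify them here:

```python
import numpy as np

def _laplacian(F):
    # discrete Laplacian (an assumed stand-in for the normalized one)
    return (np.roll(F, -1, 0) + np.roll(F, 1, 0) +
            np.roll(F, -1, 1) + np.roll(F, 1, 1) - 4.0 * F)

def _curvature(F, eps=1e-8):
    # Eq. (2.12): kappa = div(grad F / |grad F|), central differences
    Fx = 0.5 * (np.roll(F, -1, 1) - np.roll(F, 1, 1))
    Fy = 0.5 * (np.roll(F, -1, 0) - np.roll(F, 1, 0))
    g = np.sqrt(Fx ** 2 + Fy ** 2) + eps
    return (0.5 * (np.roll(Fx / g, -1, 1) - np.roll(Fx / g, 1, 1)) +
            0.5 * (np.roll(Fy / g, -1, 0) - np.roll(Fy / g, 1, 0)))

def path_cost(F, path, alpha=1.0, beta=0.2, gamma=0.2, eta=1.0):
    """Cost of Eq. (2.10) for a candidate path: a list of (row, col) pixels."""
    lap, kap = _laplacian(F), _curvature(F)
    smooth = sum(abs(lap[p]) ** 2 for p in path)
    # Eq. (2.11): pixel-value consistency counted only among 8-neighbors
    consist = sum(abs(F[p] - F[q]) for p in path for q in path
                  if p != q and max(abs(p[0] - q[0]), abs(p[1] - q[1])) == 1)
    curve = sum(abs(kap[p]) ** eta for p in path)
    return alpha * smooth + beta * consist + gamma * curve
```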

Exemplar Selection Stage

The second and last stage of this tool regards the selection of both the exemplar and the removed regions which, for simplicity, is performed at block level, based on the already thinned edges resulting from the Edge Extraction stage. In this stage, the input image is first divided into 8×8 non-overlapping blocks and each one of them is classified either as structural or textural, according to the distance between the block pixels and the available thinned edges. More specifically, if more than 25% of the pixels in a block are close to edges (e.g. five pixels away at the most), then the block is considered to be a structural block; otherwise, it is textural. When all blocks have been classified, distinct procedures are used to select the exemplars for textural and structural blocks, notably Textural Exemplar Selection and Structural Exemplar Selection.
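A sketch of this classification rule, assuming a boolean thinned-edge map and using SciPy's Euclidean distance transform (block size 8, distance 5 and fraction 25% as quoted above; the function name is illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def classify_blocks(edge_map, block=8, dist=5, frac=0.25):
    """Classify 8x8 non-overlapping blocks as structural (True) or textural
    (False): structural when more than 25% of the block pixels lie within
    5 pixels of a thinned edge."""
    d = distance_transform_edt(~edge_map)   # distance to the nearest edge pixel
    close = d <= dist
    H, W = edge_map.shape
    labels = np.zeros((H // block, W // block), dtype=bool)
    for bi in range(H // block):
        for bj in range(W // block):
            patch = close[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            labels[bi, bj] = patch.mean() > frac
    return labels
```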

Page 40: Inpainting-based Image Coding: A Patch-driven Approach · image compression efficiency for the various, relevant target qualities. In this context, better exploiting the Human Visual

22

Regardless of the blocks being textural or structural, the exemplars will be selected in two subsequent processing steps, which address the selection of the necessary and of the additional exemplars, according to their impact both on visual fidelity and on image inpainting. Typically, an image cannot be restored to the target quality without the necessary exemplar blocks, whereas the additional exemplar blocks help improving its perceived visual quality and ease the decoder's task when inpainting the Removed regions at the Assisted Image Inpainting module. The blocks which are not selected as necessary or additional exemplars will be removed, while the exemplars will be coded by an off-the-shelf coding solution.

The most important processing steps for the Exemplar Selection stage will be described next:

• Textural Exemplar Selection – Given the thinned edges, the textural blocks are processed so that the exemplars are selected. With this purpose, first, the necessary textural exemplar blocks are selected, followed by the additional textural exemplar blocks; the remaining textural blocks, i.e. the non-exemplars, will be removed. The selection of both the necessary and the additional textural blocks is described in the following (a sketch implementing both textural criteria is given after the whole selection list):

• Selecting Necessary Textural Exemplar Blocks – The necessary textural exemplar blocks are selected at the border of structural regions. This is justified by the fact that, in these regions, there may be important information about the transitions between different textures. In particular, if a textural block is next to a structural one, either along the vertical or the horizontal direction, then it is classified as a necessary textural exemplar block, as illustrated in Figure 13.

Figure 13 – Illustration of the selection of necessary textural exemplar blocks (dark gray lines denote the thinned edges, light gray blocks denote structural blocks, white and black denote necessary and non-necessary textural blocks, respectively) [4].

• Selecting Additional Textural Exemplar Blocks – The additional textural exemplar blocks may be progressively selected to enhance the detail of the restored image, as they aim to represent local textural variations. On the one hand, the authors state that, if a block contains obvious changes, it should be preserved in advance; on the other, removing large-scale regions is not recommended, as textural variation is a local image property. Hence, for each non-necessary textural block, $B_k$, a variation parameter, $V_k$, which expresses the texture variation among its 4-neighbors, is computed by

$V_k = \omega_1 \operatorname{var}(B_k) + \omega_2 \sum_{B_l \in N_4(B_k)} \big| \mu(B_k) - \mu(B_l) \big|$

(2.13)

where $\omega_1$ and $\omega_2$ are positive weighting factors, $N_4$ stands for the 4-neighborhood of $B_k$ and the functions $\operatorname{var}(\cdot)$ and $\mu(\cdot)$ are the variance and the mean of the pixel values in a block, respectively. In this context, according to an additional block ratio, which is set according to the image being tested, the blocks with the highest variation parameters will be selected as additional textural exemplar blocks.

The blocks that are neither selected as necessary nor as additional textural exemplar blocks are removed, only to be recovered at the Assisted Image Inpainting module.

• Structural Exemplar Selection – As for the Textural Exemplar Selection, the structural blocks are also processed so that the exemplars are selected. Analogously, they are processed in two subsequent steps, in which the necessary and the additional structural exemplar blocks are selected:


• Selecting Necessary Structural Exemplar Blocks – The thinned edges are typically the boundaries of different regions. Therefore, to assure good restored image perceived visual quality, they are categorized into four types, notably: “isolated”, if an edge pixel is only connected with one edge pixel; “branch”, if an edge pixel is connected to more than three edge pixels, i.e. a conjunction; “bridge”, if an edge pixel is connected with two conjunctions and, finally, “circle”, if the edge gives a loop trace. For better understanding, these types of edges have been illustrated in Figure 14.

Figure 14 – Illustration of the selection of necessary structural exemplar blocks (light gray lines are thinned edges, white and black blocks denote necessary and non-necessary structural blocks, respectively) [4].

As shown in Figure 14, the selected necessary structural exemplar blocks are: for the "isolated" edges, their neighboring blocks, to allow restoring, at the decoder side, the transitions of textural regions; for "branch" and "bridge" edges, the conjunction blocks (for a similar reason); for "circle" edges, the two blocks which contain the most pixels belonging to the inner and outer regions. Concluding, the selected necessary structural exemplar blocks provide information about the transitions between textures; testing has shown that, with this information, the decoder is capable of effectively restoring the structural regions.

• Selecting Additional Structural Exemplar Blocks – As for the selection of the additional textural exemplar blocks, equation (2.13) is computed for each non-necessary structural block. However, for structural blocks, the different image partitions defined by the edges are independently considered when computing the mean and the variance of the pixel values in a block, and the resulting variation parameters are added together to obtain the total variation parameter of the block. As for the textural case, the selection of additional structural exemplars also depends on an additional block ratio which is set according to the image being tested.
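As announced above, the following sketch illustrates both textural selection criteria on a block grid: the vertical/horizontal adjacency rule for necessary exemplars and the variation parameter (2.13) for additional ones. It is a simplified illustration (np.roll wraps around at the grid border, and the structural variant with edge-defined partitions is not implemented):

```python
import numpy as np

def necessary_textural(structural):
    """A textural block becomes a necessary exemplar when a structural block
    is its vertical or horizontal neighbor (structural: boolean block grid)."""
    s = structural
    nbr = (np.roll(s, 1, 0) | np.roll(s, -1, 0) |
           np.roll(s, 1, 1) | np.roll(s, -1, 1))
    return nbr & ~s

def variation(img, grid_shape, block=8, w1=1.0, w2=1.0):
    """Eq. (2.13): V_k = w1*var(B_k) + w2*sum over 4-neighbors of
    |mu(B_k) - mu(B_l)|, evaluated for every block of the grid."""
    H, W = grid_shape
    mean = np.empty((H, W)); var = np.empty((H, W))
    for bi in range(H):
        for bj in range(W):
            patch = img[bi*block:(bi+1)*block, bj*block:(bj+1)*block]
            mean[bi, bj], var[bi, bj] = patch.mean(), patch.var()
    V = w1 * var
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-neighborhood
        V += w2 * np.abs(mean - np.roll(mean, (di, dj), axis=(0, 1)))
    return V   # the blocks with the highest V_k become additional exemplars
```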

B) Edge-assisted Image Inpainting

This decoder side tool, which is used by the Assisted Image Inpainting module, allows recovering the Removed regions using a hybrid data modeling. Moreover, the inpainting algorithm has been designed to deal with arbitrarily-shaped regions, in contrast to the block-wise approach adopted at the encoder side. This tool comprises essentially two stages, notably the Pixel-wise Structure Propagation and Texture Synthesis stages, for which several processing steps will be identified and described based on Figure 15. Note that both stages are guided by a confidence map [14] to prioritize the regions to be restored.

Figure 15 – Proposed pixel-wise structure propagation method: (a) edge and its influencing region (arrowed dash and dash-dot lines are the propagation directions); (b) restoration of the influencing region [4].

Page 42: Inpainting-based Image Coding: A Patch-driven Approach · image compression efficiency for the various, relevant target qualities. In this context, better exploiting the Human Visual

24

Pixel-wise Structure Propagation Stage

In the proposed solution, the structure propagation method is performed at the pixel level, i.e. pixel-wise, since the edges may have different geometric shapes and are only one pixel wide. Moreover, in this stage, the basic unit is the edge and its influencing region (denoted in Figure 15 (a) by the dashed region), i.e. the region containing the close-to-the-edge (e.g. ten pixels away at the most) unknown pixels, instead of a structural block. In this context, the most relevant processing steps for this stage will be described next:

• Restoring Unknown Edge Pixels – To restore this basic unit, the first step is to generate the unknown pixels in an edge from the known ones (denoted in Figure 15 (a) by black and white points, respectively) through linear interpolation.

• Restoring the Edge Influencing Region – After restoring an edge, the neighboring structure and texture inside the influencing region are filled in. To do so, each pixel in the influencing region will have two candidates: a structure candidate, which is propagated along the edge, and a texture candidate, which is generated from the neighboring available pixels (denoted by S-candidate and T-candidate in Figure 15 (b), respectively). In this context, these candidates are generated by a pair matching approach based on a weighted SSD (Sum of Squared Differences) criterion, which searches among the 8-neighbors for the most similar pixel value to fill in a given unknown pixel. Typically, if an unknown pixel is close to an edge, the structure candidate will be chosen to fill in the pixel; otherwise, the texture candidate will be chosen. This means that the candidate allowing the smoothest transition from structure to texture is used to fill in the unknown pixel.

After one influencing region pixel has been filled in, these steps are repeated until no unknown pixels remain in the edge influencing region.
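As a concrete illustration of the matching criterion, the following is a minimal MATLAB sketch of a weighted SSD comparison between two neighborhoods, assuming a per-pixel weight map and a mask flagging the already-known pixels; the function name and interface are illustrative assumptions, not the authors’ actual code.

```matlab
% Minimal sketch of a weighted SSD comparison between two neighborhoods,
% as used by pair matching approaches to rank candidate pixels; w is a
% per-pixel weight map and mask flags the pixels that are already known.
% The function name and interface are illustrative assumptions.
function d = weightedSSD(nbhdA, nbhdB, w, mask)
    diff2 = (double(nbhdA) - double(nbhdB)).^2;  % squared differences
    d = sum(w(mask) .* diff2(mask));             % weighted sum over known pixels
end
```

The candidate (structure or texture) yielding the smallest weighted SSD against the unknown pixel’s known surroundings would then be selected to fill it in.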

Texture Synthesis Stage

• Texture Synthesis – The remaining unknown pixels, i.e. those outside the influencing regions, are filled in by a patch-based texture synthesis approach [14] which, based on an SSD criterion, searches within a given range for the most texture-compatible source patch, i.e. a patch whose pixels are all known. The selected source patch is then used to fill in the target patch (which comprises the unknown pixels within a given patch size), and a graph cut algorithm performs the matching between them, ensuring the least visible textural mismatches possible.
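To make the patch matching step concrete, the following is a minimal MATLAB sketch of an SSD-based source patch search, assuming a grayscale image and a logical mask of known pixels; the exhaustive scan shown here stands in for the search-range-limited matching described in [14], and all identifiers are illustrative assumptions rather than the authors’ implementation.

```matlab
% Minimal sketch of an SSD-based source patch search for texture
% synthesis; img is a grayscale image and known is a logical mask of
% available pixels. Identifiers are illustrative assumptions.
function [bestR, bestC] = findSourcePatch(img, known, tr, tc, P)
    % (tr, tc): top-left corner of the P-by-P target patch to fill
    target = double(img(tr:tr+P-1, tc:tc+P-1));
    valid  = known(tr:tr+P-1, tc:tc+P-1);      % known pixels in the target
    [R, C] = size(img);
    bestSSD = inf; bestR = -1; bestC = -1;
    for r = 1:R-P+1
        for c = 1:C-P+1
            srcKnown = known(r:r+P-1, c:c+P-1);
            if ~all(srcKnown(:)), continue; end    % source must be fully known
            src = double(img(r:r+P-1, c:c+P-1));
            d   = (src - target).^2;
            ssd = sum(d(valid));                   % compare only known pixels
            if ssd < bestSSD
                bestSSD = ssd; bestR = r; bestC = c;
            end
        end
    end
end
```

In practice, the confidence map drives the filling order, so the target patch always contains some known pixels against which candidate source patches can be compared.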

2.4.4. Performance Assessment

The authors propose both a JPEG-based and an H.264/AVC Intra-based coding solution, which have the baseline JPEG and the H.264/AVC Intra compression standards as benchmarks, respectively. For both solutions, not only the bit savings have been assessed, but also the perceived visual quality of the restored images.

Test Conditions

The developed inpainting-based image compression framework is tested on natural color images available in the USC-SIPI [11] and Kodak [12] image libraries. All the parameters used in this solution have been fixed except for two quality control parameters, notably the additional block ratios for textural and structural blocks.

On the one hand, the fixed parameters regard, at the encoder side, the weighting factors used in (2.10) and (2.13), notably in the Edge Extraction and Exemplar Selection stages; specifically, the four weighting factors in (2.10) are set to 1.0, 0.2, 0.2 and 1, while the two weighting factors in (2.13) are both set to 1.0. At the decoder side, in the Pixel-wise Structure Propagation stage, only pixels which are at most 10 pixels away from the edges are considered to be inside the influencing region, and the search range for the texture candidate is set to 9×9 pixels, whereas the structure candidate is sought in the entire influencing region. Finally, the search range and patch size for the Texture Synthesis stage are set to 11×11 and 7×7 pixels, respectively.


On the other hand, the additional block ratios for both structural and textural exemplars are manually defined according to the test image being considered; naturally, if a very textured image is to be tested, e.g. Lena, the additional textural block ratio should be much greater than the corresponding one for structural exemplar blocks (see Table 1), to improve the perceived visual quality of the restored image.

Regarding the JPEG-based solution, the exemplar locations are denoted, at the 8×8 block level, by a binary map which is coded with an arithmetic encoder. The original image is then coded with the JPEG standard, during which the removed blocks are skipped and filled in with DC values copied from previous blocks, so that DC prediction can still be performed when compressing the exemplar blocks.

Regarding the H.264/AVC Intra based solution, since the coding is performed at the 16×16 macroblock level, the authors consider two instances: if a macroblock is entirely removed, then a new macroblock type, I_SKIP, is coded; otherwise, the macroblock carries a new element called the block removal pattern, which indicates which of its four constituent 8×8 blocks are removed (the block removal pattern is later coded by the arithmetic encoder). The exemplar blocks are coded using H.264/AVC Intra and, similarly to the JPEG-based solution, the removed blocks are filled with DC values from previous blocks to allow the intra prediction of the H.264/AVC Intra standard. In all tests, the quality factor for JPEG is set to 75, whereas the quantization parameter for H.264/AVC Intra coding is set to 24.

Results and Analysis

The achieved results have been evaluated in terms of compression efficiency and visual quality. In this context, the bit-rate savings are first analyzed based on Table 1, followed by visual comparisons between the results obtained with the coding standards themselves and with the proposed JPEG-based and H.264/AVC Intra based solutions, shown in Figure 16 and Figure 17, respectively.

Table 1 – Bit savings of the proposed solution having JPEG and H.264/AVC Intra as the benchmarks [4].

As shown in Table 1, the proposed systems achieve up to 44% and 32.7% bit savings (notably for the Milk test image) when compared with baseline JPEG and H.264/AVC Intra, respectively. In the worst-case scenario among all tested images, this inpainting-based image compression framework still consumes 15.2% and 10.7% fewer bits than baseline JPEG and H.264/AVC Intra, respectively. Naturally, the compression efficiency improvements depend on the features of the test image, but one may conclude that the more complex the image, the lower the achievable compression efficiency gain. This is justified by the fact that, when coding very detailed images, the extracted edge-map typically comprises miscellaneous edges, implying that many blocks have to be considered as necessary exemplars; thus, only a limited number of regions may be removed and coded with the novel approach at the encoder side.


Figure 16 – Visual quality comparisons between the restored images obtained using the proposed JPEG-based solution (top row) and the baseline JPEG standard itself (bottom row): (a) kodim02; (b) kodim03; (c) kodim05 [4].

Figure 16 shows that, considering baseline JPEG as the benchmark, the proposed JPEG-based solution achieves very similar results in terms of the perceived visual quality of the restored images; these results are particularly remarkable taking into account that, for these three test images, the JPEG-based solution achieves bit savings ranging from 15.2% to 33%, according to Table 1.

Figure 17 – Visual quality comparisons between the restored images obtained using the proposed H.264/AVC Intra based coding solution (top row) and using the H.264/AVC Intra coding itself (bottom row): (a) Jet; (b) Lena; (c) Milk; (d) Peppers [4].

Figure 17 shows that, considering H.264/AVC Intra as the benchmark, the proposed H.264/AVC Intra based coding solution also achieves very similar results in terms of the perceived visual quality of the restored images; these results are especially noteworthy taking into account that, for these four test images, the proposed solution achieves bit-rate savings ranging from 10.7% to 17.6%, according to Table 1.

Strengths and Weaknesses

This fully automatic inpainting-based image coding framework makes it possible to achieve significant compression efficiency gains having state-of-the-art coding standards as the benchmark, by using hybrid data modeling for inpainting purposes while achieving great subjective visual impact for the restored images, as evidenced in Figure 16 and Figure 17. As for the weaknesses, the computational complexity of the novel method is high, since this approach performs not only Edge Extraction and Exemplar Selection at the encoder side, but also Edge-assisted Image Inpainting at the decoder side. In particular, note that the complexity of the decoder greatly depends on the values of the parameters used in the Assisted Image Inpainting module, notably the patch size and search range, which directly impact the number of SSD computations.

After becoming familiar with digital inpainting tools and their clustering, as well as being introduced to some relevant pure inpainting and inpainting-based coding solutions, the reader should benefit from getting acquainted with and understanding the adopted inpainting-based image coding high-level architecture, which will be presented in Chapter 3.


Chapter 3

3. High-level Inpainting-based Image Coding Architecture

Chapter 3 aims at providing the reader with a first perspective and explanation on the adopted inpainting-based image coding architecture. Considering the clustering of digital inpainting tools proposed in Chapter 2, the adopted solution fits under the image branch and will be based on a patch-based data modeling as will be described in Chapter 5. In this context, the main goal of this chapter is to present the adopted high-level inpainting-based image coding architecture as well as a brief functional description of its constituent modules.

To ease the reader’s experience, the encoder and decoder architectures will be presented separately, followed by the corresponding walkthroughs which introduce a functional description of the various modules. Furthermore, the inputs and outputs of the most relevant encoder and decoder modules will be illustrated since this will help understanding what will be described in the corresponding walkthroughs.

3.1. High-level Encoder Architecture and Walkthrough

Based on the literature review conducted to understand and structure the problem at hand, the adopted high-level inpainting-based image encoder architecture presented in Figure 18 has been designed through careful consideration of the encoder architectures adopted by the inpainting research community, notably their conceptual and functional strengths and weaknesses. Nonetheless, this does not necessarily mean that this architecture is, strictly speaking, a ‘super-architecture’ in the sense that each and every inpainting-based coding solution available in the literature has an architecture which is a particular case of it, as there are variants in structuring, functioning and naming. From a functional perspective, the encoder is essentially concerned with two main tasks:

1. Analysis – Analyzes the input image to select the best coding approach for each image area among the available coding modes, with the target of improving the overall compression efficiency in comparison with the selected image coding benchmarks, e.g. JPEG and JPEG 2000. Naturally, as for any other coding solution, a rate-distortion (RD) performance trade-off will have to be managed.

2. Data Encoding – Codes each of the two types of image areas, notably the areas to be and not to be inpainted, which are defined through a coding mode matrix; the two types of image areas are encoded using one of the two available coding modes: 1) the default standard-based image coding mode; and 2) the novel inpainting-based image coding mode. Naturally, the novel coding mode should only be adopted when it brings RD performance advantages over the standard-based coding mode, as the main target is to maximize the RD performance.


As in many inpainting-based image coding solutions, the adopted high-level encoder architecture assumes that the input image is in the RGB color space, which is converted into the YUV color space, composed of a luminance component, Y, and two chrominance components, U and V, with the same or down-sampled spatial resolutions.

Figure 18 – Adopted high-level inpainting-based image encoder architecture.

The encoder walkthrough describes, from a functional perspective, each of the encoder modules presented in Figure 18, i.e. it essentially addresses each module’s inputs, outputs and function, without presenting any processing details regarding the adopted solution (to be addressed in Chapter 4). Hence, the encoder proceeds as follows:

• Analysis for Classification and Coding – This module is the most relevant in the adopted high-level encoder architecture, as it is provided with the input image (see Figure 19 (a)) and aims at classifying it into two types of image areas (see Figure 19 (b)) which will be coded with alternative coding approaches:

• Areas to be inpainted, corresponding to areas that can be efficiently restored with inpainting tools at the decoder, using decoded data from their neighborhood and/or some assistant information, i.e. image features, e.g. edge information (see Figure 19 (c));

• Areas not to be inpainted, corresponding to areas that cannot be efficiently reconstructed through inpainting tools to the required target quality, or which will not be inpainted due to other requirements. Naturally, it is expected that all areas classified with this coding mode provide an RD performance which is better than the alternative provided by the other coding branch. Making this selection of image areas in an effective way is the critical task of this encoder module, with a major impact on the final RD performance of the overall coding architecture and, thus, on the eventual gains of the inpainting-based image coding approach.

Figure 19 – Illustration of the Analysis for Classification and Coding module input and outputs: (a) input image; (b) areas to be and not to be inpainted (black and non-black areas, respectively); (c) feature for the areas to be inpainted, e.g. edge information (in dark purple) [4].


Naturally, the coding mode decision, i.e. the location of both types of image areas, must also be coded and provided to the decoder, so that inpainting can be performed on the image areas selected by the encoder.

• Standard-based Image Encoder – The Areas not to be inpainted are given to this module to be appropriately coded with an off-the-shelf standard image coding solution, e.g. JPEG or JPEG 2000. This represents the fallback coding mode, whose RD performance must be outperformed by the novel alternative coding mode.

• Feature Extraction – This module is provided with the Areas to be inpainted, from which a gamut of image features, e.g. edge information, may be extracted. The encoder may always extract the same feature for each image area, extract different features depending on each area’s inpainting difficulty, or even send no features at all for the image areas which the decoder should be able to inpaint with the required quality using only information inferred from the decoded surrounding areas, i.e. from the Areas not to be inpainted.

• Feature Encoder – This module is responsible for coding the Feature for the areas to be inpainted that may have been extracted by the Feature Extraction module. Moreover, the Feature Encoder module may be either standard-based or not, e.g. Moving Picture Experts Group MPEG-7 descriptions may be used; naturally, the Feature Encoder has to be selected according to the feature’s nature, e.g. the JBIG coding standard may be used if edge information has been extracted as the feature.

• Coding Mode Encoder – In order for the decoder to know which areas are to be and not to be inpainted, the coding mode matrix generated by the Analysis for Classification and Coding module has to be encoded; this task is performed by the Coding Mode Encoder module. As for the Feature Encoder module, this encoder may be either standard-based or not.

• Multiplexer – The information coded by the Standard-based Image Encoder, the Feature Encoder and the Coding Mode Encoder modules is finally multiplexed into a single bitstream to be transmitted to the decoder over the channel being considered, e.g. a wireless network.

As will be seen in Chapter 4, the most innovative and complex encoder module is the Analysis for Classification and Coding module, which may be more or less intelligent in terms of its classification task. In this solution, to limit the encoder complexity, it is assumed that this module does not reconstruct the inpainted areas to check their quality during the classification process; this is a critical constraint, since it prevents the module from ‘playing it safe’, in the sense of only using the inpainting coding mode when it is surely the better solution from the RD performance point of view. However, this has the advantage of allowing the decoder to use any inpainting solution, since the encoder does not rely on a particular one to make the classification. This implies that more ‘clever’ inpainting solutions at the decoder may allow reaching higher reconstruction qualities, eventually at the cost of some additional complexity.

3.2. High-level Decoder Architecture and Walkthrough

As for the encoder, the adopted high-level decoder architecture (see Figure 20) has been designed to match the adopted encoder architecture, which resulted from the conceptual and functional ‘digestion’ of the high-level inpainting-based coding architectures available in the literature; naturally, not every inpainting-based image coding solution strictly shares all the decoder modules to be described. From a functional perspective, the goal of the decoder is twofold:

1. Data Decoding – Decode the Areas not to be inpainted, the feature that may have been extracted for the Areas to be inpainted and also the coding mode matrix, which defines where each coding mode has been applied;

2. Inpainting – Recover the Areas to be inpainted, eventually using all or only part of the available decoded data, i.e. the decoded Areas not to be inpainted and the decoded Feature for the areas to be inpainted. This second decoder task is related to the main goal of this Thesis, i.e. to develop an inpainting-based image coding solution where the decoder recovers part of the image data through a novel coding paradigm which does not exist in traditional image coding solutions: image inpainting. Hence, unlike the encoder, which is concerned with maximizing the overall compression efficiency, notably in comparison with the relevant image coding benchmarks, the decoder is more focused on maximizing the quality of the areas which have been selected, at the encoder side, to be inpainted.

The adopted high-level decoder architecture assumes that the bitstream multiplexed at the encoder side has been integrally preserved by the transmission channel, i.e. there is no information loss.

Figure 20 – Adopted high-level inpainting-based image decoder architecture.

Similarly to the encoder, the decoder walkthrough intends to address, from a functional perspective, each of the decoder modules presented in Figure 20, leaving the in-depth processing details for Chapter 5. Hence, the decoder proceeds as follows:

• Demultiplexer – This module is responsible for demultiplexing the bitstream transmitted over the channel, resulting in three different types of information – coded data for the Areas not to be inpainted, coded data for the Feature for the areas to be inpainted and coded data for the coding mode – which have been coded with different approaches and, therefore, have to be provided to the corresponding decoders.

• Standard-based Image Decoder – This module decodes the Areas not to be inpainted; naturally, this decoder has to be coherent with the standard-based encoder selected for the Standard-based Image Encoder module; otherwise, decoding would not be possible.

• Feature Decoder – The image features that may have been extracted from the Areas to be inpainted by the Feature Extraction module at the encoder side, and which should help the decoder in performing image inpainting, are decoded by the Feature Decoder. This module also has to match the Feature Encoder solution in order for the decoding to be possible.

• Coding Mode Decoder – The coded data for the coding mode decision, indicating the areas to be and not to be inpainted, which was created by the Coding Mode Encoder module, is provided to the Coding Mode Decoder module to be decoded. As for the other decoders, this module has to be coherent with the corresponding encoder; otherwise, decoding would not be possible.

• Image Inpainting – This module is the most important among the decoder modules, as its function is intrinsically related to the definition of the inpainting problem presented in Section 2.1; the Areas to be inpainted (black areas in Figure 21 (a)) have to be filled in using information from the Areas not to be inpainted (non-black areas in Figure 21 (a)) and eventually also from the relevant decoded features.

• Blending – Finally, given the inpainted and the standard decoded areas, i.e. the decoded areas which have not been inpainted, this last module blends them together so as to generate the output image illustrated in Figure 21 (b). The perceived visual quality of this module’s output may have to be transparent with respect to the input image, as shown by Figure 21 (b) and (c). This may have an impact on the quantization used by the standard-based encoder and on the size of the image areas to be inpainted.

Figure 21 – Illustration of the Image Inpainting and Blending modules: (a) areas to be and not to be inpainted (black and non-black areas, respectively) and feature for the areas to be inpainted, e.g. edge information (in dark purple); (b) output image (black areas have been inpainted); (c) input image for comparison [4].

To summarize, this chapter has defined the adopted high-level inpainting-based image encoder and decoder architectures and provided the corresponding walkthroughs; the processing details of the designed inpainting-based image coding solution will be addressed next. The encoder and decoder processing details will be addressed in separate chapters, so as to achieve a more balanced structure for this Thesis and, simultaneously, to ease the reader’s experience.


Chapter 4

4. Describing the Encoder Tools

As mentioned at the end of Chapter 3, the algorithmic details of the processing tools included in the encoder architecture are addressed in this chapter. This description is enriched with illustrations of the inputs and outputs of the encoder’s modules, aiming at easing the reader’s experience.

The developed inpainting-based image coding solution has been implemented in MATLAB; this decision has been motivated by the wide gamut of built-in processing toolboxes available, notably the Image Processing Toolbox™, which allows performing image processing, analysis, visualization and overall algorithmic development in a very effective, efficient and developer-friendly fashion [15]. The encoder high-level architecture has been presented in Figure 18 and includes three complementary encoders, targeting three different types of information, and an analysis module which has to take the encoder’s key decisions, notably in terms of defining the coding mode for each image block, thus determining the final compression performance.

In this chapter, to ease the description of the encoder tools, a more detailed encoder architecture than the one in Figure 18 is presented in Figure 22. For better understanding, the encoder modules, sub-modules and stages will be described in their order of appearance.


Figure 22 – Adopted encoder architecture.


4.1. Analysis for Classification and Coding

As aforementioned, the Analysis for Classification and Coding module plays one of the most relevant roles among all the encoder modules, since it ultimately determines the image areas to be and not to be inpainted, and thus also the final codec performance. In this context, its processing tools and algorithmic details will naturally be described in more depth than those of other encoder modules, for the simple reason that they are more complex and innovative. Furthermore, to ease this module’s description and the reader’s experience, the Analysis for Classification and Coding sub-modules have been ‘grouped’ into consecutive stages according to their function in the encoder architecture (evidenced by the colored rectangles in Figure 22), with each stage addressed in a separate sub-section.

4.1.1. Edge Extraction

The first encoder sub-module essentially regards the extraction of the image edges, which has to be performed to allow selecting the areas to be and not to be inpainted. This sub-module comprises two consecutive stages (as evidenced by the red rectangle in Figure 22), notably the RGB-to-YUV Color Space Conversion and the Edge Extraction stages, which will be described in the following.

4.1.1.1. RGB-to-YUV Color Space Conversion

First of all, the input image (illustrated in Figure 23 (a) and (c) for the images Lena and Peppers, respectively) goes through a color space conversion, notably from the RGB space to the YUV space where Y stands for the luminance component expressing the image brightness, while U and V stand for the chrominance components and express the image color. This color space conversion is performed because, unlike the RGB color space, the YUV color space takes the human visual perception into account, allowing higher compression and, thus, a more significant reduction of the bandwidth necessary for the chrominance components, to which the HVS is less sensitive [16].

The output of this stage is the Y component, illustrated in Figure 23 (b) and (d) for the images Lena and Peppers, respectively. As depicted in Figure 22, only this component will be processed in terms of analysis and classification, as it is the component to which the HVS is most sensitive and also the one with most energy; processing only the luminance at the encoder analysis module is a typical solution in the literature, since it allows reducing the overall complexity without significant performance penalty even when the three components are coded; in this case, the U and V components would be coded using the decisions taken based on the processing of the Y component. As the developed image codec targets only luminance coding and inpainting, the chrominance components, U and V, will not be processed.
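For illustration, a minimal MATLAB sketch of this stage is given below; it relies on rgb2ycbcr from the Image Processing Toolbox, which implements the closely related ITU-R BT.601 YCbCr transform and is used here as a stand-in for the YUV conversion, with a hypothetical input file name.

```matlab
% Minimal sketch of the RGB-to-YUV Color Space Conversion stage.
% rgb2ycbcr (Image Processing Toolbox) implements the closely related
% ITU-R BT.601 YCbCr transform; the file name below is hypothetical.
rgbImage = imread('lena.png');
yuvImage = rgb2ycbcr(rgbImage);
Y = yuvImage(:, :, 1);   % only the luminance component is further analyzed
```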

Figure 23 – Illustration of the RGB-to-YUV Color Space Conversion sub-module: (a) and (c) input images in the RGB color space; (b) and (d) Y components.


4.1.1.2. Edge Extraction

Next, the Y component is given to the Edge Extraction stage so that the image edges are extracted; this will later allow adequately classifying the image areas at the Preliminary Block Classification sub-module. Among the gamut of image features which could allow appropriately classifying the image blocks for coding purposes, it was decided to use edge information, as edges typically correspond to areas where strong intensity contrasts exist, and detecting them should significantly reduce the amount of data to process afterwards, while preserving the essential structural properties of the image.

In this encoding solution, edge detection is performed using the widely adopted Canny detector [17], which is a very mature, accurate and reliable edge detection tool. According to [18], the Canny detector searches for local maxima in the input image’s gradient, generating a binary edge-map as output. Although the Canny detector has not been implemented from scratch, as it is available in the Image Processing Toolbox™, the fundamental steps involved in the edge extraction process [19] are briefly summarized next:

1. Image-inherent Noise Reduction – Firstly, the luminance component is convolved with a Gaussian mask representing a discrete approximation of a Gaussian filter [20], resulting in a slightly blurred image which is less affected by image-inherent noise; for convenience, this image will hereafter be called the smoothed image.

2. Smoothed Image’s Gradient Computation – Secondly, the smoothed image’s gradient is computed using a pair of Sobel convolution masks, so as to highlight the image regions with high first spatial derivatives.

3. Edge Gradient and Direction Computation – Given the smoothed image’s first spatial derivatives, $G_x$ and $G_y$, respectively along the horizontal and vertical directions, the edge gradient magnitude, $G$, and direction, $\theta$, are computed as follows:

$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right)$ (4.1)

As an edge may be oriented in any direction, the Canny detector rounds the edge direction to one of four angles, representing vertical (90º), horizontal (0º) and the two diagonals (45º and 135º). As mentioned in [20], this is performed as:

$\theta_{rounded} = \begin{cases} 0º & \text{if } \theta \in [0º, 22.5º[ \,\cup\, [157.5º, 180º] \\ 45º & \text{if } \theta \in [22.5º, 67.5º[ \\ 90º & \text{if } \theta \in [67.5º, 112.5º[ \\ 135º & \text{if } \theta \in [112.5º, 157.5º[ \end{cases}$ (4.2)

4. Non-maximum Suppression – The next algorithmic step determines whether the gradient magnitude exhibits a local maximum in the gradient direction. With this purpose, the gradient magnitude image is scanned along the gradient direction, so as to set to zero the pixels which are not part of the local maxima; this has the effect of suppressing all image information that is not part of local maxima, implying that the remaining set of edge points is composed of thin edges.

5. Hysteresis Thresholding – The final step of the adopted edge extraction algorithm corresponds to hysteresis thresholding, which requires two thresholds: the high and low hysteresis thresholds. This is justified by the fact that two types of edges should be detected, notably the ‘strong’ and ‘weak’ edges. The former are continuous curves which are very likely to correspond to genuine edges, whereas the latter are associated to pixels exhibiting high gradients and are typically connected to ‘strong’ edges. The hysteresis thresholding process plays a key role in the adopted edge extraction algorithm, making it more robust to noise and, therefore, more likely to detect true ‘weak’ edges, in comparison with other edge detectors, e.g. the Sobel method [18]. In detail, as mentioned in [19], any image pixel that has a


gradient magnitude greater than the high hysteresis threshold, $T_1$, is immediately classified as an edge pixel, i.e. $EdgeMap(p) = 1$. Then, the yet-unclassified pixels in the 8-neighborhoods of already classified edge pixels whose gradient magnitude is greater than the low hysteresis threshold, $T_2$, are also classified as edge pixels, i.e. $EdgeMap(p) = 1$. The remaining image pixels are classified as non-edge pixels, i.e. $EdgeMap(p) = 0$. For better understanding, the hysteresis thresholding is formally expressed as follows:

$EdgeMap(p) = \begin{cases} 1 & \text{if } G(p) \geq T_1 \\ 1 & \text{if } \exists\, q \in N_8(p): EdgeMap(q) = 1 \text{ and } G(p) \geq T_2 \\ 0 & \text{otherwise} \end{cases}$ (4.3)

where $G(p)$ stands for the gradient magnitude at pixel $p$, $N_8(p)$ represents the 8-neighbors of pixel $p$ and, naturally, $T_1 > T_2$.

Once this process is completed, a binary edge-map with the luminance component dimensions is generated, where all pixels have been classified either as edge pixels or non-edge pixels (Figure 24 (b) and (e), for the images Lena and Peppers, respectively).
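For illustration, a minimal MATLAB sketch of this stage is shown below, using the Canny implementation available in the Image Processing Toolbox; the hysteresis thresholds and Gaussian standard deviation shown are illustrative values, not the tuned ones used in this Thesis.

```matlab
% Minimal sketch of the Edge Extraction stage with the built-in Canny
% detector; [low high] are normalized hysteresis thresholds and the last
% argument is the Gaussian sigma. The values shown are illustrative.
edgeMap = edge(Y, 'canny', [0.1 0.3], 1.0);
imshow(edgeMap);   % binary edge-map: 1 = edge pixel, 0 = non-edge pixel
```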

Figure 24 – Illustration of the Edge Extraction stage with the Canny detector: (a) and (d) input luminance component; (b) and (e) luminance component edge-map; (c) and (f) superposition between the luminance component and the corresponding edge-map (edges in white).

4.1.2. Preliminary Block Classification

This sub-module has the mission of scanning the luminance component at the 8×8 block level, so that image blocks may be preliminarily classified. Luminance blocks are processed instead of the entire luminance component at once because the classification has to be locally performed to be effective and adaptive. In particular, considering non-overlapping 8×8 pixel blocks expresses a tradeoff between complexity and locality and is also related to the fact that the JPEG codec later used encodes 8×8 sample blocks.

Among the reviewed inpainting-based image coding solutions, the preliminary block classification is typically edge-based, as edges are image-inherent features which typically correspond to areas where strong intensity contrasts exist and are associated with structural areas in an image. Nonetheless, there are various valid ways of implementing the edge-based block classification, e.g. classifying the block according to the distance of its constituent pixels to the image edges [4] or according to the number of edge pixels in the block [5]. After


carefully reviewing the inpainting-based image coding solutions available in the literature, it has intuitively made more sense to perform the block classification based on the percentage of connected edge pixels in a block. This has been motivated by the fact that humans tend to associate structure to the essential image information that is connected/continuous or exhibits key dependencies.

In this solution, the Preliminary Block Classification sub-module aims at classifying each image block into one of the following block types:

• Structural Blocks – Correspond to image blocks in which, at least, a given percentage of the edge pixels have been classified as connected edge pixels, i.e. edge pixels for which at least one of its 8-neighbors is also an edge pixel; exhaustive experiments have heuristically determined 25% to be a good value for this classification.

• Textural Blocks – Correspond to image blocks in which there are no edge pixels or where less than 25% of the edge pixels are connected; note that the first condition is necessary, as if there are no edge pixels at all in a given block, the computation of the percentage of connected edge pixels would not make sense as it is computed as the ratio between the number of connected edge pixels in a block and the number of edge pixels in that block (the result would be 0/0).

In this solution, a patch-based data modeling has been adopted, since the goal is to inpaint some selected textural areas and, as stated in Section 2.2.2.1, this is the most appropriate data modeling for this purpose. Patch-based data modeling aims at finding texture-compatible image fragments in the ‘known’ image areas to fill in the areas to be inpainted, and provides good results for restoring large textural areas. As for the structural blocks, they typically have to be filled in by adopting a geometric data modeling, which aims at propagating/elongating structural image properties, e.g. edge information, and provides better results when restoring small and thin structural image areas. In this context, in the proposed image codec, the ‘best’ inpainting block candidates will be selected from the textural blocks, as patch-based data modeling is suited for texture synthesis, whereas the structural blocks will not be inpainted, but rather coded with an off-the-shelf image coding solution.

To ease the reader’s experience and understanding, this sub-module’s input and output have been illustrated in Figure 25 for the images Lena (top row) and Peppers (bottom row); in the following, the description of this sub-module’s algorithm will be based on the detailed flowchart presented in Figure 26.

Figure 25 – Illustration of the Preliminary Block Classification sub-module: (a) and (d) input luminance component at 8×8 block level; (b) and (e) luminance component edge map; (c) and (f) preliminary block classification (textural and structural blocks in blue and non-blue, respectively, with image edges in red).


[Flowchart content: the 8×8 luminance blocks and the Y component edge-map are scanned block by block; a block with no edge pixels is immediately classified as textural; otherwise, each edge pixel with another edge pixel among its 8-neighbors is marked as connected, the percentage of connected edge pixels is computed and the block is classified as structural if that percentage is ≥ 25% and as textural otherwise; once the last block is processed, the Block Classification matrix is output.]

Figure 26 – Preliminary Block Classification sub-module flowchart.

The inputs to this sub-module correspond to the 8×8 pixel luminance blocks and to the luminance component edge-map which has already been extracted at the Edge Extraction stage. As for the output, it consists of a Block Classification matrix, which in this preliminary classification has only two possible labels, signaling whether each image block has been classified as structural or textural (the textural blocks are further classified in Section 4.1.3). This classification is formally expressed as follows:

$BlockClassification(r, c) = \begin{cases} 1 & \text{if block is textural} \\ 0 & \text{if block is structural} \end{cases}$ (4.4)

where $r$ and $c$ stand for a given matrix row and column, respectively. Note that this matrix is defined at the block level, so the total number of matrix rows times the total number of matrix columns equals the number of 8×8 pixel blocks in the image. With the target of filling the Block Classification matrix, the algorithm proceeds as:


For all image blocks do:

1. No-Edges Block Detection – The current image block is processed to check whether at least one of its constituent pixels is an edge pixel. If that is the case, the algorithm proceeds to step 1.1 for further processing; otherwise, it proceeds to step 1.4.

1.1. Block Pixels Analysis – In case at least one edge pixel has been detected in the current block, the block will be further analyzed so as to determine if there are connected edge pixels in the block. This process will only end when all block pixels have been analyzed; then, based on the percentage of connected edge pixels, the block will be classified as structural or textural.

For all pixels in the current block do:

1.1.1. Edge Pixel Check – This test checks whether the current block pixel is an edge pixel; if it is, the algorithm proceeds to step 1.1.1.1; otherwise, it proceeds to the next block pixel and goes back to step 1.1.1.

1.1.1.1. Edge Pixel’s 8-neighbors Check – If the current block pixel is an edge pixel, its 8-neighbors will be tested so as to check if at least one of them is also an edge pixel. If that is the case, the algorithm will go to step 1.1.1.2; otherwise, it will proceed to the next block pixel.

1.1.1.2. Connected Edge Pixel Classification – The current block pixel, which is an edge pixel and has already been determined to have at least one edge pixel among its 8-neighbors, is classified as a connected edge pixel. Afterwards, the algorithm proceeds to the next block pixel and goes back to step 1.1.1.

1.2. Connected Edge Pixels Block Percentage Computation – After all the pixels in the current image block have been analyzed, the percentage of connected edge pixels is computed for this block, i.e. the ratio between the number of connected edge pixels and the number of edge pixels in the current block.

1.3. Threshold Check – The computed connected edge pixels block percentage is compared to the threshold defined for the preliminary block classification; this means that, if this percentage is greater than or equal to 25% (heuristically defined), the current block is classified as structural, which is formally expressed by setting $BlockClassification(r, c) = 0$. Otherwise, the block is classified as textural, i.e. $BlockClassification(r, c) = 1$. After this step, the algorithm proceeds to the next image block and goes back to step 1.

1.4. Immediate Textural Block Classification – If there is no edge pixel in the current image block, the block is immediately classified as a textural block. This ‘forces’ the computation of the percentage of connected edge pixels to happen only for blocks containing edge pixels, as evidenced in Figure 26; if a block does not contain any edge pixels, this percentage cannot be computed (the result would be 0/0). Afterwards, the algorithm proceeds to the next luminance block and goes back to step 1.

After all image blocks have been classified either as structural or textural, this sub-module’s output, i.e. the Block Classification matrix, is generated; this matrix filling has been formally expressed by equation (4.4). Furthermore, the Block Classification matrix will be provided to the Not-to-be-Inpainted Core Textural Blocks Classification stage, where it will be updated.
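For illustration, the flowchart of Figure 26 can be condensed into the following minimal MATLAB sketch, assuming the binary edge-map from the Edge Extraction stage and an image whose dimensions are multiples of 8; identifiers are illustrative, not the Thesis’ actual code.

```matlab
% Minimal sketch of the Preliminary Block Classification sub-module,
% following equation (4.4): 1 = textural, 0 = structural. Assumes the
% edge-map dimensions are multiples of 8; identifiers are illustrative.
function blockClass = preliminaryBlockClassification(edgeMap)
    B = 8;                                   % block size used throughout
    [rows, cols] = size(edgeMap);
    blockClass = ones(rows/B, cols/B);       % default label: textural
    % An edge pixel is 'connected' if at least one of its 8-neighbors is
    % also an edge pixel; neighbor counts are obtained by 2-D convolution.
    neigh = conv2(double(edgeMap), ones(3), 'same') - double(edgeMap);
    connected = edgeMap & (neigh > 0);
    for r = 1:rows/B
        for c = 1:cols/B
            blk   = edgeMap((r-1)*B+1:r*B, (c-1)*B+1:c*B);
            nEdge = sum(blk(:));
            if nEdge == 0
                continue;                    % no edge pixels: stays textural
            end
            blkC = connected((r-1)*B+1:r*B, (c-1)*B+1:c*B);
            if sum(blkC(:)) / nEdge >= 0.25  % heuristic 25% threshold
                blockClass(r, c) = 0;        % classify as structural
            end
        end
    end
end
```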

4.1.3. Textural Blocks Further Classification

The next sub-module, identified in Figure 22 by the green rectangle, comprises two stages which ultimately aim at selecting, among the textural blocks, those which will and will not be inpainted, depending on how difficult they are to properly infer from the neighboring ‘known’ areas, and also on their impact on the perceived visual quality of the restored image and on the RD tradeoff.

In this context, the first stage of this sub-module regards further classifying the textural blocks, into:


• Not-to-be-Inpainted Core Textural Blocks – Consist of the textural blocks which will not be inpainted because they have structural blocks in their neighborhood and, therefore, contain essential information about textural transitions, i.e. core textural information. It would be extremely difficult to properly inpaint them based on other textural blocks which are farther from the structural blocks.

• Potential to-be-Inpainted Textural Blocks – Consist of the textural blocks which have not been selected as not-to-be-inpainted core textural blocks because they are farther away from the image structural blocks. Their naming intends to express the fact that some of these blocks may be selected to be inpainted depending on the RD tradeoff and/or target quality.

Note that this further classification will require the Block Classification matrix expressed by equation (4.4) to be updated, since these two new block types will emerge from the textural blocks. The algorithmic details involved in this process will be presented in Section 4.1.3.1.

As for the second stage, it regards further classifying the potential to-be-inpainted textural blocks into two new block types:

• Not-to-be-Inpainted Additional Textural Blocks – Consist of textural image blocks which will also not be inpainted, in order to provide additional information about the textural areas which are farther from the structural areas. In this context, they may be seen as ‘seeds’ neighboring the areas to be inpainted, aiming at providing extra, more local and ‘reliable’ textural information about the surroundings of the areas to be inpainted.

• To-be-inpainted Textural Blocks – Consist of the textural blocks which are likely to be properly inpainted by adopting a patch-based data modeling. These blocks are selected simultaneously with the not-to-be-inpainted additional textural blocks, meaning that the potential to-be-inpainted textural blocks may be further classified as not-to-be-inpainted additional textural blocks or to-be-inpainted textural blocks, depending on the coding goals, notably RD performance and quality.

After the processing involved in this stage, the final block classification matrix will be generated, based on which the blocks to be and not to be inpainted will be defined in the Coding Mode Decision sub-module. The algorithmic details involved in the classification of the aforementioned blocks will be presented in Section 4.1.3.2.

4.1.3.1. Not-to-be-Inpainted Core Textural Blocks Classification

As in [4], the developed solution consists in immediately selecting, among the textural blocks, the core textural blocks which will not be inpainted since they are ‘too essential’ and ‘too difficult’ to inpaint. These image blocks are those which have structural blocks in their neighborhood (4 or 8 neighbors, user-defined); this classification is motivated by the fact that these image blocks contain very important information about the textural transitions, which would be very difficult to infer from their surroundings and/or available image features. Moreover, the proposed way to select the not-to-be-inpainted core textural blocks will allow inpainting, at the decoder, the inner areas based on the outer areas.

To ease the reader’s experience and understanding, the results associated to this stage are illustrated in Figure 27 (b) and (d) for the images Lena and Peppers; they are followed by the description of the implemented algorithm and processing details which correspond to the flowchart presented in Figure 28.


Figure 27 – Illustration of the Not-to-be-Inpainted Core Textural Blocks Selection stage results considering 8-neighbors: (a) and (c) textural and structural blocks (blue and non-blue areas, respectively); (b) and (d) not-to-be-inpainted core textural blocks and potential to-be-inpainted textural blocks (pink and green areas, respectively).

Based on the example presented in Figure 27, it is possible to conclude that the main task of this stage is to distinguish the not-to-be-inpainted core textural blocks from the potential to-be-inpainted textural blocks (painted, respectively, in shock pink and green in Figure 27 (b) and (d)) among the full set of textural blocks, shown in shock blue in Figure 27 (a) and (c).

Figure 28 – Not-to-be-Inpainted Core Textural Blocks Classification stage flowchart.


This stage’s input corresponds to the Block Classification matrix available so far, identifying the structural and textural blocks, as generated by the Preliminary Block Classification sub-module. Now, the classification process continues with:

For all image blocks, do:

1. Textural Blocks Check – The first step is to test if the current block is a textural block. If that is the case, the algorithm will proceed to step 1.1; otherwise, it will proceed to the next image block and go back to step 1.

1.1. Structural Neighbors Check – The current textural block’s 4 or 8 neighbors (user-defined parameter) will be checked in order to find if any of them is a structural block. This is motivated by the fact that the textural blocks which have structural neighboring blocks are believed to contain important textural properties which cannot be easily and properly filled-in through the adopted inpainting methods. If this check is positive, i.e. there is at least one structural block neighboring the current textural block, the algorithm will go to step 1.1.1; otherwise, it proceeds to step 1.1.2.

1.1.1. Not-to-be-Inpainted Core Textural Blocks Classification – If at least one of the current textural block’s neighbors is a structural block, then the current block is classified as a not-to-be-inpainted core textural block (shock pink blocks in Figure 27 (b) and (d)). This is formally expressed by ‘relabeling’ the current textural block as $BlockClassification(r, c) = 2$.

1.1.2. Potential to-be-Inpainted Textural Blocks Classification – If no structural block has been found among the current textural block’s neighbors, the block is classified as a potential to-be-inpainted textural block (shock green blocks in Figure 27 (b) and (d)). This is formally expressed by ‘relabeling’ the current textural block as $BlockClassification(r, c) = 3$.

After all textural blocks have been processed and classified either as not-to-be-inpainted core textural blocks or as potential to-be-inpainted textural blocks, this stage’s output is generated, corresponding to an updated version of the Block Classification matrix (defined by equation (4.4)), where all blocks are now classified as follows:

$BlockClassification(r, c) = \begin{cases} 2 & \text{if block is not-to-be-inpainted core textural} \\ 3 & \text{if block is potential to-be-inpainted textural} \\ 0 & \text{if block is structural} \end{cases}$ (4.5)

Comparing equations (4.4) and (4.5), it should be clear that the textural blocks, which were assigned label ‘1’ in equation (4.4), gave place to two new block types corresponding to labels ‘2’ and ‘3’, referring to the not-to-be-inpainted core textural blocks and to the potential to-be-inpainted textural blocks, respectively.
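A minimal MATLAB sketch of this stage is given below, assuming the Block Classification matrix follows equation (4.4) on input and that blocks outside the image are treated as textural; identifiers and the padding choice are illustrative assumptions.

```matlab
% Minimal sketch of the Not-to-be-Inpainted Core Textural Blocks
% Classification stage; blockClass follows equation (4.4) on input
% (1 = textural, 0 = structural) and equation (4.5) on output.
function blockClass = classifyCoreTextural(blockClass, useEightNeighbors)
    [R, C] = size(blockClass);
    padded = padarray(blockClass, [1 1], 1);      % treat outside as textural
    for r = 1:R
        for c = 1:C
            if blockClass(r, c) ~= 1, continue; end   % only textural blocks
            win = padded(r:r+2, c:c+2);               % 3x3 block neighborhood
            if ~useEightNeighbors
                win([1 3 7 9]) = 1;                   % ignore the diagonals
            end
            win(5) = 1;                               % ignore the block itself
            if any(win(:) == 0)                       % structural neighbor?
                blockClass(r, c) = 2;   % not-to-be-inpainted core textural
            else
                blockClass(r, c) = 3;   % potential to-be-inpainted textural
            end
        end
    end
end
```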

4.1.3.2. Not-to-be-Inpainted Additional Textural Blocks Classification

This stage is ultimately responsible for selecting, among the potential to-be-inpainted textural blocks, the not-to-be-inpainted additional textural blocks and, simultaneously, the to-be-inpainted textural blocks. This means that, given the updated Block Classification matrix (4.5), after the processing involved in this stage the final Block Classification matrix will be generated, identifying four types of blocks labeled as follows:

$BlockClassification(r, c) = \begin{cases} 2 & \text{if block is not-to-be-inpainted core textural} \\ 4 & \text{if block is not-to-be-inpainted additional textural} \\ 5 & \text{if block is to-be-inpainted textural} \\ 0 & \text{if block is structural} \end{cases}$ (4.6)

The not-to-be-inpainted additional textural blocks are selected in order to provide extra and more local ‘original’ textural information about the textural areas which are farther away from the structural areas. This


selection is believed to improve the subjective visual impact of the inpainting procedure, since this type of blocks provides more reliable information in the surroundings of the areas to be inpainted. Note that the effectiveness of the image inpainting methods, which may be more or less intelligent and complex depending on the preset goals and allowable decoder complexity, critically depends on the amount of ‘known’ neighboring areas from which the areas to be inpainted will be filled-in and on their scattering within the image. This means that the larger the areas to be inpainted, the less effective the inpainting method is likely to be, especially when the output image (after inpainting) has to be transparent in comparison to the input image. So, it intuitively makes sense that the amount of not-to-be-inpainted additional textural blocks depends on the target output quality.

As it has been decided to limit the encoder complexity in the designed solution, the decoder tools are not available (and thus not replicated) at the encoder. This means that it is not possible to check in advance, at the encoder, the inpainting results to be obtained at the decoder; hence, the encoder has to be very ‘smart’ in carefully and intelligently selecting the areas to be and not to be inpainted, in order to maximize the RD performance while avoiding committing to a specific decoder inpainting solution and its associated capabilities, which may depend on the allowable decoder complexity.

To ease the reader’s experience and understanding, this stage’s results have been illustrated in Figure 29 for the images Lena (top row) and Peppers (bottom row); next, the description of the corresponding algorithm which is based on the flowchart in Figure 30 is presented.

Figure 29 – Illustration of the Not-to-be-Inpainted Additional Textural Blocks Classification stage: (a) and (c) not-to-be-inpainted core textural and potential to-be-inpainted textural blocks (in shock pink and shock green, respectively); (b) and (d) not-to-be-inpainted additional textural and to-be-inpainted textural blocks (in yellow and orange, respectively).

As evidenced in Figure 29, the not-to-be-inpainted additional textural blocks (painted in yellow in Figure 29 (b) and (d)) and the to-be-inpainted textural blocks (painted in orange in Figure 29 (b) and (d)) are both selected from the potential to-be-inpainted textural blocks (painted in shock green in Figure 29 (a) and (c)). As aforementioned, this is expressed through equations (4.5) and (4.6).


Figure 30 – Not-to-be-Inpainted Additional Textural Blocks Classification stage flowchart.

The input to this stage corresponds to the updated Block Classification matrix (4.5), identifying the not-to-be-inpainted core textural blocks, the potential to-be-inpainted textural blocks and the structural blocks. Now, the classification process will be finalized with:

For all potential to-be-inpainted textural blocks, do:

1. Block Variation Metric Computation – The selection of the not-to-be-inpainted additional textural and the to-be-inpainted textural blocks must be ‘smart’ as the decoder inpainting tools are not available at the encoder. Thus, this selection process should be adaptive to each input image which may have rather different properties, e.g. shading. In this context, inspired by the selection of the additional textural blocks in [4] (corresponding to the not-to-be-inpainted additional textural blocks in the developed solution), this selection is performed based on the following block variation metric:

$$V(B_k) = \alpha_1 Var(B_k) + \alpha_2 \sum_{B_j \,\in\, N_8(B_k)} \left| \mu(B_k) - \mu(B_j) \right| \qquad (4.7)$$

where $V(B_k)$ represents the block variation metric for the current potential to-be-inpainted textural block $B_k$, $\alpha_1$ and $\alpha_2$ are positive weighting factors (with $\alpha_1 + \alpha_2 = 1$), $N_8(B_k)$ expresses the 8-neighbors of the current potential to-be-inpainted block (alternatively, the user may decide to use just 4-neighbors) and $Var(\cdot)$ and $\mu(\cdot)$ denote the block’s variance and average value, respectively.

As shown by equation (4.7), the block variation metric is computed by adding two complementary terms which intend to express the following aspects:

• Individual Block Variation – The first term, $\alpha_1 Var(B_k)$, contributes with the individual block variation, disregarding any similarity with its neighboring blocks. This means that, if the pixels in the current potential to-be-inpainted textural block are such that the block luminance variance is high, then the image block should be classified as not-to-be-inpainted additional textural; this is justified by the fact that the pixels of a high-variance block should be rather difficult to infer from its surroundings and/or from image features that might be extracted and transmitted.

• Neighboring Blocks’ Variation – The second term, $\alpha_2 \sum_{B_j \in N_8(B_k)} |\mu(B_k) - \mu(B_j)|$, contributes with the sum of the absolute differences between the expected value for the current potential to-be-inpainted textural block and its neighbors’ expected values. This means that the larger this sum is, the more ‘different’ these block averages are; this fact should influence the decision of making this block a not-to-be-inpainted additional textural block or not. Note that the current block may have a low individual block variation, which would make this potential to-be-inpainted textural block be classified as to-be-inpainted textural if only the individual block variation was considered. However, the block luminance average value might not be ‘similar’ to the corresponding ones of its neighbors, meaning that it should also be difficult to infer the luminance values for this block from the neighboring blocks; hence, this block should be classified as a not-to-be-inpainted additional textural block.

2. Not-to-be-Inpainted Additional Textural Blocks Classification Threshold – After the block variation metric has been computed for all potential to-be-inpainted textural blocks, it is necessary to determine a threshold which should allow selecting the not-to-be-inpainted additional textural and the to-be-inpainted textural blocks. After exhaustive experiments, two alternative approaches have emerged:

• Manually fine-tuning the block variation metric threshold for each input image which has the clear advantage of optimizing the further classification of the textural blocks, but has the obvious disadvantage of not being automatically adaptive to the input image.

• Designing an automatic adaptive method of defining the block variation metric threshold for a given input image, which may not generate the ‘optimal’ threshold for a specific image but comes close enough and has the strength of being automatically adaptive to the input image.

Although both approaches are valid and have strengths and weaknesses, it has been decided here to design an automatic adaptive mechanism for defining the block variation metric threshold, meaning that this automatic solution has been preferred over the manual threshold fine-tuning solution for each input image. In this context, it has been decided to estimate the required threshold based on the probability density function for the block variation metric obtained based on its histogram. This approach allows interpreting the block variation metric from a statistical point of view, thus exploiting statistical tools to obtain an adaptive and user-independent threshold, e.g. the PDF’s centroid also known as the mass center and its associated standard deviation. After carefully reviewing the most relevant inpainting-based image coding solutions in the literature, notably [4], it has been concluded that none of them considers an automatic adaptive threshold definition; therefore, the proposed mechanism is a novel contribution in the context of inpainting-based image coding.

2.1. Block Variation Metric Histogram Computation – The histogram of this metric may be formally expressed as follows:

$$N(V) = \sum_{i=1}^{c} N(V, bin_i) \qquad (4.8)$$

where $N(V)$ stands for the total number of observations of the block variation metric and $N(V, bin_i)$ is the number of observations for this metric that fall into each one of the $c$ disjoint categories (known as bins). In this solution, the number of bins has been set to 5, after exhaustive experiments. According to [21], there is no optimal number of bins, and trying to determine it usually implies making strong assumptions about the shape of the distribution; hence, exhaustive experiments are typically conducted to determine the number of bins.

2.2. Block Variation Metric PDF Estimation – The next step consists in estimating the probability density function based on the already computed histogram, which is mathematically expressed as follows:

$$prob(bin_i) = \frac{N(V, bin_i)}{N(V)} \qquad (4.9)$$

where $prob(bin_i)$ corresponds to the probability associated to $bin_i$.

2.3. Block Variation Metric Threshold Computation – The next step is to compute the required classification threshold, $T_{additional}$. In a first approach, this threshold had been considered to be the PDF’s centroid, which is mathematically expressed as follows:

$$PDF_{centroid} = \sum_{i=1}^{c} prob(bin_i) \cdot bin_i \qquad (4.10)$$

In this first approach, a potential to-be-inpainted textural block would be classified as not-to-be-inpainted additional textural if its associated variation metric was larger than or equal to the $T_{additional}$ threshold; otherwise, as a to-be-inpainted textural block. However, after testing, this classification method would form, in some cases, very large textural areas to be filled in, which significantly limited the quality of the image reconstructed at the decoder side. Hence, a ‘chess-like’ seeding pattern for the not-to-be-inpainted additional textural blocks has been incorporated in the classification and the classification threshold has been considered to be the sum of the PDF centroid and the standard deviation for the block variation metric, which is given as:

$$T_{additional} = PDF_{centroid} + \sqrt{\frac{1}{N(V)} \sum_{j=1}^{N(V)} \left( V(B_j) - PDF_{centroid} \right)^2} \qquad (4.11)$$

The fact that the selection threshold has been estimated based on the block variation metric’s PDF allows selecting the blocks to be inpainted without any user interaction, as intended.
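As an illustration of steps 2.1 to 2.3, the following numpy sketch estimates the threshold of equation (4.11); using the histogram bin midpoints as the bin values is an assumption of this sketch, as the exact bin representative used in the thesis is not detailed.

```python
import numpy as np

def additional_threshold(metrics, n_bins=5):
    """Adaptive threshold of eq. (4.11): PDF centroid plus the standard
    deviation of the block variation metric around that centroid.
    metrics: block variation metric of every potential to-be-inpainted block."""
    metrics = np.asarray(metrics, dtype=np.float64)
    counts, edges = np.histogram(metrics, bins=n_bins)   # eq. (4.8), 5 bins
    prob = counts / counts.sum()                         # eq. (4.9)
    centers = (edges[:-1] + edges[1:]) / 2.0             # bin midpoints (assumption)
    centroid = np.sum(prob * centers)                    # eq. (4.10)
    sigma = np.sqrt(np.mean((metrics - centroid) ** 2))  # deviation around centroid
    return centroid + sigma                              # eq. (4.11)
```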

3. Block Variation Metric Check – In this step, each block variation metric will be compared to the classification threshold, $T_{additional}$, to allow further classifying the potential to-be-inpainted textural blocks. If the block variation metric is equal to or greater than this threshold, the algorithm will proceed to step 3.1; otherwise, it will go to step 3.2.

3.1. Immediate Not-to-be-Inpainted Additional Textural Block Classification – If the block variation metric is larger than or equal to $T_{additional}$, then the block will be immediately classified as a not-to-be-inpainted additional textural block. By incorporating the standard deviation of the PDF for the block variation metric in the classification threshold, only a few blocks are immediately selected as not-to-be-inpainted additional textural; this is desired, as it will allow classifying more blocks as to-be-inpainted textural afterwards. This execution of the current algorithmic step formally corresponds to performing the following label assignment: $BlockClassification(i,j) = 2$.

3.2. To-be-Inpainted Textural Block Classification – The potential to-be-inpainted textural blocks which have not yet been further classified, because their associated block variation metric is lower than the classification threshold, are further classified by applying a ‘chess-like’ seeding pattern where the to-be-inpainted textural blocks and not-to-be-inpainted additional textural blocks are alternately classified in raster scan order (see Figure 29 (b) and (d)). This seeding pattern allows uniformly scattering not-to-be-inpainted additional textural blocks in the image, which allows very significantly improving the RD results in comparison with forming larger areas to be inpainted (which happened in the first approach, where this seeding pattern was not included); this will be shown when presenting the performance evaluation in Chapter 6.

Hence, the assignments $BlockClassification(i,j) = 2$ and $BlockClassification(i,j) = 3$ will be performed alternately, referring to the classification of the current potential to-be-inpainted textural block as a not-to-be-inpainted additional textural or as a to-be-inpainted textural block.

After all potential to-be-inpainted textural blocks have been further classified into not-to-be-inpainted additional textural blocks or to-be-inpainted textural blocks, this sub-module’s output is generated, i.e. the final Block Classification matrix, which is mathematically expressed by equation (4.6).
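A possible end-to-end sketch of steps 1 to 3 is given below; passing the potential to-be-inpainted blocks as a boolean mask (rather than restating the labels of matrix (4.5)) and the exact interaction between the threshold test and the seeding toggle are assumptions of this illustration.

```python
def finalize_classification(block_class, potential, metrics, threshold):
    """Finalize the Block Classification matrix of eq. (4.6): label 2 for
    not-to-be-inpainted additional textural, 3 for to-be-inpainted textural.
    potential: boolean mask of potential to-be-inpainted textural blocks;
    metrics: per-block variation metric; threshold: value from eq. (4.11)."""
    next_is_additional = True                 # 'chess-like' seeding toggle
    rows, cols = block_class.shape
    for i in range(rows):                     # raster scan order
        for j in range(cols):
            if not potential[i, j]:
                continue
            if metrics[i, j] >= threshold:    # step 3.1: immediate classification
                block_class[i, j] = 2
            elif next_is_additional:          # step 3.2: alternate labels 2 and 3
                block_class[i, j] = 2
                next_is_additional = False
            else:
                block_class[i, j] = 3
                next_is_additional = True
    return block_class
```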

4.1.4. Coding Mode Decision

The last sub-module of the Analysis for Classification and Coding module aims at defining the coding mode for all image blocks which have been classified in the two previous sub-modules. This means that the blocks to be inpainted will be encoded using the novel inpainting-based coding solution, whereas the not to be inpainted blocks will be encoded with a traditional image coding solution.

Given the final Block Classification matrix expressed in (4.6), which classifies the four types of blocks considered in this solution, notably the structural blocks, the not-to-be-inpainted core textural blocks, the not-to-be-inpainted additional textural blocks and the to-be-inpainted textural blocks, this sub-module aims at explicitly defining the areas which will be and not be inpainted. In this context, the process involved in this sub-module may be seen as a ‘label conversion’, which allows generating the following output:

$$CodingMode(i,j) = \begin{cases} 1, & \text{if block is to-be-inpainted textural} \\ 0, & \text{otherwise} \end{cases} \qquad (4.12)$$

The Coding Mode matrix will be responsible for informing the decoder about which areas will be and not be inpainted; in particular, the only blocks to be inpainted will be the to-be-inpainted textural blocks, for which $BlockClassification(i,j) = 3$.

The decision of not selecting any structural blocks to be inpainted is mainly due to the increase in the decoder’s complexity it would require, as well as to the fact that, to properly reconstruct those blocks, a PDE-based structure propagation procedure would likely have to be designed at the decoder side, which is far from being a trivial and low-complexity add-on. As for the not-to-be-inpainted core textural blocks and the not-to-be-inpainted additional textural blocks, the decision of selecting them also as not to be inpainted is related to both the target objective and subjective qualities of the inpainted image and the RD performance which had to be taken into account. In this context, the method for determining the image blocks to and not to be inpainted will be described based on the flowchart in Figure 31.

Figure 31 – Coding Mode Decision sub-module flowchart.

For all blocks, do:

1. To-be-Inpainted Textural Block Label Check – Given the current block classification label, this check aims at testing if it corresponds to a to-be-inpainted textural block label; in that case, the algorithm will proceed to step 1.1; otherwise, it goes to step 1.2.

1.1. To be Inpainted Block Classification – If the current label is to-be-inpainted textural, then the following assignment is performed: $CodingMode(i,j) = 1$. After, the algorithm will proceed to the next block and go back to step 1.

1.2. Not to be Inpainted Block Classification – If the current block label corresponds to a not-to-be-inpainted core textural, to a not-to-be-inpainted additional textural or to a structural block, it will be converted into a not-to-be-inpainted coding label. This label conversion is formally expressed by performing the following assignment: $CodingMode(i,j) = 0$. After, the algorithm will proceed to the next block and go back to step 1.

After all labels have been converted, this sub-module’s processing has reached an end and the output is generated, i.e. the Coding Mode matrix which is illustrated in Figure 32 (b) and (d) for the images Lena and Peppers, respectively.
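Since the ‘label conversion’ of equation (4.12) is a pure relabeling, it reduces to a one-line test per block; a sketch, assuming the labels of equation (4.6):

```python
import numpy as np

def coding_mode(block_class):
    """Coding Mode matrix of eq. (4.12): '1' for to-be-inpainted textural
    blocks (label 3 in eq. (4.6)) and '0' for every other block."""
    return (block_class == 3).astype(np.uint8)
```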

Figure 32 – Illustration of the Coding Mode Decision sub-module: (a) and (c) structural, not-to-be-inpainted core textural, not-to-be-inpainted additional textural and to-be-inpainted textural blocks (in gray level, shock pink, yellow and orange, respectively); (b) and (d) coding mode matrix labeling the blocks to be and not to be inpainted (in black and white, respectively).

4.2. Feature Extraction

This module has the function of selecting and extracting the image feature which will further improve inpainting results for the blocks to be inpainted. For that purpose, this module comprises two sub-modules:

1. Feature Selection and Extraction – Regards selecting and extracting the image feature for the blocks to be inpainted, which is the most important task of this module.

2. YUV-to-RGB Color Space Conversion – Regards performing a color space conversion back to the RGB color space; this particular sub-module had to be included in this solution since the software version [22] used for the Standard-based Image Encoder requires this encoder to be provided with a ‘.bmp’ image in the RGB color space. Naturally, if this encoder did not have this requirement, this sub-module would have been discarded.

4.2.1. Feature Selection and Extraction

As mentioned in Section 3.1, a wide gamut of image features may be used as the original image is available at the encoder; this also means that the same or different image features may be extracted for each image area.

Since, as mentioned before, having a reduced encoder complexity has been considered as an important requirement, the developed coding solution proposes to extract a single, although very important, image feature, notably the block luminance averages of the blocks to be inpainted, i.e. the mean value of the block pixels luminance, to further improve the inpainting results. On the one hand, the fact that only one feature is extracted has the goal of limiting the encoder complexity, as considering many and different image features would require a significant extra processing effort, especially if the intention was to determine the most suitable image feature, among those available, for each area to be inpainted. On the other hand, the block luminance average has been selected as the image feature to extract, mainly for three reasons:

• The block luminance averages constitute extremely important information about the areas to be inpainted, which may be used to locally adjust the pixel values within the block after the areas are inpainted at the decoder;

• The block averages are one of the most important features in terms of HVS sensitivity, which is expected to improve the perceived visual quality of the restored image;

• The block averages are extracted and processed in a straightforward fashion, which makes them very attractive to use.

The algorithmic steps involved in this module will be presented in the following based on the flowchart in Figure 33.

Figure 33 – Feature Selection and Extraction sub-module flowchart.

Given the Coding Mode matrix, which identifies the blocks to be and not to be inpainted at the decoder side, and the input Y component, this sub-module generates, as output, the selected image feature for the blocks to be inpainted.

For all block positions in the Coding Mode matrix, do:

1. Block to be Inpainted Label Check – The current block label is tested to check if it has a to-be-inpainted textural block label; if this is the case, the algorithm proceeds to step 1.1; otherwise, it proceeds to the next block position and to step 1 until a to-be-inpainted textural block label is found (as the image feature will only be extracted for those blocks).

1.1. Average Block Value Computation – This step regards the extraction of the selected image feature, i.e. the average block luminance value, for the current to-be-inpainted textural block. Therefore, in this step, the input Y component blocks have to be provided so that the block luminance average value may be computed.

1.2. Average Block Value Fill-in – After computing the block luminance average value for the current to-be-inpainted textural block, the next step is to fill in all pixels in the current to-be-inpainted textural block with this single value. After, the algorithm will proceed to the next block label and afterwards will go to step 1 so as to check if it corresponds to a block to be inpainted.

After the image feature has been extracted for all the blocks to be inpainted, the algorithm reaches an end. Note that this sub-module’s output is an image comprising the luminance values for the blocks not to be inpainted and the luminance block averages for the blocks to be inpainted, which will be provided to the YUV-to-RGB Color Space Conversion sub-module.
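A compact numpy sketch of steps 1 to 1.2 (8×8 blocks, with the Coding Mode matrix holding one label per block, as in the text):

```python
import numpy as np

def extract_and_fill_feature(y, coding_mode, bs=8):
    """Replace every to-be-inpainted 8x8 block of the Y component by its own
    average luminance, i.e. the selected image feature."""
    out = y.astype(np.float64).copy()
    rows, cols = coding_mode.shape
    for i in range(rows):
        for j in range(cols):
            if coding_mode[i, j] == 1:                # block to be inpainted
                block = out[i*bs:(i+1)*bs, j*bs:(j+1)*bs]
                block[:, :] = np.mean(block)          # fill with the block average
    return out
```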

4.2.2. YUV-to-RGB Color Space Conversion

The adopted Standard-based Image Encoder software version [22] expects a ‘.bmp’ image in the RGB color space as input; therefore, in this solution, there is the need of performing a color space conversion from the YUV back to the RGB color space. Naturally, there may be available in the literature other software versions which do not have this requirement. Given the luminance component for the blocks not to be inpainted and the selected feature which has been filled in, at the Feature Selection and Extraction sub-module, in the locations of the blocks to be inpainted, this sub-module performs the YUV-to-RGB color space conversion by filling in the U and V components with a constant value (‘127’ has been used), as the developed codec does not inpaint the chrominance components and so they are, in fact, not needed (see Figure 34).

Figure 34 – Illustration of the YUV-to-RGB color space conversion: (a) and (c) Y component with the block luminance averages in the areas to be inpainted; (b) and (d) corresponding image in the RGB color space to be given to the Standard-based Image Encoder.
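A sketch of the conversion follows; the BT.601 full-range equations are used for illustration, since the exact conversion matrix expected by the adopted JPEG software [22] is not restated in the text.

```python
import numpy as np

def yuv_to_rgb_constant_chroma(y, chroma=127.0):
    """YUV-to-RGB conversion with the U and V planes filled with a constant
    ('127', as in the text); BT.601 full-range equations (an assumption)."""
    y = y.astype(np.float64)
    u = np.full_like(y, chroma)
    v = np.full_like(y, chroma)
    r = y + 1.402 * (v - 128.0)
    g = y - 0.344136 * (u - 128.0) - 0.714136 * (v - 128.0)
    b = y + 1.772 * (u - 128.0)
    rgb = np.stack([r, g, b], axis=-1)          # H x W x 3 image
    return np.clip(rgb, 0, 255).astype(np.uint8)
```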

4.3. Standard-based Image Encoder

As mentioned in Section 3.1, the Standard-based Image Encoder module aims at encoding the areas not to be inpainted using an off-the-shelf standard image coding solution; in this context, this encoder solution could have been either the JPEG or the JPEG 2000 standard image codec. Since the JPEG standard is still the most relevant and widely used image coding standard, it was decided to adopt this codec for the Standard-based Image Encoder module; this decision has also been motivated by the fact that there is no specific need for scalability in the decoding process, which is an important JPEG 2000 advantage and very likely the main reason for JPEG 2000 to have been adopted for digital cinema.

Although, from a functional perspective, the role of the Standard-based Image Encoder module is to code only the blocks not to be inpainted, in practice, this module will also code the selected block feature that has been extracted for the areas to be inpainted: this means the block luminance averages will be coded as block DC values. This option is advantageous, mainly because:

• The JPEG encoder expects a whole image as input and, as there are image features that have been extracted and need to be coded and sent to the decoder, they have been filled in at the locations of the blocks to be inpainted;

• The bitrate cost associated with coding the block DC (Direct Current) values with the JPEG encoder is insignificant due to the prediction tool used to code the DC DCT (Discrete Cosine Transform) coefficient in the JPEG standard. In this way, sending the block luminance averages practically ‘comes for free’, as the inter-block DC prediction tool keeps the associated bitrate overhead very low. This fact has led to discarding an additional encoder just to code the selected image feature, in this case the block luminance averages.

In particular, the JPEG software version referenced in [23] and distributed by the Independent JPEG Group (IJG) [22] was used, notably with the following options and parameters:

• Baseline JPEG;

• 4:2:0 sub-sampling format;

• Visually optimized quantization matrix;

• Huffman coding.

4.4. Coding Mode Encoder

To code the Coding Mode matrix, i.e. the matrix identifying the location of the image areas to be and not to be inpainted (which has been formally expressed in equation (4.12)), a Run-Length Encoder (RLE) has been adopted which performs lossless data compression. Although other lossless encoders could have been used, this solution has been preferred due to its straightforward implementation and low complexity; note that lossy encoders could not have been used since this would imply losing essential information about the coding mode adopted for each block location, which would not allow the decoder to perform appropriate image inpainting.

The RLE codec improves the coding efficiency of the repeating Coding Mode labels, called runs, by storing only a single data value and the count, i.e. the length of the run. This implicitly means that a new symbol will only be identified when there is a transition between two different symbol values. In this solution, the Coding Mode matrix is composed of ‘1’ and ‘0’ labels (data values), expressing the blocks to be and not to be inpainted, respectively. In this context, the RLE encoder proceeds as follows:

For all the labels in the Coding Mode matrix, do:

1. Run-Length Occurrences Computation – For all classification matrix’s positions, the sequence of ‘0’ and ‘1’ run-lengths is determined by counting the number of successive ‘0’ or ‘1’, using a top-down and left-right scanning order. The minimum run-length is 0 and the maximum is the number of labels in the
classification matrix. For synchronization purposes, it is assumed that the first run-length always corresponds to a ‘0’ label.

2. Run-length Bit Cost Computation – After, the total bit cost for all run-lengths is determined as follows:

$$BitCost = \sum_{i} n_i \cdot \log_2 \frac{1}{p_i} \qquad (4.13)$$

where $n_i$ and $p_i$ are, respectively, the number of occurrences of each possible run-length (for ‘0’ and ‘1’) and its associated probability.

Hence, the bit cost computed for the coding mode matrix will be seen as overhead relative to just coding the original input image; as such, it will be taken into account in the RD performance results.
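A sketch of the two steps above; the alternation convention (first run counts ‘0’ labels, possibly with length zero) follows the text, while the entropy-style cost below simply evaluates equation (4.13) from the empirical run-length statistics.

```python
import numpy as np

def rle_encode(coding_mode):
    """Run-length encode the Coding Mode matrix in raster scan order
    (top-down, left-right); the first run refers to the '0' label and may
    therefore have length zero, for synchronization purposes."""
    runs, current, count = [], 0, 0
    for label in coding_mode.flatten():
        if label == current:
            count += 1
        else:
            runs.append(count)                 # close the current run
            current, count = label, 1
    runs.append(count)
    return runs

def rle_bit_cost(runs):
    """Bit cost of eq. (4.13): sum of n_i * log2(1/p_i) over the possible
    run-length values, with p_i their empirical probabilities."""
    values, counts = np.unique(np.asarray(runs), return_counts=True)
    probs = counts / counts.sum()
    return float(np.sum(counts * np.log2(1.0 / probs)))
```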

To summarize, this chapter has focused on describing the algorithmic solutions for the encoder modules in the proposed inpainting-based image codec, notably regarding the Analysis for Classification and Coding, the Standard-based Image Encoder and the Coding Mode Encoder. In this context, the reader is now acquainted with the methods for analyzing, classifying and coding the input image which ultimately allow performing the selection of the blocks to be and not to be inpainted. As aforementioned, the encoder will send to the decoder an encoded image, composed of the blocks not to be inpainted and the extracted feature for all the blocks to be inpainted, as well as the encoded coding mode matrix, so that the decoder knows which blocks to inpaint.

The main technical novelties introduced in the proposed encoder solution regard:

• The preliminary block classification method, notably the usage of the percentage of connected edge pixels to preliminarily classify the image blocks as structural or textural;

• The not-to-be-inpainted additional textural block classification method, notably the estimation of the classification threshold using the probability density function for the block variation metric, which allows automatically adapting the classification solution for the not-to-be-inpainted additional textural blocks and the to-be-inpainted textural blocks.

As for Chapter 5, it will be focused on the algorithmic solutions for the decoder modules, notably on the method for performing image inpainting which will critically determine the final RD performance.

Chapter 5

5. Describing the Decoder Tools

After describing the encoder tools in Chapter 4, the reader has now to be acquainted with the algorithmic details of the processing tools included in the decoder architecture, which will be addressed in this chapter. Naturally, the encoder and decoder pairs have to fit together and, thus, the encoder description largely hints at the corresponding decoder description.

The high-level inpainting-based decoder architecture has been presented in Chapter 3, notably in Figure 20, and includes three decoders targeting the recovery of three essential types of information: i) the coding mode decision information; ii) the coded luminance data for the areas not to be inpainted; and iii) the coded selected feature for the areas to be inpainted. However, the main technical novelty proposed in this decoder is not related to the data decoding processes themselves but rather to the image inpainting process, where the decoder has to ‘create’ texture for certain areas based on the neighbors’ texture. In this context, the processing tools and algorithmic details will naturally be described in more depth for the Image Inpainting module.

As for the encoder architecture in Chapter 4, the adopted decoder architecture, further detailed and completed in comparison to the one in Figure 20, is presented in Figure 35. For better understanding, the decoder architecture will be described by modules, which may comprise sub-modules. The sub-modules consist of stages which will be presented by their order of appearance in the decoder.

Figure 35 – Adopted decoder architecture.

5.1. Standard-based Image Decoder

As mentioned in Section 3.2, this module aims at decoding the areas not to be inpainted using a conventional image coding standard which, for the decoding to be possible, must be compatible with the solution adopted for the Standard-based Image Encoder module. In this context, the JPEG standard decoder software version suggested in [23], made available by the Independent JPEG Group [22], has been used. This JPEG decoder software version has to be provided with a ‘.jpeg’ coded stream to generate a decoded ‘.bmp’ image in the RGB color space.

Although, from a functional perspective, the role of the Standard-based Image Decoder module is to decode only the areas not to be inpainted, in practice, this module will also decode the feature that has been extracted for the areas to be inpainted, i.e. the block luminance averages; this solution was adopted for the reasons stated in Chapter 4, when describing the Standard-based Image Encoder. This means that no additional decoder is needed to decode the selected feature, extracted and coded at the encoder side for each to-be-inpainted block.

5.2. Coding Mode Decoder

The Coding Mode Decoder module is responsible for decoding the coding mode decision information which has been generated by the Analysis for Classification and Coding module and, after, coded by the Coding Mode Encoder module. The input to this module is, therefore, the coded stream with the coding mode decisions and its output is the decoded Coding Mode matrix which will be provided to the Image Inpainting module, so that this module knows which image areas are to be and not to be inpainted. As for the Standard-based Image Decoder module, also this decoder has to match the corresponding encoder (in this case, a run-length encoder); otherwise, decoding the Coding Mode matrix would not be possible.

5.3. Image Inpainting Module

As aforementioned, the Image Inpainting module is associated to the major technical novelty introduced at the decoder side, notably in comparison with standard image coding solutions. In this solution, this critical module aims at performing two complementary inpainting tasks:

• Neighboring Luminance Values Inferring – Regards inferring luminance information for the areas that have been selected to be inpainted based on the ‘known’ neighboring areas, without exploiting any image features eventually extracted for the to-be-inpainted areas;

• Block Average Luminance Values Adjustment – Regards adjusting the already inferred pixel values for the to-be-inpainted areas by exploiting the associated extracted image feature, in this case the block luminance averages; this process is done to further improve the decoded image’s subjective and objective qualities and it is only performed after all ‘unknown’ luminance values have been inpainted by means of the previous task, as evidenced in Figure 35.

With these purposes in mind, this module is provided with three types of decoded information: the luminance decoded data for the areas not to be inpainted, the selected feature for the areas to be inpainted and the coding mode decisions; naturally, its output is the complete reconstructed/decoded image. The reader should be acquainted with the fact that, in practice, before starting the inpainting procedure, the so-called image to be inpainted is created, including: the decoded luminance for the areas not to be inpainted; and a constant value corresponding to an ‘impossible’ luminance value, e.g. ‘-1’, for the areas to be inpainted, so that all these areas are signaled in the same way; the extracted image feature is preserved in a copy of the JPEG decoded image, so that it is not lost while inferring luminance values from the neighboring ‘known’ areas.

Note that, in the Image Inpainting module, the neighboring luminance values inferring task will not be performed block-by-block; therefore, the image to be inpainted mentioned above has to be generated first. This means that, whenever additional image pixels are filled-in with luminance values, the image to be inpainted will be updated and the following inpainting iterations will take place until all the to-be-inpainted pixels have been inpainted. Hence, as both the areas to be and not to be inpainted have been, for practical reasons, gathered in the same image, there is no need for the decoder architecture to have a Blending module, which is part of the high-
level decoder architecture presented in Figure 20. In fact, the output image will be generated by the Image Inpainting module to which are given, as input, the decoded not to-be-inpainted image areas.

The Image Inpainting module has the ultimate target of filling in the selected textural blocks, i.e. the to-be-inpainted textural blocks, which is adequately performed, with a manageable complexity, using patch-based data modeling. This means that, although the areas to be filled in correspond to 8×8 blocks, the inpainting procedure (at least the neighboring luminance values inferring part) will not be performed block-by-block but rather patch-by-patch. The patches are the entities allowing searching for texture-compatible and ‘known’ image fragments matching the textural information of the to-be-inpainted pixels’ vicinity. Furthermore, the patches are typically square-shaped, to avoid favoring any particular spatial direction in terms of inpainting, and their size is typically smaller than 8×8 pixels (smaller than the adopted block size for the JPEG coding solution, for instance) so that the inpainting results may achieve the required detail and properly ‘reconstruct’ small textural image elements.

For better understanding, this module has been divided into several sub-modules which comprise various processing stages and their description will be carried out in separate sub-sections.

5.3.1. Initial Processing

The first sub-module of the Image Inpainting module regards the decoder initial processing and comprises two stages: the RGB-to-YUV Color Space Conversion and the Image Areas to be Inpainted Creation. These stages will be described in the following.

5.3.1.1. RGB-to-YUV Color Space Conversion

In practice, the Standard-based Image Decoder decodes an image which is composed by the luminance for the blocks not to be inpainted and the block luminance average (selected feature) for the blocks to be inpainted; so the color space conversion is applied to this (only luminance) image. Although these two types of information result, in practice, from the same decoder, Figure 35 shows separately the decoded luminance and the decoded feature to make the adopted decoder architecture conceptually ‘cleaner’, as the used solution does not correspond to the general case.

The RGB-to-YUV color space conversion is needed for two reasons:

1. The JPEG decoder software version adopted for the Standard-based Image Decoder module generates, as output, a decoded image in the RGB color space; however, only the luminance component will be inpainted. This fact implicitly requires working, at the decoder side, in the YUV color space.

2. The YUV color space is typically more efficient in taking the human visual perception into account which is important for the inpainting process; this fact also asks for working, at the decoder side, in the YUV color space.

This stage has one output, notably the Y component to which the inpainting process is applied, as it is the component carrying most of the image energy and the one to which the HVS is most sensitive; inpainting only the luminance areas at the decoder is a typical solution adopted by the inpainting research community under the assumption that inpainting the chrominances should be less critical as the HVS is less sensitive to this information and, thus, the luminance inpainting solutions should be good enough for the chrominances.

5.3.1.2. Image Areas to be Inpainted Creation

The function of this processing stage is to create the so-called image to be inpainted (Figure 36 (b) and (d)); this process has to be performed because, in practice, the inpainting procedure will not take place block-by-block but rather patch-by-patch, and the coding mode matrix has been coded and decoded at the block level. Hence, all the pixels belonging to areas to be inpainted must be signaled with the same ‘impossible’ luminance value; otherwise, the patch-based inpainting method would not be possible. As, from a practical perspective, the Standard-based Image Decoder generates a decoded image as output, comprising the luminance for the areas not to be inpainted and the average luminance (extracted image feature) for the areas to be inpainted (Figure 36 (a) and (c) for the images Lena and Peppers, respectively), this decoded image cannot go directly through the image
inpainting procedure under the risk of losing the extracted image feature information for the areas to be inpainted as these areas will be filled during the inpainting process. It should be stressed that a copy of the image illustrated in Figure 36 (a) or (c) has been provided to this module, which allows preserving the image feature for later use in the Image Inpainting module.

Figure 36 – Illustration of the Image Areas to be Inpainted Creation stage for 512×512 luminance samples: (a) and (c) decoded luminance for the areas not to be inpainted combined with the selected image feature for the areas to be inpainted; (b) and (d) image to be inpainted.

For better understanding, the image to be inpainted consists of an image as illustrated in Figure 36 (b) and (d) which will have its ‘unknown’ areas filled in by the inpainting module. Note that all the non-black areas, i.e. the areas not to be inpainted, will never be inpainted.
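A sketch of this stage, using ‘-1’ as the ‘impossible’ luminance value suggested in the previous section and keeping the feature-carrying copy mentioned above:

```python
import numpy as np

def create_image_to_inpaint(decoded_y, coding_mode, bs=8, unknown=-1):
    """Build the image to be inpainted: keep the decoded luminance for the
    areas not to be inpainted and signal every to-be-inpainted pixel with the
    same 'impossible' value; a copy preserves the block averages (feature)."""
    feature_copy = decoded_y.copy()            # preserves the extracted feature
    img = decoded_y.astype(np.int16)           # signed type so that -1 fits
    rows, cols = coding_mode.shape
    for i in range(rows):
        for j in range(cols):
            if coding_mode[i, j] == 1:
                img[i*bs:(i+1)*bs, j*bs:(j+1)*bs] = unknown
    return img, feature_copy
```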

5.3.2. Neighboring Luminance Inferring

This sub-module is one of the most relevant in the Image Inpainting module and has the task of inpainting the target areas based on luminance values inferred from ‘known’ neighboring areas. In detail, this sub-module comprises five stages which essentially target the prioritization of the areas to be inpainted and the methods for filling them in. These stages will be described by their order of appearance in the decoder architecture in Figure 35.

5.3.2.1. Border Pixels Identification

This stage aims at identifying the pixels at the border between the currently ‘unknown’ and ‘known’ areas, i.e. the border pixels (white pixels in Figure 37 (b) and (d), for the images Lena and Peppers, respectively), which will become the center of the patches to be filled in, i.e. the target patches. More precisely, an image pixel $p$ is considered to be a border pixel if:

1. It belongs to the still ‘unknown’ (thus yet to be inpainted) areas, $\Omega^t$, and

2. There is at least one pixel, $q$, which belongs to the 8-neighbors of the pixel $p$, i.e. $q \in N_8(p)$, and also belongs to the currently ‘known’ image areas, $\Phi^t$.

Thus, a border pixel is formally expressed as:

$$p \in \Omega^t \;\wedge\; \exists\, q \in N_8(p) \cap \Phi^t \qquad (5.1)$$

Since the inpainting is an iterative process, the superscript $t$ stands for the current inpainting iteration, expressing the fact that the ‘known’ and ‘unknown’ image areas differ from iteration to iteration; naturally, the more the inpainting iterations, the smaller the ‘unknown’ areas and the larger the ‘known’ areas, as the to-be-
inpainted areas are filled-in with luminance values inferred from the not to be inpainted areas and/or from the areas which have meanwhile been inpainted in previous inpainting iterations.

The border pixels identification is then mathematically performed as:

$$BorderPixels(i,j) = \begin{cases} 1, & \text{if } p \in \Omega^t \,\wedge\, \exists\, q \in N_8(p) \cap \Phi^t \\ 0, & \text{otherwise} \end{cases} \qquad (5.2)$$

In this context, to identify the border pixels, this stage has to be provided with the image to be inpainted which has been generated at the previous sub-module (Figure 37 (a) and (c) for the images Lena and Peppers, respectively), so as to generate, as output, the Border Pixels matrix, which has only two possible labels: ‘1’ if a pixel is a border pixel and ‘0’ otherwise. To ease the reader’s experience and understanding, the description of this sub-module’s algorithm and processing details will be based on the flowchart presented in Figure 38.

Figure 37 – Illustration of the Border Pixels Identification stage: (a) and (c) image to be inpainted (black and non-black areas are to be and not to be inpainted, respectively); (b) and (d) border (white) and non-border (black) pixels considering the areas to be inpainted in (a) and (c), respectively.

Figure 38 – Border Pixels Identification stage flowchart.

According to Figure 38, the Border Pixels Identification algorithm proceeds as follows:

For all image pixels, in raster scan order, do:

1. Pixel to be Inpainted Check – The current pixel value is checked, aiming at finding if it is still to be inpainted, i.e. if $p \in \Omega^t$; in that case, the algorithm will go to step 1.1; otherwise, it will proceed to the next image pixel and go back to step 1.

1.1. Known Neighboring Pixels Check – For the ‘unknown’ pixel currently under processing, its 8-neighbors will be checked to find if any of them is a ‘known’ pixel, i.e. if $\exists\, q \in N_8(p) \cap \Phi^t$. If so, the algorithm will go to step 1.1.1; otherwise, it goes to step 1.1.2.

1.1.1. Border Pixel Classification – Since it has at least one ‘known’ neighbor pixel, the current pixel is classified as a border pixel, which is formally expressed by $BorderPixels(i,j) = 1$. The border pixels are illustrated by the white pixels in Figure 37 (b) and (d) for the images Lena and Peppers, respectively. After, the algorithm will proceed to the next image pixel, thus going back to step 1.

1.1.2. Non-Border Pixel Classification – As it has no ‘known’ neighbor pixel, the current pixel is classified as a non-border pixel, which is formally expressed by $BorderPixels(i,j) = 0$; this is illustrated by the black pixels in Figure 37 (b) and (d) for the images Lena and Peppers, respectively. After, the algorithm will proceed to the next pixel, thus going back to step 1.

After all the image pixels have been classified either as border or non-border pixels, the processing for this stage is complete.
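A numpy sketch of equation (5.2); the ‘unknown’ pixels are assumed to be signaled with the ‘impossible’ value ‘-1’ suggested earlier.

```python
import numpy as np

def border_pixels(image_to_inpaint, unknown=-1):
    """Border Pixels matrix of eq. (5.2): '1' for still 'unknown' pixels with
    at least one 'known' 8-neighbor, '0' otherwise."""
    h, w = image_to_inpaint.shape
    known = image_to_inpaint != unknown
    border = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            if known[i, j]:
                continue                          # only 'unknown' pixels qualify
            i0, i1 = max(i - 1, 0), min(i + 2, h)
            j0, j1 = max(j - 1, 0), min(j + 2, w)
            if known[i0:i1, j0:j1].any():         # any 'known' 8-neighbor?
                border[i, j] = 1
    return border
```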

5.3.2.2. Image Pixels Confidence Initialization

Before any inpainting step takes place, this decoder stage aims at initializing the confidence values expressing how reliable the image pixels are after the decoding process; this will allow prioritizing the patches, since their processing order is not irrelevant for the final inpainting performance. This initialization intends to express the initial confidence of the luminance values for the areas to be and not to be inpainted. Intuitively, this algorithm will assign the confidence values based on two key facts:

1. The reliability of the luminance values available from the standard decoded areas is high as they correspond to the input image, although naturally distorted by the DCT quantization process; therefore, the pixels belonging to these areas will be assigned with the highest confidence value;

2. The luminance data values for the pixels belonging to the areas to be inpainted are totally ‘unknown’ at this phase (black areas in Figure 37 (a) and (c), for the images Lena and Peppers, respectively); therefore, the pixels belonging to these areas will be assigned with the lowest confidence value.

To ease the reader’s experience, this stage description will be based on the flowchart presented in Figure 39.

Figure 39 – Confidence Values Initialization stage flowchart.

This stage is given the image to be inpainted, created at the Image Areas to be Inpainted Creation stage, and generates, as output, the initialized version of the Pixel Confidence matrix using the following mathematical notation:

$$PixelConfidence(p) = \begin{cases} 1, & \text{if } p \in \Phi \\ 0, & \text{if } p \in \Omega \end{cases} \qquad (5.3)$$

where $\Phi$ designates the areas not to be inpainted (standard decoded areas) and $\Omega$ designates the areas to be inpainted, which together compose the complete image to be inpainted, $I$; this means that $I = \Omega \cup \Phi$, as mentioned in Section 2.1, notably in Figure 3. As shown in Figure 35, this initialization stage will be performed only for the first inpainting step and not for the remaining ones, since the to-be-inpainted pixels’ confidence values will always be updated before proceeding to the next inpainting iteration, as will be detailed in Section 5.3.3. In fact, this is the reason for omitting the superscript $t$ in (5.3), as the processing involved in this stage only occurs for the first inpainting iteration, i.e. $t = 1$; this means that, in this case, the ‘known’ areas correspond only to the standard decoded areas, not yet including already inpainted areas.

In this context, the initial confidence assignment algorithm proceeds as follows:

For all image pixels, in raster scanning order, do:

1. Pixel Classification Check – The current pixel is checked to determine if it is to be inpainted or not. If it is, the algorithm will go to step 1.1 so that the lowest confidence value is assigned to this pixel; otherwise, it will go to step 1.2 so that the highest confidence value is assigned to it.

1.1. Lowest Confidence Value Assignment – As the current pixel belongs to the areas to be inpainted (black areas in Figure 37 (a) and (c)), its confidence value is set to ‘0’, which is formally expressed by $PixelConfidence(p) = 0$. As the image feature for the areas to be inpainted is not exploited in the neighboring luminance values inferring process (only after this inpainting task finishes), the decoder ‘virtually’ knows nothing about those areas, a fact that is expressed by the aforementioned confidence value assignment. After, the algorithm will proceed to the next image pixel, thus going back to step 1.

1.2. Highest Confidence Value Assignment – As the current image pixel belongs to the areas not to be inpainted (non-black areas in Figure 37 (a) and (c)), its confidence value is set to ‘1’, which is formally expressed by $PixelConfidence(p) = 1$. This assignment expresses the fact that, as these areas correspond to standard decoded areas for which the luminance values are known, the confidence value of these pixels is the highest possible. After, the algorithm will proceed to the next pixel, thus going back to step 1.

After the confidence value has been initialized for all image pixels, this stage’s processing is finished and the Pixel Confidence matrix is fully initialized; this matrix will be then provided to the Patch Confidence Computation stage.

5.3.2.3. Patch Confidence Computation

This stage’s function is to prioritize the filling-in order for the to-be-inpainted pixels which, as mentioned in [14], influences the subjective quality of the output image; naturally, it is expected to influence also the corresponding image objective quality. This solution adopts a patch-based data modeling, which is applied to the textural areas selected at the encoder side to be inpainted at the Image Inpainting module. Furthermore, the patches with higher confidence, i.e. those containing more ‘known’ pixel luminance values, are given higher priority for the filling-in process.

As for the other decoder stages, the algorithmic steps in the Patch Confidence Computation stage will be described based on a flowchart, notably the one presented in Figure 40.

Figure 40 – Patch Confidence Computation stage flowchart.

This stage is provided with the Border Pixels matrix, which has been generated at the Border Pixels Identification stage, and with the Pixel Confidence matrix, which has been initialized at the Image Pixels Confidence Initialization stage. As for its output, this stage will ultimately determine the target patch, i.e. the patch to be filled in during the current inpainting iteration, as evidenced in Figure 35. With this purpose, first, the patch confidence is computed and associated with the border pixel at which the patch is centered, for the simple reason that the patch area is determined based on the border pixel coordinates, as will be detailed in the following. This is a typical solution adopted in the literature [14] [24].

Thus, for determining the patch confidence, the algorithm proceeds as follows:

For all border pixels in the Border Pixels matrix, in raster scan order, do:

1. Patch Confidence Computation – This step aims at computing the confidence value associated to the patch for the border pixel under consideration, i.e. the patch $\Psi_p$ centered at the border pixel $p \in \delta\Omega^t$; the patches are square-shaped to avoid favoring any spatial direction in terms of inpainting and are centered at the current border pixel. The patch confidence is calculated based on the Pixel Confidence matrix and it is stored at the border pixel location (which is the center of the patch) in this matrix as follows:

$$PixelConfidence(p) = \frac{\sum_{q \,\in\, \Psi_p \cap\, \Phi^t} PixelConfidence(q)}{|\Psi_p|} \qquad (5.4)$$

where $\Psi_p$ stands for the patch, $|\Psi_p|$ is the area of the patch in pixels and $\Phi^t$ corresponds to the areas for which the luminance values are currently ‘known’, either because they have been standard decoded or because they have been inferred through inpainting in previous inpainting iterations. Note that the confidence value associated with the current patch is stored at the pixel level in the Pixel Confidence matrix, notably at the location of the border pixel which is the center of the patch being considered. Putting equation (5.4) into words, the confidence value for a given patch is expressed by the ratio between the sum of the confidence values for the pixels within the patch that belong to currently ‘known’ image areas, and the patch area. Naturally, the confidence values for the pixels that will be filled in are continuously updated along the inpainting process, as will be explained in Section 5.3.3. For better understanding, various patch areas are illustrated in Figure 41.

Figure 41 – Illustration of the patch area for patch sides equal to 3, 5 and 7 image pixels.

Intuitively, the role of the patch in the inpainting procedure is to allow performing a template matching between the ‘known’ luminance values within the patch and the corresponding ones in other patches which contain ‘known’ luminance values for all of their pixels. As shown in Figure 41 by the blue pixel, the patch is centered at a pixel, notably at a border pixel; for a center pixel to exist, the patch side must be odd, which is formally expressed as follows:

$$patch\_side = 2\Delta + 1 \qquad (5.5)$$

where $\Delta$ stands for the number of pixels between the patch center, i.e. the border pixel, and the limit of the patch (imposed by the patch side). In this context, given a patch side, $\Delta$ is straightforwardly determined by solving equation (5.5) for this variable. Knowing $\Delta$ and the patch center coordinates, the patch area is easily measured as the patch is square-shaped. Naturally, several restrictions have to be applied, depending on the location of the patch center pixel and the patch side value, to avoid the patch exceeding the image’s dimensions. The patch is always created with the patch side defined by the user.

In this solution, the patch side is user-defined before any inpainting procedure starts and may be set to 3, 5 or 7 pixels, thus corresponding to a patch area of 9, 25 or 49 image pixels. The fact that different patch areas may be selected emerged as a relevant add-on to the developed solution, since there is barely any justification in the literature for adopting a given patch area and it has also not become clear whether or not the patch size has a significant impact on the inpainting results. However, the patch area typically adopted in the reviewed pure inpainting and inpainting-based image coding solutions is smaller than 8×8 pixels, as inpainting patches with larger areas do not seem to allow properly restoring the luminance values to the required level of detail. In this context, all patch sides made available in this solution are below 8 (and odd), i.e. 3, 5 or 7. To ease the reader’s experience, an example illustrating the computation of the patch confidence value (by means of equation (5.4)) for a given patch is provided in Figure 42 and will be described in the following.
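A sketch of equation (5.4) and of the target patch selection; dividing by the nominal patch area even when the patch is clipped at the image limits is an assumption of this illustration.

```python
import numpy as np

def target_patch(confidence, border, known, patch_side=3):
    """Evaluate the patch confidence of eq. (5.4) at every border pixel and
    return the center of the highest-confidence (target) patch."""
    d = patch_side // 2                        # eq. (5.5): patch_side = 2*delta + 1
    h, w = confidence.shape
    best, best_center = -1.0, None
    for i, j in zip(*np.nonzero(border)):      # patches are centered at border pixels
        i0, i1 = max(i - d, 0), min(i + d + 1, h)
        j0, j1 = max(j - d, 0), min(j + d + 1, w)
        conf_sum = confidence[i0:i1, j0:j1][known[i0:i1, j0:j1]].sum()
        patch_conf = conf_sum / patch_side ** 2   # divide by the patch area
        if patch_conf > best:
            best, best_center = patch_conf, (i, j)
    return best_center, best
```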

Figure 42 – Zoomed illustration of patch confidence computation for the first inpainting iteration: (a) areas to be inpainted (‘unknown’) and standard decoded (‘known’) areas (black and non-black areas); (b) pixel confidence values within the current (size 3) patch (yellow-lined).

As shown in Figure 42 (a), the current size 3 patch (yellow-lined) contains some ‘known’ luminance values, whereas others are still unknown (black areas). As this example illustrates, in the first inpainting iteration, the ‘known’ areas correspond only to the standard decoded areas, which will never be inpainted, and the confidence values matrix (5.3) is filled with ‘1’ or ‘0’. In this context, the confidence for the current patch would be 5/9 as defined by equation (5.4), as there are five ‘known’ pixels (corresponding to $PixelConfidence(q) = 1$) among the nine available in the yellow-lined patch.

Intuitively, equation (5.4) will, as mentioned in [14], approximately ‘force’ the areas to be inpainted to be closed in a concentric fashion, meaning that, as the filling-in process progresses, the outer layers of these areas will tend to be associated with greater confidence values and, therefore, will be inpainted first. As it will be detailed in an up-coming stage, after a given inpainting iteration has been completed, the confidence value for the pixels within the current patch which have just been inpainted will be updated so that the patch priorities in the next inpainting iteration may be recomputed and adapted to the evolving decoded image. As aforementioned, in this solution, the patches for which the confidence is the highest, are given the highest priority for inpainting. Intuitively, this expresses the fact that the patch priority is determined by the confidence value associated to that patch, which consists in a measure of the amount of reliable information within the patch. This means that the patches containing more standard decoded pixels and/or more pixels which have already been filled-in through inpainting, will have higher priority than those which are located, for instance, in an inner part of an area to be inpainted. Moreover, the more the reliable pixels in the patch, the higher the patch priority, as illustrated in Figure 43, where the green-lined patch contains more ‘known’ pixels than the red-lined patch; therefore, the green-lined patch will be given higher priority. Note that the patches which are centered at ‘corners’, as the green-lined patch, are filled first since they are surrounded by more reliable pixels, meaning that they have more pixels to which to match.

Figure 43 – Zoomed example patches: the green-lined patch is given higher inpainting priority than the red-lined patch; black areas represent the areas to be inpainted and non-black areas represent currently known luminance values.


Among all patches centered at border pixels, the highest confidence patch, i.e. the target patch, will be the one filled-in in the current inpainting iteration. However, there might be patches with precisely the same confidence value, either because at the first inpainting iteration the number of ‘known’ pixels was the same, or because, in any other iteration, the sum of the confidence values for the pixels within a patch is the same. To break these patch confidence ‘draws’, the solution proposed in [24], which consists in including another term, typically called the data term in the literature, has been tested; in [24], this data term gives additional inpainting preference to the patches containing a higher percentage of edge pixels and whose constituent pixels have higher variance. In particular, using the notation in [24], the patch priority would be given as:

$\mathrm{Priority}(\Psi_p) = \mathrm{Confidence}(\Psi_p) \cdot \mathrm{Data}(\Psi_p), \quad \forall p \in \delta\Omega$ (5.7)

where the data value is computed as:

$\mathrm{Data}(\Psi_p) = \max\left(1, \sum_{q \in \Psi_p} E(q)\right) \cdot \mathrm{Var}(\Psi_p), \quad \forall p \in \delta\Omega$ (5.8)

In equation (5.8), $E$ stands for the edge-map of the ’known’ areas and $\mathrm{Var}(\Psi_p)$ represents the variance of the pixel values within the patch. After testing, this criterion did not noticeably improve the inpainting results, because few edge pixels are found within the patches as only textural areas have been selected to be inpainted; this means that the data term value would almost always be equal to $\mathrm{Var}(\Psi_p)$. It should be mentioned that this criterion was used in [24] for pure inpainting purposes, where all information within the patch is known, which does not happen in an inpainting-based image coding solution. This fact also caused the patch variance to be zero very often as, for some patches, all the known luminance values are the same; this would make the patch priority zero, although the patch could have the highest confidence for the current inpainting iteration. Although there are other data terms in the literature, they mainly intend to give priority to structural areas which are also inpainted in those solutions (but not here) [14]; they are also extremely complex and computationally demanding. As no structural areas are inpainted in the developed inpainting-based image coding solution, most data terms adopted in the literature do not fit in with benefit. Although many experiments have been made, it has not been possible to find a good metric for breaking the patch confidence ‘draws’ in the sense of really making a difference in terms of inpainting quality; in this context, no additional preference has been given to patches with the same confidence value, which means that, in case of a ‘draw’, the selection is performed randomly among the highest confidence patches. Nonetheless, assigning additional preference to patches with the same priority should be a matter of concern for future work.
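A minimal Python sketch of this selection rule, reusing the patch_confidence function sketched above (again with illustrative names), could read:

import random

def select_target_patch(border_pixels, confidence_map, patch_side):
    # Score every patch centered at a border pixel; a confidence 'draw'
    # among the highest scored patches is broken by a random choice.
    scored = [(patch_confidence(confidence_map, p, patch_side), p)
              for p in border_pixels]
    highest = max(score for score, _ in scored)
    drawn = [p for score, p in scored if score == highest]
    return random.choice(drawn)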

As is typical in the literature, the selected highest priority patch will hereafter be called the target patch, as it is the one which will be inpainted in the current inpainting iteration.

5.3.2.4. Source Patch Determination

The Source Patch Determination stage is responsible for determining the most similar patch to the target patch, i.e. the so-called source patch; by construction, the source patch is likely to be a good match. This naming, which is typically adopted in the literature, expresses the fact that the unknown luminance values within the target patch will be inferred from the corresponding known ones in the source patch. Formally, the source patch is expressed as:

$\Psi_{source} = \underset{\Psi_{candidate}}{\arg\min}\; d(\Psi_{target}, \Psi_{candidate})$ (5.9)

where the distance $d(\Psi_{target}, \Psi_{candidate})$ represents the ‘degree of similarity’ between the known pixels in a candidate patch and the known pixels within the target patch. A candidate patch corresponds to a patch in which all luminance values have to be known because:


• The distance between the target and a candidate patch is computed based only on the known luminance values in both patches, i.e. only these pixels will allow the template matching;

• The ‘unknown’ luminance values in the target patch will be filled-in with the luminance values from the candidate patch minimizing $d(\Psi_{target}, \Psi_{candidate})$, notably with those which have not contributed to the distance computation.

In this context, the distance between the target patch for the current inpainting iteration and the various candidate patches should be computed so that the source patch, i.e. the candidate patch minimizing $d(\Psi_{target}, \Psi_{candidate})$, is found. In this solution, the distance $d(\Psi_{target}, \Psi_{candidate})$ has been defined as the SSD between the known luminance values in both the candidate and target patches, as typically found in the literature [4]. The candidate patch minimizing the SSD will then be considered to be the source patch, meaning that its luminance values will be copied to fill-in the corresponding ‘unknown’ pixels in the target patch.

The source patch is sought in the target patch’s ‘known’ surroundings, which are limited by a search window. This search window is square-shaped so that no spatial direction is favored in terms of inpainting, and it is centered at the border pixel associated with the target patch, as illustrated in Figure 44. This choice has been motivated by two main reasons:

• Limiting the decoder complexity since the involved computational power grows quickly with the size of the search window;

• It is very likely that the areas to be inpainted may be properly inferred from the closer ‘known’ surrounding areas, with farther-away areas bringing little added value.

In this solution, the search window side may be set to 7, 9 or 11 pixels, corresponding to a search window of 7×7, 9×9 or 11×11 pixels. Making available only three search window side values targets limiting the decoder complexity. Note that, independently of the number of options made available to the user, the search window side must be odd, like the patch side, since the square-shaped search window needs to have a center pixel.

Allowing the user to select a search window side before any inpainting iteration takes place emerged as a straightforward add-on, which is typically not available in the literature. Note that the search window area cannot, under any circumstance, be the same as the patch area, as the search window’s center pixel is also the target patch’s center pixel. If both areas were the same, the search window would coincide with the target patch, making it impossible to search for a source patch in the target patch’s surroundings. In this context, the allowable window side values for each patch side are summarized in Table 2.

Table 2 – Search window side values versus patch side values.

Patch Side [in pixels]    Search Window Side [in pixels]
3                         7, 9, 11
5                         7, 9, 11
7                         9, 11

The suggested default search window side value is 11 pixels, as it is a value common to the three possible patch sides, as shown in Table 2. The value 11 has been adopted as default instead of 9, which is also common to all possible patch side values, because this higher value allows ‘exploring’ a larger image area when searching for the source patch, eventually increasing the number of candidate patches among which the source patch is selected, without significantly ‘boosting’ the computational effort.
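The constraints in Table 2 reduce to a simple rule: both sides are odd values from the sets above and the search window is strictly larger than the patch. A minimal check, with illustrative names, could read:

def valid_configuration(patch_side, window_side):
    # Encodes Table 2: both sides restricted to the (odd) values made
    # available in this solution, with the search window strictly larger
    # than the patch so the source patch can be sought in the target
    # patch's surroundings.
    return (patch_side in (3, 5, 7)
            and window_side in (7, 9, 11)
            and window_side > patch_side)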


Figure 44 – Illustration of a search window, a target patch, a potential candidate patch, two candidate patches and several unknown luminance values.

Next, the algorithmic steps involved in this stage will be described, based on the flowchart presented in Figure 45.


Figure 45 – Source Patch Determination stage flowchart.

For each pixel in the search window associated with a border pixel, in raster scan order, do:

1. Known Pixel Check – The first step tests if the luminance value for the current search window pixel is known; at this stage, it does not matter whether the pixel value comes from a not-to-be-inpainted area or from an already inpainted area. If the luminance value is known, the algorithm will go to step 1.1; otherwise, the algorithm will proceed to the next search window pixel, thus going back to step 1.

1.1. Potential Candidate Patch Creation – As the luminance value for the current search window pixel is known, a potential candidate patch will be created, having this pixel at its center, as illustrated in Figure 44 by the orange patch. This patch is called a potential candidate patch because, although it is centered at a ‘known’ pixel, that condition alone is not enough to guarantee that all the other pixels in the patch are also ‘known’.

1.1.1. Potential Candidate Patch Unknown Pixels Check – Having created a potential candidate patch in step 1.1, the other pixels contained in that patch will be checked to test if, for any of them, the corresponding luminance value is so far unknown. If that is the case, the potential candidate patch cannot in fact be considered a real candidate patch, as that would mean that a patch which could eventually become the source patch (if minimizing the SSD), and hence be used to fill-in the unknown luminance values in the target patch, would itself have unknown luminance values. This situation is illustrated in Figure 44 for the orange patch: this patch corresponds to a potential candidate patch, but the luminance value for one of its constituent pixels is unknown (black pixel in Figure 44); therefore, this potential candidate patch will be discarded and never be considered a real candidate patch. After this check, the next action will either be to proceed to the next search window pixel, thus going back to step 1, or to end the processing if this is the last search window pixel. On the other hand, if the potential candidate patch does not contain any unknown luminance value, the algorithm will proceed to step 1.1.2.

1.1.2. Candidate Patch Classification – If the potential candidate patch does not contain any unknown luminance values, then it is classified as a candidate patch; this means this patch is a candidate for being the source patch, depending on its distance, i.e. degree of similarity, to the target patch. A candidate patch cannot have a single unknown luminance value since, otherwise, the filling-in procedure could not be completed, as there would be no luminance values to fully inpaint the target patch. Naturally, the candidate patch needs to have the same size as the target patch; this setting is automatically made before the inpainting procedure starts, after the target patch size has been defined by the user. As an example, both the pink and purple patches in Figure 44 are candidate patches, as the luminance values for all their constituent pixels are known.

1.1.3. SSD Computation – The distance between each candidate patch and the target patch is computed through the SSD between the pixels with known luminance values within the target and candidate patches, i.e. between $\Psi_{target}$ and $\Psi_{candidate}$; this is formally expressed as:

$SSD(\Psi_{target}, \Psi_{candidate}) = \sum_{i}\sum_{j}\left[\Psi_{target}(i,j) - \Psi_{candidate}(i,j)\right]^{2}$ (5.10)

where the double sum runs only over the positions $(i,j)$ for which the target patch luminance values are known.

If this is the first SSD computation for determining the source patch, the current SSD will be stored as the minimum SSD, since there is no other value for comparison; otherwise, the current value will be compared to the minimum SSD value already stored and the minimum will be updated if the current value is lower. As an example, assume the SSD between the target patch and the pink candidate patch in Figure 46 is 6 and corresponds to the minimum SSD so far. If, among all the other candidate patches, the purple candidate patch in Figure 46 has an SSD of 3, then the minimum SSD will be updated, as 3 is lower than the minimum so far, i.e. 6.

Figure 46 – Example of SSD updating with a target patch and two candidate patches in the search window used in this inpainting iteration.


After all search window pixels have been processed, the candidate patch to which the absolute minimum SSD is associated will be considered as the source patch, thus providing the luminance values to fill the unknown luminance values in the target patch.
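The steps above may be summarized by the following Python sketch, a simplified rendition which assumes that ‘unknown’ luminance values are marked with -1 in a signed (or floating-point) luminance array and that the search window and all patches lie fully inside the image; the names are illustrative, not those of the actual implementation:

import numpy as np

def find_source_patch(luma, center, patch_side, window_side):
    half_p, half_w = patch_side // 2, window_side // 2
    r0, c0 = center
    target = luma[r0 - half_p:r0 + half_p + 1, c0 - half_p:c0 + half_p + 1]
    known = target >= 0                        # known pixels drive the matching
    best_ssd, source = None, None
    for r in range(r0 - half_w, r0 + half_w + 1):        # raster scan order
        for c in range(c0 - half_w, c0 + half_w + 1):
            if luma[r, c] < 0:                 # step 1: center pixel must be known
                continue
            cand = luma[r - half_p:r + half_p + 1, c - half_p:c + half_p + 1]
            if (cand < 0).any():               # step 1.1.1: discard potential
                continue                       # candidates with unknown pixels
            diff = target[known] - cand[known]
            ssd = float(np.dot(diff, diff))    # step 1.1.3: SSD over known pixels
            if best_ssd is None or ssd < best_ssd:
                best_ssd, source = ssd, cand   # keep the minimum SSD candidate
    return source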

5.3.2.5. Target Patch Filling-in

In this stage, only the pixels for which the luminance values are ‘unknown’ in the target patch will be filled with the corresponding ones in the source patch (pixels with a ‘-1’ in Figure 47, left).

Figure 47 – Example illustrating the Target Patch Filling-in process.
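Under the same assumptions as the previous sketch (-1 marking ‘unknown’ values in a signed luminance array), the filling-in step only overwrites the pixels of the target patch that are still unknown:

def fill_target_patch(luma, center, source_patch, patch_side):
    half = patch_side // 2
    r, c = center
    # NumPy slices are views, so this assignment updates the image in place.
    target = luma[r - half:r + half + 1, c - half:c + half + 1]
    missing = target < 0                   # only the 'unknown' (-1) pixels
    target[missing] = source_patch[missing]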

5.3.3. Filled Pixels Confidence Updating

This sub-module is responsible for the updating needed before processing the next inpainting iteration; this essentially consists in updating the confidence values for the pixels in the current target patch which have just been inpainted. Afterwards, there will be a check to test if there are still any pixels to be inpainted, i.e. if there is still any ‘unknown’ luminance value in the image under processing. In that case, the image will have to be further processed; otherwise, all the image areas that were meant to be inpainted have, in fact, already been inpainted. In detail, a filled-in pixel, $p_{filled}$, has its confidence value, $\mathrm{Confidence}(p_{filled})$, updated to the confidence computed for the target patch (a minimal sketch of this update follows the list below). As mentioned in [14], this allows:

• Updating the confidence values without resorting to any image-specific parameters or user interaction;

• ‘Forcing’ the confidence values to decay as the inpainting iterations proceed, thus expressing the fact that there is less certainty for the pixel positions corresponding to the inner hole pixels since they are farther away from the not-to-be-inpainted decoded pixels.
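A minimal sketch of this update and of the termination check, reusing patch_confidence and the -1 convention from the previous sketches (names illustrative):

def update_filled_confidence(confidence_map, center, patch_side, filled_mask):
    # Compute the target patch confidence from the pre-update map and assign
    # it to the just filled-in pixels, so that confidences decay towards the
    # inner hole pixels over the iterations.
    new_confidence = patch_confidence(confidence_map, center, patch_side)
    half = patch_side // 2
    r, c = center
    region = confidence_map[r - half:r + half + 1, c - half:c + half + 1]
    region[filled_mask] = new_confidence

def inpainting_finished(luma):
    # Termination check: no 'unknown' (-1) luminance value remains.
    return not (luma < 0).any()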

As shown in Figure 35, after the pixel confidence values have been updated, the algorithm will proceed to the next inpainting iteration if there is still any ‘unknown’ luminance value in the image to be inpainted. Otherwise, the inpainting task based on inferring the neighboring luminance values has reached its end, generating the inpainted luminance component (Figure 48 (d) and (h) for Lena and Peppers, respectively). Naturally, the forced insertion of not-to-be-inpainted blocks using a chess pattern, as explained in Chapter 4, has a key role in the initialization and evolution of the pixel confidences and, thus, on the patch confidences and inpainting order.


Figure 48 – Example of the evolution of the neighboring luminance values inferring process along the inpainting iterations for Lena and Peppers (panels (a)–(h); snapshots taken at inpainting iterations 1, 2000, 3400 and 4700, and 1, 3200, 6800 and 9800).

5.3.4. Block Pixels Luminance Adjustment

The processing of this sub-module aims at enhancing the image resulting from the inpainting based on neighboring luminance values inferring, by using the selected image feature that has been extracted, at the encoder side, for the areas which have meanwhile been inpainted. In this case, the selected image feature is the luminance average at the block level; in this codec, the coding of these features ‘came for free’ through the JPEG codec. This feature represents very important information from the original image to which the HVS is very sensitive; thus, it will allow adjusting the luminance average values block-by-block after the inpainting from neighboring areas. This adjustment should improve both the objective and subjective qualities of the output image, in comparison with the inpainted image generated by just inferring the target areas from their surroundings.

The algorithmic steps involved in this sub-module will be described based on the flowchart presented in Figure 49.


Figure 49 – Block Pixels Luminance Adjustment sub-module flowchart.

This stage is provided with three inputs, notably the block luminance averages for the blocks that have been inpainted, the coding mode matrix and the inpainted luminance blocks, so as to generate the final adjusted luminance, i.e. a version of the inpainted luminance component where the average block luminance values are adjusted to the ‘original’ block luminance averages, which have been extracted as an image feature at the encoder; in practice, the block luminance averages are not precisely the original ones as they have been quantized during the JPEG encoding process. This adjustment process proceeds as follows:

For each inpainted block, in raster scan order, do:

1. Average Block Luminance Difference Computation – This step computes the difference between the current inpainted block luminance average and the decoded block luminance average as:

$\Delta\mu(B_k) = \mu\big(B_k^{inpainted}\big) - F_{B_k}, \quad \forall B_k \ inpainted$ (5.11)

where $B_k$ designates the current inpainted block, $\mu(\cdot)$ the average value and $F_{B_k}$ is the decoded image feature for the current inpainted block, i.e. the ‘original’ block luminance average, naturally distorted by the encoder quantization.

2. Block Pixels Luminance Adjustment – All pixels in the current inpainted block are adjusted by compensating the difference computed in step 1, which is mathematically expressed as:


$Y_{adjusted}(i,j) = Y_{inpainted}(i,j) - \Delta\mu(B_k), \quad \forall (i,j) \in B_k \ inpainted$ (5.12)

where $Y_{adjusted}$ and $Y_{inpainted}$ stand for the adjusted luminance component and the luminance component after the inpainting process. Intuitively, this adjustment performs a DC shift on the blocks which have been inpainted by exploiting the selected image feature, i.e. the block luminance averages. As the HVS is very sensitive to the block luminance average values, this adjustment allows not only exploiting an image feature to enhance the objective quality of the inpainting results, but also improving the perceived visual impact of the output image.
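A minimal Python sketch of this DC shift follows, assuming 8×8 blocks, a mapping from inpainted block coordinates to the decoded block luminance averages, and a final clipping back to the 8-bit range (a practical detail not spelled out above); all names are illustrative:

import numpy as np

def adjust_block_luminance(luma, inpainted_blocks, decoded_averages, block_size=8):
    adjusted = luma.astype(np.float64)         # work in float on a copy
    for (br, bc) in inpainted_blocks:          # raster scan order
        r, c = br * block_size, bc * block_size
        block = adjusted[r:r + block_size, c:c + block_size]
        delta = block.mean() - decoded_averages[(br, bc)]   # equation (5.11)
        block -= delta                                      # equation (5.12): DC shift
    return np.clip(np.rint(adjusted), 0, 255).astype(np.uint8)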

After the block average luminance values adjustment has been performed for all inpainted blocks, the adjusted luminance component will be generated. This sub-module, which exploits the ‘original’ block luminance averages, is, in fact, the main novelty of the developed decoder solution and is believed to be a relevant contribution to the inpainting literature; although other image features have already been exploited for inpainting purposes, e.g. the edge information in [4], the block luminance averages have not been used to enhance the inpainting results, to the best of the author’s knowledge.

To summarize, this chapter has focused on the algorithmic details of the inpainting-based image decoder, notably by describing in depth the Image Inpainting module, which is the main novelty of this image coding solution when compared to standard image coding solutions. In this context, the reader has been acquainted with the methods for prioritizing the areas to be inpainted, determining the source patches from which the missing information will be inferred and exploiting the selected image feature, extracted at the encoder side, to help the decoder further enhance the inpainting results.

Compared with the already existing solutions in the literature, the main novelty proposed at the decoder side corresponds to the exploitation of the selected image feature, i.e. the block luminance averages, for enhancing the areas inpainted by inferring luminance values from their neighboring areas. It is important to stress that, in this solution, this feature has been encoded ‘for free’, as it would have been sent anyway as DC DCT coefficients, and it has a positive impact on the perceived visual quality of the output image.

Chapter 6 will focus on the performance assessment of the proposed inpainting-based image codec.


Chapter 6

6. Performance Evaluation

This chapter aims at evaluating the performance of the developed inpainting-based image coding solution. With this purpose, this solution has been compared with the two selected image coding benchmarks, notably with the most representative and widely adopted image coding benchmark, the JPEG standard, and the state-of-the-art image coding standard, JPEG 2000.

For better understanding the strengths and weaknesses of this inpainting-based image coding solution and, thus, the applications where it may be successfully adopted, this codec is tested with a representative set of test conditions and compared to the selected image coding benchmarks based on their Rate-Distortion performance, which includes both an objective perceptually-driven and a non-perceptually-driven image quality evaluation.

6.1. Test Conditions

Before going into the details of the RD results, it is crucial to state the test conditions under which the performance of the developed image codec has been evaluated. In this context, the test conditions have to be relevant and meaningful from a practical point of view, so that the conclusions are reliable and useful for assessing the current codec and developing future work.

Having this in mind, first, the test images that were selected for the performance evaluation will be presented and shortly described. Next, the coding conditions considered are defined and, finally, the comparative performance evaluation between the developed solution and the selected image coding benchmarks will be provided.

Test Material

The test images considered to evaluate the performance of the developed image codec have been selected from the USC-SIPI image database [11]; the selected test images – Lena, Peppers and Jet – are presented in Figure 50 and have rather different image properties.


Figure 50 – Test images: (a) Lena; (b) Peppers; (c) Jet.

• Test Images – All selected test images correspond to natural images in the RGB color space; the images are briefly described in the following, according to their main characteristics, notably in terms of structure and texture:

• Lena (Figure 50 (a)) – The image Lena is a very ‘rich’ image as it contains a mix of detail, flat textural areas (notably over the Lena’s straw hat and below her chin), structural areas and even some shading. This image intends to test the developed inpainting-based image coding solution’s performance when not-so-similar textural areas have to be inpainted.

• Peppers (Figure 50 (b)) – The image Peppers contains a significant number of structural areas, mostly due to the presence of several kinds of peppers which are overlapped; but the image also contains textural areas with intense illumination.

• Jet (Figure 50 (c)) – The image Jet comprises structural areas which are essentially associated to the mountains and to the shape of the jet itself. As for the textural areas, they are not very homogeneous as they are mostly associated with the clouds surrounding the mountains. This is believed to pose some difficulties as far as inpainting is concerned.

• Spatial Resolution – All the test images have a luminance spatial resolution of 512×512 samples.

Coding Conditions

After presenting the test material, the reader should be acquainted with the coding conditions that will be considered for the performance evaluation:

• Patch Side – The tested patch side values are 3, 5 and 7 pixels, since the patch area typically adopted in the reviewed pure inpainting and inpainting-based image coding solutions is smaller than 8×8 pixels, as inpainting patches with larger areas do not seem to allow properly restoring the luminance values to the required level of detail. In this context, all patch sides made available in this solution are below 8 pixels (and odd), i.e. 3, 5 or 7 pixels; the patch side must be odd so that the square-shaped patch has a center pixel.

• Window Side – As mentioned in Section 5.3.2.4, the search window side has been fixed to 11 pixels for all conducted tests, as this is the search window side value which is common to the three possible patch sides (3, 5 or 7 pixels). This choice has been made because this value is larger than any possible patch side value among those available (this is mandatory, so that the source patch may be determined) and because it is common to all possible patch side values (see Table 2). Although a search window side of 9 pixels would also meet these requirements, the higher value (11) should allow ‘exploring’ a larger image area when searching for the source patch, eventually increasing the number of candidate patches among which the source patch is selected, without significantly ‘boosting’ the computational effort.

• Neighbors for Block Variation Metric Computation – The number of neighbors considered for the computation of the block variation metric is 8; this means no spatial direction is favored when selecting the not-to-be-inpainted additional textural and the to-be-inpainted textural blocks.

• Neighbors for Not-to-be-Inpainted Core Textural Blocks Classification – The number of neighbors considered for testing the classification of the not-to-be-inpainted core textural blocks is 4 and 8; studying these two values should allow understanding the benefits of adopting a 4-neighborhood instead of an 8-neighborhood.


Benchmarking

After presenting the coding conditions, the benchmarking conditions, notably the image coding benchmarks, the coding rates and the performance metrics, are presented in the following:

• Image Coding Benchmarks – The developed inpainting-based image coding solution is compared both to the JPEG and JPEG 2000 coding standards. The JPEG coding standard is a block-based coding standard which is widely adopted in the market, whereas the JPEG 2000 coding standard is the state-of-the-art coding standard since it is typically more efficient than JPEG; moreover, it provides very flexible quality and spatial resolution scalability but it is much less deployed in the market.

• Coding Rates – The coding rates, in bits per pixel (bpp), used to derive the RD points in the RD curves for the image coding benchmarks are: 0.25, 0.5, 0.75, 1.00, 1.25 and 1.50 bpp. For the JPEG coding solution, it is not possible to directly define the desired rate, but only indirectly via a quality parameter. In practice, the quality parameter for each target rate was selected so that the resulting coding rate was as close as possible to, without exceeding, the desired coding rate defined above (see the first sketch after this list).

• Performance Metrics – The performance evaluation for this solution is conducted based on two metrics, notably:

• PSNR – The Peak Signal-to-Noise Ratio expresses, in a logarithmic decibel scale, the ratio between the maximum power of a signal and the power of the corrupting noise. When applied to image coding, the PSNR is typically used as a metric to evaluate the objective quality of the totally decoded/reconstructed image, $Decoded(i,j)$, having the input/original image, $Original(i,j)$, as reference; therefore, the noise here corresponds to the error introduced by the (lossy) image codec used. Considering the image dimensions to be $M \times N$, this metric is typically defined through the mean squared error

$MSE = \frac{1}{M \times N}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left[Original(i,j) - Decoded(i,j)\right]^{2}$ (6.1)

leading to the PSNR:

$PSNR = 10 \log_{10}\left(\frac{255^{2}}{MSE}\right)$ (6.2)

given that the maximum luminance value, considering 8 bits per sample, is $2^{8} - 1 = 255$. Although very popular, the PSNR is not a very reliable metric for subjective quality evaluation, as it does not take into account the perceived visual quality and, thus, the peculiarities of the human visual system. Note that, in lossy image compression (which is the case for the JPEG and JPEG 2000 standards and also the proposed inpainting-based image codec), the PSNR typically ranges from 30 to 50 dB (a sketch of this computation also follows the list).

• MS-SSIM – The Multi-Scale Structural Similarity (MS-SSIM) index measures the perceived image quality, taking into account the image signal density, the distance between the image plane and the observer, and the perceptual capability of the observer’s visual system. This metric is computed as:

$MS\text{-}SSIM(i,j) = \left[l_M(i,j)\right]^{\alpha_M} \prod_{m=1}^{M}\left[c_m(i,j)\right]^{\beta_m}\left[s_m(i,j)\right]^{\gamma_m}$ (6.3)

where $M$ is the finest scale (scales incorporate image details at different image resolutions and viewing distances), obtained after $M-1$ scaling iterations, $l_M(i,j)$, $c_m(i,j)$ and $s_m(i,j)$ are the luminance, contrast and structure components at their different scales, and $\alpha_M$, $\beta_m$ and $\gamma_m$ are constants set according to the scale, as aforementioned, so that they match the HVS contrast sensitivity function [25]. In this context, the MS-SSIM is the selected metric for performing the objective perceptually-driven image quality evaluation, given the promising results in the literature, which report a Pearson’s correlation of up to 0.69 with subjective quality scores. Furthermore, the MS-SSIM index is a decimal value between -1 and 1, where the score 1 is associated with the case where the original and the decoded images are perceptually identical.
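As referred in the list above, the JPEG quality parameter selection may be sketched as follows; the sketch assumes the Pillow library is used to produce the JPEG bitstreams, an illustrative choice rather than the exact tool used in this Thesis:

import io
from PIL import Image

def quality_for_target_rate(image, target_bpp):
    # Return the highest JPEG quality parameter whose resulting coding
    # rate, in bits per pixel, does not exceed the target rate; since the
    # rate typically grows with the quality parameter, the scan stops at
    # the first value exceeding the target.
    num_pixels = image.width * image.height
    best_quality = None
    for quality in range(1, 96):
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG", quality=quality)
        bpp = 8 * buffer.tell() / num_pixels
        if bpp > target_bpp:
            break
        best_quality = quality
    return best_quality

The PSNR computation of equations (6.1) and (6.2) is equally compact; a minimal sketch for 8-bit luminance images:

import numpy as np

def psnr(original, decoded):
    original = original.astype(np.float64)
    decoded = decoded.astype(np.float64)
    mse = np.mean((original - decoded) ** 2)      # equation (6.1)
    return 10.0 * np.log10(255.0 ** 2 / mse)      # equation (6.2)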

6.2. Rate-Distortion Performance Evaluation

This section reports the RD performance for the proposed inpainting-based image coding solution, under the aforestated coding conditions and in comparison with the most relevant image coding benchmarks, notably JPEG and JPEG 2000. In particular, four studies have been conducted regarding: i) the performance impact of the Not-to-be-Inpainted Additional Textural Blocks Seeding; ii) the performance impact of the Not-to-be-Inpainted Core Textural Blocks Neighborhood; iii) the performance impact of the Patch Side; and iv) the performance impact of the Block Luminance Averages Adjustment. Afterwards, the suggested default coding conditions for the proposed image codec will be presented, based on the experiments that have been carried out. For better understanding the influence of the selected coding conditions and to ease the reader’s experience, the performance evaluation will be conducted in successive steps. First, tests for some relevant coding parameters and conditions will be performed to determine the parameter values and conditions reaching the best performance for the developed coding solution; afterwards, this best solution will be compared with the selected image coding benchmarks so that fair conclusions may be derived. For all tests, the conclusions will be taken metric by metric – PSNR and MS-SSIM – considering all the adopted test images.

6.2.1. Studying the Not-to-be-Inpainted Additional Textural Blocks Seeding Performance Impact

The first set of results regards testing the alternative methods adopted for the further classification of the potential to-be-inpainted blocks either as not-to-be-inpainted additional textural or to-be-inpainted textural blocks; in practice, this first study targets understanding the performance impact of spreading some not-to-be-inpainted block ‘seeds’ in larger textural areas. This test is believed to be the most relevant one for starting the ‘walk’ towards the best proposed solution, since the RD performance is rather different for the two solutions evaluated here.

As mentioned in Section 4.1.3.2, the first developed method adopts the centroid of the Block Variation Metric PDF as the classification threshold which allows selecting the not-to-be-inpainted additional textural and the to-be-inpainted textural blocks. As a result, the to-be-inpainted textural blocks would form very large areas which may not be so accurately inpainted in some cases; this would very likely bring important limitations to the maximum quality reached for a given rate, and would also lead to very noticeable artifacts in the decoded/reconstructed image. For this reason, a ‘chess-like’ seeding pattern of not-to-be-inpainted additional textural blocks has been incorporated in the developed solution so as to provide more ‘local’ and uniformly scattered information for the texture to be ‘created’ by inferring the luminance values from the surroundings of the ‘unknown’ areas. With the purpose of testing the two developed classification methods, the following cases are here assessed:

• IST-Inpainting/LA (IST stands for Instituto Superior Técnico and LA for Large Areas), corresponding to the first developed solution, where no ‘chess-like’ seeding pattern of not-to-be-inpainted additional textural blocks is involved;

• IST-Inpainting/Chess, corresponding to the second developed solution, thus including a ‘chess-like’ seeding pattern of not-to-be-inpainted additional textural blocks.

In this study, the following parameters are fixed to limit the encoder complexity:

• Patch Side is set to 3 pixels (since this will be the solution selected in a following section);

• A 4-neighborhood is adopted for the not-to-be-inpainted core textural blocks (since adopting an 8-neighborhood does not allow achieving any significant RD improvements while increasing the encoder complexity, as will be seen in a following section).


RD Performance: PSNR versus Bitrate

Figure 51 to Figure 53 present the RD curves for the PSNR metric for each adopted test image. Analyzing the RD variation in these figures, the following conclusions may be taken:

• As expected, the PSNR increases with the bitrate for both tested coding solutions (LA versus Chess), first more steeply and then with a slower variation until a saturation level is reached.

• For the lower bitrates, the PSNR difference between the two methods does not seem to be very significant, but the IST-Inpainting/LA (4 Neighbors) solution achieves slightly worse RD performance results, especially for the test image Lena.

• For the medium and higher bitrates, this difference becomes increasingly evident in favor of the IST-Inpainting/Chess solution. While for lower bitrates inpainting larger or smaller areas is not a major ‘problem’, since there is a very intense block effect which will ‘compensate’ the eventual bad results from the larger inpainted areas, for the highly detailed images obtained at higher bitrates this does not happen. Hence, the IST-Inpainting/Chess solution ‘wins’ in terms of the PSNR metric.

• Considering Table 3, the IST-Inpainting/LA (4 Neighbors) solution allows inpainting up to about 7% more image blocks than the IST-Inpainting/Chess (4 Neighbors) solution, meaning that it allows increasing the bitrate savings; however, as shown in the charts, this may not have a positive impact in terms of quality and, thus, PSNR. Furthermore, considering Table 3 and Figure 51 to Figure 53, it may be concluded that the images for which the percentage of inpainted blocks is the highest, i.e. Lena and Jet, reach worse PSNR results than the image for which this percentage is lower, i.e. Peppers. This effect was expected, as the IST-Inpainting/LA (4 Neighbors) solution does not use any seeding pattern to provide some ‘local’ texture spreading to improve the quality of the reconstruction.

Table 3 – Number of inpainted and not inpainted blocks for the IST-Inpainting/Chess (4 Neighbors) and IST-Inpainting/LA (4 Neighbors) solutions.

          IST-Inpainting/Chess (4 Neighbors)         IST-Inpainting/LA (4 Neighbors)
          Inpainted   JPEG coded   Blocks            Inpainted   JPEG coded   Blocks
          Blocks      Blocks       Inpainted [%]     Blocks      Blocks       Inpainted [%]
Lena      518         3578         12,65             748         3348         18,26
Peppers   348         3748         8,50              533         3563         13,01
Jet       631         3465         15,41             870         3226         21,24

Figure 51 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Lena.


Figure 52 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Peppers.

Figure 53 – PSNR RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Jet.

RD Performance: MS-SSIM versus Bitrate

The MS-SSIM RD performance results are presented in Figure 54 to Figure 56. Considering the RD variation in these figures, the following conclusions may be taken:

• As expected, the MS-SSIM increases with the bitrate for both tested coding solutions (LA versus Chess), first more steeply and then with a slower variation; hence, the MS-SSIM index increases with the bitrate, meaning that the higher the quality of the decoded image, the more similar it becomes to the original input image. For the MS-SSIM metric, the saturation effect is much more intense, expressing the fact that when the decoded image accuracy regarding the original increases above a certain limit, there is no impact on the perceived quality, since the user cannot ‘perceive’ such a level of additional detail; this is an effect the PSNR cannot express.

• The MS-SSIM results for the two tested coding solutions differ in a rather noticeable way. For all bitrates, the MS-SSIM RD performance results associated with the IST-Inpainting/LA (4 Neighbors) solution are worse than those obtained for the IST-Inpainting/Chess (4 Neighbors) solution. The fact that the IST-Inpainting/LA (4 Neighbors) method implies inpainting larger areas (thus with fewer helpers) has a strong impact on the reconstruction, whose perceived quality will naturally be lower than the quality obtained with the IST-Inpainting/Chess (4 Neighbors) solution.


Figure 54 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Lena.

Figure 55 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Peppers.

Figure 56 – MS-SSIM RD performance for studying the not-to-be-inpainted additional textural blocks seeding for the image Jet.


Based on Figure 51 to Figure 56, it may be concluded that the PSNR and MS-SSIM metrics are negatively affected when considering a more ‘greedy’ solution, in the sense of reducing the not-to-be-inpainted additional textural areas which work as important helpers in terms of the quality achieved for the areas to be inpainted. It has also been concluded that the ‘greedy’ strategy of having very large areas to be inpainted does not allow maximizing the perceived image quality, as the inpainting solution would have to fill-in, without any close ‘clue’, the luminance values for the inner areas; scattering textural ‘seeds’ along the image, to provide more local textural information for the areas to be inpainted, is, in fact, a better solution.

Taking all the above factors into account led to discarding the ‘greedy’ solution in terms of inpainting blocks, i.e. the IST-Inpainting/LA (4 Neighbors) solution. The major conclusion of this study is that it is not only the number or percentage of blocks to be inpainted that matters, but also the way they are scattered in the images. As one of the objectives of this Thesis is to maximize the compression performance, it has been decided that the ‘winning’ solution here is the IST-Inpainting/Chess (4 Neighbors) solution. Hence, this will be the solution for which the RD impact of the aforestated test conditions will be further studied in the following.

6.2.2. Studying the Not-to-be-Inpainted Core Textural Blocks Neighborhood Performance Impact

The second performance evaluation intends to test the influence on the RD results of the number of neighbors considered for classifying the not-to-be-inpainted core textural blocks. With this purpose in mind, the ‘winning’ solution from the previous section, i.e. the IST-Inpainting/Chess (4 Neighbors) solution, is compared to the IST-Inpainting/Chess (8 Neighbors) solution, i.e. the only differing parameter is the number of neighbors used for the aforestated classification. The RD results are presented in the following.

RD Performance: PSNR versus Bitrate

Figure 57 to Figure 59 present the PSNR RD performance results for the tested images. Analyzing the RD results in these figures, the following conclusions may be taken:

• As expected, the PSNR increases with the bitrate and their values roughly fall into the typical range for lossy image compression.

• Considering 4 or 8 neighbors in the classification of the not-to-be-inpainted core textural blocks does not seem to have a detectable influence on the PSNR achieved for all bitrates.

Figure 57 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Lena.


Figure 58 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Peppers.

Figure 59 – PSNR RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Jet.

RD Performance: MS-SSIM versus Bitrate

The MS-SSIM RD performance results are presented in Figure 60 to Figure 62, for all the tested images. Considering these results, it is possible to conclude that the MS-SSIM differences between the two cases for the same bitrate do not bring ‘ground-breaking’ performance improvements, with the exception of a single RD point where the 8 Neighbors solution shows a lower rate for the same MS-SSIM quality (third RD point for the image Peppers).

Figure 60 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Lena.


Figure 61 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Peppers.

Figure 62 – MS-SSIM RD performance for studying the not-to-be-inpainted core textural blocks neighborhood impact with the image Jet.

Considering the RD results above, it is possible to conclude that considering 8 neighbors instead of 4 allows achieving only slightly better RD results for the considered quality metrics, at the price of increasing the encoder complexity, which conflicts with the encoder complexity objective defined in Chapter 1. Therefore, the solution adopted here as the ‘winning’ solution is the IST-Inpainting/Chess (4 Neighbors) solution.

6.2.3. Studying the Patch Side Performance Impact

As mentioned in Chapter 5, the reviewed literature does not make clear whether the patch side has a significant influence on the RD performance and on the perceived visual quality of the entire decoded/reconstructed image. In this context, evaluating the RD performance for different patch sides has been considered an interesting task which may benefit future work and provide the reader with more solid conclusions.

RD Performance: PSNR versus Bitrate

The PSNR RD performance curves for each test image using different patch side values (notably, 3, 5 and 7 pixels) for the ‘winning’ solution of the previous section are presented in Figure 63 to Figure 65. Analyzing the obtained RD results, the following conclusions may be taken:

• As expected, the PSNR is lower for the lower bitrates and higher for the higher bitrates although there is a saturation trend.


• The difference between the PSNR RD performances for the various tested patch side values is not significant; therefore, the PSNR metric results do not allow selecting any specific value for this coding parameter.

Figure 63 – PSNR RD performance for studying the patch side impact with the image Lena.

Figure 64 – PSNR RD performance for studying the patch side impact with the image Peppers.

Figure 65 – PSNR RD performance for studying the patch side impact with the image Jet.


RD Performance: MS-SSIM versus Bitrate

Figure 66 to Figure 68 present the MS-SSIM RD performance for all tested images. Considering the obtained results, it may be generally concluded that varying the patch side does not bring significant changes in terms of MS-SSIM RD performance, as happened for the PSNR RD performance. However, among the three tested patch side values, the value 7 seems to achieve slightly worse results than the other two, notably in terms of the MS-SSIM index for the image Jet. The fact that the patch side does not have much influence on the RD performance for the two selected metrics is very likely related to the fact that the current solution already incorporates a ‘chess’ seeding pattern for the not-to-be-inpainted additional textural blocks in the larger textural areas; this means that much more ‘local’ texture is known and uniformly scattered along the image, reducing the impact of the patch size. Nonetheless, since there is no major difference in the RD results, and to limit the decoder complexity, the suggested patch side default value for the developed image codec is 3 pixels.

Figure 66 – MS-SSIM RD performance for studying the patch side impact with the image Lena.

Figure 67 – MS-SSIM RD performance for studying the patch side impact with the image Peppers.


Figure 68 – MS-SSIM RD performance for studying the patch side impact with the image Jet.

6.2.4. Studying the Block Luminance Averages Adjustment Performance Impact

The performance evaluation conducted in this section regards testing the impact on the RD results of the block luminance averages adjustment (denoted by ADJ in the following charts). In fact, this will be the last partial test before defining the best performing inpainting-based image coding solution.

RD Performance: PSNR versus Bitrate

Figure 69 to Figure 71 present the PSNR RD performance results related to the study of the block luminance averages adjustment impact for all test images.

Figure 69 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Lena.


Figure 70 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Peppers.

Figure 71 – PSNR RD performance for studying the block luminance averages adjustment impact with the image Jet.

Analyzing the results in Figure 69 to Figure 71, the following conclusions may be taken:

• The block luminance averages adjustment has no significant impact on the PSNR curves for the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution when compared to the corresponding solution without this adjustment, i.e. the IST-Inpainting/Chess (4 Neighbors, Patch Side 3) solution. This is expected since the ‘chess-like’ seeding pattern adopted for the not-to-be-inpainted additional textural blocks already allows the reconstruction to be very effective in comparison to the original image.

• The block luminance averages adjustment has a significant positive impact on the PSNR curve for the IST-Inpainting/LA (4 Neighbors, Patch Side 3, ADJ) solution in comparison with the corresponding RD results for the IST-Inpainting/LA (4 Neighbors, Patch Side 3) solution, allowing a PSNR increase of up to about 1 dB for the


last RD point with the image Lena. This is expected since the IST-Inpainting/LA (4 Neighbors, Patch Side 3) solution does not uniformly scatter textural seeds along the image so that the areas to be inpainted may be filled with more reliable and rather local textural information from their surroundings; therefore, the quality of the inpainting is more limited and more inpainting-related error is propagated from iteration to iteration.

• For the lower bitrates, the PSNR metric increases more steeply with the rate, whereas for the medium and higher bitrates, the PSNR increase is slower.

• The IST-Inpainting/LA (4 Neighbors, Patch Side 3, ADJ) solution achieves PSNR results similar to those of the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution while consuming approximately the same bitrate, in fact, a little less.
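To make the role of the ADJ tool concrete, the following minimal Python sketch illustrates the basic idea under simple assumptions: a grayscale image, 8×8 blocks, and a purely additive mean correction. The function and variable names are illustrative and not those of the developed codec.

```python
import numpy as np

def adjust_block_averages(inpainted, block_avgs, block_size=8):
    """Shift each block of the inpainted luminance image so that its mean
    matches the corresponding transmitted block average (a minimal sketch)."""
    out = inpainted.astype(np.float64)
    h, w = out.shape
    for by in range(0, h, block_size):
        for bx in range(0, w, block_size):
            block = out[by:by + block_size, bx:bx + block_size]  # view into out
            target = block_avgs[by // block_size, bx // block_size]
            block += target - block.mean()  # additive mean correction
    return np.clip(out, 0, 255).astype(np.uint8)
```

In practice, only the blocks that were actually inpainted would need to be corrected, and the transmitted block averages would first be decoded from the bitstream.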

RD Performance: MS-SSIM versus bitrate

Figure 72 to Figure 74 present the MS-SSIM RD performance results related to the study of the block luminance averages adjustment impact for all test images.

Figure 72 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Lena.

[Plot: MS-SSIM versus bits per pixel (bpp) for the IST-Inpainting/Chess and IST-Inpainting/LA solutions (4 Neighbors, Patch Side 3), each with and without ADJ.]

Figure 73 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Peppers.

[Plot: MS-SSIM versus bits per pixel (bpp); same four curves as in Figure 72.]



Figure 74 – MS-SSIM RD performance for studying the block luminance averages adjustment impact with the image Jet.

[Plot: MS-SSIM versus bits per pixel (bpp); same four curves as in Figure 72.]

Analyzing the results in Figure 72 to Figure 74, the following conclusions may be drawn:

• For the lower bitrates, the block luminance averages adjustment does not improve the MS-SSIM RD results either for the IST-Inpainting/Chess (4 Neighbors, Patch Side 3) or the IST-Inpainting/LA (4 Neighbors, Patch Side 3) solution; this has to do with the ‘block effect’ generated by the JPEG coding solution at the lower bitrates.

• For the medium and higher bitrates, the MS-SSIM metric is greatly improved by incorporating the block luminance averages adjustment in the IST-Inpainting/LA (4 Neighbors, Patch Side 3) solution, meaning that this adjustment allows significantly increasing the perceived quality of the totally decoded/reconstructed image when very large textural areas have to be inpainted without a ‘chess-like’ seeding pattern. This is especially noticeable for the images Lena and Jet.

• As happened for the PSNR metric, the MS-SSIM indexes for the IST-Inpainting/Chess (4 Neighbors, Patch Side 3) solution are barely improved by the block luminance averages adjustment; this has to do with the fact that the ‘chess-like’ seeding pattern of the not-to-be-inpainted additional textural blocks already allows very effectively filling in the areas to be inpainted.

• The IST-Inpainting/LA (4 Neighbors, Patch Side 3, ADJ) solution achieves the same MS-SSIM indexes as the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution for all the test images, consuming approximately the same bitrate.

Overall, the IST-Inpainting/LA (4 Neighbors, Patch Side 3, ADJ) solution emerges as a competitive alternative to the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution, as the lower quality of the image reconstructed by the former is ‘compensated’ by the block luminance averages adjustment. Either solution would be a valid default, and their selection may depend on further encoder and decoder complexity constraints and on the target application. The IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution incorporates a sensible, intuitive, straightforward and still effective seeding criterion without any significant encoder complexity increase regarding the IST-Inpainting/LA (4 Neighbors, Patch Side 3, ADJ) solution. These facts have supported the decision of considering the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) the final ‘winning’ solution among those under comparison; hence, this will be the solution compared in terms of RD results to the selected image coding benchmarks.
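Since MS-SSIM is the perceptually-driven metric used throughout this evaluation, a brief computational reference may help the reader. The Python sketch below implements a simplified multi-scale SSIM that reuses the full SSIM value at every dyadic scale, instead of separating the luminance term from the contrast-structure terms as in the exact formulation of Wang et al. [25]; the scale weights are those proposed in [25], and the helper name is illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim
from skimage.transform import downscale_local_mean

# Five-scale weights from Wang et al. [25].
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)

def ms_ssim_simplified(ref, dist):
    """Simplified MS-SSIM for 8-bit grayscale images: multiply weighted
    SSIM scores computed over a dyadic pyramid of both images."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    score = 1.0
    for w in WEIGHTS:
        s = ssim(ref, dist, data_range=255.0)
        score *= max(s, 1e-8) ** w  # clamp to avoid negative bases
        ref = downscale_local_mean(ref, (2, 2))    # 2x average pooling
        dist = downscale_local_mean(dist, (2, 2))
    return score
```

For 512×512 test images such as those used here, the smallest of the five scales is 32×32 pixels, which is still large enough for the default SSIM window.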



6.2.5. Best Coding Solution

The last RD performance study regards comparing the best inpainting-based image coding solution with the selected image coding benchmarks, i.e. the JPEG and JPEG 2000 coding standards; therefore, the RD results for the image coding benchmarks will also be presented.

RD Performance: PSNR versus bitrate

Figure 75 to Figure 77 present the PSNR RD performance results related to the comparison of the proposed solution with the selected benchmarks.

Figure 75 – PSNR RD performance comparison with the selected benchmarks with the image Lena.

[Plot: PSNR [dB] versus bits per pixel (bpp) for JPEG, JPEG 2000 and IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ).]

Figure 76 – PSNR RD performance comparison with the selected benchmarks with the image Peppers.

[Plot: PSNR [dB] versus bits per pixel (bpp); same three curves as in Figure 75.]



Figure 77 – PSNR RD performance comparison with the selected benchmarks with the image Jet.

[Plot: PSNR [dB] versus bits per pixel (bpp); same three curves as in Figure 75.]

Analyzing the PSNR RD performance results in Figure 75 to Figure 77, the following conclusions may be drawn:

• For all test images and selected rates, the best developed inpainting-based image coding solution outperforms the JPEG coding standard in terms of PSNR, notably reaching the same quality at a lower rate, thus improving the compression efficiency.

• The bitrate savings obtained with the selected default inpainting-based image coding solution are higher for the higher bitrates than for the medium and lower bitrates.

• For all bitrates, the JPEG 2000 coding standard is still more efficient than both the JPEG standard and the best developed inpainting-based image coding solution, i.e. the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution.
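For context, each JPEG point in these RD charts pairs a bitrate, measured in bits per pixel, with a PSNR value computed against the original image. A minimal Python sketch of how one such point can be obtained is given below, assuming a grayscale uint8 image and the Pillow JPEG encoder; the quality sweep in the trailing comment is purely illustrative.

```python
import io

import numpy as np
from PIL import Image

def jpeg_rd_point(img_u8, quality):
    """Encode a grayscale uint8 image as JPEG at the given quality factor
    and return (bits per pixel, PSNR in dB) against the original."""
    buf = io.BytesIO()
    Image.fromarray(img_u8).save(buf, format="JPEG", quality=quality)
    bpp = 8.0 * buf.getbuffer().nbytes / img_u8.size  # coded bits / pixels
    buf.seek(0)
    dec = np.asarray(Image.open(buf), dtype=np.float64)
    mse = np.mean((img_u8.astype(np.float64) - dec) ** 2)
    return bpp, 10.0 * np.log10(255.0 ** 2 / mse)

# Illustrative sweep tracing one JPEG RD curve:
# rd_curve = [jpeg_rd_point(lena, q) for q in range(10, 100, 10)]
```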

RD Performance: MS-SSIM versus bitrate

Figure 78 to Figure 80 present the MS-SSIM RD performance results related to the comparison of the proposed solution with the selected benchmarks.

Figure 78 – MS-SSIM RD performance comparison with the selected benchmarks with the image Lena.

[Plot: MS-SSIM versus bits per pixel (bpp) for JPEG, JPEG 2000 and IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ).]



Figure 79 – MS-SSIM RD performance comparison with the selected benchmarks with the image Peppers.

[Plot: MS-SSIM versus bits per pixel (bpp); same three curves as in Figure 78.]

Figure 80 – MS-SSIM RD performance comparison with the selected benchmarks with the image Jet.

[Plot: MS-SSIM versus bits per pixel (bpp); same three curves as in Figure 78.]

Analyzing the results in Figure 78 to Figure 80, the following conclusions may be drawn:

• For the lower bitrates, the best developed inpainting-based image coding solution outperforms the JPEG standard in terms of MS-SSIM scores, especially for the test image Peppers.

• For the medium and higher bitrates, in general, the developed solution reaches the same MS-SSIM as JPEG while ‘consuming’ much less bitrate. For the image Lena, the best developed solution shows better MS-SSIM results for the medium bitrates, whereas for the image Jet it shows better results for the lower bitrates.

• The developed inpainting-based image coding solution, i.e. the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution, is, for all test images and for all bitrates, less efficient than the JPEG 2000 coding solution. This may be explained by the fact that the JPEG 2000 coding standard is a rather powerful image coding solution, representing nowadays the image coding state-of-the-art and significantly outperforming the JPEG coding standard, eventually at the cost of some additional complexity. JPEG 2000 also provides quality and spatial resolution scalability, which neither the baseline JPEG standard nor the developed inpainting-based coding solution can provide.

• Overall, the IST-Inpainting/Chess (4 Neighbors, Patch Side 3, ADJ) solution allows achieving significant bitrate savings over the JPEG standard and obtaining encouraging results in terms of the objective perceptually-driven evaluation metric. At this stage, it is important to stress that while the JPEG and JPEG 2000 solutions are the outcome of many years of work by hundreds of researchers, inpainting-based image coding is much less mature as a coding technology and thus still has a large margin for improvement. So, overall, the obtained RD performance is rather encouraging.

6.3. Conclusions

The developed inpainting-based image coding solution meets all the initially defined objectives for this Thesis, notably in terms of the maximization of the compression efficiency. Along this chapter, it has been shown that textural ‘seeds’ should be scattered along the image so as to improve the PSNR and MS-SSIM RD performance; if a ‘chess-like’ seeding pattern is not considered, the maximum perceived image quality that may be obtained for a given rate is significantly limited. Furthermore, the patch side seems not to have a very significant influence on the RD results, which the author of this Thesis believes to be related to the fact that local textural seeds have been scattered along the images. The exploitation of the selected image feature has proved to be an alternative to the ‘chess-like’ seeding pattern of the not-to-be-inpainted additional textural blocks, allowing similar results to be achieved for the PSNR and MS-SSIM metrics. In addition, the block luminance averages adjustment has a noticeable positive impact on the perceived quality of the reconstructed image when the reconstruction itself is not very effective, namely when inpainting larger image areas. Developing an automatic and adaptive inpainting-based image coding solution has been successfully achieved and allows obtaining very interesting and encouraging RD results, especially considering that only some textural blocks have been inpainted; in the literature, even better results have been achieved when some structural blocks are also selected for inpainting [4], at the cost of significantly increasing both the encoder and decoder complexity. It has also been concluded that RD results for non-perceptually-driven metrics, such as the PSNR, may not be as correlated with subjective quality scores as alternative perceptually-driven metrics such as the MS-SSIM.
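As a toy illustration of the ‘chess-like’ seeding pattern referred to above, the following Python sketch builds a block-level checkerboard mask marking which additional textural blocks are kept as seeds. It deliberately ignores the structural/textural classification that the full solution performs beforehand, and the function name is illustrative.

```python
import numpy as np

def chess_seed_mask(n_block_rows, n_block_cols):
    """Return a boolean block grid where True marks a not-to-be-inpainted
    (seed) block; seeds and to-be-inpainted blocks alternate like a
    chessboard, so every hole has seeded neighbors close by."""
    rows, cols = np.indices((n_block_rows, n_block_cols))
    return (rows + cols) % 2 == 0
```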

The structure adopted for this chapter allowed ‘walking’ towards the developed solution with the best performance by first studying the influence of various parameters and tools on the RD results. Finally, the obtained inpainting-based image coding solution has been compared to the relevant image coding benchmarks and the final conclusions have been stated.


Chapter 7

7. Conclusions and Future Work

This last chapter provides the reader with a brief summary of the achievements made during this Thesis regarding the developed inpainting-based image coding solution, as well as some conclusions and research directions for future work.

7.1. Summary and Conclusions

The first chapter regarded the motivation and context for this Thesis as well as the statement of the objectives and technical novelties of the developed solution. Next, the problem at hand was structured by providing the reader with the two perspectives from which it may be seen, notably the pure inpainting perspective and the inpainting-based coding perspective. Based on the literature review conducted by the author of this Thesis to become familiar with the research topic under study, a clustering of the digital inpainting tools has been proposed and some relevant pure inpainting and inpainting-based image coding solutions have been presented in detail.

As for the developed solution itself, the reader was first provided with a preliminary approach to the high-level encoder and decoder architectures and the corresponding functional descriptions of their constituent modules. Afterwards, an in-depth description of the algorithmic details of the encoder tools was presented, with particular focus on the Analysis for Classification and Coding module, which is one of the most relevant encoder modules. Similarly to the encoder, the decoder modules have been presented, with great attention given to the Image Inpainting module, notably by detailing the patch-based inpainting procedure. Finally, the evaluation of this solution's performance has been presented, describing the test conditions, the test images and the rate-distortion results and respective analysis.

In short, the main objective of this work was to develop an inpainting-based image coding solution capable of improving the compression ratio of the input image without significantly compromising the perceived visual quality of the entirely decoded/reconstructed image. With this purpose, an image feature has been extracted, i.e. the block luminance averages, in order to further improve the quality of the output image as perceived by the HVS.

Based on the results obtained in Chapter 6, it has been concluded that the developed advanced inpainting-based image coding solution has achieved encouraging results in terms of improving the compression efficiency and maximizing the perceived visual quality of the entirely decoded/reconstructed image for a given rate. The initial intention of exploiting the selected image feature for improving the perceived image quality has been fulfilled, allowing the MS-SSIM results to be enhanced for a given rate. Moreover, the patch side seemed not to significantly influence the RD results, which is believed to be due to the fact that local textural information has been provided using the ‘chess-like’ seeding pattern for the not-to-be-inpainted additional textural blocks. Having developed this solution to be automatically adaptive to the input image is considered a positive add-on regarding the literature [4] [5], as the common user interacts with technology from an I/O perspective and is thus not interested in interfering with the processing details of the technology itself.

Overall, the objectives for this Thesis have been fulfilled and it has been shown, based on solid evidence, that inpainting-based image coding solutions represent a very promising, although challenging, way forward in terms of better exploiting the Human Visual System to target even higher compression efficiencies.

7.2. Future Work

Despite the interesting objective and subjective results achieved, which have been presented in Chapter 6, the author of this Thesis believes it is still possible to take further steps in the technical improvement of the proposed solution which could lead to better compression efficiency, thus making the developed solution more powerful, although eventually also more complex. In this context, some relevant research lines that may be pursued in future work are presented in the following:

• Patch Priority ‘Draws’ Solving – While the current solution of solving the patch priority ‘draws’ by randomly choosing a patch has achieved good results so far, some time and effort should be dedicated to finding a method that gives preference to the patches associated with the highest confidence values in a given inpainting iteration, so that the best patch may be selected (a minimal sketch of this idea is given after this list).

• ‘Seeding’ Method Enhancement for Not-to-be-Inpainted Additional Textural Blocks – The current seeding solution is a ‘chess-like’ pattern that simply alternates not-to-be-inpainted and to-be-inpainted textural blocks, after exploiting statistical tools for preliminarily selecting the not-to-be-inpainted additional textural blocks, and significant gains were obtained with it. In this context, it would be worthwhile to invest in the definition of some intelligent, adaptive and dynamic ‘seeding’ strategy for the not-to-be-inpainted textural blocks so that higher compression efficiency and fewer image artifacts are obtained.

• Incorporating Decoder Tools in the Encoder – To limit the encoder complexity, the decoder inpainting tools have not been made available at the encoder in the current solution. This means the encoder cannot be ‘sure’ to select for inpainting only the image areas for which the novel coding solution outperforms the traditional coding solution. Providing the encoder with the decoder inpainting tools would allow further raising the overall RD performance by taking less ‘risky’ decisions, even if at the cost of increasing the encoder complexity.

• Structural Areas Inpainting – As aforementioned, the developed solution only selects, at the encoder side, the areas to be inpainted among the textural image areas, not considering any structural area as a candidate for inpainting. This means that there is still room for improving the bitrate savings if some structural areas are also selected to be inpainted; based on the current inpainting literature, it is likely that these areas would have to be inpainted by geometric data modeling so that good objective and subjective results are obtained.

• Additional Image Features Extraction – It has been shown in this Thesis that providing an image feature for the areas to be inpainted can help improve both the subjective and objective compression efficiency performance of the inpainting-based coding solution. Therefore, a relevant topic for further research is the definition of the number and type of features to extract and exploit in the context of the image areas to be inpainted; in particular, edge information seems to be an effective feature to consider [4], especially if structural areas are to be inpainted.

• H.264/AVC Inpainting Intra Prediction Mode Integration – Finally, this inpainting-based coding solution may be integrated in the state-of-the-art H.264/AVC coding standard as another alternative Intra coding mode. However, it should be taken into account that, for inferring the luminance data, only the already decoded blocks may be used.

• Inpainting-based Video Coding – The proposed inpainting-based image coding solution could also be extended to become an inpainting-based video coding solution, where digital inpainting tools could be exploited so as to reduce both the spatial and temporal video redundancy, thus achieving even higher RD performance gains.
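Regarding the first research line above, the following Python sketch illustrates one possible confidence-based rule for solving patch priority ‘draws’; it is only a sketch of the idea, not part of the developed solution, and the tolerance value is arbitrary.

```python
import numpy as np

def pick_patch(priorities, confidences, tol=1e-9):
    """Among the patches tying for the highest priority (within tol),
    return the index of the one with the highest confidence value."""
    priorities = np.asarray(priorities, dtype=np.float64)
    confidences = np.asarray(confidences, dtype=np.float64)
    tied = np.flatnonzero(priorities >= priorities.max() - tol)
    return tied[np.argmax(confidences[tied])]
```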

In conclusion, the objectives for this Thesis have been fulfilled, as a promising inpainting-based image coding solution has been developed. However, there is still plenty of research to be carried out in this challenging field and the author hopes that this solution will be revisited so that better performance may be achieved by optimizing both the encoder and decoder modules or even by providing them with more advanced tools.


References

[1] M. Bertalmio, G. Sapiro, V. Caselles and C. Ballester, "Image Inpainting," in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics, New Orleans, U.S.A., July 2000, pp. 417-424.

[2] Pure Inpainting Examples: http://www.dtic.upf.edu/~mbertalmio/restoration0.html.

[3] R. Bornard, E. Lecan, L. Laborelli and J.-H. Chenot, "Missing Data Correction in Still Images and Image Sequences," in ACM International Multimedia Conference, Juan-les-Pins, France, December 2002, pp. 355-361.

[4] D. Liu, X. Sun, F. Wu, S. Li and Y.-Q. Zhang, "Image Compression with Edge-based Inpainting," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 10, pp. 1273-1287, October 2007.

[5] S. D. Rane, G. Sapiro, M. Bertalmio, "Structure and Texture Filling-in of Missing Image Blocks in Wireless Transmission and Compression Applications," IEEE Transactions on Image Processing, vol. 12, no. 3, pp. 296-303, March 2003.

[6] K. A. Patwardhan, G. Sapiro and M. Bertalmio, "Video Inpainting Under Constrained Camera Motion," IEEE Transactions on Image Processing, vol. 16, no. 2, pp. 545-553, February 2007.

[7] Y.-T. Jia, S.-M. Hu and R. R. Martin, "Video Completion Using Tracking and Fragment Merging," The Visual Computer, vol. 21, no. 8-10, pp. 601-610, September 2005.

[8] CIELUV Color Space: http://en.wikipedia.org/wiki/CIELUV_color_space.

[9] P. Perona and J. Malik, "Scale-space and Edge Detection Using Anisotropic Diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629-639, July 1990.

[10] L. Alvarez, P. L. Lions and J. M. Morel, "Image Selective Smoothing and Edge Detection by Nonlinear Diffusion," SIAM Journal on Numerical Analysis, vol. 29, no. 3, pp. 845-866, June 1992.

[11] Signal & Image Processing Institute: http://sipi.usc.edu/database/.

[12] Kodak Image Library: http://r0k.us/graphics/kodak.

[13] C. A. Rothwell, J. L. Mundy, W. Hoffman and V.-D. Nguyen, "Driving Vision By Topology," in International Symposium on Computer Vision, Coral Gables, Florida, U.S.A., 1995, pp. 395-400.

Page 118: Inpainting-based Image Coding: A Patch-driven Approach · image compression efficiency for the various, relevant target qualities. In this context, better exploiting the Human Visual

100

[14] A. Criminisi, P. Pérez and K. Toyama, "Region Filling and Object Removal by Exemplar-based Image Inpainting," IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1200-1212, September 2004.

[15] Image Processing Toolbox: www.mathworks.com/products/image/.

[16] YUV Color Space: http://en.wikipedia.org/wiki/YUV.

[17] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, November 1986.

[18] Edge Detectors: www.mathworks.com/access//helpdesk/help/toolbox/images/edge.html.

[19] Edge Detection Tutorial: www.pages.drexel.edu/~weg22/can_tut.html.

[20] Canny Edge Detector: http://en.wikipedia.org/wiki/Canny_edge_detector.

[21] Histogram: http://en.wikipedia.org/wiki/Histogram.

[22] Independent JPEG Group: www.ijg.org/.

[23] F. De Simone, L. Goldmann, V. Baroncini and T. Ebrahimi, "Subjective Evaluation of JPEG XR Image Compression," in SPIE Optics and Photonics, Applications of Digital Image Processing XXXII, vol. 7443, San Diego, California, U.S.A., August 2009.

[24] T. K. Shih, N. C. Tang and J.-N. Hwang, "Exemplar-Based Video Inpainting Without Ghost Shadow Artifacts by Maintaining Temporal Continuity," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 3, pp. 347-359, March 2009.

[25] Z. Wang, E. Simoncelli and A. Bovik, "Multi-scale Structural Similarity for Image Quality Assessment," in Proceedings of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, U.S.A., November 2003, pp. 1398-1402.