Context-Patch based Face Hallucination - nii.ac.jp




Context-Patch based Face Hallucination
Junjun JIANG, Yi YU


Fig. 1. Typical frame from a surveillance video. (a) and (c) are images from a camera with CIF size (352×288) and a camera with 720P size (1280×720), respectively; (b) shows the two faces of interest extracted from (a) and (c).


Surveillance cameras are used in a wide range of applications, such as security and protection systems. They can provide very important clues about objects, such as criminals, for solving a case. However, the resolution of a video camera is very limited due to cost and technical reasons. In addition, even if the video is clear, the object may be so far from the camera that the resolution of the face of interest in the picture is too low to provide helpful information.

[Fig. 2 diagram: each patch of the input LR face is represented as a weighted combination of the LR training face patches, x = w1·y1 + w2·y2 + … + wM·yM (training phase); the same weights w1, …, wM are then applied to the corresponding HR training face patches to synthesize the hallucinated HR face (testing phase).]

Fig. 2. Flow diagram of the position-patch based face hallucination framework. Obtaining the optimal weight vector is the key issue.

Inspired by the finding from face analysis that the human face is a highly structured object, so that position plays an important role in its reconstruction, several position-patch based face hallucination methods have been proposed. The key issue of these methods is how to represent the input image patch and obtain the optimal weight vector.
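As a rough illustration of this weighted-combination idea, the sketch below (hypothetical function names; a simple ridge-regularized least squares is assumed here, whereas the actual position-patch methods differ mainly in how the weights are regularized) solves for the weights over co-located LR training patches and transfers them to the HR patches:

```python
import numpy as np

def position_patch_weights(x_lr, patches_lr, reg=1e-4):
    """Solve for weights w so that x_lr ~= patches_lr @ w.

    x_lr       : (d,) vectorized input LR patch at one face position
    patches_lr : (d, M) LR training patches at the SAME position
    reg        : ridge regularizer for a stable least-squares solve
    """
    A = patches_lr
    G = A.T @ A + reg * np.eye(A.shape[1])  # regularized Gram matrix
    return np.linalg.solve(G, A.T @ x_lr)   # ridge least squares

def hallucinate_patch(w, patches_hr):
    """Transfer the LR weights to the co-located HR training patches."""
    return patches_hr @ w                   # (D,) estimated HR patch
```

Running this patch by patch over all face positions and stitching the estimated HR patches together yields the hallucinated HR face of Fig. 2.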


[Fig. 3 plot: PSNR (dB, 28–34) versus patch size (0–100), comparing the Input Patch, the Conventional Position-patch, and the Proposed Context-patch.]

Fig. 3. Simply enlarging the patch size does not always improve performance. In this paper, we propose to simultaneously consider all the patches in a large window around the observation patch, and we develop a context-patch based face hallucination framework.
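The context-patch idea can be sketched as follows (hypothetical function and parameter names; the paper's actual patch and window sizes are not stated here): instead of keeping only the single training patch at the observation position, pool every patch-sized sub-block whose top-left corner lies inside a window around that position, enlarging the candidate set without enlarging the patch itself.

```python
import numpy as np

def context_patches(img, center, patch=4, window=12):
    """Collect all patch-sized sub-blocks inside a window around `center`.

    img    : 2-D face image (e.g. an LR training sample)
    center : (row, col) top-left corner of the observation patch
    Returns a (patch*patch, num_context_patches) dictionary matrix.
    """
    r0, c0 = center
    half = window // 2
    out = []
    # Enumerate every valid top-left corner inside the window.
    for r in range(max(r0 - half, 0), min(r0 + half, img.shape[0] - patch) + 1):
        for c in range(max(c0 - half, 0), min(c0 + half, img.shape[1] - patch) + 1):
            out.append(img[r:r + patch, c:c + patch].ravel())
    return np.stack(out, axis=1)
```

Applying this to every LR training image gives a much larger dictionary for each face position than the single co-located patch used by position-patch methods.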

[Fig. 4 diagram: the input LR face is represented over contextual patches from the LR training samples; a thresholding step selects patches and the optimal weights w* are computed; the LR weights are transformed to the HR contextual patches to produce the estimated HR face; reproducing learning then adds the estimated HR face back to the HR/LR training samples.]

We propose a context-patch based face hallucination framework and develop a novel patch representation method based on Thresholding Locality-constrained Representation. Additionally, we introduce a reproducing learning strategy that iteratively enhances the estimated result by adding the estimated HR face to the training set.
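A simplified reading of the Thresholding Locality-constrained Representation can be sketched as below (hypothetical names and parameter values; the exact objective and settings are assumptions): a thresholding step keeps only the K contextual patches nearest to the input, and a locality-constrained least squares penalizes each kept patch's weight by its distance to the input.

```python
import numpy as np

def tlcr_weights(x, dictionary, K=60, tau=0.04):
    """Thresholding Locality-constrained Representation (sketch).

    x          : (d,) vectorized input LR patch
    dictionary : (d, N) contextual LR patches pooled from the training set
    Solves min ||x - A w||^2 + tau * sum_i dist_i^2 * w_i^2
    over the K nearest atoms only (thresholding).
    """
    d2 = np.sum((dictionary - x[:, None]) ** 2, axis=0)  # squared distances
    keep = np.argsort(d2)[:K]                            # thresholding step
    A = dictionary[:, keep]
    penalty = np.diag(d2[keep])                          # locality penalty
    G = A.T @ A + tau * penalty + 1e-8 * np.eye(K)
    w_keep = np.linalg.solve(G, A.T @ x)
    w = np.zeros(dictionary.shape[1])                    # full weight vector,
    w[keep] = w_keep                                     # zero off-support
    return w
```

Reproducing learning would then wrap this reconstruction in a loop: hallucinate the HR face, append it (and its downsampled LR counterpart) to the training samples, and re-run the reconstruction, iterating a few times.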

Fig. 7. Visual face hallucination results on four real-world images.

Fig. 4. Flow diagram of the proposed context-patch face hallucination framework.

Fig. 5. Robustness to misalignment.

[Fig. 5 plots: PSNR (dB, 31–34) and SSIM (0.895–0.94) curves comparing Position-patch and Context-patch over x-axis values 2–48.]

[Fig. 6 plots: PSNR (dB, 33.6–34.0) and SSIM (0.930–0.9345) curves over x-axis values 0–10.]

Fig. 6. Effectiveness of contextual information and reproducing learning.