Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad |
Pondicherry | Trivandrum | Salem | Erode | Tirunelveli
http://www.elysiumtechnologies.com, [email protected]
13 Years of Experience
Automated Services
24/7 Help Desk Support
Experienced & Expert Developers
Advanced Technologies & Tools
Legitimate Member of all Journals
1,50,000+ Successful Records in all Languages
More than 12 Branches in Tamilnadu, Kerala & Karnataka
Ticketing & Appointment Systems
Individual Care for every Student
Around 250 Developers & 20 Researchers
227-230 Church Road, Anna Nagar, Madurai – 625020.
0452-4390702, 4392702, + 91-9944793398.
[email protected], [email protected]
S.P.Towers, No.81 Valluvar Kottam High Road, Nungambakkam,
Chennai - 600034. 044-42072702, +91-9600354638,
15, III Floor, SI Towers, Melapudur main Road, Trichy – 620001.
0431-4002234, + 91-9790464324.
577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641002
0422- 4377758, +91-9677751577.
Plot No: 4, C Colony, P&T Extension, Perumal puram, Tirunelveli-
627007. 0462-2532104, +919677733255,
1st Floor, A.R.IT Park, Rasi Color Scan Building, Ramanathapuram
- 623501. 04567-223225,
74, 2nd floor, K.V.K Complex,Upstairs Krishna Sweets, Mettur
Road, Opp. Bus stand, Erode-638 011. 0424-4030055, +91-
9677748477 [email protected]
No: 88, First Floor, S.V.Patel Salai, Pondicherry – 605 001. 0413–
4200640 +91-9677704822
TNHB A-Block, D.no.10, Opp: Hotel Ganesh Near Busstand. Salem
– 636007, 0427-4042220, +91-9894444716.
ETPL
DIP-001
Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image Tone Mapping
A novel filter is proposed for edge-preserving decomposition of an image. It is different from previous
filters in its locally adaptive property. The filtered image contains local means everywhere and preserves
local salient edges. Comparisons are made between our filtered result and the results of three other
methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one
base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base layer
preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective function
is also proposed for compressing the detail layers. The reproduced image gives a good visualization.
Experimental results on real images demonstrate that our algorithm is especially effective at preserving or
enhancing local details.
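The decomposition invariant described above (base layer plus detail layers that sum back to the input) can be sketched in a few lines. This is only an illustration: a plain box filter stands in for the paper's locally adaptive edge-preserving filter, the radii are arbitrary, and the 1D signal stands in for an image.

```python
def local_mean(signal, r):
    """Mean over a window of radius r, clamped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def multiscale_decompose(signal, radii=(1, 2, 4)):
    """Split a signal into one base layer and len(radii) detail layers.

    Invariant: signal == base + sum of all detail layers (elementwise),
    so the detail layers can be compressed and the signal re-composed.
    """
    current = list(signal)
    details = []
    for r in radii:
        smooth = local_mean(current, r)
        details.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return current, details  # base layer, detail layers (fine to coarse)
```

Tone mapping then compresses the detail layers (and/or the base layer) before re-composing; the invariant guarantees nothing is lost by the split itself.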
ETPL
DIP-002
Multistructure Large Deformation Diffeomorphic Brain Registration (Biomedical Engineering)
Whole brain MRI registration has many useful applications in group analysis and morphometry, yet
accurate registration across different neuropathological groups remains challenging. Structure-specific information, or anatomical guidance, can be used to initialize and constrain registration to improve
accuracy and robustness. We describe here a multistructure diffeomorphic registration approach that uses
concurrent subcortical and cortical shape matching to guide the overall registration. Validation experiments carried out on openly available datasets demonstrate comparable or improved alignment of
subcortical and cortical brain structures over leading brain registration algorithms. We also demonstrate
that a group-wise average atlas built with multistructure registration accounts for greater intersubject
variability and provides more sensitive tensor-based morphometry measurements.
ETPL
DIP-003
Iterative Closest Normal Point for 3D Face Recognition (Pattern Analysis and Machine Intelligence)
The common approach for 3D face recognition is to register a probe face to each of the gallery faces and
then calculate the sum of the distances between their points. This approach is computationally expensive and sensitive to facial expression variation. In this paper, we introduce the iterative closest normal point
method for finding the corresponding points between a generic reference face and every input face. The
proposed correspondence finding method samples a set of points for each face, denoted as the closest
normal points. These points are effectively aligned across all faces, enabling effective application of discriminant analysis methods for 3D face recognition. As a result, the expression variation problem is
addressed by minimizing the within-class variability of the face samples while maximizing the between-
class variability. As an important conclusion, we show that the surface normal vectors of the face at the sampled points contain more discriminatory information than the coordinates of the points. We have
performed comprehensive experiments on the Face Recognition Grand Challenge database, which is
presently the largest available 3D face database. We have achieved verification rates of 99.6 and 99.2 percent at a false acceptance rate of 0.1 percent for the all-versus-all and ROC III experiments, respectively; to the best of our knowledge, these error rates are seven and four times lower, respectively, than those of the best existing methods on this database.
ETPL
DIP-004
Face Recognition and Verification Using Photometric Stereo (Information Forensics and Security)
This paper presents a new database suitable for both 2-D and 3-D face recognition based on photometric
stereo (PS): the Photoface database. The database was collected using a custom-made four-source PS
device designed to enable data capture with minimal interaction necessary from the subjects. The device, which automatically detects the presence of a subject using ultrasound, was placed at the entrance to a
busy workplace and captured 1839 sessions of face images with natural pose and expression. This means that the acquired data are more realistic for everyday use than existing databases and are, therefore, an
invaluable test bed for state-of-the-art recognition algorithms. The paper also presents experiments of various face recognition and verification algorithms using the albedo, surface normals, and recovered
depth maps. Finally, we have conducted experiments in order to demonstrate how different methods in
the pipeline of PS (i.e., normal field computation and depth map reconstruction) affect recognition and verification performance. These experiments help to 1) demonstrate the usefulness of PS, and our device
in particular, for minimal-interaction face recognition, and 2) highlight the optimal reconstruction and
recognition algorithms for use with natural-expression PS data. The database can be downloaded from
http://www.uwe.ac.uk/research/Photoface.
ETPL
DIP-005 Objective Quality Assessment of Tone-Mapped Images
Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range (LDR)
images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has the best
quality. Without an appropriate quality measure, different TMOs cannot be compared, and further
improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive
and time consuming, and more importantly, is difficult to embed into optimization frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by combining: 1) a
multiscale signal fidelity measure on the basis of a modified structural similarity index and 2) a
naturalness measure on the basis of intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking score and the proposed
tone-mapped image quality index (TMQI). Furthermore, we demonstrate the extended applications of
TMQI using two examples: parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.
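The two-component construction above (a structural fidelity term and a naturalness term combined into one score) can be sketched as a weighted power-law, which is the general shape the TMQI takes. The constants below are illustrative placeholders, not the published TMQI parameters.

```python
def tmqi_style_score(fidelity, naturalness, a=0.8, alpha=0.3, beta=0.7):
    """Combine a structural-fidelity score and a naturalness score (both
    assumed normalized to [0, 1]) into one quality number via a weighted
    power-law, mirroring the TMQI construction.  The weights a, alpha, beta
    here are illustrative, not the published values.
    """
    return a * fidelity ** alpha + (1.0 - a) * naturalness ** beta
```

Because both terms enter monotonically, improving either fidelity or naturalness can only raise the overall score, which is what makes the index usable inside a TMO parameter-tuning loop.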
ETPL
DIP-006
Segmentation and Tracing of Single Neurons from 3D Confocal Microscope Images (Biomedical and Health Informatics)
In order to understand the brain, we need to first understand the morphology of neurons. In the neurobiology community, there have been recent pushes to analyze both neuron connectivity and the
influence of structure on function. Currently, a technical roadblock that stands in the way of these studies
is the inability to automatically trace neuronal structure from microscopy. On the image processing side, proposed tracing algorithms face difficulties in low contrast, indistinct boundaries, clutter, and complex
branching structure. To tackle these difficulties, we develop Tree2Tree, a robust automatic neuron
segmentation and morphology generation algorithm. Tree2Tree uses a local medial tree generation
strategy in combination with global tree linking to build a maximum likelihood global tree. Recasting the neuron tracing problem in a graph-theoretic context enables Tree2Tree to estimate bifurcations naturally, which remains a challenge for existing neuron tracing algorithms. Tests on cluttered confocal microscopy images of Drosophila neurons give results that correspond to ground truth within a margin of ±2.75% normalized mean absolute error.
ETPL
DIP-007
Silhouette Analysis-Based Action Recognition via Exploiting Human Poses (Circuits and Systems for Video Technology)
In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations. We explore human silhouettes for human action representation by
taking into account the correlation between sequential poses in an action. A modified bag-of-words
model, named bag of correlated poses, is introduced to encode temporally local features of actions. To
utilize the property of visual word ambiguity, we adopt the soft assignment strategy to reduce the
dimensionality of our model and circumvent the penalty of computational complexity and quantization
error. To compensate for the loss of structural information, we propose an extended motion template, i.e., extensions of the motion history image, to capture the holistic structural features. The proposed scheme
takes advantage of local and global features and, therefore, provides a discriminative representation for human actions. Experimental results demonstrate the complementary properties of the two descriptors, and the proposed approach outperforms state-of-the-art methods on the IXMAS action recognition dataset.
ETPL
DIP-008 Pose-Invariant Face Recognition Using Markov Random Fields
One of the key challenges for current face recognition techniques is how to handle pose variations
between the probe and gallery face images. In this paper, we present a method for reconstructing the
virtual frontal view from a given nonfrontal face image using Markov random fields (MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input face image is
divided into a grid of overlapping patches, and a globally optimal set of local warps is estimated to
synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it
with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle illumination
variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem
using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it does not require manually selected facial
landmarks or head pose estimation. In order to improve the performance of our pose normalization
method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to demonstrate the
effectiveness of the proposed approach.
ETPL
DIP-009
Color Video Denoising Based on Combined Interframe and Intercolor Prediction (Circuits and Systems for Video Technology)
An advanced color video denoising scheme which we call CIFIC based on combined interframe and intercolor prediction is proposed in this paper. CIFIC performs the denoising filtering in the RGB color
space, and exploits both the interframe and intercolor correlation in color video signal directly by forming
multiple predictors for each color component using all three color components in the current frame as well as the motion-compensated neighboring reference frames. The temporal correspondence is
established through the joint-RGB motion estimation (ME) which acquires a single motion trajectory for
the red, green, and blue components. Then the current noisy observation as well as the interframe and
intercolor predictors are combined by a linear minimum mean squared error (LMMSE) filter to obtain the denoised estimate for every color component. The ill condition in the weight determination of the
LMMSE filter is detected and remedied by gradually removing the “least contributing” predictor.
Furthermore, our previous work on the LMMSE filter applied in the adaptive luminance-chrominance space (LAYUV for short) is revisited. By reformulating LAYUV and comparing it with CIFIC, we
deduce that LAYUV is a restricted version of CIFIC, and thus CIFIC can theoretically achieve lower
denoising error. Experimental results verify the improvement brought by the joint-RGB ME and the integration of the intercolor prediction, as well as the superiority of CIFIC over LAYUV. Meanwhile,
when compared with other state-of-the-art algorithms, CIFIC provides competitive performance both in
terms of the color peak signal-to-noise ratio and in perceptual quality.
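The weight-determination step of an LMMSE combiner like the one above amounts to solving the normal equations for the predictor weights. This is a minimal two-predictor sketch from samples, not CIFIC's full pipeline (which forms interframe and intercolor predictors per color component and handles ill-conditioning by dropping the "least contributing" predictor one at a time); the fallback branch below is a simplified stand-in for that remedy.

```python
def lmmse_weights_2(p1, p2, target):
    """Sample-based LMMSE weights for two predictors: solve the 2x2 normal
    equations (P'P) w = P'y in closed form.  If the system is ill-conditioned,
    fall back to the first predictor alone (a simplified stand-in for
    dropping the least contributing predictor).
    """
    a = sum(x * x for x in p1)
    b = sum(x * y for x, y in zip(p1, p2))
    c = sum(x * x for x in p2)
    r1 = sum(x * t for x, t in zip(p1, target))
    r2 = sum(x * t for x, t in zip(p2, target))
    det = a * c - b * b
    if abs(det) < 1e-12:
        return (r1 / a, 0.0)
    return ((r1 * c - r2 * b) / det, (a * r2 - b * r1) / det)
```

With the weights in hand, the denoised estimate is simply the weighted sum of the predictors at each pixel.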
ETPL
DIP-010
Wang-Landau Monte Carlo-Based Tracking Methods for Abrupt Motions (Pattern Analysis and Machine Intelligence)
We propose a novel tracking algorithm based on the Wang-Landau Monte Carlo (WLMC) sampling
method for dealing with abrupt motions efficiently. Abrupt motions cause conventional tracking methods to fail because they violate the motion smoothness constraint. To address this problem, we introduce the
Wang-Landau sampling method and integrate it into a Markov Chain Monte Carlo (MCMC)-based
tracking framework. By employing the novel density-of-states term estimated by the Wang-Landau sampling method into the acceptance ratio of MCMC, our WLMC-based tracking method alleviates the
motion smoothness constraint and robustly tracks the abrupt motions. Meanwhile, the marginal likelihood
term of the acceptance ratio preserves the accuracy in tracking smooth motions. The method is then extended to obtain good performance in terms of scalability, even on a high-dimensional state space.
Hence, it covers drastic changes in not only position but also scale of a target. To achieve this, we modify
our method by combining it with the N-fold way algorithm and present the N-Fold Wang-Landau
(NFWL)-based tracking method. The N-fold way algorithm helps estimate the density-of-states with a smaller number of samples. Experimental results demonstrate that our approach efficiently samples the
states of the target, even over the whole state space, without loss of time, and tracks the target accurately and robustly when position and scale change severely.
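The density-of-states term at the heart of the method comes from Wang-Landau sampling, which penalizes moves into already well-visited regions until the visit histogram flattens. The toy below estimates log densities over a few discrete bins with a uniform true density of states; bin count, step counts, and the flatness threshold are arbitrary choices for illustration, and the tracker in the paper plugs the analogous log_g term into an MCMC acceptance ratio over the motion state space.

```python
import math
import random

def wang_landau(n_bins=4, steps_per_stage=20000, f_final=0.02, seed=1):
    """Estimate (up to a constant) the log density of states over discrete
    bins: moves into bins with larger log_g are accepted less often, which
    flattens the visit histogram; the modification factor f is halved each
    time the histogram is flat enough.
    """
    rng = random.Random(seed)
    log_g = [0.0] * n_bins
    state, f = 0, 1.0
    while f > f_final:
        hist = [0] * n_bins
        for _ in range(steps_per_stage):
            prop = rng.randrange(n_bins)
            # accept with probability min(1, g[state] / g[prop])
            diff = log_g[state] - log_g[prop]
            if diff >= 0 or rng.random() < math.exp(diff):
                state = prop
            log_g[state] += f          # penalize the visited bin
            hist[state] += 1
        if min(hist) > 0.8 * (sum(hist) / n_bins):  # histogram flat enough
            f *= 0.5
    return log_g
```

For a uniform density of states the estimated log_g values end up nearly equal, which is exactly the flattening effect that lets the tracker escape the motion-smoothness constraint.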
ETPL
DIP-011
Multi-View ML Object Tracking With Online Learning on Riemannian Manifolds by
Combining Geometric Constraints
This paper addresses issues in object tracking with occlusion scenarios, where multiple uncalibrated
cameras with overlapping fields of view are exploited. We propose a novel method where tracking is first
done independently in each individual view and then tracking results are mapped from different views to improve the tracking jointly. The proposed tracker uses the assumptions that objects are visible in at least
one view and move uprightly on a common planar ground that may induce a homography relation
between views. A method for online learning of object appearances on Riemannian manifolds is also introduced. The main novelties of the paper include: 1) define a similarity measure, based on geodesics
between a candidate object and a set of mapped references from multiple views on a Riemannian
manifold; 2) propose multi-view maximum likelihood estimation of object bounding box parameters, based on Gaussian-distributed geodesics on the manifold; 3) introduce online learning of object
appearances on the manifold, taking into account of possible occlusions; 4) utilize projective
transformations for objects between views, where parameters are estimated from warped vertical axis by
combining planar homography, epipolar geometry, and vertical vanishing point; 5) embed single-view trackers in a three-layer multi-view tracking scheme. Experiments have been conducted on videos from
multiple uncalibrated cameras, where objects contain long-term partial/full occlusions, or frequent
intersections. Comparisons have been made with three existing methods, where the performance is evaluated both qualitatively and quantitatively. Results have shown the effectiveness of the proposed
method in terms of robustness against tracking drift caused by occlusions.
ETPL
DIP-012
Multi-Atlas Segmentation with Joint Label Fusion (Pattern Analysis and Machine Intelligence)
Multi-atlas segmentation is an effective approach for automatically labeling objects of interest in biomedical images. In this approach, multiple expert-segmented example images, called atlases, are
registered to a target image, and deformed atlas segmentations are combined using label fusion. Among
the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity has been particularly successful. However, one limitation of these
strategies is that the weights are computed independently for each atlas, without taking into account the
fact that different atlases may produce similar label errors. To address this limitation, we propose a new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the
total expectation of labeling error and in which pairwise dependency between atlases is explicitly
modeled as the joint probability of two atlases making a segmentation error at a voxel. This probability is
approximated using intensity similarity between a pair of atlases and the target image in the neighborhood of each voxel. We validate our method in two medical image segmentation problems: hippocampus
segmentation and hippocampus subfield segmentation in magnetic resonance (MR) images. For both
problems, we show consistent and significant improvement over label fusion strategies that assign atlas weights independently.
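Minimizing the total expected labeling error under a pairwise error model has a standard closed form: the voting weights are proportional to M⁻¹·1, where M is the (approximated) pairwise error matrix. The two-atlas sketch below illustrates that formula only; the paper estimates M per voxel from intensity similarity, which is not reproduced here.

```python
def fusion_weights_2(m11, m22, m12):
    """Closed-form joint-label-fusion weights for two atlases.

    M = [[m11, m12], [m12, m22]] models the expected pairwise labeling
    error at a voxel.  The weights minimize w' M w subject to w1 + w2 = 1,
    whose solution is w proportional to M^{-1} [1, 1]'.
    """
    det = m11 * m22 - m12 * m12
    x1 = (m22 - m12) / det   # first row of M^{-1} [1, 1]'
    x2 = (m11 - m12) / det
    s = x1 + x2
    return x1 / s, x2 / s
```

When the off-diagonal term m12 is large (the atlases tend to err together), neither atlas adds much independent information, which is precisely the dependency that independent per-atlas weighting ignores.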
ETPL
DIP-013
Spatially Coherent Fuzzy Clustering for Accurate and Noise-Robust Image
Segmentation
In this letter, we present a new FCM-based method for spatially coherent and noise-robust image
segmentation. Our contribution is twofold: 1) the spatial information of local image features is integrated into both the similarity measure and the membership function to compensate for the effect of noise; and
2) an anisotropic neighborhood, based on phase congruency features, is introduced to allow more
accurate segmentation without image smoothing. The segmentation results, for both synthetic and real images, demonstrate that our method efficiently preserves the homogeneity of the regions and is more
robust to noise than related FCM-based methods.
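The FCM machinery underlying the letter can be sketched for scalar data: the standard membership update, followed by a neighborhood averaging of memberships as one common way to inject spatial coherence. The smoothing step is a generic stand-in, not the letter's anisotropic, phase-congruency-driven neighborhood.

```python
def fcm_memberships(values, centers, m=2.0):
    """Standard FCM membership update for scalar data (no spatial term):
    u_ik is inversely related to the distance from value k to center i.
    """
    u = []
    for v in values:
        d2 = [max((v - c) ** 2, 1e-12) for c in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        s = sum(inv)
        u.append([w / s for w in inv])
    return u

def spatial_smooth(u, r=1):
    """Average memberships over a 1D neighborhood and renormalize -- a
    generic spatial-coherence step (the letter instead uses an anisotropic
    neighborhood driven by phase congruency).
    """
    n, k = len(u), len(u[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        avg = [sum(u[j][c] for j in range(lo, hi)) / (hi - lo) for c in range(k)]
        s = sum(avg)
        out.append([a / s for a in avg])
    return out
```

The smoothing suppresses isolated noisy memberships while leaving homogeneous regions unchanged, which is the coherence/noise-robustness trade the letter's anisotropic neighborhood refines.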
ETPL
DIP-014
Adaptive Markov Random Fields for Joint Unmixing and Segmentation of
Hyperspectral Images
Abstract: Linear spectral unmixing is a challenging problem in hyperspectral imaging that consists of decomposing an observed pixel into a linear combination of pure spectra (or endmembers) with their
corresponding proportions (or abundances). Endmember extraction algorithms can be employed for
recovering the spectral signatures while abundances are estimated using an inversion step. Recent works have shown that exploiting spatial dependencies between image pixels can improve spectral unmixing.
Markov random fields (MRF) are classically used to model these spatial correlations and partition the
image into multiple classes with homogeneous abundances. This paper proposes to define the MRF sites
using similarity regions. These regions are built using a self-complementary area filter that stems from the morphological theory. This kind of filter divides the original image into flat zones where the
underlying pixels have the same spectral values. Once the MRF has been clearly established, a
hierarchical Bayesian algorithm is proposed to estimate the abundances, the class labels, the noise variance, and the corresponding hyperparameters. A hybrid Gibbs sampler is constructed to generate
samples according to the corresponding posterior distribution of the unknown parameters and
hyperparameters. Simulations conducted on synthetic and real AVIRIS data demonstrate the good performance of the algorithm.
ETPL
DIP-015 Depth Estimation of Face Images Using the Nonlinear Least-Squares Model
Abstract: In this paper, we propose an efficient algorithm to reconstruct the 3D structure of a human face
from one or more of its 2D images with different poses. In our algorithm, the nonlinear least-squares model is first employed to estimate the depth values of facial feature points and the pose of the 2D face
image concerned by means of the similarity transform. Furthermore, different optimization schemes are
presented with regard to the accuracy levels and the training time required. Our algorithm also embeds the symmetrical property of the human face into the optimization procedure, in order to alleviate the
sensitivities arising from changes in pose. In addition, the regularization term, based on linear correlation,
is added in the objective function to improve the estimation accuracy of the 3D structure. Further, a
model-integration method is proposed to improve the depth-estimation accuracy when multiple nonfrontal-view face images are available. Experimental results on the 2D and 3D databases demonstrate
the feasibility and efficiency of the proposed methods.
ETPL
DIP-016
Local Energy Pattern for Texture Classification Using Self-Adaptive Quantization
Thresholds
Abstract: Local energy pattern, a statistical histogram-based representation, is proposed for texture
classification. First, we use normalized local-oriented energies to generate local feature vectors, which describe the local structures distinctively and are less sensitive to imaging conditions. Then, each local
feature vector is quantized by self-adaptive quantization thresholds determined in the learning stage using
histogram specification, and the quantized local feature vector is transformed to a number by N-nary coding, which helps to preserve more structure information during vector quantization. Finally, the
frequency histogram is used as the representation feature. The performance is benchmarked by material
categorization on KTH-TIPS and KTH-TIPS2-a databases. Our method is compared with typical statistical approaches, such as basic image features, local binary pattern (LBP), local ternary pattern,
completed LBP, Weber local descriptor, and VZ algorithms (VZ-MR8 and VZ-Joint). The results show
that our method is superior to other methods on the KTH-TIPS2-a database and achieves competitive performance on the KTH-TIPS database. Furthermore, we extend the representation from static images to dynamic textures and achieve favorable recognition results on the University of California at Los Angeles
(UCLA) dynamic texture database.
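The quantization-and-coding step described above can be sketched directly: each component of a local feature vector is quantized against (learned) thresholds into one of N levels, the level sequence is packed into a single base-N integer, and the integers are histogrammed. The thresholds and feature vectors below are toy values; in the paper the thresholds are learned per component via histogram specification.

```python
def nnary_code(feature, thresholds):
    """Quantize each component against sorted thresholds (giving
    N = len(thresholds) + 1 levels), then pack the level sequence into one
    base-N integer -- the N-nary coding step that maps a quantized feature
    vector to a single histogram bin.
    """
    n_levels = len(thresholds) + 1
    code = 0
    for v in feature:
        level = sum(1 for t in thresholds if v > t)
        code = code * n_levels + level
    return code

def lep_histogram(features, thresholds):
    """Frequency histogram of N-nary codes over all local feature vectors."""
    n_bins = (len(thresholds) + 1) ** len(features[0])
    hist = [0] * n_bins
    for f in features:
        hist[nnary_code(f, thresholds)] += 1
    return hist
```

Base-N packing preserves the per-component structure (unlike, say, summing the levels), which is why the paper argues it keeps more information through vector quantization.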
ETPL
DIP-017 Perceptual Quality Metric With Internal Generative Mechanism
Abstract: Objective image quality assessment (IQA) aims to evaluate image quality consistently with
human perception. Most of the existing perceptual IQA metrics cannot accurately represent the
degradations from different types of distortion, e.g., existing structural similarity metrics perform well on
content-dependent distortions while not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of the existing IQA metrics with the guide
of the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual
system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to
decompose an input scene into two portions, the predicted portion with the predicted visual content and
the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure its degradation;
distortions on the disorderly portion mainly change the uncertain information, and the PSNR is employed
for it. Finally, according to the noise energy deployment on the two portions, we combine the two
evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics.
ETPL
DIP-018
Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A
Comparative Study
Abstract: Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors
driven by task and 2) bottom-up factors that highlight image regions that are different from their
surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency
has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets
(e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores
(e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art
saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video
datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased,
which influences some of the evaluation scores. Computational complexity analysis shows that some
models are very fast, yet yield competitive eye movement prediction accuracy. Different models often
have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our
study allows one to assess the state of the art, helps to organize this rapidly growing field, and sets a
unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
ETPL
DIP-019
Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image
Tone Mapping
Abstract: A novel filter is proposed for edge-preserving decomposition of an image. It is different from
previous filters in its locally adaptive property. The filtered image contains local means everywhere and preserves local salient edges. Comparisons are made between our filtered result and the results of three
other methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition
with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base
layer preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a
local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective
function is also proposed for compressing the detail layers. The reproduced image gives a good visualization. Experimental results on real images demonstrate that our algorithm is especially effective at
preserving or enhancing local details.
ETPL
DIP-020 LLSURE: Local Linear SURE-Based Edge-Preserving Image Filtering
Abstract: In this paper, we propose a novel approach for performing high-quality edge-preserving image
filtering. Based on a local linear model and using the principle of Stein's unbiased risk estimate as an
estimator for the mean squared error from the noisy image only, we derive a simple explicit image filter
which can filter out noise while preserving edges and fine-scale details. Moreover, this filter has a fast and exact linear-time algorithm whose computational complexity is independent of the filtering kernel
size; thus, it can be applied to real time image processing tasks. The experimental results demonstrate the
effectiveness of the new filter for various computer vision applications, including noise reduction, detail smoothing and enhancement, high dynamic range compression, and flash/no-flash denoising.
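The local linear model behind the filter can be sketched in 1D: within each window the output is q = a·x + b with a = var/(var + eps) and b = (1 − a)·mean, so low-variance (smooth) regions are pulled to the local mean while high-variance (edge) regions pass through. This is a simplified self-guided sketch with a hand-picked eps; the paper's contribution is choosing that parameter via Stein's unbiased risk estimate and an exact linear-time algorithm, neither of which is reproduced here.

```python
def local_linear_filter(x, eps, r=2):
    """1D edge-preserving filter from the local linear model q_i = a*x_i + b.

    a = var / (var + eps): near 0 in smooth windows (output -> local mean,
    i.e. denoising), near 1 across strong edges (output -> input, i.e. the
    edge survives).  eps is hand-picked here; the paper selects it via SURE.
    """
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        w = x[lo:hi]
        mean = sum(w) / len(w)
        var = sum((v - mean) ** 2 for v in w) / len(w)
        a = var / (var + eps)
        out.append(a * x[i] + (1.0 - a) * mean)
    return out
```

Note the simplification: a full guided-filter-style implementation averages all the local models covering a pixel, whereas this sketch uses only the pixel's own window.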
ETPL
DIP-021
Optimal Inversion of the Generalized Anscombe Transformation for Poisson-Gaussian
Noise
Abstract: Many digital imaging devices operate by successive photon-to-electron, electron-to-voltage,
and voltage-to-digit conversions. These processes are subject to various signal-dependent errors, which
are typically modeled as Poisson-Gaussian noise. The removal of such noise can be effected indirectly by
applying a variance-stabilizing transformation (VST) to the noisy data, denoising the stabilized data with a Gaussian denoising algorithm, and finally applying an inverse VST to the denoised data. The
generalized Anscombe transformation (GAT) is often used for variance stabilization, but its unbiased
inverse transformation has not been rigorously studied in the past. We introduce the exact unbiased inverse of the GAT and show that it plays an integral part in ensuring accurate denoising results. We
demonstrate that this exact inverse leads to state-of-the-art results without any notable increase in the
computational complexity compared to the other inverses. We also show that this inverse is optimal in the
sense that it can be interpreted as a maximum likelihood inverse. Moreover, we thoroughly analyze the behavior of the proposed inverse, which also enables us to derive a closed-form approximation for it. This
paper generalizes our work on the exact unbiased inverse of the Anscombe transformation, which we
have presented earlier for the removal of pure Poisson noise.
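The indirect VST pipeline described above can be sketched as follows. The forward GAT is standard; for the inverse we use the naive algebraic inverse as a placeholder, which is precisely the inverse whose low-count bias the paper's exact unbiased inverse removes. Here alpha is the detector gain and sigma the Gaussian noise standard deviation.

```python
import numpy as np

def gat(z, alpha=1.0, sigma=0.1):
    """Generalized Anscombe transform: approximately stabilizes the variance
    of Poisson-Gaussian data z = alpha*p + n, p ~ Poisson, n ~ N(0, sigma^2)."""
    return (2.0 / alpha) * np.sqrt(np.maximum(alpha * z + 0.375 * alpha**2 + sigma**2, 0))

def gat_algebraic_inverse(d, alpha=1.0, sigma=0.1):
    """Naive algebraic inverse of the GAT; biased at low counts, which is
    the shortcoming the paper's exact unbiased inverse addresses."""
    return ((alpha * d / 2.0) ** 2 - 0.375 * alpha**2 - sigma**2) / alpha

def vst_denoise(z, gaussian_denoiser, alpha=1.0, sigma=0.1):
    """Stabilize -> denoise with any Gaussian denoiser -> invert."""
    d = gaussian_denoiser(gat(z, alpha, sigma))
    return gat_algebraic_inverse(d, alpha, sigma)
```

Plugging any off-the-shelf Gaussian denoiser into vst_denoise completes the stabilize-denoise-invert chain that the abstract describes.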
ETPL
DIP-022 Blind Separation of Time/Position Varying Mixtures
Abstract: We address the challenging open problem of blindly separating time/position varying mixtures,
and attempt to separate the sources from such mixtures without having prior information about the sources or the mixing system. Unlike studies concerning instantaneous or convolutive mixtures, we
assume that the mixing system (medium) is varying in time/position. Attempts to solve this problem have so far mostly utilized online algorithms that track the mixing system with methods previously developed for instantaneous or convolutive mixtures. In contrast with these attempts, we develop a
unified approach in the form of staged sparse component analysis (SSCA). Accordingly, we assume that
the sources are either sparse or can be “sparsified.” In the first stage, we estimate the filters of the mixing system, based on the scatter plot of the sparse mixtures' data, using a proper clustering and curve/surface
fitting. In the second stage, the mixing system is inverted, yielding the estimated sources. We use the
SSCA approach for solving three types of mixtures: time/position varying instantaneous mixtures, single-
path mixtures, and multipath mixtures. Real-life scenarios and simulated mixtures are used to demonstrate the performance of our approach.
ETPL
DIP-023 Nonlocal Transform-Domain Filter for Volumetric Data Denoising and Reconstruction
Abstract: We present an extension of the BM3D filter to volumetric data. The proposed algorithm, BM4D, implements the grouping and collaborative filtering paradigm, where mutually similar d-dimensional patches are stacked together in a (d+1)-dimensional array and jointly filtered in transform
domain. While in BM3D the basic data patches are blocks of pixels, in BM4D we utilize cubes of voxels,
which are stacked into a 4-D “group.” The 4-D transform applied on the group simultaneously exploits the local correlation present among voxels in each cube and the nonlocal correlation between the
corresponding voxels of different cubes. Thus, the spectrum of the group is highly sparse, leading to very
effective separation of signal and noise through coefficient shrinkage. After inverse transformation, we obtain estimates of each grouped cube, which are then adaptively aggregated at their original locations.
We evaluate the algorithm on denoising of volumetric data corrupted by Gaussian and Rician noise, as
well as on reconstruction of volumetric phantom data with non-zero phase from noisy and incomplete Fourier-domain (k-space) measurements. Experimental results demonstrate the state-of-the-art denoising
performance of BM4D, and its effectiveness when exploited as a regularizer in volumetric data
reconstruction.
ETPL
DIP-024 Huber Fractal Image Coding Based on a Fitting Plane
Abstract: Recently, there has been significant interest in robust fractal image coding for the purpose of
robustness against outliers. However, the known robust fractal coding methods (HFIC and LAD-FIC,
etc.) are not optimal, since, besides the high computational cost, they use the corrupted domain block as the independent variable in the robust regression model, which may adversely affect the robust estimator
to calculate the fractal parameters (depending on the noise level). This paper presents a Huber fitting
plane-based fractal image coding (HFPFIC) method. This method builds Huber fitting planes (HFPs) for
the domain and range blocks, respectively, ensuring the use of an uncorrupted independent variable in the robust model. On this basis, a new matching error function is introduced to robustly evaluate the best
scaling factor. Meanwhile, a median absolute deviation (MAD) about the median decomposition criterion
is proposed to achieve fast adaptive quadtree partitioning for images corrupted by salt & pepper noise. In order to reduce computational cost, the no-search method is applied to speed up the encoding process.
Experimental results show that the proposed HFPFIC can yield superior performance over conventional
robust fractal image coding methods in encoding speed and the quality of the restored image. Furthermore, the no-search method can significantly reduce encoding time and achieve less than 2.0 s for
the HFPFIC with acceptable image quality degradation. In addition, we show that, combined with the
MAD decomposition scheme, the HFP technique used as a robust method can further reduce the encoding
time while maintaining image quality.
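The MAD-based quadtree partitioning rule can be sketched as follows. The threshold value is an assumption; the point of MAD over variance is its insensitivity to salt & pepper outliers.

```python
import numpy as np

def mad(block):
    """Median absolute deviation about the median: a robust spread
    estimate that ignores salt & pepper outliers, unlike variance."""
    m = np.median(block)
    return np.median(np.abs(block - m))

def should_split(block, threshold):
    """Quadtree partition rule sketched from the abstract: split a block
    whose robust activity (MAD) exceeds a threshold."""
    return mad(block) > threshold
```

A flat block peppered with a few impulse pixels keeps a near-zero MAD and is not split, while a genuinely textured block is, which is how the criterion stays fast and noise-robust.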
ETPL
DIP-025
Demosaicking of Noisy Bayer-Sampled Color Images With Least-Squares Luma-
Chroma Demultiplexing and Noise Level Estimation
Abstract: This paper adapts the least-squares luma-chroma demultiplexing (LSLCD) demosaicking
method to noisy Bayer color filter array (CFA) images. A model is presented for the noise in white-
balanced gamma-corrected CFA images. A method to estimate the noise level in each of the red, green, and blue color channels is then developed. Based on the estimated noise parameters, one of a finite set of
configurations adapted to a particular level of noise is selected to demosaic the noisy data. The noise-
adaptive demosaicking scheme is called LSLCD with noise estimation (LSLCD-NE). Experimental results demonstrate state-of-the-art performance over a wide range of noise levels, with low
computational complexity. Many results with several algorithms, noise levels, and images are presented
on our companion web site along with software to allow reproduction of our results.
ETPL
DIP-026 Multiscale Gradients-Based Color Filter Array Interpolation
Abstract: Single sensor digital cameras use color filter arrays to capture a subset of the color data at each
pixel coordinate. Demosaicing or color filter array (CFA) interpolation is the process of estimating the
missing color samples to reconstruct a full color image. In this paper, we propose a demosaicing method that uses multiscale color gradients to adaptively combine color difference estimates from different
directions. The proposed solution does not require any thresholds since it does not make any hard
decisions, and it is noniterative. Although most suitable for the Bayer CFA pattern, the method can be extended to other mosaic patterns. To demonstrate this, we describe its application to the Lukac CFA
pattern. Experimental results show that it outperforms other available demosaicing methods by a clear
margin in terms of CPSNR and S-CIELAB measures for both mosaic patterns.
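The threshold-free directional combination at the heart of the method can be sketched as follows. This is a scalar toy version; in the actual method the gradients are accumulated over multiple scales and the estimates are color-difference interpolations along each direction.

```python
import numpy as np

def directional_weights(grad_h, grad_v, eps=1e-8):
    """Soft, threshold-free combination: weights inversely proportional to
    directional gradient energy, so no hard direction decision is made."""
    wh = 1.0 / (grad_h + eps)
    wv = 1.0 / (grad_v + eps)
    s = wh + wv
    return wh / s, wv / s

def combine(est_h, est_v, grad_h, grad_v):
    """Blend horizontal and vertical color estimates by gradient weight."""
    wh, wv = directional_weights(grad_h, grad_v)
    return wh * est_h + wv * est_v
```

When one direction has a much smaller gradient, its estimate dominates smoothly; equal gradients give the midpoint, so the scheme degrades gracefully instead of switching abruptly.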
ETPL
DIP-027 Optimal local dimming for LC image formation with controllable backlighting
Abstract: Light emitting diode (LED)-backlit liquid crystal displays (LCDs) hold the promise of
improving image quality while reducing the energy consumption with signal-dependent local dimming.
However, most existing local dimming algorithms are motivated mainly by simplicity of implementation and often show little concern for visual quality. To fully realize the potential of LED-backlit LCDs and reduce
the artifacts that often occur in current systems, we propose a novel local dimming technique that can
achieve the theoretical highest fidelity of intensity reproduction in either l1 or l2 metrics. Both the exact and fast approximate versions of the optimal local dimming algorithm are proposed. Simulation results
demonstrate superior performances of the proposed algorithm in terms of visual quality and power
consumption.
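The fidelity-optimal dimming idea can be sketched for a single backlight zone: the LC panel can only attenuate the backlight, so for each candidate level the best transmittance is a clipped ratio, and the level minimizing the lp reproduction error is kept. This brute-force search is a toy stand-in; the paper derives exact and fast approximate algorithms instead.

```python
import numpy as np

def display_error(img, backlight, p=2):
    """Reproduction error for one backlight zone: the LC layer can only
    attenuate, so transmittance is clipped to [0, 1]."""
    t = np.clip(img / max(backlight, 1e-12), 0.0, 1.0)
    return np.sum(np.abs(backlight * t - img) ** p)

def optimal_backlight(img, levels=256, p=2):
    """Exhaustive search over backlight levels for the fidelity-optimal
    dimming of a single zone."""
    candidates = np.linspace(0, 1, levels)
    errs = [display_error(img, b, p) for b in candidates]
    return candidates[int(np.argmin(errs))]
```

Setting p = 1 or p = 2 corresponds to the l1 and l2 fidelity metrics mentioned in the abstract; darker zone content admits a lower optimal backlight, which is where the power saving comes from.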
ETPL
DIP-028
Multiscale Bi-Gaussian Filter for Adjacent Curvilinear Structures Detection With
Application to Vasculature Images
Abstract: The intensity or gray-level derivatives have been widely used in image segmentation and
enhancement. Conventional derivative filters often suffer from an undesired merging of adjacent objects
because of their intrinsic usage of an inappropriately broad Gaussian kernel; as a result, neighboring structures cannot be properly resolved. To avoid this problem, we propose to replace the low-level
Gaussian kernel with a bi-Gaussian function, which allows independent selection of scales in the
foreground and background. By selecting a narrow neighborhood for the background with regard to the
foreground, the proposed method reduces interference from adjacent objects while simultaneously preserving the ability of intraregion smoothing. Our idea is inspired by a comparative analysis of existing
line filters, in which several traditional methods, including the vesselness, gradient flux, and medialness
models, are integrated into a uniform framework. The comparison subsequently aids in understanding the
principles of different filtering kernels, which is also a contribution of this paper. Based on some
axiomatic scale-space assumptions, the full representation of our bi-Gaussian kernel is deduced. The popular γ-normalization scheme for multiscale integration is extended to the bi-Gaussian operators.
Finally, combined with a parameter-free shape estimation scheme, a derivative filter is developed for the
typical applications of curvilinear structure detection and vasculature image enhancement. It is verified in experiments using synthetic and real data that the proposed method outperforms several conventional
filters in separating closely located objects and being robust to noise.
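A 1-D sketch of the bi-Gaussian kernel follows: a foreground Gaussian inside radius rho and a rescaled background Gaussian outside, joined continuously at the boundary. This simplified construction matches only the kernel value at the joint; the paper's full derivation also matches derivatives and embeds the kernel in an axiomatic scale space.

```python
import numpy as np

def gauss(x, s):
    """Normalized 1-D Gaussian."""
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

def bi_gaussian(x, sigma_f, sigma_b, rho):
    """Bi-Gaussian kernel sketch: foreground Gaussian (sigma_f) for
    |x| <= rho, background Gaussian (sigma_b) beyond, rescaled so the two
    pieces meet continuously at |x| = rho."""
    ax = np.abs(x)
    scale = gauss(rho, sigma_f) / gauss(0.0, sigma_b)
    return np.where(ax <= rho,
                    gauss(ax, sigma_f),
                    scale * gauss(ax - rho, sigma_b))
```

Choosing sigma_b much smaller than sigma_f makes the kernel decay quickly outside rho, which is exactly the narrow background neighborhood the abstract uses to keep adjacent structures from merging.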
ETPL
DIP-029 Visually Lossless Encoding for JPEG2000
Abstract: Due to exponential growth in image sizes, visually lossless coding is increasingly being considered as an alternative to numerically lossless coding, which has limited compression ratios. This
paper presents a method of encoding color images in a visually lossless manner using JPEG2000. In order
to hide coding artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subband signals in JPEG2000. The VTs are experimentally determined from statistically
modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-
zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing backgrounds through a
visual masking model, and then used to determine the minimum number of coding passes to be included in the final codestream for visually lossless quality under the desired viewing conditions. Codestreams
produced by this scheme are fully JPEG2000 Part-I compliant.
ETPL
DIP-030
Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization
and Its Application—Part I: Fundamental Theory
Abstract: This paper provides a systematic rate-distortion (R-D) analysis of the dead-zone plus uniform
threshold scalar quantization (DZ+UTSQ) with nearly uniform reconstruction quantization (NURQ) for
generalized Gaussian distribution (GGD), which consists of two aspects: R-D performance analysis and
R-D modeling. In R-D performance analysis, we first derive the preliminary constraint of optimum entropy-constrained DZ+UTSQ/NURQ for GGD, under which the property of the GGD distortion-rate
(D-R) function is elucidated. Then for the GGD source of actual transform coefficients, the refined
constraint and precise conditions of optimum DZ+UTSQ/NURQ are rigorously deduced in the real coding bit rate range, and efficient DZ+UTSQ/NURQ design criteria are proposed to reasonably simplify
the utilization of effective quantizers in practice. In R-D modeling, inspired by R-D performance analysis,
the D-R function is first developed, followed by the novel rate-quantization (R-Q) and distortion-quantization (D-Q) models derived using analytical and heuristic methods. The D-R, R-Q, and D-Q
models form the source model describing the relationship between the rate, distortion, and quantization
steps. One application of the proposed source model is the effective two-pass VBR coding algorithm
design on an encoder of H.264/AVC reference software, which achieves constant video quality and desirable rate control accuracy.
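The quantizer family analyzed in this paper can be sketched directly. Indices come from a dead zone of half-width z with uniformly spaced thresholds delta beyond it; reconstruction places each level at a tunable offset inside its cell. The midpoint default below is an assumption; deriving the optimal placement for generalized Gaussian sources is exactly what the paper does.

```python
import math

def dzutsq_quantize(x, z, delta):
    """Dead-zone plus uniform threshold scalar quantizer: inputs with
    |x| < z map to index 0; beyond the dead zone, decision thresholds
    are uniformly spaced delta apart."""
    if abs(x) < z:
        return 0
    k = math.floor((abs(x) - z) / delta) + 1
    return k if x > 0 else -k

def nurq_reconstruct(k, z, delta, offset=None):
    """Nearly uniform reconstruction: the level offset within each cell is
    a free parameter (midpoint here, by assumption)."""
    if k == 0:
        return 0.0
    if offset is None:
        offset = delta / 2.0
    mag = z + (abs(k) - 1) * delta + offset
    return mag if k > 0 else -mag
```

H.264/AVC-style quantization is of this dead-zone form, which is why the R-D models built on it transfer directly to the rate control application in Part II.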
ETPL
DIP-031
Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization
and Its Application—Part II: Two-Pass VBR Coding for H.264/AVC
Abstract: In the first part of this paper, we derive a source model describing the relationship between the rate, distortion, and quantization steps of the dead-zone plus uniform threshold scalar quantizers with
nearly uniform reconstruction quantizers for generalized Gaussian distribution. This source model
consists of rate-quantization, distortion-quantization (D-Q), and distortion-rate (D-R) models. In this part,
we first rigorously confirm the accuracy of the proposed source model by comparing the calculated results with the coding data of JM 16.0. Efficient parameter estimation strategies are then developed to
better employ this source model in our two-pass rate control method for H.264 variable bit rate coding.
Based on our D-Q and D-R models, the proposed method is of high stability, low complexity and is easy
to implement. Extensive experiments demonstrate that the proposed method achieves: 1) average peak
signal-to-noise ratio variance of only 0.0658 dB, compared to 1.8758 dB of JM 16.0's method, with an
average rate control error of 1.95% and 2) significant improvement in smoothing the video quality compared with the latest two-pass rate control method.
ETPL
DIP-032 Nonrigid Image Registration With Crystal Dislocation Energy
Abstract: The goal of nonrigid image registration is to find a suitable transformation such that the
transformed moving image becomes similar to the reference image. The image registration problem can also be treated as an optimization problem, which tries to minimize an objective energy function that
measures the differences between two involved images. In this paper, we consider image matching as the
process of aligning object boundaries in two different images. The registration energy function can be defined based on the total energy associated with the object boundaries. The optimal transformation is
obtained by finding the equilibrium state when the total energy is minimized, which indicates the object
boundaries find their correspondences and stop deforming. We make an analogy between the above processes with the dislocation system in physics. The object boundaries are viewed as dislocations (line
defects) in crystal. Then the well-developed dislocation energy is used to derive the energy assigned to
object boundaries in images. The newly derived registration energy function takes the global gradient
information of the entire image into consideration, and produces an orientation-dependent and long-range interaction between two images to drive the registration process. This property of interaction endows the
new registration framework with both fast convergence rate and high registration accuracy. Moreover, the
new energy function can be adapted to realize symmetric diffeomorphic transformation so as to ensure one-to-one matching between subjects. In this paper, the superiority of the new method is theoretically
proven, experimentally tested and compared with the state-of-the-art SyN method. Experimental results
with 3-D magnetic resonance brain images demonstrate that the proposed method outperforms the compared methods in terms of both registration accuracy and computation time.
ETPL
DIP-033 Double Shrinking Sparse Dimension Reduction
Abstract: Learning tasks such as classification and clustering usually perform better and cost less (time
and space) on compressed representations than on the original data. Previous works mainly compress data via dimension reduction. In this paper, we propose “double shrinking” to compress image data on both
dimensionality and cardinality via building either sparse low-dimensional representations or a sparse
projection matrix for dimension reduction. We formulate a double shrinking model (DSM) as an l1 regularized variance maximization with constraint ||x||2=1, and develop a double shrinking algorithm
(DSA) to optimize DSM. DSA is a path-following algorithm that can build the whole solution path of
locally optimal solutions of different sparse levels. Each solution on the path is a “warm start” for
searching the next sparser one. In each iteration of DSA, the direction, the step size, and the Lagrangian multiplier are deduced from the Karush-Kuhn-Tucker conditions. The magnitudes of trivial variables are
shrunk and the importances of critical variables are simultaneously augmented along the selected
direction with the determined step length. Double shrinking can be applied to manifold learning and feature selections for better interpretation of features, and can be combined with classification and
clustering to boost their performance. The experimental results suggest that double shrinking produces
efficient and effective data compression.
ETPL
DIP-034 Reinitialization-Free Level Set Evolution via Reaction Diffusion
Abstract: This paper presents a novel reaction-diffusion (RD) method for implicit active contours that is
completely free of the costly reinitialization procedure in level set evolution (LSE). A diffusion term is
introduced into LSE, resulting in an RD-LSE equation, from which a piecewise constant solution can be
derived. In order to obtain a stable numerical solution from the RD-based LSE, we propose a two-step
splitting method to iteratively solve the RD-LSE equation, where we first iterate the LSE equation, then
solve the diffusion equation. The second step regularizes the level set function obtained in the first step to ensure stability, and thus the complex and costly reinitialization procedure is completely eliminated from
LSE. By successfully applying diffusion to LSE, the RD-LSE model is stable by means of the simple
finite difference method, which is very easy to implement. The proposed RD method can be generalized to solve the LSE for both variational level set method and partial differential equation-based level set
method. The RD-LSE method shows very good performance on boundary antileakage. The extensive and
promising experimental results on synthetic and real images validate the effectiveness of the proposed
RD-LSE approach.
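The two-step splitting scheme can be sketched as follows. Step one advances the level set under an external force (a generic per-pixel force array stands in for a concrete data term such as Chan-Vese); step two runs a few explicit diffusion iterations, which is the regularization that replaces reinitialization.

```python
import numpy as np

def laplacian(phi):
    """5-point finite-difference Laplacian with replicated (zero-flux) borders."""
    p = np.pad(phi, 1, mode='edge')
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * phi

def rd_lse_step(phi, force, dt=0.1, tau=0.1, diffusion_iters=2):
    """One iteration of the two-step splitting scheme sketched from the
    abstract: (1) evolve the level set by the external force,
    (2) regularize by explicit diffusion steps instead of reinitialization."""
    phi = phi + dt * force                     # step 1: level set evolution
    for _ in range(diffusion_iters):
        phi = phi + tau * laplacian(phi)       # step 2: diffusion regularization
    return phi
```

The explicit finite-difference form mirrors the abstract's claim that the scheme stays stable with a simple implementation; dt and tau here are assumed illustrative step sizes, not the paper's settings.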
ETPL
DIP-035 Track Creation and Deletion Framework for Long-Term Online Multiface Tracking
Abstract: To improve visual tracking, a large number of papers study more powerful features, or better
cue fusion mechanisms, such as adaptation or contextual models. A complementary approach consists of improving the track management, that is, deciding when to add a target or stop its tracking, for example,
in case of failure. This is an essential component for effective multiobject tracking applications, and is
often not trivial. Deciding whether or not to stop a track is a compromise between avoiding erroneous
early stopping while tracking is fine, and erroneous continuation of tracking when there is an actual failure. This decision process, very rarely addressed in the literature, is difficult due to object detector
deficiencies or observation models that are insufficient to describe the full variability of tracked objects
and deliver reliable likelihood (tracking) information. This paper addresses the track management issue and presents a real-time online multiface tracking algorithm that effectively deals with the above
difficulties. The tracking itself is formulated in a multiobject state-space Bayesian filtering framework
solved with Markov Chain Monte Carlo. Within this framework, an explicit probabilistic filtering step decides when to add or remove a target from the tracker, where decisions rely on multiple cues such as
face detections, likelihood measures, long-term observations, and track state characteristics. The method
has been applied to three challenging data sets of more than 9 h in total, and demonstrates a significant performance increase compared to more traditional approaches (Markov Chain Monte Carlo, reversible-jump Markov Chain Monte Carlo) that rely only on head detection and likelihood for track management.
ETPL
DIP-036 Wavelet Domain Multifractal Analysis for Static and Dynamic Texture Classification
Abstract: In this paper, we propose a new texture descriptor for both static and dynamic textures. The new descriptor is built on the wavelet-based spatial-frequency analysis of two complementary wavelet
pyramids: standard multiscale and wavelet leader. These wavelet pyramids essentially capture the local
texture responses in multiple high-pass channels in a multiscale and multiorientation fashion, in which
there exists a strong power-law relationship for natural images. Such a power-law relationship is characterized by the so-called multifractal analysis. In addition, two more techniques, scale normalization
and multiorientation image averaging, are introduced to further improve the robustness of the proposed
descriptor. Combining these techniques, the proposed descriptor enjoys both high discriminative power and robustness against many environmental changes. We apply the descriptor for classifying both static
and dynamic textures. Our method has demonstrated excellent performance in comparison with the state-
of-the-art approaches in several public benchmark datasets.
ETPL
DIP-037
Video Object Tracking in the Compressed Domain Using Spatio-Temporal Markov
Random Fields
Abstract: Despite the recent progress in both pixel-domain and compressed-domain video object tracking,
the need for a tracking framework with both reasonable accuracy and reasonable complexity still exists.
This paper presents a method for tracking moving objects in H.264/AVC-compressed video sequences
using a spatio-temporal Markov random field (ST-MRF) model. An ST-MRF model naturally integrates
the spatial and temporal aspects of the object's motion. Built upon such a model, the proposed method
works in the compressed domain and uses only the motion vectors (MVs) and block coding modes from the compressed bitstream to perform tracking. First, the MVs are preprocessed through intracoded block
motion approximation and global motion compensation. At each frame, the decision of whether a
particular block belongs to the object being tracked is made with the help of the ST-MRF model, which is updated from frame to frame in order to follow the changes in the object's motion. The proposed method
is tested on a number of standard sequences, and the results demonstrate its advantages over some of the
recent state-of-the-art methods.
ETPL
DIP-038 Online Object Tracking With Sparse Prototypes
Abstract: Online object tracking is a challenging problem as it entails learning an effective model to
account for appearance change caused by intrinsic and extrinsic factors. In this paper, we propose a novel
online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance
models. We introduce l1regularization into the PCA reconstruction, and develop a novel algorithm to
represent an object by sparse prototypes that account explicitly for data and noise. For tracking, objects
are represented by the sparse prototypes learned online with update. In order to reduce tracking drift, we present a method that takes occlusion and motion blur into account rather than simply includes image
observations for model update. Both qualitative and quantitative evaluations on challenging image
sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
ETPL
DIP-039 Automatic Dynamic Texture Segmentation Using Local Descriptors and Optical Flow
Abstract: A dynamic texture (DT) is an extension of texture to the temporal domain. How to segment a DT is a challenging problem. In this paper, we address the problem of segmenting a DT into disjoint regions. DTs may differ in their spatial mode (i.e., appearance) and/or temporal mode (i.e., motion field). To this end, we develop a framework based on both the appearance and motion modes. For the
appearance mode, we use a new local spatial texture descriptor to describe the spatial mode of the DT; for the motion mode, we use the optical flow and the local temporal texture descriptor to represent the
temporal variations of the DT. In addition, for the optical flow, we use the histogram of oriented optical
flow (HOOF) to organize them. To compute the distance between two HOOFs, we develop a simple, effective, and efficient distance measure based on Weber's law. Furthermore, we address the problem of threshold selection by proposing a method that determines the segmentation thresholds through offline supervised statistical learning. The experimental results show that our method provides very
good segmentation results compared to the state-of-the-art methods in segmenting regions that differ in their dynamics.
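The motion-mode ingredients can be sketched as follows. The HOOF construction is standard (magnitude-weighted orientation histogram, normalized); the distance shown is only Weber's-law-inspired, judging each bin difference relative to the bin's overall magnitude, since the paper's exact measure is not reproduced here.

```python
import numpy as np

def hoof(flow_u, flow_v, bins=8):
    """Histogram of oriented optical flow: each flow vector votes into an
    orientation bin weighted by its magnitude; the result is normalized."""
    mag = np.hypot(flow_u, flow_v)
    ang = np.arctan2(flow_v, flow_u)           # orientations in (-pi, pi]
    h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    s = h.sum()
    return h / s if s > 0 else h

def weber_distance(h1, h2, eps=1e-8):
    """Weber's-law-inspired distance (assumed form): per-bin differences
    are judged relative to the bins' combined intensity."""
    return np.sum(np.abs(h1 - h2) / (h1 + h2 + eps))
```

Relative rather than absolute bin comparison means a small change in a weak bin counts as much as a proportionally similar change in a strong one, which is the Weber's-law intuition the abstract invokes.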
ETPL
DIP-040 Efficient Image Classification via Multiple Rank Regression
Abstract: The problem of image classification has aroused considerable research interest in the field of image processing. Traditional methods often convert an image to a vector and then use a vector-based
classifier. In this paper, a novel multiple rank regression model (MRR) for matrix data classification is
proposed. Unlike traditional vector-based methods, we employ multiple-rank left projecting vectors and
right projecting vectors to regress each matrix data set to its label for each category. The convergence behavior, initialization, computational complexity, and parameter determination are also analyzed.
Compared with vector-based regression methods, MRR achieves higher accuracy and has lower
computational complexity. Compared with traditional supervised tensor-based methods, MRR performs
better for matrix data classification. Promising experimental results on face, object, and hand-written digit
image classification tasks are provided to show the effectiveness of our method.
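The bilinear form behind MRR, and one half of a plausible alternating least-squares fit, can be sketched as follows (the update of V given U is symmetric; initialization and convergence analysis are in the paper, and the names and shapes here are assumptions).

```python
import numpy as np

def mrr_score(X, U, V, b=0.0):
    """Multiple-rank regression response for a matrix sample X (m x n):
    sum_k u_k^T X v_k + b, with U of shape (k, m) and V of shape (k, n).
    The effective projector sum_k u_k v_k^T has rank at most k, and X is
    never vectorized."""
    return float(np.einsum('ki,ij,kj->', U, X, V) + b)

def fit_U_given_V(Xs, y, V):
    """Least-squares update of all left projecting vectors with V fixed:
    stacking z_s = [X_s v_1; ...; X_s v_k] makes the model linear in U."""
    Z = np.stack([np.concatenate([X @ v for v in V]) for X in Xs])
    u, *_ = np.linalg.lstsq(Z, np.asarray(y), rcond=None)
    return u.reshape(len(V), -1)
```

Each half-step is an ordinary least-squares problem in k*m (or k*n) unknowns rather than m*n, which is where the lower computational complexity claimed over vector-based regression comes from.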
ETPL
DIP-041
Regularized Discriminative Spectral Regression Method for Heterogeneous Face
Matching
Abstract: Face recognition is confronted with situations in which face images are captured in various
modalities, such as the visual modality, the near infrared modality, and the sketch modality. This is
known as heterogeneous face recognition. To solve this problem, we propose a new method called
discriminative spectral regression (DSR). The DSR maps heterogeneous face images into a common discriminative subspace in which robust classification can be achieved. In the proposed method, the
subspace learning problem is transformed into a least squares problem. Different mappings should map
heterogeneous images from the same class close to each other, while images from different classes should be separated as far as possible. To realize this, we introduce two novel regularization terms, which reflect
the category relationships among data, into the least squares approach. Experiments conducted on two
heterogeneous face databases validate the superiority of the proposed method over the previous methods.
ETPL
DIP-042 Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Abstract: Due to the popularity of social media websites, extensive research efforts have been dedicated
to tag-based social image search. Both visual information and tags have been investigated in the research
field. However, most existing methods use tags and visual characteristics either separately or sequentially in order to estimate the relevance of images. In this paper, we propose an approach that simultaneously
utilizes both visual and textual information to estimate the relevance of user tagged images. The
relevance estimation is determined with a hypergraph learning approach. In this method, a social image hypergraph is constructed, where vertices represent images and hyperedges represent visual or textual
terms. Learning is achieved with use of a set of pseudo-positive images, where the weights of hyperedges
are updated throughout the learning process. In this way, the impact of different tags and visual words can
be automatically modulated. Comparative results of the experiments conducted on a dataset including 370+ images are presented, which demonstrate the effectiveness of the proposed approach.
ETPL
DIP-043 Action Search by Example Using Randomized Visual Vocabularies
Abstract: Because actions can be small video objects, it is a challenging problem to search for similar actions in crowded and dynamic scenes when a single query example is provided. We propose a fast
action search method that can efficiently locate similar actions spatiotemporally. Both the query action
and the video datasets are characterized by spatio-temporal interest points. Instead of using a unified visual vocabulary to index all interest points in the database, we propose randomized visual vocabularies
to enable fast and robust interest point matching. To accelerate action localization, we have developed a
coarse-to-fine video subvolume search scheme, which is several orders of magnitude faster than the
existing spatio-temporal branch and bound search. Our experiments on cross-dataset action search show promising results when compared with the state of the art. Additional experiments on a 5-h versatile
video dataset validate the efficiency of our method, where an action search can be finished in just 37.6 s
on a regular desktop machine.
ETPL
DIP-044
Robust Albedo Estimation From a Facial Image With Cast Shadow Under General
Unknown Lighting
Abstract: Albedo estimation from a facial image is crucial for various computer vision tasks, such as 3-D
morphable-model fitting, shape recovery, and illumination-invariant face recognition, but the currently
available methods do not give good estimation results. Most methods ignore the influence of cast shadows and require a statistical model to obtain facial albedo. This paper describes a method for albedo
estimation that makes combined use of image intensity and facial depth information for an image with
cast shadows and general unknown light. In order to estimate the albedo map of a face, we formulate the
albedo estimation problem as a linear programming problem that minimizes intensity error under the
assumption that the surface of the face has constant albedo. Since the solution thus obtained has significant errors in certain parts of the facial image, the albedo estimate needs to be compensated. We
minimize the mean square error of albedo under the assumption that the surface normals, which are
calculated from the facial depth information, are corrupted with noise. The proposed method is simple and the experimental results show that this method gives better estimates than other methods.
ETPL
DIP-045 Separable Markov Random Field Model and Its Applications in Low Level Vision
Abstract: This brief proposes a continuously-valued Markov random field (MRF) model with separable
filter bank, denoted as MRFSepa, which significantly reduces the computational complexity in the MRF modeling. In this framework, we design a novel gradient-based discriminative learning method to learn
the potential functions and separable filter banks. We learn MRFSepa models with 2-D and 3-D separable
filter banks for the applications of gray-scale/color image denoising and color image demosaicing. By implementing MRFSepa model on graphics processing unit, we achieve real-time image denoising and
fast image demosaicing with high-quality results.
ETPL
DIP-046 Two-Direction Nonlocal Model for Image Denoising
Abstract: Similarities inherent in natural images have been widely exploited for image denoising and
other applications. In fact, if a cluster of similar image patches is rearranged into a matrix, similarities exist both between columns and rows. Using the similarities, we present a two-directional nonlocal
(TDNL) variational model for image denoising. The solution of our model consists of three components:
one component is a scaled version of the original observed image and the other two components are
obtained by utilizing the similarities. Specifically, by using the similarity between columns, we get a nonlocal-means-like estimation of the patch with consideration to all similar patches, while the weights
are not the pairwise similarities but a set of clusterwise coefficients. Moreover, by using the similarity
between rows, we also get nonlocal-autoregression-like estimations for the center pixels of the similar patches. The TDNL model leads to an alternative minimization algorithm. Experiments indicate that the
model can perform on par with or better than the state-of-the-art denoising methods.
ETPL
DIP-047
Optimizing the Error Diffusion Filter for Blue Noise Halftoning With Multiscale Error
Diffusion
Abstract: A good halftoning output should bear a blue noise characteristic contributed by isotropically-
distributed isolated dots. Multiscale error diffusion (MED) algorithms try to achieve this by exploiting
radially symmetric and noncausal error diffusion filters to guarantee spatial homogeneity. In this brief, an
optimized diffusion filter is suggested to make the diffusion close to isotropic. When it is used with MED, the resulting output has a nearly ideal blue noise characteristic.
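For contrast with the noncausal, radially symmetric MED filters above, the classic causal error-diffusion baseline (Floyd-Steinberg) can be sketched in a few lines; the flat test patch and threshold below are illustrative, not the paper's optimized filter.

```python
import numpy as np

def floyd_steinberg(image):
    """Classic causal error diffusion (Floyd-Steinberg), shown only as a
    baseline -- MED instead uses noncausal, radially symmetric filters."""
    img = np.asarray(image, float).copy()
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = 255.0 if img[y, x] >= 128.0 else 0.0
            err = img[y, x] - out[y, x]          # diffuse quantization error
            if x + 1 < w: img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0: img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h: img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out

gray = np.full((16, 16), 64.0)        # flat 25% gray patch
halftone = floyd_steinberg(gray)      # binary output, tone roughly preserved
```

The causal left-to-right scan is what introduces directional artifacts; MED's noncausal diffusion is designed to avoid exactly this.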
ETPL
DIP-049 Sparse Representation With Kernels
Abstract: Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which
helps in finding a sparse representation of nonlinear features, we propose kernel sparse representation
(KSR). Essentially, KSR is a sparse coding technique in a high dimensional feature space mapped by an
implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into spatial pyramid matching
(SPM), we develop KSRSPM, which achieves a good performance for image classification. Moreover,
KSR-based feature coding can be shown as a generalization of efficient match kernel and an extension of
Sc-based SPM. We further show that our proposed KSR using a histogram intersection kernel (HIK) can
be considered a soft assignment extension of HIK-based feature quantization in the feature coding process. Besides feature coding, comparing with sparse coding, KSR can learn more discriminative
sparse codes and achieve higher accuracy for face recognition. Moreover, KSR can also be applied to
kernel matrix approximation in large scale learning tasks, and it demonstrates its robustness to kernel matrix approximation, especially when a small fraction of the data is used. Extensive experimental results
demonstrate promising results of KSR in image classification, face recognition, and kernel matrix
approximation. All these applications prove the effectiveness of KSR in computer vision and machine
learning tasks.
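The histogram intersection kernel (HIK) mentioned above is simple to compute; a minimal sketch with made-up toy histograms, building the kernel (Gram) matrix KSR would operate on in place of explicit features:

```python
import numpy as np

def hik(x, y):
    """Histogram intersection kernel: sum of elementwise minima."""
    return float(np.minimum(x, y).sum())

def gram(histograms):
    """Kernel (Gram) matrix over a set of histograms."""
    n = len(histograms)
    return np.array([[hik(histograms[i], histograms[j]) for j in range(n)]
                     for i in range(n)])

# Toy L1-normalized histograms (hypothetical data).
h = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.7, 0.2, 0.1]])
K = gram(h)   # symmetric, with unit diagonal for normalized histograms
```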
ETPL
DIP-050 Image-Difference Prediction: From Grayscale to Color
Abstract: Existing image-difference measures show excellent accuracy in predicting distortions, such as
lossy compression, noise, and blur. Their performance on certain other distortions could be improved; one example of this is gamut mapping. This is partly because they either do not interpret chromatic
information correctly or they ignore it entirely. We present an image-difference framework that
comprises image normalization, feature extraction, and feature combination. Based on this framework,
we create image-difference measures by selecting specific implementations for each of the steps. Particular emphasis is placed on using color information to improve the assessment of gamut-mapped
images. Our best image-difference measure shows significantly higher prediction accuracy on a gamut-
mapping dataset than all other evaluated measures.
ETPL
DIP-051 When Does Computational Imaging Improve Performance?
Abstract: A number of computational imaging techniques have been introduced to improve image quality by
increasing light throughput. These techniques use optical coding to measure a stronger signal level. However, the performance of these techniques is limited by the decoding step, which amplifies noise.
Although it is well understood that optical coding can increase performance at low light levels, little is
known about the quantitative performance advantage of computational imaging in general settings. In this paper, we derive the performance bounds for various computational imaging techniques. We then discuss
the implications of these bounds for several real-world scenarios (e.g., illumination conditions, scene
properties, and sensor noise characteristics). Our results show that computational imaging techniques do not provide a significant performance advantage when imaging with illumination that is brighter than
typical daylight. These results can be readily used by practitioners to design the most suitable imaging
systems given the application at hand.
ETPL
DIP-052 Anisotropic Interpolation of Sparse Generalized Image Samples
Abstract: Practical image-acquisition systems are often modeled as a continuous-domain prefilter
followed by an ideal sampler, where generalized samples are obtained after convolution with the impulse
response of the device. In this paper, our goal is to interpolate images from a given subset of such samples. We express our solution in the continuous domain, considering consistent resampling as a data-
fidelity constraint. To make the problem well posed and ensure edge-preserving solutions, we develop an
efficient anisotropic regularization approach that is based on an improved version of the edge-enhancing
anisotropic diffusion equation. Following variational principles, our reconstruction algorithm minimizes successive quadratic cost functionals. To ensure fast convergence, we solve the corresponding sequence
of linear problems by using multigrid iterations that are specifically tailored to their sparse structure. We
conduct illustrative experiments and discuss the potential of our approach both in terms of algorithmic
design and reconstruction quality. In particular, we present results that use as little as 2% of the image
samples.
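The acquisition model described above (a prefilter followed by an ideal sampler) can be sketched in 1-D; the box prefilter and sampling step below are illustrative assumptions, not the paper's device model:

```python
import numpy as np

def generalized_samples(signal, impulse_response, step):
    """Model acquisition as convolution with the device impulse response,
    followed by ideal sampling every `step` positions (1-D sketch)."""
    filtered = np.convolve(signal, impulse_response, mode="same")
    return filtered[::step]

signal = np.sin(np.linspace(0, 2 * np.pi, 64))
box = np.ones(3) / 3.0                            # simple box prefilter
samples = generalized_samples(signal, box, step=4)  # 16 generalized samples
```

Interpolation then amounts to recovering a continuous-domain signal consistent with these samples, which is where the paper's anisotropic regularization comes in.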
ETPL
DIP-053 Clustered-Dot Halftoning With Direct Binary Search
Abstract: In this paper, we present a new algorithm for aperiodic clustered-dot halftoning based on direct
binary search (DBS). The DBS optimization framework has been modified for designing clustered-dot
texture, by using filters with different sizes in the initialization and update steps of the algorithm.
Following an intuitive explanation of how the clustered-dot texture results from this modified framework, we derive a closed-form cost metric which, when minimized, equivalently generates stochastic clustered-
dot texture. An analysis of the cost metric and its influence on the texture quality is presented, which is
followed by a modification to the cost metric to reduce computational cost and to make it more suitable for screen design.
ETPL
DIP-054 Task-Specific Image Partitioning
Abstract: Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used for performing high-level computer vision tasks. Typically, partitioning is conducted without regard
to the task in hand. We propose a task-specific image partitioning framework to produce a region-based
image representation that will lead to higher task performance than that reached using any task-oblivious partitioning framework or the existing supervised partitioning frameworks, which are few in number. The proposed method partitions the image by means of correlation clustering, maximizing a linear
discriminant function defined over a superpixel graph. The parameters of the discriminant function that
define task-specific similarity/dissimilarity among superpixels are estimated based on structured support vector machine (S-SVM) using task-specific training data. The S-SVM learning leads to a better
generalization ability while the construction of the superpixel graph used to define the discriminant
function allows a rich set of features to be incorporated to improve discriminability and robustness. We
evaluate the learned task-aware partitioning algorithms on three benchmark datasets. Results show that task-aware partitioning leads to better labeling performance than the partitioning computed by the state-
of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image
partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.
ETPL
DIP-055 Generalized Inverse-Approach Model for Spectral-Signal Recovery
Abstract: We have studied the transformation system of a spectral signal to the response of the system as a linear mapping from higher to lower dimensional space in order to look more closely at inverse-
approach models. The problem of spectral-signal recovery from the response of a transformation system
is generally stated on the basis of the generalized inverse-approach theorem, which provides a modular
model for generating a spectral signal from a given response value. The controlling criteria, including the robustness of the inverse model to perturbations of the response caused by noise, and the condition
number for matrix inversion, are proposed, together with the mean square error, so as to create an
efficient model for spectral-signal recovery. The spectral-reflectance recovery and color correction of natural surface color are numerically investigated to appraise different illuminant-observer transformation
matrices based on the proposed controlling criteria both in the absence and the presence of noise.
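The generalized-inverse recovery idea can be illustrated with the Moore-Penrose pseudoinverse in the noise-free case; the 31-band spectrum and 3-channel transformation system below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
# Transformation system: maps a 31-band spectral signal to 3 sensor responses.
T = rng.standard_normal((3, 31))
spectrum = np.abs(rng.standard_normal(31))
response = T @ spectrum

# The Moore-Penrose pseudoinverse gives the minimum-norm consistent recovery;
# it reproduces the response exactly but not, in general, the true spectrum,
# since the mapping from 31 dimensions to 3 loses information.
recovered = np.linalg.pinv(T) @ response
```

This is why the paper's controlling criteria (robustness to noise, condition number of the inversion) matter: the bare inverse model is exact on the response but ill-posed on the spectrum.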
ETPL
DIP-056 Spatio-Temporal Auxiliary Particle Filtering With l1-Norm-Based Appearance Model
Learning for Robust Visual Tracking
Abstract: In this paper, we propose an efficient and accurate visual tracker equipped with a new particle filtering algorithm and robust subspace learning-based appearance model. The proposed visual tracker
avoids drifting problems caused by abrupt motion changes and severe appearance variations that are well-known difficulties in visual tracking. The proposed algorithm is based on a type of auxiliary particle
filtering that uses a spatio-temporal sliding window. Compared to conventional particle filtering
algorithms, spatio-temporal auxiliary particle filtering is computationally efficient and successfully implemented in visual tracking. In addition, a real-time robust principal component pursuit (RRPCP)
equipped with l1-norm optimization has been utilized to obtain a new appearance model learning block
for reliable visual tracking especially for occlusions in object appearance. The overall tracking framework based on the dual ideas is robust against occlusions and out-of-plane motions because of the proposed
spatio-temporal filtering and recursive form of RRPCP. The designed tracker has been evaluated using
challenging video sequences, and the results confirm the advantage of using this tracker.
ETPL
DIP-057
Manifold Regularized Multitask Learning for Semi-Supervised Multilabel Image
Classification
Abstract: It is a significant challenge to classify images with multiple labels by using only a small number
of labeled samples. One option is to learn a binary classifier for each label and use manifold
regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from
multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is
insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask
learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model
complexity because different tasks limit one another's search volume, and the manifold regularization
ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38
classes, by comparing MRMTL with popular image classification algorithms. The results suggest that
MRMTL is effective for image classification.
ETPL
DIP-058 Linear Distance Coding for Image Classification
Abstract: The feature coding-pooling framework is shown to perform well in image classification tasks,
because it can generate discriminative and robust image representations. The unavoidable information
loss incurred by feature quantization in the coding process and the undesired dependence of pooling on the image spatial layout, however, may severely limit the classification. In this paper, we propose a linear
distance coding (LDC) method to capture the discriminative information lost in traditional coding
methods while simultaneously alleviating the dependence of pooling on the image spatial layout. The core of the LDC lies in transforming local features of an image into more discriminative distance vectors,
where the robust image-to-class distance is employed. These distance vectors are further encoded into
sparse codes to capture the salient features of the image. The LDC is theoretically and experimentally
shown to be complementary to the traditional coding methods, and thus their combination can achieve higher classification accuracy. We demonstrate the effectiveness of LDC on six data sets, two of each of
three types (specific object, scene, and general object), i.e., Flower 102 and PFID 61, Scene 15 and
Indoor 67, Caltech 101 and Caltech 256. The results show that our method generally outperforms the traditional coding methods, and achieves or is comparable to the state-of-the-art performance on these
data sets.
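The image-to-class distance underlying LDC can be sketched as a nearest-neighbor distance per class; the toy 2-D features below are hypothetical, and the paper's actual distance is more elaborate:

```python
import numpy as np

def distance_vector(feature, class_pools):
    """Map one local feature to a distance vector: one entry per class,
    the distance to that class's nearest stored local feature."""
    return np.array([
        np.min(np.linalg.norm(pool - feature, axis=1))
        for pool in class_pools
    ])

# Two toy classes, each holding a few stored local features (hypothetical data).
class_a = np.array([[0.0, 0.0], [1.0, 0.0]])
class_b = np.array([[5.0, 5.0], [6.0, 5.0]])
d = distance_vector(np.array([0.9, 0.1]), [class_a, class_b])
# d[0] is small (feature resembles class A), d[1] large (far from class B)
```

These distance vectors, rather than the raw features, are what LDC then encodes into sparse codes.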
ETPL
DIP-059 What Are We Tracking: A Unified Approach of Tracking and Recognition
Abstract: Tracking is essentially a matching problem. While traditional tracking methods mostly focus on low-level image correspondences between frames, we argue that high-level semantic correspondences are
indispensable to make tracking more reliable. Based on that, a unified approach of low-level object
tracking and high-level recognition is proposed for single object tracking, in which the target category is
actively recognized during tracking. High-level offline models corresponding to the recognized category
are then adaptively selected and combined with low-level online tracking models so as to achieve better
tracking performance. Extensive experimental results show that our approach outperforms state-of-the-art online models in many challenging tracking scenarios such as drastic view change, scale change,
background clutter, and morphable objects.
ETPL
DIP-060
Unsupervised Amplitude and Texture Classification of SAR Images With Multinomial
Latent Model
Abstract: In this paper, we combine amplitude and texture statistics of the synthetic aperture radar images for the purpose of model-based classification. In a finite mixture model, we bring together the Nakagami
densities to model the class amplitudes and a 2-D auto-regressive texture model with t-distributed
regression error to model the textures of the classes. A non-stationary multinomial logistic latent class label model is used as a mixture density to obtain spatially smooth class segments. The classification
expectation-maximization algorithm is performed to estimate the class parameters and to classify the
pixels. We resort to integrated classification likelihood criterion to determine the number of classes in the model. We present our results on the classification of the land covers obtained in both supervised and
unsupervised cases processing TerraSAR-X, as well as COSMO-SkyMed data.
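The Nakagami density used for the class amplitudes has a simple closed form; a quick sanity check against its Rayleigh special case (m = 1), with illustrative parameter values:

```python
import math

def nakagami_pdf(x, m, omega):
    """Nakagami-m density: f(x) = 2 m^m x^(2m-1) / (Gamma(m) omega^m)
    * exp(-m x^2 / omega), used to model SAR class amplitudes."""
    return (2.0 * m ** m / (math.gamma(m) * omega ** m)
            * x ** (2 * m - 1) * math.exp(-m * x * x / omega))

# With m = 1 the Nakagami density reduces to a Rayleigh density.
x, omega = 1.2, 2.0
naka = nakagami_pdf(x, 1.0, omega)
rayleigh = (2.0 * x / omega) * math.exp(-x * x / omega)
```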
ETPL
DIP-061
Fuzzy C-Means Clustering With Local Information and Kernel Metric for Image
Segmentation
Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor
depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By
using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its
objective function. The new algorithm adaptively determines the kernel parameter by using a fast
bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore,
the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is
relatively independent of the type of noise.
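A simplified kernel FCM can be sketched as follows; this uses a Gaussian-kernel-induced distance but omits the paper's tradeoff weighted fuzzy factor, and the toy data and kernel bandwidth are illustrative:

```python
import numpy as np

def kernel_fcm(data, init, m=2.0, sigma=2.0, iters=30):
    """Fuzzy C-means with a Gaussian kernel distance d^2 = 2*(1 - K(x, v)),
    a simplified sketch of kernel-metric FCM (not the paper's full method)."""
    centers = data[list(init)].astype(float)
    for _ in range(iters):
        sq = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        kern = np.exp(-sq / (2.0 * sigma ** 2))
        dist = np.maximum(2.0 * (1.0 - kern), 1e-12)   # kernel-induced distance
        w = dist ** (-1.0 / (m - 1.0))
        u = w / w.sum(axis=1, keepdims=True)           # fuzzy memberships
        weights = (u ** m) * kern                      # kernel-weighted update
        centers = (weights.T @ data) / weights.sum(axis=0)[:, None]
    return centers, u

rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.1, (20, 2))
blob_b = rng.normal([5.0, 5.0], 0.1, (20, 2))
data = np.vstack([blob_a, blob_b])
# Initialize one center in each blob (first and last sample).
centers, u = kernel_fcm(data, init=(0, len(data) - 1))
```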
ETPL
DIP-062 Rate-Distortion Optimized Rate Control for Depth Map-Based 3-D Video Coding
Abstract: In this paper, a novel rate control scheme with optimized bit allocation for 3-D video
coding is proposed. First, we investigate the R-D characteristics of the texture and depth map of the coded
view, as well as the quality dependency between the virtual view and the coded view. Second, an optimal bit allocation scheme is developed to allocate target bits for both the texture and depth maps of different
views. Meanwhile, a simplified model parameter estimation scheme is adopted to speed up the coding
process. Finally, the experimental results on various 3-D video sequences demonstrate that the proposed
algorithm achieves excellent R-D efficiency and bit rate accuracy compared to benchmark algorithms.
ETPL
DIP-063 Performance Evaluation Methodology for Historical Document Image Binarization
Abstract: Document image binarization is of great importance in the document image analysis and
recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by
providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based
binarization evaluation methodology for historical handwritten/machine-printed document images. In the
proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the
proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms,
background noise, character enlargement, and merging. Several experiments conducted in comparison
with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
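Pixel-based recall and precision with an optional per-pixel weighting can be sketched as follows; the weighting argument here is a generic placeholder, not the paper's specific bias-diminishing scheme:

```python
import numpy as np

def recall_precision(result, ground_truth, weights=None):
    """Pixel-based recall/precision for binary images (1 = text, 0 = background).
    `weights` optionally re-weights pixels, as weighted evaluation schemes do."""
    result = np.asarray(result, bool)
    gt = np.asarray(ground_truth, bool)
    w = np.ones(gt.shape) if weights is None else np.asarray(weights, float)
    tp = (w * (result & gt)).sum()          # weighted true positives
    recall = tp / (w * gt).sum()
    precision = tp / (w * result).sum()
    return recall, precision

gt = np.zeros((4, 4), int); gt[1:3, 1:3] = 1            # 4 text pixels
res = np.zeros((4, 4), int); res[1:3, 1] = 1; res[0, 0] = 1  # 2 hits, 1 false alarm
recall, precision = recall_precision(res, gt)
```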
ETPL
DIP-064 Video Quality Pooling Adaptive to Perceptual Distortion Severity
Abstract: It is generally recognized that severe video distortions that are transient in space and/or time
have a large effect on overall perceived video quality. In order to understand this phenomenon, we study
the distribution of spatio-temporally local quality scores obtained from several video quality assessment
(VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content adaptive spatial and temporal pooling strategy based on the observed
distribution. Our method adaptively emphasizes “worst” scores along both the spatial and temporal
dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow such as egomotion. We demonstrate the efficacy of the method by testing it using three different
VQA algorithms on the LIVE Video Quality database and the EPFL-PoliMI video quality database.
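The "worst scores" emphasis can be sketched by pooling only the lowest fraction of local quality scores; the 5% fraction and toy score map below are illustrative choices, not the paper's content-adaptive strategy:

```python
import numpy as np

def worst_percent_pool(local_scores, percent=5.0):
    """Pool local quality scores by averaging only the worst `percent`
    fraction, emphasizing severe transient distortions."""
    scores = np.sort(np.ravel(local_scores))       # ascending: worst first
    k = max(1, int(len(scores) * percent / 100.0))
    return float(scores[:k].mean())

scores = np.full((10, 10), 0.9)   # mostly good local quality ...
scores[0, :5] = 0.1               # ... with a brief severe distortion
pooled_worst = worst_percent_pool(scores, percent=5.0)
pooled_mean = float(scores.mean())
# Worst-case pooling flags the transient distortion that plain averaging hides.
```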
ETPL
DIP-065 Modified Gradient Search for Level Set Based Image Segmentation
Abstract: Level set methods are a popular way to solve the image segmentation problem. The solution
contour is found by solving an optimization problem where a cost functional is minimized. Gradient
descent methods are often used to solve this optimization problem since they are very easy to implement
and applicable to general nonconvex functionals. They are, however, sensitive to local minima and often display slow convergence. Traditionally, cost functionals have been modified to avoid these problems. In
this paper, we instead propose using two modified gradient descent methods, one using a momentum term
and one based on resilient propagation. These methods are commonly used in the machine learning community. In a series of 2-D/3-D-experiments using real and synthetic data with ground truth, the
modifications are shown to reduce the sensitivity for local optima and to increase the convergence rate.
The parameter sensitivity is also investigated. The proposed methods are very simple modifications of the
basic method, and are directly compatible with any type of level set implementation. Downloadable reference code with examples is available online.
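The momentum modification can be sketched on a generic ill-conditioned quadratic, a stand-in for a level set cost functional (learning rate and cost below are illustrative, not the paper's settings):

```python
import numpy as np

def descend(grad, x0, lr, steps, momentum=0.0):
    """Gradient descent with an optional momentum (heavy-ball) term."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # velocity remembers past gradients
        x = x + v
    return x

# Ill-conditioned quadratic cost f(x, y) = 0.5 * (x**2 + 25 * y**2)
grad = lambda p: np.array([p[0], 25.0 * p[1]])
x_plain = descend(grad, [5.0, 5.0], lr=0.04, steps=100)
x_mom = descend(grad, [5.0, 5.0], lr=0.04, steps=100, momentum=0.9)
# Momentum reaches the minimum at the origin faster along the flat direction.
```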
ETPL
DIP-066
Maximum Margin Correlation Filter: A New Approach for Localization and
Classification
Abstract: Support vector machine (SVM) classifiers are popular in many computer vision tasks. In most of them, the SVM classifier assumes that the object to be classified is centered in the query image, which
might not always be valid, e.g., when locating and classifying a particular class of vehicles in a large
scene. In this paper, we introduce a new classifier called Maximum Margin Correlation Filter (MMCF), which, while exhibiting the good generalization capabilities of SVM classifiers, is also capable of
localizing objects of interest, thereby avoiding the need for image centering as is usually required in SVM
classifiers. In other words, MMCF can simultaneously localize and classify objects of interest. We test
the efficacy of the proposed classifier on three different tasks: vehicle recognition, eye localization, and face classification. We demonstrate that MMCF outperforms SVM classifiers as well as well-known
correlation filters.
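The localization ability that a correlation filter adds over a plain SVM can be illustrated with exhaustive template correlation; the all-ones template and toy scene below are naive stand-ins, not a trained MMCF:

```python
import numpy as np

def correlate_and_locate(scene, template):
    """Slide a correlation template over the scene and return the location
    of the peak response -- the localization a correlation filter provides,
    unlike a plain SVM applied to centered crops."""
    th, tw = template.shape
    sh, sw = scene.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = float((scene[y:y + th, x:x + tw] * template).sum())
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos

scene = np.zeros((12, 12)); scene[4:7, 6:9] = 1.0   # object at row 4, col 6
template = np.ones((3, 3))                          # naive matched template
pos = correlate_and_locate(scene, template)
```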
ETPL
DIP-067 Adaptive Fingerprint Image Enhancement With Emphasis on Preprocessing of Data
Abstract: This article proposes several improvements to an adaptive fingerprint enhancement method that
is based on contextual filtering. The term adaptive implies that parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the
adaptive fingerprint enhancement method, where four of these blocks are updated in our proposed
system. Hence, the proposed overall system is novel. The four updated processing blocks are: 1)
preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and
local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order statistical filters are applied. These processing blocks
yield an improved and new adaptive fingerprint image processing method. The performance of the
updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on FVC databases.
ETPL
DIP-068 Objective Quality Assessment of Tone-Mapped Images
Abstract: Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range
(LDR) images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has
the best quality. Without an appropriate quality measure, different TMOs cannot be compared, and
further improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive and time consuming and, more importantly, difficult to embed into optimization
frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by
combining: 1) a multiscale signal fidelity measure on the basis of a modified structural similarity index
and 2) a naturalness measure on the basis of intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking score and
the proposed tone-mapped image quality index (TMQI). Furthermore, we demonstrate the extended
applications of TMQI using two examples - parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.
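As a minimal example of the kind of TMO that TMQI could evaluate, a global Reinhard-style operator compresses HDR luminance into display range; this operator is a standard illustration, not part of the paper:

```python
import numpy as np

def simple_tmo(hdr):
    """Global Reinhard-style operator L / (1 + L): maps high-dynamic-range
    luminance monotonically into [0, 1) for an LDR display."""
    hdr = np.asarray(hdr, dtype=float)
    return hdr / (1.0 + hdr)

hdr = np.array([0.01, 0.1, 1.0, 10.0, 1000.0])   # ~5 orders of magnitude
ldr = simple_tmo(hdr)   # order-preserving, compressed into display range
```

Different TMOs make different compromises here, which is exactly why an objective index such as TMQI is needed to rank their outputs.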
ETPL
DIP-069 Catching a Rat by Its Edglets
Abstract: Computer vision is a noninvasive method for monitoring laboratory animals. In this article, we
propose a robust tracking method that is capable of extracting a rodent from a frame under uncontrolled normal laboratory conditions. The method consists of two steps. First, a sliding window combines three
features to coarsely track the animal. Then, it uses the edglets of the rodent to adjust the tracked region to
the animal's boundary. The method achieves an average tracking error that is smaller than that of a representative state-of-the-art method.
ETPL
DIP-070 Juxtaposed Color Halftoning Relying on Discrete Lines
Abstract: Most halftoning techniques allow screen dots to overlap. They rely on the assumption that the inks are transparent, i.e., the inks do not scatter a significant portion of the light back to the air. However,
many special effect inks, such as metallic inks, iridescent inks, or pigmented inks, are not transparent. In
order to create halftone images, halftone dots formed by such inks should be juxtaposed, i.e., printed side
by side. We propose an efficient juxtaposed color halftoning technique for placing any desired number of colorant layers side by side without overlapping. The method uses a monochrome library of screen
elements made of discrete lines with rational thicknesses. Discrete line juxtaposed color halftoning is
performed efficiently by multiple accesses to the screen element library.
ETPL
DIP-071 Image Noise Level Estimation by Principal Component Analysis
Abstract: The problem of blind noise level estimation arises in many image processing applications, such
as denoising, compression, and segmentation. In this paper, we propose a new noise level estimation
method on the basis of principal component analysis of image blocks. We show that the noise variance can be estimated as the smallest eigenvalue of the image block covariance matrix. Compared with 13
existing methods, the proposed approach shows a good compromise between speed and accuracy. It is at
least 15 times faster than methods with similar accuracy, and it is at least two times more accurate than
other methods. Our method does not assume the existence of homogeneous areas in the input image and,
hence, can successfully process images containing only textures.
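The core estimator, taking the smallest eigenvalue of the patch covariance matrix as the noise variance, fits in a few lines of NumPy; the block size and synthetic test image below are illustrative assumptions:

```python
import numpy as np

def estimate_noise_pca(image, block=5):
    """Estimate the noise standard deviation as the square root of the
    smallest eigenvalue of the covariance matrix of overlapping blocks."""
    h, w = image.shape
    # Collect all overlapping block x block patches as row vectors.
    patches = np.array([
        image[i:i + block, j:j + block].ravel()
        for i in range(h - block + 1)
        for j in range(w - block + 1)
    ])
    cov = np.cov(patches, rowvar=False)          # patch covariance matrix
    eigvals = np.linalg.eigvalsh(cov)            # ascending eigenvalues
    return float(np.sqrt(max(eigvals[0], 0.0)))  # smallest ~ noise variance

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 64), (64, 1))   # smooth synthetic image
noisy = clean + rng.normal(0.0, 10.0, clean.shape)  # add sigma = 10 noise
sigma_hat = estimate_noise_pca(noisy)               # close to 10
```

The smooth ramp image has a low-dimensional patch structure, so its signal occupies only a few leading eigenvalues and the trailing ones reflect the noise.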
ETPL
DIP-072
Nonlocal Image Restoration With Bilateral Variance Estimation: A Low-Rank
Approach
Abstract: Simultaneous sparse coding (SSC) or nonlocal image representation has shown great potential
in various low-level vision tasks, leading to several state-of-the-art image restoration techniques,
including BM3D and LSSC. However, it still lacks a physically plausible explanation of why SSC is a better model than conventional sparse coding for the class of natural images. Meanwhile, the problem of
sparsity optimization, especially when tangled with dictionary learning, is computationally difficult to
solve. In this paper, we take a low-rank approach toward SSC and provide a conceptually simple interpretation from a bilateral variance estimation perspective, namely that singular-value decomposition
of similar packed patches can be viewed as pooling both local and nonlocal information for estimating
signal variances. Such a perspective inspires us to develop a new class of image restoration algorithms called spatially adaptive iterative singular-value thresholding (SAIST). For noisy data, SAIST generalizes
the celebrated BayesShrink from local to nonlocal models; for incomplete data, SAIST extends previous
deterministic annealing-based solution to sparsity optimization through incorporating the idea of
dictionary learning. In addition to conceptual simplicity and computational efficiency, SAIST has achieved highly competitive (often better) objective performance compared to several state-of-the-art
methods in image denoising and completion experiments. Our subjective quality results compare
favorably with those obtained by existing techniques, especially at high noise levels and with a large amount of missing data.
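The pooling-and-shrinkage step at the heart of this approach can be sketched with plain soft singular-value thresholding. In SAIST the threshold is set adaptively from the bilateral variance estimates; the fixed tau below is only a stand-in for that machinery:

```python
import numpy as np

def svt(Y, tau):
    """Soft singular-value thresholding of a matrix of packed similar
    patches: small singular values (noise-dominated modes) are removed,
    large ones (signal-dominated) are kept, shrunk by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)     # shrink the spectrum toward zero
    return (U * s) @ Vt              # reassemble the denoised matrix
```

With tau on the order of the largest singular value of the noise matrix (roughly sigma times the sum of the square roots of the matrix dimensions, for i.i.d. Gaussian noise), the thresholded matrix is closer to the clean low-rank signal than the noisy input.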
ETPL
DIP-073 Variational Approach for the Fusion of Exposure Bracketed Pairs
Abstract: When taking pictures of a dark scene with artificial lighting, ambient light is not sufficient for
most cameras to obtain both accurate color and detail information. The exposure bracketing feature usually available in many camera models enables the user to obtain a series of pictures taken in rapid
succession with different exposure times; the implicit idea is that the user picks the best image from this
set. But in many cases, none of these images is good enough; in general, good brightness and color information are retained from longer-exposure settings, whereas sharp details are obtained from shorter
ones. In this paper, we propose a variational method for automatically combining an exposure-bracketed
pair of images within a single picture that reflects the desired properties of each one. We introduce an energy functional consisting of two terms, one measuring the difference in edge information with the
short-exposure image and the other measuring the local color difference with a warped version of the
long-exposure image. This method is able to handle camera and subject motion as well as noise, and the
results compare favorably with the state of the art.
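Schematically, the two-term energy functional described above can be written as follows; the weight λ, the warp operator W, and the squared norms are illustrative assumptions, not the paper's exact formulation:

```latex
E(u) \;=\; \int_\Omega \left| \nabla u - \nabla I_s \right|^2 \, dx
\;+\; \lambda \int_\Omega \left| u - W(I_l) \right|^2 \, dx
```

where $I_s$ is the short-exposure image (supplying sharp edges), $W(I_l)$ is the motion-compensated warp of the long-exposure image (supplying brightness and color), and the minimizer $u$ is the fused picture.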
ETPL
DIP-074 Image Denoising With Dominant Sets by a Coalitional Game Approach
Abstract: Dominant sets are a new graph partition method for pairwise data clustering proposed by Pavan
and Pelillo. We address the problem of dominant sets with a coalitional game model, in which each data point is treated as a player and similar data points are encouraged to group together for cooperation. We
propose betrayal and hermit rules to describe the cooperative behaviors among the players. After applying
the betrayal and hermit rules, an optimal and stable graph partition emerges, and all the players in the
partition will not change their groups. For computational feasibility, we design an approximate algorithm for finding a dominant set of mutually similar players and then apply the algorithm to an application such
as image denoising. In image denoising, every pixel is treated as a player who seeks similar partners
according to its patch appearance in its local neighborhood. By averaging the noisy effects with the
similar pixels in the dominant sets, we improve nonlocal means image denoising to restore the intrinsic
structure of the original images and achieve denoising results competitive with state-of-the-art methods in both visual and quantitative quality.
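As a point of reference, the nonlocal-means baseline that the paper improves on can be sketched per pixel as below; in the dominant-set variant, the average would instead run over the pixel's coalition of mutually similar partners rather than the full search window. Patch size, window size, and h are illustrative choices:

```python
import numpy as np

def nlm_pixel(img, i, j, patch=3, search=7, h=10.0):
    """Nonlocal-means estimate of pixel (i, j): a weighted average over a
    search window, where each neighbor's weight reflects how similar its
    surrounding patch is to the patch around (i, j)."""
    r = patch // 2
    rs = search // 2
    pad = np.pad(img.astype(float), r, mode='reflect')
    ref = pad[i:i + patch, j:j + patch]      # patch around (i, j)
    num = den = 0.0
    for di in range(-rs, rs + 1):
        for dj in range(-rs, rs + 1):
            y, x = i + di, j + dj
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                cand = pad[y:y + patch, x:x + patch]
                w = np.exp(-((ref - cand) ** 2).sum() / h ** 2)
                num += w * img[y, x]
                den += w
    return num / den
```

On a flat region the estimate reproduces the input exactly, while an isolated outlier is pulled back toward its similar neighbors.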
ETPL
DIP-075 High-Order Local Spatial Context Modeling by Spatialized Random Forest
Abstract: In this paper, we propose a novel method for spatial context modeling toward boosting visual
discriminating power. We are particularly interested in how to model high-order local spatial contexts instead of the intensively studied second-order spatial contexts, i.e., co-occurrence relations. Motivated
by the recent success of random forest in learning a discriminative visual codebook, we present a
spatialized random forest (SRF) approach, which can encode an unlimited length of high-order local spatial contexts. By spatially random neighbor selection and random histogram-bin partition during the
tree construction, the SRF can explore much more complicated and informative local spatial patterns in a
randomized manner. Owing to the discriminative capability test for the random partition in each tree node's split process, a set of informative high-order local spatial patterns are derived, and new images are
then encoded by counting the occurrences of such discriminative local spatial patterns. Extensive
comparison experiments on face recognition and object/scene classification clearly demonstrate the
superiority of the proposed spatial context modeling method over other state-of-the-art approaches for this purpose.
ETPL
DIP-076 Adaptive Inpainting Algorithm Based on DCT Induced Wavelet Regularization
Abstract: In this paper, we propose an image inpainting optimization model whose objective function is a smoothed ℓ1 norm of the weighted nondecimated discrete cosine transform (DCT) coefficients of the
underlying image. By identifying the objective function of the proposed model as a sum of a
differentiable term and a nondifferentiable term, we present a basic algorithm inspired by Beck and
Teboulle's recent work on the model. Based on this basic algorithm, we propose an automatic way to determine the weights involved in the model and update them in each iteration. The DCT as an
orthogonal transform is used in various applications. We view the rows of a DCT matrix as the filters
associated with a multiresolution analysis. Nondecimated wavelet transforms with these filters are explored in order to analyze the images to be inpainted. Our numerical experiments verify that under the
proposed framework, the filters from a DCT matrix demonstrate promise for the task of image inpainting.
ETPL
DIP-077 Extended Coding and Pooling in the HMAX Model
Abstract: This paper presents an extension of the HMAX model, a neural network model for image
classification. The HMAX model can be described as a four-level architecture, with the first level
consisting of multiscale and multiorientation local filters. We introduce two main contributions to this
model. First, we improve the way the local filters at the first level are integrated into more complex filters at the last level, providing a flexible description of object regions and combining local information of
multiple scales and orientations. These new filters are discriminative and yet invariant, two key aspects of
visual classification. We evaluate their discriminative power and their level of invariance to geometrical transformations on a synthetic image set. Second, we introduce a multiresolution spatial pooling. This
pooling encodes both local and global spatial information to produce discriminative image signatures.
Classification results are reported on three image data sets: Caltech101, Caltech256, and fifteen scenes.
We show significant improvements over previous architectures using a similar framework.
ETPL
DIP-078 Human Detection in Images via Piecewise Linear Support Vector Machines
Abstract: Human detection in images is challenged by the view and posture variation problem. In this
paper, we propose a piecewise linear support vector machine (PL-SVM) method to tackle this problem. The motivation is to exploit the piecewise discriminative function to construct a nonlinear classification
boundary that can discriminate multiview and multiposture human bodies from the backgrounds in a
high-dimensional feature space. A PL-SVM training is designed as an iterative procedure of feature space division and linear SVM training, aiming at the margin maximization of local linear SVMs. Each
piecewise SVM model is responsible for a subspace, corresponding to a human cluster of a special view
or posture. In the PL-SVM, a cascaded detector is proposed with block orientation features and a histogram of oriented gradient features. Extensive experiments show that compared with several recent
SVM methods, our method reaches the state of the art in both detection accuracy and computational
efficiency, and it performs best when dealing with low-resolution human regions in cluttered backgrounds.
ETPL
DIP-079 Short Distance Intra Coding Scheme for High Efficiency Video Coding
Abstract: This paper proposes a new intra coding scheme, known as short distance intra prediction
(SDIP), for high efficiency video coding (HEVC) standardization work. The proposed method is based on
the quadtree unit structure of HEVC. By splitting a coding unit into nonsquare units for coding and reconstruction, and therefore shortening the distances between the predicted and the reference samples,
the accuracy of intra prediction can be improved when applying the directional prediction method. SDIP
improves the intra prediction accuracy, especially for highly detailed regions. This approach is applied to
both luma and chroma components. When integrated into the HEVC reference software, it achieves up to a 12.8% bit rate reduction for sequences with rich textures.
ETPL
DIP-080 Probabilistic Graphlet Transfer for Photo Cropping
Abstract: As one of the most basic photo manipulation processes, photo cropping is widely used in the printing, graphic design, and photography industries. In this paper, we introduce graphlets (i.e., small
connected subgraphs) to represent a photo's aesthetic features, and propose a probabilistic model to
transfer aesthetic features from the training photo onto the cropped photo. In particular, by segmenting
each photo into a set of regions, we construct a region adjacency graph (RAG) to represent the global aesthetic feature of each photo. Graphlets are then extracted from the RAGs, and these graphlets capture
the local aesthetic features of the photos. Finally, we cast photo cropping as a candidate-searching
procedure on the basis of a probabilistic model, and infer the parameters of the cropped photos using Gibbs sampling. The proposed method is fully automatic. Subjective evaluations have shown that it is
preferred over a number of existing approaches.
ETPL
DIP-081 On Removing Interpolation and Resampling Artifacts in Rigid Image Registration
Abstract: We show that image registration using conventional interpolation and summation
approximations of continuous integrals can generally fail because of resampling artifacts. These artifacts
negatively affect the accuracy of registration by producing local optima, altering the gradient, shifting the
global optimum, and making rigid registration asymmetric. In this paper, after an extensive literature review, we demonstrate the causes of the artifacts by comparing inclusion and avoidance of resampling
analytically. We show the sum-of-squared-differences cost function formulated as an integral to be more
accurate compared with its traditional sum form in a simple case of image registration. We then discuss aliasing that occurs in rotation, which is due to the fact that an image represented in the Cartesian grid is
sampled with different rates in different directions, and propose the use of oscillatory isotropic
interpolation kernels, which allow better recovery of true global optima by overcoming this type of
aliasing. Through our experiments on brain, fingerprint, and white noise images, we illustrate the superior
performance of the integral registration cost function in both the Cartesian and spherical coordinates, and
also validate the introduced radial interpolation kernel by demonstrating the improvement in registration.
ETPL
DIP-082 Fast Positive Deconvolution of Hyperspectral Images
Abstract: In this brief, we provide an efficient scheme for performing deconvolution of large
hyperspectral images under a positivity constraint, while accounting for spatial and spectral smoothness
of the data.
ETPL
DIP-083
Segmentation of Intracranial Vessels and Aneurysms in Phase Contrast Magnetic
Resonance Angiography Using Multirange Filters and Local Variances
Abstract: Segmentation of intensity varying and low-contrast structures is an extremely challenging and
rewarding task. In computer-aided diagnosis of intracranial aneurysms, segmenting the high-intensity major vessels along with the attached low-contrast aneurysms is essential to the recognition of this lethal
vascular disease. It is particularly helpful in performing early and noninvasive diagnosis of intracranial
aneurysms using phase contrast magnetic resonance angiographic (PC-MRA) images. The major challenges of developing a PC-MRA-based segmentation method are the significantly varying voxel
intensity inside vessels with different flow velocities and the signal loss in the aneurysmal regions where
turbulent flows occur. This paper proposes a novel intensity-based algorithm to segment intracranial
vessels and the attached aneurysms. The proposed method can handle intensity varying vasculatures and also the low-contrast aneurysmal regions affected by turbulent flows. It is grounded on the use of
multirange filters and local variances to extract intensity-based image features for identifying contrast
varying vasculatures. The extremely low-intensity region affected by turbulent flows is detected according to the topology of the structure detected by multirange filters and local variances. The proposed
method is evaluated using a phantom image volume with an aneurysm and four clinical cases. It achieves
0.80 dice score in the phantom case. In addition, different components of the proposed method (the multirange filters, local variances, and topology-based detection) are evaluated in the comparison between the proposed method and its lower-complexity variants. Owing to the analogy between these variants and
existing vascular segmentation methods, this comparison also exemplifies the advantage of the proposed
method over the existing approaches. It analyzes the weaknesses of these existing approaches and justifies the use of every component involved in the proposed method. It is shown that the proposed
method is capable of segmenting blood vessels and the attached aneurysms on PC-MRA images.
ETPL
DIP-084 Robust Image Analysis With Sparse Representation on Quantized Visual Features
Abstract: Recent techniques based on sparse representation (SR) have demonstrated promising
performance in high-level visual recognition, exemplified by the highly accurate face recognition under
occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms
using raw image pixels, and very few have been proposed to utilize the quantized visual features, such as the popular bag-of-words feature abstraction. In such cases, besides the inherent quantization errors,
ambiguity associated with visual word assignment and misdetection of feature points, due to factors such
as visual occlusions and noises, constitutes the major cause of dense corruptions of the quantized representation. The dense corruptions can jeopardize the decision process by distorting the patterns of the
sparse reconstruction coefficients. In this paper, we aim to eliminate the corruptions and achieve robust
image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and
misdetection transfer) to account for the two major sources of corruption as discussed. By reasonably
reconstruction objective with ℓ0-norm regularization on the transfer terms to encourage sparsity and,
hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex ℓ0-norm
optimization into a convex ℓ1-norm optimization problem, and employ the accelerated proximal
gradient method to optimize the convergence-provable updating procedure. Extensive experiments on
four benchmark datasets, Caltech-101, Caltech-256, Corel-5k, and CMU pose, illumination, and expression, manifest the necessity of removing the quantization corruptions and the various advantages of
the proposed framework.
ETPL
DIP-085 Additive White Gaussian Noise Level Estimation in SVD Domain for Images
Abstract: Accurate estimation of Gaussian noise level is of fundamental interest in a wide variety of vision and image processing applications as it is critical to the processing techniques that follow. In this
paper, a new effective noise level estimation method is proposed on the basis of the study of singular
values of noise-corrupted images. Two novel aspects of this paper address the major challenges in noise estimation: 1) the use of the tail of singular values for noise estimation to alleviate the influence of the
signal on the data basis for the noise estimation process and 2) the addition of known noise to estimate the
content-dependent parameter, so that the proposed scheme is adaptive to visual signals, thereby enabling a wider application scope of the proposed scheme. The analysis and experiment results demonstrate that
the proposed algorithm can reliably infer noise levels and show robust behavior over a wide range of
visual content and noise conditions, and that it outperforms relevant existing methods.
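The first ingredient, the tail-of-singular-values statistic, is easy to illustrate; the calibration step that adds noise of known level to fit the content-dependent parameter is omitted here, and the tail fraction is an assumed value:

```python
import numpy as np

def tail_mean_singular_values(img, frac=0.75):
    """Mean of the smallest `frac` of an image's singular values. The
    head of the spectrum is signal-dominated, so this tail statistic
    tracks the noise level: it grows with the noise standard deviation,
    which the SVD-domain estimator calibrates by adding known noise."""
    s = np.linalg.svd(np.asarray(img, dtype=float), compute_uv=False)
    k = int(len(s) * (1.0 - frac))       # skip the signal-heavy head
    return s[k:].mean()
```

Feeding the same clean image corrupted at two different noise levels shows the statistic rising with the noise standard deviation.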
ETPL
DIP-086
Nonedge-Specific Adaptive Scheme for Highly Robust Blind Motion Deblurring of
Natural Images
Abstract: Blind motion deblurring estimates a sharp image from a motion blurred image without the
knowledge of the blur kernel. Although significant progress has been made on tackling this problem,
existing methods, when applied to highly diverse natural images, are still far from stable. This paper focuses on the robustness of blind motion deblurring methods toward image diversity, a critical problem
that has been previously neglected for years. We classify the existing methods into two schemes and
analyze their robustness using an image set consisting of 1.2 million natural images. The first scheme is
edge-specific, as it relies on the detection and prediction of large-scale step edges. This scheme is sensitive to the diversity of the image edges in natural images. The second scheme is nonedge-specific
and explores various image statistics, such as the prior distributions. This scheme is sensitive to statistical
variation over different images. Based on the analysis, we address the robustness by proposing a novel nonedge-specific adaptive scheme (NEAS), which features a new prior that is adaptive to the variety of
textures in natural images. By comparing the performance of NEAS against the existing methods on a
very large image set, we demonstrate its advance beyond the state of the art.
ETPL
DIP-087
Image Enhancement Using the Hypothesis Selection Filter: Theory and Application to
JPEG Decoding
Abstract: We introduce the hypothesis selection filter (HSF) as a new approach for image quality
enhancement. We assume that a set of filters has been selected a priori to improve the quality of a
distorted image containing regions with different characteristics. At each pixel, HSF uses a locally computed feature vector to predict the relative performance of the filters in estimating the corresponding
pixel intensity in the original undistorted image. The prediction result then determines the proportion of
each filter used to obtain the final processed output. In this way, the HSF serves as a framework for combining the outputs of a number of different user selected filters, each best suited for a different region
of an image. We formulate our scheme in a probabilistic framework where the HSF output is obtained as
the Bayesian minimum mean square error estimate of the original image. Maximum likelihood estimates
of the model parameters are determined from an offline fully unsupervised training procedure that is derived from the expectation-maximization algorithm. To illustrate how to apply the HSF and to
demonstrate its potential, we apply our scheme as a post-processing step to improve the decoding quality
of JPEG-encoded document images. The scheme consistently improves the quality of the decoded image
over a variety of image content with different characteristics. We show that our scheme results in
quantitative improvements over several other state-of-the-art JPEG decoding methods.
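The final combination stage reduces to a per-pixel convex mixture of the candidate filters' outputs. The sketch below assumes the per-pixel weights have already been predicted from local features; the EM-trained prediction model itself is not reproduced:

```python
import numpy as np

def hsf_combine(outputs, probs):
    """Pixel-wise convex combination of candidate filter outputs:
    each pixel's result mixes the filters' estimates according to the
    per-pixel weights (probabilities) predicted for that pixel."""
    outputs = np.stack(outputs)   # (K, H, W): one slice per filter
    probs = np.stack(probs)       # (K, H, W): weights summing to 1 over K
    return (probs * outputs).sum(axis=0)
```

For two filters weighted 0.3 and 0.7 at every pixel, each output pixel is simply 0.3 times the first filter's estimate plus 0.7 times the second's.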
ETPL
DIP-088 Learning the Spherical Harmonic Features for 3-D Face Recognition
Abstract: In this paper, a competitive method for 3-D face recognition (FR) using spherical harmonic
features (SHF) is proposed. With this solution, 3-D face models are characterized by the energies
contained in spherical harmonics with different frequencies, thereby enabling the capture of both gross
shape and fine surface details of a 3-D facial surface. This is in clear contrast to most 3-D FR techniques which are either holistic or feature based, using local features extracted from distinctive points. First, 3-D
face models are represented in a canonical representation, namely, spherical depth map, by which SHF
can be calculated. Then, considering the predictive contribution of each SHF feature, especially in the presence of facial expression and occlusion, feature selection methods are used to improve the predictive
performance and provide faster and more cost-effective predictors. Experiments have been carried out on
three public 3-D face datasets, SHREC2007, FRGC v2.0, and Bosphorus, with increasing difficulties in terms of facial expression, pose, and occlusion, and which demonstrate the effectiveness of the proposed
method.
ETPL
DIP-089
Video Deblurring Algorithm Using Accurate Blur Kernel Estimation and Residual
Deconvolution Based on a Blurred-Unblurred Frame Pair
Abstract: Blurred frames may happen sparsely in a video sequence acquired by consumer devices such as digital camcorders and digital cameras. In order to avoid visually annoying artifacts due to those blurred
frames, this paper presents a novel motion deblurring algorithm in which a blurred frame can be
reconstructed utilizing the high-resolution information of adjacent unblurred frames. First, a motion-compensated predictor for the blurred frame is derived from its neighboring unblurred frame via specific
motion estimation. Then, an accurate blur kernel, which is difficult to directly obtain from the blurred
frame itself, is computed using both the predictor and the blurred frame. Next, a residual deconvolution is
applied to both of those frames in order to reduce the ringing artifacts inherently caused by conventional deconvolution. The blur kernel estimation and deconvolution processes are iteratively performed for the
deblurred frame. Simulation results show that the proposed algorithm provides superior deblurring results
over conventional deblurring algorithms while preserving details and reducing ringing artifacts.
ETPL
DIP-090
Rank Minimization Code Aperture Design for Spectrally Selective Compressive
Imaging
Abstract: A new code aperture design framework for multiframe code aperture snapshot spectral imaging
(CASSI) system is presented. It aims at the optimization of code aperture sets such that a group of compressive spectral measurements is constructed, each with information from a specific subset of bands.
A matrix representation of CASSI is introduced that permits the optimization of spectrally selective code
aperture sets. Furthermore, each code aperture set forms a matrix such that rank minimization is used to
reduce the number of CASSI shots needed. Conditions for the code apertures are identified such that a restricted isometry property in the CASSI compressive measurements is satisfied with higher probability.
Simulations show higher quality of spectral image reconstruction than that attained by systems using
Hadamard or random code aperture sets.
ETPL
DIP-091
Coaching the Exploration and Exploitation in Active Learning for Interactive Video
Retrieval
Abstract: Conventional active learning approaches for interactive video/image retrieval usually assume
the query distribution is unknown, as it is difficult to estimate with only a limited number of labeled
instances available. Thus, it is easy to put the system in a dilemma of whether to explore the feature space in uncertain areas for a better understanding of the query distribution or to harvest in certain areas for more
relevant instances. In this paper, we propose a novel approach called coached active learning that makes
the query distribution predictable through training and, therefore, avoids the risk of searching on a
completely unknown space. The estimated distribution, which provides a more global view of the feature
space, can be used to schedule not only the timing but also the step sizes of the exploration and the exploitation in a principled way. The results of the experiments on a large-scale data set from TRECVID
2005-2009 validate the efficiency and effectiveness of our approach, which demonstrates an encouraging
performance when facing domain-shift, outperforms eight conventional active learning methods, and shows superiority to six state-of-the-art interactive video retrieval systems.
ETPL
DIP-092 Nonnegative Local Coordinate Factorization for Image Representation
Abstract: Recently, nonnegative matrix factorization (NMF) has become increasingly popular for feature
extraction in computer vision and pattern recognition. NMF seeks two nonnegative matrices whose product can best approximate the original matrix. The nonnegativity constraints lead to sparse parts-based
representations that can be more robust than nonsparse global features. To obtain more accurate control
over the sparseness, in this paper, we propose a novel method called nonnegative local coordinate factorization (NLCF) for feature extraction. NLCF adds a local coordinate constraint into the standard
NMF objective function. Specifically, we require that the learned basis vectors be as close to the original
data points as possible. In this way, each data point can be represented by a linear combination of only a
few nearby basis vectors, which naturally leads to sparse representation. Extensive experimental results suggest that the proposed approach provides a better representation and achieves higher accuracy in
image clustering.
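For reference, the standard NMF baseline that NLCF extends can be sketched with the classic Lee-Seung multiplicative updates; the local-coordinate penalty that pulls basis vectors toward data points is omitted here, so this is only the unconstrained starting point:

```python
import numpy as np

def nmf(X, r, iters=200, seed=0):
    """Plain NMF via multiplicative updates, minimizing ||X - WH||_F
    under nonnegativity. NLCF augments this objective with a
    local-coordinate term, which is what induces its sparse,
    locally anchored representations."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r)) + 1e-3
    H = rng.random((r, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)   # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)   # update basis vectors
    return W, H
```

On exactly low-rank nonnegative data the factorization recovers a close approximation while keeping both factors nonnegative throughout.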
ETPL
DIP-093 Flip-Invariant SIFT for Copy and Object Detection
Abstract: Scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local
keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is
also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip
invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoint, or symmetric patterns of objects. This paper
proposes a new descriptor, named flip-invariant SIFT (or F-SIFT), that preserves the original properties
of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the
power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and detection. In
copy detection, a framework, which smartly indexes the flip properties of F-SIFT for rapid filtering and
SIFT, but also leads to a more than 50% savings in computational cost. In object recognition, we
demonstrate the superiority of F-SIFT in dealing with flip transformation by comparing it to seven other
descriptors. In object detection, we further show the ability of F-SIFT in describing symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the
original SIFT.
ETPL
DIP-094
Multiscale Image Fusion Using the Undecimated Wavelet Transform With Spectral
Factorization and Nonorthogonal Filter Banks
Abstract: Multiscale transforms are among the most popular techniques in the field of pixel-level image
fusion. However, the fusion performance of these methods often deteriorates for images derived from
different sensor modalities. In this paper, we demonstrate that for such images, results can be improved
using a novel undecimated wavelet transform (UWT)-based fusion scheme, which splits the image decomposition process into two successive filtering operations using spectral factorization of the analysis
filters. The actual fusion takes place after convolution with the first filter pair. Its significantly smaller
support size leads to the minimization of the unwanted spreading of coefficient values around
overlapping image singularities. This usually complicates the feature selection process and may lead to
the introduction of reconstruction errors in the fused image. Moreover, we will show that the
nonsubsampled nature of the UWT allows the design of nonorthogonal filter banks, which are more robust to artifacts introduced during fusion, additionally improving the obtained results. The combination
of these techniques leads to a fusion framework, which provides clear advantages over traditional
multiscale fusion approaches, independent of the underlying fusion rule, and reduces unwanted side effects such as ringing artifacts in the fused reconstruction.
ETPL
DIP-095 Context-Dependent Logo Matching and Recognition
Abstract: We contribute, through this paper, to the design of a novel variational framework able to match
and recognize multiple instances of multiple reference logos in image archives. Reference logos and test images are seen as constellations of local features (interest points, regions, etc.) and matched by
minimizing an energy function mixing: 1) a fidelity term that measures the quality of feature matching, 2)
a neighborhood criterion that captures feature co-occurrence/geometry, and 3) a regularization term that controls the smoothness of the matching solution. We also introduce a detection/recognition procedure
and study its theoretical consistency. Finally, we show the validity of our method through extensive
experiments on the challenging MICC-Logos dataset. Our method outperforms baseline as well as state-of-the-art matching/recognition procedures by 20%.
ETPL
DIP-096
Efficient Contrast Enhancement Using Adaptive Gamma Correction With Weighting
Distribution
Abstract: This paper proposes an efficient method to modify histograms and enhance contrast in digital
images. Enhancement plays a significant role in digital image processing, computer vision, and pattern recognition. We present an automatic transformation technique that improves the brightness of dimmed
images via the gamma correction and probability distribution of luminance pixels. To enhance video, the
proposed image-enhancement method uses temporal information regarding the differences between each
frame to reduce computational complexity. Experimental results demonstrate that the proposed method produces enhanced images of comparable or higher quality than those produced using previous state-of-
the-art methods.
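As an illustration of the idea described above, the histogram-weighting-plus-gamma mapping can be sketched in NumPy as follows. The function name, the weighting exponent alpha, and the 8-bit range are our own illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def adaptive_gamma_correction(img, alpha=0.5):
    # img: 2-D uint8 luminance array; alpha is an assumed tuning parameter
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    pdf = hist / hist.sum()
    # weighting distribution: compress extreme probabilities
    pdf_w = pdf.max() * (pdf / pdf.max()) ** alpha
    cdf_w = np.cumsum(pdf_w) / pdf_w.sum()
    levels = np.arange(256) / 255.0
    # per-level adaptive gamma drawn from the weighted CDF
    lut = np.round(255.0 * levels ** (1.0 - cdf_w)).astype(np.uint8)
    return lut[img]
```

On a uniform gradient, the mapping keeps black and white fixed while lifting mid-tones, which is the brightening behaviour the abstract describes.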
ETPL
DIP-097 Binary Compressed Imaging
Abstract: Compressed sensing can substantially reduce the number of samples required for conventional
signal acquisition at the expense of an additional reconstruction procedure. It also provides robust
reconstruction when using quantized measurements, including in the one-bit setting. In this paper, our goal is to design a framework for binary compressed sensing that is adapted to images. Accordingly, we
propose an acquisition and reconstruction approach that complies with the high dimensionality of image
data and that provides reconstructions of satisfactory visual quality. Our forward model describes data
acquisition and follows physical principles. It entails a series of random convolutions performed optically followed by sampling and binary thresholding. The binary samples that are obtained can be either
measured or ignored according to predefined functions. Based on these measurements, we then express
our reconstruction problem as the minimization of a compound convex cost that enforces the consistency of the solution with the available binary data under total-variation regularization. Finally, we derive an
efficient reconstruction algorithm relying on convex-optimization principles. We conduct several
experiments on standard images and demonstrate the practical interest of our approach.
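A minimal sketch of the forward model described above (random convolution, binary thresholding, keep/ignore selection), assuming a circular FFT-based convolution and a median threshold for brevity; both choices and the function name are illustrative, not the paper's:

```python
import numpy as np

def binary_measurements(img, kernel, keep_mask):
    # random circular convolution realised in the Fourier domain
    conv = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
    # binary thresholding of the convolved samples (median chosen here)
    bits = (conv > np.median(conv)).astype(np.uint8)
    # keep or ignore samples according to a predefined mask
    return bits[keep_mask]
```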
ETPL
DIP-098
MIMO Nonlinear Ultrasonic Tomography by Propagation and Backpropagation
Method
Abstract: This paper develops a fast ultrasonic tomographic imaging method in a multiple-input multiple-
output (MIMO) configuration using the propagation and backpropagation (PBP) method. By this method,
ultrasonic excitation signals from multiple sources are transmitted simultaneously to probe the objects
immersed in the medium. The scattering signals are recorded by multiple receivers. Utilizing the
nonlinear ultrasonic wave propagation equation and the received time domain scattered signals, the objects are to be reconstructed iteratively in three steps. First, the propagation step calculates the
predicted acoustic potential data at the receivers using an initial guess. Second, the difference signal
between the predicted value and the measured data is calculated. Third, the backpropagation step computes updated acoustical potential data by backpropagating the difference signal to the same medium
computationally. Unlike the conventional PBP method for tomographic imaging where each source takes
turns to excite the acoustical field until all the sources are used, the developed MIMO-PBP method
achieves faster image reconstruction by utilizing multiple source simultaneous excitation. Furthermore, we develop an orthogonal waveform signaling method using a waveform delay scheme to reduce the
impact of speckle patterns in the reconstructed images. By numerical experiments we demonstrate that
the proposed MIMO-PBP tomographic imaging method results in faster convergence and achieves superior imaging quality.
ETPL
DIP-099
Vector Extension of Monogenic Wavelets for Geometric Representation of Color
Images
Abstract: Monogenic wavelets offer a geometric representation of grayscale images through an AM-FM
model allowing invariance of coefficients to translations and rotations. The underlying concept of local phase includes a fine contour analysis into a coherent unified framework. Starting from a link with
structure tensors, we propose a nontrivial extension of the monogenic framework to vector-valued signals
to carry out a nonmarginal color monogenic wavelet transform. We also give a practical study of this new wavelet transform in the contexts of sparse representations and invariant analysis, which helps to
understand the physical interpretation of coefficients and validates the interest of our theoretical
construction.
ETPL
DIP-100 Myocardial Motion Estimation From Medical Images Using the Monogenic Signal
Abstract: We present a method for the analysis of heart motion from medical images. The algorithm
exploits monogenic signal theory, recently introduced as an N-dimensional generalization of the analytic
signal. The displacement is computed locally by assuming the conservation of the monogenic phase over time. A local affine displacement model is considered to account for typical heart motions as
contraction/expansion and shear. A coarse-to-fine B-spline scheme allows a robust and effective
computation of the model's parameters, and a pyramidal refinement scheme helps to handle large motions. Robustness against noise is increased by replacing the standard point-wise computation of the
monogenic orientation with a robust least-squares orientation estimate. Given its general formulation, the
algorithm is well suited for images from different modalities, in particular for those cases where time-variant changes of local intensity invalidate the standard brightness constancy assumption. This paper evaluates the method's feasibility on two emblematic cases: cardiac tagged magnetic resonance and
cardiac ultrasound. In order to quantify the performance of the proposed method, we made use of realistic
synthetic sequences from both modalities for which the benchmark motion is known. A comparison is presented with state-of-the-art methods for cardiac motion analysis. On the data considered, these
conventional approaches are outperformed by the proposed algorithm. A recent global optical-flow
estimation algorithm based on the monogenic curvature tensor is also considered in the comparison. With respect to the latter, the proposed framework provides, along with higher accuracy, superior robustness to
noise and a considerably shorter computation time.
ETPL
DIP-101
Revisiting the Relationship Between Adaptive Smoothing and Anisotropic Diffusion
With Modified Filters
Abstract: Anisotropic diffusion has been known to be closely related to adaptive smoothing and
discretized in a similar manner. This paper revisits a fundamental relationship between two approaches. It is shown that adaptive smoothing and anisotropic diffusion have different theoretical backgrounds by
exploring their characteristics with the perspective of normalization, evolution step size, and energy flow.
Based on this principle, adaptive smoothing is derived from a second order partial differential equation (PDE), not a conventional anisotropic diffusion, via the coupling of Fick's law with a generalized
continuity equation where a “source” or “sink” exists, which has not been extensively exploited. We
show that the source or sink is closely related to the asymmetry of energy flow as well as the normalization term of adaptive smoothing. It enables us to analyze behaviors of adaptive smoothing, such
as the maximum principle and stability with a perspective of a PDE. Ultimately, this relationship provides
new insights into application-specific filtering algorithm design. By modeling the source or sink in the
PDE, we introduce two specific diffusion filters, the robust anisotropic diffusion and the robust coherence enhancing diffusion, as novel instantiations which are more robust against the outliers than the
conventional filters.
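For reference, the conventional anisotropic diffusion scheme that the paper contrasts with adaptive smoothing (no source/sink term) can be sketched as below; the step size, conductance parameter, and the periodic borders via np.roll are simplifying assumptions of ours:

```python
import numpy as np

def perona_malik(img, n_iter=10, kappa=20.0, dt=0.2):
    # conventional anisotropic diffusion; periodic borders for brevity
    u = img.astype(float)
    for _ in range(n_iter):
        dn = np.roll(u, -1, axis=0) - u   # differences toward 4 neighbours
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        g = lambda d: np.exp(-(d / kappa) ** 2)  # edge-stopping conductance
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

Unlike the source/sink formulation studied in the paper, this scheme conserves total intensity: each flux term cancels against its neighbour's opposing term, which is one way to see the "symmetric energy flow" of plain diffusion.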
ETPL
DIP-102
A Weighted Dictionary Learning Model for Denoising Images Corrupted by Mixed
Noise
Abstract: This paper proposes a general weighted l2-l0 norms energy minimization model to remove mixed noise, such as a Gaussian-Gaussian mixture, impulse noise, and Gaussian-impulse noise, from images. The approach is built upon a maximum likelihood estimation framework and sparse representations
over a trained dictionary. Rather than optimizing the likelihood functional derived from a mixture distribution, we present a new weighting data fidelity function, which has the same minimizer as the
original likelihood functional but is much easier to optimize. The weighting function in the model can be
determined by the algorithm itself, and it plays a role of noise detection in terms of the different estimated noise parameters. By incorporating the sparse regularization of small image patches, the proposed method
can efficiently remove a variety of mixed or single noise while preserving the image textures well. In
addition, a modified K-SVD algorithm is designed to address the weighted rank-one approximation. The experimental results demonstrate its better performance compared with several existing methods.
ETPL
DIP-103 Comparative Study of Fixation Density Maps
Abstract: Fixation density maps (FDM) created from eye tracking experiments are widely used in image
processing applications. The FDM are assumed to be reliable ground truths of human visual attention and as such, one expects a high similarity between FDM created in different laboratories. So far, no studies
have analyzed the degree of similarity between FDM from independent laboratories and the related
impact on the applications. In this paper, we perform a thorough comparison of FDM from three independently conducted eye tracking experiments. We focus on the effect of presentation time and
image content and evaluate the impact of the FDM differences on three applications: visual saliency
modeling, image quality assessment, and image retargeting. It is shown that the FDM are very similar and
that their impact on the applications is low. The individual experiment comparisons, however, are found to be significantly different, showing that inter-laboratory differences strongly depend on the
experimental conditions of the laboratories. The FDM are publicly available to the research community.
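A common way to build and compare fixation density maps, sketched here under our own assumptions (a Gaussian placed at each fixation, with an arbitrary sigma, and Pearson correlation as the similarity score; the paper's experiments may use different choices):

```python
import numpy as np

def fixation_density_map(fixations, shape, sigma=8.0):
    # accumulate a Gaussian per fixation point, then normalize to sum 1
    yy, xx = np.indices(shape)
    fdm = np.zeros(shape)
    for y, x in fixations:
        fdm += np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
    return fdm / fdm.sum()

def fdm_correlation(a, b):
    # Pearson correlation coefficient between two density maps
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```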
ETPL
DIP-104 Efficient Method for Content Reconstruction With Self-Embedding
Abstract: This paper presents a new model of the content reconstruction problem in self-embedding
systems, based on an erasure communication channel. We explain why such a model is a good fit for this
problem, and how it can be practically implemented with the use of digital fountain codes. The proposed
method is based on an alternative approach to spreading the reference information over the whole image,
which has recently been shown to be of critical importance in the application at hand. Our paper presents
a theoretical analysis of the inherent restoration trade-offs. We analytically derive formulas for the reconstruction success bounds, and validate them experimentally with Monte Carlo simulations and a
reference image authentication system. We perform an exhaustive reconstruction quality assessment,
where the presented reference scheme is compared to five state-of-the-art alternatives in a common evaluation scenario. Our paper leads to important insights on how self-embedding schemes should be
constructed to achieve optimal performance. The reference authentication system designed according to
the presented principles allows for high-quality reconstruction, regardless of the amount of the tampered
content. The average reconstruction quality, measured on 10,000 natural images, is 37 dB and is achievable even when 50% of the image area becomes tampered.
ETPL
DIP-105
Modeling IrisCode and Its Variants as Convex Polyhedral Cones and Its Security
Implications
Abstract: IrisCode, developed by Daugman in 1993, is the most influential iris recognition algorithm. A thorough understanding of IrisCode is essential, because over 100 million persons have been enrolled by
this algorithm and many biometric personal identification and template protection methods have been
developed based on IrisCode. This paper indicates that a template produced by IrisCode or its variants is
a convex polyhedral cone in a hyperspace. Its central ray, being a rough representation of the original biometric signal, can be computed by a simple algorithm, which can often be implemented in one Matlab
command line. The central ray is an expected ray and also an optimal ray of an objective function on a
group of distributions. This algorithm is derived from geometric properties of a convex polyhedral cone but does not rely on any prior knowledge (e.g., iris images). The experimental results show that biometric
templates, including iris and palmprint templates, produced by different recognition methods can be
matched through the central rays in their convex polyhedral cones and that templates protected by a method extended from IrisCode can be broken into. These experimental results indicate that, without a
thorough security analysis, convex polyhedral cone templates cannot be assumed secure. Additionally,
the simplicity of the algorithm implies that even junior hackers without knowledge of advanced image
processing and biometric databases can still break into protected templates and reveal relationships among templates produced by different recognition methods.
ETPL
DIP-106 Correspondence Map-Aided Neighbor Embedding for Image Intra Prediction
Abstract: This paper describes new image prediction methods based on neighbor embedding (NE) techniques. Neighbor embedding methods are used here to approximate an input block (the block to be
predicted) in the image as a linear combination of K nearest neighbors. However, in order for the decoder
to proceed similarly, the K nearest neighbors are found by computing distances between the known pixels
in a causal neighborhood (called template) of the input block and the co-located pixels in candidate patches taken from a causal window. Similarly, the weights used for the linear approximation are
computed in order to best approximate the template pixels. Although efficient, these methods suffer from
limitations when the template and the block to be predicted are not correlated, e.g., in non-homogeneous texture areas. To cope with these limitations, this paper introduces new image prediction methods based
on NE techniques in which the K-NN search is done in two steps and aided, at the decoder, by a block
correspondence map, hence the name map-aided neighbor embedding (MANE) method. Another optimized variant of this approach, called oMANE method, is also studied. In these methods, several
alternatives have also been proposed for the K-NN search. The resulting prediction methods are shown to
bring significant rate-distortion performance improvements when compared to H.264 Intra prediction
modes (up to 44.75% rate saving at low bit rates).
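The template-based K-NN search that these methods build on can be sketched as follows, reduced to K = 1 and a single causal search band for brevity; block size, template width, and function names are our own illustrative choices:

```python
import numpy as np

def template_match_predict(image, by, bx, bs=4, tw=2):
    # find the causal candidate whose L-shaped template of known pixels
    # best matches the input block's template, then copy its block
    def template(y, x):
        top = image[y - tw:y, x - tw:x + bs].ravel()
        left = image[y:y + bs, x - tw:x].ravel()
        return np.concatenate([top, left])
    target = template(by, bx)
    best, best_err = None, np.inf
    for y in range(tw, by - bs + 1):  # candidates strictly above the block
        for x in range(tw, image.shape[1] - bs - tw + 1):
            err = np.sum((template(y, x) - target) ** 2)
            if err < best_err:
                best_err, best = err, (y, x)
    y, x = best
    return image[y:y + bs, x:x + bs]
```

On a vertically repetitive image the causal template finds an exact match, so the predicted block equals the true block, mimicking the lossless-prediction case.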
ETPL
DIP-107
Estimation-Theoretic Approach to Delayed Decoding of Predictively Encoded Video
Sequences
Abstract: Current video coders employ predictive coding with motion compensation to exploit temporal
redundancies in the signal. In particular, blocks along a motion trajectory are modeled as an auto-
regressive (AR) process, and it is generally assumed that the prediction errors are temporally independent and approximate the innovations of this process. Thus, zero-delay encoding and decoding is considered
efficient. This paper is premised on the largely ignored fact that these prediction errors are, in fact,
temporally dependent due to quantization effects in the prediction loop. It presents an estimation-theoretic
delayed decoding scheme, which exploits information from future frames to improve the reconstruction quality of the current frame. In contrast to the standard decoder that reproduces every block
instantaneously once the corresponding quantization indices of residues are available, the proposed
delayed decoder efficiently combines all accessible (including any future) information in an appropriately derived probability density function, to obtain the optimal delayed reconstruction per transform
coefficient. Experiments demonstrate significant gains over the standard decoder. Requisite information
about the source AR model is estimated in a spatio-temporally adaptive manner from a bit-stream conforming to the H.264/AVC standard, i.e., no side information needs to be sent to the decoder in order
to employ the proposed approach, thereby retaining compatibility with the standard syntax and existing encoders.
ETPL
DIP-108 Correction of Axial and Lateral Chromatic Aberration With False Color Filtering
Abstract: In this paper, we propose a chromatic aberration (CA) correction algorithm based on a false
color filtering technique. In general, CA produces color distortions called color fringes near the
contrasting edges of captured images, and these distortions cause false color artifacts. In the proposed method, a false color filtering technique is used to filter out the false color components from the chroma-
signals of the input image. The filtering process is performed with the adaptive weights obtained from
both the gradient and color differences, and the weights are designed to reduce the various types of color
fringes regardless of the colors of the artifacts. Moreover, as preprocessors of the filtering process, a transient improvement (TI) technique is applied to enhance the slow transitions of the red and blue
channels that are blurred by the CA. The TI process improves the filtering performance by narrowing the
false color regions before the filtering process when severe color fringes (typically purple fringes) occur widely. Last, the CA-corrected chroma-signal is combined with the TI chroma-signal to avoid incorrect
color adjustment. The experimental results show that the proposed method substantially reduces the CA
artifacts and provides natural-looking replacement colors, while avoiding incorrect color adjustment.
ETPL
DIP-109 New Class Tiling Design for Dot-Diffused Halftoning
Abstract: In this paper, a new class-tiling design for dot diffusion, together with an optimized class matrix and diffusion matrix, is proposed. The resulting halftones are nearly free of the periodic artifacts produced by former schemes. Previously, the class matrix of dot diffusion was duplicated and orthogonally tiled to cover the entire image before thresholding and quantized-error diffusion, which introduced periodic artifacts. We observe that this artifact can be removed by manipulating the class tiling through rotation, transposition, and alternating shifts of the class matrices. As documented in the experimental results, the proposed dot diffusion is compared with former parallel halftoning methods in terms of image quality, processing efficiency, periodicity, and memory consumption, and it proves a very competitive candidate for the printing/display market.
ETPL
DIP-110 W-Tree Indexing for Fast Visual Word Generation
Abstract: The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation,
i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently,
structures based on multibranch trees and forests have been adopted to reduce the time cost. However,
these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we can significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with
certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding
co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a
KD-tree and a K-means tree), in order to re-direct the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with
the fast library for approximate nearest neighbors and the random KD-trees on the Oxford data set.
Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
ETPL
DIP-111
Efficient Improvements on the BDND Filtering Algorithm for the Removal of High-
Density Impulse Noise
Abstract: Switching median filters are known to outperform standard median filters in the removal of
impulse noise due to their capability of filtering candidate noisy pixels and leaving other pixels intact.
The boundary discriminative noise detection (BDND) is one powerful example in this class of filters. However, there are some issues related to the filtering step in the BDND algorithm that may degrade its
performance. In this paper, we propose two modifications to the filtering step of the BDND algorithm to
address these issues. Experimental evaluation shows the effectiveness of the proposed modifications in
producing sharper images than the BDND algorithm.
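The switching behaviour described above, i.e., filtering only detected noise candidates and leaving other pixels intact, can be sketched as follows. The extreme-value detector used here is a crude stand-in for BDND's boundary-discriminative detection, not the paper's method:

```python
import numpy as np

def switching_median(img, lo=0, hi=255, win=3):
    # replace only pixels flagged as impulses (extreme values) by their
    # window median; all other pixels pass through untouched
    pad = win // 2
    padded = np.pad(img, pad, mode='edge')
    out = img.copy()
    for y, x in zip(*np.nonzero((img <= lo) | (img >= hi))):
        out[y, x] = np.median(padded[y:y + win, x:x + win])
    return out
```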
ETPL
DIP-112 Novel Approaches to the Parametric Cubic-Spline Interpolation
Abstract: The cubic-spline interpolation (CSI) scheme can be utilized to obtain a better quality
reconstructed image. It is based on the least-squares method with cubic convolution interpolation (CCI) function. Within the parametric CSI scheme, it is difficult to determine the optimal parameter for various
target images. In this paper, a novel method involving the concept of opportunity costs is proposed to
identify the most suitable parameter for the CCI function needed in the CSI scheme. It is shown that such
an optimal four-point CCI function in conjunction with the least-squares method can achieve a better performance with the same arithmetic operations in comparison with the existing CSI algorithm. In
addition, experimental results show that the optimal six-point CSI scheme together with cross-zonal filter
is superior in performance to the optimal four-point CSI scheme without increasing the computational complexity.
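The four-point cubic convolution interpolation kernel at the heart of the scheme is standard (Keys' kernel); the free parameter a below is exactly what the paper's opportunity-cost criterion seeks to set optimally. The default a = -0.5 is the common textbook choice, not the paper's optimum:

```python
import numpy as np

def cci_kernel(s, a=-0.5):
    # piecewise-cubic convolution kernel with support [-2, 2]
    s = np.abs(np.asarray(s, dtype=float))
    out = np.zeros_like(s)
    m1 = s <= 1
    m2 = (s > 1) & (s < 2)
    out[m1] = (a + 2) * s[m1] ** 3 - (a + 3) * s[m1] ** 2 + 1
    out[m2] = a * (s[m2] ** 3 - 5 * s[m2] ** 2 + 8 * s[m2] - 4)
    return out
```

For any offset, the four surrounding sample weights sum to one, so flat regions are reproduced exactly regardless of a.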
ETPL
DIP-113
Generation of All-in-Focus Images by Noise-Robust Selective Fusion of Limited Depth-
of-Field Images
Abstract: The limited depth-of-field of some cameras prevents them from capturing perfectly focused images when the imaged scene covers a large distance range. In order to compensate for this problem,
image fusion has been exploited for combining images captured with different camera settings, thus
yielding a higher quality all-in-focus image. Since most current approaches for image fusion rely on
maximizing the spatial frequency of the composed image, the fusion process is sensitive to noise. In this paper, a new algorithm for computing the all-in-focus image from a sequence of images captured with a
low depth-of-field camera is presented. The proposed approach adaptively fuses the different frames of
the focus sequence in order to reduce noise while preserving image features. The algorithm consists of three stages: 1) focus measure; 2) selectivity measure; 3) and image fusion. An extensive set of
experimental tests has been carried out in order to compare the proposed algorithm with state-of-the-art
all-in-focus methods using both synthetic and real sequences. The obtained results show the advantages
of the proposed scheme even for high levels of noise.
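The spatial-frequency-maximizing fusion baseline that the paper improves upon can be sketched as a per-pixel pick of the sharpest frame; the Laplacian-energy focus measure and the naive selection (no selectivity measure, hence the noise sensitivity the paper addresses) are our simplifications:

```python
import numpy as np

def laplacian_energy(img):
    # simple focus measure: squared discrete Laplacian (periodic borders)
    u = img.astype(float)
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    return lap ** 2

def all_in_focus(stack):
    # per-pixel: take the value from the frame with the highest focus
    focus = np.stack([laplacian_energy(f) for f in stack])
    best = np.argmax(focus, axis=0)
    rows, cols = np.indices(best.shape)
    return np.stack(stack)[best, rows, cols]
```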
ETPL
DIP-114
Missing Texture Reconstruction Method Based on Error Reduction Algorithm Using
Fourier Transform Magnitude Estimation Scheme
Abstract: A missing texture reconstruction method based on an error reduction (ER) algorithm, including
a novel estimation scheme of Fourier transform magnitudes is presented in this brief. In our method,
Fourier transform magnitude is estimated for a target patch including missing areas, and the missing intensities are estimated by retrieving its phase based on the ER algorithm. Specifically, by monitoring
errors converged in the ER algorithm, known patches whose Fourier transform magnitudes are similar to that of the target patch are selected from the target image. Then, the Fourier transform magnitude of the target patch is estimated from those of the selected known patches and their
corresponding errors. Consequently, by using the ER algorithm, we can estimate both the Fourier
transform magnitudes and phases to reconstruct the missing areas.
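The core error-reduction iteration (alternating a Fourier-magnitude constraint with a spatial-domain constraint on the known pixels) can be sketched as below; the random initialization and iteration count are our own assumptions:

```python
import numpy as np

def error_reduction(magnitude, known, known_values, n_iter=100, seed=0):
    # ER phase retrieval: project onto the magnitude constraint in the
    # Fourier domain, then re-impose the known pixels in the image domain
    x = np.random.default_rng(seed).random(magnitude.shape)
    for _ in range(n_iter):
        X = np.fft.fft2(x)
        X = magnitude * np.exp(1j * np.angle(X))   # keep phase, fix magnitude
        x = np.real(np.fft.ifft2(X))
        x[known] = known_values[known]             # spatial constraint
    return x
```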
ETPL
DIP-115 A Robust Fuzzy Local Information C-Means Clustering Algorithm
Abstract: In a recent paper, Krinidis and Chatzis proposed a variation of the fuzzy c-means algorithm for image clustering, in which local spatial and gray-level information are incorporated in a fuzzy way through an energy function, and local minimizers of this energy function are used to obtain the fuzzy membership of each pixel and the cluster centers. In this paper, it is shown that the iterative updates of Krinidis and Chatzis for the fuzzy memberships and cluster centers are not exclusively solutions for true local minimizers of their designed energy function. Thus, the iterations of Krinidis and Chatzis may fail to converge to the correct local minima, not because of being trapped in local minima, but because of the design of the energy function itself.
ETPL
DIP-116
Nonlinearity Detection in Hyperspectral Images Using a Polynomial Post-Nonlinear
Mixing Model
Abstract: This paper studies a nonlinear mixing model for hyperspectral image unmixing and nonlinearity detection. The proposed model assumes that the pixel reflectances are nonlinear functions of pure spectral
components contaminated by an additive white Gaussian noise. These nonlinear functions are
approximated by polynomials leading to a polynomial post-nonlinear mixing model. We have shown in a previous paper that the parameters involved in the resulting model can be estimated using least squares
methods. A generalized likelihood ratio test based on the estimator of the nonlinearity parameter is
proposed to decide whether a pixel of the image results from the commonly used linear mixing model or from a more general nonlinear mixing model. To compute the test statistic associated with the
nonlinearity detection, we propose to approximate the variance of the estimated nonlinearity parameter by its constrained Cramér-Rao bound. The performance of the detection strategy is evaluated via simulations
conducted on synthetic and real data. More precisely, synthetic data have been generated according to the standard linear mixing model and three nonlinear models from the literature. The real data investigated in
this study are extracted from the Cuprite image, which shows that some minerals seem to be nonlinearly
mixed in this image. Finally, it is interesting to note that the estimated abundance maps obtained with the post-nonlinear mixing model are in good agreement with results obtained in previous studies.
ETPL
DIP-117 Wavelet Bayesian Network Image Denoising
Abstract: From the perspective of the Bayesian approach, the denoising problem is essentially a prior
probability modeling and estimation task. In this paper, we propose an approach that exploits a hidden Bayesian network, constructed from wavelet coefficients, to model the prior probability of the original
image. Then, we use the belief propagation (BP) algorithm, which estimates a coefficient based on all the
coefficients of an image, as the maximum a posteriori (MAP) estimator to derive the denoised wavelet
coefficients. We show that if the network is a spanning tree, the standard BP algorithm can perform MAP
estimation efficiently. Our experiment results demonstrate that, in terms of the peak-signal-to-noise-ratio and perceptual quality, the proposed approach outperforms state-of-the-art algorithms on several images,
particularly in the textured regions, with various amounts of white Gaussian noise.
ETPL
DIP-118 Acceleration of the Shiftable Algorithm for Bilateral Filtering and Nonlocal
Means
Abstract: A direct implementation of the bilateral filter requires O(σs²) operations per pixel, where σs is
the (effective) width of the spatial kernel. A fast implementation of the bilateral filter that required O(1) operations per pixel with respect to σs was recently proposed. This was done by using trigonometric
functions for the range kernel of the bilateral filter, and by exploiting their so-called shiftability property.
In particular, a fast implementation of the Gaussian bilateral filter was realized by approximating the
Gaussian range kernel using raised cosines. Later, it was demonstrated that this idea could be extended to a larger class of filters, including the popular non-local means filter. As already observed, a flip side of
this approach was that the run time depended on the width σr of the range kernel. For an image with
dynamic range [0,T], the run time scaled as O(T²/σr²) with σr. This made it difficult to implement narrow
range kernels, particularly for images with large dynamic range. In this paper, we discuss this problem,
and propose some simple steps to accelerate the implementation, in general, and for small σr in particular.
We provide some experimental results to demonstrate the acceleration that is achieved using these modifications.
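For contrast with the shiftable approximation, the direct bilateral filter below makes the O(σs²)-per-pixel cost explicit: every output pixel averages a window of radius proportional to σs. Parameter defaults are illustrative:

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=30.0):
    # direct (slow) bilateral filter: spatial Gaussian x range Gaussian
    u = img.astype(float)
    r = int(3 * sigma_s)                  # window radius ~ 3 sigma_s
    out = np.zeros_like(u)
    h, w = u.shape
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            patch = u[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            ws = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            wr = np.exp(-((patch - u[y, x]) ** 2) / (2 * sigma_r ** 2))
            k = ws * wr
            out[y, x] = (k * patch).sum() / k.sum()
    return out
```

A narrow range kernel (small σr) preserves step edges almost exactly, which is precisely the regime where the O(T²/σr²) cost of the shiftable method blows up and the paper's acceleration matters.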
ETPL
DIP-119
Determining the Intrinsic Dimension of a Hyperspectral Image Using Random Matrix
Theory
Abstract: Determining the intrinsic dimension of a hyperspectral image is an important step in the spectral
unmixing process and under- or overestimation of this number may lead to incorrect unmixing in
unsupervised methods. In this paper, we discuss a new method for determining the intrinsic dimension using recent advances in random matrix theory. This method is entirely unsupervised, free from any user-
determined parameters and allows spectrally correlated noise in the data. Robustness tests are run on
synthetic data, to determine how the results were affected by noise levels, noise variability, noise
approximation, and spectral characteristics of the end-members. Success rates are determined for many different synthetic images, and the method is tested on two pairs of real images, namely a Cuprite scene
taken from Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) and SpecTIR sensors, and a
Lunar Lakes scene taken from AVIRIS and Hyperion, with good results.
ETPL
DIP-120 Learning Smooth Pattern Transformation Manifolds
Abstract: Manifold models provide low-dimensional representations that are useful for processing and
analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically
transformed signals. To construct a manifold, we build a representative pattern whose transformations
accurately fit various input images. We examine two objectives of the manifold-building problem,
namely, approximation and classification. For the approximation problem, we propose a greedy method that constructs a representative pattern by selecting analytic atoms from a continuous dictionary manifold.
We present a DC (difference-of-convex) optimization scheme that is applicable to a wide range of transformation and dictionary
models, and demonstrate its application to the transformation manifolds generated by the rotation, translation, and anisotropic scaling of a reference pattern. Then, we generalize this approach to a setting
with multiple transformation manifolds, where each manifold represents a different class of signals. We
present an iterative multiple-manifold-building algorithm such that the classification accuracy is
promoted in the learning of the representative patterns. The experimental results suggest that the proposed
methods yield high accuracy in the approximation and classification of data compared with some reference methods, while the invariance to geometric transformations is achieved because of the
transformation manifold model.
ETPL
DIP-121
Modifying JPEG Binary Arithmetic Codec for Exploiting Inter/Intra-Block and DCT
Coefficient Sign Redundancies
Abstract: This article presents four modifications to the JPEG arithmetic coding (JAC) algorithm, a topic that has not been well studied before. It then compares the compression performance of the modified JPEG with JPEG
XR, the latest block-based image coding standard. We first show that the bulk of inter/intra-block
redundancy, caused by JPEG's block-based approach, can be captured by applying efficient prediction coding. We propose the following modifications to JAC to take advantage of our
prediction approach. 1) We code a totally different DC difference. 2) JAC tests a DCT coefficient by
considering its bits in the increasing order of significance for coding the most significant bit position. This causes considerable redundancy because JAC always begins with the zeroth bit. We modify this coding order
and propose alterations to the JPEG coding procedures. 3) We predict the sign of significant DCT
coefficients, a problem not previously addressed from the perspective of the JPEG decoder. 4) We reduce
the number of binary tests that JAC codes to mark end-of-block. We provide experimental results for two sets of eight-bit gray images. The first set consists of nine classical test images mostly of size 512
× 512 pixels. The second set consists of 13 images of size
2000 × 3000 pixels or more. Our modifications to JAC achieve an extraordinary amount of code reduction without introducing any loss. More specifically, when we quantize the images
using the default quantizers, our modifications reduce the total JAC code size of the images of these two
sets by about 8.9 and 10.6%, and the JPEG Huffman code size by about 16.3 and 23.4%, respectively, on average. Gains are even higher for coarsely quantized images. Finally, we compare the modified JAC
with two settings of JPEG XR, one with no block overlapping and the other with the default transform
(we denote them by JXR0 and JXR1, respectively). Our results show that, for the finest-quality rate
image coding, the modified JAC compresses the large-set images by about 5.8% more than JXR1 and by 6.7% more than JXR0, on average. We provide some rate-distortion plots on lossy coding, which
show that the modified JAC distinctly outperforms JXR0, but JXR1 beats it by a similar margin.
ETPL
DIP-122 Motion Estimation Without Integer-Pel Search
Abstract: The typical motion estimation (ME) consists of three main steps, including spatial-temporal
prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best
matched integer-pel position within a search window, is considered to be crucial for video encoding. It
occupies over 50% of the overall encoding time (when adopting the full search scheme) for software encoders, and introduces remarkable area cost, memory traffic, and power consumption to hardware
encoders. In this paper, we find that video sequences (especially high-resolution videos) can often be
encoded effectively and efficiently even without integer-pel search. This counter-intuitive phenomenon arises not only because spatial-temporal prediction and fractional-pel search are accurate enough for the
ME of many blocks. In fact, we observe that when the predicted motion vector is biased from the optimal
motion vector (mainly for boundary blocks of irregularly moving objects), it is also hard for integer-pel search to reduce the final rate-distortion cost: the deviation of reference position could be alleviated with
the fractional-pel interpolation and rate-distortion optimization techniques (e.g., adaptive macroblock
mode). Considering the decreasing proportion of boundary blocks caused by the increasing resolution of
videos, integer-pel search may be rather cost-ineffective in the era of high-resolution. Experimental results on 36 typical sequences of different resolutions encoded with x264, which is a widely-used video
encoder, comply with our analysis well. For 1080p sequences, removing the integer-pel search saves
57.9% of the overall H.264 encoding time on average (compared to the original x264 with full integer-pel
search using default parameters), while the resultant performance loss is negligible: the bit-rate increases by only 0.18% and the peak signal-to-noise ratio decreases by only 0.01 dB per frame on average.
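For context, the integer-pel search that the paper proposes to drop is, in its full-search form, a brute-force SAD minimisation over a search window. The sketch below is a generic illustration, not x264's implementation:

```python
import numpy as np

def integer_pel_search(cur_block, ref_frame, pred_pos, search_range=8):
    """Full-search integer-pel motion estimation around a predicted position.

    Returns the (y, x) reference position minimising the sum of absolute
    differences (SAD) -- the costly step the paper argues can often be
    skipped in favour of prediction plus fractional-pel refinement.
    """
    bh, bw = cur_block.shape
    H, W = ref_frame.shape
    py, px = pred_pos
    best, best_pos = np.inf, pred_pos
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = py + dy, px + dx
            if y < 0 or x < 0 or y + bh > H or x + bw > W:
                continue  # candidate falls outside the reference frame
            sad = np.abs(ref_frame[y:y + bh, x:x + bw] - cur_block).sum()
            if sad < best:
                best, best_pos = sad, (y, x)
    return best_pos, best
```

The nested loop over (2R+1)² candidates per block is exactly where the >50% encoding-time figure quoted above comes from when the full-search scheme is used.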
ETPL
DIP-123 A Protocol for Evaluating Video Trackers Under Real-World Conditions
Abstract: The absence of a commonly adopted performance evaluation framework is hampering advances in the design of effective video trackers. In this paper, we present a single-score evaluation measure and a
protocol to objectively compare trackers. The proposed measure evaluates tracking accuracy and failure,
and combines them for both summative and formative performance assessment. The proposed protocol is composed of a set of trials that evaluate the robustness of trackers on a range of test scenarios
representing several real-world conditions. The protocol is validated on a set of sequences with a
diversity of targets (head, vehicle and person) and challenges (occlusions, background clutter, pose changes and scale changes) using six state-of-the-art trackers, highlighting their strengths and weaknesses
on more than 187000 frames. The software implementing the protocol and the evaluation results are made
available online and new results can be included, thus facilitating the comparison of trackers.
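As an illustrative aside, single-score protocols of this kind typically build on a per-frame overlap score combined with a failure count. The sketch below uses the common intersection-over-union overlap as a stand-in; the paper's exact accuracy and failure definitions may differ:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes -- a standard
    per-frame tracking accuracy score."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def sequence_score(gt_boxes, trk_boxes, fail_thresh=0.0):
    """Combine per-frame accuracy with a failure count, in the spirit of
    a single summative score over a sequence (illustrative combination)."""
    scores = [iou(g, t) for g, t in zip(gt_boxes, trk_boxes)]
    failures = sum(s <= fail_thresh for s in scores)
    accuracy = sum(scores) / len(scores)
    return accuracy, failures
```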
ETPL
DIP-124 Blur and Illumination Robust Face Recognition via Set-Theoretic Characterization
Abstract: We address the problem of unconstrained face recognition from remotely acquired images. The
main factors that make this problem challenging are image degradation due to blur, and appearance
variations due to illumination and pose. In this paper, we address the problems of blur and illumination. We show that the set of all images obtained by blurring a given image forms a convex set. Based on this
set-theoretic characterization, we propose a blur-robust algorithm whose main step involves solving
simple convex optimization problems. We do not assume any parametric form for the blur kernels; however, if this information is available, it can be easily incorporated into our algorithm. Furthermore, using the low-dimensional model for illumination variations, we show that the set of all images obtained
from a face image by blurring it and by changing the illumination conditions forms a bi-convex set.
Based on this characterization, we propose a blur and illumination-robust algorithm. Our experiments on a challenging real dataset obtained in uncontrolled settings illustrate the importance of jointly modeling
blur and illumination.
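The convexity claim has a simple linear-algebra core: convolution is linear in the blur kernel, so a convex combination of two blurred images equals the image blurred with the convex combination of the kernels. A small numerical check of this property (illustrative only, not the paper's recognition algorithm):

```python
import numpy as np

def filter2d(img, k):
    """Naive 'same'-size 2-D filtering (cross-correlation) with zero padding."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * k).sum()
    return out
```

Because filtering is linear in the kernel, a * filter2d(img, k1) + (1 - a) * filter2d(img, k2) equals filter2d(img, a * k1 + (1 - a) * k2) for any a in [0, 1], which is exactly why the set of blurred images is convex.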
ETPL
DIP-125 Improved Bounds for Subband-Adaptive Iterative Shrinkage/Thresholding Algorithms
Abstract: This paper presents new methods for computing the step sizes of the subband-adaptive iterative
shrinkage-thresholding algorithms proposed by Bayram & Selesnick and Vonesch & Unser. The method
yields tighter wavelet-domain bounds of the system matrix, thus leading to improved convergence speeds. It is directly applicable to non-redundant wavelet bases, and we also adapt it for cases of redundant
frames. It turns out that the simplest and most intuitive setting for the step sizes that ignores subband
aliasing is often satisfactory in practice. We show that our methods can be used to advantage with reweighted least squares penalty functions as well as L1 penalties. We emphasize that the algorithms
presented here are suitable for performing inverse filtering on very large datasets, including 3D data,
since inversions are applied only to diagonal matrices and fast transforms are used to compute all matrix-vector products.
ETPL
DIP-126
Sparse Representation Based Image Interpolation With Nonlocal Autoregressive
Modeling
Abstract: Sparse representation is proven to be a promising approach to image super-resolution, where the
low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR)
counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is
directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models
(SRM) become less effective, because the data fidelity term fails to constrain the image local structures.
In natural images, fortunately, many nonlocal similar patches to a given patch could provide nonlocal constraint to the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM
for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and
taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less
coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image
interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing
artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM.
ETPL
DIP-127
View-Based Discriminative Probabilistic Modeling for 3D Object Retrieval and
Recognition
Abstract: In view-based 3D object retrieval and recognition, each object is described by multiple views. A
central problem is how to estimate the distance between two objects. Most conventional methods integrate the distances of view pairs across two objects as an estimation of their distance. In this paper,
we propose a discriminative probabilistic object modeling approach. It builds probabilistic models for
each object based on the distribution of its views, and the distance between two objects is defined as the upper bound of the Kullback-Leibler divergence of the corresponding probabilistic models. 3D object
retrieval and recognition is accomplished based on the distance measures. We first learn models for each
object by adaptation from a set of global models with a maximum likelihood principle. A further adaptation step is then performed to enhance the discriminative ability of the models. We conduct
experiments on the ETH 3D object dataset, the National Taiwan University 3D model dataset, and the
Princeton Shape Benchmark. We compare our approach with different methods, and experimental results
demonstrate the superiority of our approach.
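As a small illustration of the kind of model distance involved, the closed-form KL divergence between two one-dimensional Gaussians is sketched below; the paper works with richer view-distribution models and upper-bounds the divergence rather than computing a 1-D case:

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """KL divergence D(N(mu0, var0) || N(mu1, var1)) between 1-D Gaussians.

    Note the asymmetry: D(p || q) != D(q || p) in general, which is why a
    well-defined object-to-object distance needs a deliberate choice (or
    bound) rather than the raw divergence."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)
```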
ETPL
DIP-128 Robust Document Image Binarization Technique for Degraded Document Images
Abstract: Segmentation of text from badly degraded document images is a very challenging task due to
the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that
addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of
the local image contrast and the local image gradient that is tolerant to text and background variation
caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and
combined with Canny's edge map to identify the text stroke edge pixels. The document text is further
segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimum parameter
tuning. It has been tested on three public datasets that are used in the recent document image binarization
contests (DIBCO) 2009 & 2011 and handwritten-DIBCO 2010, and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, which are significantly higher than or close to those of the best-performing
methods reported in the three contests. Experiments on the Bickley diary dataset that consists of several
challenging bad quality document images also show the superior performance of our proposed method,
compared with other techniques.
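As an illustrative sketch, the local image contrast that the adaptive contrast map builds on can be computed per window as (max − min)/(max + min). The 3 × 3 window and exact normalisation below are assumptions, not the paper's precise formulation:

```python
import numpy as np

def local_contrast_map(img, eps=1e-6):
    """Local image contrast (max - min) / (max + min) over 3x3 windows.

    High values mark candidate text-stroke regions even when the absolute
    intensities drift, which is the property that makes such a map robust
    to document degradations like stains and uneven illumination."""
    pad = np.pad(img.astype(float), 1, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            out[i, j] = (win.max() - win.min()) / (win.max() + win.min() + eps)
    return out
```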
ETPL
DIP-129 Perceptual Video Coding Based on SSIM-Inspired Divisive Normalization
Abstract: We propose a perceptual video coding framework based on the divisive normalization scheme, which is found to be an effective approach to model the perceptual sensitivity of biological vision, but has
not been fully exploited in the context of video coding. At the macroblock (MB) level, we derive the
normalization factors based on the structural similarity (SSIM) index as an attempt to transform the
discrete cosine transform domain frame residuals to a perceptually uniform space. We further develop an MB level perceptual mode selection scheme and a frame level global quantization matrix optimization
method. Extensive simulations and subjective tests verify that, compared with the H.264/AVC video
coding standard, the proposed method can achieve significant gain in terms of rate-SSIM performance and provide better visual quality.
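For reference, the SSIM index underlying the normalization factors combines luminance, contrast, and structure comparisons. The single-window version below is a simplification; the coder described above evaluates it locally per macroblock:

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Global (single-window) SSIM index between two images.

    c1 and c2 are the standard stabilising constants; per-macroblock use
    would apply the same formula to local windows."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```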
ETPL
DIP-130 Hyperspectral Image Representation and Processing With Binary Partition Trees
Abstract: The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image-processing tools. This paper proposes the construction and the
processing of a new region-based hierarchical hyperspectral image representation relying on the binary
partition tree (BPT). This hierarchical region-based representation can be interpreted as a set of
hierarchical regions stored in a tree structure. Hence, the BPT succeeds in presenting: 1) the decomposition of the image in terms of coherent regions, and 2) the inclusion relations of the regions in
the scene. Based on region-merging techniques, the BPT construction is investigated by studying the
hyperspectral region models and the associated similarity metrics. Once the BPT is constructed, the fixed tree structure allows implementing efficient and advanced application-dependent techniques on it. The
application-dependent processing of BPT is generally implemented through a specific pruning of the tree.
In this paper, a pruning strategy is proposed and discussed in a classification context. Experimental
results on various hyperspectral data sets demonstrate the interest and the good performances of the BPT representation.
ETPL
DIP-131 Visually Weighted Compressive Sensing: Measurement and Reconstruction
Abstract: Compressive sensing (CS) makes it possible to more naturally create compact representations of data with respect to a desired data rate. Through wavelet decomposition, smooth and piecewise smooth
signals can be represented as sparse and compressible coefficients. These coefficients can then be
effectively compressed via the CS. Since a wavelet transform divides image information into layered
blockwise wavelet coefficients over spatial and frequency domains, visual improvement can be attained by an appropriate perceptually weighted CS scheme. We introduce such a method in this paper and
compare it with the conventional CS. The resulting visual CS model is shown to deliver improved visual
reconstructions.
ETPL
DIP-132 Context-Aware Sparse Decomposition for Image Denoising and Super-Resolution
Abstract: Image prior models based on sparse and redundant representations are attracting more and more
attention in the field of image restoration. The conventional sparsity-based methods enforce the sparsity prior on small image patches independently. Unfortunately, these works neglect the contextual information
between sparse representations of neighboring image patches. This limits the modeling capability of the
sparsity-based image prior, especially when the major structural information of the source image is lost
through severe degradation. In this paper, we utilize the contextual information of local patches (denoted as the context-aware sparsity prior) to enhance the performance of the sparsity-based
restoration method. In addition, a unified framework based on the Markov random fields model is
proposed to tune the local prior into a global one to deal with images of arbitrary size. An iterative numerical solution is presented to solve the joint problem of model parameter estimation and sparse
recovery. Finally, the experimental results on image denoising and super-resolution demonstrate the
effectiveness and robustness of the proposed context-aware method.
ETPL
DIP-133 How to SAIF-ly Boost Denoising Performance
Abstract: Spatial domain image filters (e.g., bilateral filter, non-local means, locally adaptive regression
kernel) have achieved great success in denoising. Their overall performance, however, has not generally
surpassed the leading transform-domain filters (such as BM3D). One important reason is that
spatial domain filters lack the efficiency to adaptively fine-tune their denoising strength, something that is relatively easy to do in transform-domain methods with shrinkage operators. In the pixel domain, the
smoothing strength is usually controlled globally by, for example, tuning a regularization parameter. In
this paper, we propose spatially adaptive iterative filtering (SAIF), a new strategy to control the denoising strength locally for any spatial domain method. This approach is capable of filtering local image content
iteratively using the given base filter, and the type of iteration and the iteration number are automatically
optimized with respect to estimated risk (i.e., mean-squared error). In exploiting the estimated local signal-to-noise-ratio, we also present a new risk estimator that is different from the often-employed
SURE method, and exceeds its performance in many cases. Experiments illustrate that our strategy can
significantly relax the base algorithm's sensitivity to its tuning (smoothing) parameters, and effectively
boost the performance of several existing denoising filters to generate state-of-the-art results under both simulated and practical conditions.
ETPL
DIP-134 Frozen-State Hierarchical Annealing
Abstract: There is significant interest in the synthesis of discrete-state random fields, particularly those possessing structure over a wide range of scales. However, given a model on some finest, pixellated
scale, it is computationally very difficult to synthesize both large- and small-scale structures, motivating
research into hierarchical methods. In this paper, we propose a frozen-state approach to hierarchical
modeling, in which simulated annealing is performed on each scale, constrained by the state estimates at the parent scale. This approach leads to significant advantages in both modeling flexibility and
computational complexity. In particular, a complex structure can be realized with very simple, local,
scale-dependent models, and by constraining the domain to be annealed at finer scales to only the uncertain portions of coarser scales, the approach leads to huge improvements in computational
complexity. Results are shown for a synthesis problem in porous media.
ETPL
DIP-135
Per-Colorant-Channel Color Barcodes for Mobile Applications: An Interference
Cancellation Framework
Abstract: We propose a color barcode framework for mobile phone applications by exploiting the spectral
diversity afforded by the cyan (C), magenta (M), and yellow (Y) print colorant channels commonly used
for color printing and the complementary red (R), green (G), and blue (B) channels, respectively, used for
capturing color images. Specifically, we exploit this spectral diversity to realize a three-fold increase in the data rate by encoding independent data in the C, M, and Y print colorant channels and decoding the
data from the complementary R, G, and B channels captured via a mobile phone camera. To mitigate the
effect of cross-channel interference among the print-colorant and capture color channels, we develop an algorithm for interference cancellation based on a physically-motivated mathematical model for the print
and capture processes. To estimate the model parameters required for cross-channel interference
cancellation, we propose two alternative methodologies: a pilot block approach that uses suitable
selections of colors for the synchronization blocks and an expectation maximization approach that estimates the parameters from regions encoding the data itself. We evaluate the performance of the
proposed framework using specific implementations of the framework for two of the most commonly
used barcodes in mobile applications, QR and Aztec codes. Experimental results show that the proposed
framework successfully overcomes the impact of the color interference, providing a low bit error rate and
a high decoding rate for each of the colorant channels when used with a corresponding error correction
scheme.
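At its core, cross-channel interference can be modelled linearly: each captured R, G, B value is a mixture of the printed C, M, Y layers, so decoding amounts to inverting an estimated 3 × 3 mixing matrix. The sketch below is a linear stand-in for the paper's physically-motivated model; in practice the matrix would come from the pilot-block or EM estimation described above:

```python
import numpy as np

def cancel_interference(captured_rgb, mixing):
    """Recover the C, M, Y data layers from captured R, G, B values by
    inverting a 3x3 cross-channel mixing matrix.

    captured_rgb has shape (..., 3); the inverse mixing is applied per
    pixel. The mixing matrix is assumed already estimated and invertible."""
    inv = np.linalg.inv(mixing)
    return captured_rgb @ inv.T
```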
ETPL
DIP-136 Segmented Gray-Code Kernels for Fast Pattern Matching
Abstract: The gray-code kernels (GCK) family, which has the Walsh-Hadamard transform on sliding
windows as a member, is a family of kernels that can perform image analysis efficiently using a fast
algorithm, such as the GCK algorithm. The GCK has been successfully used for pattern matching. In this paper, we propose the G4-GCK algorithm, which is more efficient than the previous algorithm in computing
GCK. The G4-GCK algorithm requires four additions per pixel for three basis vectors, independent of
transform size and dimension. Based on the G4-GCK algorithm, we then propose the segmented GCK (SegGCK). By segmenting input data into Ls parts, the SegGCK requires only four additions per pixel for 3Ls basis
vectors. Experimental results show that the proposed algorithm can significantly accelerate the full-search
equivalent pattern matching process and outperforms state-of-the-art methods.
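For context, the baseline that GCK-family methods accelerate is full-search pattern matching, which scores every window position exhaustively. Below is a brute-force sum-of-squared-differences sketch; the GCK algorithms reach the same best match far faster by rejecting most candidates from a few cheap projection values:

```python
import numpy as np

def full_search_ssd(image, pattern):
    """Brute-force full-search pattern matching by sum of squared
    differences (SSD). Returns the best (row, col) and its SSD score."""
    ph, pw = pattern.shape
    H, W = image.shape
    best, pos = np.inf, (0, 0)
    for i in range(H - ph + 1):
        for j in range(W - pw + 1):
            ssd = ((image[i:i + ph, j:j + pw] - pattern) ** 2).sum()
            if ssd < best:
                best, pos = ssd, (i, j)
    return pos, best
```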
ETPL
DIP-137 Video Processing for Human Perceptual Visual Quality-Oriented Video Coding
Abstract: We have developed a video processing method that achieves human perceptual visual quality-
oriented video coding. The patterns of moving objects are modeled by considering the limited human
capacity for spatial-temporal resolution and the visual sensory memory together, and an online moving pattern classifier is devised by using the Hedge algorithm. The moving pattern classifier is embedded in
the existing visual saliency with the purpose of providing a human perceptual video quality saliency
model. In order to apply the developed saliency model to video coding, the conventional foveation filtering method is extended. The proposed foveation filter can smooth and enhance the video signals
locally, in conformance with the developed saliency model, without causing any artifacts. The
performance evaluation results confirm that the proposed video processing method shows reliable
improvements in the perceptual quality for various sequences and at various bandwidths, compared to existing saliency-based video coding methods.
ETPL
DIP-138 Additive Log-Logistic Model for Networked Video Quality Assessment
Abstract: Modeling subjective opinions on visual quality is a challenging problem, which closely relates to many factors of human perception. In this paper, the additive log-logistic model (ALM) is proposed
to formulate such a multidimensional nonlinear problem. The log-logistic model has flexible monotonic
or nonmonotonic partial derivatives and thus is suitable to model various uni-type impairments. The proposed ALM metric adds the distortions due to each type of impairment in a log-logistic transformed
space of subjective opinions. The features can be evaluated and selected by classic statistical inference,
and the model parameters can be easily estimated. Cross validations on five Telecommunication
Standardization Sector of the International Telecommunication Union (ITU-T) subjectively-rated databases confirm that: 1) based on the same features, the ALM outperforms the support vector regression and the
logistic model in quality prediction and, 2) the resultant no-reference quality metric based on
impairment-relevant video parameters achieves high correlation with a total of 27 216 subjective opinions on 1134 video clips, even compared with existing full-reference quality metrics based on pixel
differences. The ALM metric wins the model competition of the ITU-T Study Group 12 (where the
validation databases are independent of the training databases) and thus is being put forth into ITU-T
Recommendation P.1202.2 for the consent of ITU-T.
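For reference, the log-logistic curve at the heart of the ALM is the function below; the parameterisation shown is the standard one and an assumption about the paper's exact form:

```python
def log_logistic(x, alpha, beta):
    """Log-logistic CDF F(x) = 1 / (1 + (x / alpha)^(-beta)).

    alpha is the scale (F(alpha) = 0.5) and beta the shape; for beta > 0
    the curve rises monotonically from 0 to 1, the kind of flexible
    S-shape used to map impairment levels to opinion-score distortions."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))
```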
ETPL
DIP-139
Linear Feature Separation From Topographic Maps Using Energy Density and the
Shear Transform
Abstract: Linear features are difficult to separate from the complicated background in color scanned
topographic maps, especially when the color of the linear features approximates that of the background in some
particular images. This paper presents a method, which is based on energy density and the shear
transform, for the separation of lines from the background. First, the shear transform, which augments the directional characteristics of the lines, is introduced to overcome the loss of linear information that occurs if the separation method is applied to an image in only one direction. Then
templates in the horizontal and vertical directions are built to separate lines from background on account of the fact that the energy concentration of the lines usually reaches a higher level than that of the
background in the negative image. Furthermore, the remaining grid background can be wiped off by grid-template matching. Isolated patches containing only one pixel, or fewer than ten pixels, are
removed according to the connected-region area measurement. Finally, using the union operation, the linear features obtained in different sheared images supplement each other, so the lines of the
final result are more complete. The key property of this method is its use of energy density instead
of the color information commonly used in traditional methods. The experimental results indicate that the proposed method distinguishes the linear features from the background more effectively and obtains
good results owing to its ability to change the directions of the lines with the shear transform.
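As a small illustration, the shear transform used to vary line directions is the linear map [[1, s], [0, 1]] applied to pixel coordinates; lines at oblique angles are sheared toward the horizontal or vertical templates:

```python
import numpy as np

def shear_points(points, s):
    """Apply a horizontal shear [[1, s], [0, 1]] to an array of (x, y) points.

    Varying s changes the apparent direction of a line, which is what lets
    direction-specific separation templates catch oblique linear features."""
    m = np.array([[1.0, s], [0.0, 1.0]])
    return points @ m.T
```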
ETPL
DIP-140
De-Interlacing Using Nonlocal Costs and Markov-Chain-Based Estimation of
Interpolation Methods
Abstract: A new method of de-interlacing is proposed. De-interlacing is revisited as the problem of assigning a sequence of interpolation methods (interpolators) to a sequence of missing pixels of an
interlaced frame (field). With this assumption, our de-interlacing algorithm (de-interlacer) undergoes
transitions from one interpolation method to another, as it moves from one missing pixel position to the horizontally adjacent missing pixel position in a missing row of a field. We assume a discrete countable-
state Markov-chain model on the sequence of interpolators (Markov-chain states) which are selected from
a user-defined set of candidate interpolators. An estimation of the optimum sequence of interpolators with the aforementioned Markov-chain model requires the definition of an efficient cost function as well as a
global optimization technique. Our algorithm introduces, for the first time, a nonlocal cost (NLC)
scheme. The proposed algorithm uses the NLC to not only measure the fitness of an interpolator at a
missing pixel position, but also to derive an approximation for the transition matrix (TM) of the Markov-chain of interpolators. The TM in our algorithm is frame-variate, i.e., the algorithm updates the
TM for each frame automatically. The algorithm finally uses a Viterbi algorithm to find the global
optimum sequence of interpolators given the defined cost function and the neighboring original pixels. Next, we introduce a new MAP-based formulation for the estimation of the sequence of
interpolators, this time not by estimating the best sequence of interpolators but by successive estimations
of the best interpolator at each missing pixel using the Forward-Backward algorithm. Simulation results
prove that, while competitive with each other on different test sequences, the proposed methods (one using Viterbi and the other Forward-Backward algorithm) are superior to state-of-the-art de-interlacing
algorithms proposed recently. Finally, we propose motion-compensated versions of our algorithm based
on optical flow computation methods and discuss how they can improve the proposed algorithm.
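The Viterbi step described above is a standard minimum-cost dynamic program over interpolator states. A generic sketch follows; the local costs and transition penalties here are placeholders for the paper's NLC-derived values:

```python
import numpy as np

def viterbi(costs, trans):
    """Minimum-cost state sequence by dynamic programming.

    costs[t][s]: local cost of choosing interpolator s at missing pixel t.
    trans[s][s']: penalty for switching from interpolator s to s'
    (in the paper this comes from the Markov-chain transition matrix)."""
    T, S = costs.shape
    dp = costs[0].copy()                     # best cost ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        cand = dp[:, None] + trans           # S x S: previous -> current
        back[t] = cand.argmin(axis=0)
        dp = cand.min(axis=0) + costs[t]
    path = [int(dp.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```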
ETPL
DIP-141 Pose-Invariant Face Recognition Using Markov Random Fields
Abstract: One of the key challenges for current face recognition techniques is how to handle pose
variations between the probe and gallery face images. In this paper, we present a method for reconstructing the virtual frontal view from a given nonfrontal face image using Markov random fields
(MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input
face image is divided into a grid of overlapping patches, and a globally optimal set of local warps is
estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed
efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle
illumination variations. The problem of finding the optimal warps is then formulated as a discrete
labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually
selected facial landmarks nor head pose estimation. In order to improve the performance of our pose
normalization method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to
demonstrate the effectiveness of the proposed approach.
ETPL
DIP-142 Objective-Guided Image Annotation
Abstract: Automatic image annotation, which is usually formulated as a multi-label classification
problem, is one of the major tools used to enhance the semantic understanding of web images. Many
multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from
being practical. On the other hand, specific measures are usually designed to evaluate how well one
annotation method performs for a specific objective or application, but most image annotation methods do not consider optimizing these measures and are thus inevitably trapped in suboptimal
performance of these objective-specific measures. To address this issue, we first summarize a variety of
objective-guided performance measures under a unified representation. Our analysis reveals that macro-
averaging measures are very sensitive to infrequent keywords, and the Hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly
optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first
present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate
these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to
the analysis of various measures and the high time complexity of optimizing micro-averaging measures,
in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely
used multi-label datasets, and demonstrate the superior performance of our proposed method over state-
of-the-art baseline methods in terms of example-based measures on four image annotation datasets.
ETPL
DIP-143 Multiview Coding Mode Decision With Hybrid Optimal Stopping Model
Abstract: In a generic decision process, optimal stopping theory aims to achieve a good tradeoff between
decision performance and time consumed, with the advantages of theoretical decision-making and predictable decision performance. In this paper, optimal stopping theory is employed to develop an
effective hybrid model for the mode decision problem, which aims to theoretically achieve a good
tradeoff between the two interrelated measurements in mode decision, namely computational complexity
reduction and rate-distortion degradation. The proposed hybrid model is implemented and examined with a multiview encoder. To support the model and further promote coding performance, the multiview
coding mode characteristics, including predicted mode probability and estimated coding time, are jointly
investigated with inter-view correlations. Exhaustive experimental results with a wide range of video resolutions reveal the efficiency and robustness of our method, with high decision accuracy, negligible
computational overhead, and almost intact rate-distortion performance compared to the original encoder.
ETPL
DIP-144 Joint Framework for Motion Validity and Estimation Using Block Overlap
Abstract: This paper presents a block-overlap-based validity metric for use as a measure of motion vector
(MV) validity and to improve the quality of the motion field. In contrast to other validity metrics in the
literature, the proposed metric is not sensitive to image features and does not require the use of
neighboring MVs or manual thresholds. Using a hybrid de-interlacer, it is shown that the proposed metric outperforms other block-based validity metrics in the literature. To help regularize the ill-posed nature of
motion estimation, the proposed validity metric is also used as a regularizer in an energy minimization
framework to determine the optimal MV. Experimental results show that the proposed energy minimization framework outperforms several existing motion estimation methods in the literature in
terms of MV and interpolation quality. For interpolation quality, our algorithm outperforms all other
block-based methods as well as several complex optical flow methods. In addition, it is one of the fastest
implementations at the time of this writing.
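One plausible reading of a block-overlap consistency check is sketched below: if the motion-compensated blocks of a frame tile the previous frame without gaps or double coverage, the motion field is consistent; conflicting vectors produce coverage deviations. This is a toy stand-in for the paper's metric, with all sizes and the scoring rule chosen for illustration:

```python
import numpy as np

def block_overlap_validity(mvs, block=8, frame_shape=(32, 32)):
    """Toy block-overlap consistency score for a grid of motion vectors.

    mvs: (By, Bx, 2) integer motion vectors (dy, dx), one per block.
    Returns a (By, Bx) map: 0 means the compensated blocks tile the frame
    perfectly (consistent motion); larger values indicate gaps or multiple
    overlaps (suspect vectors).
    """
    H, W = frame_shape
    By, Bx = mvs.shape[:2]
    cover = np.zeros((H, W))
    rects = []
    for by in range(By):
        for bx in range(Bx):
            dy, dx = mvs[by, bx]
            y0 = np.clip(by * block + dy, 0, H - block)
            x0 = np.clip(bx * block + dx, 0, W - block)
            rects.append((y0, x0))
            cover[y0:y0 + block, x0:x0 + block] += 1
    validity = np.empty((By, Bx))
    for i, (y0, x0) in enumerate(rects):
        # deviation of coverage from 1 inside this block's compensated footprint
        validity[i // Bx, i % Bx] = np.abs(
            cover[y0:y0 + block, x0:x0 + block] - 1).mean()
    return validity
```

Note that the score uses neither image features nor neighboring-MV comparisons, mirroring the property the abstract highlights.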
ETPL
DIP-145 Nonlocally Centralized Sparse Representation for Image Restoration
Abstract: Sparse representation models code an image patch as a linear combination of a few atoms
chosen from an over-complete dictionary, and they have shown promising results in various image restoration applications. However, due to the degradation of the observed image (e.g., noisy, blurred,
and/or down-sampled), the sparse representations by conventional models may not be accurate enough
for a faithful reconstruction of the original image. To improve the performance of sparse representation-
based image restoration, in this paper the concept of sparse coding noise is introduced, and the goal of image restoration turns to how to suppress the sparse coding noise. To this end, we exploit the image
nonlocal self-similarity to obtain good estimates of the sparse coding coefficients of the original image,
and then centralize the sparse coding coefficients of the observed image to those estimates. The so-called nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse
representation model, while our extensive experiments on various types of image restoration problems,
including denoising, deblurring and super-resolution, validate the generality and state-of-the-art performance of the proposed NCSR algorithm.
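The centralization step of NCSR has a simple per-coefficient form when the dictionary is orthogonal: soft-threshold the deviation of each sparse code from its nonlocal estimate, then re-center. The sketch below assumes that simplified setting; the names and the threshold `tau` are illustrative:

```python
import numpy as np

def ncsr_shrink(coeffs, nonlocal_est, tau):
    """Centralized shrinkage: pull sparse codes toward nonlocal estimates.

    Per coefficient, solves argmin_a (a - c)^2 + 2*tau*|a - b|, i.e.
    soft-thresholding of (c - b) followed by re-centering at b, so the
    "sparse coding noise" (deviation from the nonlocal estimate) is
    suppressed rather than the codes themselves.
    """
    d = coeffs - nonlocal_est
    return nonlocal_est + np.sign(d) * np.maximum(np.abs(d) - tau, 0.0)
```

Setting `nonlocal_est` to zero recovers ordinary soft-thresholding, which is why NCSR is "as simple as the standard sparse representation model".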
ETPL
DIP-146 Image Segmentation Using a Sparse Coding Model of Cortical Area V1
Abstract: Algorithms that encode images using a sparse set of basis functions have previously been
shown to explain aspects of the physiology of the primary visual cortex (V1), and have been used for applications such as image compression, restoration, and classification. Here, a sparse coding algorithm,
that has previously been used to account for the response properties of orientation tuned cells in primary
visual cortex, is applied to the task of perceptually salient boundary detection. The proposed algorithm is currently limited to using only intensity information at a single scale. However, it is shown to outperform the current state-of-the-art image segmentation method (Pb) when this method is also restricted
to using the same information.
ETPL
DIP-147 Circular Reranking for Visual Search
Abstract: Search reranking is regarded as a common way to boost retrieval precision. The problem
nevertheless is not trivial especially when there are multiple features or modalities to be considered for
search, which often happens in image and video retrieval. This paper proposes a new reranking algorithm, named circular reranking, which reinforces the mutual exchange of information across multiple modalities
for improving search performance, following the philosophy that a strongly performing modality can learn
from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically,
circular reranking conducts multiple runs of random walks through exchanging the ranking scores among different features in a cyclic manner. Unlike the existing techniques, the reranking procedure encourages
interaction among modalities to seek a consensus that is useful for reranking. In this paper, we study
several properties of circular reranking, including how and which order of information propagation
should be configured to fully exploit the potential of modalities for reranking. Encouraging results are
reported for both image and video retrieval on Microsoft Research Asia Multimedia image dataset and
TREC Video Retrieval Evaluation 2007-2008 datasets, respectively.
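The cyclic random-walk idea can be sketched as alternating score propagation through one modality's affinity graph at a time. This is a minimal illustration, not the paper's exact update; the affinity matrices and the damping factor `alpha` are hypothetical inputs:

```python
import numpy as np

def circular_rerank(affinities, init_scores, alpha=0.8, iters=20):
    """Cyclic random-walk reranking over multiple modality graphs.

    affinities: list of (N, N) row-stochastic similarity matrices, one per
                modality (e.g., one built from visual features, one from text).
    init_scores: (N,) initial ranking scores from the base search.
    Each step propagates the current scores through the next modality's
    graph in a cycle, so modalities exchange ranking information.
    """
    s = init_scores.astype(float).copy()
    M = len(affinities)
    for t in range(iters):
        W = affinities[t % M]              # visit modalities cyclically
        s = alpha * W @ s + (1 - alpha) * init_scores
    return s
```

With identity affinities the scores are fixed; a smoothing affinity matrix instead diffuses a strong score onto related items, which is the reranking effect.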
ETPL
DIP-148
On Random Field Completely Automated Public Turing Test to Tell Computers and
Humans Apart Generation
Abstract: Herein, we propose generating CAPTCHAs through random field simulation and give a novel,
effective and efficient algorithm to do so. Indeed, we demonstrate that sufficient information about word
tests for easy human recognition is contained in the site marginal probabilities and the site-to-nearby-site covariances and that these quantities can be embedded directly into certain conditional probabilities,
designed for effective simulation. The CAPTCHAs are then partial random realizations of the random
CAPTCHA word. We start with an initial random field (e.g., randomly scattered letter pieces) and use Gibbs resampling to re-simulate portions of the field repeatedly using these conditional probabilities until
the word becomes human-readable. The residual randomness from the initial random field together with
the random implementation of the CAPTCHA word provides significant resistance to attack. This results in a CAPTCHA that is unrecognizable to modern optical character recognition but is recognized about
95% of the time in a human readability study.
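A single Gibbs resampling sweep of the kind the abstract describes can be sketched on a binary field. The conditional below, mixing a pull toward a target word image (the marginals) with agreement among 4-neighbors (the covariances), is an assumed illustrative form, not the paper's fitted probabilities:

```python
import numpy as np

def gibbs_sweep(field, target, a=5.0, b=0.5, rng=None):
    """One Gibbs resampling sweep of a binary random field.

    Each site is redrawn from a conditional probability combining agreement
    with a target word image (site marginals) and agreement with its four
    neighbors (site-to-nearby-site covariances). Repeated sweeps move a
    random initial field toward a human-readable realization of the word.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = field.shape
    for i in range(H):
        for j in range(W):
            nb = (field[(i - 1) % H, j] + field[(i + 1) % H, j]
                  + field[i, (j - 1) % W] + field[i, (j + 1) % W])
            # log-odds of the site being 1: target pull + neighbor pull
            logit = a * (2 * target[i, j] - 1) + b * (2 * nb - 4)
            p1 = 1.0 / (1.0 + np.exp(-logit))
            field[i, j] = 1 if rng.random() < p1 else 0
    return field
```

Residual randomness survives because each site is sampled, not thresholded, which is the source of the attack resistance the abstract mentions.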
ETPL
DIP-149 Active Contours Driven by the Salient Edge Energy Model
Abstract: In this brief, we present a new indicator, i.e., salient edge energy, for guiding a given contour robustly and precisely toward the object boundary. Specifically, we define the salient edge energy by
exploiting the higher order statistics on the diffusion space, and incorporate it into a variational level set
formulation with the local region-based segmentation energy for solving the problem of curve evolution.
In contrast to most previous methods, the proposed salient edge energy allows the curve to find only significant local minima relevant to the object boundary even in the noisy and cluttered background.
Moreover, the segmentation performance derived from our new energy is less sensitive to the size of local
windows compared with other recently developed methods, owing to the ability of our energy function to suppress diverse clutters. The proposed method has been tested on various images, and experimental
results show that the salient edge energy effectively drives the active contour both qualitatively and
quantitatively compared to various state-of-the-art methods.
ETPL
DIP-150 Bayesian Saliency via Low and Mid Level Cues
Abstract: Visual saliency detection is a challenging problem in computer vision, but one of great
importance with numerous applications. In this paper, we propose a novel model for bottom-up saliency
within the Bayesian framework by exploiting low and mid level cues. In contrast to most existing methods that operate directly on low level cues, we propose an algorithm in which a coarse saliency
region is first obtained via a convex hull of interest points. We also analyze the saliency information with
mid level visual cues via superpixels. We present a Laplacian sparse subspace clustering method to group superpixels with local features, and analyze the results with respect to the coarse saliency region to
compute the prior saliency map. We use the low level visual cues based on the convex hull to compute
the observation likelihood, thereby facilitating inference of Bayesian saliency at each pixel. Extensive
experiments on a large data set show that our Bayesian saliency model performs favorably against the state-of-the-art algorithms.
ETPL
DIP-151 Exemplar-Based Image Inpainting Using Multiscale Graph Cuts
Abstract: We present a novel formulation of exemplar-based inpainting as a global energy optimization
problem, written in terms of the offset map. The proposed energy function combines a data attachment
term that ensures the continuity of reconstruction at the boundary of the inpainting domain with a smoothness term that ensures a visually coherent reconstruction inside the hole. This formulation is
adapted to obtain a global minimum using the graph cuts algorithm. To reduce the computational
complexity, we propose an efficient multiscale graph cuts algorithm. To compensate for the loss of information at low-resolution levels, we use a feature representation computed at the original image
resolution. This alleviates the ambiguity induced by comparing only color information when
the image is represented at low resolution levels. Our experiments show how well the proposed algorithm
performs compared with other recent algorithms.
ETPL
DIP-152 Activity Recognition Using a Mixture of Vector Fields
Abstract: The analysis of moving objects in image sequences (video) has been one of the major themes in
computer vision. In this paper, we focus on video-surveillance tasks; more specifically, we consider pedestrian trajectories and propose modeling them through a small set of motion/vector fields together
with a space-varying switching mechanism. Despite the diversity of motion patterns that can occur in a
given scene, we show that it is often possible to find a relatively small number of typical behaviors, and
model each of these behaviors by a “simple” motion field. We increase the expressiveness of the formulation by allowing the trajectories to switch from one motion field to another, in a space-dependent
manner. We present an expectation-maximization algorithm to learn all the parameters of the model, and
apply it to trajectory classification tasks. Experiments with both synthetic and real data support the claims about the performance of the proposed approach.
ETPL
DIP-153 Low-Resolution Face Tracker Robust to Illumination Variations
Abstract: In many practical video surveillance applications, the faces acquired by outdoor cameras are of low resolution and are affected by uncontrolled illumination. Although significant efforts have been made
to facilitate face tracking or illumination normalization in unconstrained videos, the approaches
developed may not be effective in video surveillance applications. This is because: 1) a low-resolution face contains limited information, and 2) major changes in illumination on a small region of the face
make the tracking ineffective. To overcome this problem, this paper proposes to perform tracking in an
illumination-insensitive feature space, called the gradient logarithm field (GLF) feature space. The GLF feature mainly depends on the intrinsic characteristics of a face and is only marginally affected by the
lighting source. In addition, the GLF feature is a global feature and does not depend on a specific face
model, and thus is effective in tracking low-resolution faces. Experimental results show that the proposed
GLF-based tracker works well under significant illumination changes and outperforms many state-of-the-art tracking algorithms.
ETPL
DIP-154 Local Directional Number Pattern for Face Analysis: Face and Expression Recognition
Abstract: This paper proposes a novel local feature descriptor, local directional number pattern (LDN), for face analysis, i.e., face and expression recognition. LDN encodes the directional information of the
face's textures (i.e., the texture's structure) in a compact way, producing a more discriminative code than
current methods. We compute the structure of each micro-pattern with the aid of a compass mask that
extracts directional information, and we encode such information using the prominent direction indices (directional numbers) and sign, which allows us to distinguish among similar structural patterns that have
different intensity transitions. We divide the face into several regions, and extract the distribution of the
LDN features from them. Then, we concatenate these features into a feature vector, and we use it as a
face descriptor. We perform several experiments in which our descriptor performs consistently under
illumination, noise, expression, and time lapse variations. Moreover, we test our descriptor with different
masks to analyze its performance in different face analysis tasks.
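The direction-plus-sign encoding behind LDN can be illustrated per pixel. The paper uses Kirsch-style compass masks; as a hedged stand-in, the sketch below uses plain neighbor differences in eight directions and packs the indices of the strongest positive and strongest negative responses into one code:

```python
import numpy as np

def ldn_codes(img):
    """Toy LDN-style code map for a grayscale image.

    For each pixel, take the differences to its 8 neighbors, then combine
    the direction of the strongest increase with the direction of the
    strongest decrease into a single 6-bit-style code. (Simplified:
    neighbor differences stand in for the compass-mask responses.)
    """
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    resp = np.stack([np.roll(img, (-dy, -dx), axis=(0, 1)) - img
                     for dy, dx in offs])
    top = resp.argmax(axis=0)   # direction of strongest increase
    bot = resp.argmin(axis=0)   # direction of strongest decrease
    return (top * 8 + bot).astype(np.uint8)
```

A face descriptor would then histogram these codes per region and concatenate the histograms, as the abstract describes.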
ETPL
DIP-155 Regularized Robust Coding for Face Recognition
Abstract: Recently the sparse representation based classification (SRC) has been proposed for robust face
recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training
samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution,
which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile,
the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could
robustly regress a given signal with regularized regression coefficients. By assuming that the coding
residual and the coding coefficients are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized
robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on
representative face databases demonstrate that the RRC is much more effective and efficient than state-
of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.
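The iteratively reweighted idea behind IR3C can be sketched as alternating between robust pixel weights and a weighted regularized regression. This is a generic IRLS sketch under a Cauchy-like weight function, not the paper's derived weights; `lam` and `delta` are illustrative parameters:

```python
import numpy as np

def ir3c(D, y, lam=0.1, iters=10, delta=1e-2):
    """Iteratively reweighted regularized coding (sketch of the IR3C idea).

    D: (m, n) dictionary of training samples; y: (m,) test signal.
    Pixels with large residuals (e.g., occluded or corrupted) receive
    small weights, so they influence the regularized regression less at
    the next iteration.
    """
    n = D.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        r = y - D @ x
        w = 1.0 / (1.0 + (r / delta) ** 2)       # robust, Cauchy-like weights
        Wd = D * w[:, None]
        # weighted regularized normal equations: (D^T W D + lam I) x = D^T W y
        x = np.linalg.solve(D.T @ Wd + lam * np.eye(n), Wd.T @ y)
    return x
```

On clean data the weights converge to one and the solution approaches ordinary ridge regression; gross outliers are progressively down-weighted instead of skewing the fit.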
ETPL
DIP-156 Exploration of Optimal Many-Core Models for Efficient Image Segmentation
Abstract: Image segmentation plays a crucial role in numerous biomedical imaging applications, assisting clinicians or health care professionals with diagnosis of various diseases using scientific data. However,
its high computational complexities require substantial amount of time and have limited their
applicability. Research has thus focused on parallel processing models that support biomedical image
segmentation. In this paper, we present analytical results of the design space exploration of many-core processors for efficient fuzzy c-means (FCM) clustering, which is widely used in many medical image segmentation tasks. We quantitatively evaluate the impact of varying the number of processing elements (PEs)
and the amount of local memory, for a fixed image size, on system performance and efficiency using architectural and workload simulations. Experimental results indicate that PEs=4,096 provides the most
efficient operation for the FCM algorithm with four clusters, while PEs=1,024 and PEs=4,096 yield the
highest area efficiency and energy efficiency, respectively, for three clusters.
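The FCM kernel that such many-core designs parallelize alternates between membership and center updates. Below is a standard, self-contained sketch (the center initialization heuristic is an assumption for reproducibility, not from the paper):

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50):
    """Fuzzy c-means clustering.

    X: (N, d) pixel feature vectors; c: clusters; m: fuzzifier (> 1).
    Returns (centers, memberships) where memberships U is (N, c) and each
    row sums to 1.
    """
    # initialize centers spread along the first feature (simple heuristic)
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, c).astype(int)]].astype(float)
    for _ in range(iters):
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
        # center update: membership-weighted means
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return centers, U
```

Both updates are embarrassingly parallel over pixels, which is why the PE count and per-PE local memory dominate the design space the abstract explores.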
ETPL
DIP-157 Active Contour-Based Visual Tracking by Integrating Colors, Shapes, and Motions
Abstract: In this paper, we present a framework for active contour-based visual tracking using level sets.
The main components of our framework include contour-based tracking initialization, color-based
contour evolution, adaptive shape-based contour evolution for non-periodic motions, dynamic shape-based contour evolution for periodic motions, and the handling of abrupt motions. For the initialization of
contour-based tracking, we develop an optical flow-based algorithm for automatically initializing
contours at the first frame. For the color-based contour evolution, Markov random field theory is used to measure correlations between values of neighboring pixels for posterior probability estimation. For
adaptive shape-based contour evolution, the global shape information and the local color information are
combined to hierarchically evolve the contour, and a flexible shape updating model is constructed. For
the dynamic shape-based contour evolution, a shape mode transition matrix is learnt to characterize the temporal correlations of object shapes. For the handling of abrupt motions, particle swarm optimization is
adopted to capture the global motion which is applied to the contour in the current frame to produce an
initial contour in the next frame.
ETPL
DIP-158 Image Quality Assessment Using Multi-Method Fusion
Abstract: A new methodology for objective image quality assessment (IQA) with multi-method fusion
(MMF) is presented in this paper. The research is motivated by the observation that there is no single method that can give the best performance in all situations. To achieve MMF, we adopt a regression
approach. The new MMF score is set to be the nonlinear combination of scores from multiple methods
with suitable weights obtained by a training process. In order to improve the regression results further, we divide distorted images into three to five groups based on the distortion types and perform regression
within each group, which is called “context-dependent MMF” (CD-MMF). One task in CD-MMF is to
determine the context automatically, which is achieved by a machine learning approach. To further reduce the complexity of MMF, we apply algorithms to select a small subset from the candidate
method set. The result is very good even if only three quality assessment methods are included in the
fusion process. The proposed MMF method using support vector regression is shown to outperform a
large number of existing IQA methods by a significant margin when being tested in six representative databases.
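The core fusion step, learning a combination of per-method scores against subjective quality, can be sketched with a simple regression. Ridge regression stands in here for the support vector regression the paper uses; the data layout is an assumption:

```python
import numpy as np

def train_fusion(scores, mos, lam=1e-3):
    """Learn fusion weights mapping per-method IQA scores to quality.

    scores: (N, K) scores from K IQA methods on N training images.
    mos: (N,) subjective mean-opinion scores.
    Fits an affine model by ridge regression (a stand-in for SVR).
    """
    A = np.hstack([scores, np.ones((len(scores), 1))])   # append bias column
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ mos)
    return w

def fuse(scores, w):
    """Apply learned fusion weights to new per-method scores."""
    return np.hstack([scores, np.ones((len(scores), 1))]) @ w
```

Context-dependent MMF would train one such model per distortion group and pick the model using the learned context classifier.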
ETPL
DIP-159 Robust Radial Face Detection for Omnidirectional Vision
Abstract: Bio-inspired and non-conventional vision systems are highly researched topics. Among them, omnidirectional vision systems have demonstrated their ability to significantly improve the geometrical
interpretation of scenes. However, few researchers have investigated how to perform object detection
with such systems. The existing approaches require a geometrical transformation prior to the
interpretation of the picture. In this paper, we investigate what must be taken into account and how to process omnidirectional images provided by the sensor. We focus our research on face detection and
highlight the fact that particular attention should be paid to the descriptors in order to successfully
perform face detection on omnidirectional images. We demonstrate that this choice is critical to obtaining high detection rates. Our results imply that the adaptation of existing object-detection frameworks,
designed for perspective images, should be focused on the choice of appropriate image descriptors in the
design of the object-detection pipeline.
ETPL
DIP-160 Optimized 3D Watermarking for Minimal Surface Distortion
Abstract: This paper proposes a new approach to 3D watermarking by ensuring the optimal preservation
of mesh surfaces. A new 3D surface preservation function metric is defined, consisting of the distance of a vertex displaced by watermarking to the original surface, the distance to the watermarked object surface, and the actual vertex displacement. The proposed method is statistical, blind, and robust. Minimal surface
distortion according to the proposed function metric is enforced during the statistical watermark
embedding stage using the Levenberg-Marquardt optimization method. A study of the watermark code crypto-security is provided for the proposed methodology. According to the experimental results, the
proposed methodology has high robustness against the common mesh attacks while preserving the
original object surface during watermarking.
ETPL
DIP-161
Approximate Least Trimmed Sum of Squares Fitting and Applications in Image
Analysis
Abstract: The least trimmed sum of squares (LTS) regression estimation criterion is a robust statistical
method for model fitting in the presence of outliers. Compared with the classical least squares estimator,
which uses the entire data set for regression and is consequently sensitive to outliers, LTS identifies the outliers and fits to the remaining data points for improved accuracy. Exactly solving an LTS problem is
NP-hard, but as we show here, LTS can be formulated as a concave minimization problem. Since it is
usually tractable to globally solve a convex minimization or concave maximization problem in
polynomial time, inspired by , we instead solve the approximate complementary problem of LTS, which is
convex minimization. We show that this complementary problem can be efficiently solved as a second
order cone program. We thus propose an iterative procedure to approximately solve the original LTS problem. Our extensive experiments demonstrate that the proposed method is robust, efficient and
scalable in dealing with problems where data are contaminated with outliers. We show several
applications of our method in image analysis.
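A common way to approximate LTS, simpler than the second-order cone formulation of the paper, is to alternate between least-squares fitting and keeping the best-fitting fraction of points. The sketch below illustrates the trimming idea only:

```python
import numpy as np

def lts_fit(X, y, keep=0.8, iters=20):
    """Approximate least-trimmed-squares line fitting by alternation.

    Repeatedly fit ordinary least squares, then refit on the `keep`
    fraction of points with the smallest residuals, so gross outliers are
    excluded from the regression. A heuristic, not the paper's convex
    complementary formulation.
    """
    A = np.hstack([X, np.ones((len(X), 1))])   # affine design matrix
    idx = np.arange(len(y))
    h = int(keep * len(y))
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        r = np.abs(A @ w - y)
        idx = np.argsort(r)[:h]                # keep best-fitting points
    return w
```

On data with a few gross outliers, the trimmed fit recovers the inlier model that ordinary least squares would miss.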
ETPL
DIP-162 Design of Low-Complexity High-Performance Wavelet Filters for Image Analysis
Abstract: This paper addresses the construction of a family of wavelets based on halfband polynomials.
An algorithm is proposed that ensures the maximum number of zeros for a desired length of analysis and synthesis
filters. We start with the coefficients of the polynomial and then use a generalized matrix formulation method to construct the halfband filter polynomial. The designed wavelets are efficient and give
acceptable levels of peak signal-to-noise ratio when used for image compression. Furthermore, these
wavelets give satisfactory recognition rates when used for feature extraction. Simulation results show that the designed wavelets are effective and more efficient than the existing standard wavelets.
ETPL
DIP-163
Noise Reduction Based on Partial-Reference, Dual-Tree Complex Wavelet Transform
Shrinkage
Abstract: This paper presents a novel way to reduce noise introduced or exacerbated by image
enhancement methods, in particular (but not only) algorithms based on the random spray sampling technique. Owing to the nature of sprays, output images of spray-based methods tend to exhibit noise with
unknown statistical distribution. To avoid inappropriate assumptions on the statistical characteristics of
noise, a different assumption is made: the non-enhanced image is considered to be either free of noise or affected by non-perceivable levels of noise. Taking advantage of the higher sensitivity of the human
visual system to changes in brightness, the analysis can be limited to the luma channel of both the non-
enhanced and enhanced image. Also, given the importance of directional content in human vision, the
analysis is performed through the dual-tree complex wavelet transform (DTWCT). Unlike the discrete wavelet transform, the DTWCT allows for distinction of data directionality in the transform space. For
each level of the transform, the standard deviation of the non-enhanced image coefficients is computed
across the six orientations of the DTWCT and then normalized. The result is a map of the directional structures present in the non-enhanced image. This map is then used to shrink the coefficients of the
enhanced image. The shrunk coefficients and the coefficients from the non-enhanced image are then
mixed according to data directionality. Finally, a noise-reduced version of the enhanced image is computed via the inverse transforms. A thorough numerical analysis of the results has been performed in
order to confirm the validity of the proposed approach.
ETPL
DIP-164 Hessian Schatten-Norm Regularization for Linear Inverse Problems
Abstract: We introduce a novel family of invariant, convex, and non-quadratic functionals that we employ to derive regularized solutions of ill-posed linear inverse imaging problems. The proposed
regularizers involve the Schatten norms of the Hessian matrix, which are computed at every pixel of the
image. They can be viewed as second-order extensions of the popular total-variation (TV) semi-norm since they satisfy the same invariance properties. Meanwhile, by taking advantage of second-order
derivatives, they avoid the staircase effect, a common artifact of TV-based reconstructions, and perform
well for a wide range of applications. To solve the corresponding optimization problems, we propose an
algorithm that is based on a primal-dual formulation. A fundamental ingredient of this algorithm is the projection of matrices onto Schatten norm balls of arbitrary radius. This operation is performed efficiently
based on a direct link we provide between vector projections onto norm balls and matrix projections onto
Schatten norm balls. Finally, we demonstrate the effectiveness of the proposed methods through
experimental results on several inverse imaging problems with real and simulated data.
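The vector-to-matrix projection link the abstract mentions can be made concrete for the Schatten-1 (nuclear) norm: projecting a matrix onto the nuclear-norm ball reduces to projecting its singular values onto the l1 ball. A minimal sketch (the radius and matrices are illustrative):

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a nonnegative vector onto the l1 ball."""
    if v.sum() <= radius:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    # largest index rho with u_rho above the water level (Duchi et al.)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_nuclear_ball(M, radius):
    """Project a matrix onto the Schatten-1 (nuclear) norm ball.

    Computes the SVD, projects the singular values onto the l1 ball, and
    rebuilds the matrix - the vector/matrix projection link the paper
    exploits inside its primal-dual algorithm.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, radius)) @ Vt
```

Other Schatten norms follow the same pattern with the corresponding vector-norm-ball projection applied to the singular values.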
ETPL
DIP-165 Structured Sparse Error Coding for Face Recognition With Occlusion
Abstract: Face recognition with occlusion is common in the real world. Inspired by the works of
structured sparse representation, we try to explore the structure of the error incurred by occlusion from
two aspects: the error morphology and the error distribution. Since human beings recognize the occlusion mainly according to its region shape or profile without knowing accurately what the occlusion is, we
argue that the shape of the occlusion is also an important feature. We propose a morphological graph
model to describe the morphological structure of the error. Due to the uncertainty of the occlusion, the
distribution of the error incurred by occlusion is also uncertain. However, we observe that the unoccluded part and the occluded part of the error, measured by the correntropy-induced metric, each follow an exponential
distribution. Incorporating the two aspects of the error structure, we propose the structured
sparse error coding for face recognition with occlusion. Our extensive experiments demonstrate that the proposed method is more stable and has a higher breakdown point in dealing with occlusion problems
in face recognition compared to the related state-of-the-art methods, especially in extreme
situations such as high-level occlusion and low feature dimension.
ETPL
DIP-166
Accurate Multiple View 3D Reconstruction Using Patch-Based Stereo for Large-Scale
Scenes
Abstract: In this paper, we propose a depth-map merging based multiple view stereo method for large-
scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient
patch-based stereo matching process is used to generate depth-map at each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to
state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with
high computational efficiency. Besides, the proposed method could be easily parallelized at image level, i.e., each depth-map is computed individually, which makes it suitable for large-scale scene
reconstruction with high resolution images. The accuracy and efficiency of the proposed method are
evaluated quantitatively on benchmark data and qualitatively on large data sets.
ETPL
DIP-167 Mixed-Domain Edge-Aware Image Manipulation
Abstract: This paper presents a novel approach to edge-aware image manipulation. Our method processes
a Gaussian pyramid from coarse to fine, and at each level, applies a nonlinear filter bank to the
neighborhood of each pixel. Outputs of these spatially-varying filters are merged using global optimization. The optimization problem is solved using an explicit mixed-domain (real space and DCT
transform space) solution, which is efficient, accurate, and easy-to-implement. We demonstrate
applications of our method to a set of problems, including detail and contrast manipulation, HDR compression, nonphotorealistic rendering, and haze removal.
ETPL
DIP-168 Monocular Depth Ordering Using T-Junctions and Convexity Occlusion Cues
Abstract: This paper proposes a system that relates objects in an image using occlusion cues and arranges
them according to depth. The system does not rely on a priori knowledge of the scene structure and focuses on detecting special points, such as T-junctions and highly convex contours, to infer the depth
relationships between objects in the scene. The system makes extensive use of the binary partition tree as a hierarchical region-based image representation, together with a new approach for candidate T-junction estimation. Since some regions may not involve T-junctions, occlusion is also detected by examining
convex shapes on region boundaries. Combining T-junctions and convexity leads to a system which only
relies on low-level depth cues and does not use semantic information. Nevertheless, it performs on par with or better than the state-of-the-art while not assuming any particular type of scene. As an extension of the
automatic depth ordering system, a semi-automatic approach is also proposed. If the user provides the
depth order for a subset of regions in the image, the system is able to easily integrate this user information
into the final depth order for the complete image. For some applications, user interaction can naturally be integrated, improving the quality of the automatically generated depth map.
ETPL
DIP-169
Perceptual Full-Reference Quality Assessment of Stereoscopic Images by Considering
Binocular Visual Characteristics
Abstract: Perceptual quality assessment is a challenging issue in 3D signal processing research. It is
important to study the 3D signal directly rather than simply extending 2D metrics to the 3D case, as in some previous studies. In this paper, we propose a new perceptual full-reference quality
assessment metric of stereoscopic images by considering the binocular visual characteristics. The major
technical contribution of this paper is that the binocular perception and combination properties are considered in quality assessment. To be more specific, we first perform left-right consistency checks and
compare matching error between the corresponding pixels in binocular disparity calculation, and classify
the stereoscopic images into non-corresponding, binocular fusion, and binocular suppression regions. Also, local phase and local amplitude maps are extracted from the original and distorted stereoscopic
images as features in quality assessment. Then, each region is evaluated independently by considering its
binocular perception property, and all evaluation results are integrated into an overall score. Besides, a
binocular just noticeable difference model is used to reflect the visual sensitivity for the binocular fusion and suppression regions. Experimental results show that compared with the relevant existing metrics, the
proposed metric can achieve higher consistency with subjective assessment of stereoscopic images.
ETPL
DIP-170 Multi-Wiener SURE-LET Deconvolution
Abstract: In this paper, we propose a novel deconvolution algorithm based on the minimization of a
regularized Stein's unbiased risk estimate (SURE), which is a good estimate of the mean squared error.
We linearly parametrize the deconvolution process by using multiple Wiener filters as elementary
functions, followed by undecimated Haar-wavelet thresholding. Due to the quadratic nature of SURE and the linear parametrization, the deconvolution problem finally boils down to solving a linear system of
equations, which is very fast and exact. The linear coefficients, i.e., the solution of the linear system of
equations, constitute the best approximation of the optimal processing on the Wiener-Haar-threshold basis that we consider. In addition, the proposed multi-Wiener SURE-LET approach is applicable for
both periodic and symmetric boundary conditions, and can thus be used in various practical scenarios.
The very competitive (both in computation time and quality) results show that the proposed algorithm, which can be interpreted as a kind of nonlinear Wiener processing, can be used as a basic tool for
building more sophisticated deconvolution algorithms.
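As a rough illustration of the elementary Wiener filters that the SURE-LET parametrization combines, here is a single frequency-domain Wiener-type deconvolution. The scalar `reg` parameter standing in for the noise-to-signal ratio is a simplification; the paper instead tunes multiple such filters jointly via SURE and wavelet thresholding:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, reg=1e-2):
    """Frequency-domain Wiener-type deconvolution under a periodic
    (circular) convolution model, with a scalar regularizer `reg`
    approximating the noise-to-signal ratio."""
    H = np.fft.fft2(psf, s=blurred.shape)        # transfer function of the blur
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + reg)      # Wiener-style inverse filter
    return np.real(np.fft.ifft2(W * B))
```

With `reg` near zero this approaches the (unstable) direct inverse; larger values trade residual blur for noise suppression.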
ETPL
DIP-171 Joint Reconstruction of Multiview Compressed Images
Abstract: Distributed representation of correlated multiview images is an important problem that arises in vision sensor networks. This paper concentrates on the joint reconstruction problem where the
distributively compressed images are decoded together in order to take benefit from the image
correlation. We consider a scenario where the images captured at different viewpoints are encoded independently using common coding solutions (e.g., JPEG) with a balanced rate distribution among
different cameras. A central decoder first estimates the inter-view image correlation from the
independently compressed data. The joint reconstruction is then cast as a constrained convex optimization
problem that reconstructs total-variation (TV) smooth images, which comply with the estimated correlation model. At the same time, we add constraints that force the reconstructed images to be as close
as possible to their compressed versions. We show through experiments that the proposed joint
reconstruction scheme outperforms independent reconstruction in terms of image quality, for a given
target bit rate. In addition, the decoding performance of our algorithm compares advantageously to state-
of-the-art distributed coding schemes based on motion learning and on the DISCOVER algorithm.
ETPL
DIP-172 Scalable Coding of Depth Maps With R-D Optimized Embedding
Abstract: Recent work on depth map compression has revealed the importance of incorporating a
description of discontinuity boundary geometry into the compression scheme. We propose a novel
compression strategy for depth maps that incorporates geometry information while achieving the goals of
scalability and embedded representation. Our scheme involves two separate image pyramid structures, one for breakpoints and the other for sub-band samples produced by a breakpoint-adaptive transform.
Breakpoints capture geometric attributes, and are amenable to scalable coding. We develop a rate-
distortion optimization framework for determining the presence and precision of breakpoints in the pyramid representation. We employ a variation of the EBCOT scheme to produce embedded bit-streams
for both the breakpoint and sub-band data. Compared to JPEG 2000, our proposed scheme enables the
same scalability features while achieving substantially improved rate-distortion performance at the higher bit-rate range and comparable performance at the lower rates.
ETPL
DIP-173 Automatic Virus Particle Selection—The Entropy Approach
Abstract: This paper describes a fully automatic approach to locate icosahedral virus particles in
transmission electron microscopy images. The initial detection of the particles takes place through automatic segmentation of the entropy-proportion image; this image is computed in particular regions of
interest defined by two concentric structuring elements contained in a small overlapping window running
over all the image. Morphological features help to select the candidates, as the threshold is kept low enough to avoid false negatives. The candidate points are subject to a credibility test based on features
extracted from eight radial intensity profiles in each point from a texture image. A candidate is accepted
if these features meet the set of acceptance conditions describing the typical intensity profiles of these
kinds of particles. The accepted points are subjected to a final validation in a three-parameter space, using a discrimination plane that is a function of the input image to separate possible outliers.
ETPL
DIP-174
A Tuned Mesh-Generation Strategy for Image Representation Based on Data-
Dependent Triangulation
Abstract: A mesh-generation framework for image representation based on data-dependent triangulation is proposed. The proposed framework is a modified version of the frameworks of Rippa and Garland and
Heckbert that facilitates the development of more effective mesh-generation methods. As the proposed
framework has several free parameters, the effects of different choices of these parameters on mesh quality are studied, leading to the recommendation of a particular set of choices for these parameters. A
mesh-generation method is then introduced that employs the proposed framework with these best
parameter choices. This method is demonstrated to produce meshes of higher quality (both in terms of
squared error and subjectively) than those generated by several competing approaches, at a relatively modest computational and memory cost.
ETPL
DIP-175 Accelerated Edge-Preserving Image Restoration Without Boundary Artifacts
Abstract: To reduce blur in noisy images, regularized image restoration methods have been proposed that use nonquadratic regularizers (like l1 regularization or total-variation) that suppress noise while
preserving edges in the image. Most of these methods assume a circulant blur (periodic convolution with
a blurring kernel) that can lead to wraparound artifacts along the boundaries of the image due to the
implied periodicity of the circulant model. Using a noncirculant model could prevent these artifacts at the cost of increased computational complexity. In this paper, we propose to use a circulant blur model
combined with a masking operator that prevents wraparound artifacts. The resulting model is
noncirculant, so we propose an efficient algorithm using variable splitting and augmented Lagrangian
(AL) strategies. Our variable splitting scheme, when combined with the AL framework and alternating
minimization, leads to simple linear systems that can be solved noniteratively using fast Fourier transforms (FFTs), eliminating the need for more expensive conjugate gradient-type solvers. The
proposed method can also efficiently tackle a variety of convex regularizers, including edge-preserving
(e.g., total-variation) and sparsity promoting (e.g., l1-norm) regularizers. Simulation results show fast convergence of the proposed method, along with improved image quality at the boundaries where the
circulant model is inaccurate.
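The key computational claim above is that, after variable splitting, the AL inner subproblems reduce to circulant linear systems that FFTs solve noniteratively. A minimal sketch of such a solve, for a hypothetical subproblem of the form (HᵀH + μI)x = r with H a circular convolution, might look like:

```python
import numpy as np

def solve_circulant_quadratic(psf, rhs, mu):
    """Solve (H^T H + mu*I) x = rhs in closed form with FFTs, where H is
    circular convolution with `psf`. This is the kind of inner linear solve
    a variable-splitting / augmented-Lagrangian scheme reduces to."""
    H = np.fft.fft2(psf, s=rhs.shape)
    # The normal operator H^T H is diagonalized by the DFT with
    # eigenvalues |H|^2, so the solve is a pointwise division.
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / (np.abs(H) ** 2 + mu)))
```

The masking operator described in the abstract breaks exact circulance, which is why the full method wraps solves like this one inside an alternating-minimization loop rather than using a single direct solve.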
ETPL
DIP-176
Box Relaxation Schemes in Staggered Discretizations for the Dual Formulation of
Total Variation Minimization
Abstract: In this paper, we propose some new box relaxation numerical schemes on staggered grids to solve the stationary system of partial differential equations arising from the dual minimization problem
associated with the total variation operator. We present in detail the numerical schemes for the scalar case
and its generalization to multichannel (vectorial) images. Then, we discuss their implementation in digital image denoising. The results outperform solving the dual equation by gradient descent and pave the way for more advanced numerical strategies.
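For context, the gradient-descent resolution of the dual TV problem that the proposed box relaxation schemes are compared against can be sketched as a Chambolle-style projected gradient iteration on the dual field; the parameter values below are illustrative:

```python
import numpy as np

def tv_denoise_dual(f, lam=0.3, tau=0.125, n_iter=200):
    """Projected gradient on the dual TV denoising problem
    min_{|p|<=1} ||div p - f/lam||^2, with periodic boundaries.
    The primal image is recovered as u = f - lam * div p."""
    p = np.zeros((2,) + f.shape)                 # dual vector field
    for _ in range(n_iter):
        div_p = (p[0] - np.roll(p[0], 1, 0)) + (p[1] - np.roll(p[1], 1, 1))
        r = div_p - f / lam
        p[0] += tau * (np.roll(r, -1, 0) - r)    # gradient step on dual energy
        p[1] += tau * (np.roll(r, -1, 1) - r)
        mag = np.maximum(1.0, np.sqrt(p[0] ** 2 + p[1] ** 2))
        p /= mag                                 # project onto |p| <= 1
    div_p = (p[0] - np.roll(p[0], 1, 0)) + (p[1] - np.roll(p[1], 1, 1))
    return f - lam * div_p
```

The step size tau = 1/8 respects the norm of the discrete gradient-divergence operator; it is this slowly converging iteration that staggered-grid relaxation schemes aim to beat.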
ETPL
DIP-177 Constrained Optical Flow Estimation as a Matching Problem
Abstract: In general, discretization in the motion vector domain yields an intractable number of labels. In this paper, we propose an approach that can reduce general optical flow to the constrained matching
problem by pre-estimating a 2-D disparity labeling map of the desired discrete motion vector function.
One goal of this paper is to estimate the coarse distribution of motion vectors and then use this distribution as a global constraint for discrete optical flow estimation. This pre-estimation is
done with a simple frame-to-frame correlation technique also known as the digital symmetric phase-only filter (SPOF). We discover a strong correlation between the output of the SPOF and the motion vector
distribution of the related optical flow. A two step matching paradigm for optical flow estimation is applied: pixel accuracy (integer flow) and subpixel accuracy estimation. The matching problem is solved
by global optimization. Experiments on the Middlebury optical flow datasets confirm our intuitive
assumptions about the strong correlation between the motion vector distribution of optical flow and the maximal peaks of SPOF outputs. The overall performance of the proposed method is promising and achieves state-of-the-art results on the Middlebury benchmark.
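A minimal sketch of the frame-to-frame correlation idea, using plain phase-only correlation (a close relative of the SPOF named above, not the paper's exact filter), estimates the dominant integer translation between two frames:

```python
import numpy as np

def phase_correlate(f, g, eps=1e-12):
    """Estimate the dominant integer translation mapping g to f via
    phase-only correlation: whiten the cross-power spectrum and locate
    the resulting correlation peak."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    R = F * np.conj(G)
    R /= np.abs(R) + eps                         # keep only phase information
    corr = np.real(np.fft.ifft2(R))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts (periodic convention).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))
```

In the paper's setting, not just the single maximal peak but the distribution of such peaks constrains the label set of the subsequent discrete matching problem.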
ETPL
DIP-178 Nonseparable Shearlet Transform
Abstract: Over the past few years, various representation systems which sparsely approximate functions
governed by anisotropic features, such as edges in images, have been proposed. Alongside the theoretical
development of these systems, algorithmic realizations of the associated transforms are provided.
However, one of the most common shortcomings of these frameworks is the lack of providing a unified treatment of the continuum and digital world, i.e., allowing a digital theory to be a natural digitization of
the continuum theory. In this paper, we introduce a new shearlet transform associated with a nonseparable
shearlet generator, which improves the directional selectivity of previous shearlet transforms. Our approach is based on a discrete framework, which allows a faithful digitization of the continuum domain
directional transform based on compactly supported shearlets introduced as means to sparsely encode
anisotropic singularities of multivariate data. We show numerical experiments demonstrating the
potential of our new shearlet transform in 2D and 3D image processing applications.
ETPL
DIP-179
Modeling and Classifying Human Activities From Trajectories Using a Class of Space-
Varying Parametric Motion Fields
Abstract: Many approaches to trajectory analysis, such as clustering or classification, use probabilistic
generative models, thus not requiring trajectory alignment/registration. Switched linear dynamical models
(e.g., HMMs) have been used in this context, due to their ability to describe different motion regimes.
However, these models are not suitable for handling space-dependent dynamics that are more naturally captured by nonlinear models. As is well known, these are more difficult to identify. In this paper, we
propose a new way of modeling trajectories, based on a mixture of parametric motion vector fields that
depend on a small number of parameters. Switching among these fields follows a probabilistic mechanism, characterized by a field of stochastic matrices. This approach allows representing a wide
variety of trajectories and modeling space-dependent behaviors without using global nonlinear dynamical
models. Experimental evaluation is conducted in both synthetic and real scenarios, the latter concerning human trajectory modeling for activity classification, a central task in video surveillance.
ETPL
DIP-180 Real-Time Continuous Image Registration Enabling Ultraprecise 2-D Motion Tracking
Abstract: In this paper, we present a novel continuous image registration method (CIRM), which yields
near-zero bias and has high computational efficiency. It can be realized for real-time position estimation to enable ultraprecise 2-D motion tracking and motion control over a large motion range. As the two
variables of the method are continuous in spatial domain, pixel-level image registration is unnecessary,
thus the CIRM can continuously track the moving target according to the incoming target image. When
applied to a specific target object, measurement resolution of the method is predicted according to the reference image model of the object along with the variance of the camera's overall image noise. The
maximum permissible target speed is proportional to the permissible frame rate, which is limited by the
required computational time. The precision, measurement resolution, and computational efficiency of the method are verified through computer simulations and experiments. Specifically, the CIRM is
implemented and integrated with a visual sensing system. Near-zero bias, measurement resolution of 0.1
nm (0.0008 pixels), and measurement of one nanometer stepping are demonstrated.
ETPL
DIP-181
Unified Blind Method for Multi-Image Super-Resolution and Single/Multi-Image Blur
Deconvolution
Abstract: This paper presents, for the first time, a unified blind method for multi-image super-resolution
(MISR or SR), single-image blur deconvolution (SIBD), and multi-image blur deconvolution (MIBD) of
low-resolution (LR) images degraded by linear space-invariant (LSI) blur, aliasing, and additive white Gaussian noise (AWGN). The proposed approach is based on alternating minimization (AM) of a new
cost function with respect to the unknown high-resolution (HR) image and blurs. The regularization term
for the HR image is based upon the Huber-Markov random field (HMRF) model, which is a type of variational integral that exploits the piecewise smooth nature of the HR image. The blur estimation
process is supported by an edge-emphasizing smoothing operation, which improves the quality of blur
estimates by enhancing strong soft edges toward step edges, while filtering out weak structures. The
parameters are updated gradually so that the number of salient edges used for blur estimation increases at each iteration. For better performance, the blur estimation is done in the filter domain rather than the pixel
domain, i.e., using the gradients of the LR and HR images. The regularization term for the blur is
Gaussian (L2 norm), which allows for fast noniterative optimization in the frequency domain. We accelerate the processing time of SR reconstruction by separating the upsampling and registration
processes from the optimization procedure. Simulation results on both synthetic and real-life images
(from a novel computational imager) confirm the robustness and effectiveness of the proposed method.
ETPL
DIP-182 Informative State-Based Video Communication
Abstract: We study state-based video communication where a client simultaneously informs the server
about the presence status of various packets in its buffer. In sender-driven transmission, the client
periodically sends to the server a single acknowledgement packet that provides information about all
packets that have arrived at the client by the time the acknowledgment is sent. In receiver-driven
streaming, the client periodically sends to the server a single request packet that comprises a transmission
schedule for sending missing data to the client over a horizon of time. We develop a comprehensive optimization framework that enables computing packet transmission decisions that maximize the end-to-
end video quality for the given bandwidth resources, in both prospective scenarios. The core step of the
optimization comprises computing the probability that a single packet will be communicated in error as a function of the expected transmission redundancy (or cost) used to communicate the packet. Through
comprehensive simulation experiments, we carefully examine the performance advances that our
framework enables relative to state-of-the-art scheduling systems that employ regular acknowledgement
or request packets. Consistent gains in video quality of up to 2 dB are demonstrated across a variety of content types. We show that there is a direct analogy between the error-cost efficiency of streaming a
single packet and the overall rate-distortion performance of streaming the whole content. In the case of
sender-driven transmission, we develop an effective modeling approach that accurately characterizes the end-to-end performance as a function of the packet loss rate on the backward channel and the source
encoding characteristics.
ETPL
DIP-183
Quantification of Smoothing Requirement for 3D Optic Flow Calculation of
Volumetric Images
Abstract: Complexities of dynamic volumetric imaging challenge the available computer vision techniques on a number of different fronts. This paper examines the relationship between the estimation
accuracy and required amount of smoothness for a general solution from a robust statistics perspective.
We show that a (surprisingly) small amount of local smoothing is required to satisfy both the necessary and sufficient conditions for accurate optic flow estimation. This notion is called “just enough”
smoothing, and its proper implementation has a profound effect on the preservation of local information
in processing 3D dynamic scans. To demonstrate the effect of “just enough” smoothing, a robust 3D optic flow method with quantized local smoothing is presented, and the effect of local smoothing on the
accuracy of motion estimation in dynamic lung CT images is examined using both synthetic and real
image sequences with ground truth.
ETPL
DIP-184 Analysis Operator Learning and its Application to Image Reconstruction
Abstract: Exploiting a priori known structural information lies at the core of many image reconstruction
methods that can be stated as inverse problems. The synthesis model, which assumes that images can be
decomposed into a linear combination of very few atoms of some dictionary, is now a well established tool for the design of image reconstruction algorithms. An interesting alternative is the analysis model,
where the signal is multiplied by an analysis operator and the outcome is assumed to be sparse. This
approach has only recently gained increasing interest. The quality of reconstruction methods based on an
analysis model severely depends on the right choice of the suitable operator. In this paper, we present an algorithm for learning an analysis operator from training images. Our method is based on lp-norm
minimization on the set of full rank matrices with normalized columns. We carefully introduce the
employed conjugate gradient method on manifolds, and explain the underlying geometry of the constraints. Moreover, we compare our approach to state-of-the-art methods for image denoising,
inpainting, and single image super-resolution. Our numerical results show competitive performance of
our general approach in all presented applications compared to the specialized state-of-the-art techniques.
ETPL
DIP-185 Computational Model of Stereoscopic 3D Visual Saliency
Abstract: Many computational models of visual attention performing well in predicting salient areas of
2D images have been proposed in the literature. The emerging applications of stereoscopic 3D display
bring an additional depth of information affecting the human viewing behavior, and require extensions of
the efforts made in 2D visual modeling. In this paper, we propose a new computational model of visual
attention for stereoscopic 3D still images. Apart from detecting salient areas based on 2D visual features,
the proposed model takes depth as an additional visual dimension. The measure of depth saliency is derived from the eye movement data obtained from an eye-tracking experiment using synthetic stimuli.
Two different ways of integrating depth information in the modeling of 3D visual attention are then
proposed and examined. For the performance evaluation of 3D visual attention models, we have created an eye-tracking database, which contains stereoscopic images of natural content and is publicly available,
along with this paper. The proposed model gives a good performance, compared to that of state-of-the-art
2D models on 2D images. The results also suggest that a better performance is obtained when depth
information is taken into account through the creation of a depth saliency map, rather than when it is integrated by a weighting method.
ETPL
DIP-186 In-Plane Rotation and Scale Invariant Clustering Using Dictionaries
Abstract: In this paper, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries and clusters images in the radon transform
domain. The main feature of the proposed approach is that it provides both in-plane rotation and scale
invariant clustering, which is useful in numerous applications, including content-based image retrieval
(CBIR). We demonstrate the effectiveness of our rotation and scale invariant clustering method on a series of CBIR experiments. Experiments are performed on the Smithsonian isolated leaf, Kimia shape,
and Brodatz texture datasets. Our method provides both good retrieval performance and greater
robustness compared to standard Gabor-based and three state-of-the-art shape-based methods that have similar objectives.
ETPL
DIP-187 General Framework to Histogram-Shifting-Based Reversible Data Hiding
Abstract: Histogram shifting (HS) is a useful technique of reversible data hiding (RDH). With HS-based
RDH, high capacity and low distortion can be achieved efficiently. In this paper, we revisit the HS technique and present a general framework to construct HS-based RDH. By the proposed framework, one
can obtain an RDH algorithm simply by designing the so-called shifting and embedding functions. Moreover,
by taking specific shifting and embedding functions, we show that several RDH algorithms reported in the literature are special cases of this general construction. In addition, two novel and efficient RDH
algorithms are also introduced to further demonstrate the universality and applicability of our framework.
It is expected that more efficient RDH algorithms can be devised according to the proposed framework by carefully designing the shifting and embedding functions.
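To make the shifting-and-embedding idea concrete, here is a toy histogram-shifting embed/extract pair. It assumes an empty bin exists above the histogram peak and handles only the simplest single-peak grayscale case, so it illustrates the framework's ingredients rather than reproducing the paper's algorithms:

```python
import numpy as np

def hs_embed(img, bits):
    """Toy histogram-shifting embed: shift the bins between the peak and an
    empty bin up by one to free bin peak+1, then encode one bit per
    peak-valued pixel (bit 0 -> peak, bit 1 -> peak+1)."""
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(np.argmax(hist))
    zero = peak + 1 + int(np.argmin(hist[peak + 1:]))  # assumed-empty bin
    out = img.astype(np.int32).copy()
    out[(out > peak) & (out < zero)] += 1              # the shifting function
    carriers = np.flatnonzero(img.ravel() == peak)
    flat = out.ravel()
    for idx, bit in zip(carriers, bits):               # the embedding function
        flat[idx] = peak + bit
    return flat.reshape(img.shape).astype(np.uint8), peak, zero

def hs_extract(stego, peak, zero):
    """Recover the bits and invert the shift exactly (reversibility)."""
    flat = stego.astype(np.int32).ravel()
    bits = [v - peak for v in flat if v in (peak, peak + 1)]
    rec = flat.copy()
    rec[(rec > peak) & (rec <= zero)] -= 1             # undo shift and embedding
    return np.array(bits), rec.reshape(stego.shape).astype(np.uint8)
```

Capacity equals the peak-bin count, and distortion is at most one gray level per shifted pixel, which is why HS-based RDH achieves high capacity at low distortion.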
ETPL
DIP-188
Computationally Tractable Stochastic Image Modeling Based on Symmetric Markov
Mesh Random Fields
Abstract: In this paper, the properties of a new class of causal Markov random fields, named symmetric
Markov mesh random field, are initially discussed. It is shown that the symmetric Markov mesh random fields from the upper corners are equivalent to the symmetric Markov mesh random fields from the lower
corners. Based on this new random field, a symmetric, corner-independent, and isotropic image model is
then derived which incorporates the dependency of a pixel on all its neighbors. The introduced image model comprises the product of several local 1D density and 2D joint density functions of pixels in an
image thus making it computationally tractable and practically feasible by allowing the use of histogram
and joint histogram approximations to estimate the model parameters. An image restoration application is
also presented to confirm the effectiveness of the model developed. The experimental results demonstrate that this new model provides an improved tool for image modeling purposes compared to the
conventional Markov random field models.
ETPL
DIP-189 Robust Ellipse Fitting Based on Sparse Combination of Data Points
Abstract: Ellipse fitting is widely applied in the fields of computer vision and automatic industry control, in which the procedure of ellipse fitting often follows the preprocessing step of edge detection in the
original image. The quality of an ellipse fit therefore depends on the accuracy of edge detection as well as on the fitting method itself; outliers and edge-point errors introduced by edge detection can cause severe performance degradation. In this paper, we develop a robust ellipse fitting method to alleviate the influence of outliers. The proposed algorithm solves ellipse parameters by
linearly combining a subset of (“more accurate”) data points (formed from edge points) rather than all
data points (which contain possible outliers). In addition, because squaring the fitting residuals magnifies the contribution of extreme data points, our algorithm uses absolute residuals instead of squared
residuals to reduce this influence. Moreover, the norm of data point errors is bounded, and the worst case
performance optimization is formed to be robust against data point errors. The resulting mixed l1-l2 optimization problem is further derived as a second-order cone programming one and solved by the
computationally efficient interior-point methods. Note that the fitting approach developed in this paper
specifically deals with the overdetermined system, whereas the current sparse representation theory is
only applied to underdetermined systems. Therefore, the proposed algorithm can be looked upon as an extended application and development of the sparse representation theory. Some simulated and
experimental examples are presented to illustrate the effectiveness of the proposed ellipse fitting
approach.
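For reference, the non-robust least-squares baseline that such robust formulations improve on is the plain algebraic conic fit; a sketch (not the paper's l1/SOCP method) follows:

```python
import numpy as np

def fit_conic_lstsq(x, y):
    """Plain least-squares algebraic conic fit
    a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.
    The unit-norm coefficient vector minimizing ||D w|| is the right
    singular vector of the design matrix D with smallest singular value."""
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]
```

Because every residual is squared, a single gross outlier can dominate this fit, which is precisely the failure mode the absolute-residual, worst-case-optimized formulation above is designed to avoid.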
ETPL
DIP-190 Learning Dynamic Hybrid Markov Random Field for Image Labeling
Abstract: Using shape information has attracted increasing attention in the task of image labeling. In this
paper, we present a dynamic hybrid Markov random field (DHMRF), which explicitly captures middle-
level object shape and low-level visual appearance (e.g., texture and color) for image labeling. Each node in DHMRF is described by either a deformable template or an appearance model as visual prototype. On
the other hand, the edges encode two types of interactions: co-occurrence and spatially layered context,
with respect to the labels and prototypes of connected nodes. To learn the DHMRF model, an iterative algorithm is designed to automatically select the most informative features and estimate model
parameters. The algorithm achieves high computational efficiency since a branch-and-bound schema is
introduced to estimate model parameters. Compared with previous methods, which usually employ implicit shape cues, our DHMRF model seamlessly integrates color, texture, and shape cues to inference
labeling output, and thus produces more accurate and reliable results. Extensive experiments validate its
superiority over other state-of-the-art methods in terms of recognition accuracy and implementation
efficiency on the MSRC 21-class dataset and the Lotus Hill Institute 15-class dataset.
ETPL
DIP-191
Coupled Variational Image Decomposition and Restoration Model for Blurred
Cartoon-Plus-Texture Images With Missing Pixels
Abstract: In this paper, we develop a decomposition model to restore blurred images with missing pixels.
Our assumption is that the underlying image is the superposition of cartoon and texture components. We use the total variation norm and its dual norm to regularize the cartoon and texture, respectively. We
propose an efficient numerical algorithm based on a splitting version of the augmented Lagrangian
method to solve the problem. Theoretically, the existence of a minimizer to the energy function and the
convergence of the algorithm are guaranteed. In contrast to recently developed methods for deblurring images, the proposed algorithm not only gives the restored image, but also gives a decomposition of
cartoon and texture parts. These two parts can be further used in segmentation and inpainting problems.
Numerical comparisons between this algorithm and some state-of-the-art methods are also reported.
ETPL
DIP-192
Perceptual Quality-Regulable Video Coding System With Region-Based Rate Control
Scheme
Abstract: In this paper, we discuss a region-based perceptual quality-regulable H.264 video encoder
system that we developed. The ability to adjust the quality of specific regions of a source video to a
predefined level of quality is an essential technique for region-based video applications. We use the structural similarity index as the quality metric for distortion-quantization modeling and develop a bit
allocation and rate control scheme for enhancing regional perceptual quality. Exploiting the relationship
between the reconstructed macroblock and the best predicted macroblock from mode decision, a novel quantization parameter prediction method is built and used to achieve the target video quality of the
processed macroblock. Experimental results show that the system model has only 0.013 quality error on average. Moreover, the proposed region-based rate control system can encode video well under a bitrate constraint, with a 0.1% bitrate error on average. Under a low bitrate constraint, the proposed system can encode video with a 0.5% bit error rate on average while enhancing the quality of the target regions.
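The structural similarity index used above as the quality metric can be illustrated with a minimal single-window implementation over whole-image statistics. This is a simplification for illustration (the encoder applies the metric per region/macroblock); the constants C1 and C2 follow the commonly used SSIM defaults.

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Single-window SSIM computed from whole-image statistics.
    The full metric averages local windows; this global version only
    illustrates the luminance/contrast/structure comparison terms."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + C1) * (2*cxy + C2)) / ((mx*mx + my*my + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(32, 32))
noisy = np.clip(img + rng.normal(0, 25, size=(32, 32)), 0, 255)
score_same = ssim_global(img, img)     # identical images score 1
score_noisy = ssim_global(img, noisy)  # degraded image scores below 1
```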
ETPL
DIP-193 Color and Depth Priors in Natural Images
Abstract: Natural scene statistics have played an increasingly important role in both our understanding of
the function and evolution of the human vision system, and in the development of modern image
processing applications. Because range (egocentric distance) is arguably the most important thing a visual
system must compute (from an evolutionary perspective), the joint statistics between image information (color and luminance) and range information are of particular interest. It seems obvious that where there
is a depth discontinuity, there must be a higher probability of a brightness or color discontinuity too. This
is true, but the more interesting case is in the other direction - because image information is much more easily computed than range information, the key conditional probabilities are those of finding a range
discontinuity given an image discontinuity. Here, the intuition is much weaker; the plethora of shadows
and textures in the natural environment imply that many image discontinuities must exist without corresponding changes in range. In this paper, we extend previous work in two ways: we use as our
starting point a very high quality data set of co-registered color and range values collected specifically for
this purpose, and we evaluate the statistics of perceptually relevant chromatic information in addition to
luminance, range, and binocular disparity information. The most fundamental finding is that the probabilities of finding range changes do in fact depend in a useful and systematic way on color and
luminance changes; larger range changes are associated with larger image changes. Second, we are able
to parametrically model the prior marginal and conditional distributions of luminance, color, range, and (computed) binocular disparity. Finally, we provide a proof of principle that this information is useful by
showing that our distribution models improve the performance of a Bayesian stereo algorithm on an
independent set of input images. To summarize, we show that there is useful information about range in
very low-level luminance and color information. To a system sensitive to this statistical information, it amounts to an additional (and only recently appreciated) depth cue, and one that is trivial to compute
from the image data. We are confident that this information is robust, in that we go to great effort and
expense to collect very high quality raw data. Finally, we demonstrate the practical utility of these findings by using them to improve the performance of a Bayesian stereo algorithm.
ETPL
DIP-194 Sparse Image Reconstruction on the Sphere: Implications of a New Sampling Theorem
Abstract: We study the impact of sampling theorems on the fidelity of sparse image reconstruction on the sphere. We discuss how a reduction in the number of samples required to represent all information
content of a band-limited signal acts to improve the fidelity of sparse image reconstruction, through both
the dimensionality and sparsity of signals. To demonstrate this result, we consider a simple inpainting
problem on the sphere and consider images sparse in the magnitude of their gradient. We develop a framework for total variation inpainting on the sphere, including fast methods to render the inpainting
problem computationally feasible at high resolution. Recently a new sampling theorem on the sphere was
developed, reducing the required number of samples by a factor of two for equiangular sampling
schemes. Through numerical simulations, we verify the enhanced fidelity of sparse image reconstruction due to the more efficient sampling of the sphere provided by the new sampling theorem.
ETPL
DIP-195 Log-Gabor Filters for Image-Based Vehicle Verification
Abstract: Vehicle detection based on image analysis has attracted increasing attention in recent years due
to its low cost, flexibility, and potential toward collision avoidance. In particular, vehicle verification is especially challenging on account of the heterogeneity of vehicles in color, size, pose, etc. Image-based
vehicle verification is usually addressed as a supervised classification problem. Specifically, descriptors
using Gabor filters have been reported to show good performance in this task. However, Gabor functions have a number of drawbacks relating to their frequency response. The main contribution of this paper is
the proposal and evaluation of a new descriptor based on the alternative family of log-Gabor functions for
vehicle verification, as opposed to existing Gabor filter-based descriptors. These filters are theoretically superior to Gabor filters as they can better represent the frequency properties of natural images. As a
second contribution, and in contrast to existing approaches, which transfer the standard configuration of
filters used for other applications to the vehicle classification task, an in-depth analysis of the required
filter configuration by both Gabor and log-Gabor descriptors for this particular application is performed for fair comparison. The extensive experiments conducted in this paper confirm that the proposed log-
Gabor descriptor significantly outperforms the standard Gabor filter for image-based vehicle verification.
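The frequency-response advantage of log-Gabor functions over Gabor functions can be seen directly in the radial transfer function: it is Gaussian on a logarithmic frequency axis and is exactly zero at DC. A minimal NumPy sketch follows; the bandwidth ratio value is an illustrative default, not a parameter taken from the paper.

```python
import numpy as np

def log_gabor_radial(f, f0, sigma_ratio=0.65):
    """Radial log-Gabor transfer function
        G(f) = exp(-(ln(f/f0))^2 / (2 (ln sigma_ratio)^2)).
    Unlike a Gabor filter, G(0) = 0 by construction (no DC component),
    one of the frequency-response advantages log-Gabor descriptors exploit.
    sigma_ratio controls the bandwidth (illustrative default)."""
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    nz = f > 0
    out[nz] = np.exp(-np.log(f[nz] / f0) ** 2 / (2.0 * np.log(sigma_ratio) ** 2))
    return out

freqs = np.linspace(0.0, 0.5, 501)       # normalized spatial frequencies
response = log_gabor_radial(freqs, f0=0.1)
```

A 2-D filter bank would multiply this radial profile by an angular Gaussian at several orientations and scales.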
ETPL
DIP-196 Scene Text Detection via Connected Component Clustering and Nontext Filtering
Abstract: In this paper, we present a new scene text detection algorithm based on two machine learning
classifiers: one allows us to generate candidate word regions and the other filters out nontext ones. To be
precise, we extract connected components (CCs) in images by using the maximally stable extremal region
algorithm. These extracted CCs are partitioned into clusters so that we can generate candidate regions. Unlike conventional methods relying on heuristic rules in clustering, we train an AdaBoost classifier that
determines the adjacency relationship and clusters CCs by using their pairwise relations. Then we
normalize candidate word regions and determine whether each region contains text or not. Since the scale, skew, and color of each candidate can be estimated from CCs, we develop a text/nontext classifier
for normalized images. This classifier is based on multilayer perceptrons and we can control recall and
precision rates with a single free parameter. Finally, we extend our approach to exploit multichannel information. Experimental results on ICDAR 2005 and 2011 robust reading competition datasets show
that our method yields the state-of-the-art performance both in speed and accuracy.
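The clustering stage can be pictured with a small union-find sketch: given pairwise "belongs together" decisions (produced in the paper by the trained AdaBoost classifier, but supplied here as plain input pairs), connected components are merged into candidate word regions. Names are illustrative.

```python
def cluster_components(n, merge_pairs):
    """Group n connected components into candidate regions from pairwise
    'same cluster' decisions, via union-find. The decisions stand in for
    the output of a trained pairwise classifier."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in merge_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

# CCs 0-2 judged to belong to one word, CCs 3-4 to another
clusters = cluster_components(5, [(0, 1), (1, 2), (3, 4)])
```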
ETPL
DIP-197 A Robust Method for Rotation Estimation Using Spherical Harmonics Representation
Abstract: This paper presents a robust method for 3D object rotation estimation using spherical harmonics
representation and the unit quaternion vector. The proposed method provides a closed-form solution for
rotation estimation without recurrence relations or searching for point correspondences between two objects. The rotation estimation problem is cast as a minimization problem, which finds the optimum
rotation angles between two objects of interest in the frequency domain. The optimum rotation angles are
obtained by calculating the unit quaternion vector from a symmetric matrix, which is constructed from
the two sets of spherical harmonics coefficients, using an eigendecomposition technique. Our experimental results on hundreds of 3D objects show that the proposed method is very accurate in rotation estimation, robust to noisy data and missing surface points, and able to handle intra-class variability between 3D objects.
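The "unit quaternion from the leading eigenvector of a symmetric 4x4 matrix" idea can be sketched with the classical point-based variant (Horn's absolute-orientation method). The paper builds its symmetric matrix from spherical harmonics coefficients instead of point correspondences; this correspondence-based version is only an illustration of the closed-form eigendecomposition step.

```python
import numpy as np

def rotation_from_quaternion(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def estimate_rotation(P, Q):
    """Closed-form rotation between point sets P and Q (rows are points,
    Q ~ P @ R.T): the optimal unit quaternion is the eigenvector of the
    largest eigenvalue of a symmetric 4x4 matrix (Horn's method)."""
    S = P.T @ Q
    N = np.array([
        [S[0,0]+S[1,1]+S[2,2], S[1,2]-S[2,1],        S[2,0]-S[0,2],        S[0,1]-S[1,0]],
        [S[1,2]-S[2,1],        S[0,0]-S[1,1]-S[2,2], S[0,1]+S[1,0],        S[2,0]+S[0,2]],
        [S[2,0]-S[0,2],        S[0,1]+S[1,0],        S[1,1]-S[0,0]-S[2,2], S[1,2]+S[2,1]],
        [S[0,1]-S[1,0],        S[2,0]+S[0,2],        S[1,2]+S[2,1],        S[2,2]-S[0,0]-S[1,1]],
    ])
    _, V = np.linalg.eigh(N)            # eigenvalues in ascending order
    return rotation_from_quaternion(V[:, -1])

theta = np.pi / 3                        # ground-truth rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
P = np.random.default_rng(1).standard_normal((20, 3))
R_est = estimate_rotation(P, P @ R_true.T)
```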
ETPL
DIP-198 Synthetic Aperture Radar Autofocus via Semidefinite Relaxation
Abstract: The autofocus problem in synthetic aperture radar imaging amounts to estimating unknown phase errors caused by unknown platform or target motion. At the heart of three state-of-the-art autofocus
algorithms, namely, phase gradient autofocus, multichannel autofocus (MCA), and Fourier-domain
multichannel autofocus (FMCA), is the solution of a constant modulus quadratic program (CMQP).
Currently, these algorithms solve a CMQP by using an eigenvalue relaxation approach. We propose an alternative relaxation approach based on semidefinite programming, which has recently attracted
considerable attention in other signal processing problems. Experimental results show that the proposed methods provide promising performance improvements for MCA and FMCA, at the cost of increased computational complexity.
ETPL
DIP-199
Regional Spatially Adaptive Total Variation Super-Resolution With Spatial
Information Filtering and Clustering
Abstract: Total variation is used as a popular and effective image prior model in the regularization-based
image processing fields. However, as the total variation model favors a piecewise constant solution, the
processing result under high noise intensity in the flat regions of the image is often poor, and some pseudoedges are produced. In this paper, we develop a regional spatially adaptive total variation model.
Initially, the spatial information is extracted at each pixel, and then two filtering processes are
added to suppress the effect of pseudoedges. In addition, the spatial information weight is constructed and classified with k-means clustering, and the regularization strength in each region is controlled by the
clustering center value. The experimental results, on both simulated and real datasets, show that the
proposed approach can effectively reduce the pseudoedges of the total variation regularization in the flat
regions, and maintain the partial smoothness of the high-resolution image. More importantly, compared with the traditional pixel-based spatial information adaptive approach, the proposed region-based spatial
information adaptive total variation model can better avoid the effect of noise on the spatial information
extraction, and maintains robustness with changes in the noise intensity in the super-resolution process.
ETPL
DIP-200
Detecting, Grouping, and Structure Inference for Invariant Repetitive Patterns in
Images
Abstract: The efficient and robust extraction of invariant patterns from an image is a long-standing
problem in computer vision. Invariant structures are often related to repetitive or near-repetitive patterns.
The perception of repetitive patterns in an image is strongly linked to the visual interpretation and composition of textures. Repetitive patterns are products of both repetitive structures as well as repetitive
reflections or color patterns. In other words, patterns that exhibit near-stationary behavior provide rich
information about objects, their shapes, and their texture in an image. In this paper, we propose a new algorithm for repetitive pattern detection and grouping. The algorithm follows the classical region
growing image segmentation scheme. It utilizes a mean-shift-like dynamic to group local image patches
into clusters. It exploits a continuous joint alignment to: 1) match similar patches, and 2) refine the subspace grouping. We also propose an algorithm for inferring the composition structure of the repetitive
patterns. The inference algorithm constructs a data-driven structural completion field, which merges the
detected repetitive patterns into specific global geometric structures. The result of higher level grouping
for image patterns can be used to infer the geometry of objects and estimate the general layout of a crowded scene.
ETPL
DIP-201 Compressive Framework for Demosaicing of Natural Images
Abstract: Typical consumer digital cameras sense only one out of three color components per image pixel. The problem of demosaicing deals with interpolating those missing color components. In this
paper, we present compressive demosaicing (CD), a framework for demosaicing natural images based on
the theory of compressed sensing (CS). Given sensed samples of an image, CD employs a CS solver to
find the sparse representation of that image under a fixed sparsifying dictionary Ψ. As opposed to state-of-the-art CS-based demosaicing approaches, we consider a clear distinction between the interchannel (color) and interpixel correlations of natural images. Drawing on well-known facts about the human visual system, these two types of correlations are exploited in a nonseparable format to construct the sparsifying transform Ψ. Our simulation results verify that CD performs better (both visually and in terms
of PSNR) than leading demosaicing approaches when applied to the majority of standard test images.
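The "one of three color components per pixel" measurement can be simulated with a Bayer mosaic operator, the forward model that any demosaicing method (CS-based or otherwise) must invert. The RGGB layout below is a common sensor pattern chosen for illustration; the paper does not prescribe this particular arrangement.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Simulate RGGB Bayer sampling: keep exactly one of the three color
    components per pixel. This is the measurement operator that
    demosaicing must invert (illustrative layout)."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green sites (even rows)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green sites (odd rows)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue sites
    return mosaic

gray = np.full((4, 4, 3), 7.0)   # equal channels: sampling loses nothing
m = bayer_mosaic(gray)
```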
ETPL
DIP-202
Locally Optimal Detection of Image Watermarks in the Wavelet Domain Using Bessel
K Form Distribution
Abstract: A uniformly most powerful watermark detector, which applies the Bessel K form (BKF) probability density function to model the noise distribution, was proposed by Bian and Liang. In this
paper, we derive a locally optimum (LO) detector using the same noise model. Since the literature lacks
thorough discussion on the performance of the BKF-LO nonlinearities, the performance of the proposed detector is discussed in detail. First, we prove that the test statistic of the proposed detector is
asymptotically Gaussian and evaluate the actual performance of the proposed detector using the receiver
operating characteristic (ROC). Then, the large sample performance of the proposed detector is evaluated
using asymptotic relative efficiency (ARE) and “maximum ARE.” The experimental results show that the proposed detector has a good performance with or without attacks in terms of its ROC curves, particularly
when the watermark is weak. Therefore, the proposed method is well suited to wavelet-domain watermark detection.
ETPL
DIP-203
Estimating the Granularity Coefficient of a Potts-Markov Random Field Within a
Markov Chain Monte Carlo Algorithm
Abstract: This paper addresses the problem of estimating the Potts parameter β jointly with the unknown
parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard
MCMC methods cannot be applied to this problem because performing inference on β requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method, the
estimation of β is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental results
obtained for synthetic data show that estimating β jointly with the other unknown parameters leads to estimation results that are as good as those obtained with the actual value of β. On the other hand,
choosing an incorrect value of β can degrade estimation performance significantly. To illustrate the
interest of this method, the proposed algorithm is successfully applied to real bidimensional SAR and tridimensional ultrasound images.
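The Metropolis-Hastings accept/reject step at the core of the method can be sketched on a tractable 1-D target. The paper's contribution is making this work when the Potts likelihood is intractable (a likelihood-free variant); the plain random-walk sampler below, applied to a Gaussian target, only illustrates the basic mechanism.

```python
import numpy as np

def metropolis_hastings(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings: propose x' = x + noise and accept
    with probability min(1, target(x')/target(x)). The likelihood-free
    sampler for the Potts parameter beta replaces the exact target ratio,
    which is intractable, with a simulation-based surrogate."""
    x, chain = x0, []
    for _ in range(n_iter):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(0)
# Illustrative target: N(2, 1), log density -(x-2)^2/2 up to a constant
chain = metropolis_hastings(lambda x: -0.5 * (x - 2.0) ** 2, 0.0, 1.0, 20000, rng)
posterior_mean = chain[2000:].mean()   # discard burn-in
```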
ETPL
DIP-204 Atmospheric Turbulence Mitigation Using Complex Wavelet-Based Fusion
Abstract: Restoring a scene distorted by atmospheric turbulence is a challenging problem in video
surveillance. The effect, caused by random, spatially varying perturbations, makes a model-based solution difficult and, in most cases, impractical. In this paper, we propose a novel method for mitigating
the effects of atmospheric distortion on observed images, particularly airborne turbulence which can
severely degrade a region of interest (ROI). In order to extract accurate detail about objects behind the distorting layer, a simple and efficient frame selection method is proposed to select informative ROIs
only from good-quality frames. The ROIs in each frame are then registered to further reduce offsets and
distortions. We solve the space-varying distortion problem using region-level fusion based on the dual
tree complex wavelet transform. Finally, contrast enhancement is applied. We further propose a learning-based metric specifically for image quality assessment in the presence of atmospheric distortion. This is
capable of estimating quality in both full- and no-reference scenarios. The proposed method is shown to
significantly outperform existing methods, providing enhanced situational awareness in a range of
surveillance scenarios.
ETPL
DIP-205 Rotation Invariant Local Frequency Descriptors for Texture Classification
Abstract: This paper presents a novel rotation invariant method for texture classification based on local
frequency components. The local frequency components are computed by applying a 1-D Fourier transform
on a neighboring function defined on a circle of radius R at each pixel. We observed that the low frequency components are the major constituents of the circular functions and can effectively represent
textures. Three sets of features are extracted from the low frequency components, two based on the phase
and one based on the magnitude. The proposed features are invariant to rotation and linear changes of
illumination. Moreover, by using low frequency components, the proposed features are very robust to noise. While the proposed method uses a relatively small number of features, it outperforms state-of-the-
art methods on three well-known datasets: Brodatz, Outex, and CUReT. In addition, the proposed method remarkably improves the classification accuracy, especially in the presence of high levels of noise.
ETPL
DIP-206 Scanned Document Compression Using Block-Based Hybrid Video Codec
Abstract: This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching
algorithm that can be applied to document coding. We show that this interpretation may generate residual
data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is
demonstrated using H.264/advanced video coding (AVC) as a high-quality single and multipage document compressor. The proposed method, called advanced document coding (ADC), uses segments of
the originally independent scanned pages of a document to create a video sequence, which is then
encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intramode) and JPEG2000 by up to 2.7 and 6.2 dB,
respectively. Superior subjective quality is also achieved.
ETPL
DIP-207 Space-Time Hole Filling With Random Walks in View Extrapolation for 3D Video
Abstract: In this paper, a space-time hole filling approach is presented to deal with disocclusions when a view is synthesized for 3D video. The problem becomes even more complicated when the view is
extrapolated from a single view, since the hole is large and has no stereo depth cues. Although many
techniques have been developed to address this problem, most of them focus only on view interpolation. We propose a space-time joint filling method for color and depth videos in view extrapolation. For proper
texture and depth to be sampled in the following hole filling process, the background of a scene is
automatically segmented by the random walker segmentation in conjunction with the hole formation process. Then, the patch candidate selection process is formulated as a labeling problem, which can be
solved with random walks. The patch candidates that best describe the hole region are dynamically
selected in the space-time domain, and the hole is filled with the optimal patch for ensuring both spatial
and temporal coherence. The experimental results show that the proposed method is superior to state-of-the-art methods and provides both spatially and temporally consistent results with significantly reduced
flicker artifacts.
ETPL
DIP-208 Rate Control for Consistent Objective Quality in High Efficiency Video Coding
Abstract: Since video quality fluctuation degrades the visual perception significantly in multimedia
communication systems, it is important to maintain a consistent objective quality over the entire video
sequence. We propose a rate control algorithm to keep the consistent objective quality in high efficiency
video coding (HEVC), which is an upcoming standard video codec. In the proposed algorithm, the
probability density function of transformed coefficients is modeled based on a Laplacian function that
considers the quadtree coding unit structure, which is one of the characteristics of HEVC. In controlling
the video quality, distortion-quantization and rate-quantization models are derived by using the Laplacian function. Based on those models, a quantization parameter is determined to control the quality of the
encoded frames such that the fluctuation of video quality is minimized and buffer overflow and underflow are prevented. Simulation results show that the proposed rate control algorithm outperforms other conventional schemes.
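The idea of deriving rate-quantization and distortion-quantization curves from a Laplacian source model can be sketched numerically: model the transform coefficients as Laplacian, apply a uniform quantizer of step q, and evaluate the output entropy (rate) and mean squared error (distortion). This sketch omits the paper's quadtree-aware refinements; the function name and parameters are illustrative.

```python
import numpy as np

def laplacian_rate_distortion(lam, q):
    """Rate (entropy, bits/sample) and distortion (MSE) of a uniform
    mid-tread quantizer with step q applied to a Laplacian source
    f(x) = (lam/2) exp(-lam |x|), evaluated by numeric integration."""
    x = np.linspace(-40.0, 40.0, 200001)
    dx = x[1] - x[0]
    pdf = 0.5 * lam * np.exp(-lam * np.abs(x))
    k = np.round(x / q)                          # quantizer bin index
    dist = np.sum(pdf * (x - k * q) ** 2) * dx   # mean squared error
    p = np.array([np.sum(pdf[k == b]) * dx for b in np.unique(k)])
    p = p[p > 1e-12]
    rate = -np.sum(p * np.log2(p))               # entropy of quantizer output
    return rate, dist

r_fine, d_fine = laplacian_rate_distortion(1.0, 0.5)
r_coarse, d_coarse = laplacian_rate_distortion(1.0, 2.0)
```

A rate controller inverts these curves: it picks the quantization parameter whose modeled rate meets the bit budget while minimizing the modeled distortion.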
ETPL
DIP-209
Discrete Wavelet Transform and Data Expansion Reduction in Homomorphic
Encrypted Domain
Abstract: Signal processing in the encrypted domain is a new technology with the goal of protecting
valuable signals from insecure signal processing. In this paper, we propose a method for implementing discrete wavelet transform (DWT) and multiresolution analysis (MRA) in homomorphic encrypted
domain. We first suggest a framework for performing DWT and inverse DWT (IDWT) in the encrypted
domain, then conduct an analysis of data expansion and quantization errors under the framework. To solve the problem of data expansion, which may be very important in practical applications, we present a
method for reducing data expansion in the case that both DWT and IDWT are performed. With the
proposed method, multilevel DWT/IDWT can be performed with less data expansion in homomorphic
encrypted domain. We propose a new signal processing procedure, where the multiplicative inverse method is employed as the last step to limit the data expansion. Taking a 2-D Haar wavelet transform as
an example, we conduct a few experiments to demonstrate the advantages of our method in secure image
processing. We also provide computational complexity analyses and comparisons. To the best of our knowledge, there has been no report on the implementation of DWT and MRA in the encrypted domain.
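For reference, one level of the 2-D Haar transform used in the paper's example can be sketched on plaintext data (the paper's actual contribution is performing these additions and subtractions on homomorphically encrypted values). The averaging normalization below is one common convention, chosen for clarity.

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar DWT (averaging normalization),
    returning LL, LH, HL, HH subbands. Only additions, subtractions,
    and a constant division are needed, which is what makes the
    transform amenable to homomorphic evaluation."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row-pair average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row-pair difference
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

flat = np.full((4, 4), 9.0)     # constant image: all detail subbands vanish
LL, LH, HL, HH = haar2d(flat)
```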
ETPL
DIP-210 QoE-Based Multi-Exposure Fusion in Hierarchical Multivariate Gaussian CRF
Abstract: Many state-of-the-art fusion methods, combining details in images taken under different
exposures into one well-exposed image, can be found in the literature. However, insufficient study has been conducted to explore how perceptual factors can provide viewers better quality of experience on
fused images. We propose two perceptual quality measures: perceived local contrast and color saturation,
which are embedded in our novel hierarchical multivariate Gaussian conditional random field model, to illustrate improved performance for multi-exposure fusion. We show that our method generates images
with better quality than existing methods for a variety of scenes.
ETPL
DIP-211 Action Recognition From Video Using Feature Covariance Matrices
Abstract: We propose a general framework for fast and accurate recognition of actions in video using
empirical covariance matrices of features. A dense set of spatio-temporal feature vectors are computed
from video to provide a localized description of the action, and subsequently aggregated in an empirical
covariance matrix to compactly represent the action. Two supervised learning methods for action recognition are developed using feature covariance matrices. Common to both methods is the
transformation of the classification problem in the closed convex cone of covariance matrices into an
equivalent problem in the vector space of symmetric matrices via the matrix logarithm. The first method applies nearest-neighbor classification using a suitable Riemannian metric for covariance matrices. The
second method approximates the logarithm of a query covariance matrix by a sparse linear combination
of the logarithms of training covariance matrices. The action label is then determined from the sparse
coefficients. Both methods achieve state-of-the-art classification performance on several datasets, and are robust to action variability, viewpoint changes, and low object resolution. The proposed framework is
conceptually simple and has low storage and computational requirements making it attractive for real-
time implementation.
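The matrix-logarithm transformation common to both methods can be sketched directly: map each covariance descriptor from the cone of SPD matrices into the vector space of symmetric matrices, then compare descriptors there. The log-Euclidean distance below is one common Riemannian-motivated choice, used for illustration (the paper speaks only of "a suitable Riemannian metric"); the feature data is synthetic.

```python
import numpy as np

def logm_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix via
    eigendecomposition: log(C) = V diag(log w) V^T. Maps covariance
    descriptors into the vector space of symmetric matrices."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(C1, C2):
    """Log-Euclidean distance between two covariance descriptors:
    Frobenius norm of the difference of their matrix logarithms."""
    return np.linalg.norm(logm_spd(C1) - logm_spd(C2), ord='fro')

# Covariance descriptors of two synthetic spatio-temporal feature sets
rng = np.random.default_rng(2)
F1 = rng.standard_normal((100, 5))
F2 = rng.standard_normal((100, 5)) * 2.0     # differently scaled features
C1, C2 = np.cov(F1.T), np.cov(F2.T)
```

Nearest-neighbor action classification then amounts to comparing a query descriptor against training descriptors under this distance.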
ETPL
DIP-212 2-D Wavelet Packet Spectrum for Texture Analysis
Abstract: This brief derives a 2-D spectrum estimator from some recent results on the statistical properties
of wavelet packet coefficients of random processes. It provides an analysis of the bias of this estimator with respect to the wavelet order. This brief also discusses the performance of this wavelet-based
estimator, in comparison with the conventional 2-D Fourier-based spectrum estimator on texture analysis
and content-based image retrieval. It highlights the effectiveness of the wavelet-based spectrum estimation.
ETPL
DIP-213
UND: Unite-and-Divide Method in Fourier and Radon Domains for Line Segment
Detection
Abstract: In this paper, we extend our previously proposed line detection method to line segmentation
using a so-called unite-and-divide (UND) approach. The methodology includes two phases, namely the union of spectra in the frequency domain, and the division of the sinogram in Radon space. In the union
phase, given an image, its sinogram is obtained by parallel 2D multilayer Fourier transforms, Cartesian-
to-polar mapping, and 1D inverse Fourier transform. In the division phase, the edges of the butterfly wings in the neighborhood of every sinogram peak are first identified, with each neighborhood area corresponding to a window in image space; line segments are then extracted by applying the separated sinogram of each such windowed image.
Our experiments are conducted on benchmark images and the results reveal that the UND method yields
higher accuracy, has lower computational cost and is more robust to noise, compared to existing state-of-the-art methods.
ETPL
DIP-214
Stable Orthogonal Local Discriminant Embedding for Linear Dimensionality
Reduction
Abstract: Manifold learning is widely used in machine learning and pattern recognition. However, manifold learning only considers the similarity of samples belonging to the same class and ignores the
within-class variation of the data, which impairs the generalization and stability of the algorithms. For
this purpose, we construct an adjacency graph to model the intraclass variation that characterizes the most
important properties, such as diversity of patterns, and then incorporate the diversity into the discriminant objective function for linear dimensionality reduction. Finally, we introduce the orthogonal constraint for
the basis vectors and propose an orthogonal algorithm called stable orthogonal local discriminant embedding. Experimental results on several standard image databases demonstrate the effectiveness of the proposed dimensionality reduction approach.
ETPL
DIP-215 Motion-Aware Gradient Domain Video Composition
Abstract: For images, gradient domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting
Poisson image blending to video presents new challenges due to the added temporal dimension. In video,
the human eye is sensitive to small changes in blending boundaries across frames and slight differences in
motions of the source patch and target video. We present a novel video blending approach that tackles these problems by merging the gradient of source and target videos and optimizing a consistent blending
boundary based on a user-provided blending trimap for the source video. Our approach extends mean-
value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and source object positioning method that can
efficiently deal with complex video sequences beyond the capabilities of alpha blending.
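Gradient-domain (Poisson) composition, which the abstract above extends to video, can be illustrated in one dimension: keep the target outside the blend window, and inside solve for values whose second differences match the source's, with boundary values taken from the target. This is a hedged toy sketch, not the paper's mean-value-coordinates method:

```python
import numpy as np

def blend_1d(source, target, lo, hi):
    """Minimal 1D Poisson blend over the window [lo, hi):
    interior values match the source's second differences while the
    boundary is anchored to the target."""
    s = source.astype(float)
    t = target.astype(float)
    n = hi - lo
    # Tridiagonal discrete Laplacian system A x = b.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = -(s[lo - 1:hi - 1] - 2 * s[lo:hi] + s[lo + 1:hi + 1])
    b[0] += t[lo - 1]   # boundary conditions from the target
    b[-1] += t[hi]
    out = t.copy()
    out[lo:hi] = np.linalg.solve(A, b)
    return out

# A source differing from the target by a constant offset blends in
# seamlessly: its gradients are preserved, its level is re-anchored.
target = np.arange(10.0)
source = target + 5.0
assert np.allclose(blend_1d(source, target, 3, 7), target)
```

The 2D image case replaces the tridiagonal system with the 2D discrete Laplacian; the video case adds consistency of the boundary across frames.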
ETPL
DIP-216 Structural Texture Similarity Metrics for Image Analysis and Retrieval
Abstract: We develop new metrics for texture similarity that account for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial
point-by-point deviations between textures that according to human judgment are essentially identical.
The proposed metrics extend the ideas of structural similarity and are guided by research in texture
analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to
investigate metric performance in the context of “known-item search,” the retrieval of textures that are
“identical” to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the
proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and
its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
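The key property of such metrics is that they compare statistics rather than pixels, so textures that differ point by point can still score as near-identical. A simplified sketch using global first- and second-order statistics (an assumption-laden stand-in for the steerable-filter subband statistics the metrics actually use):

```python
import numpy as np

def stat_similarity(x, y, eps=1e-8):
    """Toy SSIM-style comparison of two texture patches using only
    their global mean and variance (a simplified stand-in for
    subband statistics)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    luminance = (2 * mx * my + eps) / (mx ** 2 + my ** 2 + eps)
    contrast = (2 * np.sqrt(vx * vy) + eps) / (vx + vy + eps)
    return luminance * contrast

rng = np.random.default_rng(1)
patch = rng.random((32, 32))
# Identical statistics give a perfect score of 1.0.
assert np.isclose(stat_similarity(patch, patch), 1.0)
# A circularly shifted copy differs at almost every pixel, yet its
# statistics are unchanged, so the score stays high.
shifted = np.roll(patch, 5, axis=1)
assert stat_similarity(patch, shifted) > 0.99
```

A pixelwise metric such as PSNR would heavily penalize the shifted copy, which is exactly the failure mode these texture metrics are designed to avoid.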
ETPL
DIP-217 Simultaneous Facial Feature Tracking and Facial Expression Recognition
Abstract: The tracking and recognition of facial activities from images or videos have attracted great
attention in computer vision field. Facial activities are characterized by three levels. First, in the bottom
level, facial feature points around each facial component, i.e., eyebrow, mouth, etc., capture the detailed face shape information. Second, in the middle level, facial action units, defined in the facial action coding
system, represent the contraction of a specific set of facial muscles, i.e., lid tightener, eyebrow raiser, etc.
Finally, in the top level, six prototypical facial expressions represent the global facial muscle movement and are commonly used to describe the human emotion states. In contrast to the mainstream approaches,
which usually only focus on one or two levels of facial activities, and track (or recognize) them
separately, this paper introduces a unified probabilistic framework based on the dynamic Bayesian
network to simultaneously and coherently represent the facial evolvement in different levels, their interactions and their observations. Advanced machine learning methods are introduced to learn the
model based on both training data and subjective prior knowledge. Given the model and the
measurements of facial motions, all three levels of facial activities are simultaneously recognized through a probabilistic inference. Extensive experiments are performed to illustrate the feasibility and
effectiveness of the proposed model on all three levels of facial activities.
ETPL
DIP-218
A Generalized Random Walk With Restart and its Application in Depth Up-Sampling
and Interactive Segmentation
Abstract: In this paper, the origin of random walk with restart (RWR) and its generalization are described. It is well known that the random walk (RW) and the anisotropic diffusion models share the same energy
functional: the former provides a steady-state solution, while the latter gives a flow solution. In contrast,
the theoretical background of the RWR scheme is different from that of the diffusion-reaction equation, although the restarting term of the RWR plays a role similar to the reaction term of the diffusion-reaction
equation. The behaviors of the two approaches with respect to outliers reveal that they possess different
attributes in terms of data propagation. This observation leads to the derivation of a new energy functional, where both volumetric heat capacity and thermal conductivity are considered together, and
provides a common framework that unifies both the RW and the RWR approaches, in addition to other
regularization methods. The proposed framework allows the RWR to be generalized (GRWR) in
semilocal and nonlocal forms. The experimental results demonstrate the superiority of GRWR over existing regularization approaches in terms of depth map up-sampling and interactive image
segmentation.
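The basic RWR scheme that the abstract generalizes can be sketched as a power iteration: the walker follows a column-stochastic transition matrix but returns to the seed node with a fixed restart probability. A minimal illustration (the values of `restart` and the toy graph below are arbitrary choices, not the paper's):

```python
import numpy as np

def rwr(W, seed, restart=0.15, iters=200):
    """Random walk with restart by power iteration.
    W must be column-stochastic; `restart` is the restart probability."""
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1.0 - restart) * (W @ r) + restart * e
    return r

# 3-node chain 0 - 1 - 2, column-normalized adjacency.
W = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r = rwr(W, seed=0)
assert np.isclose(r.sum(), 1.0)   # stays a probability distribution
assert r[0] > r[2]                # mass decays with distance from the seed
```

The restart term plays the role of the reaction term discussed above: it keeps probability mass anchored near the seed instead of diffusing to the steady state of the plain random walk.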
ETPL
DIP-219 Variational Optical Flow Estimation Based on Stick Tensor Voting
Abstract: Variational optical flow techniques allow the estimation of flow fields from spatio-temporal
derivatives. They are based on minimizing a functional that contains a data term and a regularization
term. Recently, numerous approaches have been presented for improving the accuracy of the estimated
flow fields. Among them, tensor voting has been shown to be particularly effective in the preservation of flow discontinuities. This paper presents an adaptation of the data term by using anisotropic stick tensor
voting in order to gain robustness against noise and outliers with significantly lower computational cost
than (full) tensor voting. In addition, an anisotropic complementary smoothness term depending on directional information estimated through stick tensor voting is utilized in order to preserve discontinuity
capabilities of the estimated flow fields. Finally, a weighted non-local term that depends on both the
estimated directional information and the occlusion state of pixels is integrated during the optimization
process in order to denoise the final flow field. The proposed approach yields state-of-the-art results on the Middlebury benchmark.
ETPL
DIP-220 Exploring Visual and Motion Saliency for Automatic Video Object Extraction
Abstract: This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the
use of any training data (i.e., not limited to any particular type of object). To separate foreground and
background regions within and across video frames, the proposed method utilizes visual and motion
saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency induced features, which allows us to deal with unknown pose and scale variations of
the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity
and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory VOE results.
ETPL
DIP-221 Enhanced Compressed Sensing Recovery With Level Set Normals
Abstract: We propose a compressive sensing algorithm that exploits geometric properties of images to
recover images of high quality from few measurements. The image reconstruction is done by iterating the two following steps: 1) estimation of normal vectors of the image level curves, and 2) reconstruction of
an image fitting the normal vectors, the compressed sensing measurements, and the sparsity constraint.
The proposed technique can naturally extend to nonlocal operators and graphs to exploit the repetitive nature of textured images to recover fine detail structures. In both cases, the problem is reduced to a
series of convex minimization problems that can be efficiently solved with a combination of variable
splitting and augmented Lagrangian methods, leading to fast and easy-to-code algorithms. Extended experiments show a clear improvement over related state-of-the-art algorithms in the quality of the
reconstructed images and the robustness of the proposed method to noise, different kinds of images, and
reduced measurements.
ETPL
DIP-222 Colorization-Based Compression Using Optimization
Abstract: In this paper, we formulate the colorization-based coding problem into an optimization
problem, i.e., an L1 minimization problem. In colorization-based coding, the encoder chooses a few
representative pixels (RP) for which the chrominance values and the positions are sent to the decoder, whereas in the decoder, the chrominance values for all the pixels are reconstructed by colorization
methods. The main issue in colorization-based coding is how to extract the RP so that both the compression rate and the quality of the reconstructed color image are good. By formulating the
colorization-based coding into an L1 minimization problem, it is guaranteed that, given the colorization matrix, the chosen set of RP becomes the optimal set in the sense that it minimizes the error between the
original and the reconstructed color image. In other words, for a fixed error value and a given colorization
matrix, the chosen set of RP is the smallest set possible. We also propose a method to construct the
colorization matrix that colorizes the image in a multiscale manner. This, combined with the proposed RP
extraction method, allows us to choose a very small set of RP. It is shown experimentally that the
proposed method outperforms conventional colorization-based coding methods as well as the JPEG standard and is comparable with the JPEG2000 compression standard, both in terms of the compression
rate and the quality of the reconstructed color image.
ETPL
DIP-223
Orientation Imaging Microscopy With Optimized Convergence Angle Using CBED
Patterns in TEMs
Abstract: Grain size statistics, texture, and grain boundary distribution are microstructural characteristics that greatly influence materials properties. These characteristics can be derived from an orientation map
obtained using orientation imaging microscopy (OIM) techniques. The OIM techniques are generally
performed using a transmission electron microscope (TEM) for nanomaterials. Although some of these techniques have limited applicability in certain situations, others have limited availability because of
external hardware required. In this paper, an automated method to generate orientation maps using
convergent beam electron diffraction (CBED) patterns obtained in a conventional TEM setup is presented. This method is based upon dynamical diffraction theory, which describes electron diffraction more accurately as
compared with kinematical theory used by several existing OIM techniques. In addition, the method of
this paper uses wide angle convergent beam electron diffraction for performing OIM. It is shown in this
paper that the use of the wide angle convergent electron beam provides additional information that is not available otherwise. Together, the presented method exploits the additional information and combines it
with the calculations from the dynamical theory to provide accurate orientation maps in a conventional
TEM setup. The automated method of this paper is applied to a platinum thin film sample. The presented method correctly identified the texture preference in the sample.
ETPL
DIP-224
Grassmannian Regularized Structured Multi-View Embedding for Image
Classification
Abstract: Images are usually represented by features from multiple views, e.g., color and texture. In
image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the features are often different in nature and it is
nontrivial to fuse them. Particularly, some extracted features are redundant or noisy and are consequently
not discriminative for classification. To alleviate these problems in an image classification context, we propose in this paper a novel multi-view embedding framework, termed as Grassmannian regularized
structured multi-view embedding, or GrassReg for short. GrassReg transfers the graph Laplacian obtained
from each view to a point on the Grassmann manifold and penalizes the disagreement between different views according to Grassmannian distance. Therefore, a view that is consistent with others is more
important than a view that disagrees with others for learning a unified subspace for multi-view data
representation. In addition, we impose a group sparsity penalty on the learned low-dimensional embeddings so that they can better capture the group structure of the intrinsic data distribution. Empirically, we compare GrassReg with representative multi-view algorithms and show the effectiveness of GrassReg
on a number of multi-view image data sets.
ETPL
DIP-225
Efficient Minimum Error Bounded Particle Resampling L1 Tracker With Occlusion
Detection
Abstract: Recently, sparse representation has been applied to visual tracking to find the target with the
minimum reconstruction error from a target template subspace. Though effective, these L1 trackers
require high computational costs due to numerous calculations for l1 minimization. In addition, the
inherent occlusion insensitivity of the l1 minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named bounded particle resampling (BPR)-L1 tracker, with a minimum
error bound and occlusion detection. First, the minimum error bound is calculated from a linear least
squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of
the insignificant samples are removed before solving the computationally expensive l1 minimization in a
two-step testing. The first step, named τ testing, compares the sample observation likelihood to an
ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further
remove insignificant samples without altering the tracking result of the current frame. Though sacrificing
minimal precision during resampling, max testing achieves significant speed up on top of τ testing. The BPR-L1 technique can also be beneficial to other trackers that have minimum error bounds in a PF
framework, especially for trackers based on sparse representations. After the error-bound calculation,
BPR-L1 performs occlusion detection by investigating the trivial coefficients in the l1 minimization.
These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance the template updating. For evaluation, we conduct
experiments on three video applications: biometrics (head movement, hand holding an object, singers on
stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates an excellent performance as
compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.
ETPL
DIP-226 Multiview Hessian Regularization for Image Annotation
Abstract: The rapid development of computer hardware and Internet technology makes large-scale data-dependent models computationally tractable, and opens a bright avenue for annotating images through
innovative machine learning algorithms. Semisupervised learning (SSL) therefore received intensive
attention in recent years and was successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along
the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function toward a constant function, which can result in poor generalization. In addition, LR was developed to handle uniformly distributed data (or single-view data), although instances or objects, such
as images and videos, are usually represented by multiview features, such as color, shape, and texture. In
this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-
based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the
data manifold. We apply mHR to kernel least squares and support vector machines as two examples for
image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
ETPL
DIP-227
GPU Accelerated Edge-Region Based Level Set Evolution Constrained by 2D Gray-
Scale Histogram
Abstract: Due to its intrinsic ability to easily handle complex shapes and topological changes, the level set method (LSM) has been widely used in image segmentation. Nevertheless, the LSM is computationally expensive, which limits its applications in real-time systems. To address this, we propose a new level set algorithm that simultaneously uses edge, region, and 2D histogram
information in order to efficiently segment objects of interest in a given scene. The computational complexity of the proposed LSM is greatly reduced by using the highly parallelizable lattice Boltzmann
method (LBM) with a body force to solve the level set equation (LSE). The body force is the link with
image data and is defined from the proposed LSE. The proposed LSM is then implemented on NVIDIA graphics processing units to fully exploit the local nature of the LBM. The new algorithm is effective, robust against noise, independent of the initial contour, fast, and highly parallelizable. The edge and region information make it possible to detect objects with and without edges, while the 2D histogram information keeps the method effective in noisy environments. Experimental results on synthetic and real images demonstrate subjectively and objectively the performance of the proposed
method.
ETPL
DIP-228 Sparse Stochastic Processes and Discretization of Linear Inverse Problems
Abstract: We present a novel statistically-based discretization paradigm and derive a class of maximum a
posteriori (MAP) estimators for solving ill-conditioned linear inverse problems. We are guided by the
theory of sparse stochastic processes, which specifies continuous-domain signals as solutions of linear stochastic differential equations. Accordingly, we show that the class of admissible priors for the
discretized version of the signal is confined to the family of infinitely divisible distributions. Our
estimators not only cover the well-studied methods of Tikhonov and l1-type regularizations as particular
cases, but also open the door to a broader class of sparsity-promoting regularization schemes that are typically nonconvex. We provide an algorithm that handles the corresponding nonconvex problems and
illustrate the use of our formalism by applying it to deconvolution, magnetic resonance imaging, and X-
ray tomographic reconstruction problems. Finally, we compare the performance of estimators associated with models of increasing sparsity.
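The Tikhonov and l1 special cases mentioned above correspond to Gaussian and Laplace priors, whose MAP denoising (proximal) steps have simple closed forms. A hedged sketch of just those two building blocks (not the paper's nonconvex estimators):

```python
import numpy as np

def prox_tikhonov(v, lam):
    """Proximal map of the quadratic (Gaussian-prior) regularizer
    lam/2 * ||x||^2: linear shrinkage toward zero."""
    return v / (1.0 + lam)

def prox_l1(v, lam):
    """Proximal map of the l1 (Laplace-prior) regularizer lam*||x||_1:
    soft-thresholding, which sets small entries exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.5, 1.0])
# Tikhonov shrinks everything uniformly; l1 zeroes small entries,
# which is the sparsity-promoting behavior.
assert np.allclose(prox_tikhonov(v, 1.0), [1.5, -0.25, 0.5])
assert np.allclose(prox_l1(v, 1.0), [2.0, 0.0, 0.0])
```

The nonconvex priors covered by the infinitely divisible family interpolate between and go beyond these two shrinkage behaviors.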
ETPL
DIP-229
Sparse/DCT (S/DCT) Two-Layered Representation of Prediction Residuals for Video
Coding
Abstract: In this paper, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals, and implement this idea on top of the state-of-the-art high efficiency video coding
(HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals
so that a high portion of energy in a structured residual can be efficiently coded via sparse coding. It is
observed that the sparse representation alone is less effective in the R-D performance due to the side information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded
at the second stage. It is applied to the remaining signal to improve coding efficiency. The two
representations successfully complement each other. Experimental results demonstrate that the proposed algorithm outperforms the HEVC reference codec HM5.0 under the common test conditions.
ETPL
DIP-230 Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space
Abstract: Meaningful representation and effective retrieval of video shots in a large-scale database has
been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to the extraction of low-level visual features, such as color, shape, texture, and
motion for characterizing and retrieving video shots. However, the accuracy of these feature descriptors is
still far from satisfaction due to the well-known semantic gap. In order to alleviate the problem, this paper investigates a novel methodology of representing and retrieving video shots using human-centric high-
level features derived in brain imaging space (BIS) where brain responses to natural stimulus of video
watching can be explored and interpreted. At first, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale
functional brain networks and their regions of interests (ROIs) that are involved in the comprehension of
video stimulus. Then, functional connectivities between various functional ROI pairs are utilized as BIS
features to characterize the brain's comprehension of video semantics. Next, an effective feature selection procedure is applied to learn the most relevant features while removing redundancy, which results in the
formation of the final BIS features. Afterwards, a mapping from low-level visual features to high-level
semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is then inferred, in which video key frames are represented by the mapped feature vectors in the
BIS. Finally, the manifold-ranking algorithm concerning the relationship among all data is applied to
measure the similarity between key frames of video shots. Experimental results on the TRECVID 2005
dataset demonstrate the superiority of the proposed work in comparison with traditional methods.
ETPL
DIP-231 Multivariate Slow Feature Analysis and Decorrelation Filtering for Blind Source Separation
Abstract: We generalize the method of Slow Feature Analysis (SFA) for vector-valued functions of several variables and apply it to the problem of blind source separation, in particular to image separation.
It is generally necessary to use multivariate SFA instead of univariate SFA for separating multi-
dimensional signals. For the linear case, an exact mathematical analysis is given, which shows in
particular that the sources are perfectly separated by SFA if and only if they and their first-order derivatives are uncorrelated. When the sources are correlated, we apply the following technique called
Decorrelation Filtering: use a linear filter to decorrelate the sources and their derivatives in the given
mixture, then apply the unmixing matrix obtained on the filtered mixtures to the original mixtures. If the filtered sources are perfectly separated by this matrix, so are the original sources. A decorrelation filter
can be numerically obtained by solving a nonlinear optimization problem. This technique can also be
applied to other linear separation methods, whose output signals are decorrelated, such as ICA. When there are more mixtures than sources, one can determine the actual number of sources by using a
regularized version of SFA with decorrelation filtering. Extensive numerical experiments using SFA and
ICA with decorrelation filtering, supported by mathematical analysis, demonstrate the potential of our
methods for solving problems involving blind source separation.
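For the linear univariate case, SFA reduces to a generalized eigenproblem: find the unit-variance projection whose temporal derivative has minimal variance. A small sketch under that simplification (the mixing matrix and signals below are illustrative assumptions, not the paper's image-separation setup):

```python
import numpy as np

def linear_sfa_1(X):
    """First linear SFA component of X (shape (T, d), zero-mean):
    the unit-variance projection with the slowest variation."""
    C = np.cov(X, rowvar=False)                     # signal covariance
    Cd = np.cov(np.diff(X, axis=0), rowvar=False)   # derivative covariance
    # Generalized eigenproblem Cd w = lam C w; smallest lam = slowest.
    vals, vecs = np.linalg.eig(np.linalg.solve(C, Cd))
    w = np.real(vecs[:, np.argmin(np.real(vals))])
    return w / np.sqrt(w @ C @ w)

# Mix a slow and a fast sinusoid; SFA should recover the slow source.
t = np.linspace(0, 2 * np.pi, 500)
S = np.column_stack([np.sin(t), np.sin(25 * t)])
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = S @ A.T
y = X @ linear_sfa_1(X)
assert abs(np.corrcoef(y, S[:, 0])[0, 1]) > 0.95
```

The multivariate extension described above replaces scalar projections with vector-valued ones, which is what makes separating multi-dimensional signals such as images possible.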
ETPL
DIP-232
Parameter Estimation for Blind and Non-Blind Deblurring Using Residual Whiteness
Measures
Abstract: Image deblurring (ID) is an ill-posed problem typically addressed by using regularization, or
prior knowledge, on the unknown image (and also on the blur operator, in the blind case). ID is often formulated as an optimization problem, where the objective function includes a data term encouraging the
estimated image (and blur, in blind ID) to explain the observed data well (typically, the squared norm of a
residual) plus a regularizer that penalizes solutions deemed undesirable. The performance of this
approach depends critically (among other things) on the relative weight of the regularizer (the regularization parameter) and on the number of iterations of the algorithm used to address the
optimization problem. In this paper, we propose new criteria for adjusting the regularization parameter
and/or the number of iterations of ID algorithms. The rationale is that if the recovered image (and blur, in blind ID) is well estimated, the residual image is spectrally white; contrarily, a poorly deblurred image
typically exhibits structured artifacts (e.g., ringing, oversmoothness), yielding residuals that are not
spectrally white. The proposed criterion is particularly well suited to a recent blind ID algorithm that uses continuation, i.e., slowly decreases the regularization parameter along the iterations; in this case,
choosing this parameter and deciding when to stop are one and the same thing. Our experiments show
that the proposed whiteness-based criteria yield SNR improvements that are, on average, only 0.15 dB below
those obtained by (clairvoyantly) stopping the algorithm at the best SNR. We also illustrate the proposed criteria on non-blind ID, reporting results that are competitive with state-of-the-art criteria (such as Monte
Carlo-based GSURE and projected SURE), which, however, are not applicable for blind ID.
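The underlying intuition can be sketched with a crude 1D whiteness score: a well-deblurred residual has a nearly flat autocorrelation, while structured artifacts introduce correlation at nonzero lags. This is a toy analogue of the spectral-whiteness criteria, with the filter length and sample size chosen arbitrarily:

```python
import numpy as np

def whiteness_score(residual):
    """Mean squared normalized autocorrelation at nonzero lags
    (lower means whiter)."""
    r = residual - residual.mean()
    ac = np.correlate(r, r, mode="full")
    mid = len(r) - 1
    ac = ac / ac[mid]                    # normalize by the lag-0 value
    return float(np.mean(ac[mid + 1:] ** 2))

rng = np.random.default_rng(2)
white = rng.standard_normal(4096)
# Low-pass filtering introduces lag correlation, so the smoothed
# residual should score clearly worse (larger).
smooth = np.convolve(white, np.ones(9) / 9.0, mode="same")
assert whiteness_score(white) < whiteness_score(smooth)
```

Selecting the regularization parameter (or stopping iteration) that minimizes such a score is the essence of the criteria described above.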
ETPL
DIP-233 Image Processing Using Smooth Ordering of its Patches
Abstract: We propose an image processing scheme based on reordering of its patches. For a given
corrupted image, we extract all patches with overlaps, refer to these as coordinates in high-dimensional
space, and order them such that they are chained in the “shortest possible path,” essentially solving the traveling salesman problem. The obtained ordering applied to the corrupted image implies a permutation
of the image pixels to what should be a regular signal. This enables us to obtain good recovery of the
clean image by applying relatively simple one-dimensional smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach to image
denoising and inpainting, and show promising results in both cases.
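Since the exact shortest-path ordering is a traveling-salesman problem, practical orderings are approximate. A hedged sketch using a greedy nearest-neighbor heuristic on patch vectors (a cheap stand-in, not the paper's ordering procedure):

```python
import numpy as np

def greedy_order(patches, start=0):
    """Greedy nearest-neighbor ordering of patch vectors: repeatedly
    append the unvisited patch closest to the last one."""
    n = len(patches)
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        d = np.linalg.norm(patches - patches[order[-1]], axis=1)
        d[visited] = np.inf
        nxt = int(d.argmin())
        order.append(nxt)
        visited[nxt] = True
    return order

# 1D "patches" scattered on a line: greedy ordering from the left end
# recovers the sorted, smoothly varying sequence.
patches = np.array([[0.0], [3.0], [1.0], [2.0]])
order = greedy_order(patches)
assert order == [0, 2, 3, 1]
assert [patches[i, 0] for i in order] == [0.0, 1.0, 2.0, 3.0]
```

Once the pixels are permuted along such an ordering, simple 1D filtering or interpolation on the reordered signal performs the actual denoising or inpainting.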
ETPL
DIP-234
Recursive Histogram Modification: Establishing Equivalency Between Reversible Data
Hiding and Lossless Data Compression
Abstract: State-of-the-art schemes for reversible data hiding (RDH) usually consist of two steps: first
construct a host sequence with a sharp histogram via prediction errors, and then embed messages by modifying the histogram with methods, such as difference expansion and histogram shift. In this paper,
we focus on the second stage, and propose a histogram modification method for RDH, which embeds the
message by recursively utilizing the decompression and compression processes of an entropy coder. We prove that, for independent identically distributed (i.i.d.) gray-scale host signals, the proposed method
asymptotically approaches the rate-distortion bound of RDH as long as perfect compression can be
realized, i.e., the entropy coder can approach entropy. Therefore, this method establishes the equivalency between reversible data hiding and lossless data compression. Experiments show that this coding method
can be used to improve the performance of previous RDH schemes and the improvements are more
significant for larger images.
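The first-stage histogram shift that such schemes build on can be sketched with a single peak/zero bin pair: shift the bins between the peak and an empty bin to free the bin next to the peak, then embed one bit per peak-valued pixel. This is a minimal classic illustration, not the recursive entropy-coder construction proposed above:

```python
import numpy as np

def hs_embed(pixels, bits):
    """One peak/zero-pair histogram-shift embedding (values 0..255)."""
    h = np.bincount(pixels, minlength=256)
    peak = int(h.argmax())
    zero = peak + 1 + int(h[peak + 1:].argmin())
    assert h[zero] == 0, "needs an empty bin above the peak"
    out = pixels.copy()
    out[(out > peak) & (out < zero)] += 1        # free the bin peak+1
    carriers = np.flatnonzero(pixels == peak)[:len(bits)]
    out[carriers] += np.asarray(bits)            # one bit per peak pixel
    return out, peak, zero

def hs_extract(marked, peak, zero, nbits):
    carriers = np.flatnonzero((marked == peak) | (marked == peak + 1))[:nbits]
    bits = (marked[carriers] == peak + 1).astype(int)
    rec = marked.copy()
    rec[carriers] = peak                         # undo the embedding
    rec[(rec > peak) & (rec <= zero)] -= 1       # undo the shift
    return bits, rec

pixels = np.array([10, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12])
bits = [1, 0, 1]
marked, peak, zero = hs_embed(pixels, bits)
out_bits, recovered = hs_extract(marked, peak, zero, len(bits))
assert list(out_bits) == bits
assert np.array_equal(recovered, pixels)   # exact reversibility
```

The capacity of this simple scheme is the peak-bin count; the recursive construction above improves on it by re-embedding via decompression/compression of an entropy coder.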
ETPL
DIP-235 Optical Flow Estimation for Flame Detection in Videos
Abstract: Computational vision-based flame detection has drawn significant attention in the past decade
with camera surveillance systems becoming ubiquitous. Whereas many discriminating features, such as
color, shape, texture, etc., have been employed in the literature, this paper proposes a set of motion features based on motion estimators. The key idea consists of exploiting the difference between the
turbulent, fast fire motion and the structured, rigid motion of other objects. Since classical optical flow
methods do not model the characteristics of fire motion (e.g., non-smoothness of motion, non-constancy
of intensity), two optical flow methods are specifically designed for the fire detection task: optimal mass transport models fire with dynamic texture, while a data-driven optical flow scheme models saturated
flames. Then, characteristic features related to the flow magnitudes and directions are computed from the
flow fields to discriminate between fire and non-fire motion. The proposed features are tested on a large video database to demonstrate their practical usefulness. Moreover, a novel evaluation method is
proposed by fire simulations that allow for a controlled environment to analyze parameter influences,
such as flame saturation, spatial resolution, frame rate, and random noise.
ETPL
DIP-236 Image Sharpness Assessment Based on Local Phase Coherence
Abstract: Sharpness is an important determinant in visual assessment of image quality. The human visual
system is able to effortlessly detect blur and evaluate sharpness of visual images, but the underlying
mechanism is not fully understood. Existing blur/sharpness evaluation algorithms are mostly based on edge width, local gradient, or energy reduction of global/local high frequency content. Here we
understand the subject from a different perspective, where sharpness is identified as strong local phase
coherence (LPC) near distinctive image features evaluated in the complex wavelet transform domain. Previous LPC computation was restricted to complex coefficients spread over three consecutive
dyadic scales in the scale-space. Here we propose a flexible framework that allows for LPC computation
in arbitrary fractional scales. We then develop a new sharpness assessment algorithm without referencing
the original image. We use four subject-rated publicly available image databases to test the proposed algorithm, which demonstrates competitive performance when compared with state-of-the-art algorithms.
ETPL
DIP-237 Library-Based Illumination Synthesis for Critical CMOS Patterning
Abstract: In optical microlithography, the illumination source for critical complementary metal-oxide-semiconductor layers needs to be determined in the early stage of a technology node with very limited
design information, leading to simple binary shapes. Recently, the availability of freeform sources
permits us to increase pattern fidelity and relax mask complexities with minimal insertion risks to the
current manufacturing flow. However, source optimization across many patterns is often treated as a
design-of-experiments problem, which may not fully exploit the benefits of a freeform source. In this
paper, a rigorous source-optimization algorithm is presented via linear superposition of optimal sources for pre-selected patterns. We show that analytical solutions are made possible by using Hopkins
formulation and quadratic programming. The algorithm allows synthesized illumination to be linked with
assorted pattern libraries, which has a direct impact on design rule studies for early planning and design automation for full wafer optimization.
ETPL
DIP-238 A Variational Approach for Pan-Sharpening
Abstract: Pan-sharpening is a process of acquiring a high resolution multispectral (MS) image by
combining a low resolution MS image with a corresponding high resolution panchromatic (PAN) image. In this paper, we propose a new variational pan-sharpening method based on three basic assumptions: 1)
the gradient of PAN image could be a linear combination of those of the pan-sharpened image bands; 2)
the upsampled low resolution MS image could be a degraded form of the pan-sharpened image; and 3) the gradient in the spectrum direction of pan-sharpened image should be approximated to those of the
upsampled low resolution MS image. An energy functional, whose minimizer is related to the best pan-
sharpened result, is built based on these assumptions. We discuss the existence of minimizer of our
energy and describe the numerical procedure based on the split Bregman algorithm. To verify the effectiveness of our method, we qualitatively and quantitatively compare it with some state-of-the-art
schemes using QuickBird and IKONOS data. Particularly, we classify the existing quantitative measures
into four categories and choose two representatives in each category for more reasonable quantitative evaluation. The results demonstrate the effectiveness and stability of our method in terms of the related
evaluation benchmarks. In addition, a comparison of computational efficiency with other variational methods shows that our method is competitive.
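To make the three assumptions concrete, the sketch below evaluates a three-term energy for a candidate pan-sharpened image. The quadratic form of each term and the weights are illustrative assumptions, not the paper's exact functional:

```python
import numpy as np

def grad(img):
    """Forward-difference gradient of a 2-D array (last row/col padded)."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def pan_sharpen_energy(X, pan, ms_up, alpha, lam1=1.0, lam2=1.0, lam3=1.0):
    """Energy of a candidate pan-sharpened image X (B, H, W).

    pan   : (H, W) panchromatic image
    ms_up : (B, H, W) upsampled low-resolution MS image
    alpha : (B,) linear-combination weights (assumption 1)
    """
    # 1) gradient of PAN ~ linear combination of band gradients
    gx = sum(a * grad(Xb)[0] for a, Xb in zip(alpha, X))
    gy = sum(a * grad(Xb)[1] for a, Xb in zip(alpha, X))
    px, py = grad(pan)
    t1 = np.sum((gx - px) ** 2 + (gy - py) ** 2)
    # 2) upsampled MS ~ degraded form of X (degradation taken as identity here)
    t2 = np.sum((X - ms_up) ** 2)
    # 3) spectral-direction gradient of X ~ that of the upsampled MS
    t3 = np.sum((np.diff(X, axis=0) - np.diff(ms_up, axis=0)) ** 2)
    return lam1 * t1 + lam2 * t2 + lam3 * t3

rng = np.random.default_rng(0)
ms = rng.random((3, 8, 8))
alpha = np.array([0.3, 0.3, 0.4])
pan = np.tensordot(alpha, ms, axes=1)   # PAN consistent with assumption 1
```

When the candidate equals the MS image and the PAN image is exactly the weighted band combination, all three terms vanish, which is the behavior a minimizer-based formulation relies on.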
ETPL
DIP-239
Reducing the Complexity of the N-FINDR Algorithm for Hyperspectral Image
Analysis
Abstract: The N-FINDR algorithm for unmixing hyperspectral data is both popular and successful. However, opportunities for improving the algorithm exist, particularly to reduce its computational
expense. Two approaches to achieve this are examined. First, the redundancy inherent in the determinant
calculations at the heart of N-FINDR is reduced using an LDU decomposition to form two new algorithms, one based on the original N-FINDR algorithm and one based on the closely related Sequential
N-FINDR algorithm. The second approach lowers complexity by reducing the repetition of the volume
calculations by removing pixels unlikely to represent pure materials. This is accomplished at no
additional cost through the reuse of the volume calculations inherent in the Sequential N-FINDR algorithm. Various thresholding methods for excluding pixels are considered. The impact of these
modifications on complexity and accuracy is examined on simulated and real data, showing that the
LDU-based approaches save considerable complexity, while pixel reduction methods, with appropriate threshold selection, can produce a favorable complexity-accuracy trade-off.
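The determinant calculations at the heart of N-FINDR score candidate endmember sets by simplex volume. A minimal sketch of that volume computation (the paper's LDU-based reuse of partial factorizations is not shown):

```python
import math
import numpy as np

def simplex_volume(E):
    """Volume of the simplex spanned by p endmembers in (p-1)-dim space.

    E : (p, p-1) array, one endmember per row (e.g. after reducing the
    hyperspectral data to p-1 dimensions). N-FINDR's determinant form:
    augment with a row of ones and take |det| / (p-1)!.
    """
    p = E.shape[0]
    A = np.vstack([np.ones((1, p)), E.T])   # (p, p) augmented matrix
    return abs(np.linalg.det(A)) / math.factorial(p - 1)

# Unit right triangle in 2-D: area 0.5
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

N-FINDR repeatedly swaps one candidate pixel into E and keeps the swap if the volume grows; the redundancy the abstract targets comes from recomputing this determinant from scratch at every swap.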
ETPL
DIP-240 3-D Curvilinear Structure Detection Filter Via Structure-Ball Analysis
Abstract: Curvilinear structure detection filters are crucial building blocks in many medical image
processing applications, where they are used to detect important structures, such as blood vessels, airways, and other similar fibrous tissues. Unfortunately, most of these filters are plagued by an implicit
single structure direction assumption, which results in a loss of signal around bifurcations. This
peculiarity limits the performance of all subsequent processes, such as understanding angiography
acquisitions, computing an accurate segmentation or tractography, or automatically classifying image
voxels. This paper presents a new 3-D curvilinear structure detection filter based on the analysis of the
structure ball, a geometric construction representing second order differences sampled in many directions. The structure ball is defined formally, and its computation on a discrete image is discussed. A contrast
invariant diffusion index easing voxel analysis and visualization is also introduced, and different structure
ball shape descriptors are proposed. A new curvilinear structure detection filter is defined based on the shape descriptors that best characterize curvilinear structures. The new filter produces a vesselness
measure that is robust to the presence of X- and Y-junctions along the structure by going beyond the
single direction assumption. At the same time, it stays conceptually simple and deterministic, and allows
for an intuitive representation of the structure's principal directions. Sample results are provided for synthetic images and for two medical imaging modalities.
ETPL
DIP-241 Image Fusion With Guided Filtering
Abstract: A fast and effective image fusion method is proposed for creating a highly informative fused image through merging multiple images. The proposed method is based on a two-scale decomposition of
an image into a base layer containing large scale variations in intensity, and a detail layer capturing small
scale details. A novel guided filtering-based weighted average technique is proposed to make full use of
spatial consistency for fusion of the base and detail layers. Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus,
multimodal, and multiexposure images.
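The two-scale decomposition and weighted recombination can be sketched as follows. A plain box filter stands in for the paper's guided filter, and the absolute-detail weighting is an illustrative stand-in for its spatially consistent weight maps:

```python
import numpy as np

def box_filter(img, r):
    """Mean filter over a (2r+1)^2 window via 2-D cumulative sums."""
    pad = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    k = 2 * r + 1
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def fuse_two_scale(imgs, r=2):
    """Two-scale fusion sketch: base = blur, detail = img - base.

    Details are combined by absolute-value-weighted average, bases by a
    plain average. (A box filter replaces the guided filter of the paper.)
    """
    bases = [box_filter(i, r) for i in imgs]
    details = [i - b for i, b in zip(imgs, bases)]
    w = np.stack([np.abs(d) + 1e-12 for d in details])
    w /= w.sum(axis=0)
    fused_detail = sum(wi * d for wi, d in zip(w, details))
    fused_base = sum(bases) / len(bases)
    return fused_base + fused_detail
```

Fusing two identical images is a useful sanity check: the weights become uniform and the input is returned unchanged.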
ETPL
DIP-242 Global Propagation of Affine Invariant Features for Robust Matching
Abstract: Local invariant features have been successfully used in image matching to cope with viewpoint
change, partial occlusion, and clutters. However, when these factors become too strong, there will be a lot
of mismatches due to the limited repeatability and discriminative power of features. In this paper, we
present an efficient approach to remove the false matches and propagate the correct ones for the affine invariant features which represent the state-of-the-art local invariance. First, a pair-wise affine
consistency measure is proposed to evaluate the consensus of the matches of affine invariant regions. The
measure takes into account both the keypoint location and the region shape, size, and orientation. Based on this measure, a geometric filter is then presented which can efficiently remove the outliers from the
initial matches, and is robust to severe clutters and non-rigid deformation. To increase the correct
matches, we propose a global match refinement and propagation method that simultaneously finds an optimal group of local affine transforms to relate the features in two images. The global method is
capable of producing a quasi-dense set of matches even for the weakly textured surfaces that suffer strong
rigid transformation or non-rigid deformation. The strong capability of the proposed method in dealing
with significant viewpoint change, non-rigid deformation, and low-texture objects is demonstrated in experiments of image matching, object recognition, and image based rendering.
ETPL
DIP-243
Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile
Search
Abstract: As the basis of large-scale partial duplicate visual search on mobile devices, image local descriptor is expected to be discriminative, efficient, and compact. Our study shows that the popularly
used histogram-based descriptors, such as scale invariant feature transform (SIFT) are not optimal for this
task. This is mainly because histogram representation is relatively expensive to compute on mobile
platforms and loses significant spatial clues, which are important for improving discriminative power and matching near-duplicate image patches. To address these issues, we propose to extract a novel binary
local descriptor named Edge-SIFT from the binary edge maps of scale- and orientation-normalized image
patches. By preserving both locations and orientations of edges and compressing the sparse binary edge
maps with a boosting strategy, the final Edge-SIFT shows strong discriminative power with compact
representation. Furthermore, we propose a fast similarity measurement and an indexing framework with
flexible online verification. Hence, the Edge-SIFT allows an accurate and efficient image search and is ideal for computation sensitive scenarios such as a mobile image search. Experiments on a large-scale
dataset manifest that the Edge-SIFT shows superior retrieval accuracy to Oriented BRIEF (ORB) and is
superior to SIFT in the aspects of retrieval precision, efficiency, compactness, and transmission cost.
ETPL
DIP-244 Parametric Generalized Linear System Based on the Notion of the T-Norm
Abstract: By using the triangular norm, we propose two methods for the construction of generalized
linear systems, and show new insights into the relationship between typical systems. Using the Hamacher
and Frank t-norm, we propose a parametric log-ratio model, which is a generalization of the log-ratio model and is more flexible in algorithmic development. We develop a generalized linear contrast
enhancement algorithm based on the proposed parametric log-ratio model. We show that the performance
of the proposed algorithm is effective and robust for different types of images.
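For reference, the two parametric t-norms named above can be written directly; the parameter choices below are standard textbook forms, with the product t-norm recovered in the limits noted in the comments:

```python
import numpy as np

def hamacher(a, b, gamma=1.0):
    """Hamacher t-norm: T(a,b) = ab / (gamma + (1-gamma)(a + b - ab)).

    gamma = 1 reduces to the product t-norm."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    denom = gamma + (1 - gamma) * (a + b - a * b)
    safe = np.where(denom == 0, 1.0, denom)     # denom = 0 only at a=b=0, gamma=0
    return np.where(denom == 0, 0.0, a * b / safe)

def frank(a, b, s=2.0):
    """Frank t-norm: T(a,b) = log_s(1 + (s^a - 1)(s^b - 1) / (s - 1)).

    Defined for s > 0, s != 1; s -> 1 recovers the product t-norm."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.log1p((s ** a - 1) * (s ** b - 1) / (s - 1)) / np.log(s)
```

Both satisfy the t-norm boundary condition T(a, 1) = a, which is what lets a t-norm-based generalized linear system fall back to the classical log-ratio model for particular parameter values.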
ETPL
DIP-245 A Linear Support Higher-Order Tensor Machine for Classification
Abstract: There has been growing interest in developing more effective learning machines for tensor
classification. At present, most of the existing learning machines, such as support tensor machine (STM),
involve nonconvex optimization problems and need to resort to iterative techniques, which are time-consuming and may suffer from local minima. In order to overcome these two shortcomings, in this
paper, we present a novel linear support higher-order tensor machine (SHTM) which integrates the merits
of linear C-support vector machine (C-SVM) and tensor rank-one decomposition. Theoretically, SHTM is an extension of the linear C-SVM to tensor patterns. When the input patterns are vectors, SHTM
degenerates into the standard C-SVM. A set of experiments is conducted on nine second-order face
recognition datasets and three third-order gait recognition datasets to illustrate the performance of the
proposed SHTM. The statistic test shows that compared with STM and C-SVM with the RBF kernel, SHTM provides significant performance gain in terms of test accuracy and training speed, especially in
the case of higher-order tensors.
ETPL
DIP-246
Novel True-Motion Estimation Algorithm and Its Application to Motion-Compensated
Temporal Frame Interpolation
Abstract: In this paper, a new low-complexity true-motion estimation (TME) algorithm is proposed for
video processing applications, such as motion-compensated temporal frame interpolation (MCTFI) or
motion-compensated frame rate up-conversion (MCFRUC). Regular motion estimation, which is often used in video coding, aims to find the motion vectors (MVs) to reduce the temporal redundancy, whereas
TME aims to track the projected object motion as closely as possible. TME is obtained by imposing
implicit and/or explicit smoothness constraints on the block-matching algorithm. To produce better
quality-interpolated frames, the dense motion field at interpolation time is obtained for both forward and backward MVs; then, bidirectional motion compensation using forward and backward MVs is applied by
mixing both elegantly. Finally, the performance of the proposed algorithm for MCTFI is demonstrated
against recently proposed methods and smoothness constraint optical flow employed by a professional video production suite. Experimental results show that the quality of the interpolated frames using the
proposed method is better when compared with the MCFRUC techniques.
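The bidirectional compensation step can be sketched as below. Integer motion vectors and a simple (1-t)/t blend are our illustrative simplifications of the "elegant mixing" the abstract refers to:

```python
import numpy as np

def mc_interpolate(prev, nxt, fwd, bwd, t=0.5):
    """Bidirectional motion-compensated interpolation sketch.

    prev, nxt : (H, W) frames at times 0 and 1
    fwd, bwd  : (H, W, 2) motion vectors defined at interpolation time t,
                pointing into nxt and prev respectively (integer fetch).
    """
    H, W = prev.shape
    ys, xs = np.mgrid[0:H, 0:W]

    def fetch(img, mv, scale):
        # Follow the scaled motion vector, clamping at the frame border.
        y = np.clip(ys + np.round(scale * mv[..., 1]).astype(int), 0, H - 1)
        x = np.clip(xs + np.round(scale * mv[..., 0]).astype(int), 0, W - 1)
        return img[y, x]

    from_prev = fetch(prev, bwd, t)        # compensate from the past frame
    from_next = fetch(nxt, fwd, 1 - t)     # compensate from the future frame
    return (1 - t) * from_prev + t * from_next

zero_mv = np.zeros((4, 4, 2))
mid = mc_interpolate(np.zeros((4, 4)), np.full((4, 4), 2.0),
                     zero_mv, zero_mv, t=0.5)
```

With zero motion the interpolated frame is simply the temporal average of its neighbors, which is the degenerate case any MCTFI scheme should reproduce.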
ETPL
DIP-247 Motion Analysis Using 3D High-Resolution Frequency Analysis
Abstract: The spatiotemporal spectra of a video that contains a moving object form a plane in the 3D frequency domain. This plane, which is described as the theoretical motion plane, reflects the velocity of
the moving objects, which is calculated from the slope. However, if the resolution of the frequency
analysis method is not high enough to obtain actual spectra from the object signal, the spatiotemporal
spectra disperse away from the theoretical motion plane. In this paper, we propose a high-resolution
frequency analysis method, described as 3D nonharmonic analysis (NHA), which is only weakly influenced by the analysis window. In addition, we estimate the motion vectors of objects in a video using
the plane-clustering method, in conjunction with the least-squares method, for 3D NHA spatiotemporal
spectra. We experimentally verify the accuracy of the 3D NHA and its usefulness for a sequence containing complex motions, such as cross-over motion, through comparison with 3D fast Fourier
transform. The experimental results show that increasing the frequency resolution contributes to high-
accuracy estimation of a motion plane.
ETPL
DIP-248 Segment Adaptive Gradient Angle Interpolation
Abstract: We introduce a new edge-directed interpolator based on locally defined, straight line
approximations of image isophotes. Spatial derivatives of image intensity are used to describe the principal behavior of pixel-intersecting isophotes in terms of their slopes. The slopes are determined by
inverting a tridiagonal matrix and are forced to vary linearly from pixel-to-pixel within segments. Image
resizing is performed by interpolating along the approximated isophotes. The proposed method can
accommodate arbitrary scaling factors, provides state-of-the-art results in terms of PSNR as well as other quantitative visual quality metrics, and has the advantage of reduced computational complexity that is
directly proportional to the number of pixels.
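Since the slopes are obtained by inverting a tridiagonal matrix, the standard O(n) Thomas algorithm applies; the sketch below shows that routine (the paper's actual system coefficients are not reproduced here):

```python
import numpy as np

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for Ax = d with A tridiagonal.

    a : sub-diagonal (length n, a[0] unused)
    b : main diagonal (length n)
    c : super-diagonal (length n, c[-1] unused)
    Runs in O(n), versus O(n^3) for a general dense solve.
    """
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                     # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Classic (-1, 2, -1) system as a check
n = 5
a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
x_true = np.arange(1.0, n + 1.0)
d = A @ x_true
```

The O(n) cost of this solve is consistent with the abstract's claim of complexity directly proportional to the number of pixels.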
ETPL
DIP-249
Fast Computation of Rotation-Invariant Image Features by an Approximate Radial
Gradient Transform
Abstract: We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT
(ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The
ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that,
using the ARGT, RIFF extracts features 16× faster than SURF while achieving a similar performance for image matching and retrieval.
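The exact RGT (before the paper's fast approximation) rotates each pixel's gradient into a local frame aligned with the direction from the patch center, which is what makes the resulting components rotation-invariant. A sketch:

```python
import numpy as np

def radial_gradient_transform(gx, gy):
    """Rotate per-pixel gradients into radial/tangential components.

    gx, gy : (H, W) gradients of a patch. For each pixel, r points from the
    patch center to the pixel and t is its 90-degree rotation; (gr, gt) is
    then invariant to rotating the whole patch. (The ARGT of the paper
    quantizes r to a few directions; this is the exact transform.)
    """
    H, W = gx.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    ys -= (H - 1) / 2.0
    xs -= (W - 1) / 2.0
    norm = np.hypot(xs, ys)
    norm[norm == 0] = 1.0                 # leave the center pixel unchanged
    rx, ry = xs / norm, ys / norm         # radial unit vector
    gr = gx * rx + gy * ry                # radial component
    gt = -gx * ry + gy * rx               # tangential component
    return gr, gt

# A purely radial gradient field: tangential component should vanish
ys0, xs0 = np.mgrid[0:5, 0:5].astype(float)
gr0, gt0 = radial_gradient_transform(xs0 - 2.0, ys0 - 2.0)
```

On a purely radial field the tangential channel is identically zero, a quick way to verify the frame construction.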
ETPL
DIP-250 Image Completion by Diffusion Maps and Spectral Relaxation
Abstract: We present a framework for image inpainting that utilizes the diffusion framework approach to
spectral dimensionality reduction. We show that on formulating the inpainting problem in the embedding domain, the domain to be inpainted is smoother in general, particularly for the textured images. Thus, the
textured images can be inpainted through simple exemplar-based and variational methods. We discuss the
properties of the induced smoothness and relate it to the underlying assumptions used in contemporary
inpainting schemes. As the diffusion embedding is nonlinear and noninvertible, we propose a novel computational approach to approximate the inverse mapping from the inpainted embedding space to the
image domain. We formulate the mapping as a discrete optimization problem, solved through spectral
relaxation. The effectiveness of the presented method is exemplified by inpainting real images, where it is shown to compare favorably with contemporary state-of-the-art schemes.
ETPL
DIP-251
A Continuous Method for Reducing Interpolation Artifacts in Mutual Information-
Based Rigid Image Registration
Abstract: We propose an approach for computing mutual information in rigid multimodality image
registration. Images to be registered are modeled as functions defined on a continuous image domain. Analytic forms of the probability density functions for the images and the joint probability density
function are first defined in 1D. We describe how the entropies of the images, the joint entropy, and
mutual information can be computed accurately by a numerical method. We then extend the method to
2D and 3D. The mutual information function generated is smooth and does not seem to have the typical
interpolation artifacts that are commonly observed in other standard models. The relationship between the proposed method and the partial volume (PV) model is described. In addition, we give a theoretical
analysis to explain the nonsmoothness of the mutual information function computed by the PV model.
Numerical experiments in 2D and 3D are presented to illustrate the smoothness of the mutual information function, which leads to robust and accurate numerical convergence results for solving the image
registration problem.
ETPL
DIP-252 Image Inpainting on the Basis of Spectral Structure From 2-D Nonharmonic Analysis
Abstract: The restoration of images by digital inpainting is an active field of research and such algorithms are, in fact, now widely used. Conventional methods generally apply textures that are most similar to the
areas around the missing region or use a large image database. However, this produces discontinuous
textures and thus unsatisfactory results. Here, we propose a new technique to overcome this limitation by using signal prediction based on the nonharmonic analysis (NHA) technique proposed by the authors.
NHA can be used to extract accurate spectra, irrespective of the window function, and its frequency
resolution is finer than that of the discrete Fourier transform. The proposed method sequentially generates
new textures on the basis of the spectrum obtained by NHA. Missing regions from the spectrum are repaired using an improved cost function for 2D NHA. The proposed method is evaluated using the
standard images Lena, Barbara, Airplane, Pepper, and Mandrill. The results show an improvement in
MSE of about 10-20 compared with the exemplar-based method, along with good subjective quality.
ETPL
DIP-253 Linear Discriminant Analysis Based on L1-Norm Maximization
Abstract: Linear discriminant analysis (LDA) is a well-known dimensionality reduction technique, which
is widely used for many purposes. However, conventional LDA is sensitive to outliers because its
objective function is based on the distance criterion using L2-norm. This paper proposes a simple but effective robust LDA version based on L1-norm maximization, which learns a set of local optimal
projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion and the L1-
norm-based within-class dispersion. The proposed method is theoretically proved to be feasible and robust to outliers while overcoming the singular problem of the within-class scatter matrix for
conventional LDA. Experiments on artificial datasets, standard classification datasets and three popular
image databases demonstrate the efficacy of the proposed method.
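The maximization target can be transcribed directly; the sketch below evaluates the L1 dispersion ratio for a single projection vector (the paper's iterative solver for finding that vector is not shown):

```python
import numpy as np

def l1_lda_objective(X, y, w):
    """L1-norm LDA criterion for one projection direction w.

    Ratio of L1 between-class dispersion to L1 within-class dispersion of
    the 1-D projections X @ w.
    """
    w = w / np.linalg.norm(w)
    z = X @ w
    mu = z.mean()
    between, within = 0.0, 0.0
    for c in np.unique(y):
        zc = z[y == c]
        between += len(zc) * abs(zc.mean() - mu)   # L1 between-class term
        within += np.abs(zc - zc.mean()).sum()     # L1 within-class term
    return between / within

# Two classes separated along the first axis: projecting on [1, 0] should
# score far higher than projecting on [0, 1].
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(0.0, 0.1, (20, 2)) + [10.0, 0.0]])
y = np.array([0] * 20 + [1] * 20)
```

Replacing the squared distances of classical LDA with absolute values is what blunts the influence of outliers, since a single distant sample contributes linearly rather than quadratically.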
ETPL
DIP-254 Visual Tracking With Spatio-Temporal Dempster–Shafer Information Fusion
Abstract: A key problem in visual tracking is how to effectively combine spatio-temporal visual
information from throughout a video to accurately estimate the state of an object. We address this
problem by incorporating Dempster-Shafer (DS) information fusion into the tracking approach. To implement this fusion task, the entire image sequence is partitioned into spatially and temporally adjacent
subsequences. A support vector machine (SVM) classifier is trained for object/nonobject classification on
each of these subsequences, the outputs of which act as separate data sources. To combine the discriminative information from these classifiers, we further present a spatio-temporal weighted DS
(STWDS) scheme. In addition, temporally adjacent sources are likely to share discriminative information
on object/nonobject classification. To use such information, an adaptive SVM learning scheme is
designed to transfer discriminative information across sources. Finally, the corresponding DS belief function of the STWDS scheme is embedded into a Bayesian tracking model. Experimental results on
challenging videos demonstrate the effectiveness and robustness of the proposed tracking approach.
ETPL
DIP-255 Dimensionality Reduction for Registration of High-Dimensional Data Sets
Abstract: Registration of two high-dimensional data sets often involves dimensionality reduction to yield a single-band image from each data set followed by pairwise image registration. We develop a new
application-specific algorithm for dimensionality reduction of high-dimensional data sets such that the
weighted harmonic mean of Cramér-Rao lower bounds for the estimation of the transformation parameters for registration is minimized. The performance of the proposed dimensionality reduction algorithm is evaluated using three remote sensing data sets. The experimental results using a mutual information-based pairwise registration technique demonstrate that our proposed dimensionality reduction algorithm combines the original data sets to obtain the image pair with more texture, resulting in improved image registration.
ETPL
DIP-256
Multiple-Kernel, Multiple-Instance Similarity Features for Efficient Visual Object
Detection
Abstract: We propose to use the similarity between the sample instance and a number of exemplars as features in visual object detection. Concepts from multiple-kernel learning and multiple-instance learning
are incorporated into our scheme at the feature level by properly calculating the similarity. The similarity
between two instances can be measured by various metrics and by using the information from various
sources, which mimics the use of multiple kernels for kernel machines. Pooling of the similarity values from multiple instances of an object part is introduced to cope with alignment inaccuracy between object
instances. To deal with the high dimensionality of the multiple-kernel multiple-instance similarity feature,
we propose a forward feature-selection technique and a coarse-to-fine learning scheme to find a set of good exemplars, hence we can produce an efficient classifier while maintaining a good performance.
Both the feature and the learning technique have interesting properties. We demonstrate the performance
of our method using both synthetic data and real-world visual object detection data sets.
ETPL
DIP-257 Asymmetric Correlation: A Noise Robust Similarity Measure for Template Matching
Abstract: We present an efficient and noise robust template matching method based on asymmetric
correlation (ASC). The ASC similarity function is invariant to affine illumination changes and robust to
extreme noise. It correlates the given non-normalized template with a normalized version of each image window in the frequency domain. We show that this asymmetric normalization is more robust to noise
than other cross correlation variants, such as the correlation coefficient. Direct computation of ASC is
very slow, as a DFT needs to be calculated for each image window independently. To make the template
matching efficient, we develop a much faster algorithm, which carries out a prediction step in linear time and then computes DFTs for only a few promising candidate windows. We extend the proposed template
matching scheme to deal with partial occlusion and spatially varying light change. Experimental results
demonstrate the robustness of the proposed ASC similarity measure compared to state-of-the-art template matching methods.
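The asymmetric normalization is easy to state per window; the sketch below gives that definition plus a brute-force search (the paper's FFT-based fast candidate selection is not reproduced):

```python
import numpy as np

def asc_score(template, window):
    """Asymmetric correlation of one window against a template.

    Only the window is normalized (zero mean, unit norm); the template is
    used as-is, hence 'asymmetric'. Normalizing the window makes the score
    invariant to affine illumination changes of the window.
    """
    w = window - window.mean()
    n = np.linalg.norm(w)
    if n == 0:
        return 0.0
    return float(np.sum(template * (w / n)))

def match(template, image):
    """Brute-force ASC matching: top-left location of the best window."""
    th, tw = template.shape
    H, W = image.shape
    scores = np.full((H - th + 1, W - tw + 1), -np.inf)
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            scores[i, j] = asc_score(template, image[i:i + th, j:j + tw])
    return np.unravel_index(np.argmax(scores), scores.shape)

# Plant a checkerboard template with an affine illumination change (3x + 5)
template = (np.indices((6, 6)).sum(axis=0) % 2) * 2.0 - 1.0
image = np.random.default_rng(0).normal(0.0, 0.1, size=(16, 16))
image[4:10, 2:8] = 3.0 * template + 5.0
```

Because the window is re-normalized, the planted region is found even though its intensities were scaled and shifted, which is the affine-illumination invariance the abstract claims.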
ETPL
DIP-258
Deconvolving Images With Unknown Boundaries Using the Alternating Direction
Method of Multipliers
Abstract: The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for inverse problems, namely, image deconvolution and
reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by
adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable
sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system),
which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e.,
under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of
a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not
depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that
combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate its performance on non-periodic
deblurring (with and without inpainting of interior pixels) under total-variation and frame-based
regularization.
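The observation model described above composes a circular convolution with an interior mask. A sketch of that forward operator (PSF placement, border width, and sizes here are illustrative; the ADMM splitting itself is not shown):

```python
import numpy as np

def observe(x, psf, border):
    """Forward operator A = M H of the unknown-boundary model.

    H: circular convolution with the PSF, computed in the Fourier domain
    (this diagonal structure is what makes the ADMM sub-problem cheap).
    M: mask keeping only interior pixels, i.e. those whose value does not
    depend on the unknown boundary.
    """
    Hh, Ww = x.shape
    k = np.zeros_like(x)
    kh, kw = psf.shape
    k[:kh, :kw] = psf
    k = np.roll(k, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center PSF at origin
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
    m = np.zeros_like(x)
    m[border:Hh - border, border:Ww - border] = 1.0
    return m * y

x = np.ones((8, 8))
y_box = observe(x, np.ones((3, 3)) / 9.0, border=2)   # blur then mask
```

A normalized blur of a constant image stays constant in the interior while the masked border is zeroed, matching the intended behavior of M H.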
ETPL
DIP-259
Integration of Gibbs Markov Random Field and Hopfield-Type Neural Networks for
Unsupervised Change Detection in Remotely Sensed Multitemporal Images
Abstract: In this paper, a spatiocontextual unsupervised change detection technique for multitemporal,
multispectral remote sensing images is proposed. The technique uses a Gibbs Markov random field
(GMRF) to model the spatial regularity between the neighboring pixels of the multitemporal difference image. The difference image is generated by change vector analysis applied to images acquired on the
same geographical area at different times. The change detection problem is solved using the maximum a
posteriori probability (MAP) estimation principle. The MAP estimator of the GMRF used to model the difference image is exponential in nature, thus a modified Hopfield type neural network (HTNN) is
exploited for estimating the MAP. In the considered Hopfield type network, a single neuron is assigned to
each pixel of the difference image and is assumed to be connected only to its neighbors. Initial values of
the neurons are set by histogram thresholding. An expectation-maximization algorithm is used to estimate the GMRF model parameters. Experiments are carried out on three-multispectral and multitemporal
remote sensing images. Results of the proposed change detection scheme are compared with those of the
manual-trial-and-error technique, automatic change detection scheme based on GMRF model and iterated conditional mode algorithm, a context sensitive change detection scheme based on HTNN, the GMRF
model, and a graph-cut algorithm. A comparison points out that the proposed method provides more
accurate change detection maps than other methods.
ETPL
DIP-260 SparCLeS: Dynamic Sparse Classifiers With Level Sets for Robust
Beard/Moustache Detection and Segmentation
Abstract: Robust facial hair detection and segmentation is a highly valued soft biometric attribute for carrying out forensic facial analysis. In this paper, we propose a novel and fully automatic system, called
SparCLeS, for beard/moustache detection and segmentation in challenging facial images. SparCLeS uses
the multiscale self-quotient (MSQ) algorithm to preprocess facial images and deal with illumination
variation. Histogram of oriented gradients (HOG) features are extracted from the preprocessed images and a dynamic sparse classifier is built using these features to classify a facial region as either containing
skin or facial hair. A level set based approach, which makes use of the advantages of both global and
local information, is then used to segment the regions of a face containing facial hair. Experimental results demonstrate the effectiveness of our proposed system in detecting and segmenting facial hair
regions in images drawn from three databases, i.e., the NIST Multiple Biometric Grand Challenge
(MBGC) still face database, the NIST Color Facial Recognition Technology FERET database, and the Labeled Faces in the Wild (LFW) database.
ETPL
DIP-261 Cross-Domain Object Recognition Via Input-Output Kernel Analysis
Abstract: It is of great importance to investigate the domain adaptation problem of image object
recognition, because now image data is available from a variety of source domains. To understand the changes in data distributions across domains, we study both the input and output kernel spaces for cross-
domain learning situations, where most labeled training images are from a source domain and testing
images are from a different target domain. To address the feature distribution change issue in the
reproducing kernel Hilbert space induced by vector-valued functions, we propose a domain adaptive
input-output kernel learning (DA-IOKL) algorithm, which simultaneously learns both the input and
output kernels with a discriminative vector-valued decision function by reducing the data mismatch and minimizing the structural error. We also extend the proposed method to the cases of having multiple
source domains. We examine two cross-domain object recognition benchmark data sets, and the proposed
method consistently outperforms the state-of-the-art domain adaptation and multiple kernel learning methods.
ETPL
DIP-262 Regularized Feature Reconstruction for Spatio-Temporal Saliency Detection
Abstract: Multimedia applications such as image or video retrieval, copy detection, and so forth can
benefit from saliency detection, which is essentially a method to identify areas in images and videos that capture the attention of the human visual system. In this paper, we propose a new spatio-temporal
saliency detection framework on the basis of regularized feature reconstruction. Specifically, for video
saliency detection, both the temporal and spatial saliency detection are considered. For temporal saliency, we model the movement of the target patch as a reconstruction process using the patches in neighboring
frames. A Laplacian smoothing term is introduced to model the coherent motion trajectories. With
psychological findings that an abrupt stimulus can cause a rapid and involuntary deployment of attention,
our temporal model combines the reconstruction error, regularizer, and local trajectory contrast to measure the temporal saliency. For spatial saliency, a similar sparse reconstruction process is adopted to
capture the regions with high center-surround contrast. Finally, the temporal saliency and spatial saliency
are combined together to favor salient regions with high confidence for video saliency detection. We also apply the spatial saliency part of the spatio-temporal model to image saliency detection. Experimental
results on a human fixation video dataset and an image saliency detection dataset show that our method
achieves the best performance over several state-of-the-art approaches.
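The reconstruction-error idea behind the temporal saliency term above can be sketched as follows: a target patch that the neighboring frames' patches reconstruct well gets a low error (low saliency), while a poorly reconstructed patch is salient. The ridge weight `lam` and the tiny dense solver are illustrative assumptions, not the paper's exact regularized formulation.

```python
def solve(A, b):
    """Gauss-Jordan elimination for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def reconstruction_error(target, patches, lam=1e-3):
    """Ridge-regularized least-squares reconstruction of `target`
    (a flat patch vector) from a list of neighbor `patches`."""
    n = len(patches)
    # Gram matrix plus ridge term on the diagonal.
    G = [[sum(a * b for a, b in zip(patches[i], patches[j]))
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [sum(p * t for p, t in zip(patches[i], target)) for i in range(n)]
    coef = solve(G, rhs)
    recon = [sum(c * p[k] for c, p in zip(coef, patches))
             for k in range(len(target))]
    return sum((t - r) ** 2 for t, r in zip(target, recon))

neighbors = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
smooth = reconstruction_error([1.0, 2.0, 3.0], neighbors)   # predictable motion
abrupt = reconstruction_error([5.0, -1.0, 4.0], neighbors)  # abrupt stimulus
```

The smooth patch yields a near-zero error while the abrupt one does not, which is exactly the contrast the temporal saliency measure exploits.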
ETPL
DIP-263 Texture Enhanced Histogram Equalization Using TV-L1 Image Decomposition
Abstract: Histogram transformation defines a class of image processing operations that are widely applied
in the implementation of data normalization algorithms. In this paper, we present a new variational
approach for image enhancement that is constructed to alleviate the intensity saturation effects that are introduced by standard contrast enhancement (CE) methods based on histogram equalization. In this
paper, we initially apply total variation (TV) minimization with an L1 fidelity term to decompose the input
image with respect to cartoon and texture components. Contrary to previous papers that rely solely on the information encompassed in the distribution of the intensity information, in this paper, the texture
information is also employed to emphasize the contribution of the local textural features in the CE
process. This is achieved by implementing a nonlinear histogram warping CE strategy that is able to
maximize the information content in the transformed image. Our experimental study addresses the CE of a wide variety of image data and comparative evaluations are provided to illustrate that our method
produces better results than conventional CE strategies.
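For reference, the standard histogram equalization that the paper improves on can be sketched for an 8-bit grayscale image given as a flat list of intensities; the saturation effects the authors address stem from this hard CDF remapping.

```python
def equalize(pixels, levels=256):
    """Classic histogram equalization via the cumulative distribution."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, run = [], 0
    for h in hist:
        run += h
        cdf.append(run)
    cdf_min = next(c for c in cdf if c > 0)   # first nonzero CDF value
    n = len(pixels)
    scale = (levels - 1) / max(n - cdf_min, 1)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A low-contrast image concentrated in [100, 103] spreads to the full range.
flat = [100, 100, 101, 101, 102, 102, 103, 103]
eq = equalize(flat)
```

Because the remap depends only on the global intensity distribution, local texture is ignored, which is the gap the TV-based decomposition in the paper targets.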
ETPL
DIP-264 Gaussian Blurring-Invariant Comparison of Signals and Images
Abstract: We present a Riemannian framework for analyzing signals and images in a manner that is
invariant to their level of blurriness, under Gaussian blurring. Using a well known relation between
Gaussian blurring and the heat equation, we establish an action of the blurring group on image space and
define an orthogonal section of this action to represent and compare images at the same blur level. This comparison is based on geodesic distances on the section manifold which, in turn, are computed using a
path-straightening algorithm. The actual implementations use coefficients of images under a truncated
orthonormal basis and the blurring action corresponds to exponential decays of these coefficients. We
demonstrate this framework using a number of experimental results, involving 1D signals and 2D images.
As a specific application, we study the effect of blurring on the recognition performance when 2D facial
images are used for recognizing people.
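The key relation used above, that Gaussian blurring acts as an exponential decay of coefficients under a truncated orthonormal basis (the heat-equation view), can be illustrated in 1D with a DCT-style cosine basis; the exact decay-rate convention is an assumption for this sketch.

```python
import math

def cosine_coeffs(signal, n_coeffs):
    """Project a 1D signal onto the first n_coeffs DCT-II cosine basis
    functions cos(pi*k*(i+0.5)/n)."""
    n = len(signal)
    return [sum(signal[i] * math.cos(math.pi * k * (i + 0.5) / n)
                for i in range(n)) * (2.0 / n) for k in range(n_coeffs)]

def blur_coeffs(coeffs, t):
    """Heat-equation action of blurring: coefficient k decays as
    exp(-(pi*k)^2 * t), so higher frequencies vanish faster."""
    return [c * math.exp(-((math.pi * k) ** 2) * t)
            for k, c in enumerate(coeffs)]

# A pure k=3 cosine mode: one unit coefficient before blurring.
sig = [math.cos(math.pi * 3 * (i + 0.5) / 32) for i in range(32)]
c0 = cosine_coeffs(sig, 8)
c_blur = blur_coeffs(c0, 0.05)
```

The DC coefficient is untouched while the k=3 coefficient is strongly attenuated, which is why comparing images at a common blur level requires normalizing this decay, as the paper's orthogonal section does.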
ETPL
DIP-265 Fast SIFT Design for Real-Time Visual Feature Extraction
Abstract: Visual feature extraction with scale invariant feature transform (SIFT) is widely used for object
recognition. However, its real-time implementation suffers from long latency, heavy computation, and
high memory storage because of its frame level computation with iterated Gaussian blur operations. Thus, this paper proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design
with an on-the-fly feature extraction flow for real-time application needs. Compared with the original
SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses 580-K gate count with 90-nm CMOS technology, and offers 6000
feature points/frame for VGA images at 30 frames/s and ~ 2000 feature points/frame for 1920 × 1080
images at 30 frames/s at the clock rate of 100 MHz.
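The integral-image trick that lets LPSIFT replace iterated Gaussian blurs with box filters can be sketched directly: once the summed-area table is built, any rectangle sum costs four lookups regardless of its size.

```python
def integral_image(img):
    """Summed-area table with a zero row/column of padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]                     # running row sum
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] in O(1) from the integral image."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

Approximating Gaussian kernels by a few such box filters is what removes the iterated-blur bottleneck from the frame-level pipeline.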
ETPL
DIP-266 Artistic Image Analysis Using Graph-Based Learning Approaches
Abstract: We introduce a new methodology for the problem of artistic image analysis, which among other
tasks, involves the automatic identification of visual classes present in an artwork. In this paper, we
advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the
novelties of our methodology is the proposed formulation that is a principled way of combining these two
similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results
compared with the following baseline algorithms: bag of visual words, label propagation, matrix
completion, and structural learning. We also show that the proposed approach leads to more efficient
inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary
problems), where we show the inference and training running times, and quantitative comparisons with
respect to several retrieval and annotation performance measures.
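A toy version of graph-based label propagation, the family of baselines the paper builds on, can be sketched on a small similarity graph; the paper's inverted formulation and artistic-influence graph are not reproduced here.

```python
def propagate(W, labels, n_iter=50, alpha=0.9):
    """W: symmetric similarity matrix; labels: per-node class-score lists
    for labeled nodes and None for unlabeled ones. Iterates a clamped
    random-walk update and returns the argmax class per node."""
    n = len(W)
    n_cls = len(next(l for l in labels if l is not None))
    F = [list(l) if l is not None else [0.0] * n_cls for l in labels]
    Y = [list(f) for f in F]                    # clamped label evidence
    deg = [sum(row) or 1.0 for row in W]        # row degrees
    for _ in range(n_iter):
        F = [[alpha * sum(W[i][j] / deg[i] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c] for c in range(n_cls)]
             for i in range(n)]
    return [max(range(n_cls), key=lambda f=f: f[1])[0] if False else
            max(range(n_cls), key=lambda c, f=f: f[c]) for f in F]

# Two clusters joined by one weak edge; one labeled node per cluster.
W = [[0, 1, 1, 0.05, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0.05, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
labels = [[1.0, 0.0], None, None, [0.0, 1.0], None, None]
pred = propagate(W, labels)
```

Labels diffuse along strong similarity edges, so each cluster inherits the class of its single labeled member.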
ETPL
DIP-267
Self-Supervised Online Metric Learning With Low Rank Constraint for Scene
Categorization
Abstract: Conventional visual recognition systems usually train an image classifier in a batch mode with
all training data provided in advance. However, in many practical applications, only a small amount of training samples are available in the beginning and many more would come sequentially during online
recognition. Because the image data characteristics could change over time, it is important for the
classifier to adapt to the new data incrementally. In this paper, we present an online metric learning
method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled data followed by a sequential input of unseen testing samples, the similarity metric is
learned to maximize the margin of the distance among different classes of samples. By considering the
low rank constraint, our online metric learning model not only can provide competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bi-linear graph is also
defined to model the pair-wise similarity, and an unseen sample is labeled depending on the graph-based
label propagation, while the model can also self-update using the more confident new samples. With the
ability of online learning, our methodology can handle large-scale streaming video data with incremental self-updating. We apply our model to online scene categorization, and
experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate
the effectiveness and efficiency of our algorithm.
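The flavor of online metric learning described above can be sketched with a diagonal Mahalanobis metric updated from labeled pairs, so that same-class pairs end up closer than different-class pairs. The paper's low-rank bilinear model and margin formulation are much richer; this only illustrates the incremental update idea.

```python
def update(diag, x, y, same, lr=0.1):
    """One online step: shrink weights on dimensions where a same-class
    pair differs, grow them where a different-class pair differs."""
    d = [(a - b) ** 2 for a, b in zip(x, y)]
    sign = -1.0 if same else 1.0
    return [max(w + sign * lr * di, 0.0) for w, di in zip(diag, d)]

def dist(diag, x, y):
    """Diagonal Mahalanobis squared distance under the learned metric."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(diag, x, y))

M = [1.0, 1.0]
# Dimension 0 is noisy within classes; dimension 1 separates the classes.
for _ in range(5):
    M = update(M, [0.0, 0.0], [1.0, 0.1], same=True)    # same-class pair
    M = update(M, [0.0, 0.0], [0.2, 1.0], same=False)   # different-class pair
```

After a few updates the metric down-weights the noisy dimension and up-weights the discriminative one, which is the adaptive similarity measurement the abstract refers to.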
ETPL
DIP-268 Nonlocal Regularization of Inverse Problems: A Unified Variational Framework
Abstract: We introduce a unifying energy minimization framework for nonlocal regularization of inverse
problems. In contrast to the weighted sum of squared differences between image pixels used by current schemes, the proposed functional is an unweighted sum of inter-patch distances. We use robust distance
metrics that promote the averaging of similar patches, while discouraging the averaging of dissimilar
patches. We show that the first iteration of a majorize-minimize algorithm to minimize the proposed cost function is similar to current nonlocal methods. The reformulation thus provides a theoretical justification
for the heuristic approach of iterating nonlocal schemes, which re-estimate the weights from the current
image estimate. Thanks to the reformulation, we now understand that the widely reported alias amplification associated with iterative nonlocal methods is caused by convergence to a local minimum
of the nonconvex penalty. We introduce an efficient continuation strategy to overcome this problem. The
similarity of the proposed criterion to widely used nonquadratic penalties (e.g., total variation
and lp semi-norms) opens the door to the adaptation of fast algorithms developed in the context of compressive sensing; we introduce several novel algorithms to solve the proposed nonlocal optimization
problem. Thanks to the unifying framework, these fast algorithms are readily applicable for a large class
of distance metrics.
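The connection drawn above, that one majorize-minimize iteration of the proposed cost resembles classical nonlocal schemes, can be illustrated with a single weight re-estimation step of nonlocal means on a 1D signal; the patch size and bandwidth `h` are illustrative choices.

```python
import math

def nonlocal_step(sig, h=0.5, half_patch=1):
    """One nonlocal-means pass: each sample becomes a weighted average of
    all samples, weighted by patch similarity exp(-d^2 / h^2)."""
    n = len(sig)
    def patch(i):
        # Patch around i with clamped borders.
        return [sig[min(max(i + d, 0), n - 1)]
                for d in range(-half_patch, half_patch + 1)]
    out = []
    for i in range(n):
        pi = patch(i)
        num = den = 0.0
        for j in range(n):
            pj = patch(j)
            d2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
            w = math.exp(-d2 / (h * h))
            num += w * sig[j]
            den += w
        out.append(num / den)
    return out

# A noisy two-level step signal: averaging stays within each level
# because cross-edge patches are dissimilar.
noisy = [0.0, 0.05, -0.05, 0.0, 1.0, 0.95, 1.05, 1.0]
denoised = nonlocal_step(noisy)
```

Iterating this step with weights re-estimated from the current result is exactly the heuristic the paper's reformulation justifies, and whose local-minimum pitfalls its continuation strategy addresses.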
ETPL
DIP-269
Corner Detection and Classification Using Anisotropic Directional Derivative
Representations
Abstract: This paper proposes a corner detector and classifier using anisotropic directional derivative
(ANDD) representations. The ANDD representation at a pixel is a function of the oriented angle and
characterizes the local directional grayscale variation around the pixel. The proposed corner detector fuses the ideas of the contour- and intensity-based detection. It consists of three cascaded blocks. First,
the edge map of an image is obtained by the Canny detector, from which contours are extracted and
patched. Next, the ANDD representation at each pixel on contours is calculated and normalized by its maximal magnitude. The area surrounded by the normalized ANDD representation forms a new corner
measure. Finally, the nonmaximum suppression and thresholding are operated on each contour to find
corners in terms of the corner measure. Moreover, a corner classifier based on the peak number of the ANDD representation is given. Experiments are made to evaluate the proposed detector and classifier.
The proposed detector is competitive with the two recent state-of-the-art corner detectors, the He & Yung
detector and CPDA detector, in detection capability and attains higher repeatability under affine
transforms. The proposed classifier can effectively discriminate simple corners, Y-type corners, and higher-order corners.
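The final nonmaximum suppression and thresholding stage of the detector can be sketched along one contour; the corner-measure values below are invented for illustration, not ANDD outputs.

```python
def find_corners(measure, threshold):
    """Indices that are strict local maxima of the corner measure and
    above the threshold (endpoints excluded, as on a patched contour)."""
    return [i for i in range(1, len(measure) - 1)
            if measure[i] > threshold
            and measure[i] > measure[i - 1]
            and measure[i] > measure[i + 1]]

# Corner measure sampled along a contour: two peaks survive.
m = [0.1, 0.2, 0.9, 0.3, 0.2, 0.6, 0.4, 0.1]
corners = find_corners(m, 0.5)
```

In the full detector the measure at each contour point is the area under the normalized ANDD representation; only this last suppression step is shown.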
ETPL
DIP-270 Classification of Time Series of Multispectral Images With Limited Training Data
Abstract: Image classification usually requires the availability of reliable reference data collected for the
considered image to train supervised classifiers. Unfortunately when time series of images are considered,
this is seldom possible because of the costs associated with reference data collection. In most of the applications it is realistic to have reference data available for one or few images of a time series acquired
on the area of interest. In this paper, we present a novel system for automatically classifying image time
series that takes advantage of image(s) with an associated reference information (i.e., the source domain) to classify image(s) for which reference information is not available (i.e., the target domain). The
proposed system exploits the already available knowledge on the source domain and, when possible,
integrates it with a minimum amount of new labeled data for the target domain. In addition, it is able to handle possible significant differences between statistical distributions of the source and target domains.
Here, the method is presented in the context of classification of remote sensing image time series, where
ground reference data collection is a highly critical and demanding task. Experimental results show the
effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.
ETPL
DIP-271 Fast l1-Minimization Algorithms for Robust Face Recognition
Abstract: l1-minimization refers to finding the minimum l1-norm solution to an underdetermined linear
system b = Ax. Under certain conditions as described in compressive sensing theory, the
minimum l1-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based
classification framework in robust face recognition, where sparse representation is sought to recover
human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional
algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution
based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l1-
minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate
message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made
publicly available.
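At the core of several of the compared solvers (FISTA, Homotopy, the augmented Lagrangian method) sits the soft-thresholding proximal operator. A few ISTA iterations on a tiny underdetermined system b = Ax show how it drives small coefficients to zero; ISTA is used here for brevity rather than the ALM solver the paper advocates, and the step size and lambda are illustrative.

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def ista(A, b, lam=0.05, step=0.2, n_iter=500):
    """Iterative shrinkage-thresholding for min 0.5||Ax-b||^2 + lam||x||_1."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad = [sum(A[i][j] * (Ax[i] - b[i]) for i in range(m))
                for j in range(n)]
        # Gradient step on the quadratic, then shrinkage on the l1 term.
        x = soft([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

# 2x3 underdetermined system whose sparsest solution uses one column.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
b = [1.0, 0.0]
x = ista(A, b)
```

The recovered x concentrates on the first column (near 1 - lam = 0.95 by the lasso optimality conditions) with the other entries shrunk to zero, the sparsity behavior that the face recognition framework relies on.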
ETPL
DIP-272 Robust Face Representation Using Hybrid Spatial Feature Interdependence Matrix
Abstract: A key issue in face recognition is to seek an effective descriptor for representing face
appearance. In the context of considering the face image as a set of small facial regions, this paper presents a new face representation approach coined spatial feature interdependence matrix (SFIM).
Unlike classical face descriptors which usually use a hierarchically organized or a sequentially
concatenated structure to describe the spatial layout features extracted from local regions, SFIM is
attributed to the exploitation of the underlying feature interdependences regarding local region pairs inside a class specific face. According to SFIM, the face image is projected onto an undirected connected
graph in a manner that explicitly encodes feature interdependence-based relationships between local
regions. We calculate the pair-wise interdependence strength as the weighted discrepancy between two feature sets extracted in a hybrid feature space fusing histograms of intensity, local binary pattern and
oriented gradients. To achieve the goal of face recognition, our SFIM-based face descriptor is embedded
in three different recognition frameworks, namely nearest neighbor search, subspace-based classification, and linear optimization-based classification. Extensive experimental results on four well-known face
databases and comprehensive comparisons with the state-of-the-art results are provided to demonstrate
the efficacy of the proposed SFIM-based descriptor.
ETPL
DIP-273 Motion Estimation Using the Correlation Transform
Abstract: The zero-mean normalized cross-correlation is shown to improve the accuracy of optical flow,
but its analytical form is quite complicated for the variational framework. This paper addresses this issue and presents a new direct approach to this matching measure. Our approach uses the correlation transform
to define very discriminative descriptors that are pre-computed and that have to be matched in the target
frame. It is equivalent to the computation of the optical flow for the correlation transforms of the images.
The smoothness energy is non-local and uses a robust penalty in order to preserve motion discontinuities. The model is associated with a fast and parallelizable minimization procedure based on the projected-
proximal point algorithm. The experiments confirm the strength of this model and implicitly demonstrate
the correctness of our solution. The results demonstrate that the involved data term is very robust with
respect to changes in illumination, especially where large illumination changes exist.
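The zero-mean normalized cross-correlation that motivates the correlation transform can be sketched for two 1D patches; its invariance to affine intensity changes (gain and offset) is what makes the data term robust to illumination.

```python
import math

def zncc(p, q):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    a = [x - mp for x in p]           # remove the mean (offset invariance)
    b = [x - mq for x in q]
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb or 1.0)  # gain invariance

patch = [1.0, 2.0, 4.0, 3.0]
brighter = [x * 2.0 + 10.0 for x in patch]   # gain + offset change
other = [4.0, 1.0, 2.0, 0.0]                 # unrelated patch
```

A gain-and-offset copy of a patch scores a perfect 1.0, while an unrelated patch does not, which is the matching behavior the variational model embeds.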
ETPL
DIP-274 Single Image Dehazing by Multi-Scale Fusion
Abstract: Haze is an atmospheric phenomenon that significantly degrades the visibility of outdoor scenes.
This is mainly due to the atmosphere particles that absorb and scatter the light. This paper introduces a
novel single image approach that enhances the visibility of such degraded images. Our method is a
fusion-based strategy that derives two inputs from the original hazy image by applying a white balance and a contrast-enhancing procedure. To effectively blend the information of the derived inputs and preserve the
regions with good visibility, we filter their important features by computing three measures (weight
maps): luminance, chromaticity, and saliency. To minimize artifacts introduced by the weight maps, our approach is designed in a multiscale fashion, using a Laplacian pyramid representation. We are the first to
demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single
degraded image. The method performs in a per-pixel fashion, which is straightforward to implement. The experimental results demonstrate that the method yields results comparative to and even better than the
more complex state-of-the-art techniques, having the advantage of being appropriate for real-time
applications.
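The per-pixel fusion idea above can be sketched as blending two derived inputs with normalized weight maps. The real weights (luminance, chromaticity, saliency) and the Laplacian-pyramid multiscale blending are replaced here by toy single-scale weights.

```python
def fuse(input1, input2, w1, w2):
    """Blend two images (flat intensity lists) with per-pixel weights,
    normalizing the weights so they sum to one at each pixel."""
    out = []
    for a, b, wa, wb in zip(input1, input2, w1, w2):
        s = (wa + wb) or 1.0
        out.append((wa * a + wb * b) / s)
    return out

i1 = [0.9, 0.8, 0.1, 0.2]   # e.g. white-balanced input
i2 = [0.3, 0.2, 0.7, 0.6]   # e.g. contrast-enhanced input
w1 = [1.0, 1.0, 0.0, 0.0]   # trust input 1 on the left half
w2 = [0.0, 0.0, 1.0, 1.0]   # trust input 2 on the right half
result = fuse(i1, i2, w1, w2)
```

Each output pixel comes from whichever derived input the weight maps favor there; the pyramid blending in the paper exists to hide the seams this naive per-pixel version would create.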
ETPL
DIP-275 Joint Sparse Learning for 3-D Facial Expression Generation
Abstract: 3-D facial expression generation, including synthesis and retargeting, has received intensive
attention in recent years, because it is important to produce realistic 3-D faces with specific expressions
in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-
dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional
representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D
face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of
3-D faces demonstrate the effectiveness of the proposed approach by comparing with representative ones
in terms of quality, time cost, and robustness.
ETPL
DIP-276 Robust Model for Segmenting Images With/Without Intensity Inhomogeneities
Abstract: Intensity inhomogeneities and different types/levels of image noise are the two major obstacles
to accurate image segmentation by region-based level set models. To provide a more general solution to these challenges, we propose a novel segmentation model that considers global and local image statistics
to eliminate the influence of image noise and to compensate for intensity inhomogeneities. In our model,
the global energy derived from a Gaussian model estimates the intensity distribution of the target object
and background; the local energy derived from the mutual influences of neighboring pixels can eliminate the impact of image noise and intensity inhomogeneities. The robustness of our method is validated on
segmenting synthetic images with/without intensity inhomogeneities, and with different types/levels of
noise, including Gaussian noise, speckle noise, and salt and pepper noise, as well as images from different medical imaging modalities. Quantitative experimental comparisons demonstrate that our
method is more robust and more accurate in segmenting the images with intensity inhomogeneities than
the local binary fitting technique and its more recent systematic model. Our technique also outperformed
the region-based Chan-Vese model when dealing with images without intensity inhomogeneities and produced better segmentation results than the graph-based algorithms including graph-cuts and random
walker when segmenting noisy images.
ETPL
DIP-277 Learning Prototype Hyperplanes for Face Verification in the Wild
Abstract: In this paper, we propose a new scheme called Prototype Hyperplane Learning (PHL) for face verification in the wild using only weakly labeled training samples (i.e., we only know whether each pair
of samples is from the same class or different classes without knowing the class label of each sample)
by leveraging a large number of unlabeled samples in a generic data set. Our scheme represents each
sample in the weakly labeled data set as a mid-level feature with each entry as the corresponding decision value from the classification hyperplane (referred to as the prototype hyperplane) of one Support Vector
Machine (SVM) model, in which a sparse set of support vectors is selected from the unlabeled generic
data set based on the learnt combination coefficients. To learn the optimal prototype hyperplanes for the extraction of mid-level features, we propose a Fisher's Linear Discriminant-like (FLD-like) objective
function by maximizing the discriminability on the weakly labeled data set with a constraint enforcing
sparsity on the combination coefficients of each SVM model, which is solved by using an alternating optimization method. Then, we use the recent work called Side-Information based Linear Discriminant
(SILD) analysis for dimensionality reduction and a cosine similarity measure for final face verification.
Comprehensive experiments on two data sets, Labeled Faces in the Wild (LFW) and YouTube Faces,
demonstrate the effectiveness of our scheme.
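The mid-level feature construction described above can be sketched directly: each entry of a sample's representation is its signed decision value under one prototype hyperplane. The hyperplanes below are fixed toy values; in the paper they are learned with the FLD-like objective, which is not reproduced here.

```python
def midlevel_feature(x, hyperplanes):
    """hyperplanes: list of (w, b) pairs; returns the decision value
    w.x + b of the sample x under each prototype hyperplane."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in hyperplanes]

# Three toy prototype hyperplanes in 2D (weights, bias).
planes = [([1.0, 0.0], -0.5),
          ([0.0, 1.0], 0.0),
          ([1.0, 1.0], -1.0)]
feat = midlevel_feature([2.0, 3.0], planes)
```

The resulting vector is then what SILD projects and compares with cosine similarity for the final verification decision.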