Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Cochin | Ramnad |
Pondicherry | Trivandrum | Salem | Erode | Tirunelveli
http://www.elysiumtechnologies.com, [email protected]
13 Years of Experience
Automated Services
24/7 Help Desk Support
Experienced & Expert Developers
Advanced Technologies & Tools
Legitimate Member of all Journals
1,50,000+ Successful Records in all Languages
More than 12 Branches in Tamilnadu, Kerala & Karnataka
Ticketing & Appointment Systems
Individual Care for every Student
Around 250 Developers & 20 Researchers
227-230 Church Road, Anna Nagar, Madurai – 625020.
0452-4390702, 4392702, + 91-9944793398.
[email protected], [email protected]
S.P.Towers, No.81 Valluvar Kottam High Road, Nungambakkam,
Chennai - 600034. 044-42072702, +91-9600354638,
15, III Floor, SI Towers, Melapudur main Road, Trichy – 620001.
0431-4002234, + 91-9790464324.
577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641002
0422- 4377758, +91-9677751577.
Plot No: 4, C Colony, P&T Extension, Perumal puram, Tirunelveli-
627007. 0462-2532104, +919677733255,
1st Floor, A.R.IT Park, Rasi Color Scan Building, Ramanathapuram
- 623501. 04567-223225,
74, 2nd floor, K.V.K Complex,Upstairs Krishna Sweets, Mettur
Road, Opp. Bus stand, Erode-638 011. 0424-4030055, +91-
9677748477 [email protected]
No: 88, First Floor, S.V.Patel Salai, Pondicherry – 605 001. 0413–
4200640 +91-9677704822
TNHB A-Block, D.no.10, Opp: Hotel Ganesh Near Busstand. Salem
– 636007, 0427-4042220, +91-9894444716.
ETPL
DIP-001
Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image Tone Mapping
A novel filter is proposed for edge-preserving decomposition of an image. It is different from previous
filters in its locally adaptive property. The filtered image contains local means everywhere and preserves
local salient edges. Comparisons are made between our filtered result and the results of three other
methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one
base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base layer
preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective function
is also proposed for compressing the detail layers. The reproduced image gives a good visualization.
Experimental results on real images demonstrate that our algorithm is especially effective at preserving or
enhancing local details.
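The decomposition invariant described above (base layer plus detail layers that sum back to the input) can be sketched in a few lines. This is only an illustration: a plain box filter stands in for the paper's locally adaptive edge-preserving filter, the radii are arbitrary, and the 1D signal stands in for an image.

```python
def local_mean(signal, r):
    """Mean over a window of radius r, clamped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def multiscale_decompose(signal, radii=(1, 2, 4)):
    """Split a signal into one base layer and len(radii) detail layers.

    Invariant: signal == base + sum of all detail layers (elementwise),
    so the detail layers can be compressed and the signal re-composed.
    """
    current = list(signal)
    details = []
    for r in radii:
        smooth = local_mean(current, r)
        details.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return current, details  # base layer, detail layers (fine to coarse)
```

Tone mapping then compresses the detail layers (and/or the base layer) before re-composing; the invariant guarantees nothing is lost by the split itself.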
ETPL
DIP-002
Multistructure Large Deformation Diffeomorphic Brain Registration (Biomedical Engineering)
Whole brain MRI registration has many useful applications in group analysis and morphometry, yet
accurate registration across different neuropathological groups remains challenging. Structure-specific information, or anatomical guidance, can be used to initialize and constrain registration to improve
accuracy and robustness. We describe here a multistructure diffeomorphic registration approach that uses
concurrent subcortical and cortical shape matching to guide the overall registration. Validation experiments carried out on openly available datasets demonstrate comparable or improved alignment of
subcortical and cortical brain structures over leading brain registration algorithms. We also demonstrate
that a group-wise average atlas built with multistructure registration accounts for greater intersubject
variability and provides more sensitive tensor-based morphometry measurements.
ETPL
DIP-003
Iterative Closest Normal Point for 3D Face Recognition (Pattern Analysis and Machine Intelligence)
The common approach for 3D face recognition is to register a probe face to each of the gallery faces and
then calculate the sum of the distances between their points. This approach is computationally expensive and sensitive to facial expression variation. In this paper, we introduce the iterative closest normal point
method for finding the corresponding points between a generic reference face and every input face. The
proposed correspondence finding method samples a set of points for each face, denoted as the closest
normal points. These points are effectively aligned across all faces, enabling effective application of discriminant analysis methods for 3D face recognition. As a result, the expression variation problem is
addressed by minimizing the within-class variability of the face samples while maximizing the between-
class variability. As an important conclusion, we show that the surface normal vectors of the face at the sampled points contain more discriminatory information than the coordinates of the points. We have
performed comprehensive experiments on the Face Recognition Grand Challenge database, which is
presently the largest available 3D face database. We have achieved verification rates of 99.6 and 99.2 percent at a false acceptance rate of 0.1 percent for the all-versus-all and ROC III experiments, respectively; to the best of our knowledge, these error rates are seven and four times lower, respectively, than those of the best existing methods on this database.
ETPL
DIP-004
Face Recognition and Verification Using Photometric Stereo (Information Forensics and Security)
This paper presents a new database suitable for both 2-D and 3-D face recognition based on photometric
stereo (PS): the Photoface database. The database was collected using a custom-made four-source PS
device designed to enable data capture with minimal interaction necessary from the subjects. The device, which automatically detects the presence of a subject using ultrasound, was placed at the entrance to a
busy workplace and captured 1839 sessions of face images with natural pose and expression. This means that the acquired data are more realistic for everyday use than existing databases and are, therefore, an
invaluable test bed for state-of-the-art recognition algorithms. The paper also presents experiments of various face recognition and verification algorithms using the albedo, surface normals, and recovered
depth maps. Finally, we have conducted experiments in order to demonstrate how different methods in
the pipeline of PS (i.e., normal field computation and depth map reconstruction) affect recognition and verification performance. These experiments help to 1) demonstrate the usefulness of PS, and our device
in particular, for minimal-interaction face recognition, and 2) highlight the optimal reconstruction and
recognition algorithms for use with natural-expression PS data. The database can be downloaded from
http://www.uwe.ac.uk/research/Photoface.
ETPL
DIP-005 Objective Quality Assessment of Tone-Mapped Images
Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range (LDR)
images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has the best
quality. Without an appropriate quality measure, different TMOs cannot be compared, and further
improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive
and time consuming, and more importantly, is difficult to embed into optimization frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by combining: 1) a
multiscale signal fidelity measure on the basis of a modified structural similarity index and 2) a
naturalness measure on the basis of intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking score and the proposed
tone-mapped image quality index (TMQI). Furthermore, we demonstrate the extended applications of
TMQI using two examples: parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.
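The two-component construction above (a structural fidelity term and a naturalness term combined into one score) can be sketched as a weighted power-law, which is the general shape the TMQI takes. The constants below are illustrative placeholders, not the published TMQI parameters.

```python
def tmqi_style_score(fidelity, naturalness, a=0.8, alpha=0.3, beta=0.7):
    """Combine a structural-fidelity score and a naturalness score (both
    assumed normalized to [0, 1]) into one quality number via a weighted
    power-law, mirroring the TMQI construction.  The weights a, alpha, beta
    here are illustrative, not the published values.
    """
    return a * fidelity ** alpha + (1.0 - a) * naturalness ** beta
```

Because both terms enter monotonically, improving either fidelity or naturalness can only raise the overall score, which is what makes the index usable inside a TMO parameter-tuning loop.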
ETPL
DIP-006
Segmentation and Tracing of Single Neurons from 3D Confocal Microscope Images (Biomedical and Health Informatics)
In order to understand the brain, we need to first understand the morphology of neurons. In the neurobiology community, there have been recent pushes to analyze both neuron connectivity and the
influence of structure on function. Currently, a technical roadblock that stands in the way of these studies
is the inability to automatically trace neuronal structure from microscopy. On the image processing side, proposed tracing algorithms face difficulties in low contrast, indistinct boundaries, clutter, and complex
branching structure. To tackle these difficulties, we develop Tree2Tree, a robust automatic neuron
segmentation and morphology generation algorithm. Tree2Tree uses a local medial tree generation
strategy in combination with global tree linking to build a maximum likelihood global tree. Recasting the neuron tracing problem in a graph-theoretic context enables Tree2Tree to estimate bifurcations naturally, which remains a challenge for existing neuron tracing algorithms. Tests on cluttered confocal microscopy images of Drosophila neurons give results that correspond to ground truth within a margin of ±2.75% normalized mean absolute error.
ETPL
DIP-007
Silhouette Analysis-Based Action Recognition via Exploiting Human Poses (Circuits and Systems for Video Technology)
In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations. We explore human silhouettes for human action representation by
taking into account the correlation between sequential poses in an action. A modified bag-of-words
model, named bag of correlated poses, is introduced to encode temporally local features of actions. To
utilize the property of visual word ambiguity, we adopt the soft assignment strategy to reduce the
dimensionality of our model and circumvent the penalty of computational complexity and quantization
error. To compensate for the loss of structural information, we propose an extended motion template, i.e., extensions of the motion history image, to capture the holistic structural features. The proposed scheme
takes advantage of local and global features and, therefore, provides a discriminative representation for human actions. Experimental results demonstrate the complementary properties of the two descriptors, and the proposed approach outperforms state-of-the-art methods on the IXMAS action recognition dataset.
ETPL
DIP-008 Pose-Invariant Face Recognition Using Markov Random Fields
One of the key challenges for current face recognition techniques is how to handle pose variations
between the probe and gallery face images. In this paper, we present a method for reconstructing the
virtual frontal view from a given nonfrontal face image using Markov random fields (MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input face image is
divided into a grid of overlapping patches, and a globally optimal set of local warps is estimated to
synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it
with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle illumination
variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem
using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it does not require manually selected facial
landmarks or head pose estimation. In order to improve the performance of our pose normalization
method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to demonstrate the
effectiveness of the proposed approach.
ETPL
DIP-009
Color Video Denoising Based on Combined Interframe and Intercolor Prediction (Circuits and Systems for Video Technology)
An advanced color video denoising scheme which we call CIFIC based on combined interframe and intercolor prediction is proposed in this paper. CIFIC performs the denoising filtering in the RGB color
space, and exploits both the interframe and intercolor correlation in color video signal directly by forming
multiple predictors for each color component using all three color components in the current frame as well as the motion-compensated neighboring reference frames. The temporal correspondence is
established through the joint-RGB motion estimation (ME) which acquires a single motion trajectory for
the red, green, and blue components. Then the current noisy observation as well as the interframe and
intercolor predictors are combined by a linear minimum mean squared error (LMMSE) filter to obtain the denoised estimate for every color component. The ill condition in the weight determination of the
LMMSE filter is detected and remedied by gradually removing the “least contributing” predictor.
Furthermore, our previous work on the LMMSE filter applied in the adaptive luminance-chrominance space (LAYUV for short) is revisited. By reformulating LAYUV and comparing it with CIFIC, we
deduce that LAYUV is a restricted version of CIFIC, and thus CIFIC can theoretically achieve lower
denoising error. Experimental results verify the improvement brought by the joint-RGB ME and the integration of the intercolor prediction, as well as the superiority of CIFIC over LAYUV. Meanwhile,
when compared with other state-of-the-art algorithms, CIFIC provides competitive performance both in
terms of the color peak signal-to-noise ratio and in perceptual quality.
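The weight-determination step of an LMMSE combiner like the one above amounts to solving the normal equations for the predictor weights. This is a minimal two-predictor sketch from samples, not CIFIC's full pipeline (which forms interframe and intercolor predictors per color component and handles ill-conditioning by dropping the "least contributing" predictor one at a time); the fallback branch below is a simplified stand-in for that remedy.

```python
def lmmse_weights_2(p1, p2, target):
    """Sample-based LMMSE weights for two predictors: solve the 2x2 normal
    equations (P'P) w = P'y in closed form.  If the system is ill-conditioned,
    fall back to the first predictor alone (a simplified stand-in for
    dropping the least contributing predictor).
    """
    a = sum(x * x for x in p1)
    b = sum(x * y for x, y in zip(p1, p2))
    c = sum(x * x for x in p2)
    r1 = sum(x * t for x, t in zip(p1, target))
    r2 = sum(x * t for x, t in zip(p2, target))
    det = a * c - b * b
    if abs(det) < 1e-12:
        return (r1 / a, 0.0)
    return ((r1 * c - r2 * b) / det, (a * r2 - b * r1) / det)
```

With the weights in hand, the denoised estimate is simply the weighted sum of the predictors at each pixel.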
ETPL
DIP-010
Wang-Landau Monte Carlo-Based Tracking Methods for Abrupt Motions (Pattern Analysis and Machine Intelligence)
We propose a novel tracking algorithm based on the Wang-Landau Monte Carlo (WLMC) sampling
method for dealing with abrupt motions efficiently. Abrupt motions cause conventional tracking methods to fail because they violate the motion smoothness constraint. To address this problem, we introduce the
Wang-Landau sampling method and integrate it into a Markov Chain Monte Carlo (MCMC)-based
tracking framework. By employing the novel density-of-states term estimated by the Wang-Landau sampling method into the acceptance ratio of MCMC, our WLMC-based tracking method alleviates the
motion smoothness constraint and robustly tracks the abrupt motions. Meanwhile, the marginal likelihood
term of the acceptance ratio preserves the accuracy in tracking smooth motions. The method is then extended to obtain good performance in terms of scalability, even on a high-dimensional state space.
Hence, it covers drastic changes in not only position but also scale of a target. To achieve this, we modify
our method by combining it with the N-fold way algorithm and present the N-Fold Wang-Landau
(NFWL)-based tracking method. The N-fold way algorithm helps estimate the density-of-states with a smaller number of samples. Experimental results demonstrate that our approach efficiently samples the
states of the target, even over the whole state space, without loss of time, and tracks the target accurately and robustly when position and scale change severely.
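The density-of-states term at the heart of the method comes from Wang-Landau sampling, which penalizes moves into already well-visited regions until the visit histogram flattens. The toy below estimates log densities over a few discrete bins with a uniform true density of states; bin count, step counts, and the flatness threshold are arbitrary choices for illustration, and the tracker in the paper plugs the analogous log_g term into an MCMC acceptance ratio over the motion state space.

```python
import math
import random

def wang_landau(n_bins=4, steps_per_stage=20000, f_final=0.02, seed=1):
    """Estimate (up to a constant) the log density of states over discrete
    bins: moves into bins with larger log_g are accepted less often, which
    flattens the visit histogram; the modification factor f is halved each
    time the histogram is flat enough.
    """
    rng = random.Random(seed)
    log_g = [0.0] * n_bins
    state, f = 0, 1.0
    while f > f_final:
        hist = [0] * n_bins
        for _ in range(steps_per_stage):
            prop = rng.randrange(n_bins)
            # accept with probability min(1, g[state] / g[prop])
            diff = log_g[state] - log_g[prop]
            if diff >= 0 or rng.random() < math.exp(diff):
                state = prop
            log_g[state] += f          # penalize the visited bin
            hist[state] += 1
        if min(hist) > 0.8 * (sum(hist) / n_bins):  # histogram flat enough
            f *= 0.5
    return log_g
```

For a uniform density of states the estimated log_g values end up nearly equal, which is exactly the flattening effect that lets the tracker escape the motion-smoothness constraint.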
ETPL
DIP-011
Multi-View ML Object Tracking With Online Learning on Riemannian Manifolds by
Combining Geometric Constraints
This paper addresses issues in object tracking with occlusion scenarios, where multiple uncalibrated
cameras with overlapping fields of view are exploited. We propose a novel method where tracking is first
done independently in each individual view and then tracking results are mapped from different views to improve the tracking jointly. The proposed tracker uses the assumptions that objects are visible in at least
one view and move uprightly on a common planar ground that may induce a homography relation
between views. A method for online learning of object appearances on Riemannian manifolds is also introduced. The main novelties of the paper include: 1) define a similarity measure, based on geodesics
between a candidate object and a set of mapped references from multiple views on a Riemannian
manifold; 2) propose multi-view maximum likelihood estimation of object bounding box parameters, based on Gaussian-distributed geodesics on the manifold; 3) introduce online learning of object
appearances on the manifold, taking into account of possible occlusions; 4) utilize projective
transformations for objects between views, where parameters are estimated from warped vertical axis by
combining planar homography, epipolar geometry, and vertical vanishing point; 5) embed single-view trackers in a three-layer multi-view tracking scheme. Experiments have been conducted on videos from
multiple uncalibrated cameras, where objects contain long-term partial/full occlusions, or frequent
intersections. Comparisons have been made with three existing methods, where the performance is evaluated both qualitatively and quantitatively. Results have shown the effectiveness of the proposed
method in terms of robustness against tracking drift caused by occlusions.
ETPL
DIP-012
Multi-Atlas Segmentation with Joint Label Fusion (Pattern Analysis and Machine Intelligence)
Multi-atlas segmentation is an effective approach for automatically labeling objects of interest in biomedical images. In this approach, multiple expert-segmented example images, called atlases, are
registered to a target image, and deformed atlas segmentations are combined using label fusion. Among
the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity has been particularly successful. However, one limitation of these
strategies is that the weights are computed independently for each atlas, without taking into account the
fact that different atlases may produce similar label errors. To address this limitation, we propose a new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the
total expectation of labeling error and in which pairwise dependency between atlases is explicitly
modeled as the joint probability of two atlases making a segmentation error at a voxel. This probability is
approximated using intensity similarity between a pair of atlases and the target image in the neighborhood of each voxel. We validate our method in two medical image segmentation problems: hippocampus
segmentation and hippocampus subfield segmentation in magnetic resonance (MR) images. For both
problems, we show consistent and significant improvement over label fusion strategies that assign atlas weights independently.
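Minimizing the total expected labeling error under a pairwise error model has a standard closed form: the voting weights are proportional to M⁻¹·1, where M is the (approximated) pairwise error matrix. The two-atlas sketch below illustrates that formula only; the paper estimates M per voxel from intensity similarity, which is not reproduced here.

```python
def fusion_weights_2(m11, m22, m12):
    """Closed-form joint-label-fusion weights for two atlases.

    M = [[m11, m12], [m12, m22]] models the expected pairwise labeling
    error at a voxel.  The weights minimize w' M w subject to w1 + w2 = 1,
    whose solution is w proportional to M^{-1} [1, 1]'.
    """
    det = m11 * m22 - m12 * m12
    x1 = (m22 - m12) / det   # first row of M^{-1} [1, 1]'
    x2 = (m11 - m12) / det
    s = x1 + x2
    return x1 / s, x2 / s
```

When the off-diagonal term m12 is large (the atlases tend to err together), neither atlas adds much independent information, which is precisely the dependency that independent per-atlas weighting ignores.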
ETPL
DIP-013
Spatially Coherent Fuzzy Clustering for Accurate and Noise-Robust Image
Segmentation
In this letter, we present a new FCM-based method for spatially coherent and noise-robust image
segmentation. Our contribution is twofold: 1) the spatial information of local image features is integrated into both the similarity measure and the membership function to compensate for the effect of noise; and
2) an anisotropic neighborhood, based on phase congruency features, is introduced to allow more
accurate segmentation without image smoothing. The segmentation results, for both synthetic and real images, demonstrate that our method efficiently preserves the homogeneity of the regions and is more
robust to noise than related FCM-based methods.
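The FCM machinery underlying the letter can be sketched for scalar data: the standard membership update, followed by a neighborhood averaging of memberships as one common way to inject spatial coherence. The smoothing step is a generic stand-in, not the letter's anisotropic, phase-congruency-driven neighborhood.

```python
def fcm_memberships(values, centers, m=2.0):
    """Standard FCM membership update for scalar data (no spatial term):
    u_ik is inversely related to the distance from value k to center i.
    """
    u = []
    for v in values:
        d2 = [max((v - c) ** 2, 1e-12) for c in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        s = sum(inv)
        u.append([w / s for w in inv])
    return u

def spatial_smooth(u, r=1):
    """Average memberships over a 1D neighborhood and renormalize -- a
    generic spatial-coherence step (the letter instead uses an anisotropic
    neighborhood driven by phase congruency).
    """
    n, k = len(u), len(u[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        avg = [sum(u[j][c] for j in range(lo, hi)) / (hi - lo) for c in range(k)]
        s = sum(avg)
        out.append([a / s for a in avg])
    return out
```

The smoothing suppresses isolated noisy memberships while leaving homogeneous regions unchanged, which is the coherence/noise-robustness trade the letter's anisotropic neighborhood refines.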
ETPL
DIP-014
Adaptive Markov Random Fields for Joint Unmixing and Segmentation of
Hyperspectral Images
Abstract: Linear spectral unmixing is a challenging problem in hyperspectral imaging that consists of decomposing an observed pixel into a linear combination of pure spectra (or endmembers) with their
corresponding proportions (or abundances). Endmember extraction algorithms can be employed for
recovering the spectral signatures while abundances are estimated using an inversion step. Recent works have shown that exploiting spatial dependencies between image pixels can improve spectral unmixing.
Markov random fields (MRF) are classically used to model these spatial correlations and partition the
image into multiple classes with homogeneous abundances. This paper proposes to define the MRF sites
using similarity regions. These regions are built using a self-complementary area filter that stems from the morphological theory. This kind of filter divides the original image into flat zones where the
underlying pixels have the same spectral values. Once the MRF has been clearly established, a
hierarchical Bayesian algorithm is proposed to estimate the abundances, the class labels, the noise variance, and the corresponding hyperparameters. A hybrid Gibbs sampler is constructed to generate
samples according to the corresponding posterior distribution of the unknown parameters and
hyperparameters. Simulations conducted on synthetic and real AVIRIS data demonstrate the good performance of the algorithm.
ETPL
DIP-015 Depth Estimation of Face Images Using the Nonlinear Least-Squares Model
Abstract: In this paper, we propose an efficient algorithm to reconstruct the 3D structure of a human face
from one or more of its 2D images with different poses. In our algorithm, the nonlinear least-squares model is first employed to estimate the depth values of facial feature points and the pose of the 2D face
image concerned by means of the similarity transform. Furthermore, different optimization schemes are
presented with regard to the accuracy levels and the training time required. Our algorithm also embeds the symmetrical property of the human face into the optimization procedure, in order to alleviate the
sensitivities arising from changes in pose. In addition, the regularization term, based on linear correlation,
is added in the objective function to improve the estimation accuracy of the 3D structure. Further, a
model-integration method is proposed to improve the depth-estimation accuracy when multiple nonfrontal-view face images are available. Experimental results on the 2D and 3D databases demonstrate
the feasibility and efficiency of the proposed methods.
ETPL
DIP-016
Local Energy Pattern for Texture Classification Using Self-Adaptive Quantization
Thresholds
Abstract: Local energy pattern, a statistical histogram-based representation, is proposed for texture
classification. First, we use normalized local-oriented energies to generate local feature vectors, which describe the local structures distinctively and are less sensitive to imaging conditions. Then, each local
feature vector is quantized by self-adaptive quantization thresholds determined in the learning stage using
histogram specification, and the quantized local feature vector is transformed to a number by N-nary coding, which helps to preserve more structure information during vector quantization. Finally, the
frequency histogram is used as the representation feature. The performance is benchmarked by material
categorization on KTH-TIPS and KTH-TIPS2-a databases. Our method is compared with typical statistical approaches, such as basic image features, local binary pattern (LBP), local ternary pattern,
completed LBP, Weber local descriptor, and VZ algorithms (VZ-MR8 and VZ-Joint). The results show
that our method is superior to other methods on the KTH-TIPS2-a database and achieves competitive performance on the KTH-TIPS database. Furthermore, we extend the representation from static images to dynamic textures and achieve favorable recognition results on the University of California at Los Angeles
(UCLA) dynamic texture database.
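The quantization-and-coding step described above can be sketched directly: each component of a local feature vector is quantized against (learned) thresholds into one of N levels, the level sequence is packed into a single base-N integer, and the integers are histogrammed. The thresholds and feature vectors below are toy values; in the paper the thresholds are learned per component via histogram specification.

```python
def nnary_code(feature, thresholds):
    """Quantize each component against sorted thresholds (giving
    N = len(thresholds) + 1 levels), then pack the level sequence into one
    base-N integer -- the N-nary coding step that maps a quantized feature
    vector to a single histogram bin.
    """
    n_levels = len(thresholds) + 1
    code = 0
    for v in feature:
        level = sum(1 for t in thresholds if v > t)
        code = code * n_levels + level
    return code

def lep_histogram(features, thresholds):
    """Frequency histogram of N-nary codes over all local feature vectors."""
    n_bins = (len(thresholds) + 1) ** len(features[0])
    hist = [0] * n_bins
    for f in features:
        hist[nnary_code(f, thresholds)] += 1
    return hist
```

Base-N packing preserves the per-component structure (unlike, say, summing the levels), which is why the paper argues it keeps more information through vector quantization.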
ETPL
DIP-017 Perceptual Quality Metric With Internal Generative Mechanism
Abstract: Objective image quality assessment (IQA) aims to evaluate image quality consistently with
human perception. Most of the existing perceptual IQA metrics cannot accurately represent the
degradations from different types of distortion, e.g., existing structural similarity metrics perform well on
content-dependent distortions while not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of the existing IQA metrics with the guide
of the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual
system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to
decompose an input scene into two portions, the predicted portion with the predicted visual content and
the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure its degradation;
distortions on the disorderly portion mainly change the uncertain information, and the PSNR is employed
for it. Finally, according to the noise energy deployment on the two portions, we combine the two
evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics.
ETPL
DIP-018
Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A
Comparative Study
Abstract: Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors
driven by task and 2) bottom-up factors that highlight image regions that are different from their
surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency
has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets
(e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores
(e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art
saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video
datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased,
which influences some of the evaluation scores. Computational complexity analysis shows that some
models are very fast, yet yield competitive eye movement prediction accuracy. Different models often
have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our
study allows one to assess the state of the art, helps to organize this rapidly growing field, and sets a
unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
ETPL
DIP-019
Local Edge-Preserving Multiscale Decomposition for High Dynamic Range Image
Tone Mapping
Abstract: A novel filter is proposed for edge-preserving decomposition of an image. It is different from
previous filters in its locally adaptive property. The filtered image contains local means everywhere and preserves local salient edges. Comparisons are made between our filtered result and the results of three
other methods. A detailed analysis is also made on the behavior of the filter. A multiscale decomposition
with this filter is proposed for manipulating a high dynamic range image, which has three detail layers and one base layer. The multiscale decomposition with the filter addresses three assumptions: 1) the base
layer preserves local means everywhere; 2) every scale's salient edges are relatively large gradients in a
local window; and 3) all of the nonzero gradient information belongs to the detail layer. An effective
function is also proposed for compressing the detail layers. The reproduced image gives a good visualization. Experimental results on real images demonstrate that our algorithm is especially effective at
preserving or enhancing local details.
ETPL
DIP-020 LLSURE: Local Linear SURE-Based Edge-Preserving Image Filtering
Abstract: In this paper, we propose a novel approach for performing high-quality edge-preserving image
filtering. Based on a local linear model and using the principle of Stein's unbiased risk estimate as an
estimator for the mean squared error from the noisy image only, we derive a simple explicit image filter
which can filter out noise while preserving edges and fine-scale details. Moreover, this filter has a fast and exact linear-time algorithm whose computational complexity is independent of the filtering kernel
size; thus, it can be applied to real time image processing tasks. The experimental results demonstrate the
effectiveness of the new filter for various computer vision applications, including noise reduction, detail smoothing and enhancement, high dynamic range compression, and flash/no-flash denoising.
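The local linear model behind the filter can be sketched in 1D: within each window the output is q = a·x + b with a = var/(var + eps) and b = (1 − a)·mean, so low-variance (smooth) regions are pulled to the local mean while high-variance (edge) regions pass through. This is a simplified self-guided sketch with a hand-picked eps; the paper's contribution is choosing that parameter via Stein's unbiased risk estimate and an exact linear-time algorithm, neither of which is reproduced here.

```python
def local_linear_filter(x, eps, r=2):
    """1D edge-preserving filter from the local linear model q_i = a*x_i + b.

    a = var / (var + eps): near 0 in smooth windows (output -> local mean,
    i.e. denoising), near 1 across strong edges (output -> input, i.e. the
    edge survives).  eps is hand-picked here; the paper selects it via SURE.
    """
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        w = x[lo:hi]
        mean = sum(w) / len(w)
        var = sum((v - mean) ** 2 for v in w) / len(w)
        a = var / (var + eps)
        out.append(a * x[i] + (1.0 - a) * mean)
    return out
```

Note the simplification: a full guided-filter-style implementation averages all the local models covering a pixel, whereas this sketch uses only the pixel's own window.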
ETPL
DIP-021
Optimal Inversion of the Generalized Anscombe Transformation for Poisson-Gaussian
Noise
Abstract: Many digital imaging devices operate by successive photon-to-electron, electron-to-voltage,
and voltage-to-digit conversions. These processes are subject to various signal-dependent errors, which
are typically modeled as Poisson-Gaussian noise. The removal of such noise can be effected indirectly by
applying a variance-stabilizing transformation (VST) to the noisy data, denoising the stabilized data with a Gaussian denoising algorithm, and finally applying an inverse VST to the denoised data. The
generalized Anscombe transformation (GAT) is often used for variance stabilization, but its unbiased
inverse transformation has not been rigorously studied in the past. We introduce the exact unbiased inverse of the GAT and show that it plays an integral part in ensuring accurate denoising results. We
demonstrate that this exact inverse leads to state-of-the-art results without any notable increase in the
computational complexity compared to the other inverses. We also show that this inverse is optimal in the
sense that it can be interpreted as a maximum likelihood inverse. Moreover, we thoroughly analyze the behavior of the proposed inverse, which also enables us to derive a closed-form approximation for it. This
paper generalizes our work on the exact unbiased inverse of the Anscombe transformation, which we
have presented earlier for the removal of pure Poisson noise.
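The indirect VST pipeline described above can be sketched as follows. The forward GAT is standard; for the inverse we use the naive algebraic inverse as a placeholder, which is precisely the inverse whose low-count bias the paper's exact unbiased inverse removes. Here alpha is the detector gain and sigma the Gaussian noise standard deviation.

```python
import numpy as np

def gat(z, alpha=1.0, sigma=0.1):
    """Generalized Anscombe transform: approximately stabilizes the variance
    of Poisson-Gaussian data z = alpha*p + n, p ~ Poisson, n ~ N(0, sigma^2)."""
    return (2.0 / alpha) * np.sqrt(np.maximum(alpha * z + 0.375 * alpha**2 + sigma**2, 0))

def gat_algebraic_inverse(d, alpha=1.0, sigma=0.1):
    """Naive algebraic inverse of the GAT; biased at low counts, which is
    the shortcoming the paper's exact unbiased inverse addresses."""
    return ((alpha * d / 2.0) ** 2 - 0.375 * alpha**2 - sigma**2) / alpha

def vst_denoise(z, gaussian_denoiser, alpha=1.0, sigma=0.1):
    """Stabilize -> denoise with any Gaussian denoiser -> invert."""
    d = gaussian_denoiser(gat(z, alpha, sigma))
    return gat_algebraic_inverse(d, alpha, sigma)
```

Plugging any off-the-shelf Gaussian denoiser into vst_denoise completes the stabilize-denoise-invert chain that the abstract describes.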
ETPL
DIP-022 Blind Separation of Time/Position Varying Mixtures
Abstract: We address the challenging open problem of blindly separating time/position varying mixtures,
and attempt to separate the sources from such mixtures without having prior information about the sources or the mixing system. Unlike studies concerning instantaneous or convolutive mixtures, we
assume that the mixing system (medium) is varying in time/position. Attempts to solve this problem have so far mostly utilized online algorithms that track the mixing system with methods previously developed for instantaneous or convolutive mixtures. In contrast with these attempts, we develop a
unified approach in the form of staged sparse component analysis (SSCA). Accordingly, we assume that
the sources are either sparse or can be “sparsified.” In the first stage, we estimate the filters of the mixing system, based on the scatter plot of the sparse mixtures' data, using a proper clustering and curve/surface
fitting. In the second stage, the mixing system is inverted, yielding the estimated sources. We use the
SSCA approach for solving three types of mixtures: time/position varying instantaneous mixtures, single-
path mixtures, and multipath mixtures. Real-life scenarios and simulated mixtures are used to demonstrate the performance of our approach.
ETPL
DIP-023 Nonlocal Transform-Domain Filter for Volumetric Data Denoising and Reconstruction
Abstract: We present an extension of the BM3D filter to volumetric data. The proposed algorithm, BM4D, implements the grouping and collaborative filtering paradigm, where mutually similar d-dimensional patches are stacked together in a (d+1)-dimensional array and jointly filtered in transform
domain. While in BM3D the basic data patches are blocks of pixels, in BM4D we utilize cubes of voxels,
which are stacked into a 4-D “group.” The 4-D transform applied on the group simultaneously exploits the local correlation present among voxels in each cube and the nonlocal correlation between the
corresponding voxels of different cubes. Thus, the spectrum of the group is highly sparse, leading to very
effective separation of signal and noise through coefficient shrinkage. After inverse transformation, we obtain estimates of each grouped cube, which are then adaptively aggregated at their original locations.
We evaluate the algorithm on denoising of volumetric data corrupted by Gaussian and Rician noise, as
well as on reconstruction of volumetric phantom data with non-zero phase from noisy and incomplete Fourier-domain (k-space) measurements. Experimental results demonstrate the state-of-the-art denoising
performance of BM4D, and its effectiveness when exploited as a regularizer in volumetric data
reconstruction.
ETPL
DIP-024 Huber Fractal Image Coding Based on a Fitting Plane
Abstract: Recently, there has been significant interest in robust fractal image coding for the purpose of
robustness against outliers. However, the known robust fractal coding methods (HFIC and LAD-FIC,
etc.) are not optimal, since, besides the high computational cost, they use the corrupted domain block as the independent variable in the robust regression model, which may adversely affect the robust estimator
to calculate the fractal parameters (depending on the noise level). This paper presents a Huber fitting
plane-based fractal image coding (HFPFIC) method. This method builds Huber fitting planes (HFPs) for
the domain and range blocks, respectively, ensuring the use of an uncorrupted independent variable in the robust model. On this basis, a new matching error function is introduced to robustly evaluate the best
scaling factor. Meanwhile, a median absolute deviation (MAD) about the median decomposition criterion
is proposed to achieve fast adaptive quadtree partitioning for images corrupted by salt & pepper noise. In order to reduce computational cost, the no-search method is applied to speed up the encoding process.
Experimental results show that the proposed HFPFIC can yield superior performance over conventional
robust fractal image coding methods in encoding speed and the quality of the restored image. Furthermore, the no-search method can significantly reduce encoding time and achieve less than 2.0 s for
the HFPFIC with acceptable image quality degradation. In addition, we show that, combined with the
MAD decomposition scheme, the HFP technique used as a robust method can further reduce the encoding
time while maintaining image quality.
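The MAD-based quadtree partitioning rule can be sketched as follows. The threshold value is an assumption; the point of MAD over variance is its insensitivity to salt & pepper outliers.

```python
import numpy as np

def mad(block):
    """Median absolute deviation about the median: a robust spread
    estimate that ignores salt & pepper outliers, unlike variance."""
    m = np.median(block)
    return np.median(np.abs(block - m))

def should_split(block, threshold):
    """Quadtree partition rule sketched from the abstract: split a block
    whose robust activity (MAD) exceeds a threshold."""
    return mad(block) > threshold
```

A flat block peppered with a few impulse pixels keeps a near-zero MAD and is not split, while a genuinely textured block is, which is how the criterion stays fast and noise-robust.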
ETPL
DIP-025
Demosaicking of Noisy Bayer-Sampled Color Images With Least-Squares Luma-
Chroma Demultiplexing and Noise Level Estimation
Abstract: This paper adapts the least-squares luma-chroma demultiplexing (LSLCD) demosaicking
method to noisy Bayer color filter array (CFA) images. A model is presented for the noise in white-
balanced gamma-corrected CFA images. A method to estimate the noise level in each of the red, green, and blue color channels is then developed. Based on the estimated noise parameters, one of a finite set of
configurations adapted to a particular level of noise is selected to demosaic the noisy data. The noise-
adaptive demosaicking scheme is called LSLCD with noise estimation (LSLCD-NE). Experimental results demonstrate state-of-the-art performance over a wide range of noise levels, with low
computational complexity. Many results with several algorithms, noise levels, and images are presented
on our companion web site along with software to allow reproduction of our results.
ETPL
DIP-026 Multiscale Gradients-Based Color Filter Array Interpolation
Abstract: Single sensor digital cameras use color filter arrays to capture a subset of the color data at each
pixel coordinate. Demosaicing or color filter array (CFA) interpolation is the process of estimating the
missing color samples to reconstruct a full color image. In this paper, we propose a demosaicing method that uses multiscale color gradients to adaptively combine color difference estimates from different
directions. The proposed solution does not require any thresholds since it does not make any hard
decisions, and it is noniterative. Although most suitable for the Bayer CFA pattern, the method can be extended to other mosaic patterns. To demonstrate this, we describe its application to the Lukac CFA
pattern. Experimental results show that it outperforms other available demosaicing methods by a clear
margin in terms of CPSNR and S-CIELAB measures for both mosaic patterns.
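The threshold-free directional combination at the heart of the method can be sketched as follows. This is a scalar toy version; in the actual method the gradients are accumulated over multiple scales and the estimates are color-difference interpolations along each direction.

```python
import numpy as np

def directional_weights(grad_h, grad_v, eps=1e-8):
    """Soft, threshold-free combination: weights inversely proportional to
    directional gradient energy, so no hard direction decision is made."""
    wh = 1.0 / (grad_h + eps)
    wv = 1.0 / (grad_v + eps)
    s = wh + wv
    return wh / s, wv / s

def combine(est_h, est_v, grad_h, grad_v):
    """Blend horizontal and vertical color estimates by gradient weight."""
    wh, wv = directional_weights(grad_h, grad_v)
    return wh * est_h + wv * est_v
```

When one direction has a much smaller gradient, its estimate dominates smoothly; equal gradients give the midpoint, so the scheme degrades gracefully instead of switching abruptly.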
ETPL
DIP-027 Optimal local dimming for LC image formation with controllable backlighting
Abstract: Light emitting diode (LED)-backlit liquid crystal displays (LCDs) hold the promise of
improving image quality while reducing the energy consumption with signal-dependent local dimming.
However, most existing local dimming algorithms are motivated mainly by simplicity of implementation and often show little concern for visual quality. To fully realize the potential of LED-backlit LCDs and reduce
the artifacts that often occur in current systems, we propose a novel local dimming technique that can
achieve the theoretical highest fidelity of intensity reproduction in either l1 or l2 metrics. Both the exact and fast approximate versions of the optimal local dimming algorithm are proposed. Simulation results
demonstrate superior performances of the proposed algorithm in terms of visual quality and power
consumption.
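The fidelity-optimal dimming idea can be sketched for a single backlight zone: the LC panel can only attenuate the backlight, so for each candidate level the best transmittance is a clipped ratio, and the level minimizing the lp reproduction error is kept. This brute-force search is a toy stand-in; the paper derives exact and fast approximate algorithms instead.

```python
import numpy as np

def display_error(img, backlight, p=2):
    """Reproduction error for one backlight zone: the LC layer can only
    attenuate, so transmittance is clipped to [0, 1]."""
    t = np.clip(img / max(backlight, 1e-12), 0.0, 1.0)
    return np.sum(np.abs(backlight * t - img) ** p)

def optimal_backlight(img, levels=256, p=2):
    """Exhaustive search over backlight levels for the fidelity-optimal
    dimming of a single zone."""
    candidates = np.linspace(0, 1, levels)
    errs = [display_error(img, b, p) for b in candidates]
    return candidates[int(np.argmin(errs))]
```

Setting p = 1 or p = 2 corresponds to the l1 and l2 fidelity metrics mentioned in the abstract; darker zone content admits a lower optimal backlight, which is where the power saving comes from.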
ETPL
DIP-028
Multiscale Bi-Gaussian Filter for Adjacent Curvilinear Structures Detection With
Application to Vasculature Images
Abstract: The intensity or gray-level derivatives have been widely used in image segmentation and
enhancement. Conventional derivative filters often suffer from an undesired merging of adjacent objects
because of their intrinsic usage of an inappropriately broad Gaussian kernel; as a result, neighboring structures cannot be properly resolved. To avoid this problem, we propose to replace the low-level
Gaussian kernel with a bi-Gaussian function, which allows independent selection of scales in the
foreground and background. By selecting a narrow neighborhood for the background with regard to the
foreground, the proposed method reduces interference from adjacent objects while simultaneously preserving the ability of intraregion smoothing. Our idea is inspired by a comparative analysis of existing
line filters, in which several traditional methods, including the vesselness, gradient flux, and medialness
models, are integrated into a uniform framework. The comparison subsequently aids in understanding the
principles of different filtering kernels, which is also a contribution of this paper. Based on some
axiomatic scale-space assumptions, the full representation of our bi-Gaussian kernel is deduced. The popular γ-normalization scheme for multiscale integration is extended to the bi-Gaussian operators.
Finally, combined with a parameter-free shape estimation scheme, a derivative filter is developed for the
typical applications of curvilinear structure detection and vasculature image enhancement. It is verified in experiments using synthetic and real data that the proposed method outperforms several conventional
filters in separating closely located objects and being robust to noise.
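A 1-D sketch of the bi-Gaussian kernel follows: a foreground Gaussian inside radius rho and a rescaled background Gaussian outside, joined continuously at the boundary. This simplified construction matches only the kernel value at the joint; the paper's full derivation also matches derivatives and embeds the kernel in an axiomatic scale space.

```python
import numpy as np

def gauss(x, s):
    """Normalized 1-D Gaussian."""
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

def bi_gaussian(x, sigma_f, sigma_b, rho):
    """Bi-Gaussian kernel sketch: foreground Gaussian (sigma_f) for
    |x| <= rho, background Gaussian (sigma_b) beyond, rescaled so the two
    pieces meet continuously at |x| = rho."""
    ax = np.abs(x)
    scale = gauss(rho, sigma_f) / gauss(0.0, sigma_b)
    return np.where(ax <= rho,
                    gauss(ax, sigma_f),
                    scale * gauss(ax - rho, sigma_b))
```

Choosing sigma_b much smaller than sigma_f makes the kernel decay quickly outside rho, which is exactly the narrow background neighborhood the abstract uses to keep adjacent structures from merging.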
ETPL
DIP-029 Visually Lossless Encoding for JPEG2000
Abstract: Due to exponential growth in image sizes, visually lossless coding is increasingly being considered as an alternative to numerically lossless coding, which has limited compression ratios. This
paper presents a method of encoding color images in a visually lossless manner using JPEG2000. In order
to hide coding artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subband signals in JPEG2000. The VTs are experimentally determined from statistically
modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-
zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing backgrounds through a
visual masking model, and then used to determine the minimum number of coding passes to be included in the final codestream for visually lossless quality under the desired viewing conditions. Codestreams
produced by this scheme are fully JPEG2000 Part-I compliant.
ETPL
DIP-030
Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization
and Its Application—Part I: Fundamental Theory
Abstract: This paper provides a systematic rate-distortion (R-D) analysis of the dead-zone plus uniform
threshold scalar quantization (DZ+UTSQ) with nearly uniform reconstruction quantization (NURQ) for
generalized Gaussian distribution (GGD), which consists of two aspects: R-D performance analysis and
R-D modeling. In R-D performance analysis, we first derive the preliminary constraint of optimum entropy-constrained DZ+UTSQ/NURQ for GGD, under which the property of the GGD distortion-rate
(D-R) function is elucidated. Then for the GGD source of actual transform coefficients, the refined
constraint and precise conditions of optimum DZ+UTSQ/NURQ are rigorously deduced in the real coding bit rate range, and efficient DZ+UTSQ/NURQ design criteria are proposed to reasonably simplify
the utilization of effective quantizers in practice. In R-D modeling, inspired by R-D performance analysis,
the D-R function is first developed, followed by the novel rate-quantization (R-Q) and distortion-quantization (D-Q) models derived using analytical and heuristic methods. The D-R, R-Q, and D-Q
models form the source model describing the relationship between the rate, distortion, and quantization
steps. One application of the proposed source model is the effective two-pass VBR coding algorithm
design on an encoder of H.264/AVC reference software, which achieves constant video quality and desirable rate control accuracy.
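The quantizer family analyzed in this paper can be sketched directly. Indices come from a dead zone of half-width z with uniformly spaced thresholds delta beyond it; reconstruction places each level at a tunable offset inside its cell. The midpoint default below is an assumption; deriving the optimal placement for generalized Gaussian sources is exactly what the paper does.

```python
import math

def dzutsq_quantize(x, z, delta):
    """Dead-zone plus uniform threshold scalar quantizer: inputs with
    |x| < z map to index 0; beyond the dead zone, decision thresholds
    are uniformly spaced delta apart."""
    if abs(x) < z:
        return 0
    k = math.floor((abs(x) - z) / delta) + 1
    return k if x > 0 else -k

def nurq_reconstruct(k, z, delta, offset=None):
    """Nearly uniform reconstruction: the level offset within each cell is
    a free parameter (midpoint here, by assumption)."""
    if k == 0:
        return 0.0
    if offset is None:
        offset = delta / 2.0
    mag = z + (abs(k) - 1) * delta + offset
    return mag if k > 0 else -mag
```

H.264/AVC-style quantization is of this dead-zone form, which is why the R-D models built on it transfer directly to the rate control application in Part II.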
ETPL
DIP-031
Rate-Distortion Analysis of Dead-Zone Plus Uniform Threshold Scalar Quantization
and Its Application—Part II: Two-Pass VBR Coding for H.264/AVC
Abstract: In the first part of this paper, we derive a source model describing the relationship between the rate, distortion, and quantization steps of the dead-zone plus uniform threshold scalar quantizers with
nearly uniform reconstruction quantizers for generalized Gaussian distribution. This source model
consists of rate-quantization, distortion-quantization (D-Q), and distortion-rate (D-R) models. In this part,
we first rigorously confirm the accuracy of the proposed source model by comparing the calculated results with the coding data of JM 16.0. Efficient parameter estimation strategies are then developed to
better employ this source model in our two-pass rate control method for H.264 variable bit rate coding.
Based on our D-Q and D-R models, the proposed method is of high stability, low complexity and is easy
to implement. Extensive experiments demonstrate that the proposed method achieves: 1) average peak
signal-to-noise ratio variance of only 0.0658 dB, compared to 1.8758 dB of JM 16.0's method, with an
average rate control error of 1.95% and 2) significant improvement in smoothing the video quality compared with the latest two-pass rate control method.
ETPL
DIP-032 Nonrigid Image Registration With Crystal Dislocation Energy
Abstract: The goal of nonrigid image registration is to find a suitable transformation such that the
transformed moving image becomes similar to the reference image. The image registration problem can also be treated as an optimization problem, which tries to minimize an objective energy function that
measures the differences between two involved images. In this paper, we consider image matching as the
process of aligning object boundaries in two different images. The registration energy function can be defined based on the total energy associated with the object boundaries. The optimal transformation is
obtained by finding the equilibrium state when the total energy is minimized, which indicates the object
boundaries find their correspondences and stop deforming. We make an analogy between the above processes with the dislocation system in physics. The object boundaries are viewed as dislocations (line
defects) in crystal. Then the well-developed dislocation energy is used to derive the energy assigned to
object boundaries in images. The newly derived registration energy function takes the global gradient
information of the entire image into consideration, and produces an orientation-dependent and long-range interaction between two images to drive the registration process. This property of interaction endows the
new registration framework with both fast convergence rate and high registration accuracy. Moreover, the
new energy function can be adapted to realize symmetric diffeomorphic transformation so as to ensure one-to-one matching between subjects. In this paper, the superiority of the new method is theoretically
proven, experimentally tested and compared with the state-of-the-art SyN method. Experimental results
with 3-D magnetic resonance brain images demonstrate that the proposed method outperforms the compared methods in terms of both registration accuracy and computation time.
ETPL
DIP-033 Double Shrinking Sparse Dimension Reduction
Abstract: Learning tasks such as classification and clustering usually perform better and cost less (time
and space) on compressed representations than on the original data. Previous works mainly compress data via dimension reduction. In this paper, we propose “double shrinking” to compress image data on both
dimensionality and cardinality via building either sparse low-dimensional representations or a sparse
projection matrix for dimension reduction. We formulate a double shrinking model (DSM) as an l1 regularized variance maximization with constraint ||x||2=1, and develop a double shrinking algorithm
(DSA) to optimize DSM. DSA is a path-following algorithm that can build the whole solution path of
locally optimal solutions of different sparse levels. Each solution on the path is a “warm start” for
searching the next sparser one. In each iteration of DSA, the direction, the step size, and the Lagrangian multiplier are deduced from the Karush-Kuhn-Tucker conditions. The magnitudes of trivial variables are
shrunk and the importances of critical variables are simultaneously augmented along the selected
direction with the determined step length. Double shrinking can be applied to manifold learning and feature selections for better interpretation of features, and can be combined with classification and
clustering to boost their performance. The experimental results suggest that double shrinking produces
efficient and effective data compression.
ETPL
DIP-034 Reinitialization-Free Level Set Evolution via Reaction Diffusion
Abstract: This paper presents a novel reaction-diffusion (RD) method for implicit active contours that is
completely free of the costly reinitialization procedure in level set evolution (LSE). A diffusion term is
introduced into LSE, resulting in an RD-LSE equation, from which a piecewise constant solution can be
derived. In order to obtain a stable numerical solution from the RD-based LSE, we propose a two-step
splitting method to iteratively solve the RD-LSE equation, where we first iterate the LSE equation, then
solve the diffusion equation. The second step regularizes the level set function obtained in the first step to ensure stability, and thus the complex and costly reinitialization procedure is completely eliminated from
LSE. By successfully applying diffusion to LSE, the RD-LSE model is stable by means of the simple
finite difference method, which is very easy to implement. The proposed RD method can be generalized to solve the LSE for both variational level set method and partial differential equation-based level set
method. The RD-LSE method shows very good performance on boundary antileakage. The extensive and
promising experimental results on synthetic and real images validate the effectiveness of the proposed
RD-LSE approach.
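The two-step splitting scheme can be sketched as follows. Step one advances the level set under an external force (a generic per-pixel force array stands in for a concrete data term such as Chan-Vese); step two runs a few explicit diffusion iterations, which is the regularization that replaces reinitialization.

```python
import numpy as np

def laplacian(phi):
    """5-point finite-difference Laplacian with replicated (zero-flux) borders."""
    p = np.pad(phi, 1, mode='edge')
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * phi

def rd_lse_step(phi, force, dt=0.1, tau=0.1, diffusion_iters=2):
    """One iteration of the two-step splitting scheme sketched from the
    abstract: (1) evolve the level set by the external force,
    (2) regularize by explicit diffusion steps instead of reinitialization."""
    phi = phi + dt * force                     # step 1: level set evolution
    for _ in range(diffusion_iters):
        phi = phi + tau * laplacian(phi)       # step 2: diffusion regularization
    return phi
```

The explicit finite-difference form mirrors the abstract's claim that the scheme stays stable with a simple implementation; dt and tau here are assumed illustrative step sizes, not the paper's settings.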
ETPL
DIP-035 Track Creation and Deletion Framework for Long-Term Online Multiface Tracking
Abstract: To improve visual tracking, a large number of papers study more powerful features, or better
cue fusion mechanisms, such as adaptation or contextual models. A complementary approach consists of improving the track management, that is, deciding when to add a target or stop its tracking, for example,
in case of failure. This is an essential component for effective multiobject tracking applications, and is
often not trivial. Deciding whether or not to stop a track is a compromise between avoiding erroneous
early stopping while tracking is fine, and erroneous continuation of tracking when there is an actual failure. This decision process, very rarely addressed in the literature, is difficult due to object detector
deficiencies or observation models that are insufficient to describe the full variability of tracked objects
and deliver reliable likelihood (tracking) information. This paper addresses the track management issue and presents a real-time online multiface tracking algorithm that effectively deals with the above
difficulties. The tracking itself is formulated in a multiobject state-space Bayesian filtering framework
solved with Markov Chain Monte Carlo. Within this framework, an explicit probabilistic filtering step decides when to add or remove a target from the tracker, where decisions rely on multiple cues such as
face detections, likelihood measures, long-term observations, and track state characteristics. The method
has been applied to three challenging data sets of more than 9 h in total, and demonstrates a significant performance increase compared to more traditional approaches (Markov Chain Monte Carlo, reversible-jump Markov Chain Monte Carlo) that rely only on head detection and likelihood for track management.
ETPL
DIP-036 Wavelet Domain Multifractal Analysis for Static and Dynamic Texture Classification
Abstract: In this paper, we propose a new texture descriptor for both static and dynamic textures. The new descriptor is built on the wavelet-based spatial-frequency analysis of two complementary wavelet
pyramids: standard multiscale and wavelet leader. These wavelet pyramids essentially capture the local
texture responses in multiple high-pass channels in a multiscale and multiorientation fashion, in which
there exists a strong power-law relationship for natural images. Such a power-law relationship is characterized by the so-called multifractal analysis. In addition, two more techniques, scale normalization
and multiorientation image averaging, are introduced to further improve the robustness of the proposed
descriptor. Combining these techniques, the proposed descriptor enjoys both high discriminative power and robustness against many environmental changes. We apply the descriptor for classifying both static
and dynamic textures. Our method has demonstrated excellent performance in comparison with the state-
of-the-art approaches in several public benchmark datasets.
ETPL
DIP-037
Video Object Tracking in the Compressed Domain Using Spatio-Temporal Markov
Random Fields
Abstract: Despite the recent progress in both pixel-domain and compressed-domain video object tracking,
the need for a tracking framework with both reasonable accuracy and reasonable complexity still exists.
This paper presents a method for tracking moving objects in H.264/AVC-compressed video sequences
using a spatio-temporal Markov random field (ST-MRF) model. An ST-MRF model naturally integrates
the spatial and temporal aspects of the object's motion. Built upon such a model, the proposed method
works in the compressed domain and uses only the motion vectors (MVs) and block coding modes from the compressed bitstream to perform tracking. First, the MVs are preprocessed through intracoded block
motion approximation and global motion compensation. At each frame, the decision of whether a
particular block belongs to the object being tracked is made with the help of the ST-MRF model, which is updated from frame to frame in order to follow the changes in the object's motion. The proposed method
is tested on a number of standard sequences, and the results demonstrate its advantages over some of the
recent state-of-the-art methods.
ETPL
DIP-038 Online Object Tracking With Sparse Prototypes
Abstract: Online object tracking is a challenging problem as it entails learning an effective model to
account for appearance change caused by intrinsic and extrinsic factors. In this paper, we propose a novel
online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance
models. We introduce l1regularization into the PCA reconstruction, and develop a novel algorithm to
represent an object by sparse prototypes that account explicitly for data and noise. For tracking, objects
are represented by the sparse prototypes learned online with update. In order to reduce tracking drift, we present a method that takes occlusion and motion blur into account rather than simply includes image
observations for model update. Both qualitative and quantitative evaluations on challenging image
sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
ETPL
DIP-039 Automatic Dynamic Texture Segmentation Using Local Descriptors and Optical Flow
Abstract: A dynamic texture (DT) is an extension of texture to the temporal domain. How to segment a DT is a challenging problem. In this paper, we address the problem of segmenting a DT into disjoint regions. DTs may differ in their spatial mode (i.e., appearance) and/or temporal mode (i.e., motion field). To this end, we develop a framework based on both the appearance and motion modes. For the
appearance mode, we use a new local spatial texture descriptor to describe the spatial mode of the DT; for the motion mode, we use the optical flow and the local temporal texture descriptor to represent the
temporal variations of the DT. In addition, for the optical flow, we use the histogram of oriented optical
flow (HOOF) to organize them. To compute the distance between two HOOFs, we develop a simple, effective, and efficient distance measure based on Weber's law. Furthermore, we address the problem of threshold selection by proposing a method that determines the segmentation thresholds through offline supervised statistical learning. The experimental results show that our method provides very
good segmentation results compared to the state-of-the-art methods in segmenting regions that differ in their dynamics.
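The motion-mode ingredients can be sketched as follows. The HOOF construction is standard (magnitude-weighted orientation histogram, normalized); the distance shown is only Weber's-law-inspired, judging each bin difference relative to the bin's overall magnitude, since the paper's exact measure is not reproduced here.

```python
import numpy as np

def hoof(flow_u, flow_v, bins=8):
    """Histogram of oriented optical flow: each flow vector votes into an
    orientation bin weighted by its magnitude; the result is normalized."""
    mag = np.hypot(flow_u, flow_v)
    ang = np.arctan2(flow_v, flow_u)           # orientations in (-pi, pi]
    h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    s = h.sum()
    return h / s if s > 0 else h

def weber_distance(h1, h2, eps=1e-8):
    """Weber's-law-inspired distance (assumed form): per-bin differences
    are judged relative to the bins' combined intensity."""
    return np.sum(np.abs(h1 - h2) / (h1 + h2 + eps))
```

Relative rather than absolute bin comparison means a small change in a weak bin counts as much as a proportionally similar change in a strong one, which is the Weber's-law intuition the abstract invokes.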
ETPL
DIP-040 Efficient Image Classification via Multiple Rank Regression
Abstract: The problem of image classification has aroused considerable research interest in the field of image processing. Traditional methods often convert an image to a vector and then use a vector-based
classifier. In this paper, a novel multiple rank regression model (MRR) for matrix data classification is
proposed. Unlike traditional vector-based methods, we employ multiple-rank left projecting vectors and
right projecting vectors to regress each matrix data set to its label for each category. The convergence behavior, initialization, computational complexity, and parameter determination are also analyzed.
Compared with vector-based regression methods, MRR achieves higher accuracy and has lower
computational complexity. Compared with traditional supervised tensor-based methods, MRR performs
better for matrix data classification. Promising experimental results on face, object, and hand-written digit
image classification tasks are provided to show the effectiveness of our method.
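The bilinear form behind MRR, and one half of a plausible alternating least-squares fit, can be sketched as follows (the update of V given U is symmetric; initialization and convergence analysis are in the paper, and the names and shapes here are assumptions).

```python
import numpy as np

def mrr_score(X, U, V, b=0.0):
    """Multiple-rank regression response for a matrix sample X (m x n):
    sum_k u_k^T X v_k + b, with U of shape (k, m) and V of shape (k, n).
    The effective projector sum_k u_k v_k^T has rank at most k, and X is
    never vectorized."""
    return float(np.einsum('ki,ij,kj->', U, X, V) + b)

def fit_U_given_V(Xs, y, V):
    """Least-squares update of all left projecting vectors with V fixed:
    stacking z_s = [X_s v_1; ...; X_s v_k] makes the model linear in U."""
    Z = np.stack([np.concatenate([X @ v for v in V]) for X in Xs])
    u, *_ = np.linalg.lstsq(Z, np.asarray(y), rcond=None)
    return u.reshape(len(V), -1)
```

Each half-step is an ordinary least-squares problem in k*m (or k*n) unknowns rather than m*n, which is where the lower computational complexity claimed over vector-based regression comes from.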
ETPL
DIP-041
Regularized Discriminative Spectral Regression Method for Heterogeneous Face
Matching
Abstract: Face recognition is confronted with situations in which face images are captured in various
modalities, such as the visual modality, the near infrared modality, and the sketch modality. This is
known as heterogeneous face recognition. To solve this problem, we propose a new method called
discriminative spectral regression (DSR). The DSR maps heterogeneous face images into a common discriminative subspace in which robust classification can be achieved. In the proposed method, the
subspace learning problem is transformed into a least squares problem. Different mappings should map
heterogeneous images from the same class close to each other, while images from different classes should be separated as far as possible. To realize this, we introduce two novel regularization terms, which reflect
the category relationships among data, into the least squares approach. Experiments conducted on two
heterogeneous face databases validate the superiority of the proposed method over the previous methods.
ETPL
DIP-042 Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Abstract: Due to the popularity of social media websites, extensive research efforts have been dedicated
to tag-based social image search. Both visual information and tags have been investigated in the research
field. However, most existing methods use tags and visual characteristics either separately or sequentially in order to estimate the relevance of images. In this paper, we propose an approach that simultaneously
utilizes both visual and textual information to estimate the relevance of user tagged images. The
relevance estimation is determined with a hypergraph learning approach. In this method, a social image hypergraph is constructed, where vertices represent images and hyperedges represent visual or textual
terms. Learning is achieved with use of a set of pseudo-positive images, where the weights of hyperedges
are updated throughout the learning process. In this way, the impact of different tags and visual words can
be automatically modulated. Comparative results of the experiments conducted on a dataset including 370+ images are presented, which demonstrate the effectiveness of the proposed approach.
ETPL
DIP-043 Action Search by Example Using Randomized Visual Vocabularies
Abstract: Because actions can be small video objects, it is a challenging problem to search for similar actions in crowded and dynamic scenes when a single query example is provided. We propose a fast
action search method that can efficiently locate similar actions spatiotemporally. Both the query action
and the video datasets are characterized by spatio-temporal interest points. Instead of using a unified visual vocabulary to index all interest points in the database, we propose randomized visual vocabularies
to enable fast and robust interest point matching. To accelerate action localization, we have developed a
coarse-to-fine video subvolume search scheme, which is several orders of magnitude faster than the
existing spatio-temporal branch and bound search. Our experiments on cross-dataset action search show promising results when compared with the state of the art. Additional experiments on a 5-h versatile
video dataset validate the efficiency of our method, where an action search can be finished in just 37.6 s
on a regular desktop machine.
ETPL
DIP-044
Robust Albedo Estimation From a Facial Image With Cast Shadow Under General
Unknown Lighting
Abstract: Albedo estimation from a facial image is crucial for various computer vision tasks, such as 3-D
morphable-model fitting, shape recovery, and illumination-invariant face recognition, but the currently
available methods do not give good estimation results. Most methods ignore the influence of cast shadows and require a statistical model to obtain facial albedo. This paper describes a method for albedo
estimation that makes combined use of image intensity and facial depth information for an image with
cast shadows and general unknown light. In order to estimate the albedo map of a face, we formulate the
albedo estimation problem as a linear programming problem that minimizes intensity error under the
assumption that the surface of the face has constant albedo. Since the solution thus obtained has significant errors in certain parts of the facial image, the albedo estimate needs to be compensated. We
minimize the mean square error of albedo under the assumption that the surface normals, which are
calculated from the facial depth information, are corrupted with noise. The proposed method is simple and the experimental results show that this method gives better estimates than other methods.
ETPL
DIP-045 Separable Markov Random Field Model and Its Applications in Low Level Vision
Abstract: This brief proposes a continuously-valued Markov random field (MRF) model with separable
filter bank, denoted as MRFSepa, which significantly reduces the computational complexity in the MRF modeling. In this framework, we design a novel gradient-based discriminative learning method to learn
the potential functions and separable filter banks. We learn MRFSepa models with 2-D and 3-D separable
filter banks for the applications of gray-scale/color image denoising and color image demosaicing. By implementing MRFSepa model on graphics processing unit, we achieve real-time image denoising and
fast image demosaicing with high-quality results.
ETPL
DIP-046 Two-Direction Nonlocal Model for Image Denoising
Abstract: Similarities inherent in natural images have been widely exploited for image denoising and
other applications. In fact, if a cluster of similar image patches is rearranged into a matrix, similarities exist both between columns and rows. Using the similarities, we present a two-directional nonlocal
(TDNL) variational model for image denoising. The solution of our model consists of three components:
one component is a scaled version of the original observed image and the other two components are
obtained by utilizing the similarities. Specifically, by using the similarity between columns, we get a nonlocal-means-like estimation of the patch with consideration to all similar patches, while the weights
are not the pairwise similarities but a set of clusterwise coefficients. Moreover, by using the similarity
between rows, we also get nonlocal-autoregression-like estimations for the center pixels of the similar patches. The TDNL model leads to an alternative minimization algorithm. Experiments indicate that the
model can perform on par with or better than the state-of-the-art denoising methods.
ETPL
DIP-047
Optimizing the Error Diffusion Filter for Blue Noise Halftoning With Multiscale Error
Diffusion
Abstract: A good halftoning output should bear a blue noise characteristic contributed by isotropically-
distributed isolated dots. Multiscale error diffusion (MED) algorithms try to achieve this by exploiting
radially symmetric and noncausal error diffusion filters to guarantee spatial homogeneity. In this brief, an
optimized diffusion filter is suggested to make the diffusion close to isotropic. When it is used with MED, the resulting output has a nearly ideal blue noise characteristic.
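For contrast with the noncausal, radially symmetric MED filters above, the classic causal error-diffusion baseline (Floyd-Steinberg) can be sketched in a few lines; the flat test patch and threshold below are illustrative, not the paper's optimized filter.

```python
import numpy as np

def floyd_steinberg(image):
    """Classic causal error diffusion (Floyd-Steinberg), shown only as a
    baseline -- MED instead uses noncausal, radially symmetric filters."""
    img = np.asarray(image, float).copy()
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = 255.0 if img[y, x] >= 128.0 else 0.0
            err = img[y, x] - out[y, x]          # diffuse quantization error
            if x + 1 < w: img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0: img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h: img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out

gray = np.full((16, 16), 64.0)        # flat 25% gray patch
halftone = floyd_steinberg(gray)      # binary output, tone roughly preserved
```

The causal left-to-right scan is what introduces directional artifacts; MED's noncausal diffusion is designed to avoid exactly this.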
ETPL
DIP-049 Sparse Representation With Kernels
Abstract: Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which
helps in finding a sparse representation of nonlinear features, we propose kernel sparse representation
(KSR). Essentially, KSR is a sparse coding technique in a high dimensional feature space mapped by an
implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into spatial pyramid matching
(SPM), we develop KSRSPM, which achieves a good performance for image classification. Moreover,
KSR-based feature coding can be shown as a generalization of efficient match kernel and an extension of
Sc-based SPM. We further show that our proposed KSR using a histogram intersection kernel (HIK) can
be considered a soft assignment extension of HIK-based feature quantization in the feature coding process. Besides feature coding, comparing with sparse coding, KSR can learn more discriminative
sparse codes and achieve higher accuracy for face recognition. Moreover, KSR can also be applied to
kernel matrix approximation in large scale learning tasks, and it demonstrates its robustness to kernel matrix approximation, especially when a small fraction of the data is used. Extensive experimental results
demonstrate promising results of KSR in image classification, face recognition, and kernel matrix
approximation. All these applications prove the effectiveness of KSR in computer vision and machine
learning tasks.
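The histogram intersection kernel (HIK) mentioned above is simple to compute; a minimal sketch with made-up toy histograms, building the kernel (Gram) matrix KSR would operate on in place of explicit features:

```python
import numpy as np

def hik(x, y):
    """Histogram intersection kernel: sum of elementwise minima."""
    return float(np.minimum(x, y).sum())

def gram(histograms):
    """Kernel (Gram) matrix over a set of histograms."""
    n = len(histograms)
    return np.array([[hik(histograms[i], histograms[j]) for j in range(n)]
                     for i in range(n)])

# Toy L1-normalized histograms (hypothetical data).
h = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.7, 0.2, 0.1]])
K = gram(h)   # symmetric, with unit diagonal for normalized histograms
```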
ETPL
DIP-050 Image-Difference Prediction: From Grayscale to Color
Abstract: Existing image-difference measures show excellent accuracy in predicting distortions, such as
lossy compression, noise, and blur. Their performance on certain other distortions could be improved; one example of this is gamut mapping. This is partly because they either do not interpret chromatic
information correctly or they ignore it entirely. We present an image-difference framework that
comprises image normalization, feature extraction, and feature combination. Based on this framework,
we create image-difference measures by selecting specific implementations for each of the steps. Particular emphasis is placed on using color information to improve the assessment of gamut-mapped
images. Our best image-difference measure shows significantly higher prediction accuracy on a gamut-
mapping dataset than all other evaluated measures.
ETPL
DIP-051 When Does Computational Imaging Improve Performance?
Abstract: A number of computational imaging techniques have been introduced to improve image quality by
increasing light throughput. These techniques use optical coding to measure a stronger signal level. However, the performance of these techniques is limited by the decoding step, which amplifies noise.
Although it is well understood that optical coding can increase performance at low light levels, little is
known about the quantitative performance advantage of computational imaging in general settings. In this paper, we derive the performance bounds for various computational imaging techniques. We then discuss
the implications of these bounds for several real-world scenarios (e.g., illumination conditions, scene
properties, and sensor noise characteristics). Our results show that computational imaging techniques do not provide a significant performance advantage when imaging with illumination that is brighter than
typical daylight. These results can be readily used by practitioners to design the most suitable imaging
systems given the application at hand.
ETPL
DIP-052 Anisotropic Interpolation of Sparse Generalized Image Samples
Abstract: Practical image-acquisition systems are often modeled as a continuous-domain prefilter
followed by an ideal sampler, where generalized samples are obtained after convolution with the impulse
response of the device. In this paper, our goal is to interpolate images from a given subset of such samples. We express our solution in the continuous domain, considering consistent resampling as a data-
fidelity constraint. To make the problem well posed and ensure edge-preserving solutions, we develop an
efficient anisotropic regularization approach that is based on an improved version of the edge-enhancing
anisotropic diffusion equation. Following variational principles, our reconstruction algorithm minimizes successive quadratic cost functionals. To ensure fast convergence, we solve the corresponding sequence
of linear problems by using multigrid iterations that are specifically tailored to their sparse structure. We
conduct illustrative experiments and discuss the potential of our approach both in terms of algorithmic
design and reconstruction quality. In particular, we present results that use as little as 2% of the image
samples.
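The acquisition model described above (a prefilter followed by an ideal sampler) can be sketched in 1-D; the box prefilter and sampling step below are illustrative assumptions, not the paper's device model:

```python
import numpy as np

def generalized_samples(signal, impulse_response, step):
    """Model acquisition as convolution with the device impulse response,
    followed by ideal sampling every `step` positions (1-D sketch)."""
    filtered = np.convolve(signal, impulse_response, mode="same")
    return filtered[::step]

signal = np.sin(np.linspace(0, 2 * np.pi, 64))
box = np.ones(3) / 3.0                            # simple box prefilter
samples = generalized_samples(signal, box, step=4)  # 16 generalized samples
```

Interpolation then amounts to recovering a continuous-domain signal consistent with these samples, which is where the paper's anisotropic regularization comes in.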
ETPL
DIP-053 Clustered-Dot Halftoning With Direct Binary Search
Abstract: In this paper, we present a new algorithm for aperiodic clustered-dot halftoning based on direct
binary search (DBS). The DBS optimization framework has been modified for designing clustered-dot
texture, by using filters with different sizes in the initialization and update steps of the algorithm.
Following an intuitive explanation of how the clustered-dot texture results from this modified framework, we derive a closed-form cost metric which, when minimized, equivalently generates stochastic clustered-
dot texture. An analysis of the cost metric and its influence on the texture quality is presented, which is
followed by a modification to the cost metric to reduce computational cost and to make it more suitable for screen design.
ETPL
DIP-054 Task-Specific Image Partitioning
Abstract: Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used for performing high-level computer vision tasks. Typically, partitioning is conducted without regard
to the task in hand. We propose a task-specific image partitioning framework to produce a region-based
image representation that will lead to higher task performance than that reached using any task-oblivious partitioning framework or the existing supervised partitioning frameworks, which are few in number. The proposed method partitions the image by means of correlation clustering, maximizing a linear
discriminant function defined over a superpixel graph. The parameters of the discriminant function that
define task-specific similarity/dissimilarity among superpixels are estimated based on structured support vector machine (S-SVM) using task-specific training data. The S-SVM learning leads to a better
generalization ability while the construction of the superpixel graph used to define the discriminant
function allows a rich set of features to be incorporated to improve discriminability and robustness. We
evaluate the learned task-aware partitioning algorithms on three benchmark datasets. Results show that task-aware partitioning leads to better labeling performance than the partitioning computed by the state-
of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image
partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.
ETPL
DIP-055 Generalized Inverse-Approach Model for Spectral-Signal Recovery
Abstract: We have studied the transformation system of a spectral signal to the response of the system as a linear mapping from higher to lower dimensional space in order to look more closely at inverse-
approach models. The problem of spectral-signal recovery from the response of a transformation system
is generally stated on the basis of the generalized inverse-approach theorem, which provides a modular
model for generating a spectral signal from a given response value. The controlling criteria, including the robustness of the inverse model to perturbations of the response caused by noise, and the condition
number for matrix inversion, are proposed, together with the mean square error, so as to create an
efficient model for spectral-signal recovery. The spectral-reflectance recovery and color correction of natural surface color are numerically investigated to appraise different illuminant-observer transformation
matrices based on the proposed controlling criteria both in the absence and the presence of noise.
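The generalized-inverse recovery idea can be illustrated with the Moore-Penrose pseudoinverse in the noise-free case; the 31-band spectrum and 3-channel transformation system below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
# Transformation system: maps a 31-band spectral signal to 3 sensor responses.
T = rng.standard_normal((3, 31))
spectrum = np.abs(rng.standard_normal(31))
response = T @ spectrum

# The Moore-Penrose pseudoinverse gives the minimum-norm consistent recovery;
# it reproduces the response exactly but not, in general, the true spectrum,
# since the mapping from 31 dimensions to 3 loses information.
recovered = np.linalg.pinv(T) @ response
```

This is why the paper's controlling criteria (robustness to noise, condition number of the inversion) matter: the bare inverse model is exact on the response but ill-posed on the spectrum.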
ETPL
DIP-056 Spatio-Temporal Auxiliary Particle Filtering With l1-Norm-Based Appearance Model
Learning for Robust Visual Tracking
Abstract: In this paper, we propose an efficient and accurate visual tracker equipped with a new particle filtering algorithm and robust subspace learning-based appearance model. The proposed visual tracker
avoids drifting problems caused by abrupt motion changes and severe appearance variations that are well-known difficulties in visual tracking. The proposed algorithm is based on a type of auxiliary particle
filtering that uses a spatio-temporal sliding window. Compared to conventional particle filtering
algorithms, spatio-temporal auxiliary particle filtering is computationally efficient and successfully implemented in visual tracking. In addition, a real-time robust principal component pursuit (RRPCP)
equipped with l1-norm optimization has been utilized to obtain a new appearance model learning block
for reliable visual tracking especially for occlusions in object appearance. The overall tracking framework based on the dual ideas is robust against occlusions and out-of-plane motions because of the proposed
spatio-temporal filtering and recursive form of RRPCP. The designed tracker has been evaluated using
challenging video sequences, and the results confirm the advantage of using this tracker.
ETPL
DIP-057
Manifold Regularized Multitask Learning for Semi-Supervised Multilabel Image
Classification
Abstract: It is a significant challenge to classify images with multiple labels by using only a small number
of labeled samples. One option is to learn a binary classifier for each label and use manifold
regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from
multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is
insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask
learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model
complexity because different tasks limit one another's search volume, and the manifold regularization
ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38
classes, by comparing MRMTL with popular image classification algorithms. The results suggest that
MRMTL is effective for image classification.
ETPL
DIP-058 Linear Distance Coding for Image Classification
Abstract: The feature coding-pooling framework is shown to perform well in image classification tasks,
because it can generate discriminative and robust image representations. The unavoidable information
loss incurred by feature quantization in the coding process and the undesired dependence of pooling on the image spatial layout, however, may severely limit the classification. In this paper, we propose a linear
distance coding (LDC) method to capture the discriminative information lost in traditional coding
methods while simultaneously alleviating the dependence of pooling on the image spatial layout. The core of the LDC lies in transforming local features of an image into more discriminative distance vectors,
where the robust image-to-class distance is employed. These distance vectors are further encoded into
sparse codes to capture the salient features of the image. The LDC is theoretically and experimentally
shown to be complementary to the traditional coding methods, and thus their combination can achieve higher classification accuracy. We demonstrate the effectiveness of LDC on six data sets, two of each of
three types (specific object, scene, and general object), i.e., Flower 102 and PFID 61, Scene 15 and
Indoor 67, Caltech 101 and Caltech 256. The results show that our method generally outperforms the traditional coding methods, and achieves or is comparable to the state-of-the-art performance on these
data sets.
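The image-to-class distance underlying LDC can be sketched as a nearest-neighbor distance per class; the toy 2-D features below are hypothetical, and the paper's actual distance is more elaborate:

```python
import numpy as np

def distance_vector(feature, class_pools):
    """Map one local feature to a distance vector: one entry per class,
    the distance to that class's nearest stored local feature."""
    return np.array([
        np.min(np.linalg.norm(pool - feature, axis=1))
        for pool in class_pools
    ])

# Two toy classes, each holding a few stored local features (hypothetical data).
class_a = np.array([[0.0, 0.0], [1.0, 0.0]])
class_b = np.array([[5.0, 5.0], [6.0, 5.0]])
d = distance_vector(np.array([0.9, 0.1]), [class_a, class_b])
# d[0] is small (feature resembles class A), d[1] large (far from class B)
```

These distance vectors, rather than the raw features, are what LDC then encodes into sparse codes.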
ETPL
DIP-059 What Are We Tracking: A Unified Approach of Tracking and Recognition
Abstract: Tracking is essentially a matching problem. While traditional tracking methods mostly focus on low-level image correspondences between frames, we argue that high-level semantic correspondences are
indispensable to make tracking more reliable. Based on that, a unified approach of low-level object
tracking and high-level recognition is proposed for single object tracking, in which the target category is
actively recognized during tracking. High-level offline models corresponding to the recognized category
are then adaptively selected and combined with low-level online tracking models so as to achieve better
tracking performance. Extensive experimental results show that our approach outperforms state-of-the-art online models in many challenging tracking scenarios such as drastic view change, scale change,
background clutter, and morphable objects.
ETPL
DIP-060
Unsupervised Amplitude and Texture Classification of SAR Images With Multinomial
Latent Model
Abstract: In this paper, we combine amplitude and texture statistics of the synthetic aperture radar images for the purpose of model-based classification. In a finite mixture model, we bring together the Nakagami
densities to model the class amplitudes and a 2-D auto-regressive texture model with t-distributed
regression error to model the textures of the classes. A non-stationary multinomial logistic latent class label model is used as a mixture density to obtain spatially smooth class segments. The classification
expectation-maximization algorithm is performed to estimate the class parameters and to classify the
pixels. We resort to integrated classification likelihood criterion to determine the number of classes in the model. We present our results on the classification of the land covers obtained in both supervised and
unsupervised cases processing TerraSAR-X, as well as COSMO-SkyMed data.
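The Nakagami density used for the class amplitudes has a simple closed form; a quick sanity check against its Rayleigh special case (m = 1), with illustrative parameter values:

```python
import math

def nakagami_pdf(x, m, omega):
    """Nakagami-m density: f(x) = 2 m^m x^(2m-1) / (Gamma(m) omega^m)
    * exp(-m x^2 / omega), used to model SAR class amplitudes."""
    return (2.0 * m ** m / (math.gamma(m) * omega ** m)
            * x ** (2 * m - 1) * math.exp(-m * x * x / omega))

# With m = 1 the Nakagami density reduces to a Rayleigh density.
x, omega = 1.2, 2.0
naka = nakagami_pdf(x, 1.0, omega)
rayleigh = (2.0 * x / omega) * math.exp(-x * x / omega)
```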
ETPL
DIP-061
Fuzzy C-Means Clustering With Local Information and Kernel Metric for Image
Segmentation
Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor
depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By
using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its
objective function. The new algorithm adaptively determines the kernel parameter by using a fast
bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore,
the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is
relatively independent of the type of noise.
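A simplified kernel FCM can be sketched as follows; this uses a Gaussian-kernel-induced distance but omits the paper's tradeoff weighted fuzzy factor, and the toy data and kernel bandwidth are illustrative:

```python
import numpy as np

def kernel_fcm(data, init, m=2.0, sigma=2.0, iters=30):
    """Fuzzy C-means with a Gaussian kernel distance d^2 = 2*(1 - K(x, v)),
    a simplified sketch of kernel-metric FCM (not the paper's full method)."""
    centers = data[list(init)].astype(float)
    for _ in range(iters):
        sq = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        kern = np.exp(-sq / (2.0 * sigma ** 2))
        dist = np.maximum(2.0 * (1.0 - kern), 1e-12)   # kernel-induced distance
        w = dist ** (-1.0 / (m - 1.0))
        u = w / w.sum(axis=1, keepdims=True)           # fuzzy memberships
        weights = (u ** m) * kern                      # kernel-weighted update
        centers = (weights.T @ data) / weights.sum(axis=0)[:, None]
    return centers, u

rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.1, (20, 2))
blob_b = rng.normal([5.0, 5.0], 0.1, (20, 2))
data = np.vstack([blob_a, blob_b])
# Initialize one center in each blob (first and last sample).
centers, u = kernel_fcm(data, init=(0, len(data) - 1))
```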
ETPL
DIP-062 Rate-Distortion Optimized Rate Control for Depth Map-Based 3-D Video Coding
Abstract: In this paper, a novel rate control scheme with optimized bit allocation for 3-D video
coding is proposed. First, we investigate the R-D characteristics of the texture and depth map of the coded
view, as well as the quality dependency between the virtual view and the coded view. Second, an optimal bit allocation scheme is developed to allocate target bits for both the texture and depth maps of different
views. Meanwhile, a simplified model parameter estimation scheme is adopted to speed up the coding
process. Finally, the experimental results on various 3-D video sequences demonstrate that the proposed
algorithm achieves excellent R-D efficiency and bit rate accuracy compared to benchmark algorithms.
ETPL
DIP-063 Performance Evaluation Methodology for Historical Document Image Binarization
Abstract: Document image binarization is of great importance in the document image analysis and
recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by
providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based
binarization evaluation methodology for historical handwritten/machine-printed document images. In the
proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the
proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms,
background noise, character enlargement, and merging. Several experiments conducted in comparison
with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
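Pixel-based recall and precision with an optional per-pixel weighting can be sketched as follows; the weighting argument here is a generic placeholder, not the paper's specific bias-diminishing scheme:

```python
import numpy as np

def recall_precision(result, ground_truth, weights=None):
    """Pixel-based recall/precision for binary images (1 = text, 0 = background).
    `weights` optionally re-weights pixels, as weighted evaluation schemes do."""
    result = np.asarray(result, bool)
    gt = np.asarray(ground_truth, bool)
    w = np.ones(gt.shape) if weights is None else np.asarray(weights, float)
    tp = (w * (result & gt)).sum()          # weighted true positives
    recall = tp / (w * gt).sum()
    precision = tp / (w * result).sum()
    return recall, precision

gt = np.zeros((4, 4), int); gt[1:3, 1:3] = 1            # 4 text pixels
res = np.zeros((4, 4), int); res[1:3, 1] = 1; res[0, 0] = 1  # 2 hits, 1 false alarm
recall, precision = recall_precision(res, gt)
```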
ETPL
DIP-064 Video Quality Pooling Adaptive to Perceptual Distortion Severity
Abstract: It is generally recognized that severe video distortions that are transient in space and/or time
have a large effect on overall perceived video quality. In order to understand this phenomenon, we study
the distribution of spatio-temporally local quality scores obtained from several video quality assessment
(VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content adaptive spatial and temporal pooling strategy based on the observed
distribution. Our method adaptively emphasizes “worst” scores along both the spatial and temporal
dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow such as egomotion. We demonstrate the efficacy of the method by testing it using three different
VQA algorithms on the LIVE Video Quality database and the EPFL-PoliMI video quality database.
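The "worst scores" emphasis can be sketched by pooling only the lowest fraction of local quality scores; the 5% fraction and toy score map below are illustrative choices, not the paper's content-adaptive strategy:

```python
import numpy as np

def worst_percent_pool(local_scores, percent=5.0):
    """Pool local quality scores by averaging only the worst `percent`
    fraction, emphasizing severe transient distortions."""
    scores = np.sort(np.ravel(local_scores))       # ascending: worst first
    k = max(1, int(len(scores) * percent / 100.0))
    return float(scores[:k].mean())

scores = np.full((10, 10), 0.9)   # mostly good local quality ...
scores[0, :5] = 0.1               # ... with a brief severe distortion
pooled_worst = worst_percent_pool(scores, percent=5.0)
pooled_mean = float(scores.mean())
# Worst-case pooling flags the transient distortion that plain averaging hides.
```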
ETPL
DIP-065 Modified Gradient Search for Level Set Based Image Segmentation
Abstract: Level set methods are a popular way to solve the image segmentation problem. The solution
contour is found by solving an optimization problem where a cost functional is minimized. Gradient
descent methods are often used to solve this optimization problem since they are very easy to implement
and applicable to general nonconvex functionals. They are, however, sensitive to local minima and often display slow convergence. Traditionally, cost functionals have been modified to avoid these problems. In
this paper, we instead propose using two modified gradient descent methods, one using a momentum term
and one based on resilient propagation. These methods are commonly used in the machine learning community. In a series of 2-D/3-D-experiments using real and synthetic data with ground truth, the
modifications are shown to reduce the sensitivity for local optima and to increase the convergence rate.
The parameter sensitivity is also investigated. The proposed methods are very simple modifications of the
basic method, and are directly compatible with any type of level set implementation. Downloadable reference code with examples is available online.
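The momentum modification can be sketched on a generic ill-conditioned quadratic, a stand-in for a level set cost functional (learning rate and cost below are illustrative, not the paper's settings):

```python
import numpy as np

def descend(grad, x0, lr, steps, momentum=0.0):
    """Gradient descent with an optional momentum (heavy-ball) term."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # velocity remembers past gradients
        x = x + v
    return x

# Ill-conditioned quadratic cost f(x, y) = 0.5 * (x**2 + 25 * y**2)
grad = lambda p: np.array([p[0], 25.0 * p[1]])
x_plain = descend(grad, [5.0, 5.0], lr=0.04, steps=100)
x_mom = descend(grad, [5.0, 5.0], lr=0.04, steps=100, momentum=0.9)
# Momentum reaches the minimum at the origin faster along the flat direction.
```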
ETPL
DIP-066
Maximum Margin Correlation Filter: A New Approach for Localization and
Classification
Abstract: Support vector machine (SVM) classifiers are popular in many computer vision tasks. In most of them, the SVM classifier assumes that the object to be classified is centered in the query image, which
might not always be valid, e.g., when locating and classifying a particular class of vehicles in a large
scene. In this paper, we introduce a new classifier called Maximum Margin Correlation Filter (MMCF), which, while exhibiting the good generalization capabilities of SVM classifiers, is also capable of
localizing objects of interest, thereby avoiding the need for image centering as is usually required in SVM
classifiers. In other words, MMCF can simultaneously localize and classify objects of interest. We test
the efficacy of the proposed classifier on three different tasks: vehicle recognition, eye localization, and face classification. We demonstrate that MMCF outperforms SVM classifiers as well as well-known
correlation filters.
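The localization ability that a correlation filter adds over a plain SVM can be illustrated with exhaustive template correlation; the all-ones template and toy scene below are naive stand-ins, not a trained MMCF:

```python
import numpy as np

def correlate_and_locate(scene, template):
    """Slide a correlation template over the scene and return the location
    of the peak response -- the localization a correlation filter provides,
    unlike a plain SVM applied to centered crops."""
    th, tw = template.shape
    sh, sw = scene.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = float((scene[y:y + th, x:x + tw] * template).sum())
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos

scene = np.zeros((12, 12)); scene[4:7, 6:9] = 1.0   # object at row 4, col 6
template = np.ones((3, 3))                          # naive matched template
pos = correlate_and_locate(scene, template)
```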
ETPL
DIP-067 Adaptive Fingerprint Image Enhancement With Emphasis on Preprocessing of Data
Abstract: This article proposes several improvements to an adaptive fingerprint enhancement method that
is based on contextual filtering. The term adaptive implies that parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the
adaptive fingerprint enhancement method, where four of these blocks are updated in our proposed
system. Hence, the proposed overall system is novel. The four updated processing blocks are: 1)
preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and
local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order statistical filters are applied. These processing blocks
yield an improved and new adaptive fingerprint image processing method. The performance of the
updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on FVC databases.
ETPL
DIP-068 Objective Quality Assessment of Tone-Mapped Images
Abstract: Tone-mapping operators (TMOs) that convert high dynamic range (HDR) to low dynamic range
(LDR) images provide practically useful tools for the visualization of HDR images on standard LDR displays. Different TMOs create different tone-mapped images, and a natural question is which one has
the best quality. Without an appropriate quality measure, different TMOs cannot be compared, and
further improvement is directionless. Subjective rating may be a reliable evaluation method, but it is expensive and time consuming and, more importantly, difficult to embed into optimization
frameworks. Here we propose an objective quality assessment algorithm for tone-mapped images by
combining: 1) a multiscale signal fidelity measure on the basis of a modified structural similarity index
and 2) a naturalness measure on the basis of intensity statistics of natural images. Validations using independent subject-rated image databases show good correlations between subjective ranking score and
the proposed tone-mapped image quality index (TMQI). Furthermore, we demonstrate the extended
applications of TMQI using two examples - parameter tuning for TMOs and adaptive fusion of multiple tone-mapped images.
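As a minimal example of the kind of TMO that TMQI could evaluate, a global Reinhard-style operator compresses HDR luminance into display range; this operator is a standard illustration, not part of the paper:

```python
import numpy as np

def simple_tmo(hdr):
    """Global Reinhard-style operator L / (1 + L): maps high-dynamic-range
    luminance monotonically into [0, 1) for an LDR display."""
    hdr = np.asarray(hdr, dtype=float)
    return hdr / (1.0 + hdr)

hdr = np.array([0.01, 0.1, 1.0, 10.0, 1000.0])   # ~5 orders of magnitude
ldr = simple_tmo(hdr)   # order-preserving, compressed into display range
```

Different TMOs make different compromises here, which is exactly why an objective index such as TMQI is needed to rank their outputs.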
ETPL
DIP-069 Catching a Rat by Its Edglets
Abstract: Computer vision is a noninvasive method for monitoring laboratory animals. In this article, we
propose a robust tracking method that is capable of extracting a rodent from a frame under uncontrolled normal laboratory conditions. The method consists of two steps. First, a sliding window combines three
features to coarsely track the animal. Then, it uses the edglets of the rodent to adjust the tracked region to
the animal's boundary. The method achieves an average tracking error that is smaller than that of a representative state-of-the-art method.
ETPL
DIP-070 Juxtaposed Color Halftoning Relying on Discrete Lines
Abstract: Most halftoning techniques allow screen dots to overlap. They rely on the assumption that the inks are transparent, i.e., the inks do not scatter a significant portion of the light back to the air. However,
many special effect inks, such as metallic inks, iridescent inks, or pigmented inks, are not transparent. In
order to create halftone images, halftone dots formed by such inks should be juxtaposed, i.e., printed side
by side. We propose an efficient juxtaposed color halftoning technique for placing any desired number of colorant layers side by side without overlapping. The method uses a monochrome library of screen
elements made of discrete lines with rational thicknesses. Discrete line juxtaposed color halftoning is
performed efficiently by multiple accesses to the screen element library.
ETPL
DIP-071 Image Noise Level Estimation by Principal Component Analysis
Abstract: The problem of blind noise level estimation arises in many image processing applications, such
as denoising, compression, and segmentation. In this paper, we propose a new noise level estimation
method on the basis of principal component analysis of image blocks. We show that the noise variance can be estimated as the smallest eigenvalue of the image block covariance matrix. Compared with 13
existing methods, the proposed approach shows a good compromise between speed and accuracy. It is at
least 15 times faster than methods with similar accuracy, and it is at least two times more accurate than
other methods. Our method does not assume the existence of homogeneous areas in the input image and,
hence, can successfully process images containing only textures.
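The core estimator, taking the smallest eigenvalue of the patch covariance matrix as the noise variance, fits in a few lines of NumPy; the block size and synthetic test image below are illustrative assumptions:

```python
import numpy as np

def estimate_noise_pca(image, block=5):
    """Estimate the noise standard deviation as the square root of the
    smallest eigenvalue of the covariance matrix of overlapping blocks."""
    h, w = image.shape
    # Collect all overlapping block x block patches as row vectors.
    patches = np.array([
        image[i:i + block, j:j + block].ravel()
        for i in range(h - block + 1)
        for j in range(w - block + 1)
    ])
    cov = np.cov(patches, rowvar=False)          # patch covariance matrix
    eigvals = np.linalg.eigvalsh(cov)            # ascending eigenvalues
    return float(np.sqrt(max(eigvals[0], 0.0)))  # smallest ~ noise variance

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 64), (64, 1))   # smooth synthetic image
noisy = clean + rng.normal(0.0, 10.0, clean.shape)  # add sigma = 10 noise
sigma_hat = estimate_noise_pca(noisy)               # close to 10
```

The smooth ramp image has a low-dimensional patch structure, so its signal occupies only a few leading eigenvalues and the trailing ones reflect the noise.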
ETPL
DIP-072
Nonlocal Image Restoration With Bilateral Variance Estimation: A Low-Rank
Approach
Abstract: Simultaneous sparse coding (SSC) or nonlocal image representation has shown great potential
in various low-level vision tasks, leading to several state-of-the-art image restoration techniques,
including BM3D and LSSC. However, it still lacks a physically plausible explanation of why SSC is a better model than conventional sparse coding for the class of natural images. Meanwhile, the problem of
sparsity optimization, especially when tangled with dictionary learning, is computationally difficult to
solve. In this paper, we take a low-rank approach toward SSC and provide a conceptually simple interpretation from a bilateral variance estimation perspective, namely that singular-value decomposition
of similar packed patches can be viewed as pooling both local and nonlocal information for estimating
signal variances. Such a perspective inspires us to develop a new class of image restoration algorithms called spatially adaptive iterative singular-value thresholding (SAIST). For noisy data, SAIST generalizes
the celebrated BayesShrink from local to nonlocal models; for incomplete data, SAIST extends previous
deterministic annealing-based solution to sparsity optimization through incorporating the idea of
dictionary learning. In addition to conceptual simplicity and computational efficiency, SAIST has achieved highly competitive (often better) objective performance compared to several state-of-the-art
methods in image denoising and completion experiments. Our subjective quality results compare
favorably with those obtained by existing techniques, especially at high noise levels and with a large amount of missing data.
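The pooling-and-shrinkage step at the heart of this approach can be sketched with plain soft singular-value thresholding. In SAIST the threshold is set adaptively from the bilateral variance estimates; the fixed tau below is only a stand-in for that machinery:

```python
import numpy as np

def svt(Y, tau):
    """Soft singular-value thresholding of a matrix of packed similar
    patches: small singular values (noise-dominated modes) are removed,
    large ones (signal-dominated) are kept, shrunk by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)     # shrink the spectrum toward zero
    return (U * s) @ Vt              # reassemble the denoised matrix
```

With tau on the order of the largest singular value of the noise matrix (roughly sigma times the sum of the square roots of the matrix dimensions, for i.i.d. Gaussian noise), the thresholded matrix is closer to the clean low-rank signal than the noisy input.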
ETPL
DIP-073 Variational Approach for the Fusion of Exposure Bracketed Pairs
Abstract: When taking pictures of a dark scene with artificial lighting, ambient light is not sufficient for
most cameras to obtain both accurate color and detail information. The exposure bracketing feature usually available in many camera models enables the user to obtain a series of pictures taken in rapid
succession with different exposure times; the implicit idea is that the user picks the best image from this
set. But in many cases, none of these images is good enough; in general, good brightness and color information are retained from longer-exposure settings, whereas sharp details are obtained from shorter
ones. In this paper, we propose a variational method for automatically combining an exposure-bracketed
pair of images within a single picture that reflects the desired properties of each one. We introduce an energy functional consisting of two terms, one measuring the difference in edge information with the
short-exposure image and the other measuring the local color difference with a warped version of the
long-exposure image. This method is able to handle camera and subject motion as well as noise, and the
results compare favorably with the state of the art.
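Schematically, the two-term energy functional described above can be written as follows; the weight λ, the warp operator W, and the squared norms are illustrative assumptions, not the paper's exact formulation:

```latex
E(u) \;=\; \int_\Omega \left| \nabla u - \nabla I_s \right|^2 \, dx
\;+\; \lambda \int_\Omega \left| u - W(I_l) \right|^2 \, dx
```

where $I_s$ is the short-exposure image (supplying sharp edges), $W(I_l)$ is the motion-compensated warp of the long-exposure image (supplying brightness and color), and the minimizer $u$ is the fused picture.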
ETPL
DIP-074 Image Denoising With Dominant Sets by a Coalitional Game Approach
Abstract: Dominant sets are a new graph partition method for pairwise data clustering proposed by Pavan
and Pelillo. We address the problem of dominant sets with a coalitional game model, in which each data point is treated as a player and similar data points are encouraged to group together for cooperation. We
propose betrayal and hermit rules to describe the cooperative behaviors among the players. After applying
the betrayal and hermit rules, an optimal and stable graph partition emerges, and all the players in the
partition will not change their groups. For computational feasibility, we design an approximate algorithm for finding a dominant set of mutually similar players and then apply the algorithm to an application such
as image denoising. In image denoising, every pixel is treated as a player who seeks similar partners
according to its patch appearance in its local neighborhood. By averaging the noisy effects with the
similar pixels in the dominant sets, we improve nonlocal means image denoising to restore the intrinsic
structure of the original images and achieve denoising results competitive with state-of-the-art methods in both visual and quantitative quality.
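As a point of reference, the nonlocal-means baseline that the paper improves on can be sketched per pixel as below; in the dominant-set variant, the average would instead run over the pixel's coalition of mutually similar partners rather than the full search window. Patch size, window size, and h are illustrative choices:

```python
import numpy as np

def nlm_pixel(img, i, j, patch=3, search=7, h=10.0):
    """Nonlocal-means estimate of pixel (i, j): a weighted average over a
    search window, where each neighbor's weight reflects how similar its
    surrounding patch is to the patch around (i, j)."""
    r = patch // 2
    rs = search // 2
    pad = np.pad(img.astype(float), r, mode='reflect')
    ref = pad[i:i + patch, j:j + patch]      # patch around (i, j)
    num = den = 0.0
    for di in range(-rs, rs + 1):
        for dj in range(-rs, rs + 1):
            y, x = i + di, j + dj
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                cand = pad[y:y + patch, x:x + patch]
                w = np.exp(-((ref - cand) ** 2).sum() / h ** 2)
                num += w * img[y, x]
                den += w
    return num / den
```

On a flat region the estimate reproduces the input exactly, while an isolated outlier is pulled back toward its similar neighbors.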
ETPL
DIP-075 High-Order Local Spatial Context Modeling by Spatialized Random Forest
Abstract: In this paper, we propose a novel method for spatial context modeling toward boosting visual
discriminating power. We are particularly interested in how to model high-order local spatial contexts instead of the intensively studied second-order spatial contexts, i.e., co-occurrence relations. Motivated
by the recent success of random forest in learning a discriminative visual codebook, we present a
spatialized random forest (SRF) approach, which can encode an unlimited length of high-order local spatial contexts. By spatially random neighbor selection and random histogram-bin partition during the
tree construction, the SRF can explore much more complicated and informative local spatial patterns in a
randomized manner. Owing to the discriminative capability test for the random partition in each tree node's split process, a set of informative high-order local spatial patterns are derived, and new images are
then encoded by counting the occurrences of such discriminative local spatial patterns. Extensive
comparison experiments on face recognition and object/scene classification clearly demonstrate the
superiority of the proposed spatial context modeling method over other state-of-the-art approaches for this purpose.
ETPL
DIP-076 Adaptive Inpainting Algorithm Based on DCT Induced Wavelet Regularization
Abstract: In this paper, we propose an image inpainting optimization model whose objective function is a smoothed ℓ1 norm of the weighted nondecimated discrete cosine transform (DCT) coefficients of the
underlying image. By identifying the objective function of the proposed model as a sum of a
differentiable term and a nondifferentiable term, we present a basic algorithm inspired by Beck and
Teboulle's recent work on the model. Based on this basic algorithm, we propose an automatic way to determine the weights involved in the model and update them in each iteration. The DCT as an
orthogonal transform is used in various applications. We view the rows of a DCT matrix as the filters
associated with a multiresolution analysis. Nondecimated wavelet transforms with these filters are explored in order to analyze the images to be inpainted. Our numerical experiments verify that under the
proposed framework, the filters from a DCT matrix demonstrate promise for the task of image inpainting.
ETPL
DIP-077 Extended Coding and Pooling in the HMAX Model
Abstract: This paper presents an extension of the HMAX model, a neural network model for image
classification. The HMAX model can be described as a four-level architecture, with the first level
consisting of multiscale and multiorientation local filters. We introduce two main contributions to this
model. First, we improve the way the local filters at the first level are integrated into more complex filters at the last level, providing a flexible description of object regions and combining local information of
multiple scales and orientations. These new filters are discriminative and yet invariant, two key aspects of
visual classification. We evaluate their discriminative power and their level of invariance to geometrical transformations on a synthetic image set. Second, we introduce a multiresolution spatial pooling. This
pooling encodes both local and global spatial information to produce discriminative image signatures.
Classification results are reported on three image data sets: Caltech101, Caltech256, and fifteen scenes.
We show significant improvements over previous architectures using a similar framework.
ETPL
DIP-078 Human Detection in Images via Piecewise Linear Support Vector Machines
Abstract: Human detection in images is challenged by the view and posture variation problem. In this
paper, we propose a piecewise linear support vector machine (PL-SVM) method to tackle this problem. The motivation is to exploit the piecewise discriminative function to construct a nonlinear classification
boundary that can discriminate multiview and multiposture human bodies from the backgrounds in a
high-dimensional feature space. A PL-SVM training is designed as an iterative procedure of feature space division and linear SVM training, aiming at the margin maximization of local linear SVMs. Each
piecewise SVM model is responsible for a subspace, corresponding to a human cluster of a special view
or posture. In the PL-SVM, a cascaded detector is proposed with block orientation features and a histogram of oriented gradient features. Extensive experiments show that compared with several recent
SVM methods, our method reaches the state of the art in both detection accuracy and computational
efficiency, and it performs best when dealing with low-resolution human regions in cluttered backgrounds.
ETPL
DIP-079 Short Distance Intra Coding Scheme for High Efficiency Video Coding
Abstract: This paper proposes a new intra coding scheme, known as short distance intra prediction
(SDIP), for high efficiency video coding (HEVC) standardization work. The proposed method is based on
the quadtree unit structure of HEVC. By splitting a coding unit into nonsquare units for coding and reconstruction, and therefore shortening the distances between the predicted and the reference samples,
the accuracy of intra prediction can be improved when applying the directional prediction method. SDIP
improves the intra prediction accuracy, especially for highly detailed regions. This approach is applied to
both luma and chroma components. When integrated into the HEVC reference software, it achieves up to a 12.8% bit rate reduction for sequences with rich textures.
ETPL
DIP-080 Probabilistic Graphlet Transfer for Photo Cropping
Abstract: As one of the most basic photo manipulation processes, photo cropping is widely used in the printing, graphic design, and photography industries. In this paper, we introduce graphlets (i.e., small
connected subgraphs) to represent a photo's aesthetic features, and propose a probabilistic model to
transfer aesthetic features from the training photo onto the cropped photo. In particular, by segmenting
each photo into a set of regions, we construct a region adjacency graph (RAG) to represent the global aesthetic feature of each photo. Graphlets are then extracted from the RAGs, and these graphlets capture
the local aesthetic features of the photos. Finally, we cast photo cropping as a candidate-searching
procedure on the basis of a probabilistic model, and infer the parameters of the cropped photos using Gibbs sampling. The proposed method is fully automatic. Subjective evaluations have shown that it is
preferred over a number of existing approaches.
ETPL
DIP-081 On Removing Interpolation and Resampling Artifacts in Rigid Image Registration
Abstract: We show that image registration using conventional interpolation and summation
approximations of continuous integrals can generally fail because of resampling artifacts. These artifacts
negatively affect the accuracy of registration by producing local optima, altering the gradient, shifting the
global optimum, and making rigid registration asymmetric. In this paper, after an extensive literature review, we demonstrate the causes of the artifacts by comparing inclusion and avoidance of resampling
analytically. We show the sum-of-squared-differences cost function formulated as an integral to be more
accurate compared with its traditional sum form in a simple case of image registration. We then discuss aliasing that occurs in rotation, which is due to the fact that an image represented in the Cartesian grid is
sampled with different rates in different directions, and propose the use of oscillatory isotropic
interpolation kernels, which allow better recovery of true global optima by overcoming this type of
aliasing. Through our experiments on brain, fingerprint, and white noise images, we illustrate the superior
performance of the integral registration cost function in both the Cartesian and spherical coordinates, and
also validate the introduced radial interpolation kernel by demonstrating the improvement in registration.
ETPL
DIP-082 Fast Positive Deconvolution of Hyperspectral Images
Abstract: In this brief, we provide an efficient scheme for performing deconvolution of large
hyperspectral images under a positivity constraint, while accounting for spatial and spectral smoothness
of the data.
ETPL
DIP-083
Segmentation of Intracranial Vessels and Aneurysms in Phase Contrast Magnetic
Resonance Angiography Using Multirange Filters and Local Variances
Abstract: Segmentation of intensity varying and low-contrast structures is an extremely challenging and
rewarding task. In computer-aided diagnosis of intracranial aneurysms, segmenting the high-intensity major vessels along with the attached low-contrast aneurysms is essential to the recognition of this lethal
vascular disease. It is particularly helpful in performing early and noninvasive diagnosis of intracranial
aneurysms using phase contrast magnetic resonance angiographic (PC-MRA) images. The major challenges of developing a PC-MRA-based segmentation method are the significantly varying voxel
intensity inside vessels with different flow velocities and the signal loss in the aneurysmal regions where
turbulent flows occur. This paper proposes a novel intensity-based algorithm to segment intracranial
vessels and the attached aneurysms. The proposed method can handle intensity varying vasculatures and also the low-contrast aneurysmal regions affected by turbulent flows. It is grounded on the use of
multirange filters and local variances to extract intensity-based image features for identifying contrast
varying vasculatures. The extremely low-intensity region affected by turbulent flows is detected according to the topology of the structure detected by multirange filters and local variances. The proposed
method is evaluated using a phantom image volume with an aneurysm and four clinical cases. It achieves
0.80 dice score in the phantom case. In addition, different components of the proposed method (the multirange filters, local variances, and topology-based detection) are evaluated in the comparison between the proposed method and its lower-complexity variants. Owing to the analogy between these variants and
existing vascular segmentation methods, this comparison also exemplifies the advantage of the proposed
method over the existing approaches. It analyzes the weaknesses of these existing approaches and justifies the use of every component involved in the proposed method. It is shown that the proposed
method is capable of segmenting blood vessels and the attached aneurysms on PC-MRA images.
ETPL
DIP-084 Robust Image Analysis With Sparse Representation on Quantized Visual Features
Abstract: Recent techniques based on sparse representation (SR) have demonstrated promising
performance in high-level visual recognition, exemplified by the highly accurate face recognition under
occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms
using raw image pixels, and very few have been proposed to utilize the quantized visual features, such as the popular bag-of-words feature abstraction. In such cases, besides the inherent quantization errors,
ambiguity associated with visual word assignment and misdetection of feature points, due to factors such
as visual occlusions and noises, constitutes the major cause of dense corruptions of the quantized representation. The dense corruptions can jeopardize the decision process by distorting the patterns of the
sparse reconstruction coefficients. In this paper, we aim to eliminate the corruptions and achieve robust
image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and
misdetection transfer) to account for the two major sources of corruption as discussed. By reasonably
reconstruction objective with ℓ0-norm regularization on the transfer terms to encourage sparsity and,
hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex ℓ0-norm
optimization into a convex ℓ1-norm optimization problem, and employ the accelerated proximal
gradient method to optimize the convergence-provable updating procedure. Extensive experiments on
four benchmark datasets, Caltech-101, Caltech-256, Corel-5k, and CMU pose, illumination, and expression, manifest the necessity of removing the quantization corruptions and the various advantages of
the proposed framework.
ETPL
DIP-085 Additive White Gaussian Noise Level Estimation in SVD Domain for Images
Abstract: Accurate estimation of Gaussian noise level is of fundamental interest in a wide variety of vision and image processing applications as it is critical to the processing techniques that follow. In this
paper, a new effective noise level estimation method is proposed on the basis of the study of singular
values of noise-corrupted images. Two novel aspects of this paper address the major challenges in noise estimation: 1) the use of the tail of singular values for noise estimation to alleviate the influence of the
signal on the data basis for the noise estimation process and 2) the addition of known noise to estimate the
content-dependent parameter, so that the proposed scheme is adaptive to visual signals, thereby enabling a wider application scope of the proposed scheme. The analysis and experiment results demonstrate that
the proposed algorithm can reliably infer noise levels and show robust behavior over a wide range of
visual content and noise conditions, and that it outperforms relevant existing methods.
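The first ingredient, the tail-of-singular-values statistic, is easy to illustrate; the calibration step that adds noise of known level to fit the content-dependent parameter is omitted here, and the tail fraction is an assumed value:

```python
import numpy as np

def tail_mean_singular_values(img, frac=0.75):
    """Mean of the smallest `frac` of an image's singular values. The
    head of the spectrum is signal-dominated, so this tail statistic
    tracks the noise level: it grows with the noise standard deviation,
    which the SVD-domain estimator calibrates by adding known noise."""
    s = np.linalg.svd(np.asarray(img, dtype=float), compute_uv=False)
    k = int(len(s) * (1.0 - frac))       # skip the signal-heavy head
    return s[k:].mean()
```

Feeding the same clean image corrupted at two different noise levels shows the statistic rising with the noise standard deviation.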
ETPL
DIP-086
Nonedge-Specific Adaptive Scheme for Highly Robust Blind Motion Deblurring of
Natural Images
Abstract: Blind motion deblurring estimates a sharp image from a motion blurred image without the
knowledge of the blur kernel. Although significant progress has been made on tackling this problem,
existing methods, when applied to highly diverse natural images, are still far from stable. This paper focuses on the robustness of blind motion deblurring methods toward image diversity, a critical problem
that has been previously neglected for years. We classify the existing methods into two schemes and
analyze their robustness using an image set consisting of 1.2 million natural images. The first scheme is
edge-specific, as it relies on the detection and prediction of large-scale step edges. This scheme is sensitive to the diversity of the image edges in natural images. The second scheme is nonedge-specific
and explores various image statistics, such as the prior distributions. This scheme is sensitive to statistical
variation over different images. Based on the analysis, we address the robustness by proposing a novel nonedge-specific adaptive scheme (NEAS), which features a new prior that is adaptive to the variety of
textures in natural images. By comparing the performance of NEAS against the existing methods on a
very large image set, we demonstrate its advance beyond the state of the art.
ETPL
DIP-087
Image Enhancement Using the Hypothesis Selection Filter: Theory and Application to
JPEG Decoding
Abstract: We introduce the hypothesis selection filter (HSF) as a new approach for image quality
enhancement. We assume that a set of filters has been selected a priori to improve the quality of a
distorted image containing regions with different characteristics. At each pixel, HSF uses a locally computed feature vector to predict the relative performance of the filters in estimating the corresponding
pixel intensity in the original undistorted image. The prediction result then determines the proportion of
each filter used to obtain the final processed output. In this way, the HSF serves as a framework for combining the outputs of a number of different user selected filters, each best suited for a different region
of an image. We formulate our scheme in a probabilistic framework where the HSF output is obtained as
the Bayesian minimum mean square error estimate of the original image. Maximum likelihood estimates
of the model parameters are determined from an offline fully unsupervised training procedure that is derived from the expectation-maximization algorithm. To illustrate how to apply the HSF and to
demonstrate its potential, we apply our scheme as a post-processing step to improve the decoding quality
of JPEG-encoded document images. The scheme consistently improves the quality of the decoded image
over a variety of image content with different characteristics. We show that our scheme results in
quantitative improvements over several other state-of-the-art JPEG decoding methods.
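The final combination stage reduces to a per-pixel convex mixture of the candidate filters' outputs. The sketch below assumes the per-pixel weights have already been predicted from local features; the EM-trained prediction model itself is not reproduced:

```python
import numpy as np

def hsf_combine(outputs, probs):
    """Pixel-wise convex combination of candidate filter outputs:
    each pixel's result mixes the filters' estimates according to the
    per-pixel weights (probabilities) predicted for that pixel."""
    outputs = np.stack(outputs)   # (K, H, W): one slice per filter
    probs = np.stack(probs)       # (K, H, W): weights summing to 1 over K
    return (probs * outputs).sum(axis=0)
```

For two filters weighted 0.3 and 0.7 at every pixel, each output pixel is simply 0.3 times the first filter's estimate plus 0.7 times the second's.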
ETPL
DIP-088 Learning the Spherical Harmonic Features for 3-D Face Recognition
Abstract: In this paper, a competitive method for 3-D face recognition (FR) using spherical harmonic
features (SHF) is proposed. With this solution, 3-D face models are characterized by the energies
contained in spherical harmonics with different frequencies, thereby enabling the capture of both gross
shape and fine surface details of a 3-D facial surface. This is in clear contrast to most 3-D FR techniques which are either holistic or feature based, using local features extracted from distinctive points. First, 3-D
face models are represented in a canonical representation, namely, spherical depth map, by which SHF
can be calculated. Then, considering the predictive contribution of each SHF feature, especially in the presence of facial expression and occlusion, feature selection methods are used to improve the predictive
performance and provide faster and more cost-effective predictors. Experiments have been carried out on
three public 3-D face datasets, SHREC2007, FRGC v2.0, and Bosphorus, with increasing difficulties in terms of facial expression, pose, and occlusion, and which demonstrate the effectiveness of the proposed
method.
ETPL
DIP-089
Video Deblurring Algorithm Using Accurate Blur Kernel Estimation and Residual
Deconvolution Based on a Blurred-Unblurred Frame Pair
Abstract: Blurred frames may happen sparsely in a video sequence acquired by consumer devices such as digital camcorders and digital cameras. In order to avoid visually annoying artifacts due to those blurred
frames, this paper presents a novel motion deblurring algorithm in which a blurred frame can be
reconstructed utilizing the high-resolution information of adjacent unblurred frames. First, a motion-compensated predictor for the blurred frame is derived from its neighboring unblurred frame via specific
motion estimation. Then, an accurate blur kernel, which is difficult to directly obtain from the blurred
frame itself, is computed using both the predictor and the blurred frame. Next, a residual deconvolution is
applied to both of those frames in order to reduce the ringing artifacts inherently caused by conventional deconvolution. The blur kernel estimation and deconvolution processes are iteratively performed for the
deblurred frame. Simulation results show that the proposed algorithm provides superior deblurring results
over conventional deblurring algorithms while preserving details and reducing ringing artifacts.
ETPL
DIP-090
Rank Minimization Code Aperture Design for Spectrally Selective Compressive
Imaging
Abstract: A new code aperture design framework for multiframe code aperture snapshot spectral imaging
(CASSI) system is presented. It aims at the optimization of code aperture sets such that a group of compressive spectral measurements is constructed, each with information from a specific subset of bands.
A matrix representation of CASSI is introduced that permits the optimization of spectrally selective code
aperture sets. Furthermore, each code aperture set forms a matrix such that rank minimization is used to
reduce the number of CASSI shots needed. Conditions for the code apertures are identified such that a restricted isometry property in the CASSI compressive measurements is satisfied with higher probability.
Simulations show higher quality of spectral image reconstruction than that attained by systems using
Hadamard or random code aperture sets.
ETPL
DIP-091
Coaching the Exploration and Exploitation in Active Learning for Interactive Video
Retrieval
Abstract: Conventional active learning approaches for interactive video/image retrieval usually assume
the query distribution is unknown, as it is difficult to estimate with only a limited number of labeled
instances available. Thus, it is easy to put the system in a dilemma of whether to explore the feature space in uncertain areas for a better understanding of the query distribution or to harvest in certain areas for more
relevant instances. In this paper, we propose a novel approach called coached active learning that makes
the query distribution predictable through training and, therefore, avoids the risk of searching on a
completely unknown space. The estimated distribution, which provides a more global view of the feature
space, can be used to schedule not only the timing but also the step sizes of the exploration and the exploitation in a principled way. The results of the experiments on a large-scale data set from TRECVID
2005-2009 validate the efficiency and effectiveness of our approach, which demonstrates an encouraging
performance when facing domain-shift, outperforms eight conventional active learning methods, and shows superiority to six state-of-the-art interactive video retrieval systems.
ETPL
DIP-092 Nonnegative Local Coordinate Factorization for Image Representation
Abstract: Recently, nonnegative matrix factorization (NMF) has become increasingly popular for feature
extraction in computer vision and pattern recognition. NMF seeks two nonnegative matrices whose product can best approximate the original matrix. The nonnegativity constraints lead to sparse parts-based
representations that can be more robust than nonsparse global features. To obtain more accurate control
over the sparseness, in this paper, we propose a novel method called nonnegative local coordinate factorization (NLCF) for feature extraction. NLCF adds a local coordinate constraint into the standard
NMF objective function. Specifically, we require that the learned basis vectors be as close to the original
data points as possible. In this way, each data point can be represented by a linear combination of only a
few nearby basis vectors, which naturally leads to sparse representation. Extensive experimental results suggest that the proposed approach provides a better representation and achieves higher accuracy in
image clustering.
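For reference, the standard NMF baseline that NLCF extends can be sketched with the classic Lee-Seung multiplicative updates; the local-coordinate penalty that pulls basis vectors toward data points is omitted here, so this is only the unconstrained starting point:

```python
import numpy as np

def nmf(X, r, iters=200, seed=0):
    """Plain NMF via multiplicative updates, minimizing ||X - WH||_F
    under nonnegativity. NLCF augments this objective with a
    local-coordinate term, which is what induces its sparse,
    locally anchored representations."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r)) + 1e-3
    H = rng.random((r, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)   # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)   # update basis vectors
    return W, H
```

On exactly low-rank nonnegative data the factorization recovers a close approximation while keeping both factors nonnegative throughout.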
ETPL
DIP-093 Flip-Invariant SIFT for Copy and Object Detection
Abstract: Scale-invariant feature transform (SIFT) feature has been widely accepted as an effective local
keypoint descriptor for its invariance to rotation, scale, and lighting changes in images. However, it is
also well known that SIFT, which is derived from directionally sensitive gradient fields, is not flip
invariant. In real-world applications, flip or flip-like transformations are commonly observed in images due to artificial flipping, opposite capturing viewpoint, or symmetric patterns of objects. This paper
proposes a new descriptor, named flip-invariant SIFT (or F-SIFT), that preserves the original properties
of SIFT while being tolerant to flips. F-SIFT starts by estimating the dominant curl of a local patch and then geometrically normalizes the patch by flipping before the computation of SIFT. We demonstrate the
power of F-SIFT on three tasks: large-scale video copy detection, object recognition, and detection. In
copy detection, a framework, which smartly indexes the flip properties of F-SIFT for rapid filtering and
SIFT, but also leads to a more than 50% savings in computational cost. In object recognition, we
demonstrate the superiority of F-SIFT in dealing with flip transformation by comparing it to seven other
descriptors. In object detection, we further show the ability of F-SIFT in describing symmetric objects. Consistent improvement across different kinds of keypoint detectors is observed for F-SIFT over the
original SIFT.
ETPL
DIP-094
Multiscale Image Fusion Using the Undecimated Wavelet Transform With Spectral
Factorization and Nonorthogonal Filter Banks
Abstract: Multiscale transforms are among the most popular techniques in the field of pixel-level image
fusion. However, the fusion performance of these methods often deteriorates for images derived from
different sensor modalities. In this paper, we demonstrate that for such images, results can be improved
using a novel undecimated wavelet transform (UWT)-based fusion scheme, which splits the image decomposition process into two successive filtering operations using spectral factorization of the analysis
filters. The actual fusion takes place after convolution with the first filter pair. Its significantly smaller
support size leads to the minimization of the unwanted spreading of coefficient values around
overlapping image singularities. This usually complicates the feature selection process and may lead to
the introduction of reconstruction errors in the fused image. Moreover, we will show that the
nonsubsampled nature of the UWT allows the design of nonorthogonal filter banks, which are more robust to artifacts introduced during fusion, additionally improving the obtained results. The combination
of these techniques leads to a fusion framework, which provides clear advantages over traditional
multiscale fusion approaches, independent of the underlying fusion rule, and reduces unwanted side effects such as ringing artifacts in the fused reconstruction.
ETPL
DIP-095 Context-Dependent Logo Matching and Recognition
Abstract: We contribute, through this paper, to the design of a novel variational framework able to match
and recognize multiple instances of multiple reference logos in image archives. Reference logos and test images are seen as constellations of local features (interest points, regions, etc.) and matched by
minimizing an energy function mixing: 1) a fidelity term that measures the quality of feature matching, 2)
a neighborhood criterion that captures feature co-occurrence/geometry, and 3) a regularization term that controls the smoothness of the matching solution. We also introduce a detection/recognition procedure
and study its theoretical consistency. Finally, we show the validity of our method through extensive
experiments on the challenging MICC-Logos dataset. Our method outperforms baseline as well as state-of-the-art matching/recognition procedures by 20%.
ETPL
DIP-096
Efficient Contrast Enhancement Using Adaptive Gamma Correction With Weighting
Distribution
Abstract: This paper proposes an efficient method to modify histograms and enhance contrast in digital
images. Enhancement plays a significant role in digital image processing, computer vision, and pattern recognition. We present an automatic transformation technique that improves the brightness of dimmed
images via the gamma correction and probability distribution of luminance pixels. To enhance video, the
proposed image-enhancement method uses temporal information regarding the differences between each
frame to reduce computational complexity. Experimental results demonstrate that the proposed method produces enhanced images of comparable or higher quality than those produced using previous state-of-
the-art methods.
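As an illustration of the idea described above, the histogram-weighting-plus-gamma mapping can be sketched in NumPy as follows. The function name, the weighting exponent alpha, and the 8-bit range are our own illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def adaptive_gamma_correction(img, alpha=0.5):
    # img: 2-D uint8 luminance array; alpha is an assumed tuning parameter
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    pdf = hist / hist.sum()
    # weighting distribution: compress extreme probabilities
    pdf_w = pdf.max() * (pdf / pdf.max()) ** alpha
    cdf_w = np.cumsum(pdf_w) / pdf_w.sum()
    levels = np.arange(256) / 255.0
    # per-level adaptive gamma drawn from the weighted CDF
    lut = np.round(255.0 * levels ** (1.0 - cdf_w)).astype(np.uint8)
    return lut[img]
```

On a uniform gradient, the mapping keeps black and white fixed while lifting mid-tones, which is the brightening behaviour the abstract describes.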
ETPL
DIP-097 Binary Compressed Imaging
Abstract: Compressed sensing can substantially reduce the number of samples required for conventional
signal acquisition at the expense of an additional reconstruction procedure. It also provides robust
reconstruction when using quantized measurements, including in the one-bit setting. In this paper, our goal is to design a framework for binary compressed sensing that is adapted to images. Accordingly, we
propose an acquisition and reconstruction approach that complies with the high dimensionality of image
data and that provides reconstructions of satisfactory visual quality. Our forward model describes data
acquisition and follows physical principles. It entails a series of random convolutions performed optically followed by sampling and binary thresholding. The binary samples that are obtained can be either
measured or ignored according to predefined functions. Based on these measurements, we then express
our reconstruction problem as the minimization of a compound convex cost that enforces the consistency of the solution with the available binary data under total-variation regularization. Finally, we derive an
efficient reconstruction algorithm relying on convex-optimization principles. We conduct several
experiments on standard images and demonstrate the practical interest of our approach.
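A minimal sketch of the forward model described above (random convolution, binary thresholding, keep/ignore selection), assuming a circular FFT-based convolution and a median threshold for brevity; both choices and the function name are illustrative, not the paper's:

```python
import numpy as np

def binary_measurements(img, kernel, keep_mask):
    # random circular convolution realised in the Fourier domain
    conv = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
    # binary thresholding of the convolved samples (median chosen here)
    bits = (conv > np.median(conv)).astype(np.uint8)
    # keep or ignore samples according to a predefined mask
    return bits[keep_mask]
```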
ETPL
DIP-098
MIMO Nonlinear Ultrasonic Tomography by Propagation and Backpropagation
Method
Abstract: This paper develops a fast ultrasonic tomographic imaging method in a multiple-input multiple-
output (MIMO) configuration using the propagation and backpropagation (PBP) method. By this method,
ultrasonic excitation signals from multiple sources are transmitted simultaneously to probe the objects
immersed in the medium. The scattering signals are recorded by multiple receivers. Utilizing the
nonlinear ultrasonic wave propagation equation and the received time domain scattered signals, the objects are to be reconstructed iteratively in three steps. First, the propagation step calculates the
predicted acoustic potential data at the receivers using an initial guess. Second, the difference signal
between the predicted value and the measured data is calculated. Third, the backpropagation step computes updated acoustical potential data by backpropagating the difference signal to the same medium
computationally. Unlike the conventional PBP method for tomographic imaging where each source takes
turns to excite the acoustical field until all the sources are used, the developed MIMO-PBP method
achieves faster image reconstruction by utilizing multiple source simultaneous excitation. Furthermore, we develop an orthogonal waveform signaling method using a waveform delay scheme to reduce the
impact of speckle patterns in the reconstructed images. By numerical experiments we demonstrate that
the proposed MIMO-PBP tomographic imaging method results in faster convergence and achieves superior imaging quality.
ETPL
DIP-099
Vector Extension of Monogenic Wavelets for Geometric Representation of Color
Images
Abstract: Monogenic wavelets offer a geometric representation of grayscale images through an AM-FM
model allowing invariance of coefficients to translations and rotations. The underlying concept of local phase includes a fine contour analysis into a coherent unified framework. Starting from a link with
structure tensors, we propose a nontrivial extension of the monogenic framework to vector-valued signals
to carry out a nonmarginal color monogenic wavelet transform. We also give a practical study of this new wavelet transform in the contexts of sparse representations and invariant analysis, which helps to
understand the physical interpretation of coefficients and validates the interest of our theoretical
construction.
ETPL
DIP-100 Myocardial Motion Estimation From Medical Images Using the Monogenic Signal
Abstract: We present a method for the analysis of heart motion from medical images. The algorithm
exploits monogenic signal theory, recently introduced as an N-dimensional generalization of the analytic
signal. The displacement is computed locally by assuming the conservation of the monogenic phase over time. A local affine displacement model is considered to account for typical heart motions as
contraction/expansion and shear. A coarse-to-fine B-spline scheme allows a robust and effective
computation of the model's parameters, and a pyramidal refinement scheme helps to handle large motions. Robustness against noise is increased by replacing the standard point-wise computation of the
monogenic orientation with a robust least-squares orientation estimate. Given its general formulation, the
algorithm is well suited for images from different modalities, in particular for those cases where time-variant changes of local intensity invalidate the standard brightness constancy assumption. This paper evaluates the method's feasibility on two emblematic cases: cardiac tagged magnetic resonance and
cardiac ultrasound. In order to quantify the performance of the proposed method, we made use of realistic
synthetic sequences from both modalities for which the benchmark motion is known. A comparison is presented with state-of-the-art methods for cardiac motion analysis. On the data considered, these
conventional approaches are outperformed by the proposed algorithm. A recent global optical-flow
estimation algorithm based on the monogenic curvature tensor is also considered in the comparison. With respect to the latter, the proposed framework provides, along with higher accuracy, superior robustness to
noise and a considerably shorter computation time.
ETPL
DIP-101
Revisiting the Relationship Between Adaptive Smoothing and Anisotropic Diffusion
With Modified Filters
Abstract: Anisotropic diffusion has been known to be closely related to adaptive smoothing and
discretized in a similar manner. This paper revisits a fundamental relationship between two approaches. It is shown that adaptive smoothing and anisotropic diffusion have different theoretical backgrounds by
exploring their characteristics with the perspective of normalization, evolution step size, and energy flow.
Based on this principle, adaptive smoothing is derived from a second order partial differential equation (PDE), not a conventional anisotropic diffusion, via the coupling of Fick's law with a generalized
continuity equation where a “source” or “sink” exists, which has not been extensively exploited. We
show that the source or sink is closely related to the asymmetry of energy flow as well as the normalization term of adaptive smoothing. It enables us to analyze behaviors of adaptive smoothing, such
as the maximum principle and stability with a perspective of a PDE. Ultimately, this relationship provides
new insights into application-specific filtering algorithm design. By modeling the source or sink in the
PDE, we introduce two specific diffusion filters, the robust anisotropic diffusion and the robust coherence enhancing diffusion, as novel instantiations which are more robust against the outliers than the
conventional filters.
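For reference, the conventional anisotropic diffusion scheme that the paper contrasts with adaptive smoothing (no source/sink term) can be sketched as below; the step size, conductance parameter, and the periodic borders via np.roll are simplifying assumptions of ours:

```python
import numpy as np

def perona_malik(img, n_iter=10, kappa=20.0, dt=0.2):
    # conventional anisotropic diffusion; periodic borders for brevity
    u = img.astype(float)
    for _ in range(n_iter):
        dn = np.roll(u, -1, axis=0) - u   # differences toward 4 neighbours
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        g = lambda d: np.exp(-(d / kappa) ** 2)  # edge-stopping conductance
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

Unlike the source/sink formulation studied in the paper, this scheme conserves total intensity: each flux term cancels against its neighbour's opposing term, which is one way to see the "symmetric energy flow" of plain diffusion.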
ETPL
DIP-102
A Weighted Dictionary Learning Model for Denoising Images Corrupted by Mixed
Noise
Abstract: This paper proposes a general weighted l2-l0 norms energy minimization model to remove mixed noise, such as a Gaussian-Gaussian mixture, impulse noise, and Gaussian-impulse noise, from images. The approach is built upon a maximum likelihood estimation framework and sparse representations
over a trained dictionary. Rather than optimizing the likelihood functional derived from a mixture distribution, we present a new weighting data fidelity function, which has the same minimizer as the
original likelihood functional but is much easier to optimize. The weighting function in the model can be
determined by the algorithm itself, and it plays a role of noise detection in terms of the different estimated noise parameters. By incorporating the sparse regularization of small image patches, the proposed method
can efficiently remove a variety of mixed or single noise while preserving the image textures well. In
addition, a modified K-SVD algorithm is designed to address the weighted rank-one approximation. The experimental results demonstrate its better performance compared with several existing methods.
ETPL
DIP-103 Comparative Study of Fixation Density Maps
Abstract: Fixation density maps (FDM) created from eye tracking experiments are widely used in image
processing applications. The FDM are assumed to be reliable ground truths of human visual attention and as such, one expects a high similarity between FDM created in different laboratories. So far, no studies
have analyzed the degree of similarity between FDM from independent laboratories and the related
impact on the applications. In this paper, we perform a thorough comparison of FDM from three independently conducted eye tracking experiments. We focus on the effect of presentation time and
image content and evaluate the impact of the FDM differences on three applications: visual saliency
modeling, image quality assessment, and image retargeting. It is shown that the FDM are very similar and
that their impact on the applications is low. The individual experiment comparisons, however, are found to be significantly different, showing that inter-laboratory differences strongly depend on the
experimental conditions of the laboratories. The FDM are publicly available to the research community.
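A common way to build and compare fixation density maps, sketched here under our own assumptions (a Gaussian placed at each fixation, with an arbitrary sigma, and Pearson correlation as the similarity score; the paper's experiments may use different choices):

```python
import numpy as np

def fixation_density_map(fixations, shape, sigma=8.0):
    # accumulate a Gaussian per fixation point, then normalize to sum 1
    yy, xx = np.indices(shape)
    fdm = np.zeros(shape)
    for y, x in fixations:
        fdm += np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
    return fdm / fdm.sum()

def fdm_correlation(a, b):
    # Pearson correlation coefficient between two density maps
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```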
ETPL
DIP-104 Efficient Method for Content Reconstruction With Self-Embedding
Abstract: This paper presents a new model of the content reconstruction problem in self-embedding
systems, based on an erasure communication channel. We explain why such a model is a good fit for this
problem, and how it can be practically implemented with the use of digital fountain codes. The proposed
method is based on an alternative approach to spreading the reference information over the whole image,
which has recently been shown to be of critical importance in the application at hand. Our paper presents
a theoretical analysis of the inherent restoration trade-offs. We analytically derive formulas for the reconstruction success bounds, and validate them experimentally with Monte Carlo simulations and a
reference image authentication system. We perform an exhaustive reconstruction quality assessment,
where the presented reference scheme is compared to five state-of-the-art alternatives in a common evaluation scenario. Our paper leads to important insights on how self-embedding schemes should be
constructed to achieve optimal performance. The reference authentication system designed according to
the presented principles allows for high-quality reconstruction, regardless of the amount of the tampered
content. The average reconstruction quality, measured on 10,000 natural images, is 37 dB and is achievable even when 50% of the image area becomes tampered.
ETPL
DIP-105
Modeling IrisCode and Its Variants as Convex Polyhedral Cones and Its Security
Implications
Abstract: IrisCode, developed by Daugman in 1993, is the most influential iris recognition algorithm. A thorough understanding of IrisCode is essential, because over 100 million persons have been enrolled by
this algorithm and many biometric personal identification and template protection methods have been
developed based on IrisCode. This paper indicates that a template produced by IrisCode or its variants is
a convex polyhedral cone in a hyperspace. Its central ray, being a rough representation of the original biometric signal, can be computed by a simple algorithm, which can often be implemented in one Matlab
command line. The central ray is an expected ray and also an optimal ray of an objective function on a
group of distributions. This algorithm is derived from geometric properties of a convex polyhedral cone but does not rely on any prior knowledge (e.g., iris images). The experimental results show that biometric
templates, including iris and palmprint templates, produced by different recognition methods can be
matched through the central rays in their convex polyhedral cones and that templates protected by a method extended from IrisCode can be broken into. These experimental results indicate that, without a
thorough security analysis, convex polyhedral cone templates cannot be assumed secure. Additionally,
the simplicity of the algorithm implies that even junior hackers without knowledge of advanced image
processing and biometric databases can still break into protected templates and reveal relationships among templates produced by different recognition methods.
ETPL
DIP-106 Correspondence Map-Aided Neighbor Embedding for Image Intra Prediction
Abstract: This paper describes new image prediction methods based on neighbor embedding (NE) techniques. Neighbor embedding methods are used here to approximate an input block (the block to be
predicted) in the image as a linear combination of K nearest neighbors. However, in order for the decoder
to proceed similarly, the K nearest neighbors are found by computing distances between the known pixels
in a causal neighborhood (called template) of the input block and the co-located pixels in candidate patches taken from a causal window. Similarly, the weights used for the linear approximation are
computed in order to best approximate the template pixels. Although efficient, these methods suffer from
limitations when the template and the block to be predicted are not correlated, e.g., in non-homogeneous texture areas. To cope with these limitations, this paper introduces new image prediction methods based
on NE techniques in which the K-NN search is done in two steps and aided, at the decoder, by a block
correspondence map, hence the name map-aided neighbor embedding (MANE) method. Another optimized variant of this approach, called oMANE method, is also studied. In these methods, several
alternatives have also been proposed for the K-NN search. The resulting prediction methods are shown to
bring significant rate-distortion performance improvements when compared to H.264 Intra prediction
modes (up to 44.75% rate saving at low bit rates).
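The template-based K-NN search that these methods build on can be sketched as follows, reduced to K = 1 and a single causal search band for brevity; block size, template width, and function names are our own illustrative choices:

```python
import numpy as np

def template_match_predict(image, by, bx, bs=4, tw=2):
    # find the causal candidate whose L-shaped template of known pixels
    # best matches the input block's template, then copy its block
    def template(y, x):
        top = image[y - tw:y, x - tw:x + bs].ravel()
        left = image[y:y + bs, x - tw:x].ravel()
        return np.concatenate([top, left])
    target = template(by, bx)
    best, best_err = None, np.inf
    for y in range(tw, by - bs + 1):  # candidates strictly above the block
        for x in range(tw, image.shape[1] - bs - tw + 1):
            err = np.sum((template(y, x) - target) ** 2)
            if err < best_err:
                best_err, best = err, (y, x)
    y, x = best
    return image[y:y + bs, x:x + bs]
```

On a vertically repetitive image the causal template finds an exact match, so the predicted block equals the true block, mimicking the lossless-prediction case.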
ETPL
DIP-107
Estimation-Theoretic Approach to Delayed Decoding of Predictively Encoded Video
Sequences
Abstract: Current video coders employ predictive coding with motion compensation to exploit temporal
redundancies in the signal. In particular, blocks along a motion trajectory are modeled as an auto-
regressive (AR) process, and it is generally assumed that the prediction errors are temporally independent and approximate the innovations of this process. Thus, zero-delay encoding and decoding is considered
efficient. This paper is premised on the largely ignored fact that these prediction errors are, in fact,
temporally dependent due to quantization effects in the prediction loop. It presents an estimation-theoretic
delayed decoding scheme, which exploits information from future frames to improve the reconstruction quality of the current frame. In contrast to the standard decoder that reproduces every block
instantaneously once the corresponding quantization indices of residues are available, the proposed
delayed decoder efficiently combines all accessible (including any future) information in an appropriately derived probability density function, to obtain the optimal delayed reconstruction per transform
coefficient. Experiments demonstrate significant gains over the standard decoder. Requisite information
about the source AR model is estimated in a spatio-temporally adaptive manner from a bit-stream conforming to the H.264/AVC standard, i.e., no side information needs to be sent to the decoder in order
to employ the proposed approach, thereby retaining compatibility with the standard syntax and existing encoders.
ETPL
DIP-108 Correction of Axial and Lateral Chromatic Aberration With False Color Filtering
Abstract: In this paper, we propose a chromatic aberration (CA) correction algorithm based on a false
color filtering technique. In general, CA produces color distortions called color fringes near the
contrasting edges of captured images, and these distortions cause false color artifacts. In the proposed method, a false color filtering technique is used to filter out the false color components from the chroma-
signals of the input image. The filtering process is performed with the adaptive weights obtained from
both the gradient and color differences, and the weights are designed to reduce the various types of color
fringes regardless of the colors of the artifacts. Moreover, as preprocessors of the filtering process, a transient improvement (TI) technique is applied to enhance the slow transitions of the red and blue
channels that are blurred by the CA. The TI process improves the filtering performance by narrowing the
false color regions before the filtering process when severe color fringes (typically purple fringes) occur widely. Last, the CA-corrected chroma-signal is combined with the TI chroma-signal to avoid incorrect
color adjustment. The experimental results show that the proposed method substantially reduces the CA
artifacts and provides natural-looking replacement colors, while avoiding incorrect color adjustment.
ETPL
DIP-109 New Class Tiling Design for Dot-Diffused Halftoning
Abstract: In this paper, a new class-tiling design for dot diffusion, together with an optimized class matrix and diffusion matrix, is proposed. The resulting halftones are nearly free of the periodic artifacts produced by former schemes. Previously, the class matrix of dot diffusion was duplicated and orthogonally tiled to cover the entire image before thresholding and quantized-error diffusion, which introduced periodic artifacts. We observe that this artifact can be removed by manipulating the class tiling through rotation, transposition, and alternating shifts of the class matrices. As documented in the experimental results, the proposed dot diffusion is compared with former parallel halftoning methods in terms of image quality, processing efficiency, periodicity, and memory consumption, and it proves a very competitive candidate for the printing/display market.
ETPL
DIP-110 W-Tree Indexing for Fast Visual Word Generation
Abstract: The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation,
i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently,
structures based on multibranch trees and forests have been adopted to reduce the time cost. However,
these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we can significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with
certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding
co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a
KD-tree and a K-means tree), in order to re-direct the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with
the fast library for approximate nearest neighbors and the random KD-trees on the Oxford data set.
Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
ETPL
DIP-111
Efficient Improvements on the BDND Filtering Algorithm for the Removal of High-
Density Impulse Noise
Abstract: Switching median filters are known to outperform standard median filters in the removal of
impulse noise due to their capability of filtering candidate noisy pixels and leaving other pixels intact.
The boundary discriminative noise detection (BDND) is one powerful example in this class of filters. However, there are some issues related to the filtering step in the BDND algorithm that may degrade its
performance. In this paper, we propose two modifications to the filtering step of the BDND algorithm to
address these issues. Experimental evaluation shows the effectiveness of the proposed modifications in
producing sharper images than the BDND algorithm.
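The switching behaviour described above, i.e., filtering only detected noise candidates and leaving other pixels intact, can be sketched as follows. The extreme-value detector used here is a crude stand-in for BDND's boundary-discriminative detection, not the paper's method:

```python
import numpy as np

def switching_median(img, lo=0, hi=255, win=3):
    # replace only pixels flagged as impulses (extreme values) by their
    # window median; all other pixels pass through untouched
    pad = win // 2
    padded = np.pad(img, pad, mode='edge')
    out = img.copy()
    for y, x in zip(*np.nonzero((img <= lo) | (img >= hi))):
        out[y, x] = np.median(padded[y:y + win, x:x + win])
    return out
```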
ETPL
DIP-112 Novel Approaches to the Parametric Cubic-Spline Interpolation
Abstract: The cubic-spline interpolation (CSI) scheme can be utilized to obtain a better quality
reconstructed image. It is based on the least-squares method with cubic convolution interpolation (CCI) function. Within the parametric CSI scheme, it is difficult to determine the optimal parameter for various
target images. In this paper, a novel method involving the concept of opportunity costs is proposed to
identify the most suitable parameter for the CCI function needed in the CSI scheme. It is shown that such
an optimal four-point CCI function in conjunction with the least-squares method can achieve a better performance with the same arithmetic operations in comparison with the existing CSI algorithm. In
addition, experimental results show that the optimal six-point CSI scheme together with cross-zonal filter
is superior in performance to the optimal four-point CSI scheme without increasing the computational complexity.
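The four-point cubic convolution interpolation kernel at the heart of the scheme is standard (Keys' kernel); the free parameter a below is exactly what the paper's opportunity-cost criterion seeks to set optimally. The default a = -0.5 is the common textbook choice, not the paper's optimum:

```python
import numpy as np

def cci_kernel(s, a=-0.5):
    # piecewise-cubic convolution kernel with support [-2, 2]
    s = np.abs(np.asarray(s, dtype=float))
    out = np.zeros_like(s)
    m1 = s <= 1
    m2 = (s > 1) & (s < 2)
    out[m1] = (a + 2) * s[m1] ** 3 - (a + 3) * s[m1] ** 2 + 1
    out[m2] = a * (s[m2] ** 3 - 5 * s[m2] ** 2 + 8 * s[m2] - 4)
    return out
```

For any offset, the four surrounding sample weights sum to one, so flat regions are reproduced exactly regardless of a.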
ETPL
DIP-113
Generation of All-in-Focus Images by Noise-Robust Selective Fusion of Limited Depth-
of-Field Images
Abstract: The limited depth-of-field of some cameras prevents them from capturing perfectly focused images when the imaged scene covers a large distance range. In order to compensate for this problem,
image fusion has been exploited for combining images captured with different camera settings, thus
yielding a higher quality all-in-focus image. Since most current approaches for image fusion rely on
maximizing the spatial frequency of the composed image, the fusion process is sensitive to noise. In this paper, a new algorithm for computing the all-in-focus image from a sequence of images captured with a
low depth-of-field camera is presented. The proposed approach adaptively fuses the different frames of
the focus sequence in order to reduce noise while preserving image features. The algorithm consists of three stages: 1) focus measure; 2) selectivity measure; 3) and image fusion. An extensive set of
experimental tests has been carried out in order to compare the proposed algorithm with state-of-the-art
all-in-focus methods using both synthetic and real sequences. The obtained results show the advantages
of the proposed scheme even for high levels of noise.
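The spatial-frequency-maximizing fusion baseline that the paper improves upon can be sketched as a per-pixel pick of the sharpest frame; the Laplacian-energy focus measure and the naive selection (no selectivity measure, hence the noise sensitivity the paper addresses) are our simplifications:

```python
import numpy as np

def laplacian_energy(img):
    # simple focus measure: squared discrete Laplacian (periodic borders)
    u = img.astype(float)
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    return lap ** 2

def all_in_focus(stack):
    # per-pixel: take the value from the frame with the highest focus
    focus = np.stack([laplacian_energy(f) for f in stack])
    best = np.argmax(focus, axis=0)
    rows, cols = np.indices(best.shape)
    return np.stack(stack)[best, rows, cols]
```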
ETPL
DIP-114
Missing Texture Reconstruction Method Based on Error Reduction Algorithm Using
Fourier Transform Magnitude Estimation Scheme
Abstract: A missing texture reconstruction method based on an error reduction (ER) algorithm, including
a novel estimation scheme of Fourier transform magnitudes is presented in this brief. In our method,
Fourier transform magnitude is estimated for a target patch including missing areas, and the missing intensities are estimated by retrieving its phase based on the ER algorithm. Specifically, by monitoring
errors converged in the ER algorithm, known patches whose Fourier transform magnitudes are similar to that of the target patch are selected from the target image. Then, the Fourier transform magnitude of the target patch is estimated from those of the selected known patches and their
corresponding errors. Consequently, by using the ER algorithm, we can estimate both the Fourier
transform magnitudes and phases to reconstruct the missing areas.
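The core error-reduction iteration (alternating a Fourier-magnitude constraint with a spatial-domain constraint on the known pixels) can be sketched as below; the random initialization and iteration count are our own assumptions:

```python
import numpy as np

def error_reduction(magnitude, known, known_values, n_iter=100, seed=0):
    # ER phase retrieval: project onto the magnitude constraint in the
    # Fourier domain, then re-impose the known pixels in the image domain
    x = np.random.default_rng(seed).random(magnitude.shape)
    for _ in range(n_iter):
        X = np.fft.fft2(x)
        X = magnitude * np.exp(1j * np.angle(X))   # keep phase, fix magnitude
        x = np.real(np.fft.ifft2(X))
        x[known] = known_values[known]             # spatial constraint
    return x
```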
ETPL
DIP-115 A Robust Fuzzy Local Information C-Means Clustering Algorithm
Abstract: In a recent paper, Krinidis and Chatzis proposed a variation of the fuzzy c-means algorithm for image clustering, in which local spatial and gray-level information are incorporated in a fuzzy way through an energy function, and local minimizers of this energy function are used to obtain the fuzzy membership of each pixel and the cluster centers. In this paper, it is shown that the iterative updates of Krinidis and Chatzis for the fuzzy memberships and cluster centers are not exclusively solutions for true local minimizers of their designed energy function. Thus, the iterations of Krinidis and Chatzis may fail to converge to the correct local minima, not because of being trapped in local minima, but because of the design of the energy function itself.
ETPL
DIP-116
Nonlinearity Detection in Hyperspectral Images Using a Polynomial Post-Nonlinear
Mixing Model
Abstract: This paper studies a nonlinear mixing model for hyperspectral image unmixing and nonlinearity detection. The proposed model assumes that the pixel reflectances are nonlinear functions of pure spectral
components contaminated by an additive white Gaussian noise. These nonlinear functions are
approximated by polynomials leading to a polynomial post-nonlinear mixing model. We have shown in a previous paper that the parameters involved in the resulting model can be estimated using least squares
methods. A generalized likelihood ratio test based on the estimator of the nonlinearity parameter is
proposed to decide whether a pixel of the image results from the commonly used linear mixing model or from a more general nonlinear mixing model. To compute the test statistic associated with the
nonlinearity detection, we propose to approximate the variance of the estimated nonlinearity parameter by its constrained Cramér-Rao bound. The performance of the detection strategy is evaluated via simulations
conducted on synthetic and real data. More precisely, synthetic data have been generated according to the standard linear mixing model and three nonlinear models from the literature. The real data investigated in
this study are extracted from the Cuprite image, which shows that some minerals seem to be nonlinearly
mixed in this image. Finally, it is interesting to note that the estimated abundance maps obtained with the post-nonlinear mixing model are in good agreement with results obtained in previous studies.
ETPL
DIP-117 Wavelet Bayesian Network Image Denoising
Abstract: From the perspective of the Bayesian approach, the denoising problem is essentially a prior
probability modeling and estimation task. In this paper, we propose an approach that exploits a hidden Bayesian network, constructed from wavelet coefficients, to model the prior probability of the original
image. Then, we use the belief propagation (BP) algorithm, which estimates a coefficient based on all the
coefficients of an image, as the maximum a posteriori (MAP) estimator to derive the denoised wavelet
coefficients. We show that if the network is a spanning tree, the standard BP algorithm can perform MAP
estimation efficiently. Our experiment results demonstrate that, in terms of the peak-signal-to-noise-ratio and perceptual quality, the proposed approach outperforms state-of-the-art algorithms on several images,
particularly in the textured regions, with various amounts of white Gaussian noise.
ETPL
DIP-118 Acceleration of the Shiftable Algorithm for Bilateral Filtering and Nonlocal
Means
Abstract: A direct implementation of the bilateral filter requires O(σs²) operations per pixel, where σs is
the (effective) width of the spatial kernel. A fast implementation of the bilateral filter that required O(1) operations per pixel with respect to σs was recently proposed. This was done by using trigonometric
functions for the range kernel of the bilateral filter, and by exploiting their so-called shiftability property.
In particular, a fast implementation of the Gaussian bilateral filter was realized by approximating the
Gaussian range kernel using raised cosines. Later, it was demonstrated that this idea could be extended to a larger class of filters, including the popular non-local means filter. As already observed, a flip side of
this approach was that the run time depended on the width σr of the range kernel. For an image with
dynamic range [0,T], the run time scaled as O(T²/σr²) with σr. This made it difficult to implement narrow
range kernels, particularly for images with large dynamic range. In this paper, we discuss this problem,
and propose some simple steps to accelerate the implementation, in general, and for small σr in particular.
We provide some experimental results to demonstrate the acceleration that is achieved using these modifications.
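For contrast with the shiftable approximation, the direct bilateral filter below makes the O(σs²)-per-pixel cost explicit: every output pixel averages a window of radius proportional to σs. Parameter defaults are illustrative:

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=30.0):
    # direct (slow) bilateral filter: spatial Gaussian x range Gaussian
    u = img.astype(float)
    r = int(3 * sigma_s)                  # window radius ~ 3 sigma_s
    out = np.zeros_like(u)
    h, w = u.shape
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            patch = u[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            ws = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            wr = np.exp(-((patch - u[y, x]) ** 2) / (2 * sigma_r ** 2))
            k = ws * wr
            out[y, x] = (k * patch).sum() / k.sum()
    return out
```

A narrow range kernel (small σr) preserves step edges almost exactly, which is precisely the regime where the O(T²/σr²) cost of the shiftable method blows up and the paper's acceleration matters.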
ETPL
DIP-119
Determining the Intrinsic Dimension of a Hyperspectral Image Using Random Matrix
Theory
Abstract: Determining the intrinsic dimension of a hyperspectral image is an important step in the spectral
unmixing process and under- or overestimation of this number may lead to incorrect unmixing in
unsupervised methods. In this paper, we discuss a new method for determining the intrinsic dimension using recent advances in random matrix theory. This method is entirely unsupervised, free from any user-
determined parameters and allows spectrally correlated noise in the data. Robustness tests are run on
synthetic data, to determine how the results were affected by noise levels, noise variability, noise
approximation, and spectral characteristics of the end-members. Success rates are determined for many different synthetic images, and the method is tested on two pairs of real images, namely a Cuprite scene
taken from Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) and SpecTIR sensors, and a
Lunar Lakes scene taken from AVIRIS and Hyperion, with good results.
ETPL
DIP-120 Learning Smooth Pattern Transformation Manifolds
Abstract: Manifold models provide low-dimensional representations that are useful for processing and
analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically
transformed signals. To construct a manifold, we build a representative pattern whose transformations
accurately fit various input images. We examine two objectives of the manifold-building problem,
namely, approximation and classification. For the approximation problem, we propose a greedy method that constructs a representative pattern by selecting analytic atoms from a continuous dictionary manifold.
We present a DC (difference-of-convex) optimization scheme that is applicable to a wide range of transformation and dictionary
models, and demonstrate its application to the transformation manifolds generated by the rotation, translation, and anisotropic scaling of a reference pattern. Then, we generalize this approach to a setting
with multiple transformation manifolds, where each manifold represents a different class of signals. We
present an iterative multiple-manifold-building algorithm such that the classification accuracy is
promoted in the learning of the representative patterns. The experimental results suggest that the proposed
methods yield high accuracy in the approximation and classification of data compared with some reference methods, while the invariance to geometric transformations is achieved because of the
transformation manifold model.
ETPL
DIP-121
Modifying JPEG Binary Arithmetic Codec for Exploiting Inter/Intra-Block and DCT
Coefficient Sign Redundancies
Abstract: This article presents four modifications to the JPEG arithmetic coding (JAC) algorithm, a topic that has not been well studied before. It then compares the compression performance of the modified JPEG with JPEG
XR, the latest block-based image coding standard. We first show that the bulk of inter/intra-block
redundancy, caused by JPEG's block-based approach, can be captured by applying efficient prediction coding. We propose the following modifications to JAC to take advantage of our
prediction approach. 1) We code a totally different DC difference. 2) JAC tests a DCT coefficient by
considering its bits in the increasing order of significance for coding the most significant bit position. This causes considerable redundancy because JAC always begins with the zeroth bit. We modify this coding order
and propose alterations to the JPEG coding procedures. 3) We predict the sign of significant DCT
coefficients, a problem not previously addressed from the perspective of the JPEG decoder. 4) We reduce
the number of binary tests that JAC codes to mark end-of-block. We provide experimental results for two sets of eight-bit gray images. The first set consists of nine classical test images mostly of size 512
× 512 pixels. The second set consists of 13 images of size
2000 × 3000 pixels or more. Our modifications to JAC achieve an extraordinary amount of code reduction without introducing any loss. More specifically, when we quantize the images
using the default quantizers, our modifications reduce the total JAC code size of the images of these two
sets by about 8.9 and 10.6%, and the JPEG Huffman code size by about 16.3 and 23.4%, respectively, on average. Gains are even higher for coarsely quantized images. Finally, we compare the modified JAC
with two settings of JPEG XR, one with no block overlapping and the other with the default transform
(we denote them by JXR0 and JXR1, respectively). Our results show that, for the finest-quality rate
image coding, the modified JAC compresses the large-set images by about 5.8% more than JXR1 and by 6.7% more than JXR0, on average. We provide some rate-distortion plots on lossy coding, which
show that the modified JAC distinctly outperforms JXR0, but JXR1 beats it by a similar margin.
ETPL
DIP-122 Motion Estimation Without Integer-Pel Search
Abstract: The typical motion estimation (ME) consists of three main steps, including spatial-temporal
prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best
matched integer-pel position within a search window, is considered to be crucial for video encoding. It
occupies over 50% of the overall encoding time (when adopting the full search scheme) for software encoders, and introduces remarkable area cost, memory traffic, and power consumption to hardware
encoders. In this paper, we find that video sequences (especially high-resolution videos) can often be
encoded effectively and efficiently even without integer-pel search. This counter-intuitive phenomenon arises not only because spatial-temporal prediction and fractional-pel search are accurate enough for the
ME of many blocks. In fact, we observe that when the predicted motion vector is biased from the optimal
motion vector (mainly for boundary blocks of irregularly moving objects), it is also hard for integer-pel search to reduce the final rate-distortion cost: the deviation of reference position could be alleviated with
the fractional-pel interpolation and rate-distortion optimization techniques (e.g., adaptive macroblock
mode). Considering the decreasing proportion of boundary blocks caused by the increasing resolution of
videos, integer-pel search may be rather cost-ineffective in the era of high-resolution. Experimental results on 36 typical sequences of different resolutions encoded with x264, which is a widely-used video
encoder, comply with our analysis well. For 1080p sequences, removing the integer-pel search saves
57.9% of the overall H.264 encoding time on average (compared to the original x264 with full integer-pel
search using default parameters), while the resultant performance loss is negligible: the bit-rate increases by only 0.18% and the peak signal-to-noise ratio decreases by only 0.01 dB per frame on average.
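For context, the integer-pel search that the paper proposes to drop is, in its full-search form, a brute-force SAD minimisation over a search window. The sketch below is a generic illustration, not x264's implementation:

```python
import numpy as np

def integer_pel_search(cur_block, ref_frame, pred_pos, search_range=8):
    """Full-search integer-pel motion estimation around a predicted position.

    Returns the (y, x) reference position minimising the sum of absolute
    differences (SAD) -- the costly step the paper argues can often be
    skipped in favour of prediction plus fractional-pel refinement.
    """
    bh, bw = cur_block.shape
    H, W = ref_frame.shape
    py, px = pred_pos
    best, best_pos = np.inf, pred_pos
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = py + dy, px + dx
            if y < 0 or x < 0 or y + bh > H or x + bw > W:
                continue  # candidate falls outside the reference frame
            sad = np.abs(ref_frame[y:y + bh, x:x + bw] - cur_block).sum()
            if sad < best:
                best, best_pos = sad, (y, x)
    return best_pos, best
```

The nested loop over (2R+1)² candidates per block is exactly where the >50% encoding-time figure quoted above comes from when the full-search scheme is used.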
ETPL
DIP-123 A Protocol for Evaluating Video Trackers Under Real-World Conditions
Abstract: The absence of a commonly adopted performance evaluation framework is hampering advances in the design of effective video trackers. In this paper, we present a single-score evaluation measure and a
protocol to objectively compare trackers. The proposed measure evaluates tracking accuracy and failure,
and combines them for both summative and formative performance assessment. The proposed protocol is composed of a set of trials that evaluate the robustness of trackers on a range of test scenarios
representing several real-world conditions. The protocol is validated on a set of sequences with a
diversity of targets (head, vehicle and person) and challenges (occlusions, background clutter, pose changes and scale changes) using six state-of-the-art trackers, highlighting their strengths and weaknesses
on more than 187000 frames. The software implementing the protocol and the evaluation results are made
available online and new results can be included, thus facilitating the comparison of trackers.
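As an illustrative aside, single-score protocols of this kind typically build on a per-frame overlap score combined with a failure count. The sketch below uses the common intersection-over-union overlap as a stand-in; the paper's exact accuracy and failure definitions may differ:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes -- a standard
    per-frame tracking accuracy score."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def sequence_score(gt_boxes, trk_boxes, fail_thresh=0.0):
    """Combine per-frame accuracy with a failure count, in the spirit of
    a single summative score over a sequence (illustrative combination)."""
    scores = [iou(g, t) for g, t in zip(gt_boxes, trk_boxes)]
    failures = sum(s <= fail_thresh for s in scores)
    accuracy = sum(scores) / len(scores)
    return accuracy, failures
```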
ETPL
DIP-124 Blur and Illumination Robust Face Recognition via Set-Theoretic Characterization
Abstract: We address the problem of unconstrained face recognition from remotely acquired images. The
main factors that make this problem challenging are image degradation due to blur, and appearance
variations due to illumination and pose. In this paper, we address the problems of blur and illumination. We show that the set of all images obtained by blurring a given image forms a convex set. Based on this
set-theoretic characterization, we propose a blur-robust algorithm whose main step involves solving
simple convex optimization problems. We do not assume any parametric form for the blur kernels; however, if this information is available, it can be easily incorporated into our algorithm. Furthermore, using the low-dimensional model for illumination variations, we show that the set of all images obtained
from a face image by blurring it and by changing the illumination conditions forms a bi-convex set.
Based on this characterization, we propose a blur and illumination-robust algorithm. Our experiments on a challenging real dataset obtained in uncontrolled settings illustrate the importance of jointly modeling
blur and illumination.
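The convexity claim has a simple linear-algebra core: convolution is linear in the blur kernel, so a convex combination of two blurred images equals the image blurred with the convex combination of the kernels. A small numerical check of this property (illustrative only, not the paper's recognition algorithm):

```python
import numpy as np

def filter2d(img, k):
    """Naive 'same'-size 2-D filtering (cross-correlation) with zero padding."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * k).sum()
    return out
```

Because filtering is linear in the kernel, a * filter2d(img, k1) + (1 - a) * filter2d(img, k2) equals filter2d(img, a * k1 + (1 - a) * k2) for any a in [0, 1], which is exactly why the set of blurred images is convex.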
ETPL
DIP-125 Improved Bounds for Subband-Adaptive Iterative Shrinkage/Thresholding Algorithms
Abstract: This paper presents new methods for computing the step sizes of the subband-adaptive iterative
shrinkage-thresholding algorithms proposed by Bayram & Selesnick and Vonesch & Unser. The method
yields tighter wavelet-domain bounds of the system matrix, thus leading to improved convergence speeds. It is directly applicable to non-redundant wavelet bases, and we also adapt it for cases of redundant
frames. It turns out that the simplest and most intuitive setting for the step sizes that ignores subband
aliasing is often satisfactory in practice. We show that our methods can be used to advantage with reweighted least squares penalty functions as well as L1 penalties. We emphasize that the algorithms
presented here are suitable for performing inverse filtering on very large datasets, including 3D data,
since inversions are applied only to diagonal matrices and fast transforms are used to compute all matrix-vector products.
ETPL
DIP-126
Sparse Representation Based Image Interpolation With Nonlocal Autoregressive
Modeling
Abstract: Sparse representation is proven to be a promising approach to image super-resolution, where the
low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR)
counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is
directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models
(SRM) become less effective, because the data fidelity term fails to constrain the image local structures.
In natural images, fortunately, many nonlocal similar patches to a given patch could provide nonlocal constraint to the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM
for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and
taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less
coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image
interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing
artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM.
ETPL
DIP-127
View-Based Discriminative Probabilistic Modeling for 3D Object Retrieval and
Recognition
Abstract: In view-based 3D object retrieval and recognition, each object is described by multiple views. A
central problem is how to estimate the distance between two objects. Most conventional methods integrate the distances of view pairs across two objects as an estimation of their distance. In this paper,
we propose a discriminative probabilistic object modeling approach. It builds probabilistic models for
each object based on the distribution of its views, and the distance between two objects is defined as the upper bound of the Kullback-Leibler divergence of the corresponding probabilistic models. 3D object
retrieval and recognition is accomplished based on the distance measures. We first learn models for each
object by adaptation from a set of global models with a maximum likelihood principle. A further adaptation step is then performed to enhance the discriminative ability of the models. We conduct
experiments on the ETH 3D object dataset, the National Taiwan University 3D model dataset, and the
Princeton Shape Benchmark. We compare our approach with different methods, and experimental results
demonstrate the superiority of our approach.
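As a small illustration of the kind of model distance involved, the closed-form KL divergence between two one-dimensional Gaussians is sketched below; the paper works with richer view-distribution models and upper-bounds the divergence rather than computing a 1-D case:

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """KL divergence D(N(mu0, var0) || N(mu1, var1)) between 1-D Gaussians.

    Note the asymmetry: D(p || q) != D(q || p) in general, which is why a
    well-defined object-to-object distance needs a deliberate choice (or
    bound) rather than the raw divergence."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)
```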
ETPL
DIP-128 Robust Document Image Binarization Technique for Degraded Document Images
Abstract: Segmentation of text from badly degraded document images is a very challenging task due to
the high inter/intra-variation between the document background and the foreground text of different document images. In this paper, we propose a novel document image binarization technique that
addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of
the local image contrast and the local image gradient that is tolerant to text and background variation
caused by different types of document degradations. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and
combined with Canny's edge map to identify the text stroke edge pixels. The document text is further
segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window. The proposed method is simple, robust, and involves minimum parameter
tuning. It has been tested on three public datasets that are used in the recent document image binarization
contests (DIBCO) 2009 & 2011 and handwritten-DIBCO 2010, and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, which are significantly higher than or close to those of the best-performing
methods reported in the three contests. Experiments on the Bickley diary dataset that consists of several
challenging bad quality document images also show the superior performance of our proposed method,
compared with other techniques.
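As an illustrative sketch, the local image contrast that the adaptive contrast map builds on can be computed per window as (max − min)/(max + min). The 3 × 3 window and exact normalisation below are assumptions, not the paper's precise formulation:

```python
import numpy as np

def local_contrast_map(img, eps=1e-6):
    """Local image contrast (max - min) / (max + min) over 3x3 windows.

    High values mark candidate text-stroke regions even when the absolute
    intensities drift, which is the property that makes such a map robust
    to document degradations like stains and uneven illumination."""
    pad = np.pad(img.astype(float), 1, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            out[i, j] = (win.max() - win.min()) / (win.max() + win.min() + eps)
    return out
```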
ETPL
DIP-129 Perceptual Video Coding Based on SSIM-Inspired Divisive Normalization
Abstract: We propose a perceptual video coding framework based on the divisive normalization scheme, which is found to be an effective approach to model the perceptual sensitivity of biological vision, but has
not been fully exploited in the context of video coding. At the macroblock (MB) level, we derive the
normalization factors based on the structural similarity (SSIM) index as an attempt to transform the
discrete cosine transform domain frame residuals to a perceptually uniform space. We further develop an MB level perceptual mode selection scheme and a frame level global quantization matrix optimization
method. Extensive simulations and subjective tests verify that, compared with the H.264/AVC video
coding standard, the proposed method can achieve significant gain in terms of rate-SSIM performance and provide better visual quality.
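For reference, the SSIM index underlying the normalization factors combines luminance, contrast, and structure comparisons. The single-window version below is a simplification; the coder described above evaluates it locally per macroblock:

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Global (single-window) SSIM index between two images.

    c1 and c2 are the standard stabilising constants; per-macroblock use
    would apply the same formula to local windows."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```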
ETPL
DIP-130 Hyperspectral Image Representation and Processing With Binary Partition Trees
Abstract: The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image-processing tools. This paper proposes the construction and the
processing of a new region-based hierarchical hyperspectral image representation relying on the binary
partition tree (BPT). This hierarchical region-based representation can be interpreted as a set of
hierarchical regions stored in a tree structure. Hence, the BPT succeeds in presenting: 1) the decomposition of the image in terms of coherent regions, and 2) the inclusion relations of the regions in
the scene. Based on region-merging techniques, the BPT construction is investigated by studying the
hyperspectral region models and the associated similarity metrics. Once the BPT is constructed, the fixed tree structure allows implementing efficient and advanced application-dependent techniques on it. The
application-dependent processing of BPT is generally implemented through a specific pruning of the tree.
In this paper, a pruning strategy is proposed and discussed in a classification context. Experimental
results on various hyperspectral data sets demonstrate the interest and the good performances of the BPT representation.
ETPL
DIP-131 Visually Weighted Compressive Sensing: Measurement and Reconstruction
Abstract: Compressive sensing (CS) makes it possible to more naturally create compact representations of data with respect to a desired data rate. Through wavelet decomposition, smooth and piecewise smooth
signals can be represented as sparse and compressible coefficients. These coefficients can then be
effectively compressed via the CS. Since a wavelet transform divides image information into layered
blockwise wavelet coefficients over spatial and frequency domains, visual improvement can be attained by an appropriate perceptually weighted CS scheme. We introduce such a method in this paper and
compare it with the conventional CS. The resulting visual CS model is shown to deliver improved visual
reconstructions.
ETPL
DIP-132 Context-Aware Sparse Decomposition for Image Denoising and Super-Resolution
Abstract: Image prior models based on sparse and redundant representations are attracting more and more
attention in the field of image restoration. The conventional sparsity-based methods enforce the sparsity prior on small image patches independently. Unfortunately, these works neglect the contextual information
between sparse representations of neighboring image patches. This limits the modeling capability of the
sparsity-based image prior, especially when the major structural information of the source image is lost
through severe degradation. In this paper, we utilize the contextual information of local patches (denoted as the context-aware sparsity prior) to enhance the performance of the sparsity-based
restoration method. In addition, a unified framework based on the Markov random fields model is
proposed to tune the local prior into a global one to deal with images of arbitrary size. An iterative numerical solution is presented to solve the joint problem of model parameter estimation and sparse
recovery. Finally, the experimental results on image denoising and super-resolution demonstrate the
effectiveness and robustness of the proposed context-aware method.
ETPL
DIP-133 How to SAIF-ly Boost Denoising Performance
Abstract: Spatial domain image filters (e.g., bilateral filter, non-local means, locally adaptive regression
kernel) have achieved great success in denoising. Their overall performance, however, has not generally
surpassed the leading transform-domain filters (such as BM3D). One important reason is that
spatial domain filters lack the efficiency to adaptively fine-tune their denoising strength, something that is relatively easy to do in transform-domain methods with shrinkage operators. In the pixel domain, the
smoothing strength is usually controlled globally by, for example, tuning a regularization parameter. In
this paper, we propose spatially adaptive iterative filtering (SAIF), a new strategy to control the denoising strength locally for any spatial domain method. This approach is capable of filtering local image content
iteratively using the given base filter, and the type of iteration and the iteration number are automatically
optimized with respect to estimated risk (i.e., mean-squared error). In exploiting the estimated local signal-to-noise-ratio, we also present a new risk estimator that is different from the often-employed
SURE method, and exceeds its performance in many cases. Experiments illustrate that our strategy can
significantly relax the base algorithm's sensitivity to its tuning (smoothing) parameters, and effectively
boost the performance of several existing denoising filters to generate state-of-the-art results under both simulated and practical conditions.
ETPL
DIP-134 Frozen-State Hierarchical Annealing
Abstract: There is significant interest in the synthesis of discrete-state random fields, particularly those possessing structure over a wide range of scales. However, given a model on some finest, pixellated
scale, it is computationally very difficult to synthesize both large- and small-scale structures, motivating
research into hierarchical methods. In this paper, we propose a frozen-state approach to hierarchical
modeling, in which simulated annealing is performed on each scale, constrained by the state estimates at the parent scale. This approach leads to significant advantages in both modeling flexibility and
computational complexity. In particular, a complex structure can be realized with very simple, local,
scale-dependent models, and by constraining the domain to be annealed at finer scales to only the uncertain portions of coarser scales, the approach leads to huge improvements in computational
complexity. Results are shown for a synthesis problem in porous media.
ETPL
DIP-135
Per-Colorant-Channel Color Barcodes for Mobile Applications: An Interference
Cancellation Framework
Abstract: We propose a color barcode framework for mobile phone applications by exploiting the spectral
diversity afforded by the cyan (C), magenta (M), and yellow (Y) print colorant channels commonly used
for color printing and the complementary red (R), green (G), and blue (B) channels, respectively, used for
capturing color images. Specifically, we exploit this spectral diversity to realize a three-fold increase in the data rate by encoding independent data in the C, M, and Y print colorant channels and decoding the
data from the complementary R, G, and B channels captured via a mobile phone camera. To mitigate the
effect of cross-channel interference among the print-colorant and capture color channels, we develop an algorithm for interference cancellation based on a physically-motivated mathematical model for the print
and capture processes. To estimate the model parameters required for cross-channel interference
cancellation, we propose two alternative methodologies: a pilot block approach that uses suitable
selections of colors for the synchronization blocks and an expectation maximization approach that estimates the parameters from regions encoding the data itself. We evaluate the performance of the
proposed framework using specific implementations of the framework for two of the most commonly
used barcodes in mobile applications, QR and Aztec codes. Experimental results show that the proposed
framework successfully overcomes the impact of the color interference, providing a low bit error rate and
a high decoding rate for each of the colorant channels when used with a corresponding error correction
scheme.
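At its core, cross-channel interference can be modelled linearly: each captured R, G, B value is a mixture of the printed C, M, Y layers, so decoding amounts to inverting an estimated 3 × 3 mixing matrix. The sketch below is a linear stand-in for the paper's physically-motivated model; in practice the matrix would come from the pilot-block or EM estimation described above:

```python
import numpy as np

def cancel_interference(captured_rgb, mixing):
    """Recover the C, M, Y data layers from captured R, G, B values by
    inverting a 3x3 cross-channel mixing matrix.

    captured_rgb has shape (..., 3); the inverse mixing is applied per
    pixel. The mixing matrix is assumed already estimated and invertible."""
    inv = np.linalg.inv(mixing)
    return captured_rgb @ inv.T
```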
ETPL
DIP-136 Segmented Gray-Code Kernels for Fast Pattern Matching
Abstract: The gray-code kernels (GCK) family, which has the Walsh-Hadamard transform on sliding
windows as a member, is a family of kernels that can perform image analysis efficiently using a fast
algorithm, such as the GCK algorithm. The GCK has been successfully used for pattern matching. In this paper, we propose the G4-GCK algorithm, which is more efficient than the previous algorithm in computing
GCK. The G4-GCK algorithm requires four additions per pixel for three basis vectors, independent of
transform size and dimension. Based on the G4-GCK algorithm, we then propose the segmented GCK (SegGCK). By segmenting input data into Ls parts, the SegGCK requires only four additions per pixel for 3Ls basis
vectors. Experimental results show that the proposed algorithm can significantly accelerate the full-search
equivalent pattern matching process and outperforms state-of-the-art methods.
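For context, the baseline that GCK-family methods accelerate is full-search pattern matching, which scores every window position exhaustively. Below is a brute-force sum-of-squared-differences sketch; the GCK algorithms reach the same best match far faster by rejecting most candidates from a few cheap projection values:

```python
import numpy as np

def full_search_ssd(image, pattern):
    """Brute-force full-search pattern matching by sum of squared
    differences (SSD). Returns the best (row, col) and its SSD score."""
    ph, pw = pattern.shape
    H, W = image.shape
    best, pos = np.inf, (0, 0)
    for i in range(H - ph + 1):
        for j in range(W - pw + 1):
            ssd = ((image[i:i + ph, j:j + pw] - pattern) ** 2).sum()
            if ssd < best:
                best, pos = ssd, (i, j)
    return pos, best
```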
ETPL
DIP-137 Video Processing for Human Perceptual Visual Quality-Oriented Video Coding
Abstract: We have developed a video processing method that achieves human perceptual visual quality-
oriented video coding. The patterns of moving objects are modeled by considering the limited human
capacity for spatial-temporal resolution and the visual sensory memory together, and an online moving pattern classifier is devised by using the Hedge algorithm. The moving pattern classifier is embedded in
the existing visual saliency with the purpose of providing a human perceptual video quality saliency
model. In order to apply the developed saliency model to video coding, the conventional foveation filtering method is extended. The proposed foveation filter can smooth and enhance the video signals
locally, in conformance with the developed saliency model, without causing any artifacts. The
performance evaluation results confirm that the proposed video processing method shows reliable
improvements in the perceptual quality for various sequences and at various bandwidths, compared to existing saliency-based video coding methods.
ETPL
DIP-138 Additive Log-Logistic Model for Networked Video Quality Assessment
Abstract: Modeling subjective opinions on visual quality is a challenging problem, which closely relates to many factors of human perception. In this paper, the additive log-logistic model (ALM) is proposed
to formulate such a multidimensional nonlinear problem. The log-logistic model has flexible monotonic
or nonmonotonic partial derivatives and thus is suitable to model various uni-type impairments. The proposed ALM metric adds the distortions due to each type of impairment in a log-logistic transformed
space of subjective opinions. The features can be evaluated and selected by classic statistical inference,
and the model parameters can be easily estimated. Cross validations on five Telecommunication
Standardization Sector of the International Telecommunication Union (ITU-T) subjectively-rated databases confirm that: 1) based on the same features, the ALM outperforms the support vector regression and the
logistic model in quality prediction and, 2) the resultant no-reference quality metric based on
impairment-relevant video parameters achieves high correlation with a total of 27 216 subjective opinions on 1134 video clips, even compared with existing full-reference quality metrics based on pixel
differences. The ALM metric wins the model competition of the ITU-T Study Group 12 (where the
validation databases are independent of the training databases) and thus is being put forth into ITU-T
Recommendation P.1202.2 for the consent of ITU-T.
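For reference, the log-logistic curve at the heart of the ALM is the function below; the parameterisation shown is the standard one and an assumption about the paper's exact form:

```python
def log_logistic(x, alpha, beta):
    """Log-logistic CDF F(x) = 1 / (1 + (x / alpha)^(-beta)).

    alpha is the scale (F(alpha) = 0.5) and beta the shape; for beta > 0
    the curve rises monotonically from 0 to 1, the kind of flexible
    S-shape used to map impairment levels to opinion-score distortions."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))
```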
ETPL
DIP-139
Linear Feature Separation From Topographic Maps Using Energy Density and the
Shear Transform
Abstract: Linear features are difficult to separate from the complicated background in color scanned
topographic maps, especially when the color of the linear features approximates that of the background in some
particular images. This paper presents a method, which is based on energy density and the shear
transform, for the separation of lines from the background. First, the shear transform, which augments the directional characteristics of the lines, is introduced to overcome the loss of linear information that occurs if the separation method is applied to an image in only one direction. Then
templates in the horizontal and vertical directions are built to separate lines from background on account of the fact that the energy concentration of the lines usually reaches a higher level than that of the
background in the negative image. Furthermore, the remaining grid background can be wiped off by grid-template matching. Isolated patches containing only one pixel, or fewer than ten pixels, are
removed according to the connected-region area measurement. Finally, using the union operation, the linear features obtained in different sheared images supplement each other, so the lines of the
final result are more complete. The key property of this method is its use of energy density instead
of the color information commonly used in traditional methods. The experimental results indicate that the proposed method distinguishes the linear features from the background more effectively and obtains
good results owing to its ability to change the directions of the lines with the shear transform.
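As a small illustration, the shear transform used to vary line directions is the linear map [[1, s], [0, 1]] applied to pixel coordinates; lines at oblique angles are sheared toward the horizontal or vertical templates:

```python
import numpy as np

def shear_points(points, s):
    """Apply a horizontal shear [[1, s], [0, 1]] to an array of (x, y) points.

    Varying s changes the apparent direction of a line, which is what lets
    direction-specific separation templates catch oblique linear features."""
    m = np.array([[1.0, s], [0.0, 1.0]])
    return points @ m.T
```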
ETPL
DIP-140
De-Interlacing Using Nonlocal Costs and Markov-Chain-Based Estimation of
Interpolation Methods
Abstract: A new method of de-interlacing is proposed. De-interlacing is revisited as the problem of assigning a sequence of interpolation methods (interpolators) to a sequence of missing pixels of an
interlaced frame (field). With this assumption, our de-interlacing algorithm (de-interlacer) undergoes
transitions from one interpolation method to another, as it moves from one missing pixel position to the horizontally adjacent missing pixel position in a missing row of a field. We assume a discrete countable-
state Markov-chain model on the sequence of interpolators (Markov-chain states) which are selected from
a user-defined set of candidate interpolators. An estimation of the optimum sequence of interpolators with the aforementioned Markov-chain model requires the definition of an efficient cost function as well as a
global optimization technique. Our algorithm introduces, for the first time, a nonlocal cost (NLC)
scheme. The proposed algorithm uses the NLC to not only measure the fitness of an interpolator at a
missing pixel position, but also to derive an approximation for the transition matrix (TM) of the Markov-chain of interpolators. The TM in our algorithm is frame-variate, i.e., the algorithm updates the
TM for each frame automatically. The algorithm finally uses a Viterbi algorithm to find the global
optimum sequence of interpolators given the defined cost function and the neighboring original pixels. Next, we introduce a new MAP-based formulation for the estimation of the sequence of
interpolators, this time not by estimating the best sequence of interpolators but by successive estimations
of the best interpolator at each missing pixel using the Forward-Backward algorithm. Simulation results
prove that, while competitive with each other on different test sequences, the proposed methods (one using Viterbi and the other Forward-Backward algorithm) are superior to state-of-the-art de-interlacing
algorithms proposed recently. Finally, we propose motion-compensated versions of our algorithm based
on optical flow computation methods and discuss how they can improve the proposed algorithm.
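The Viterbi step described above is a standard minimum-cost dynamic program over interpolator states. A generic sketch follows; the local costs and transition penalties here are placeholders for the paper's NLC-derived values:

```python
import numpy as np

def viterbi(costs, trans):
    """Minimum-cost state sequence by dynamic programming.

    costs[t][s]: local cost of choosing interpolator s at missing pixel t.
    trans[s][s']: penalty for switching from interpolator s to s'
    (in the paper this comes from the Markov-chain transition matrix)."""
    T, S = costs.shape
    dp = costs[0].copy()                     # best cost ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        cand = dp[:, None] + trans           # S x S: previous -> current
        back[t] = cand.argmin(axis=0)
        dp = cand.min(axis=0) + costs[t]
    path = [int(dp.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```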
ETPL
DIP-141 Pose-Invariant Face Recognition Using Markov Random Fields
Abstract: One of the key challenges for current face recognition techniques is how to handle pose
variations between the probe and gallery face images. In this paper, we present a method for reconstructing the virtual frontal view from a given nonfrontal face image using Markov random fields
(MRFs) and an efficient variant of the belief propagation algorithm. In the proposed approach, the input
face image is divided into a grid of overlapping patches, and a globally optimal set of local warps is
estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed
efficiently in the Fourier domain using an extension of the Lucas-Kanade algorithm that can handle
illumination variations. The problem of finding the optimal warps is then formulated as a discrete
labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually
selected facial landmarks nor head pose estimation. In order to improve the performance of our pose
normalization method in face recognition, we also present an algorithm for classifying whether a given face image is at a frontal or nonfrontal pose. Experimental results on different datasets are presented to
demonstrate the effectiveness of the proposed approach.
ETPL
DIP-142 Objective-Guided Image Annotation
Abstract: Automatic image annotation, which is usually formulated as a multi-label classification
problem, is one of the major tools used to enhance the semantic understanding of web images. Many
multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from
being practical. On the other hand, specific measures are usually designed to evaluate how well one
annotation method performs for a specific objective or application, but most image annotation methods do not consider optimizing these measures and are thus inevitably trapped in suboptimal
performance of these objective-specific measures. To address this issue, we first summarize a variety of
objective-guided performance measures under a unified representation. Our analysis reveals that macro-
averaging measures are very sensitive to infrequent keywords, and the Hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly
optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first
present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. And then, we formulate
these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to
the analysis of various measures and the high time complexity of optimizing micro-averaging measures,
in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely
used multi-label datasets, and demonstrate the superior performance of our proposed method over state-
of-the-art baseline methods in terms of example-based measures on four image annotation datasets.
ETPL
DIP-143 Multiview Coding Mode Decision With Hybrid Optimal Stopping Model
Abstract: In a generic decision process, optimal stopping theory aims to achieve a good tradeoff between
decision performance and time consumed, with the advantages of theoretical decision-making and predictable decision performance. In this paper, optimal stopping theory is employed to develop an
effective hybrid model for the mode decision problem, which aims to theoretically achieve a good
tradeoff between the two interrelated measurements in mode decision, namely computational complexity
reduction and rate-distortion degradation. The proposed hybrid model is implemented and examined with a multiview encoder. To support the model and further promote coding performance, the multiview
coding mode characteristics, including predicted mode probability and estimated coding time, are jointly
investigated with inter-view correlations. Exhaustive experimental results with a wide range of video resolutions reveal the efficiency and robustness of our method, with high decision accuracy, negligible
computational overhead, and almost intact rate-distortion performance compared to the original encoder.
ETPL
DIP-144 Joint Framework for Motion Validity and Estimation Using Block Overlap
Abstract: This paper presents a block-overlap-based validity metric for use as a measure of motion vector
(MV) validity and to improve the quality of the motion field. In contrast to other validity metrics in the
literature, the proposed metric is not sensitive to image features and does not require the use of
neighboring MVs or manual thresholds. Using a hybrid de-interlacer, it is shown that the proposed metric outperforms other block-based validity metrics in the literature. To help regularize the ill-posed nature of
motion estimation, the proposed validity metric is also used as a regularizer in an energy minimization
framework to determine the optimal MV. Experimental results show that the proposed energy minimization framework outperforms several existing motion estimation methods in the literature in
terms of MV and interpolation quality. For interpolation quality, our algorithm outperforms all other
block-based methods as well as several complex optical flow methods. In addition, it is one of the fastest
implementations at the time of this writing.
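One plausible reading of a block-overlap consistency check is sketched below: if the motion-compensated blocks of a frame tile the previous frame without gaps or double coverage, the motion field is consistent; conflicting vectors produce coverage deviations. This is a toy stand-in for the paper's metric, with all sizes and the scoring rule chosen for illustration:

```python
import numpy as np

def block_overlap_validity(mvs, block=8, frame_shape=(32, 32)):
    """Toy block-overlap consistency score for a grid of motion vectors.

    mvs: (By, Bx, 2) integer motion vectors (dy, dx), one per block.
    Returns a (By, Bx) map: 0 means the compensated blocks tile the frame
    perfectly (consistent motion); larger values indicate gaps or multiple
    overlaps (suspect vectors).
    """
    H, W = frame_shape
    By, Bx = mvs.shape[:2]
    cover = np.zeros((H, W))
    rects = []
    for by in range(By):
        for bx in range(Bx):
            dy, dx = mvs[by, bx]
            y0 = np.clip(by * block + dy, 0, H - block)
            x0 = np.clip(bx * block + dx, 0, W - block)
            rects.append((y0, x0))
            cover[y0:y0 + block, x0:x0 + block] += 1
    validity = np.empty((By, Bx))
    for i, (y0, x0) in enumerate(rects):
        # deviation of coverage from 1 inside this block's compensated footprint
        validity[i // Bx, i % Bx] = np.abs(
            cover[y0:y0 + block, x0:x0 + block] - 1).mean()
    return validity
```

Note that the score uses neither image features nor neighboring-MV comparisons, mirroring the property the abstract highlights.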
ETPL
DIP-145 Nonlocally Centralized Sparse Representation for Image Restoration
Abstract: Sparse representation models code an image patch as a linear combination of a few atoms
chosen from an over-complete dictionary, and they have shown promising results in various image restoration applications. However, due to the degradation of the observed image (e.g., noisy, blurred,
and/or down-sampled), the sparse representations by conventional models may not be accurate enough
for a faithful reconstruction of the original image. To improve the performance of sparse representation-
based image restoration, in this paper the concept of sparse coding noise is introduced, and the goal of image restoration turns to how to suppress the sparse coding noise. To this end, we exploit the image
nonlocal self-similarity to obtain good estimates of the sparse coding coefficients of the original image,
and then centralize the sparse coding coefficients of the observed image to those estimates. The so-called nonlocally centralized sparse representation (NCSR) model is as simple as the standard sparse
representation model, while our extensive experiments on various types of image restoration problems,
including denoising, deblurring and super-resolution, validate the generality and state-of-the-art performance of the proposed NCSR algorithm.
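The centralization step of NCSR has a simple per-coefficient form when the dictionary is orthogonal: soft-threshold the deviation of each sparse code from its nonlocal estimate, then re-center. The sketch below assumes that simplified setting; the names and the threshold `tau` are illustrative:

```python
import numpy as np

def ncsr_shrink(coeffs, nonlocal_est, tau):
    """Centralized shrinkage: pull sparse codes toward nonlocal estimates.

    Per coefficient, solves argmin_a (a - c)^2 + 2*tau*|a - b|, i.e.
    soft-thresholding of (c - b) followed by re-centering at b, so the
    "sparse coding noise" (deviation from the nonlocal estimate) is
    suppressed rather than the codes themselves.
    """
    d = coeffs - nonlocal_est
    return nonlocal_est + np.sign(d) * np.maximum(np.abs(d) - tau, 0.0)
```

Setting `nonlocal_est` to zero recovers ordinary soft-thresholding, which is why NCSR is "as simple as the standard sparse representation model".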
ETPL
DIP-146 Image Segmentation Using a Sparse Coding Model of Cortical Area V1
Abstract: Algorithms that encode images using a sparse set of basis functions have previously been
shown to explain aspects of the physiology of the primary visual cortex (V1), and have been used for applications such as image compression, restoration, and classification. Here, a sparse coding algorithm,
that has previously been used to account for the response properties of orientation tuned cells in primary
visual cortex, is applied to the task of perceptually salient boundary detection. The proposed algorithm is currently limited to using only intensity information at a single scale. However, it is shown to outperform the current state-of-the-art image segmentation method (Pb) when this method is also restricted
to using the same information.
ETPL
DIP-147 Circular Reranking for Visual Search
Abstract: Search reranking is regarded as a common way to boost retrieval precision. The problem
nevertheless is not trivial especially when there are multiple features or modalities to be considered for
search, which often happens in image and video retrieval. This paper proposes a new reranking algorithm, named circular reranking, which reinforces the mutual exchange of information across multiple modalities
for improving search performance, following the philosophy that a strongly performing modality can learn
from weaker ones, while a weak modality benefits from interacting with stronger ones. Technically,
circular reranking conducts multiple runs of random walks through exchanging the ranking scores among different features in a cyclic manner. Unlike the existing techniques, the reranking procedure encourages
interaction among modalities to seek a consensus that is useful for reranking. In this paper, we study
several properties of circular reranking, including how and which order of information propagation
should be configured to fully exploit the potential of modalities for reranking. Encouraging results are
reported for both image and video retrieval on Microsoft Research Asia Multimedia image dataset and
TREC Video Retrieval Evaluation 2007-2008 datasets, respectively.
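The cyclic random-walk idea can be sketched as alternating score propagation through one modality's affinity graph at a time. This is a minimal illustration, not the paper's exact update; the affinity matrices and the damping factor `alpha` are hypothetical inputs:

```python
import numpy as np

def circular_rerank(affinities, init_scores, alpha=0.8, iters=20):
    """Cyclic random-walk reranking over multiple modality graphs.

    affinities: list of (N, N) row-stochastic similarity matrices, one per
                modality (e.g., one built from visual features, one from text).
    init_scores: (N,) initial ranking scores from the base search.
    Each step propagates the current scores through the next modality's
    graph in a cycle, so modalities exchange ranking information.
    """
    s = init_scores.astype(float).copy()
    M = len(affinities)
    for t in range(iters):
        W = affinities[t % M]              # visit modalities cyclically
        s = alpha * W @ s + (1 - alpha) * init_scores
    return s
```

With identity affinities the scores are fixed; a smoothing affinity matrix instead diffuses a strong score onto related items, which is the reranking effect.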
ETPL
DIP-148
On Random Field Completely Automated Public Turing Test to Tell Computers and
Humans Apart Generation
Abstract: Herein, we propose generating CAPTCHAs through random field simulation and give a novel,
effective and efficient algorithm to do so. Indeed, we demonstrate that sufficient information about word
tests for easy human recognition is contained in the site marginal probabilities and the site-to-nearby-site covariances and that these quantities can be embedded directly into certain conditional probabilities,
designed for effective simulation. The CAPTCHAs are then partial random realizations of the random
CAPTCHA word. We start with an initial random field (e.g., randomly scattered letter pieces) and use Gibbs resampling to re-simulate portions of the field repeatedly using these conditional probabilities until
the word becomes human-readable. The residual randomness from the initial random field together with
the random implementation of the CAPTCHA word provides significant resistance to attack. This results in a CAPTCHA that is unrecognizable to modern optical character recognition but is recognized about
95% of the time in a human readability study.
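A single Gibbs resampling sweep of the kind the abstract describes can be sketched on a binary field. The conditional below, mixing a pull toward a target word image (the marginals) with agreement among 4-neighbors (the covariances), is an assumed illustrative form, not the paper's fitted probabilities:

```python
import numpy as np

def gibbs_sweep(field, target, a=5.0, b=0.5, rng=None):
    """One Gibbs resampling sweep of a binary random field.

    Each site is redrawn from a conditional probability combining agreement
    with a target word image (site marginals) and agreement with its four
    neighbors (site-to-nearby-site covariances). Repeated sweeps move a
    random initial field toward a human-readable realization of the word.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = field.shape
    for i in range(H):
        for j in range(W):
            nb = (field[(i - 1) % H, j] + field[(i + 1) % H, j]
                  + field[i, (j - 1) % W] + field[i, (j + 1) % W])
            # log-odds of the site being 1: target pull + neighbor pull
            logit = a * (2 * target[i, j] - 1) + b * (2 * nb - 4)
            p1 = 1.0 / (1.0 + np.exp(-logit))
            field[i, j] = 1 if rng.random() < p1 else 0
    return field
```

Residual randomness survives because each site is sampled, not thresholded, which is the source of the attack resistance the abstract mentions.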
ETPL
DIP-149 Active Contours Driven by the Salient Edge Energy Model
Abstract: In this brief, we present a new indicator, i.e., salient edge energy, for guiding a given contour robustly and precisely toward the object boundary. Specifically, we define the salient edge energy by
exploiting the higher order statistics on the diffusion space, and incorporate it into a variational level set
formulation with the local region-based segmentation energy for solving the problem of curve evolution.
In contrast to most previous methods, the proposed salient edge energy allows the curve to find only significant local minima relevant to the object boundary even in the noisy and cluttered background.
Moreover, the segmentation performance derived from our new energy is less sensitive to the size of local
windows compared with other recently developed methods, owing to the ability of our energy function to suppress diverse clutters. The proposed method has been tested on various images, and experimental
results show that the salient edge energy effectively drives the active contour both qualitatively and
quantitatively compared to various state-of-the-art methods.
ETPL
DIP-150 Bayesian Saliency via Low and Mid Level Cues
Abstract: Visual saliency detection is a challenging problem in computer vision, but one of great
importance with numerous applications. In this paper, we propose a novel model for bottom-up saliency
within the Bayesian framework by exploiting low and mid level cues. In contrast to most existing methods that operate directly on low level cues, we propose an algorithm in which a coarse saliency
region is first obtained via a convex hull of interest points. We also analyze the saliency information with
mid level visual cues via superpixels. We present a Laplacian sparse subspace clustering method to group superpixels with local features, and analyze the results with respect to the coarse saliency region to
compute the prior saliency map. We use the low level visual cues based on the convex hull to compute
the observation likelihood, thereby facilitating inference of Bayesian saliency at each pixel. Extensive
experiments on a large data set show that our Bayesian saliency model performs favorably against the state-of-the-art algorithms.
ETPL
DIP-151 Exemplar-Based Image Inpainting Using Multiscale Graph Cuts
Abstract: We present a novel formulation of exemplar-based inpainting as a global energy optimization
problem, written in terms of the offset map. The proposed energy function combines a data attachment
term that ensures the continuity of reconstruction at the boundary of the inpainting domain with a smoothness term that ensures a visually coherent reconstruction inside the hole. This formulation is
adapted to obtain a global minimum using the graph cuts algorithm. To reduce the computational
complexity, we propose an efficient multiscale graph cuts algorithm. To compensate for the loss of information at low-resolution levels, we use a feature representation computed at the original image
resolution. This alleviates the ambiguity induced by comparing only color information when
the image is represented at low resolution levels. Our experiments show how well the proposed algorithm
performs compared with other recent algorithms.
ETPL
DIP-152 Activity Recognition Using a Mixture of Vector Fields
Abstract: The analysis of moving objects in image sequences (video) has been one of the major themes in
computer vision. In this paper, we focus on video-surveillance tasks; more specifically, we consider pedestrian trajectories and propose modeling them through a small set of motion/vector fields together
with a space-varying switching mechanism. Despite the diversity of motion patterns that can occur in a
given scene, we show that it is often possible to find a relatively small number of typical behaviors, and
model each of these behaviors by a “simple” motion field. We increase the expressiveness of the formulation by allowing the trajectories to switch from one motion field to another, in a space-dependent
manner. We present an expectation-maximization algorithm to learn all the parameters of the model, and
apply it to trajectory classification tasks. Experiments with both synthetic and real data support the claims about the performance of the proposed approach.
ETPL
DIP-153 Low-Resolution Face Tracker Robust to Illumination Variations
Abstract: In many practical video surveillance applications, the faces acquired by outdoor cameras are of low resolution and are affected by uncontrolled illumination. Although significant efforts have been made
to facilitate face tracking or illumination normalization in unconstrained videos, the approaches
developed may not be effective in video surveillance applications. This is because: 1) a low-resolution face contains limited information, and 2) major changes in illumination on a small region of the face
make the tracking ineffective. To overcome this problem, this paper proposes to perform tracking in an
illumination-insensitive feature space, called the gradient logarithm field (GLF) feature space. The GLF feature mainly depends on the intrinsic characteristics of a face and is only marginally affected by the
lighting source. In addition, the GLF feature is a global feature and does not depend on a specific face
model, and thus is effective in tracking low-resolution faces. Experimental results show that the proposed
GLF-based tracker works well under significant illumination changes and outperforms many state-of-the-art tracking algorithms.
ETPL
DIP-154 Local Directional Number Pattern for Face Analysis: Face and Expression Recognition
Abstract: This paper proposes a novel local feature descriptor, local directional number pattern (LDN), for face analysis, i.e., face and expression recognition. LDN encodes the directional information of the
face's textures (i.e., the texture's structure) in a compact way, producing a more discriminative code than
current methods. We compute the structure of each micro-pattern with the aid of a compass mask that
extracts directional information, and we encode such information using the prominent direction indices (directional numbers) and sign, which allows us to distinguish among similar structural patterns that have
different intensity transitions. We divide the face into several regions, and extract the distribution of the
LDN features from them. Then, we concatenate these features into a feature vector, and we use it as a
face descriptor. We perform several experiments in which our descriptor performs consistently under
illumination, noise, expression, and time lapse variations. Moreover, we test our descriptor with different
masks to analyze its performance in different face analysis tasks.
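The direction-plus-sign encoding behind LDN can be illustrated per pixel. The paper uses Kirsch-style compass masks; as a hedged stand-in, the sketch below uses plain neighbor differences in eight directions and packs the indices of the strongest positive and strongest negative responses into one code:

```python
import numpy as np

def ldn_codes(img):
    """Toy LDN-style code map for a grayscale image.

    For each pixel, take the differences to its 8 neighbors, then combine
    the direction of the strongest increase with the direction of the
    strongest decrease into a single 6-bit-style code. (Simplified:
    neighbor differences stand in for the compass-mask responses.)
    """
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    resp = np.stack([np.roll(img, (-dy, -dx), axis=(0, 1)) - img
                     for dy, dx in offs])
    top = resp.argmax(axis=0)   # direction of strongest increase
    bot = resp.argmin(axis=0)   # direction of strongest decrease
    return (top * 8 + bot).astype(np.uint8)
```

A face descriptor would then histogram these codes per region and concatenate the histograms, as the abstract describes.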
ETPL
DIP-155 Regularized Robust Coding for Face Recognition
Abstract: Recently the sparse representation based classification (SRC) has been proposed for robust face
recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training
samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution,
which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile,
the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could
robustly regress a given signal with regularized regression coefficients. By assuming that the coding
residual and the coding coefficients are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized
robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on
representative face databases demonstrate that the RRC is much more effective and efficient than state-
of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.
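The iteratively reweighted idea behind IR3C can be sketched as alternating between robust pixel weights and a weighted regularized regression. This is a generic IRLS sketch under a Cauchy-like weight function, not the paper's derived weights; `lam` and `delta` are illustrative parameters:

```python
import numpy as np

def ir3c(D, y, lam=0.1, iters=10, delta=1e-2):
    """Iteratively reweighted regularized coding (sketch of the IR3C idea).

    D: (m, n) dictionary of training samples; y: (m,) test signal.
    Pixels with large residuals (e.g., occluded or corrupted) receive
    small weights, so they influence the regularized regression less at
    the next iteration.
    """
    n = D.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        r = y - D @ x
        w = 1.0 / (1.0 + (r / delta) ** 2)       # robust, Cauchy-like weights
        Wd = D * w[:, None]
        # weighted regularized normal equations: (D^T W D + lam I) x = D^T W y
        x = np.linalg.solve(D.T @ Wd + lam * np.eye(n), Wd.T @ y)
    return x
```

On clean data the weights converge to one and the solution approaches ordinary ridge regression; gross outliers are progressively down-weighted instead of skewing the fit.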
ETPL
DIP-156 Exploration of Optimal Many-Core Models for Efficient Image Segmentation
Abstract: Image segmentation plays a crucial role in numerous biomedical imaging applications, assisting clinicians or health care professionals with diagnosis of various diseases using scientific data. However,
its high computational complexities require substantial amount of time and have limited their
applicability. Research has thus focused on parallel processing models that support biomedical image
segmentation. In this paper, we present analytical results of the design space exploration of many-core processors for efficient fuzzy c-means (FCM) clustering, which is widely used in many medical image segmentation tasks. We quantitatively evaluate the impact of varying the number of processing elements (PEs)
and the amount of local memory, for a fixed image size, on system performance and efficiency using architectural and workload simulations. Experimental results indicate that PEs=4,096 provides the most
efficient operation for the FCM algorithm with four clusters, while PEs=1,024 and PEs=4,096 yield the
highest area efficiency and energy efficiency, respectively, for three clusters.
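The FCM kernel that such many-core designs parallelize alternates between membership and center updates. Below is a standard, self-contained sketch (the center initialization heuristic is an assumption for reproducibility, not from the paper):

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50):
    """Fuzzy c-means clustering.

    X: (N, d) pixel feature vectors; c: clusters; m: fuzzifier (> 1).
    Returns (centers, memberships) where memberships U is (N, c) and each
    row sums to 1.
    """
    # initialize centers spread along the first feature (simple heuristic)
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, c).astype(int)]].astype(float)
    for _ in range(iters):
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
        # center update: membership-weighted means
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return centers, U
```

Both updates are embarrassingly parallel over pixels, which is why the PE count and per-PE local memory dominate the design space the abstract explores.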
ETPL
DIP-157 Active Contour-Based Visual Tracking by Integrating Colors, Shapes, and Motions
Abstract: In this paper, we present a framework for active contour-based visual tracking using level sets.
The main components of our framework include contour-based tracking initialization, color-based
contour evolution, adaptive shape-based contour evolution for non-periodic motions, dynamic shape-based contour evolution for periodic motions, and the handling of abrupt motions. For the initialization of
contour-based tracking, we develop an optical flow-based algorithm for automatically initializing
contours at the first frame. For the color-based contour evolution, Markov random field theory is used to measure correlations between values of neighboring pixels for posterior probability estimation. For
adaptive shape-based contour evolution, the global shape information and the local color information are
combined to hierarchically evolve the contour, and a flexible shape updating model is constructed. For
the dynamic shape-based contour evolution, a shape mode transition matrix is learnt to characterize the temporal correlations of object shapes. For the handling of abrupt motions, particle swarm optimization is
adopted to capture the global motion which is applied to the contour in the current frame to produce an
initial contour in the next frame.
ETPL
DIP-158 Image Quality Assessment Using Multi-Method Fusion
Abstract: A new methodology for objective image quality assessment (IQA) with multi-method fusion
(MMF) is presented in this paper. The research is motivated by the observation that there is no single method that can give the best performance in all situations. To achieve MMF, we adopt a regression
approach. The new MMF score is set to be the nonlinear combination of scores from multiple methods
with suitable weights obtained by a training process. In order to improve the regression results further, we divide distorted images into three to five groups based on the distortion types and perform regression
within each group, which is called “context-dependent MMF” (CD-MMF). One task in CD-MMF is to
determine the context automatically, which is achieved by a machine learning approach. To further reduce the complexity of MMF, we apply algorithms to select a small subset from the candidate
method set. The result is very good even if only three quality assessment methods are included in the
fusion process. The proposed MMF method using support vector regression is shown to outperform a
large number of existing IQA methods by a significant margin when being tested in six representative databases.
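The core fusion step, learning a combination of per-method scores against subjective quality, can be sketched with a simple regression. Ridge regression stands in here for the support vector regression the paper uses; the data layout is an assumption:

```python
import numpy as np

def train_fusion(scores, mos, lam=1e-3):
    """Learn fusion weights mapping per-method IQA scores to quality.

    scores: (N, K) scores from K IQA methods on N training images.
    mos: (N,) subjective mean-opinion scores.
    Fits an affine model by ridge regression (a stand-in for SVR).
    """
    A = np.hstack([scores, np.ones((len(scores), 1))])   # append bias column
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ mos)
    return w

def fuse(scores, w):
    """Apply learned fusion weights to new per-method scores."""
    return np.hstack([scores, np.ones((len(scores), 1))]) @ w
```

Context-dependent MMF would train one such model per distortion group and pick the model using the learned context classifier.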
ETPL
DIP-159 Robust Radial Face Detection for Omnidirectional Vision
Abstract: Bio-inspired and non-conventional vision systems are highly researched topics. Among them, omnidirectional vision systems have demonstrated their ability to significantly improve the geometrical
interpretation of scenes. However, few researchers have investigated how to perform object detection
with such systems. The existing approaches require a geometrical transformation prior to the
interpretation of the picture. In this paper, we investigate what must be taken into account and how to process omnidirectional images provided by the sensor. We focus our research on face detection and
highlight the fact that particular attention should be paid to the descriptors in order to successfully
perform face detection on omnidirectional images. We demonstrate that this choice is critical to obtaining high detection rates. Our results imply that the adaptation of existing object-detection frameworks,
designed for perspective images, should be focused on the choice of appropriate image descriptors in the
design of the object-detection pipeline.
ETPL
DIP-160 Optimized 3D Watermarking for Minimal Surface Distortion
Abstract: This paper proposes a new approach to 3D watermarking by ensuring the optimal preservation
of mesh surfaces. A new 3D surface preservation function metric is defined, consisting of the distance of a vertex displaced by watermarking to the original surface, the distance to the watermarked object surface, and the actual vertex displacement. The proposed method is statistical, blind, and robust. Minimal surface
distortion according to the proposed function metric is enforced during the statistical watermark
embedding stage using the Levenberg-Marquardt optimization method. A study of the watermark code crypto-security is provided for the proposed methodology. According to the experimental results, the
proposed methodology has high robustness against the common mesh attacks while preserving the
original object surface during watermarking.
ETPL
DIP-161
Approximate Least Trimmed Sum of Squares Fitting and Applications in Image
Analysis
Abstract: The least trimmed sum of squares (LTS) regression estimation criterion is a robust statistical
method for model fitting in the presence of outliers. Compared with the classical least squares estimator,
which uses the entire data set for regression and is consequently sensitive to outliers, LTS identifies the outliers and fits to the remaining data points for improved accuracy. Exactly solving an LTS problem is
NP-hard, but as we show here, LTS can be formulated as a concave minimization problem. Since it is
usually tractable to globally solve a convex minimization or concave maximization problem in
polynomial time, inspired by , we instead solve the approximate complementary problem of LTS, which is
convex minimization. We show that this complementary problem can be efficiently solved as a second
order cone program. We thus propose an iterative procedure to approximately solve the original LTS problem. Our extensive experiments demonstrate that the proposed method is robust, efficient and
scalable in dealing with problems where data are contaminated with outliers. We show several
applications of our method in image analysis.
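A common way to approximate LTS, simpler than the second-order cone formulation of the paper, is to alternate between least-squares fitting and keeping the best-fitting fraction of points. The sketch below illustrates the trimming idea only:

```python
import numpy as np

def lts_fit(X, y, keep=0.8, iters=20):
    """Approximate least-trimmed-squares line fitting by alternation.

    Repeatedly fit ordinary least squares, then refit on the `keep`
    fraction of points with the smallest residuals, so gross outliers are
    excluded from the regression. A heuristic, not the paper's convex
    complementary formulation.
    """
    A = np.hstack([X, np.ones((len(X), 1))])   # affine design matrix
    idx = np.arange(len(y))
    h = int(keep * len(y))
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        r = np.abs(A @ w - y)
        idx = np.argsort(r)[:h]                # keep best-fitting points
    return w
```

On data with a few gross outliers, the trimmed fit recovers the inlier model that ordinary least squares would miss.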
ETPL
DIP-162 Design of Low-Complexity High-Performance Wavelet Filters for Image Analysis
Abstract: This paper addresses the construction of a family of wavelets based on halfband polynomials.
An algorithm is proposed that ensures the maximum number of zeros for a desired length of analysis and synthesis
filters. We start with the coefficients of the polynomial and then use a generalized matrix formulation method to construct the halfband filter polynomial. The designed wavelets are efficient and give
acceptable levels of peak signal-to-noise ratio when used for image compression. Furthermore, these
wavelets give satisfactory recognition rates when used for feature extraction. Simulation results show that the designed wavelets are effective and more efficient than the existing standard wavelets.
ETPL
DIP-163
Noise Reduction Based on Partial-Reference, Dual-Tree Complex Wavelet Transform
Shrinkage
Abstract: This paper presents a novel way to reduce noise introduced or exacerbated by image
enhancement methods, in particular (but not only) algorithms based on the random spray sampling technique. Owing to the nature of sprays, output images of spray-based methods tend to exhibit noise with
unknown statistical distribution. To avoid inappropriate assumptions on the statistical characteristics of
noise, a different assumption is made: the non-enhanced image is considered to be either free of noise or affected by non-perceivable levels of noise. Taking advantage of the higher sensitivity of the human
visual system to changes in brightness, the analysis can be limited to the luma channel of both the non-
enhanced and enhanced image. Also, given the importance of directional content in human vision, the
analysis is performed through the dual-tree complex wavelet transform (DTWCT). Unlike the discrete wavelet transform, the DTWCT allows for distinction of data directionality in the transform space. For
each level of the transform, the standard deviation of the non-enhanced image coefficients is computed
across the six orientations of the DTWCT and then normalized. The result is a map of the directional structures present in the non-enhanced image. This map is then used to shrink the coefficients of the
enhanced image. The shrunk coefficients and the coefficients from the non-enhanced image are then
mixed according to data directionality. Finally, a noise-reduced version of the enhanced image is computed via the inverse transforms. A thorough numerical analysis of the results has been performed in
order to confirm the validity of the proposed approach.
ETPL
DIP-164 Hessian Schatten-Norm Regularization for Linear Inverse Problems
Abstract: We introduce a novel family of invariant, convex, and non-quadratic functionals that we employ to derive regularized solutions of ill-posed linear inverse imaging problems. The proposed
regularizers involve the Schatten norms of the Hessian matrix, which are computed at every pixel of the
image. They can be viewed as second-order extensions of the popular total-variation (TV) semi-norm since they satisfy the same invariance properties. Meanwhile, by taking advantage of second-order
derivatives, they avoid the staircase effect, a common artifact of TV-based reconstructions, and perform
well for a wide range of applications. To solve the corresponding optimization problems, we propose an
algorithm that is based on a primal-dual formulation. A fundamental ingredient of this algorithm is the projection of matrices onto Schatten norm balls of arbitrary radius. This operation is performed efficiently
based on a direct link we provide between vector projections onto norm balls and matrix projections onto
Schatten norm balls. Finally, we demonstrate the effectiveness of the proposed methods through
experimental results on several inverse imaging problems with real and simulated data.
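The vector-to-matrix projection link the abstract mentions can be made concrete for the Schatten-1 (nuclear) norm: projecting a matrix onto the nuclear-norm ball reduces to projecting its singular values onto the l1 ball. A minimal sketch (the radius and matrices are illustrative):

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a nonnegative vector onto the l1 ball."""
    if v.sum() <= radius:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    # largest index rho with u_rho above the water level (Duchi et al.)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_nuclear_ball(M, radius):
    """Project a matrix onto the Schatten-1 (nuclear) norm ball.

    Computes the SVD, projects the singular values onto the l1 ball, and
    rebuilds the matrix - the vector/matrix projection link the paper
    exploits inside its primal-dual algorithm.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, radius)) @ Vt
```

Other Schatten norms follow the same pattern with the corresponding vector-norm-ball projection applied to the singular values.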
ETPL
DIP-165 Structured Sparse Error Coding for Face Recognition With Occlusion
Abstract: Face recognition with occlusion is common in the real world. Inspired by the works of
structured sparse representation, we try to explore the structure of the error incurred by occlusion from
two aspects: the error morphology and the error distribution. Since human beings recognize the occlusion mainly according to its region shape or profile without knowing accurately what the occlusion is, we
argue that the shape of the occlusion is also an important feature. We propose a morphological graph
model to describe the morphological structure of the error. Due to the uncertainty of the occlusion, the
distribution of the error incurred by occlusion is also uncertain. However, we observe that the unoccluded part and the occluded part of the error, measured by the correntropy-induced metric, each follow an exponential
distribution. Incorporating the two aspects of the error structure, we propose the structured
sparse error coding for face recognition with occlusion. Our extensive experiments demonstrate that the proposed method is more stable and has a higher breakdown point in dealing with occlusion problems
in face recognition compared to the related state-of-the-art methods, especially in extreme
situations such as high-level occlusion and low feature dimension.
ETPL
DIP-166
Accurate Multiple View 3D Reconstruction Using Patch-Based Stereo for Large-Scale
Scenes
Abstract: In this paper, we propose a depth-map merging based multiple view stereo method for large-
scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient
patch-based stereo matching process is used to generate depth-map at each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to
state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with
high computational efficiency. Besides, the proposed method could be easily parallelized at image level, i.e., each depth-map is computed individually, which makes it suitable for large-scale scene
reconstruction with high resolution images. The accuracy and efficiency of the proposed method are
evaluated quantitatively on benchmark data and qualitatively on large data sets.
ETPL
DIP-167 Mixed-Domain Edge-Aware Image Manipulation
Abstract: This paper presents a novel approach to edge-aware image manipulation. Our method processes
a Gaussian pyramid from coarse to fine, and at each level, applies a nonlinear filter bank to the
neighborhood of each pixel. Outputs of these spatially-varying filters are merged using global optimization. The optimization problem is solved using an explicit mixed-domain (real space and DCT
transform space) solution, which is efficient, accurate, and easy-to-implement. We demonstrate
applications of our method to a set of problems, including detail and contrast manipulation, HDR compression, nonphotorealistic rendering, and haze removal.
ETPL
DIP-168 Monocular Depth Ordering Using T-Junctions and Convexity Occlusion Cues
Abstract: This paper proposes a system that relates objects in an image using occlusion cues and arranges
them according to depth. The system does not rely on a priori knowledge of the scene structure and focuses on detecting special points, such as T-junctions and highly convex contours, to infer the depth
relationships between objects in the scene. The system makes extensive use of the binary partition tree as a hierarchical region-based image representation, together with a new approach for candidate T-junction estimation. Since some regions may not involve T-junctions, occlusion is also detected by examining
convex shapes on region boundaries. Combining T-junctions and convexity leads to a system which only
relies on low-level depth cues and does not use semantic information. Nevertheless, it performs on par with or better than the state-of-the-art while not assuming any particular type of scene. As an extension of the
automatic depth ordering system, a semi-automatic approach is also proposed. If the user provides the
depth order for a subset of regions in the image, the system is able to easily integrate this user information
into the final depth order for the complete image. For some applications, user interaction can naturally be integrated, improving the quality of the automatically generated depth map.
ETPL
DIP-169
Perceptual Full-Reference Quality Assessment of Stereoscopic Images by Considering
Binocular Visual Characteristics
Abstract: Perceptual quality assessment is a challenging issue in 3D signal processing research. It is
important to study the 3D signal directly rather than simply extending 2D metrics to the 3D case, as in some previous studies. In this paper, we propose a new perceptual full-reference quality
assessment metric of stereoscopic images by considering the binocular visual characteristics. The major
technical contribution of this paper is that the binocular perception and combination properties are considered in quality assessment. To be more specific, we first perform left-right consistency checks and
compare matching error between the corresponding pixels in binocular disparity calculation, and classify
the stereoscopic images into non-corresponding, binocular fusion, and binocular suppression regions. Also, local phase and local amplitude maps are extracted from the original and distorted stereoscopic
images as features in quality assessment. Then, each region is evaluated independently by considering its
binocular perception property, and all evaluation results are integrated into an overall score. Besides, a
binocular just noticeable difference model is used to reflect the visual sensitivity for the binocular fusion and suppression regions. Experimental results show that compared with the relevant existing metrics, the
proposed metric can achieve higher consistency with subjective assessment of stereoscopic images.
ETPL
DIP-170 Multi-Wiener SURE-LET Deconvolution
Abstract: In this paper, we propose a novel deconvolution algorithm based on the minimization of a
regularized Stein's unbiased risk estimate (SURE), which is a good estimate of the mean squared error.
We linearly parametrize the deconvolution process by using multiple Wiener filters as elementary
functions, followed by undecimated Haar-wavelet thresholding. Due to the quadratic nature of SURE and the linear parametrization, the deconvolution problem finally boils down to solving a linear system of
equations, which is very fast and exact. The linear coefficients, i.e., the solution of the linear system of
equations, constitute the best approximation of the optimal processing on the Wiener-Haar-threshold basis that we consider. In addition, the proposed multi-Wiener SURE-LET approach is applicable for
both periodic and symmetric boundary conditions, and can thus be used in various practical scenarios.
The very competitive (both in computation time and quality) results show that the proposed algorithm, which can be interpreted as a kind of nonlinear Wiener processing, can be used as a basic tool for
building more sophisticated deconvolution algorithms.
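As a rough illustration of the elementary Wiener filters that the SURE-LET parametrization combines, here is a single frequency-domain Wiener-type deconvolution. The scalar `reg` parameter standing in for the noise-to-signal ratio is a simplification; the paper instead tunes multiple such filters jointly via SURE and wavelet thresholding:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, reg=1e-2):
    """Frequency-domain Wiener-type deconvolution under a periodic
    (circular) convolution model, with a scalar regularizer `reg`
    approximating the noise-to-signal ratio."""
    H = np.fft.fft2(psf, s=blurred.shape)        # transfer function of the blur
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + reg)      # Wiener-style inverse filter
    return np.real(np.fft.ifft2(W * B))
```

With `reg` near zero this approaches the (unstable) direct inverse; larger values trade residual blur for noise suppression.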
ETPL
DIP-171 Joint Reconstruction of Multiview Compressed Images
Abstract: Distributed representation of correlated multiview images is an important problem that arises in vision sensor networks. This paper concentrates on the joint reconstruction problem where the
distributively compressed images are decoded together in order to take benefit from the image
correlation. We consider a scenario where the images captured at different viewpoints are encoded independently using common coding solutions (e.g., JPEG) with a balanced rate distribution among
different cameras. A central decoder first estimates the inter-view image correlation from the
independently compressed data. The joint reconstruction is then cast as a constrained convex optimization
problem that reconstructs total-variation (TV) smooth images, which comply with the estimated correlation model. At the same time, we add constraints that force the reconstructed images to be as close
as possible to their compressed versions. We show through experiments that the proposed joint
reconstruction scheme outperforms independent reconstruction in terms of image quality, for a given
target bit rate. In addition, the decoding performance of our algorithm compares advantageously to state-
of-the-art distributed coding schemes based on motion learning and on the DISCOVER algorithm.
ETPL
DIP-172 Scalable Coding of Depth Maps With R-D Optimized Embedding
Abstract: Recent work on depth map compression has revealed the importance of incorporating a
description of discontinuity boundary geometry into the compression scheme. We propose a novel
compression strategy for depth maps that incorporates geometry information while achieving the goals of
scalability and embedded representation. Our scheme involves two separate image pyramid structures, one for breakpoints and the other for sub-band samples produced by a breakpoint-adaptive transform.
Breakpoints capture geometric attributes, and are amenable to scalable coding. We develop a rate-
distortion optimization framework for determining the presence and precision of breakpoints in the pyramid representation. We employ a variation of the EBCOT scheme to produce embedded bit-streams
for both the breakpoint and sub-band data. Compared to JPEG 2000, our proposed scheme enables the
same scalability features while achieving substantially improved rate-distortion performance at the higher bit-rate range and comparable performance at the lower rates.
ETPL
DIP-173 Automatic Virus Particle Selection—The Entropy Approach
Abstract: This paper describes a fully automatic approach to locate icosahedral virus particles in
transmission electron microscopy images. The initial detection of the particles takes place through automatic segmentation of the entropy-proportion image; this image is computed in particular regions of
interest defined by two concentric structuring elements contained in a small overlapping window running
over all the image. Morphological features help to select the candidates, as the threshold is kept low enough to avoid false negatives. The candidate points are subject to a credibility test based on features
extracted from eight radial intensity profiles in each point from a texture image. A candidate is accepted
if these features meet the set of acceptance conditions describing the typical intensity profiles of these
kinds of particles. The accepted points are subjected to a final validation in a three-parameter space, using a discrimination plane that is a function of the input image to separate possible outliers.
ETPL
DIP-174
A Tuned Mesh-Generation Strategy for Image Representation Based on Data-
Dependent Triangulation
Abstract: A mesh-generation framework for image representation based on data-dependent triangulation is proposed. The proposed framework is a modified version of the frameworks of Rippa and Garland and
Heckbert that facilitates the development of more effective mesh-generation methods. As the proposed
framework has several free parameters, the effects of different choices of these parameters on mesh quality are studied, leading to the recommendation of a particular set of choices for these parameters. A
mesh-generation method is then introduced that employs the proposed framework with these best
parameter choices. This method is demonstrated to produce meshes of higher quality (both in terms of
squared error and subjectively) than those generated by several competing approaches, at a relatively modest computational and memory cost.
ETPL
DIP-175 Accelerated Edge-Preserving Image Restoration Without Boundary Artifacts
Abstract: To reduce blur in noisy images, regularized image restoration methods have been proposed that use nonquadratic regularizers (like l1 regularization or total-variation) that suppress noise while
preserving edges in the image. Most of these methods assume a circulant blur (periodic convolution with
a blurring kernel) that can lead to wraparound artifacts along the boundaries of the image due to the
implied periodicity of the circulant model. Using a noncirculant model could prevent these artifacts at the cost of increased computational complexity. In this paper, we propose to use a circulant blur model
combined with a masking operator that prevents wraparound artifacts. The resulting model is
noncirculant, so we propose an efficient algorithm using variable splitting and augmented Lagrangian
(AL) strategies. Our variable splitting scheme, when combined with the AL framework and alternating
minimization, leads to simple linear systems that can be solved noniteratively using fast Fourier transforms (FFTs), eliminating the need for more expensive conjugate gradient-type solvers. The
proposed method can also efficiently tackle a variety of convex regularizers, including edge-preserving
(e.g., total-variation) and sparsity promoting (e.g., l1-norm) regularizers. Simulation results show fast convergence of the proposed method, along with improved image quality at the boundaries where the
circulant model is inaccurate.
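The key computational claim above is that, after variable splitting, the AL inner subproblems reduce to circulant linear systems that FFTs solve noniteratively. A minimal sketch of such a solve, for a hypothetical subproblem of the form (HᵀH + μI)x = r with H a circular convolution, might look like:

```python
import numpy as np

def solve_circulant_quadratic(psf, rhs, mu):
    """Solve (H^T H + mu*I) x = rhs in closed form with FFTs, where H is
    circular convolution with `psf`. This is the kind of inner linear solve
    a variable-splitting / augmented-Lagrangian scheme reduces to."""
    H = np.fft.fft2(psf, s=rhs.shape)
    # The normal operator H^T H is diagonalized by the DFT with
    # eigenvalues |H|^2, so the solve is a pointwise division.
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / (np.abs(H) ** 2 + mu)))
```

The masking operator described in the abstract breaks exact circulance, which is why the full method wraps solves like this one inside an alternating-minimization loop rather than using a single direct solve.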
ETPL
DIP-176
Box Relaxation Schemes in Staggered Discretizations for the Dual Formulation of
Total Variation Minimization
Abstract: In this paper, we propose some new box relaxation numerical schemes on staggered grids to solve the stationary system of partial differential equations arising from the dual minimization problem
associated with the total variation operator. We present in detail the numerical schemes for the scalar case
and its generalization to multichannel (vectorial) images. Then, we discuss their implementation in digital image denoising. The results outperform solving the dual equation by gradient descent and pave the way for more advanced numerical strategies.
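For context, the gradient-descent resolution of the dual TV problem that the proposed box relaxation schemes are compared against can be sketched as a Chambolle-style projected gradient iteration on the dual field; the parameter values below are illustrative:

```python
import numpy as np

def tv_denoise_dual(f, lam=0.3, tau=0.125, n_iter=200):
    """Projected gradient on the dual TV denoising problem
    min_{|p|<=1} ||div p - f/lam||^2, with periodic boundaries.
    The primal image is recovered as u = f - lam * div p."""
    p = np.zeros((2,) + f.shape)                 # dual vector field
    for _ in range(n_iter):
        div_p = (p[0] - np.roll(p[0], 1, 0)) + (p[1] - np.roll(p[1], 1, 1))
        r = div_p - f / lam
        p[0] += tau * (np.roll(r, -1, 0) - r)    # gradient step on dual energy
        p[1] += tau * (np.roll(r, -1, 1) - r)
        mag = np.maximum(1.0, np.sqrt(p[0] ** 2 + p[1] ** 2))
        p /= mag                                 # project onto |p| <= 1
    div_p = (p[0] - np.roll(p[0], 1, 0)) + (p[1] - np.roll(p[1], 1, 1))
    return f - lam * div_p
```

The step size tau = 1/8 respects the norm of the discrete gradient-divergence operator; it is this slowly converging iteration that staggered-grid relaxation schemes aim to beat.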
ETPL
DIP-177 Constrained Optical Flow Estimation as a Matching Problem
Abstract: In general, discretization in the motion vector domain yields an intractable number of labels. In this paper, we propose an approach that can reduce general optical flow to the constrained matching
problem by pre-estimating a 2-D disparity labeling map of the desired discrete motion vector function.
One goal of this paper is to estimate the coarse distribution of motion vectors and then use this distribution as a global constraint for discrete optical flow estimation. This pre-estimation is
done with a simple frame-to-frame correlation technique also known as the digital symmetric phase-only filter (SPOF). We discover a strong correlation between the output of the SPOF and the motion vector
distribution of the related optical flow. A two step matching paradigm for optical flow estimation is applied: pixel accuracy (integer flow) and subpixel accuracy estimation. The matching problem is solved
by global optimization. Experiments on the Middlebury optical flow datasets confirm our intuitive
assumptions about the strong correlation between the motion vector distribution of optical flow and the maximal peaks of SPOF outputs. The overall performance of the proposed method is promising and achieves state-of-the-art results on the Middlebury benchmark.
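A minimal sketch of the frame-to-frame correlation idea, using plain phase-only correlation (a close relative of the SPOF named above, not the paper's exact filter), estimates the dominant integer translation between two frames:

```python
import numpy as np

def phase_correlate(f, g, eps=1e-12):
    """Estimate the dominant integer translation mapping g to f via
    phase-only correlation: whiten the cross-power spectrum and locate
    the resulting correlation peak."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    R = F * np.conj(G)
    R /= np.abs(R) + eps                         # keep only phase information
    corr = np.real(np.fft.ifft2(R))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts (periodic convention).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))
```

In the paper's setting, not just the single maximal peak but the distribution of such peaks constrains the label set of the subsequent discrete matching problem.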
ETPL
DIP-178 Nonseparable Shearlet Transform
Abstract: Over the past few years, various representation systems which sparsely approximate functions
governed by anisotropic features, such as edges in images, have been proposed. Alongside the theoretical
development of these systems, algorithmic realizations of the associated transforms are provided.
However, one of the most common shortcomings of these frameworks is the lack of providing a unified treatment of the continuum and digital world, i.e., allowing a digital theory to be a natural digitization of
the continuum theory. In this paper, we introduce a new shearlet transform associated with a nonseparable
shearlet generator, which improves the directional selectivity of previous shearlet transforms. Our approach is based on a discrete framework, which allows a faithful digitization of the continuum domain
directional transform based on compactly supported shearlets introduced as means to sparsely encode
anisotropic singularities of multivariate data. We show numerical experiments demonstrating the
potential of our new shearlet transform in 2D and 3D image processing applications.
ETPL
DIP-179
Modeling and Classifying Human Activities From Trajectories Using a Class of Space-
Varying Parametric Motion Fields
Abstract: Many approaches to trajectory analysis, such as clustering or classification, use probabilistic
generative models, thus not requiring trajectory alignment/registration. Switched linear dynamical models
(e.g., HMMs) have been used in this context, due to their ability to describe different motion regimes.
However, these models are not suitable for handling space-dependent dynamics that are more naturally captured by nonlinear models. As is well known, these are more difficult to identify. In this paper, we
propose a new way of modeling trajectories, based on a mixture of parametric motion vector fields that
depend on a small number of parameters. Switching among these fields follows a probabilistic mechanism, characterized by a field of stochastic matrices. This approach allows representing a wide
variety of trajectories and modeling space-dependent behaviors without using global nonlinear dynamical
models. Experimental evaluation is conducted in both synthetic and real scenarios, the latter concerning human trajectory modeling for activity classification, a central task in video surveillance.
ETPL
DIP-180 Real-Time Continuous Image Registration Enabling Ultraprecise 2-D Motion Tracking
Abstract: In this paper, we present a novel continuous image registration method (CIRM), which yields
near-zero bias and has high computational efficiency. It can be realized for real-time position estimation to enable ultraprecise 2-D motion tracking and motion control over a large motion range. As the two
variables of the method are continuous in spatial domain, pixel-level image registration is unnecessary,
thus the CIRM can continuously track the moving target according to the incoming target image. When
applied to a specific target object, measurement resolution of the method is predicted according to the reference image model of the object along with the variance of the camera's overall image noise. The
maximum permissible target speed is proportional to the permissible frame rate, which is limited by the
required computational time. The precision, measurement resolution, and computational efficiency of the method are verified through computer simulations and experiments. Specifically, the CIRM is
implemented and integrated with a visual sensing system. Near-zero bias, measurement resolution of 0.1
nm (0.0008 pixels), and measurement of one nanometer stepping are demonstrated.
ETPL
DIP-181
Unified Blind Method for Multi-Image Super-Resolution and Single/Multi-Image Blur
Deconvolution
Abstract: This paper presents, for the first time, a unified blind method for multi-image super-resolution
(MISR or SR), single-image blur deconvolution (SIBD), and multi-image blur deconvolution (MIBD) of
low-resolution (LR) images degraded by linear space-invariant (LSI) blur, aliasing, and additive white Gaussian noise (AWGN). The proposed approach is based on alternating minimization (AM) of a new
cost function with respect to the unknown high-resolution (HR) image and blurs. The regularization term
for the HR image is based upon the Huber-Markov random field (HMRF) model, which is a type of variational integral that exploits the piecewise smooth nature of the HR image. The blur estimation
process is supported by an edge-emphasizing smoothing operation, which improves the quality of blur
estimates by enhancing strong soft edges toward step edges, while filtering out weak structures. The
parameters are updated gradually so that the number of salient edges used for blur estimation increases at each iteration. For better performance, the blur estimation is done in the filter domain rather than the pixel
domain, i.e., using the gradients of the LR and HR images. The regularization term for the blur is
Gaussian (L2 norm), which allows for fast noniterative optimization in the frequency domain. We accelerate the processing time of SR reconstruction by separating the upsampling and registration
processes from the optimization procedure. Simulation results on both synthetic and real-life images
(from a novel computational imager) confirm the robustness and effectiveness of the proposed method.
ETPL
DIP-182 Informative State-Based Video Communication
Abstract: We study state-based video communication where a client simultaneously informs the server
about the presence status of various packets in its buffer. In sender-driven transmission, the client
periodically sends to the server a single acknowledgement packet that provides information about all
packets that have arrived at the client by the time the acknowledgment is sent. In receiver-driven
streaming, the client periodically sends to the server a single request packet that comprises a transmission
schedule for sending missing data to the client over a horizon of time. We develop a comprehensive optimization framework that enables computing packet transmission decisions that maximize the end-to-
end video quality for the given bandwidth resources, in both prospective scenarios. The core step of the
optimization comprises computing the probability that a single packet will be communicated in error as a function of the expected transmission redundancy (or cost) used to communicate the packet. Through
comprehensive simulation experiments, we carefully examine the performance advances that our
framework enables relative to state-of-the-art scheduling systems that employ regular acknowledgement
or request packets. Consistent gains in video quality of up to 2 dB are demonstrated across a variety of content types. We show that there is a direct analogy between the error-cost efficiency of streaming a
single packet and the overall rate-distortion performance of streaming the whole content. In the case of
sender-driven transmission, we develop an effective modeling approach that accurately characterizes the end-to-end performance as a function of the packet loss rate on the backward channel and the source
encoding characteristics.
ETPL
DIP-183
Quantification of Smoothing Requirement for 3D Optic Flow Calculation of
Volumetric Images
Abstract: Complexities of dynamic volumetric imaging challenge the available computer vision techniques on a number of different fronts. This paper examines the relationship between the estimation
accuracy and required amount of smoothness for a general solution from a robust statistics perspective.
We show that a (surprisingly) small amount of local smoothing is required to satisfy both the necessary and sufficient conditions for accurate optic flow estimation. This notion is called “just enough”
smoothing, and its proper implementation has a profound effect on the preservation of local information
in processing 3D dynamic scans. To demonstrate the effect of “just enough” smoothing, a robust 3D optic flow method with quantized local smoothing is presented, and the effect of local smoothing on the
accuracy of motion estimation in dynamic lung CT images is examined using both synthetic and real
image sequences with ground truth.
ETPL
DIP-184 Analysis Operator Learning and its Application to Image Reconstruction
Abstract: Exploiting a priori known structural information lies at the core of many image reconstruction
methods that can be stated as inverse problems. The synthesis model, which assumes that images can be
decomposed into a linear combination of very few atoms of some dictionary, is now a well established tool for the design of image reconstruction algorithms. An interesting alternative is the analysis model,
where the signal is multiplied by an analysis operator and the outcome is assumed to be sparse. This
approach has only recently gained increasing interest. The quality of reconstruction methods based on an
analysis model severely depends on the right choice of the suitable operator. In this paper, we present an algorithm for learning an analysis operator from training images. Our method is based on lp-norm
minimization on the set of full rank matrices with normalized columns. We carefully introduce the
employed conjugate gradient method on manifolds, and explain the underlying geometry of the constraints. Moreover, we compare our approach to state-of-the-art methods for image denoising,
inpainting, and single image super-resolution. Our numerical results show competitive performance of
our general approach in all presented applications compared to the specialized state-of-the-art techniques.
ETPL
DIP-185 Computational Model of Stereoscopic 3D Visual Saliency
Abstract: Many computational models of visual attention performing well in predicting salient areas of
2D images have been proposed in the literature. The emerging applications of stereoscopic 3D display
bring an additional depth of information affecting the human viewing behavior, and require extensions of
the efforts made in 2D visual modeling. In this paper, we propose a new computational model of visual
attention for stereoscopic 3D still images. Apart from detecting salient areas based on 2D visual features,
the proposed model takes depth as an additional visual dimension. The measure of depth saliency is derived from the eye movement data obtained from an eye-tracking experiment using synthetic stimuli.
Two different ways of integrating depth information in the modeling of 3D visual attention are then
proposed and examined. For the performance evaluation of 3D visual attention models, we have created an eye-tracking database, which contains stereoscopic images of natural content and is publicly available,
along with this paper. The proposed model gives a good performance, compared to that of state-of-the-art
2D models on 2D images. The results also suggest that a better performance is obtained when depth
information is taken into account through the creation of a depth saliency map, rather than when it is integrated by a weighting method.
ETPL
DIP-186 In-Plane Rotation and Scale Invariant Clustering Using Dictionaries
Abstract: In this paper, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries and clusters images in the radon transform
domain. The main feature of the proposed approach is that it provides both in-plane rotation and scale
invariant clustering, which is useful in numerous applications, including content-based image retrieval
(CBIR). We demonstrate the effectiveness of our rotation and scale invariant clustering method on a series of CBIR experiments. Experiments are performed on the Smithsonian isolated leaf, Kimia shape,
and Brodatz texture datasets. Our method provides both good retrieval performance and greater
robustness compared to standard Gabor-based and three state-of-the-art shape-based methods that have similar objectives.
ETPL
DIP-187 General Framework to Histogram-Shifting-Based Reversible Data Hiding
Abstract: Histogram shifting (HS) is a useful technique of reversible data hiding (RDH). With HS-based
RDH, high capacity and low distortion can be achieved efficiently. In this paper, we revisit the HS technique and present a general framework to construct HS-based RDH. By the proposed framework, one
can obtain an RDH algorithm simply by designing the so-called shifting and embedding functions. Moreover,
by taking specific shifting and embedding functions, we show that several RDH algorithms reported in the literature are special cases of this general construction. In addition, two novel and efficient RDH
algorithms are also introduced to further demonstrate the universality and applicability of our framework.
It is expected that more efficient RDH algorithms can be devised according to the proposed framework by carefully designing the shifting and embedding functions.
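To make the shifting-and-embedding idea concrete, here is a toy histogram-shifting embed/extract pair. It assumes an empty bin exists above the histogram peak and handles only the simplest single-peak grayscale case, so it illustrates the framework's ingredients rather than reproducing the paper's algorithms:

```python
import numpy as np

def hs_embed(img, bits):
    """Toy histogram-shifting embed: shift the bins between the peak and an
    empty bin up by one to free bin peak+1, then encode one bit per
    peak-valued pixel (bit 0 -> peak, bit 1 -> peak+1)."""
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(np.argmax(hist))
    zero = peak + 1 + int(np.argmin(hist[peak + 1:]))  # assumed-empty bin
    out = img.astype(np.int32).copy()
    out[(out > peak) & (out < zero)] += 1              # the shifting function
    carriers = np.flatnonzero(img.ravel() == peak)
    flat = out.ravel()
    for idx, bit in zip(carriers, bits):               # the embedding function
        flat[idx] = peak + bit
    return flat.reshape(img.shape).astype(np.uint8), peak, zero

def hs_extract(stego, peak, zero):
    """Recover the bits and invert the shift exactly (reversibility)."""
    flat = stego.astype(np.int32).ravel()
    bits = [v - peak for v in flat if v in (peak, peak + 1)]
    rec = flat.copy()
    rec[(rec > peak) & (rec <= zero)] -= 1             # undo shift and embedding
    return np.array(bits), rec.reshape(stego.shape).astype(np.uint8)
```

Capacity equals the peak-bin count, and distortion is at most one gray level per shifted pixel, which is why HS-based RDH achieves high capacity at low distortion.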
ETPL
DIP-188
Computationally Tractable Stochastic Image Modeling Based on Symmetric Markov
Mesh Random Fields
Abstract: In this paper, the properties of a new class of causal Markov random fields, named symmetric
Markov mesh random field, are initially discussed. It is shown that the symmetric Markov mesh random fields from the upper corners are equivalent to the symmetric Markov mesh random fields from the lower
corners. Based on this new random field, a symmetric, corner-independent, and isotropic image model is
then derived which incorporates the dependency of a pixel on all its neighbors. The introduced image model comprises the product of several local 1D density and 2D joint density functions of pixels in an
image thus making it computationally tractable and practically feasible by allowing the use of histogram
and joint histogram approximations to estimate the model parameters. An image restoration application is
also presented to confirm the effectiveness of the model developed. The experimental results demonstrate that this new model provides an improved tool for image modeling purposes compared to the
conventional Markov random field models.
ETPL
DIP-189 Robust Ellipse Fitting Based on Sparse Combination of Data Points
Abstract: Ellipse fitting is widely applied in the fields of computer vision and automatic industry control, in which the procedure of ellipse fitting often follows the preprocessing step of edge detection in the
original image. The quality of an ellipse fit therefore depends on the accuracy of edge detection as well as on the fitting method itself; outliers and edge-point errors introduced by edge detection can cause severe performance degradation. In this paper, we develop a robust ellipse fitting method to alleviate the influence of outliers. The proposed algorithm solves ellipse parameters by
linearly combining a subset of (“more accurate”) data points (formed from edge points) rather than all
data points (which contain possible outliers). In addition, because squaring the fitting residuals magnifies the contribution of extreme data points, our algorithm uses absolute residuals instead of squared
residuals to reduce this influence. Moreover, the norm of data point errors is bounded, and the worst case
performance optimization is formed to be robust against data point errors. The resulting mixed l1-l2 optimization problem is further derived as a second-order cone programming one and solved by the
computationally efficient interior-point methods. Note that the fitting approach developed in this paper
specifically deals with the overdetermined system, whereas the current sparse representation theory is
only applied to underdetermined systems. Therefore, the proposed algorithm can be looked upon as an extended application and development of the sparse representation theory. Some simulated and
experimental examples are presented to illustrate the effectiveness of the proposed ellipse fitting
approach.
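For reference, the non-robust least-squares baseline that such robust formulations improve on is the plain algebraic conic fit; a sketch (not the paper's l1/SOCP method) follows:

```python
import numpy as np

def fit_conic_lstsq(x, y):
    """Plain least-squares algebraic conic fit
    a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0.
    The unit-norm coefficient vector minimizing ||D w|| is the right
    singular vector of the design matrix D with smallest singular value."""
    D = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]
```

Because every residual is squared, a single gross outlier can dominate this fit, which is precisely the failure mode the absolute-residual, worst-case-optimized formulation above is designed to avoid.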
ETPL
DIP-190 Learning Dynamic Hybrid Markov Random Field for Image Labeling
Abstract: Using shape information has attracted increasing attention in the task of image labeling. In this
paper, we present a dynamic hybrid Markov random field (DHMRF), which explicitly captures middle-
level object shape and low-level visual appearance (e.g., texture and color) for image labeling. Each node in DHMRF is described by either a deformable template or an appearance model as visual prototype. On
the other hand, the edges encode two types of interactions: co-occurrence and spatially layered context,
with respect to the labels and prototypes of connected nodes. To learn the DHMRF model, an iterative algorithm is designed to automatically select the most informative features and estimate model
parameters. The algorithm achieves high computational efficiency since a branch-and-bound schema is
introduced to estimate model parameters. Compared with previous methods, which usually employ implicit shape cues, our DHMRF model seamlessly integrates color, texture, and shape cues to inference
labeling output, and thus produces more accurate and reliable results. Extensive experiments validate its
superiority over other state-of-the-art methods in terms of recognition accuracy and implementation
efficiency on the MSRC 21-class dataset and the Lotus Hill Institute 15-class dataset.
ETPL
DIP-191
Coupled Variational Image Decomposition and Restoration Model for Blurred
Cartoon-Plus-Texture Images With Missing Pixels
Abstract: In this paper, we develop a decomposition model to restore blurred images with missing pixels.
Our assumption is that the underlying image is the superposition of cartoon and texture components. We use the total variation norm and its dual norm to regularize the cartoon and texture, respectively. We
propose an efficient numerical algorithm based on a splitting version of the augmented Lagrangian
method to solve the problem. Theoretically, the existence of a minimizer to the energy function and the
convergence of the algorithm are guaranteed. In contrast to recently developed methods for deblurring images, the proposed algorithm not only gives the restored image, but also gives a decomposition of
cartoon and texture parts. These two parts can be further used in segmentation and inpainting problems.
Numerical comparisons between this algorithm and some state-of-the-art methods are also reported.
ETPL
DIP-192
Perceptual Quality-Regulable Video Coding System With Region-Based Rate Control
Scheme
Abstract: In this paper, we discuss a region-based perceptual quality-regulable H.264 video encoder
system that we developed. The ability to adjust the quality of specific regions of a source video to a
predefined level of quality is an essential technique for region-based video applications. We use the structural similarity index as the quality metric for distortion-quantization modeling and develop a bit
allocation and rate control scheme for enhancing regional perceptual quality. Exploiting the relationship
between the reconstructed macroblock and the best predicted macroblock from mode decision, a novel quantization parameter prediction method is built and used to achieve the target video quality of the
processed macroblock. Experimental results show that the system model has only 0.013 quality error on average. Moreover, the proposed region-based rate control system can encode video well under a bitrate constraint, with a 0.1% bitrate error on average. Under a low bitrate constraint, the proposed system can encode video with a 0.5% bit error rate on average while enhancing the quality of the target regions.
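The structural similarity index used above as the quality metric can be illustrated with a minimal single-window implementation over whole-image statistics. This is a simplification for illustration (the encoder applies the metric per region/macroblock); the constants C1 and C2 follow the commonly used SSIM defaults.

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Single-window SSIM computed from whole-image statistics.
    The full metric averages local windows; this global version only
    illustrates the luminance/contrast/structure comparison terms."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + C1) * (2*cxy + C2)) / ((mx*mx + my*my + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(32, 32))
noisy = np.clip(img + rng.normal(0, 25, size=(32, 32)), 0, 255)
score_same = ssim_global(img, img)     # identical images score 1
score_noisy = ssim_global(img, noisy)  # degraded image scores below 1
```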
ETPL
DIP-193 Color and Depth Priors in Natural Images
Abstract: Natural scene statistics have played an increasingly important role in both our understanding of
the function and evolution of the human vision system, and in the development of modern image
processing applications. Because range (egocentric distance) is arguably the most important thing a visual
system must compute (from an evolutionary perspective), the joint statistics between image information (color and luminance) and range information are of particular interest. It seems obvious that where there
is a depth discontinuity, there must be a higher probability of a brightness or color discontinuity too. This
is true, but the more interesting case is in the other direction - because image information is much more easily computed than range information, the key conditional probabilities are those of finding a range
discontinuity given an image discontinuity. Here, the intuition is much weaker; the plethora of shadows
and textures in the natural environment imply that many image discontinuities must exist without corresponding changes in range. In this paper, we extend previous work in two ways: we use as our
starting point a very high quality data set of co-registered color and range values collected specifically for
this purpose, and we evaluate the statistics of perceptually relevant chromatic information in addition to
luminance, range, and binocular disparity information. The most fundamental finding is that the probabilities of finding range changes do in fact depend in a useful and systematic way on color and
luminance changes; larger range changes are associated with larger image changes. Second, we are able
to parametrically model the prior marginal and conditional distributions of luminance, color, range, and (computed) binocular disparity. Finally, we provide a proof of principle that this information is useful by
showing that our distribution models improve the performance of a Bayesian stereo algorithm on an
independent set of input images. To summarize, we show that there is useful information about range in
very low-level luminance and color information. To a system sensitive to this statistical information, it amounts to an additional (and only recently appreciated) depth cue, and one that is trivial to compute
from the image data. We are confident that this information is robust, in that we go to great effort and
expense to collect very high quality raw data. Finally, we demonstrate the practical utility of these findings by using them to improve the performance of a Bayesian stereo algorithm.
ETPL
DIP-194 Sparse Image Reconstruction on the Sphere: Implications of a New Sampling Theorem
Abstract: We study the impact of sampling theorems on the fidelity of sparse image reconstruction on the sphere. We discuss how a reduction in the number of samples required to represent all information
content of a band-limited signal acts to improve the fidelity of sparse image reconstruction, through both
the dimensionality and sparsity of signals. To demonstrate this result, we consider a simple inpainting
problem on the sphere and consider images sparse in the magnitude of their gradient. We develop a framework for total variation inpainting on the sphere, including fast methods to render the inpainting
problem computationally feasible at high resolution. Recently a new sampling theorem on the sphere was
developed, reducing the required number of samples by a factor of two for equiangular sampling
schemes. Through numerical simulations, we verify the enhanced fidelity of sparse image reconstruction due to the more efficient sampling of the sphere provided by the new sampling theorem.
ETPL
DIP-195 Log-Gabor Filters for Image-Based Vehicle Verification
Abstract: Vehicle detection based on image analysis has attracted increasing attention in recent years due
to its low cost, flexibility, and potential toward collision avoidance. In particular, vehicle verification is especially challenging on account of the heterogeneity of vehicles in color, size, pose, etc. Image-based
vehicle verification is usually addressed as a supervised classification problem. Specifically, descriptors
using Gabor filters have been reported to show good performance in this task. However, Gabor functions have a number of drawbacks relating to their frequency response. The main contribution of this paper is
the proposal and evaluation of a new descriptor based on the alternative family of log-Gabor functions for
vehicle verification, as opposed to existing Gabor filter-based descriptors. These filters are theoretically superior to Gabor filters as they can better represent the frequency properties of natural images. As a
second contribution, and in contrast to existing approaches, which transfer the standard configuration of
filters used for other applications to the vehicle classification task, an in-depth analysis of the required
filter configuration by both Gabor and log-Gabor descriptors for this particular application is performed for fair comparison. The extensive experiments conducted in this paper confirm that the proposed log-
Gabor descriptor significantly outperforms the standard Gabor filter for image-based vehicle verification.
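The frequency-response advantage of log-Gabor functions over Gabor functions can be seen directly in the radial transfer function: it is Gaussian on a logarithmic frequency axis and is exactly zero at DC. A minimal NumPy sketch follows; the bandwidth ratio value is an illustrative default, not a parameter taken from the paper.

```python
import numpy as np

def log_gabor_radial(f, f0, sigma_ratio=0.65):
    """Radial log-Gabor transfer function
        G(f) = exp(-(ln(f/f0))^2 / (2 (ln sigma_ratio)^2)).
    Unlike a Gabor filter, G(0) = 0 by construction (no DC component),
    one of the frequency-response advantages log-Gabor descriptors exploit.
    sigma_ratio controls the bandwidth (illustrative default)."""
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    nz = f > 0
    out[nz] = np.exp(-np.log(f[nz] / f0) ** 2 / (2.0 * np.log(sigma_ratio) ** 2))
    return out

freqs = np.linspace(0.0, 0.5, 501)       # normalized spatial frequencies
response = log_gabor_radial(freqs, f0=0.1)
```

A 2-D filter bank would multiply this radial profile by an angular Gaussian at several orientations and scales.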
ETPL
DIP-196 Scene Text Detection via Connected Component Clustering and Nontext Filtering
Abstract: In this paper, we present a new scene text detection algorithm based on two machine learning
classifiers: one allows us to generate candidate word regions and the other filters out nontext ones. To be
precise, we extract connected components (CCs) in images by using the maximally stable extremal region
algorithm. These extracted CCs are partitioned into clusters so that we can generate candidate regions. Unlike conventional methods relying on heuristic rules in clustering, we train an AdaBoost classifier that
determines the adjacency relationship and clusters CCs by using their pairwise relations. Then we
normalize candidate word regions and determine whether each region contains text or not. Since the scale, skew, and color of each candidate can be estimated from CCs, we develop a text/nontext classifier
for normalized images. This classifier is based on multilayer perceptrons and we can control recall and
precision rates with a single free parameter. Finally, we extend our approach to exploit multichannel information. Experimental results on ICDAR 2005 and 2011 robust reading competition datasets show
that our method yields the state-of-the-art performance both in speed and accuracy.
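The clustering stage can be pictured with a small union-find sketch: given pairwise "belongs together" decisions (produced in the paper by the trained AdaBoost classifier, but supplied here as plain input pairs), connected components are merged into candidate word regions. Names are illustrative.

```python
def cluster_components(n, merge_pairs):
    """Group n connected components into candidate regions from pairwise
    'same cluster' decisions, via union-find. The decisions stand in for
    the output of a trained pairwise classifier."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in merge_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

# CCs 0-2 judged to belong to one word, CCs 3-4 to another
clusters = cluster_components(5, [(0, 1), (1, 2), (3, 4)])
```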
ETPL
DIP-197 A Robust Method for Rotation Estimation Using Spherical Harmonics Representation
Abstract: This paper presents a robust method for 3D object rotation estimation using spherical harmonics
representation and the unit quaternion vector. The proposed method provides a closed-form solution for
rotation estimation without recurrence relations or searching for point correspondences between two objects. The rotation estimation problem is cast as a minimization problem, which finds the optimum
rotation angles between two objects of interest in the frequency domain. The optimum rotation angles are
obtained by calculating the unit quaternion vector from a symmetric matrix, which is constructed from
the two sets of spherical harmonics coefficients, using an eigendecomposition technique. Our experimental results on hundreds of 3D objects show that the proposed method is very accurate in rotation estimation, robust to noisy data and missing surface points, and able to handle intra-class variability between 3D objects.
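The "unit quaternion from the leading eigenvector of a symmetric 4x4 matrix" idea can be sketched with the classical point-based variant (Horn's absolute-orientation method). The paper builds its symmetric matrix from spherical harmonics coefficients instead of point correspondences; this correspondence-based version is only an illustration of the closed-form eigendecomposition step.

```python
import numpy as np

def rotation_from_quaternion(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def estimate_rotation(P, Q):
    """Closed-form rotation between point sets P and Q (rows are points,
    Q ~ P @ R.T): the optimal unit quaternion is the eigenvector of the
    largest eigenvalue of a symmetric 4x4 matrix (Horn's method)."""
    S = P.T @ Q
    N = np.array([
        [S[0,0]+S[1,1]+S[2,2], S[1,2]-S[2,1],        S[2,0]-S[0,2],        S[0,1]-S[1,0]],
        [S[1,2]-S[2,1],        S[0,0]-S[1,1]-S[2,2], S[0,1]+S[1,0],        S[2,0]+S[0,2]],
        [S[2,0]-S[0,2],        S[0,1]+S[1,0],        S[1,1]-S[0,0]-S[2,2], S[1,2]+S[2,1]],
        [S[0,1]-S[1,0],        S[2,0]+S[0,2],        S[1,2]+S[2,1],        S[2,2]-S[0,0]-S[1,1]],
    ])
    _, V = np.linalg.eigh(N)            # eigenvalues in ascending order
    return rotation_from_quaternion(V[:, -1])

theta = np.pi / 3                        # ground-truth rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
P = np.random.default_rng(1).standard_normal((20, 3))
R_est = estimate_rotation(P, P @ R_true.T)
```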
ETPL
DIP-198 Synthetic Aperture Radar Autofocus via Semidefinite Relaxation
Abstract: The autofocus problem in synthetic aperture radar imaging amounts to estimating unknown phase errors caused by unknown platform or target motion. At the heart of three state-of-the-art autofocus
algorithms, namely, phase gradient autofocus, multichannel autofocus (MCA), and Fourier-domain
multichannel autofocus (FMCA), is the solution of a constant modulus quadratic program (CMQP).
Currently, these algorithms solve a CMQP by using an eigenvalue relaxation approach. We propose an alternative relaxation approach based on semidefinite programming, which has recently attracted
considerable attention in other signal processing problems. Experimental results show that the proposed methods provide promising performance improvements for MCA and FMCA, at the cost of increased computational complexity.
ETPL
DIP-199
Regional Spatially Adaptive Total Variation Super-Resolution With Spatial
Information Filtering and Clustering
Abstract: Total variation is used as a popular and effective image prior model in the regularization-based
image processing fields. However, as the total variation model favors a piecewise constant solution, the
processing result under high noise intensity in the flat regions of the image is often poor, and some pseudoedges are produced. In this paper, we develop a regional spatially adaptive total variation model.
Initially, the spatial information is extracted at each pixel, and then two filtering processes are
added to suppress the effect of pseudoedges. In addition, the spatial information weight is constructed and classified with k-means clustering, and the regularization strength in each region is controlled by the
clustering center value. The experimental results, on both simulated and real datasets, show that the
proposed approach can effectively reduce the pseudoedges of the total variation regularization in the flat
regions, and maintain the partial smoothness of the high-resolution image. More importantly, compared with the traditional pixel-based spatial information adaptive approach, the proposed region-based spatial
information adaptive total variation model can better avoid the effect of noise on the spatial information
extraction, and maintains robustness with changes in the noise intensity in the super-resolution process.
ETPL
DIP-200
Detecting, Grouping, and Structure Inference for Invariant Repetitive Patterns in
Images
Abstract: The efficient and robust extraction of invariant patterns from an image is a long-standing
problem in computer vision. Invariant structures are often related to repetitive or near-repetitive patterns.
The perception of repetitive patterns in an image is strongly linked to the visual interpretation and composition of textures. Repetitive patterns are products of both repetitive structures as well as repetitive
reflections or color patterns. In other words, patterns that exhibit near-stationary behavior provide rich
information about objects, their shapes, and their texture in an image. In this paper, we propose a new algorithm for repetitive pattern detection and grouping. The algorithm follows the classical region
growing image segmentation scheme. It utilizes a mean-shift-like dynamic to group local image patches
into clusters. It exploits a continuous joint alignment to: 1) match similar patches, and 2) refine the subspace grouping. We also propose an algorithm for inferring the composition structure of the repetitive
patterns. The inference algorithm constructs a data-driven structural completion field, which merges the
detected repetitive patterns into specific global geometric structures. The result of higher level grouping
for image patterns can be used to infer the geometry of objects and estimate the general layout of a crowded scene.
ETPL
DIP-201 Compressive Framework for Demosaicing of Natural Images
Abstract: Typical consumer digital cameras sense only one out of three color components per image pixel. The problem of demosaicing deals with interpolating those missing color components. In this
paper, we present compressive demosaicing (CD), a framework for demosaicing natural images based on
the theory of compressed sensing (CS). Given sensed samples of an image, CD employs a CS solver to
find the sparse representation of that image under a fixed sparsifying dictionary Ψ. As opposed to state-of-the-art CS-based demosaicing approaches, we consider a clear distinction between the interchannel (color) and interpixel correlations of natural images. Drawing on well-known facts about the human visual system, these two types of correlations are exploited in a nonseparable format to construct the sparsifying transform Ψ. Our simulation results verify that CD performs better (both visually and in terms
of PSNR) than leading demosaicing approaches when applied to the majority of standard test images.
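The "one of three color components per pixel" measurement can be simulated with a Bayer mosaic operator, the forward model that any demosaicing method (CS-based or otherwise) must invert. The RGGB layout below is a common sensor pattern chosen for illustration; the paper does not prescribe this particular arrangement.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Simulate RGGB Bayer sampling: keep exactly one of the three color
    components per pixel. This is the measurement operator that
    demosaicing must invert (illustrative layout)."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green sites (even rows)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green sites (odd rows)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue sites
    return mosaic

gray = np.full((4, 4, 3), 7.0)   # equal channels: sampling loses nothing
m = bayer_mosaic(gray)
```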
ETPL
DIP-202
Locally Optimal Detection of Image Watermarks in the Wavelet Domain Using Bessel
K Form Distribution
Abstract: A uniformly most powerful watermark detector, which applies the Bessel K form (BKF) probability density function to model the noise distribution, was proposed by Bian and Liang. In this
paper, we derive a locally optimum (LO) detector using the same noise model. Since the literature lacks
thorough discussion on the performance of the BKF-LO nonlinearities, the performance of the proposed detector is discussed in detail. First, we prove that the test statistic of the proposed detector is
asymptotically Gaussian and evaluate the actual performance of the proposed detector using the receiver
operating characteristic (ROC). Then, the large sample performance of the proposed detector is evaluated
using asymptotic relative efficiency (ARE) and “maximum ARE.” The experimental results show that the proposed detector has a good performance with or without attacks in terms of its ROC curves, particularly
when the watermark is weak. Therefore, the proposed method is well suited to wavelet-domain watermark detection.
ETPL
DIP-203
Estimating the Granularity Coefficient of a Potts-Markov Random Field Within a
Markov Chain Monte Carlo Algorithm
Abstract: This paper addresses the problem of estimating the Potts parameter β jointly with the unknown
parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard
MCMC methods cannot be applied to this problem because performing inference on β requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method, the
estimation of β is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental results
obtained for synthetic data show that estimating β jointly with the other unknown parameters leads to estimation results that are as good as those obtained with the actual value of β. On the other hand,
choosing an incorrect value of β can degrade estimation performance significantly. To illustrate the
interest of this method, the proposed algorithm is successfully applied to real bidimensional SAR and tridimensional ultrasound images.
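The Metropolis-Hastings accept/reject step at the core of the method can be sketched on a tractable 1-D target. The paper's contribution is making this work when the Potts likelihood is intractable (a likelihood-free variant); the plain random-walk sampler below, applied to a Gaussian target, only illustrates the basic mechanism.

```python
import numpy as np

def metropolis_hastings(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings: propose x' = x + noise and accept
    with probability min(1, target(x')/target(x)). The likelihood-free
    sampler for the Potts parameter beta replaces the exact target ratio,
    which is intractable, with a simulation-based surrogate."""
    x, chain = x0, []
    for _ in range(n_iter):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(0)
# Illustrative target: N(2, 1), log density -(x-2)^2/2 up to a constant
chain = metropolis_hastings(lambda x: -0.5 * (x - 2.0) ** 2, 0.0, 1.0, 20000, rng)
posterior_mean = chain[2000:].mean()   # discard burn-in
```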
ETPL
DIP-204 Atmospheric Turbulence Mitigation Using Complex Wavelet-Based Fusion
Abstract: Restoring a scene distorted by atmospheric turbulence is a challenging problem in video
surveillance. The effect, caused by random, spatially varying perturbations, makes a model-based solution difficult and, in most cases, impractical. In this paper, we propose a novel method for mitigating
the effects of atmospheric distortion on observed images, particularly airborne turbulence which can
severely degrade a region of interest (ROI). In order to extract accurate detail about objects behind the distorting layer, a simple and efficient frame selection method is proposed to select informative ROIs
only from good-quality frames. The ROIs in each frame are then registered to further reduce offsets and
distortions. We solve the space-varying distortion problem using region-level fusion based on the dual
tree complex wavelet transform. Finally, contrast enhancement is applied. We further propose a learning-based metric specifically for image quality assessment in the presence of atmospheric distortion. This is
capable of estimating quality in both full- and no-reference scenarios. The proposed method is shown to
significantly outperform existing methods, providing enhanced situational awareness in a range of
surveillance scenarios.
ETPL
DIP-205 Rotation Invariant Local Frequency Descriptors for Texture Classification
Abstract: This paper presents a novel rotation invariant method for texture classification based on local
frequency components. The local frequency components are computed by applying a 1-D Fourier transform
on a neighboring function defined on a circle of radius R at each pixel. We observed that the low frequency components are the major constituents of the circular functions and can effectively represent
textures. Three sets of features are extracted from the low frequency components, two based on the phase
and one based on the magnitude. The proposed features are invariant to rotation and linear changes of
illumination. Moreover, by using low frequency components, the proposed features are very robust to noise. While the proposed method uses a relatively small number of features, it outperforms state-of-the-
art methods on three well-known datasets: Brodatz, Outex, and CUReT. In addition, the proposed method remarkably improves the classification accuracy, especially in the presence of high levels of noise.
ETPL
DIP-206 Scanned Document Compression Using Block-Based Hybrid Video Codec
Abstract: This paper proposes a hybrid pattern matching/transform-based compression method for scanned documents. The idea is to use regular video interframe prediction as a pattern matching
algorithm that can be applied to document coding. We show that this interpretation may generate residual
data that can be efficiently compressed by a transform-based encoder. The efficiency of this approach is
demonstrated using H.264/advanced video coding (AVC) as a high-quality single and multipage document compressor. The proposed method, called advanced document coding (ADC), uses segments of
the originally independent scanned pages of a document to create a video sequence, which is then
encoded through regular H.264/AVC. The encoding performance is unrivaled. Results show that ADC outperforms AVC-I (H.264/AVC operating in pure intramode) and JPEG2000 by up to 2.7 and 6.2 dB,
respectively. Superior subjective quality is also achieved.
ETPL
DIP-207 Space-Time Hole Filling With Random Walks in View Extrapolation for 3D Video
Abstract: In this paper, a space-time hole filling approach is presented to deal with disocclusions when a view is synthesized for 3D video. The problem becomes even more complicated when the view is
extrapolated from a single view, since the hole is large and has no stereo depth cues. Although many
techniques have been developed to address this problem, most of them focus only on view interpolation. We propose a space-time joint filling method for color and depth videos in view extrapolation. For proper
texture and depth to be sampled in the following hole filling process, the background of a scene is
automatically segmented by the random walker segmentation in conjunction with the hole formation process. Then, the patch candidate selection process is formulated as a labeling problem, which can be
solved with random walks. The patch candidates that best describe the hole region are dynamically
selected in the space-time domain, and the hole is filled with the optimal patch for ensuring both spatial
and temporal coherence. The experimental results show that the proposed method is superior to state-of-the-art methods and provides both spatially and temporally consistent results with significantly reduced
flicker artifacts.
ETPL
DIP-208 Rate Control for Consistent Objective Quality in High Efficiency Video Coding
Abstract: Since video quality fluctuation degrades the visual perception significantly in multimedia
communication systems, it is important to maintain a consistent objective quality over the entire video
sequence. We propose a rate control algorithm to keep the consistent objective quality in high efficiency
video coding (HEVC), which is an upcoming standard video codec. In the proposed algorithm, the
probability density function of transformed coefficients is modeled based on a Laplacian function that
considers the quadtree coding unit structure, which is one of the characteristics of HEVC. In controlling
the video quality, distortion-quantization and rate-quantization models are derived by using the Laplacian function. Based on those models, a quantization parameter is determined to control the quality of the
encoded frames such that the fluctuation of video quality is minimized and buffer overflow and underflow are prevented. Simulation results show that the proposed rate control algorithm outperforms other conventional schemes.
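The idea of deriving rate-quantization and distortion-quantization curves from a Laplacian source model can be sketched numerically: model the transform coefficients as Laplacian, apply a uniform quantizer of step q, and evaluate the output entropy (rate) and mean squared error (distortion). This sketch omits the paper's quadtree-aware refinements; the function name and parameters are illustrative.

```python
import numpy as np

def laplacian_rate_distortion(lam, q):
    """Rate (entropy, bits/sample) and distortion (MSE) of a uniform
    mid-tread quantizer with step q applied to a Laplacian source
    f(x) = (lam/2) exp(-lam |x|), evaluated by numeric integration."""
    x = np.linspace(-40.0, 40.0, 200001)
    dx = x[1] - x[0]
    pdf = 0.5 * lam * np.exp(-lam * np.abs(x))
    k = np.round(x / q)                          # quantizer bin index
    dist = np.sum(pdf * (x - k * q) ** 2) * dx   # mean squared error
    p = np.array([np.sum(pdf[k == b]) * dx for b in np.unique(k)])
    p = p[p > 1e-12]
    rate = -np.sum(p * np.log2(p))               # entropy of quantizer output
    return rate, dist

r_fine, d_fine = laplacian_rate_distortion(1.0, 0.5)
r_coarse, d_coarse = laplacian_rate_distortion(1.0, 2.0)
```

A rate controller inverts these curves: it picks the quantization parameter whose modeled rate meets the bit budget while minimizing the modeled distortion.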
ETPL
DIP-209
Discrete Wavelet Transform and Data Expansion Reduction in Homomorphic
Encrypted Domain
Abstract: Signal processing in the encrypted domain is a new technology with the goal of protecting
valuable signals from insecure signal processing. In this paper, we propose a method for implementing discrete wavelet transform (DWT) and multiresolution analysis (MRA) in homomorphic encrypted
domain. We first suggest a framework for performing DWT and inverse DWT (IDWT) in the encrypted
domain, then conduct an analysis of data expansion and quantization errors under the framework. To solve the problem of data expansion, which may be very important in practical applications, we present a
method for reducing data expansion in the case that both DWT and IDWT are performed. With the
proposed method, multilevel DWT/IDWT can be performed with less data expansion in homomorphic
encrypted domain. We propose a new signal processing procedure, where the multiplicative inverse method is employed as the last step to limit the data expansion. Taking a 2-D Haar wavelet transform as
an example, we conduct a few experiments to demonstrate the advantages of our method in secure image
processing. We also provide computational complexity analyses and comparisons. To the best of our knowledge, there has been no report on the implementation of DWT and MRA in the encrypted domain.
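For reference, one level of the 2-D Haar transform used in the paper's example can be sketched on plaintext data (the paper's actual contribution is performing these additions and subtractions on homomorphically encrypted values). The averaging normalization below is one common convention, chosen for clarity.

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar DWT (averaging normalization),
    returning LL, LH, HL, HH subbands. Only additions, subtractions,
    and a constant division are needed, which is what makes the
    transform amenable to homomorphic evaluation."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row-pair average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row-pair difference
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

flat = np.full((4, 4), 9.0)     # constant image: all detail subbands vanish
LL, LH, HL, HH = haar2d(flat)
```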
ETPL
DIP-210 QoE-Based Multi-Exposure Fusion in Hierarchical Multivariate Gaussian CRF
Abstract: Many state-of-the-art fusion methods, combining details in images taken under different
exposures into one well-exposed image, can be found in the literature. However, insufficient study has been conducted to explore how perceptual factors can provide viewers better quality of experience on
fused images. We propose two perceptual quality measures: perceived local contrast and color saturation,
which are embedded in our novel hierarchical multivariate Gaussian conditional random field model, to illustrate improved performance for multi-exposure fusion. We show that our method generates images
with better quality than existing methods for a variety of scenes.
ETPL
DIP-211 Action Recognition From Video Using Feature Covariance Matrices
Abstract: We propose a general framework for fast and accurate recognition of actions in video using
empirical covariance matrices of features. A dense set of spatio-temporal feature vectors are computed
from video to provide a localized description of the action, and subsequently aggregated in an empirical
covariance matrix to compactly represent the action. Two supervised learning methods for action recognition are developed using feature covariance matrices. Common to both methods is the
transformation of the classification problem in the closed convex cone of covariance matrices into an
equivalent problem in the vector space of symmetric matrices via the matrix logarithm. The first method applies nearest-neighbor classification using a suitable Riemannian metric for covariance matrices. The
second method approximates the logarithm of a query covariance matrix by a sparse linear combination
of the logarithms of training covariance matrices. The action label is then determined from the sparse
coefficients. Both methods achieve state-of-the-art classification performance on several datasets, and are robust to action variability, viewpoint changes, and low object resolution. The proposed framework is
conceptually simple and has low storage and computational requirements making it attractive for real-
time implementation.
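The matrix-logarithm transformation common to both methods can be sketched directly: map each covariance descriptor from the cone of SPD matrices into the vector space of symmetric matrices, then compare descriptors there. The log-Euclidean distance below is one common Riemannian-motivated choice, used for illustration (the paper speaks only of "a suitable Riemannian metric"); the feature data is synthetic.

```python
import numpy as np

def logm_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix via
    eigendecomposition: log(C) = V diag(log w) V^T. Maps covariance
    descriptors into the vector space of symmetric matrices."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(C1, C2):
    """Log-Euclidean distance between two covariance descriptors:
    Frobenius norm of the difference of their matrix logarithms."""
    return np.linalg.norm(logm_spd(C1) - logm_spd(C2), ord='fro')

# Covariance descriptors of two synthetic spatio-temporal feature sets
rng = np.random.default_rng(2)
F1 = rng.standard_normal((100, 5))
F2 = rng.standard_normal((100, 5)) * 2.0     # differently scaled features
C1, C2 = np.cov(F1.T), np.cov(F2.T)
```

Nearest-neighbor action classification then amounts to comparing a query descriptor against training descriptors under this distance.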
ETPL
DIP-212 2-D Wavelet Packet Spectrum for Texture Analysis
Abstract: This brief derives a 2-D spectrum estimator from some recent results on the statistical properties
of wavelet packet coefficients of random processes. It provides an analysis of the bias of this estimator with respect to the wavelet order. This brief also discusses the performance of this wavelet-based
estimator, in comparison with the conventional 2-D Fourier-based spectrum estimator on texture analysis
and content-based image retrieval. It highlights the effectiveness of the wavelet-based spectrum estimation.
ETPL
DIP-213
UND: Unite-and-Divide Method in Fourier and Radon Domains for Line Segment
Detection
Abstract: In this paper, we extend our previously proposed line detection method to line segmentation
using a so-called unite-and-divide (UND) approach. The methodology includes two phases, namely the union of spectra in the frequency domain, and the division of the sinogram in Radon space. In the union
phase, given an image, its sinogram is obtained by parallel 2D multilayer Fourier transforms, Cartesian-
to-polar mapping, and 1D inverse Fourier transform. In the division phase, the edges of the butterfly wings in the neighborhood of every sinogram peak are first identified, with each neighborhood area corresponding to a window in image space; line segments are then extracted by applying the separated sinogram of each such windowed image.
Our experiments are conducted on benchmark images and the results reveal that the UND method yields
higher accuracy, has lower computational cost and is more robust to noise, compared to existing state-of-the-art methods.
ETPL
DIP-214
Stable Orthogonal Local Discriminant Embedding for Linear Dimensionality
Reduction
Abstract: Manifold learning is widely used in machine learning and pattern recognition. However, manifold learning only considers the similarity of samples belonging to the same class and ignores the
within-class variation of the data, which impairs the generalization and stability of the algorithms. For
this purpose, we construct an adjacency graph to model the intraclass variation that characterizes the most
important properties, such as diversity of patterns, and then incorporate the diversity into the discriminant objective function for linear dimensionality reduction. Finally, we introduce the orthogonal constraint for
the basis vectors and propose an orthogonal algorithm called stable orthogonal local discriminant embedding. Experimental results on several standard image databases demonstrate the effectiveness of the proposed dimensionality reduction approach.
ETPL
DIP-215 Motion-Aware Gradient Domain Video Composition
Abstract: For images, gradient domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting
Poisson image blending to video presents new challenges due to the added temporal dimension. In video,
the human eye is sensitive to small changes in blending boundaries across frames and slight differences in
motions of the source patch and target video. We present a novel video blending approach that tackles these problems by merging the gradient of source and target videos and optimizing a consistent blending
boundary based on a user-provided blending trimap for the source video. Our approach extends mean-
value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and source object positioning method that can
efficiently deal with complex video sequences beyond the capabilities of alpha blending.
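Gradient-domain (Poisson) composition, which the abstract above extends to video, can be illustrated in one dimension: keep the target outside the blend window, and inside solve for values whose second differences match the source's, with boundary values taken from the target. This is a hedged toy sketch, not the paper's mean-value-coordinates method:

```python
import numpy as np

def blend_1d(source, target, lo, hi):
    """Minimal 1D Poisson blend over the window [lo, hi):
    interior values match the source's second differences while the
    boundary is anchored to the target."""
    s = source.astype(float)
    t = target.astype(float)
    n = hi - lo
    # Tridiagonal discrete Laplacian system A x = b.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = -(s[lo - 1:hi - 1] - 2 * s[lo:hi] + s[lo + 1:hi + 1])
    b[0] += t[lo - 1]   # boundary conditions from the target
    b[-1] += t[hi]
    out = t.copy()
    out[lo:hi] = np.linalg.solve(A, b)
    return out

# A source differing from the target by a constant offset blends in
# seamlessly: its gradients are preserved, its level is re-anchored.
target = np.arange(10.0)
source = target + 5.0
assert np.allclose(blend_1d(source, target, 3, 7), target)
```

The 2D image case replaces the tridiagonal system with the 2D discrete Laplacian; the video case adds consistency of the boundary across frames.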
ETPL
DIP-216 Structural Texture Similarity Metrics for Image Analysis and Retrieval
Abstract: We develop new metrics for texture similarity that account for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial
point-by-point deviations between textures that according to human judgment are essentially identical.
The proposed metrics extend the ideas of structural similarity and are guided by research in texture
analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to
investigate metric performance in the context of “known-item search,” the retrieval of textures that are
“identical” to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the
proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and
its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
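The key property of such metrics is that they compare statistics rather than pixels, so textures that differ point by point can still score as near-identical. A simplified sketch using global first- and second-order statistics (an assumption-laden stand-in for the steerable-filter subband statistics the metrics actually use):

```python
import numpy as np

def stat_similarity(x, y, eps=1e-8):
    """Toy SSIM-style comparison of two texture patches using only
    their global mean and variance (a simplified stand-in for
    subband statistics)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    luminance = (2 * mx * my + eps) / (mx ** 2 + my ** 2 + eps)
    contrast = (2 * np.sqrt(vx * vy) + eps) / (vx + vy + eps)
    return luminance * contrast

rng = np.random.default_rng(1)
patch = rng.random((32, 32))
# Identical statistics give a perfect score of 1.0.
assert np.isclose(stat_similarity(patch, patch), 1.0)
# A circularly shifted copy differs at almost every pixel, yet its
# statistics are unchanged, so the score stays high.
shifted = np.roll(patch, 5, axis=1)
assert stat_similarity(patch, shifted) > 0.99
```

A pixelwise metric such as PSNR would heavily penalize the shifted copy, which is exactly the failure mode these texture metrics are designed to avoid.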
ETPL
DIP-217 Simultaneous Facial Feature Tracking and Facial Expression Recognition
Abstract: The tracking and recognition of facial activities from images or videos have attracted great
attention in computer vision field. Facial activities are characterized by three levels. First, in the bottom
level, facial feature points around each facial component, i.e., eyebrow, mouth, etc., capture the detailed face shape information. Second, in the middle level, facial action units, defined in the facial action coding
system, represent the contraction of a specific set of facial muscles, i.e., lid tightener, eyebrow raiser, etc.
Finally, in the top level, six prototypical facial expressions represent the global facial muscle movement and are commonly used to describe the human emotion states. In contrast to the mainstream approaches,
which usually only focus on one or two levels of facial activities, and track (or recognize) them
separately, this paper introduces a unified probabilistic framework based on the dynamic Bayesian
network to simultaneously and coherently represent the facial evolvement in different levels, their interactions and their observations. Advanced machine learning methods are introduced to learn the
model based on both training data and subjective prior knowledge. Given the model and the
measurements of facial motions, all three levels of facial activities are simultaneously recognized through a probabilistic inference. Extensive experiments are performed to illustrate the feasibility and
effectiveness of the proposed model on all three levels of facial activities.
ETPL
DIP-218
A Generalized Random Walk With Restart and its Application in Depth Up-Sampling
and Interactive Segmentation
Abstract: In this paper, the origin of random walk with restart (RWR) and its generalization are described. It is well known that the random walk (RW) and the anisotropic diffusion models share the same energy
functional: the former provides a steady-state solution, while the latter gives a flow solution. In contrast,
the theoretical background of the RWR scheme is different from that of the diffusion-reaction equation, although the restarting term of the RWR plays a role similar to the reaction term of the diffusion-reaction
equation. The behaviors of the two approaches with respect to outliers reveal that they possess different
attributes in terms of data propagation. This observation leads to the derivation of a new energy functional, where both volumetric heat capacity and thermal conductivity are considered together, and
provides a common framework that unifies both the RW and the RWR approaches, in addition to other
regularization methods. The proposed framework allows the RWR to be generalized (GRWR) in
semilocal and nonlocal forms. The experimental results demonstrate the superiority of GRWR over existing regularization approaches in terms of depth map up-sampling and interactive image
segmentation.
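The basic RWR scheme that the abstract generalizes can be sketched as a power iteration: the walker follows a column-stochastic transition matrix but returns to the seed node with a fixed restart probability. A minimal illustration (the values of `restart` and the toy graph below are arbitrary choices, not the paper's):

```python
import numpy as np

def rwr(W, seed, restart=0.15, iters=200):
    """Random walk with restart by power iteration.
    W must be column-stochastic; `restart` is the restart probability."""
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1.0 - restart) * (W @ r) + restart * e
    return r

# 3-node chain 0 - 1 - 2, column-normalized adjacency.
W = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r = rwr(W, seed=0)
assert np.isclose(r.sum(), 1.0)   # stays a probability distribution
assert r[0] > r[2]                # mass decays with distance from the seed
```

The restart term plays the role of the reaction term discussed above: it keeps probability mass anchored near the seed instead of diffusing to the steady state of the plain random walk.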
ETPL
DIP-219 Variational Optical Flow Estimation Based on Stick Tensor Voting
Abstract: Variational optical flow techniques allow the estimation of flow fields from spatio-temporal
derivatives. They are based on minimizing a functional that contains a data term and a regularization
term. Recently, numerous approaches have been presented for improving the accuracy of the estimated
flow fields. Among them, tensor voting has been shown to be particularly effective in the preservation of flow discontinuities. This paper presents an adaptation of the data term by using anisotropic stick tensor
voting in order to gain robustness against noise and outliers with significantly lower computational cost
than (full) tensor voting. In addition, an anisotropic complementary smoothness term depending on directional information estimated through stick tensor voting is utilized in order to preserve discontinuity
capabilities of the estimated flow fields. Finally, a weighted non-local term that depends on both the
estimated directional information and the occlusion state of pixels is integrated during the optimization
process in order to denoise the final flow field. The proposed approach yields state-of-the-art results on the Middlebury benchmark.
ETPL
DIP-220 Exploring Visual and Motion Saliency for Automatic Video Object Extraction
Abstract: This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the
use of any training data (i.e., not limited to any particular type of object). To separate foreground and
background regions within and across video frames, the proposed method utilizes visual and motion
saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency induced features, which allows us to deal with unknown pose and scale variations of
the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity
and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory VOE results.
ETPL
DIP-221 Enhanced Compressed Sensing Recovery With Level Set Normals
Abstract: We propose a compressive sensing algorithm that exploits geometric properties of images to
recover images of high quality from few measurements. The image reconstruction is done by iterating the two following steps: 1) estimation of normal vectors of the image level curves, and 2) reconstruction of
an image fitting the normal vectors, the compressed sensing measurements, and the sparsity constraint.
The proposed technique can naturally extend to nonlocal operators and graphs to exploit the repetitive nature of textured images to recover fine detail structures. In both cases, the problem is reduced to a
series of convex minimization problems that can be efficiently solved with a combination of variable
splitting and augmented Lagrangian methods, leading to fast and easy-to-code algorithms. Extended experiments show a clear improvement over related state-of-the-art algorithms in the quality of the
reconstructed images and the robustness of the proposed method to noise, different kinds of images, and
reduced measurements.
ETPL
DIP-222 Colorization-Based Compression Using Optimization
Abstract: In this paper, we formulate the colorization-based coding problem into an optimization
problem, i.e., an L1 minimization problem. In colorization-based coding, the encoder chooses a few
representative pixels (RP) for which the chrominance values and the positions are sent to the decoder, whereas in the decoder, the chrominance values for all the pixels are reconstructed by colorization
methods. The main issue in colorization-based coding is how to extract the RP so that both the compression rate and the quality of the reconstructed color image are good. By formulating the
colorization-based coding into an L1 minimization problem, it is guaranteed that, given the colorization matrix, the chosen set of RP becomes the optimal set in the sense that it minimizes the error between the
original and the reconstructed color image. In other words, for a fixed error value and a given colorization
matrix, the chosen set of RP is the smallest set possible. We also propose a method to construct the
colorization matrix that colorizes the image in a multiscale manner. This, combined with the proposed RP
extraction method, allows us to choose a very small set of RP. It is shown experimentally that the
proposed method outperforms conventional colorization-based coding methods as well as the JPEG standard and is comparable with the JPEG2000 compression standard, both in terms of the compression
rate and the quality of the reconstructed color image.
ETPL
DIP-223
Orientation Imaging Microscopy With Optimized Convergence Angle Using CBED
Patterns in TEMs
Abstract: Grain size statistics, texture, and grain boundary distribution are microstructural characteristics that greatly influence materials properties. These characteristics can be derived from an orientation map
obtained using orientation imaging microscopy (OIM) techniques. The OIM techniques are generally
performed using a transmission electron microscope (TEM) for nanomaterials. Although some of these techniques have limited applicability in certain situations, others have limited availability because of
external hardware required. In this paper, an automated method to generate orientation maps using
convergent beam electron diffraction (CBED) patterns obtained in a conventional TEM setup is presented. This method is based upon dynamical diffraction theory, which describes electron diffraction more accurately as
compared with kinematical theory used by several existing OIM techniques. In addition, the method of
this paper uses wide angle convergent beam electron diffraction for performing OIM. It is shown in this
paper that the use of the wide angle convergent electron beam provides additional information that is not available otherwise. Together, the presented method exploits the additional information and combines it
with the calculations from the dynamical theory to provide accurate orientation maps in a conventional
TEM setup. The automated method of this paper is applied to a platinum thin film sample. The presented method correctly identified the texture preference in the sample.
ETPL
DIP-224
Grassmannian Regularized Structured Multi-View Embedding for Image
Classification
Abstract: Images are usually represented by features from multiple views, e.g., color and texture. In
image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the features are often different in nature and it is
nontrivial to fuse them. Particularly, some extracted features are redundant or noisy and are consequently
not discriminative for classification. To alleviate these problems in an image classification context, we propose in this paper a novel multi-view embedding framework, termed as Grassmannian regularized
structured multi-view embedding, or GrassReg for short. GrassReg transfers the graph Laplacian obtained
from each view to a point on the Grassmann manifold and penalizes the disagreement between different views according to Grassmannian distance. Therefore, a view that is consistent with others is more
important than a view that disagrees with others for learning a unified subspace for multi-view data
representation. In addition, we impose a group sparsity penalty on the learned low-dimensional embeddings so that they can better capture the group structure of the intrinsic data distribution. Empirically, we compare GrassReg with representative multi-view algorithms and show the effectiveness of GrassReg
on a number of multi-view image data sets.
ETPL
DIP-225
Efficient Minimum Error Bounded Particle Resampling L1 Tracker With Occlusion
Detection
Abstract: Recently, sparse representation has been applied to visual tracking to find the target with the
minimum reconstruction error from a target template subspace. Though effective, these L1 trackers
require high computational costs due to numerous calculations for l1 minimization. In addition, the
inherent occlusion insensitivity of the l1 minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named bounded particle resampling (BPR)-L1 tracker, with a minimum
error bound and occlusion detection. First, the minimum error bound is calculated from a linear least
squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of
the insignificant samples are removed before solving the computationally expensive l1 minimization in a
two-step testing. The first step, named τ testing, compares the sample observation likelihood to an
ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further
remove insignificant samples without altering the tracking result of the current frame. Though sacrificing
minimal precision during resampling, max testing achieves significant speed up on top of τ testing. The BPR-L1 technique can also be beneficial to other trackers that have minimum error bounds in a PF
framework, especially for trackers based on sparse representations. After the error-bound calculation,
BPR-L1 performs occlusion detection by investigating the trivial coefficients in the l1 minimization.
These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance the template updating. For evaluation, we conduct
experiments on three video applications: biometrics (head movement, hand holding an object, singers on
stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates an excellent performance as
compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.
ETPL
DIP-226 Multiview Hessian Regularization for Image Annotation
Abstract: The rapid development of computer hardware and Internet technology makes large-scale data-dependent models computationally tractable, and opens a bright avenue for annotating images through
innovative machine learning algorithms. Semisupervised learning (SSL) therefore received intensive
attention in recent years and was successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along
the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function toward a constant function, which can result in poor generalization. In addition, LR was developed to handle uniformly distributed data (or single-view data), although instances or objects, such
as images and videos, are usually represented by multiview features, such as color, shape, and texture. In
this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-
based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the
data manifold. We apply mHR to kernel least squares and support vector machines as two examples for
image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
ETPL
DIP-227
GPU Accelerated Edge-Region Based Level Set Evolution Constrained by 2D Gray-
Scale Histogram
Abstract: Due to its intrinsic ability to easily handle complex shapes and topological changes, the level set method (LSM) has been widely used in image segmentation. Nevertheless, the LSM is computationally expensive, which limits its applications in real-time systems. To address this, we propose a new level set algorithm that simultaneously uses edge, region, and 2D histogram
information in order to efficiently segment objects of interest in a given scene. The computational complexity of the proposed LSM is greatly reduced by using the highly parallelizable lattice Boltzmann
method (LBM) with a body force to solve the level set equation (LSE). The body force is the link with
image data and is defined from the proposed LSE. The proposed LSM is then implemented on NVIDIA graphics processing units to fully exploit the local nature of the LBM. The new algorithm is effective, robust against noise, independent of the initial contour, fast, and highly parallelizable. The edge and region information make it possible to detect objects with and without edges, while the 2D histogram information keeps the method effective in noisy environments. Experimental results on synthetic and real images demonstrate subjectively and objectively the performance of the proposed
method.
ETPL
DIP-228 Sparse Stochastic Processes and Discretization of Linear Inverse Problems
Abstract: We present a novel statistically-based discretization paradigm and derive a class of maximum a
posteriori (MAP) estimators for solving ill-conditioned linear inverse problems. We are guided by the
theory of sparse stochastic processes, which specifies continuous-domain signals as solutions of linear stochastic differential equations. Accordingly, we show that the class of admissible priors for the
discretized version of the signal is confined to the family of infinitely divisible distributions. Our
estimators not only cover the well-studied methods of Tikhonov and l1-type regularizations as particular
cases, but also open the door to a broader class of sparsity-promoting regularization schemes that are typically nonconvex. We provide an algorithm that handles the corresponding nonconvex problems and
illustrate the use of our formalism by applying it to deconvolution, magnetic resonance imaging, and X-
ray tomographic reconstruction problems. Finally, we compare the performance of estimators associated with models of increasing sparsity.
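The Tikhonov and l1 special cases mentioned above correspond to Gaussian and Laplace priors, whose MAP denoising (proximal) steps have simple closed forms. A hedged sketch of just those two building blocks (not the paper's nonconvex estimators):

```python
import numpy as np

def prox_tikhonov(v, lam):
    """Proximal map of the quadratic (Gaussian-prior) regularizer
    lam/2 * ||x||^2: linear shrinkage toward zero."""
    return v / (1.0 + lam)

def prox_l1(v, lam):
    """Proximal map of the l1 (Laplace-prior) regularizer lam*||x||_1:
    soft-thresholding, which sets small entries exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.5, 1.0])
# Tikhonov shrinks everything uniformly; l1 zeroes small entries,
# which is the sparsity-promoting behavior.
assert np.allclose(prox_tikhonov(v, 1.0), [1.5, -0.25, 0.5])
assert np.allclose(prox_l1(v, 1.0), [2.0, 0.0, 0.0])
```

The nonconvex priors covered by the infinitely divisible family interpolate between and go beyond these two shrinkage behaviors.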
ETPL
DIP-229
Sparse/DCT (S/DCT) Two-Layered Representation of Prediction Residuals for Video
Coding
Abstract: In this paper, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals, and implement this idea on top of the state-of-the-art high efficiency video coding
(HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals
so that a high portion of energy in a structured residual can be efficiently coded via sparse coding. It is
observed that the sparse representation alone is less effective in the R-D performance due to the side information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded
at the second stage. It is applied to the remaining signal to improve coding efficiency. The two
representations successfully complement each other. Experimental results demonstrate that the proposed algorithm outperforms the HEVC reference codec HM5.0 under the common test conditions.
ETPL
DIP-230 Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space
Abstract: Meaningful representation and effective retrieval of video shots in a large-scale database has
been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to the extraction of low-level visual features, such as color, shape, texture, and
motion for characterizing and retrieving video shots. However, the accuracy of these feature descriptors is
still far from satisfaction due to the well-known semantic gap. In order to alleviate the problem, this paper investigates a novel methodology of representing and retrieving video shots using human-centric high-
level features derived in brain imaging space (BIS) where brain responses to natural stimulus of video
watching can be explored and interpreted. At first, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale
functional brain networks and their regions of interests (ROIs) that are involved in the comprehension of
video stimulus. Then, functional connectivities between various functional ROI pairs are utilized as BIS
features to characterize the brain's comprehension of video semantics. Next, an effective feature selection procedure is applied to learn the most relevant features while removing redundancy, which results in the
formation of the final BIS features. Afterwards, a mapping from low-level visual features to high-level
semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is then inferred, in which video key frames are represented by the mapped feature vectors in the
BIS. Finally, the manifold-ranking algorithm concerning the relationship among all data is applied to
measure the similarity between key frames of video shots. Experimental results on the TRECVID 2005
dataset demonstrate the superiority of the proposed work in comparison with traditional methods.
ETPL
DIP-231 Multivariate Slow Feature Analysis and Decorrelation Filtering for Blind Source Separation
Abstract: We generalize the method of Slow Feature Analysis (SFA) for vector-valued functions of several variables and apply it to the problem of blind source separation, in particular to image separation.
It is generally necessary to use multivariate SFA instead of univariate SFA for separating multi-
dimensional signals. For the linear case, an exact mathematical analysis is given, which shows in
particular that the sources are perfectly separated by SFA if and only if they and their first-order derivatives are uncorrelated. When the sources are correlated, we apply the following technique called
Decorrelation Filtering: use a linear filter to decorrelate the sources and their derivatives in the given
mixture, then apply the unmixing matrix obtained on the filtered mixtures to the original mixtures. If the filtered sources are perfectly separated by this matrix, so are the original sources. A decorrelation filter
can be numerically obtained by solving a nonlinear optimization problem. This technique can also be
applied to other linear separation methods, whose output signals are decorrelated, such as ICA. When there are more mixtures than sources, one can determine the actual number of sources by using a
regularized version of SFA with decorrelation filtering. Extensive numerical experiments using SFA and
ICA with decorrelation filtering, supported by mathematical analysis, demonstrate the potential of our
methods for solving problems involving blind source separation.
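For the linear univariate case, SFA reduces to a generalized eigenproblem: find the unit-variance projection whose temporal derivative has minimal variance. A small sketch under that simplification (the mixing matrix and signals below are illustrative assumptions, not the paper's image-separation setup):

```python
import numpy as np

def linear_sfa_1(X):
    """First linear SFA component of X (shape (T, d), zero-mean):
    the unit-variance projection with the slowest variation."""
    C = np.cov(X, rowvar=False)                     # signal covariance
    Cd = np.cov(np.diff(X, axis=0), rowvar=False)   # derivative covariance
    # Generalized eigenproblem Cd w = lam C w; smallest lam = slowest.
    vals, vecs = np.linalg.eig(np.linalg.solve(C, Cd))
    w = np.real(vecs[:, np.argmin(np.real(vals))])
    return w / np.sqrt(w @ C @ w)

# Mix a slow and a fast sinusoid; SFA should recover the slow source.
t = np.linspace(0, 2 * np.pi, 500)
S = np.column_stack([np.sin(t), np.sin(25 * t)])
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = S @ A.T
y = X @ linear_sfa_1(X)
assert abs(np.corrcoef(y, S[:, 0])[0, 1]) > 0.95
```

The multivariate extension described above replaces scalar projections with vector-valued ones, which is what makes separating multi-dimensional signals such as images possible.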
ETPL
DIP-232
Parameter Estimation for Blind and Non-Blind Deblurring Using Residual Whiteness
Measures
Abstract: Image deblurring (ID) is an ill-posed problem typically addressed by using regularization, or
prior knowledge, on the unknown image (and also on the blur operator, in the blind case). ID is often formulated as an optimization problem, where the objective function includes a data term encouraging the
estimated image (and blur, in blind ID) to explain the observed data well (typically, the squared norm of a
residual) plus a regularizer that penalizes solutions deemed undesirable. The performance of this
approach depends critically (among other things) on the relative weight of the regularizer (the regularization parameter) and on the number of iterations of the algorithm used to address the
optimization problem. In this paper, we propose new criteria for adjusting the regularization parameter
and/or the number of iterations of ID algorithms. The rationale is that if the recovered image (and blur, in blind ID) is well estimated, the residual image is spectrally white; contrarily, a poorly deblurred image
typically exhibits structured artifacts (e.g., ringing, oversmoothness), yielding residuals that are not
spectrally white. The proposed criterion is particularly well suited to a recent blind ID algorithm that uses continuation, i.e., slowly decreases the regularization parameter along the iterations; in this case,
choosing this parameter and deciding when to stop are one and the same thing. Our experiments show
that the proposed whiteness-based criteria yield SNR improvements that are, on average, only 0.15 dB below
those obtained by (clairvoyantly) stopping the algorithm at the best SNR. We also illustrate the proposed criteria on non-blind ID, reporting results that are competitive with state-of-the-art criteria (such as Monte
Carlo-based GSURE and projected SURE), which, however, are not applicable for blind ID.
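The underlying intuition can be sketched with a crude 1D whiteness score: a well-deblurred residual has a nearly flat autocorrelation, while structured artifacts introduce correlation at nonzero lags. This is a toy analogue of the spectral-whiteness criteria, with the filter length and sample size chosen arbitrarily:

```python
import numpy as np

def whiteness_score(residual):
    """Mean squared normalized autocorrelation at nonzero lags
    (lower means whiter)."""
    r = residual - residual.mean()
    ac = np.correlate(r, r, mode="full")
    mid = len(r) - 1
    ac = ac / ac[mid]                    # normalize by the lag-0 value
    return float(np.mean(ac[mid + 1:] ** 2))

rng = np.random.default_rng(2)
white = rng.standard_normal(4096)
# Low-pass filtering introduces lag correlation, so the smoothed
# residual should score clearly worse (larger).
smooth = np.convolve(white, np.ones(9) / 9.0, mode="same")
assert whiteness_score(white) < whiteness_score(smooth)
```

Selecting the regularization parameter (or stopping iteration) that minimizes such a score is the essence of the criteria described above.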
ETPL
DIP-233 Image Processing Using Smooth Ordering of its Patches
Abstract: We propose an image processing scheme based on reordering of its patches. For a given
corrupted image, we extract all patches with overlaps, refer to these as coordinates in high-dimensional
space, and order them such that they are chained in the “shortest possible path,” essentially solving the traveling salesman problem. The obtained ordering applied to the corrupted image implies a permutation
of the image pixels to what should be a regular signal. This enables us to obtain good recovery of the
clean image by applying relatively simple one-dimensional smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach to image
denoising and inpainting, and show promising results in both cases.
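Since the exact shortest-path ordering is a traveling-salesman problem, practical orderings are approximate. A hedged sketch using a greedy nearest-neighbor heuristic on patch vectors (a cheap stand-in, not the paper's ordering procedure):

```python
import numpy as np

def greedy_order(patches, start=0):
    """Greedy nearest-neighbor ordering of patch vectors: repeatedly
    append the unvisited patch closest to the last one."""
    n = len(patches)
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        d = np.linalg.norm(patches - patches[order[-1]], axis=1)
        d[visited] = np.inf
        nxt = int(d.argmin())
        order.append(nxt)
        visited[nxt] = True
    return order

# 1D "patches" scattered on a line: greedy ordering from the left end
# recovers the sorted, smoothly varying sequence.
patches = np.array([[0.0], [3.0], [1.0], [2.0]])
order = greedy_order(patches)
assert order == [0, 2, 3, 1]
assert [patches[i, 0] for i in order] == [0.0, 1.0, 2.0, 3.0]
```

Once the pixels are permuted along such an ordering, simple 1D filtering or interpolation on the reordered signal performs the actual denoising or inpainting.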
ETPL
DIP-234
Recursive Histogram Modification: Establishing Equivalency Between Reversible Data
Hiding and Lossless Data Compression
Abstract: State-of-the-art schemes for reversible data hiding (RDH) usually consist of two steps: first
construct a host sequence with a sharp histogram via prediction errors, and then embed messages by modifying the histogram with methods, such as difference expansion and histogram shift. In this paper,
we focus on the second stage, and propose a histogram modification method for RDH, which embeds the
message by recursively utilizing the decompression and compression processes of an entropy coder. We prove that, for independent identically distributed (i.i.d.) gray-scale host signals, the proposed method
asymptotically approaches the rate-distortion bound of RDH as long as perfect compression can be
realized, i.e., the entropy coder can approach entropy. Therefore, this method establishes the equivalency between reversible data hiding and lossless data compression. Experiments show that this coding method
can be used to improve the performance of previous RDH schemes and the improvements are more
significant for larger images.
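The first-stage histogram shift that such schemes build on can be sketched with a single peak/zero bin pair: shift the bins between the peak and an empty bin to free the bin next to the peak, then embed one bit per peak-valued pixel. This is a minimal classic illustration, not the recursive entropy-coder construction proposed above:

```python
import numpy as np

def hs_embed(pixels, bits):
    """One peak/zero-pair histogram-shift embedding (values 0..255)."""
    h = np.bincount(pixels, minlength=256)
    peak = int(h.argmax())
    zero = peak + 1 + int(h[peak + 1:].argmin())
    assert h[zero] == 0, "needs an empty bin above the peak"
    out = pixels.copy()
    out[(out > peak) & (out < zero)] += 1        # free the bin peak+1
    carriers = np.flatnonzero(pixels == peak)[:len(bits)]
    out[carriers] += np.asarray(bits)            # one bit per peak pixel
    return out, peak, zero

def hs_extract(marked, peak, zero, nbits):
    carriers = np.flatnonzero((marked == peak) | (marked == peak + 1))[:nbits]
    bits = (marked[carriers] == peak + 1).astype(int)
    rec = marked.copy()
    rec[carriers] = peak                         # undo the embedding
    rec[(rec > peak) & (rec <= zero)] -= 1       # undo the shift
    return bits, rec

pixels = np.array([10, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12])
bits = [1, 0, 1]
marked, peak, zero = hs_embed(pixels, bits)
out_bits, recovered = hs_extract(marked, peak, zero, len(bits))
assert list(out_bits) == bits
assert np.array_equal(recovered, pixels)   # exact reversibility
```

The capacity of this simple scheme is the peak-bin count; the recursive construction above improves on it by re-embedding via decompression/compression of an entropy coder.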
ETPL
DIP-235 Optical Flow Estimation for Flame Detection in Videos
Abstract: Computational vision-based flame detection has drawn significant attention in the past decade
with camera surveillance systems becoming ubiquitous. Whereas many discriminating features, such as
color, shape, texture, etc., have been employed in the literature, this paper proposes a set of motion features based on motion estimators. The key idea consists of exploiting the difference between the
turbulent, fast fire motion and the structured, rigid motion of other objects. Since classical optical flow
methods do not model the characteristics of fire motion (e.g., non-smoothness of motion, non-constancy
of intensity), two optical flow methods are specifically designed for the fire detection task: optimal mass transport models fire with dynamic texture, while a data-driven optical flow scheme models saturated
flames. Then, characteristic features related to the flow magnitudes and directions are computed from the
flow fields to discriminate between fire and non-fire motion. The proposed features are tested on a large video database to demonstrate their practical usefulness. Moreover, a novel evaluation method is
proposed by fire simulations that allow for a controlled environment to analyze parameter influences,
such as flame saturation, spatial resolution, frame rate, and random noise.
ETPL
DIP-236 Image Sharpness Assessment Based on Local Phase Coherence
Abstract: Sharpness is an important determinant in visual assessment of image quality. The human visual
system is able to effortlessly detect blur and evaluate sharpness of visual images, but the underlying
mechanism is not fully understood. Existing blur/sharpness evaluation algorithms are mostly based on edge width, local gradient, or energy reduction of global/local high frequency content. Here we
understand the subject from a different perspective, where sharpness is identified as strong local phase
coherence (LPC) near distinctive image features evaluated in the complex wavelet transform domain. Previous LPC computation was restricted to complex coefficients spread over three consecutive
dyadic scales in the scale-space. Here we propose a flexible framework that allows for LPC computation
in arbitrary fractional scales. We then develop a new sharpness assessment algorithm without referencing
the original image. We use four subject-rated publicly available image databases to test the proposed algorithm, which demonstrates competitive performance when compared with state-of-the-art algorithms.
ETPL
DIP-237 Library-Based Illumination Synthesis for Critical CMOS Patterning
Abstract: In optical microlithography, the illumination source for critical complementary metal-oxide-semiconductor layers needs to be determined in the early stage of a technology node with very limited
design information, leading to simple binary shapes. Recently, the availability of freeform sources
permits us to increase pattern fidelity and relax mask complexities with minimal insertion risks to the
current manufacturing flow. However, source optimization across many patterns is often treated as a
design-of-experiments problem, which may not fully exploit the benefits of a freeform source. In this
paper, a rigorous source-optimization algorithm is presented via linear superposition of optimal sources for pre-selected patterns. We show that analytical solutions are made possible by using Hopkins
formulation and quadratic programming. The algorithm allows synthesized illumination to be linked with
assorted pattern libraries, which has a direct impact on design rule studies for early planning and design automation for full wafer optimization.
ETPL
DIP-238 A Variational Approach for Pan-Sharpening
Abstract: Pan-sharpening is a process of acquiring a high resolution multispectral (MS) image by
combining a low resolution MS image with a corresponding high resolution panchromatic (PAN) image. In this paper, we propose a new variational pan-sharpening method based on three basic assumptions: 1)
the gradient of PAN image could be a linear combination of those of the pan-sharpened image bands; 2)
the upsampled low resolution MS image could be a degraded form of the pan-sharpened image; and 3) the gradient in the spectrum direction of pan-sharpened image should be approximated to those of the
upsampled low resolution MS image. An energy functional, whose minimizer is related to the best pan-
sharpened result, is built based on these assumptions. We discuss the existence of minimizer of our
energy and describe the numerical procedure based on the split Bregman algorithm. To verify the effectiveness of our method, we qualitatively and quantitatively compare it with some state-of-the-art
schemes using QuickBird and IKONOS data. Particularly, we classify the existing quantitative measures
into four categories and choose two representatives in each category for more reasonable quantitative evaluation. The results demonstrate the effectiveness and stability of our method in terms of the related
evaluation benchmarks. In addition, a comparison of computational efficiency with other variational methods shows that our method is competitive.
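To make the three assumptions concrete, the sketch below evaluates a three-term energy for a candidate pan-sharpened image. The quadratic form of each term and the weights are illustrative assumptions, not the paper's exact functional:

```python
import numpy as np

def grad(img):
    """Forward-difference gradient of a 2-D array (last row/col padded)."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def pan_sharpen_energy(X, pan, ms_up, alpha, lam1=1.0, lam2=1.0, lam3=1.0):
    """Energy of a candidate pan-sharpened image X (B, H, W).

    pan   : (H, W) panchromatic image
    ms_up : (B, H, W) upsampled low-resolution MS image
    alpha : (B,) linear-combination weights (assumption 1)
    """
    # 1) gradient of PAN ~ linear combination of band gradients
    gx = sum(a * grad(Xb)[0] for a, Xb in zip(alpha, X))
    gy = sum(a * grad(Xb)[1] for a, Xb in zip(alpha, X))
    px, py = grad(pan)
    t1 = np.sum((gx - px) ** 2 + (gy - py) ** 2)
    # 2) upsampled MS ~ degraded form of X (degradation taken as identity here)
    t2 = np.sum((X - ms_up) ** 2)
    # 3) spectral-direction gradient of X ~ that of the upsampled MS
    t3 = np.sum((np.diff(X, axis=0) - np.diff(ms_up, axis=0)) ** 2)
    return lam1 * t1 + lam2 * t2 + lam3 * t3

rng = np.random.default_rng(0)
ms = rng.random((3, 8, 8))
alpha = np.array([0.3, 0.3, 0.4])
pan = np.tensordot(alpha, ms, axes=1)   # PAN consistent with assumption 1
```

When the candidate equals the MS image and the PAN image is exactly the weighted band combination, all three terms vanish, which is the behavior a minimizer-based formulation relies on.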
ETPL
DIP-239
Reducing the Complexity of the N-FINDR Algorithm for Hyperspectral Image
Analysis
Abstract: The N-FINDR algorithm for unmixing hyperspectral data is both popular and successful. However, opportunities for improving the algorithm exist, particularly to reduce its computational
expense. Two approaches to achieve this are examined. First, the redundancy inherent in the determinant
calculations at the heart of N-FINDR is reduced using an LDU decomposition to form two new algorithms, one based on the original N-FINDR algorithm and one based on the closely related Sequential
N-FINDR algorithm. The second approach lowers complexity by reducing the repetition of the volume
calculations by removing pixels unlikely to represent pure materials. This is accomplished at no
additional cost through the reuse of the volume calculations inherent in the Sequential N-FINDR algorithm. Various thresholding methods for excluding pixels are considered. The impact of these
modifications on complexity and accuracy is examined on simulated and real data, showing that the
LDU-based approaches save considerable complexity, while pixel reduction methods, with appropriate threshold selection, can produce a favorable complexity-accuracy trade-off.
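The determinant calculations at the heart of N-FINDR score candidate endmember sets by simplex volume. A minimal sketch of that volume computation (the paper's LDU-based reuse of partial factorizations is not shown):

```python
import math
import numpy as np

def simplex_volume(E):
    """Volume of the simplex spanned by p endmembers in (p-1)-dim space.

    E : (p, p-1) array, one endmember per row (e.g. after reducing the
    hyperspectral data to p-1 dimensions). N-FINDR's determinant form:
    augment with a row of ones and take |det| / (p-1)!.
    """
    p = E.shape[0]
    A = np.vstack([np.ones((1, p)), E.T])   # (p, p) augmented matrix
    return abs(np.linalg.det(A)) / math.factorial(p - 1)

# Unit right triangle in 2-D: area 0.5
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

N-FINDR repeatedly swaps one candidate pixel into E and keeps the swap if the volume grows; the redundancy the abstract targets comes from recomputing this determinant from scratch at every swap.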
ETPL
DIP-240 3-D Curvilinear Structure Detection Filter Via Structure-Ball Analysis
Abstract: Curvilinear structure detection filters are crucial building blocks in many medical image
processing applications, where they are used to detect important structures, such as blood vessels, airways, and other similar fibrous tissues. Unfortunately, most of these filters are plagued by an implicit
single structure direction assumption, which results in a loss of signal around bifurcations. This
peculiarity limits the performance of all subsequent processes, such as understanding angiography
acquisitions, computing an accurate segmentation or tractography, or automatically classifying image
voxels. This paper presents a new 3-D curvilinear structure detection filter based on the analysis of the
structure ball, a geometric construction representing second order differences sampled in many directions. The structure ball is defined formally, and its computation on a discrete image is discussed. A contrast
invariant diffusion index easing voxel analysis and visualization is also introduced, and different structure
ball shape descriptors are proposed. A new curvilinear structure detection filter is defined based on the shape descriptors that best characterize curvilinear structures. The new filter produces a vesselness
measure that is robust to the presence of X- and Y-junctions along the structure by going beyond the
single direction assumption. At the same time, it stays conceptually simple and deterministic, and allows
for an intuitive representation of the structure's principal directions. Sample results are provided for synthetic images and for two medical imaging modalities.
ETPL
DIP-241 Image Fusion With Guided Filtering
Abstract: A fast and effective image fusion method is proposed for creating a highly informative fused image through merging multiple images. The proposed method is based on a two-scale decomposition of
an image into a base layer containing large scale variations in intensity, and a detail layer capturing small
scale details. A novel guided filtering-based weighted average technique is proposed to make full use of
spatial consistency for fusion of the base and detail layers. Experimental results demonstrate that the proposed method can obtain state-of-the-art performance for fusion of multispectral, multifocus,
multimodal, and multiexposure images.
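The two-scale decomposition and weighted recombination can be sketched as follows. A plain box filter stands in for the paper's guided filter, and the absolute-detail weighting is an illustrative stand-in for its spatially consistent weight maps:

```python
import numpy as np

def box_filter(img, r):
    """Mean filter over a (2r+1)^2 window via 2-D cumulative sums."""
    pad = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    k = 2 * r + 1
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def fuse_two_scale(imgs, r=2):
    """Two-scale fusion sketch: base = blur, detail = img - base.

    Details are combined by absolute-value-weighted average, bases by a
    plain average. (A box filter replaces the guided filter of the paper.)
    """
    bases = [box_filter(i, r) for i in imgs]
    details = [i - b for i, b in zip(imgs, bases)]
    w = np.stack([np.abs(d) + 1e-12 for d in details])
    w /= w.sum(axis=0)
    fused_detail = sum(wi * d for wi, d in zip(w, details))
    fused_base = sum(bases) / len(bases)
    return fused_base + fused_detail
```

Fusing two identical images is a useful sanity check: the weights become uniform and the input is returned unchanged.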
ETPL
DIP-242 Global Propagation of Affine Invariant Features for Robust Matching
Abstract: Local invariant features have been successfully used in image matching to cope with viewpoint
change, partial occlusion, and clutters. However, when these factors become too strong, there will be a lot
of mismatches due to the limited repeatability and discriminative power of features. In this paper, we
present an efficient approach to remove the false matches and propagate the correct ones for the affine invariant features which represent the state-of-the-art local invariance. First, a pair-wise affine
consistency measure is proposed to evaluate the consensus of the matches of affine invariant regions. The
measure takes into account both the keypoint location and the region shape, size, and orientation. Based on this measure, a geometric filter is then presented which can efficiently remove the outliers from the
initial matches, and is robust to severe clutters and non-rigid deformation. To increase the correct
matches, we propose a global match refinement and propagation method that simultaneously finds an optimal group of local affine transforms to relate the features in two images. The global method is
capable of producing a quasi-dense set of matches even for the weakly textured surfaces that suffer strong
rigid transformation or non-rigid deformation. The strong capability of the proposed method in dealing
with significant viewpoint change, non-rigid deformation, and low-texture objects is demonstrated in experiments of image matching, object recognition, and image based rendering.
ETPL
DIP-243
Edge-SIFT: Discriminative Binary Descriptor for Scalable Partial-Duplicate Mobile
Search
Abstract: As the basis of large-scale partial duplicate visual search on mobile devices, image local descriptor is expected to be discriminative, efficient, and compact. Our study shows that the popularly
used histogram-based descriptors, such as scale invariant feature transform (SIFT) are not optimal for this
task. This is mainly because histogram representation is relatively expensive to compute on mobile
platforms and loses significant spatial clues, which are important for improving discriminative power and matching near-duplicate image patches. To address these issues, we propose to extract a novel binary
local descriptor named Edge-SIFT from the binary edge maps of scale- and orientation-normalized image
patches. By preserving both locations and orientations of edges and compressing the sparse binary edge
maps with a boosting strategy, the final Edge-SIFT shows strong discriminative power with compact
representation. Furthermore, we propose a fast similarity measurement and an indexing framework with
flexible online verification. Hence, the Edge-SIFT allows an accurate and efficient image search and is ideal for computation sensitive scenarios such as a mobile image search. Experiments on a large-scale
dataset manifest that the Edge-SIFT shows superior retrieval accuracy to Oriented BRIEF (ORB) and is
superior to SIFT in the aspects of retrieval precision, efficiency, compactness, and transmission cost.
ETPL
DIP-244 Parametric Generalized Linear System Based on the Notion of the T-Norm
Abstract: By using the triangular norm, we propose two methods for the construction of generalized
linear systems, and show new insights into the relationship between typical systems. Using the Hamacher
and Frank t-norm, we propose a parametric log-ratio model, which is a generalization of the log-ratio model and is more flexible in algorithmic development. We develop a generalized linear contrast
enhancement algorithm based on the proposed parametric log-ratio model. We show that the performance
of the proposed algorithm is effective and robust for different types of images.
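For reference, the two parametric t-norms named above can be written directly; the parameter choices below are standard textbook forms, with the product t-norm recovered in the limits noted in the comments:

```python
import numpy as np

def hamacher(a, b, gamma=1.0):
    """Hamacher t-norm: T(a,b) = ab / (gamma + (1-gamma)(a + b - ab)).

    gamma = 1 reduces to the product t-norm."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    denom = gamma + (1 - gamma) * (a + b - a * b)
    safe = np.where(denom == 0, 1.0, denom)     # denom = 0 only at a=b=0, gamma=0
    return np.where(denom == 0, 0.0, a * b / safe)

def frank(a, b, s=2.0):
    """Frank t-norm: T(a,b) = log_s(1 + (s^a - 1)(s^b - 1) / (s - 1)).

    Defined for s > 0, s != 1; s -> 1 recovers the product t-norm."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.log1p((s ** a - 1) * (s ** b - 1) / (s - 1)) / np.log(s)
```

Both satisfy the t-norm boundary condition T(a, 1) = a, which is what lets a t-norm-based generalized linear system fall back to the classical log-ratio model for particular parameter values.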
ETPL
DIP-245 A Linear Support Higher-Order Tensor Machine for Classification
Abstract: There has been growing interest in developing more effective learning machines for tensor
classification. At present, most of the existing learning machines, such as support tensor machine (STM),
involve nonconvex optimization problems and need to resort to iterative techniques, which are time-consuming and may suffer from local minima. In order to overcome these two shortcomings, in this
paper, we present a novel linear support higher-order tensor machine (SHTM) which integrates the merits
of linear C-support vector machine (C-SVM) and tensor rank-one decomposition. Theoretically, SHTM is an extension of the linear C-SVM to tensor patterns. When the input patterns are vectors, SHTM
degenerates into the standard C-SVM. A set of experiments is conducted on nine second-order face
recognition datasets and three third-order gait recognition datasets to illustrate the performance of the
proposed SHTM. The statistic test shows that compared with STM and C-SVM with the RBF kernel, SHTM provides significant performance gain in terms of test accuracy and training speed, especially in
the case of higher-order tensors.
ETPL
DIP-246
Novel True-Motion Estimation Algorithm and Its Application to Motion-Compensated
Temporal Frame Interpolation
Abstract: In this paper, a new low-complexity true-motion estimation (TME) algorithm is proposed for
video processing applications, such as motion-compensated temporal frame interpolation (MCTFI) or
motion-compensated frame rate up-conversion (MCFRUC). Regular motion estimation, which is often used in video coding, aims to find the motion vectors (MVs) to reduce the temporal redundancy, whereas
TME aims to track the projected object motion as closely as possible. TME is obtained by imposing
implicit and/or explicit smoothness constraints on the block-matching algorithm. To produce better
quality-interpolated frames, the dense motion field at interpolation time is obtained for both forward and backward MVs; then, bidirectional motion compensation using forward and backward MVs is applied by
mixing both elegantly. Finally, the performance of the proposed algorithm for MCTFI is demonstrated
against recently proposed methods and smoothness constraint optical flow employed by a professional video production suite. Experimental results show that the quality of the interpolated frames using the
proposed method is better when compared with the MCFRUC techniques.
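The bidirectional compensation step can be sketched as below. Integer motion vectors and a simple (1-t)/t blend are our illustrative simplifications of the "elegant mixing" the abstract refers to:

```python
import numpy as np

def mc_interpolate(prev, nxt, fwd, bwd, t=0.5):
    """Bidirectional motion-compensated interpolation sketch.

    prev, nxt : (H, W) frames at times 0 and 1
    fwd, bwd  : (H, W, 2) motion vectors defined at interpolation time t,
                pointing into nxt and prev respectively (integer fetch).
    """
    H, W = prev.shape
    ys, xs = np.mgrid[0:H, 0:W]

    def fetch(img, mv, scale):
        # Follow the scaled motion vector, clamping at the frame border.
        y = np.clip(ys + np.round(scale * mv[..., 1]).astype(int), 0, H - 1)
        x = np.clip(xs + np.round(scale * mv[..., 0]).astype(int), 0, W - 1)
        return img[y, x]

    from_prev = fetch(prev, bwd, t)        # compensate from the past frame
    from_next = fetch(nxt, fwd, 1 - t)     # compensate from the future frame
    return (1 - t) * from_prev + t * from_next

zero_mv = np.zeros((4, 4, 2))
mid = mc_interpolate(np.zeros((4, 4)), np.full((4, 4), 2.0),
                     zero_mv, zero_mv, t=0.5)
```

With zero motion the interpolated frame is simply the temporal average of its neighbors, which is the degenerate case any MCTFI scheme should reproduce.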
ETPL
DIP-247 Motion Analysis Using 3D High-Resolution Frequency Analysis
Abstract: The spatiotemporal spectra of a video that contains a moving object form a plane in the 3D frequency domain. This plane, which is described as the theoretical motion plane, reflects the velocity of
the moving objects, which is calculated from the slope. However, if the resolution of the frequency
analysis method is not high enough to obtain actual spectra from the object signal, the spatiotemporal
spectra disperse away from the theoretical motion plane. In this paper, we propose a high-resolution
frequency analysis method, described as 3D nonharmonic analysis (NHA), which is only weakly influenced by the analysis window. In addition, we estimate the motion vectors of objects in a video using
the plane-clustering method, in conjunction with the least-squares method, for 3D NHA spatiotemporal
spectra. We experimentally verify the accuracy of the 3D NHA and its usefulness for a sequence containing complex motions, such as cross-over motion, through comparison with 3D fast Fourier
transform. The experimental results show that increasing the frequency resolution contributes to high-
accuracy estimation of a motion plane.
ETPL
DIP-248 Segment Adaptive Gradient Angle Interpolation
Abstract: We introduce a new edge-directed interpolator based on locally defined, straight line
approximations of image isophotes. Spatial derivatives of image intensity are used to describe the principal behavior of pixel-intersecting isophotes in terms of their slopes. The slopes are determined by
inverting a tridiagonal matrix and are forced to vary linearly from pixel-to-pixel within segments. Image
resizing is performed by interpolating along the approximated isophotes. The proposed method can
accommodate arbitrary scaling factors, provides state-of-the-art results in terms of PSNR as well as other quantitative visual quality metrics, and has the advantage of reduced computational complexity that is
directly proportional to the number of pixels.
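Since the slopes are obtained by inverting a tridiagonal matrix, the standard O(n) Thomas algorithm applies; the sketch below shows that routine (the paper's actual system coefficients are not reproduced here):

```python
import numpy as np

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for Ax = d with A tridiagonal.

    a : sub-diagonal (length n, a[0] unused)
    b : main diagonal (length n)
    c : super-diagonal (length n, c[-1] unused)
    Runs in O(n), versus O(n^3) for a general dense solve.
    """
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                     # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Classic (-1, 2, -1) system as a check
n = 5
a = np.full(n, -1.0); b = np.full(n, 2.0); c = np.full(n, -1.0)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
x_true = np.arange(1.0, n + 1.0)
d = A @ x_true
```

The O(n) cost of this solve is consistent with the abstract's claim of complexity directly proportional to the number of pixels.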
ETPL
DIP-249
Fast Computation of Rotation-Invariant Image Features by an Approximate Radial
Gradient Transform
Abstract: We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT
(ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The
ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that,
using the ARGT, RIFF extracts features 16× faster than SURF while achieving a similar performance for image matching and retrieval.
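The exact RGT (before the paper's fast approximation) rotates each pixel's gradient into a local frame aligned with the direction from the patch center, which is what makes the resulting components rotation-invariant. A sketch:

```python
import numpy as np

def radial_gradient_transform(gx, gy):
    """Rotate per-pixel gradients into radial/tangential components.

    gx, gy : (H, W) gradients of a patch. For each pixel, r points from the
    patch center to the pixel and t is its 90-degree rotation; (gr, gt) is
    then invariant to rotating the whole patch. (The ARGT of the paper
    quantizes r to a few directions; this is the exact transform.)
    """
    H, W = gx.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    ys -= (H - 1) / 2.0
    xs -= (W - 1) / 2.0
    norm = np.hypot(xs, ys)
    norm[norm == 0] = 1.0                 # leave the center pixel unchanged
    rx, ry = xs / norm, ys / norm         # radial unit vector
    gr = gx * rx + gy * ry                # radial component
    gt = -gx * ry + gy * rx               # tangential component
    return gr, gt

# A purely radial gradient field: tangential component should vanish
ys0, xs0 = np.mgrid[0:5, 0:5].astype(float)
gr0, gt0 = radial_gradient_transform(xs0 - 2.0, ys0 - 2.0)
```

On a purely radial field the tangential channel is identically zero, a quick way to verify the frame construction.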
ETPL
DIP-250 Image Completion by Diffusion Maps and Spectral Relaxation
Abstract: We present a framework for image inpainting that utilizes the diffusion framework approach to
spectral dimensionality reduction. We show that on formulating the inpainting problem in the embedding domain, the domain to be inpainted is smoother in general, particularly for the textured images. Thus, the
textured images can be inpainted through simple exemplar-based and variational methods. We discuss the
properties of the induced smoothness and relate it to the underlying assumptions used in contemporary
inpainting schemes. As the diffusion embedding is nonlinear and noninvertible, we propose a novel computational approach to approximate the inverse mapping from the inpainted embedding space to the
image domain. We formulate the mapping as a discrete optimization problem, solved through spectral
relaxation. The effectiveness of the presented method is exemplified by inpainting real images, where it is shown to compare favorably with contemporary state-of-the-art schemes.
ETPL
DIP-251
A Continuous Method for Reducing Interpolation Artifacts in Mutual Information-
Based Rigid Image Registration
Abstract: We propose an approach for computing mutual information in rigid multimodality image
registration. Images to be registered are modeled as functions defined on a continuous image domain. Analytic forms of the probability density functions for the images and the joint probability density
function are first defined in 1D. We describe how the entropies of the images, the joint entropy, and
mutual information can be computed accurately by a numerical method. We then extend the method to
2D and 3D. The mutual information function generated is smooth and does not seem to have the typical
interpolation artifacts that are commonly observed in other standard models. The relationship between the proposed method and the partial volume (PV) model is described. In addition, we give a theoretical
analysis to explain the nonsmoothness of the mutual information function computed by the PV model.
Numerical experiments in 2D and 3D are presented to illustrate the smoothness of the mutual information function, which leads to robust and accurate numerical convergence results for solving the image
registration problem.
ETPL
DIP-252 Image Inpainting on the Basis of Spectral Structure From 2-D Nonharmonic Analysis
Abstract: The restoration of images by digital inpainting is an active field of research and such algorithms are, in fact, now widely used. Conventional methods generally apply textures that are most similar to the
areas around the missing region or use a large image database. However, this produces discontinuous
textures and thus unsatisfactory results. Here, we propose a new technique to overcome this limitation by using signal prediction based on the nonharmonic analysis (NHA) technique proposed by the authors.
NHA can be used to extract accurate spectra, irrespective of the window function, and its frequency
resolution is finer than that of the discrete Fourier transform. The proposed method sequentially generates
new textures on the basis of the spectrum obtained by NHA. Missing regions from the spectrum are repaired using an improved cost function for 2D NHA. The proposed method is evaluated using the
standard images Lena, Barbara, Airplane, Pepper, and Mandrill. The results show an improvement in
MSE of about 10-20 compared with the exemplar-based method, along with good subjective quality.
ETPL
DIP-253 Linear Discriminant Analysis Based on L1-Norm Maximization
Abstract: Linear discriminant analysis (LDA) is a well-known dimensionality reduction technique, which
is widely used for many purposes. However, conventional LDA is sensitive to outliers because its
objective function is based on the distance criterion using L2-norm. This paper proposes a simple but effective robust LDA version based on L1-norm maximization, which learns a set of local optimal
projection vectors by maximizing the ratio of the L1-norm-based between-class dispersion and the L1-
norm-based within-class dispersion. The proposed method is theoretically proved to be feasible and robust to outliers while overcoming the singular problem of the within-class scatter matrix for
conventional LDA. Experiments on artificial datasets, standard classification datasets and three popular
image databases demonstrate the efficacy of the proposed method.
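The maximization target can be transcribed directly; the sketch below evaluates the L1 dispersion ratio for a single projection vector (the paper's iterative solver for finding that vector is not shown):

```python
import numpy as np

def l1_lda_objective(X, y, w):
    """L1-norm LDA criterion for one projection direction w.

    Ratio of L1 between-class dispersion to L1 within-class dispersion of
    the 1-D projections X @ w.
    """
    w = w / np.linalg.norm(w)
    z = X @ w
    mu = z.mean()
    between, within = 0.0, 0.0
    for c in np.unique(y):
        zc = z[y == c]
        between += len(zc) * abs(zc.mean() - mu)   # L1 between-class term
        within += np.abs(zc - zc.mean()).sum()     # L1 within-class term
    return between / within

# Two classes separated along the first axis: projecting on [1, 0] should
# score far higher than projecting on [0, 1].
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(0.0, 0.1, (20, 2)) + [10.0, 0.0]])
y = np.array([0] * 20 + [1] * 20)
```

Replacing the squared distances of classical LDA with absolute values is what blunts the influence of outliers, since a single distant sample contributes linearly rather than quadratically.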
ETPL
DIP-254 Visual Tracking With Spatio-Temporal Dempster–Shafer Information Fusion
Abstract: A key problem in visual tracking is how to effectively combine spatio-temporal visual
information from throughout a video to accurately estimate the state of an object. We address this
problem by incorporating Dempster-Shafer (DS) information fusion into the tracking approach. To implement this fusion task, the entire image sequence is partitioned into spatially and temporally adjacent
subsequences. A support vector machine (SVM) classifier is trained for object/nonobject classification on
each of these subsequences, the outputs of which act as separate data sources. To combine the discriminative information from these classifiers, we further present a spatio-temporal weighted DS
(STWDS) scheme. In addition, temporally adjacent sources are likely to share discriminative information
on object/nonobject classification. To use such information, an adaptive SVM learning scheme is
designed to transfer discriminative information across sources. Finally, the corresponding DS belief function of the STWDS scheme is embedded into a Bayesian tracking model. Experimental results on
challenging videos demonstrate the effectiveness and robustness of the proposed tracking approach.
ETPL
DIP-255 Dimensionality Reduction for Registration of High-Dimensional Data Sets
Abstract: Registration of two high-dimensional data sets often involves dimensionality reduction to yield a single-band image from each data set followed by pairwise image registration. We develop a new
application-specific algorithm for dimensionality reduction of high-dimensional data sets such that the
weighted harmonic mean of Cramér-Rao lower bounds for the estimation of the transformation parameters for registration is minimized. The performance of the proposed dimensionality reduction algorithm is evaluated using three remote sensing data sets. The experimental results using a mutual information-based pairwise registration technique demonstrate that our proposed dimensionality reduction algorithm combines the original data sets to obtain the image pair with more texture, resulting in improved image registration.
ETPL
DIP-256
Multiple-Kernel, Multiple-Instance Similarity Features for Efficient Visual Object
Detection
Abstract: We propose to use the similarity between the sample instance and a number of exemplars as features in visual object detection. Concepts from multiple-kernel learning and multiple-instance learning
are incorporated into our scheme at the feature level by properly calculating the similarity. The similarity
between two instances can be measured by various metrics and by using the information from various
sources, which mimics the use of multiple kernels for kernel machines. Pooling of the similarity values from multiple instances of an object part is introduced to cope with alignment inaccuracy between object
instances. To deal with the high dimensionality of the multiple-kernel multiple-instance similarity feature,
we propose a forward feature-selection technique and a coarse-to-fine learning scheme to find a set of good exemplars, hence we can produce an efficient classifier while maintaining a good performance.
Both the feature and the learning technique have interesting properties. We demonstrate the performance
of our method using both synthetic data and real-world visual object detection data sets.
ETPL
DIP-257 Asymmetric Correlation: A Noise Robust Similarity Measure for Template Matching
Abstract: We present an efficient and noise robust template matching method based on asymmetric
correlation (ASC). The ASC similarity function is invariant to affine illumination changes and robust to
extreme noise. It correlates the given non-normalized template with a normalized version of each image window in the frequency domain. We show that this asymmetric normalization is more robust to noise
than other cross correlation variants, such as the correlation coefficient. Direct computation of ASC is
very slow, as a DFT needs to be calculated for each image window independently. To make the template
matching efficient, we develop a much faster algorithm, which carries out a prediction step in linear time and then computes DFTs for only a few promising candidate windows. We extend the proposed template
matching scheme to deal with partial occlusion and spatially varying light change. Experimental results
demonstrate the robustness of the proposed ASC similarity measure compared to state-of-the-art template matching methods.
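The asymmetric normalization is easy to state per window; the sketch below gives that definition plus a brute-force search (the paper's FFT-based fast candidate selection is not reproduced):

```python
import numpy as np

def asc_score(template, window):
    """Asymmetric correlation of one window against a template.

    Only the window is normalized (zero mean, unit norm); the template is
    used as-is, hence 'asymmetric'. Normalizing the window makes the score
    invariant to affine illumination changes of the window.
    """
    w = window - window.mean()
    n = np.linalg.norm(w)
    if n == 0:
        return 0.0
    return float(np.sum(template * (w / n)))

def match(template, image):
    """Brute-force ASC matching: top-left location of the best window."""
    th, tw = template.shape
    H, W = image.shape
    scores = np.full((H - th + 1, W - tw + 1), -np.inf)
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            scores[i, j] = asc_score(template, image[i:i + th, j:j + tw])
    return np.unravel_index(np.argmax(scores), scores.shape)

# Plant a checkerboard template with an affine illumination change (3x + 5)
template = (np.indices((6, 6)).sum(axis=0) % 2) * 2.0 - 1.0
image = np.random.default_rng(0).normal(0.0, 0.1, size=(16, 16))
image[4:10, 2:8] = 3.0 * template + 5.0
```

Because the window is re-normalized, the planted region is found even though its intensities were scaled and shifted, which is the affine-illumination invariance the abstract claims.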
ETPL
DIP-258
Deconvolving Images With Unknown Boundaries Using the Alternating Direction
Method of Multipliers
Abstract: The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for inverse problems, namely, image deconvolution and
reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by
adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable
sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system),
which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e.,
under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of
a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not
depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that
combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate its performance on non-periodic
deblurring (with and without inpainting of interior pixels) under total-variation and frame-based
regularization.
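The observation model described above composes a circular convolution with an interior mask. A sketch of that forward operator (PSF placement, border width, and sizes here are illustrative; the ADMM splitting itself is not shown):

```python
import numpy as np

def observe(x, psf, border):
    """Forward operator A = M H of the unknown-boundary model.

    H: circular convolution with the PSF, computed in the Fourier domain
    (this diagonal structure is what makes the ADMM sub-problem cheap).
    M: mask keeping only interior pixels, i.e. those whose value does not
    depend on the unknown boundary.
    """
    Hh, Ww = x.shape
    k = np.zeros_like(x)
    kh, kw = psf.shape
    k[:kh, :kw] = psf
    k = np.roll(k, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center PSF at origin
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))
    m = np.zeros_like(x)
    m[border:Hh - border, border:Ww - border] = 1.0
    return m * y

x = np.ones((8, 8))
y_box = observe(x, np.ones((3, 3)) / 9.0, border=2)   # blur then mask
```

A normalized blur of a constant image stays constant in the interior while the masked border is zeroed, matching the intended behavior of M H.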
ETPL
DIP-259
Integration of Gibbs Markov Random Field and Hopfield-Type Neural Networks for
Unsupervised Change Detection in Remotely Sensed Multitemporal Images
Abstract: In this paper, a spatiocontextual unsupervised change detection technique for multitemporal,
multispectral remote sensing images is proposed. The technique uses a Gibbs Markov random field
(GMRF) to model the spatial regularity between the neighboring pixels of the multitemporal difference image. The difference image is generated by change vector analysis applied to images acquired on the
same geographical area at different times. The change detection problem is solved using the maximum a
posteriori probability (MAP) estimation principle. The MAP estimator of the GMRF used to model the difference image is exponential in nature, thus a modified Hopfield type neural network (HTNN) is
exploited for estimating the MAP. In the considered Hopfield type network, a single neuron is assigned to
each pixel of the difference image and is assumed to be connected only to its neighbors. Initial values of
the neurons are set by histogram thresholding. An expectation-maximization algorithm is used to estimate the GMRF model parameters. Experiments are carried out on three-multispectral and multitemporal
remote sensing images. Results of the proposed change detection scheme are compared with those of the
manual-trial-and-error technique, automatic change detection scheme based on GMRF model and iterated conditional mode algorithm, a context sensitive change detection scheme based on HTNN, the GMRF
model, and a graph-cut algorithm. A comparison points out that the proposed method provides more
accurate change detection maps than other methods.
ETPL
DIP-260 SparCLeS: Dynamic Sparse Classifiers With Level Sets for Robust
Beard/Moustache Detection and Segmentation
Abstract: Robust facial hair detection and segmentation is a highly valued soft biometric attribute for carrying out forensic facial analysis. In this paper, we propose a novel and fully automatic system, called
SparCLeS, for beard/moustache detection and segmentation in challenging facial images. SparCLeS uses
the multiscale self-quotient (MSQ) algorithm to preprocess facial images and deal with illumination
variation. Histogram of oriented gradients (HOG) features are extracted from the preprocessed images and a dynamic sparse classifier is built using these features to classify a facial region as either containing
skin or facial hair. A level set based approach, which makes use of the advantages of both global and
local information, is then used to segment the regions of a face containing facial hair. Experimental results demonstrate the effectiveness of our proposed system in detecting and segmenting facial hair
regions in images drawn from three databases, i.e., the NIST Multiple Biometric Grand Challenge
(MBGC) still face database, the NIST Color Facial Recognition Technology FERET database, and the Labeled Faces in the Wild (LFW) database.
ETPL
DIP-261 Cross-Domain Object Recognition Via Input-Output Kernel Analysis
Abstract: It is of great importance to investigate the domain adaptation problem of image object
recognition, because now image data is available from a variety of source domains. To understand the changes in data distributions across domains, we study both the input and output kernel spaces for cross-
domain learning situations, where most labeled training images are from a source domain and testing
images are from a different target domain. To address the feature distribution change issue in the
reproducing kernel Hilbert space induced by vector-valued functions, we propose a domain adaptive
input-output kernel learning (DA-IOKL) algorithm, which simultaneously learns both the input and
output kernels with a discriminative vector-valued decision function by reducing the data mismatch and minimizing the structural error. We also extend the proposed method to the cases of having multiple
source domains. We examine two cross-domain object recognition benchmark data sets, and the proposed
method consistently outperforms the state-of-the-art domain adaptation and multiple kernel learning methods.
ETPL
DIP-262 Regularized Feature Reconstruction for Spatio-Temporal Saliency Detection
Abstract: Multimedia applications such as image or video retrieval, copy detection, and so forth can
benefit from saliency detection, which is essentially a method to identify areas in images and videos that capture the attention of the human visual system. In this paper, we propose a new spatio-temporal
saliency detection framework on the basis of regularized feature reconstruction. Specifically, for video
saliency detection, both the temporal and spatial saliency detection are considered. For temporal saliency, we model the movement of the target patch as a reconstruction process using the patches in neighboring
frames. A Laplacian smoothing term is introduced to model the coherent motion trajectories. With
psychological findings that an abrupt stimulus can cause a rapid and involuntary deployment of attention,
our temporal model combines the reconstruction error, regularizer, and local trajectory contrast to measure the temporal saliency. For spatial saliency, a similar sparse reconstruction process is adopted to
capture the regions with high center-surround contrast. Finally, the temporal saliency and spatial saliency
are combined together to favor salient regions with high confidence for video saliency detection. We also apply the spatial saliency part of the spatio-temporal model to image saliency detection. Experimental
results on a human fixation video dataset and an image saliency detection dataset show that our method
achieves the best performance over several state-of-the-art approaches.
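The reconstruction-error idea behind the temporal saliency term above can be sketched as follows: a target patch that the neighboring frames' patches reconstruct well gets a low error (low saliency), while a poorly reconstructed patch is salient. The ridge weight `lam` and the tiny dense solver are illustrative assumptions, not the paper's exact regularized formulation.

```python
def solve(A, b):
    """Gauss-Jordan elimination for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def reconstruction_error(target, patches, lam=1e-3):
    """Ridge-regularized least-squares reconstruction of `target`
    (a flat patch vector) from a list of neighbor `patches`."""
    n = len(patches)
    # Gram matrix plus ridge term on the diagonal.
    G = [[sum(a * b for a, b in zip(patches[i], patches[j]))
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [sum(p * t for p, t in zip(patches[i], target)) for i in range(n)]
    coef = solve(G, rhs)
    recon = [sum(c * p[k] for c, p in zip(coef, patches))
             for k in range(len(target))]
    return sum((t - r) ** 2 for t, r in zip(target, recon))

neighbors = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
smooth = reconstruction_error([1.0, 2.0, 3.0], neighbors)   # predictable motion
abrupt = reconstruction_error([5.0, -1.0, 4.0], neighbors)  # abrupt stimulus
```

The smooth patch yields a near-zero error while the abrupt one does not, which is exactly the contrast the temporal saliency measure exploits.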
ETPL
DIP-263 Texture Enhanced Histogram Equalization Using TV-L1 Image Decomposition
Abstract: Histogram transformation defines a class of image processing operations that are widely applied
in the implementation of data normalization algorithms. In this paper, we present a new variational
approach for image enhancement that is constructed to alleviate the intensity saturation effects that are introduced by standard contrast enhancement (CE) methods based on histogram equalization. In this
paper, we initially apply total variation (TV) minimization with an L1 fidelity term to decompose the input
image with respect to cartoon and texture components. Contrary to previous papers that rely solely on the information encompassed in the distribution of the intensity information, in this paper, the texture
information is also employed to emphasize the contribution of the local textural features in the CE
process. This is achieved by implementing a nonlinear histogram warping CE strategy that is able to
maximize the information content in the transformed image. Our experimental study addresses the CE of a wide variety of image data and comparative evaluations are provided to illustrate that our method
produces better results than conventional CE strategies.
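For reference, the standard histogram equalization that the paper improves on can be sketched for an 8-bit grayscale image given as a flat list of intensities; the saturation effects the authors address stem from this hard CDF remapping.

```python
def equalize(pixels, levels=256):
    """Classic histogram equalization via the cumulative distribution."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, run = [], 0
    for h in hist:
        run += h
        cdf.append(run)
    cdf_min = next(c for c in cdf if c > 0)   # first nonzero CDF value
    n = len(pixels)
    scale = (levels - 1) / max(n - cdf_min, 1)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A low-contrast image concentrated in [100, 103] spreads to the full range.
flat = [100, 100, 101, 101, 102, 102, 103, 103]
eq = equalize(flat)
```

Because the remap depends only on the global intensity distribution, local texture is ignored, which is the gap the TV-based decomposition in the paper targets.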
ETPL
DIP-264 Gaussian Blurring-Invariant Comparison of Signals and Images
Abstract: We present a Riemannian framework for analyzing signals and images in a manner that is
invariant to their level of blurriness, under Gaussian blurring. Using a well known relation between
Gaussian blurring and the heat equation, we establish an action of the blurring group on image space and
define an orthogonal section of this action to represent and compare images at the same blur level. This comparison is based on geodesic distances on the section manifold which, in turn, are computed using a
path-straightening algorithm. The actual implementations use coefficients of images under a truncated
orthonormal basis and the blurring action corresponds to exponential decays of these coefficients. We
demonstrate this framework using a number of experimental results, involving 1D signals and 2D images.
As a specific application, we study the effect of blurring on the recognition performance when 2D facial
images are used for recognizing people.
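The key relation used above, that Gaussian blurring acts as an exponential decay of coefficients under a truncated orthonormal basis (the heat-equation view), can be illustrated in 1D with a DCT-style cosine basis; the exact decay-rate convention is an assumption for this sketch.

```python
import math

def cosine_coeffs(signal, n_coeffs):
    """Project a 1D signal onto the first n_coeffs DCT-II cosine basis
    functions cos(pi*k*(i+0.5)/n)."""
    n = len(signal)
    return [sum(signal[i] * math.cos(math.pi * k * (i + 0.5) / n)
                for i in range(n)) * (2.0 / n) for k in range(n_coeffs)]

def blur_coeffs(coeffs, t):
    """Heat-equation action of blurring: coefficient k decays as
    exp(-(pi*k)^2 * t), so higher frequencies vanish faster."""
    return [c * math.exp(-((math.pi * k) ** 2) * t)
            for k, c in enumerate(coeffs)]

# A pure k=3 cosine mode: one unit coefficient before blurring.
sig = [math.cos(math.pi * 3 * (i + 0.5) / 32) for i in range(32)]
c0 = cosine_coeffs(sig, 8)
c_blur = blur_coeffs(c0, 0.05)
```

The DC coefficient is untouched while the k=3 coefficient is strongly attenuated, which is why comparing images at a common blur level requires normalizing this decay, as the paper's orthogonal section does.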
ETPL
DIP-265 Fast SIFT Design for Real-Time Visual Feature Extraction
Abstract: Visual feature extraction with scale invariant feature transform (SIFT) is widely used for object
recognition. However, its real-time implementation suffers from long latency, heavy computation, and
high memory storage because of its frame level computation with iterated Gaussian blur operations. Thus, this paper proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design
with an on-the-fly feature extraction flow for real-time application needs. Compared with the original
SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses 580-K gate count with 90-nm CMOS technology, and offers 6000
feature points/frame for VGA images at 30 frames/s and ~ 2000 feature points/frame for 1920 × 1080
images at 30 frames/s at the clock rate of 100 MHz.
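The integral-image trick that lets LPSIFT replace iterated Gaussian blurs with box filters can be sketched directly: once the summed-area table is built, any rectangle sum costs four lookups regardless of its size.

```python
def integral_image(img):
    """Summed-area table with a zero row/column of padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]                     # running row sum
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] in O(1) from the integral image."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

Approximating Gaussian kernels by a few such box filters is what removes the iterated-blur bottleneck from the frame-level pipeline.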
ETPL
DIP-266 Artistic Image Analysis Using Graph-Based Learning Approaches
Abstract: We introduce a new methodology for the problem of artistic image analysis, which among other
tasks, involves the automatic identification of visual classes present in an artwork. In this paper, we
advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the
novelties of our methodology is the proposed formulation that is a principled way of combining these two
similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results
compared with the following baseline algorithms: bag of visual words, label propagation, matrix
completion, and structural learning. We also show that the proposed approach leads to more efficient
inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary
problems), where we show the inference and training running times, and quantitative comparisons with
respect to several retrieval and annotation performance measures.
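A toy version of graph-based label propagation, the family of baselines the paper builds on, can be sketched on a small similarity graph; the paper's inverted formulation and artistic-influence graph are not reproduced here.

```python
def propagate(W, labels, n_iter=50, alpha=0.9):
    """W: symmetric similarity matrix; labels: per-node class-score lists
    for labeled nodes and None for unlabeled ones. Iterates a clamped
    random-walk update and returns the argmax class per node."""
    n = len(W)
    n_cls = len(next(l for l in labels if l is not None))
    F = [list(l) if l is not None else [0.0] * n_cls for l in labels]
    Y = [list(f) for f in F]                    # clamped label evidence
    deg = [sum(row) or 1.0 for row in W]        # row degrees
    for _ in range(n_iter):
        F = [[alpha * sum(W[i][j] / deg[i] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c] for c in range(n_cls)]
             for i in range(n)]
    return [max(range(n_cls), key=lambda f=f: f[1])[0] if False else
            max(range(n_cls), key=lambda c, f=f: f[c]) for f in F]

# Two clusters joined by one weak edge; one labeled node per cluster.
W = [[0, 1, 1, 0.05, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0.05, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
labels = [[1.0, 0.0], None, None, [0.0, 1.0], None, None]
pred = propagate(W, labels)
```

Labels diffuse along strong similarity edges, so each cluster inherits the class of its single labeled member.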
ETPL
DIP-267
Self-Supervised Online Metric Learning With Low Rank Constraint for Scene
Categorization
Abstract: Conventional visual recognition systems usually train an image classifier in a batch mode with
all training data provided in advance. However, in many practical applications, only a small amount of training samples are available in the beginning and many more would come sequentially during online
recognition. Because the image data characteristics could change over time, it is important for the
classifier to adapt to the new data incrementally. In this paper, we present an online metric learning
method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled data followed by a sequential input of unseen testing samples, the similarity metric is
learned to maximize the margin of the distance among different classes of samples. By considering the
low rank constraint, our online metric learning model not only can provide competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bi-linear graph is also
defined to model the pair-wise similarity, and an unseen sample is labeled depending on the graph-based
label propagation, while the model can also self-update using the more confident new samples. With the
ability of online learning, our methodology can handle large-scale streaming video data with incremental self-updating. We apply our model to online scene categorization, and
experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate
the effectiveness and efficiency of our algorithm.
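The flavor of online metric learning described above can be sketched with a diagonal Mahalanobis metric updated from labeled pairs, so that same-class pairs end up closer than different-class pairs. The paper's low-rank bilinear model and margin formulation are much richer; this only illustrates the incremental update idea.

```python
def update(diag, x, y, same, lr=0.1):
    """One online step: shrink weights on dimensions where a same-class
    pair differs, grow them where a different-class pair differs."""
    d = [(a - b) ** 2 for a, b in zip(x, y)]
    sign = -1.0 if same else 1.0
    return [max(w + sign * lr * di, 0.0) for w, di in zip(diag, d)]

def dist(diag, x, y):
    """Diagonal Mahalanobis squared distance under the learned metric."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(diag, x, y))

M = [1.0, 1.0]
# Dimension 0 is noisy within classes; dimension 1 separates the classes.
for _ in range(5):
    M = update(M, [0.0, 0.0], [1.0, 0.1], same=True)    # same-class pair
    M = update(M, [0.0, 0.0], [0.2, 1.0], same=False)   # different-class pair
```

After a few updates the metric down-weights the noisy dimension and up-weights the discriminative one, which is the adaptive similarity measurement the abstract refers to.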
ETPL
DIP-268 Nonlocal Regularization of Inverse Problems: A Unified Variational Framework
Abstract: We introduce a unifying energy minimization framework for nonlocal regularization of inverse
problems. In contrast to the weighted sum of squared differences between image pixels used by current schemes, the proposed functional is an unweighted sum of inter-patch distances. We use robust distance
metrics that promote the averaging of similar patches, while discouraging the averaging of dissimilar
patches. We show that the first iteration of a majorize-minimize algorithm to minimize the proposed cost function is similar to current nonlocal methods. The reformulation thus provides a theoretical justification
for the heuristic approach of iterating nonlocal schemes, which re-estimate the weights from the current
image estimate. Thanks to the reformulation, we now understand that the widely reported alias amplification associated with iterative nonlocal methods is caused by convergence to a local minimum
of the nonconvex penalty. We introduce an efficient continuation strategy to overcome this problem. The
similarity of the proposed criterion to widely used nonquadratic penalties (e.g., total variation
and lp semi-norms) opens the door to the adaptation of fast algorithms developed in the context of compressive sensing; we introduce several novel algorithms to solve the proposed nonlocal optimization
problem. Thanks to the unifying framework, these fast algorithms are readily applicable for a large class
of distance metrics.
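The connection drawn above, that one majorize-minimize iteration of the proposed cost resembles classical nonlocal schemes, can be illustrated with a single weight re-estimation step of nonlocal means on a 1D signal; the patch size and bandwidth `h` are illustrative choices.

```python
import math

def nonlocal_step(sig, h=0.5, half_patch=1):
    """One nonlocal-means pass: each sample becomes a weighted average of
    all samples, weighted by patch similarity exp(-d^2 / h^2)."""
    n = len(sig)
    def patch(i):
        # Patch around i with clamped borders.
        return [sig[min(max(i + d, 0), n - 1)]
                for d in range(-half_patch, half_patch + 1)]
    out = []
    for i in range(n):
        pi = patch(i)
        num = den = 0.0
        for j in range(n):
            pj = patch(j)
            d2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
            w = math.exp(-d2 / (h * h))
            num += w * sig[j]
            den += w
        out.append(num / den)
    return out

# A noisy two-level step signal: averaging stays within each level
# because cross-edge patches are dissimilar.
noisy = [0.0, 0.05, -0.05, 0.0, 1.0, 0.95, 1.05, 1.0]
denoised = nonlocal_step(noisy)
```

Iterating this step with weights re-estimated from the current result is exactly the heuristic the paper's reformulation justifies, and whose local-minimum pitfalls its continuation strategy addresses.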
ETPL
DIP-269
Corner Detection and Classification Using Anisotropic Directional Derivative
Representations
Abstract: This paper proposes a corner detector and classifier using anisotropic directional derivative
(ANDD) representations. The ANDD representation at a pixel is a function of the oriented angle and
characterizes the local directional grayscale variation around the pixel. The proposed corner detector fuses the ideas of the contour- and intensity-based detection. It consists of three cascaded blocks. First,
the edge map of an image is obtained by the Canny detector, from which contours are extracted and
patched. Next, the ANDD representation at each pixel on contours is calculated and normalized by its maximal magnitude. The area surrounded by the normalized ANDD representation forms a new corner
measure. Finally, the nonmaximum suppression and thresholding are operated on each contour to find
corners in terms of the corner measure. Moreover, a corner classifier based on the peak number of the ANDD representation is given. Experiments are made to evaluate the proposed detector and classifier.
The proposed detector is competitive with the two recent state-of-the-art corner detectors, the He & Yung
detector and CPDA detector, in detection capability and attains higher repeatability under affine
transforms. The proposed classifier can effectively discriminate simple corners, Y-type corners, and higher-order corners.
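The final nonmaximum suppression and thresholding stage of the detector can be sketched along one contour; the corner-measure values below are invented for illustration, not ANDD outputs.

```python
def find_corners(measure, threshold):
    """Indices that are strict local maxima of the corner measure and
    above the threshold (endpoints excluded, as on a patched contour)."""
    return [i for i in range(1, len(measure) - 1)
            if measure[i] > threshold
            and measure[i] > measure[i - 1]
            and measure[i] > measure[i + 1]]

# Corner measure sampled along a contour: two peaks survive.
m = [0.1, 0.2, 0.9, 0.3, 0.2, 0.6, 0.4, 0.1]
corners = find_corners(m, 0.5)
```

In the full detector the measure at each contour point is the area under the normalized ANDD representation; only this last suppression step is shown.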
ETPL
DIP-270 Classification of Time Series of Multispectral Images With Limited Training Data
Abstract: Image classification usually requires the availability of reliable reference data collected for the
considered image to train supervised classifiers. Unfortunately when time series of images are considered,
this is seldom possible because of the costs associated with reference data collection. In most of the applications it is realistic to have reference data available for one or few images of a time series acquired
on the area of interest. In this paper, we present a novel system for automatically classifying image time
series that takes advantage of image(s) with an associated reference information (i.e., the source domain) to classify image(s) for which reference information is not available (i.e., the target domain). The
proposed system exploits the already available knowledge on the source domain and, when possible,
integrates it with a minimum amount of new labeled data for the target domain. In addition, it is able to handle possible significant differences between statistical distributions of the source and target domains.
Here, the method is presented in the context of classification of remote sensing image time series, where
ground reference data collection is a highly critical and demanding task. Experimental results show the
effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.
ETPL
DIP-271 Fast l1-Minimization Algorithms for Robust Face Recognition
Abstract: l1-minimization refers to finding the minimum l1-norm solution to an underdetermined linear
system b = Ax. Under certain conditions as described in compressive sensing theory, the
minimum l1-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based
classification framework in robust face recognition, where sparse representation is sought to recover
human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional
algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution
based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l1-
minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate
message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made
publicly available.
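At the core of several of the compared solvers (FISTA, Homotopy, the augmented Lagrangian method) sits the soft-thresholding proximal operator. A few ISTA iterations on a tiny underdetermined system b = Ax show how it drives small coefficients to zero; ISTA is used here for brevity rather than the ALM solver the paper advocates, and the step size and lambda are illustrative.

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def ista(A, b, lam=0.05, step=0.2, n_iter=500):
    """Iterative shrinkage-thresholding for min 0.5||Ax-b||^2 + lam||x||_1."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad = [sum(A[i][j] * (Ax[i] - b[i]) for i in range(m))
                for j in range(n)]
        # Gradient step on the quadratic, then shrinkage on the l1 term.
        x = soft([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

# 2x3 underdetermined system whose sparsest solution uses one column.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
b = [1.0, 0.0]
x = ista(A, b)
```

The recovered x concentrates on the first column (near 1 - lam = 0.95 by the lasso optimality conditions) with the other entries shrunk to zero, the sparsity behavior that the face recognition framework relies on.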
ETPL
DIP-272 Robust Face Representation Using Hybrid Spatial Feature Interdependence Matrix
Abstract: A key issue in face recognition is to seek an effective descriptor for representing face
appearance. In the context of considering the face image as a set of small facial regions, this paper presents a new face representation approach coined spatial feature interdependence matrix (SFIM).
Unlike classical face descriptors which usually use a hierarchically organized or a sequentially
concatenated structure to describe the spatial layout features extracted from local regions, SFIM is
attributed to the exploitation of the underlying feature interdependences regarding local region pairs inside a class specific face. According to SFIM, the face image is projected onto an undirected connected
graph in a manner that explicitly encodes feature interdependence-based relationships between local
regions. We calculate the pair-wise interdependence strength as the weighted discrepancy between two feature sets extracted in a hybrid feature space fusing histograms of intensity, local binary pattern and
oriented gradients. To achieve the goal of face recognition, our SFIM-based face descriptor is embedded
in three different recognition frameworks, namely nearest neighbor search, subspace-based classification, and linear optimization-based classification. Extensive experimental results on four well-known face
databases and comprehensive comparisons with the state-of-the-art results are provided to demonstrate
the efficacy of the proposed SFIM-based descriptor.
ETPL
DIP-273 Motion Estimation Using the Correlation Transform
Abstract: The zero-mean normalized cross-correlation is shown to improve the accuracy of optical flow,
but its analytical form is quite complicated for the variational framework. This paper addresses this issue and presents a new direct approach to this matching measure. Our approach uses the correlation transform
to define very discriminative descriptors that are pre-computed and that have to be matched in the target
frame. It is equivalent to the computation of the optical flow for the correlation transforms of the images.
The smoothness energy is non-local and uses a robust penalty in order to preserve motion discontinuities. The model is associated with a fast and parallelizable minimization procedure based on the projected-
proximal point algorithm. The experiments confirm the strength of this model and implicitly demonstrate
the correctness of our solution. The results demonstrate that the involved data term is very robust with
respect to changes in illumination, especially where large illumination changes exist.
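The zero-mean normalized cross-correlation that motivates the correlation transform can be sketched for two 1D patches; its invariance to affine intensity changes (gain and offset) is what makes the data term robust to illumination.

```python
import math

def zncc(p, q):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    a = [x - mp for x in p]           # remove the mean (offset invariance)
    b = [x - mq for x in q]
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb or 1.0)  # gain invariance

patch = [1.0, 2.0, 4.0, 3.0]
brighter = [x * 2.0 + 10.0 for x in patch]   # gain + offset change
other = [4.0, 1.0, 2.0, 0.0]                 # unrelated patch
```

A gain-and-offset copy of a patch scores a perfect 1.0, while an unrelated patch does not, which is the matching behavior the variational model embeds.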
ETPL
DIP-274 Single Image Dehazing by Multi-Scale Fusion
Abstract: Haze is an atmospheric phenomenon that significantly degrades the visibility of outdoor scenes.
This is mainly due to the atmosphere particles that absorb and scatter the light. This paper introduces a
novel single image approach that enhances the visibility of such degraded images. Our method is a
fusion-based strategy that derives two inputs from the original hazy image by applying a white balance and a contrast-enhancing procedure. To effectively blend the information of the derived inputs and preserve the
regions with good visibility, we filter their important features by computing three measures (weight
maps): luminance, chromaticity, and saliency. To minimize artifacts introduced by the weight maps, our approach is designed in a multiscale fashion, using a Laplacian pyramid representation. We are the first to
demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single
degraded image. The method performs in a per-pixel fashion, which is straightforward to implement. The experimental results demonstrate that the method yields results comparative to and even better than the
more complex state-of-the-art techniques, having the advantage of being appropriate for real-time
applications.
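The per-pixel fusion idea above can be sketched as blending two derived inputs with normalized weight maps. The real weights (luminance, chromaticity, saliency) and the Laplacian-pyramid multiscale blending are replaced here by toy single-scale weights.

```python
def fuse(input1, input2, w1, w2):
    """Blend two images (flat intensity lists) with per-pixel weights,
    normalizing the weights so they sum to one at each pixel."""
    out = []
    for a, b, wa, wb in zip(input1, input2, w1, w2):
        s = (wa + wb) or 1.0
        out.append((wa * a + wb * b) / s)
    return out

i1 = [0.9, 0.8, 0.1, 0.2]   # e.g. white-balanced input
i2 = [0.3, 0.2, 0.7, 0.6]   # e.g. contrast-enhanced input
w1 = [1.0, 1.0, 0.0, 0.0]   # trust input 1 on the left half
w2 = [0.0, 0.0, 1.0, 1.0]   # trust input 2 on the right half
result = fuse(i1, i2, w1, w2)
```

Each output pixel comes from whichever derived input the weight maps favor there; the pyramid blending in the paper exists to hide the seams this naive per-pixel version would create.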
ETPL
DIP-275 Joint Sparse Learning for 3-D Facial Expression Generation
Abstract: 3-D facial expression generation, including synthesis and retargeting, has received intensive
attention in recent years, because it is important to produce realistic 3-D faces with specific expressions
in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-
dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional
representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D
face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of
3-D faces demonstrate the effectiveness of the proposed approach by comparing with representative ones
in terms of quality, time cost, and robustness.
ETPL
DIP-276 Robust Model for Segmenting Images With/Without Intensity Inhomogeneities
Abstract: Intensity inhomogeneities and different types/levels of image noise are the two major obstacles
to accurate image segmentation by region-based level set models. To provide a more general solution to these challenges, we propose a novel segmentation model that considers global and local image statistics
to eliminate the influence of image noise and to compensate for intensity inhomogeneities. In our model,
the global energy derived from a Gaussian model estimates the intensity distribution of the target object
and background; the local energy derived from the mutual influences of neighboring pixels can eliminate the impact of image noise and intensity inhomogeneities. The robustness of our method is validated on
segmenting synthetic images with/without intensity inhomogeneities, and with different types/levels of
noise, including Gaussian noise, speckle noise, and salt and pepper noise, as well as images from different medical imaging modalities. Quantitative experimental comparisons demonstrate that our
method is more robust and more accurate in segmenting the images with intensity inhomogeneities than
the local binary fitting technique and its more recent systematic model. Our technique also outperformed
the region-based Chan-Vese model when dealing with images without intensity inhomogeneities and produced better segmentation results than the graph-based algorithms including graph-cuts and random
walker when segmenting noisy images.
ETPL
DIP-277 Learning Prototype Hyperplanes for Face Verification in the Wild
Abstract: In this paper, we propose a new scheme called Prototype Hyperplane Learning (PHL) for face verification in the wild using only weakly labeled training samples (i.e., we only know whether each pair
of samples is from the same class or different classes without knowing the class label of each sample)
by leveraging a large number of unlabeled samples in a generic data set. Our scheme represents each
sample in the weakly labeled data set as a mid-level feature with each entry as the corresponding decision value from the classification hyperplane (referred to as the prototype hyperplane) of one Support Vector
Machine (SVM) model, in which a sparse set of support vectors is selected from the unlabeled generic
data set based on the learnt combination coefficients. To learn the optimal prototype hyperplanes for the extraction of mid-level features, we propose a Fisher's Linear Discriminant-like (FLD-like) objective
function by maximizing the discriminability on the weakly labeled data set with a constraint enforcing
sparsity on the combination coefficients of each SVM model, which is solved by using an alternating optimization method. Then, we use the recent work called Side-Information based Linear Discriminant
(SILD) analysis for dimensionality reduction and a cosine similarity measure for final face verification.
Comprehensive experiments on two data sets, Labeled Faces in the Wild (LFW) and YouTube Faces,
demonstrate the effectiveness of our scheme.
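The mid-level feature construction described above can be sketched directly: each entry of a sample's representation is its signed decision value under one prototype hyperplane. The hyperplanes below are fixed toy values; in the paper they are learned with the FLD-like objective, which is not reproduced here.

```python
def midlevel_feature(x, hyperplanes):
    """hyperplanes: list of (w, b) pairs; returns the decision value
    w.x + b of the sample x under each prototype hyperplane."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in hyperplanes]

# Three toy prototype hyperplanes in 2D (weights, bias).
planes = [([1.0, 0.0], -0.5),
          ([0.0, 1.0], 0.0),
          ([1.0, 1.0], -1.0)]
feat = midlevel_feature([2.0, 3.0], planes)
```

The resulting vector is then what SILD projects and compares with cosine similarity for the final verification decision.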