


Thesis alerts 187

another and with real-time processing in mind, we first build a column-by-column processing scheme. This processing is based on the future use of these images in classification. We show that the most efficient discriminating parameter vector is the simplest one: the initial value vector of the normalized interferogram. This surprising result is demonstrated in the thesis on actual data. The coding technique is constructed with the idea of sending the features of each class more or less accurately. Then,

with scalar and image compression working together, the compression ratio reached is between 8 and 20 with an acceptable distortion measure, depending on the column structure. A second step in image compression is to apply block processing (several columns at a time) to improve the compression ratio already obtained.

In conclusion, we give some insight into the processing algorithms and their suitability for real-time implementation.

PERCEPTUAL REDUNDANCY REDUCTION IN IMAGE SEQUENCES

(Original Italian title: Riduzione della Ridondanza Percettiva nelle Sequenze di Immagini)

Gaetano GIUNTA*† University of Rome La Sapienza, INFO-COM Department, Via Eudossiana 18, 00184 Rome, Italy

Second generation image coding is based on recent results of research in neurophysiology. In this thesis, a model of neural coding of signals is analyzed with regard to the visual perception process, and a high-compression method for coding image sequences is derived and applied to standard TV sequences.

The signals in nervous fibers are spike trains which can be modeled as realizations of inhomogeneous filtered Poisson random processes. In the biological transducers, such as the first neurons (ganglion cells) beyond the rods and cones of the mammalian visual system, information (sensory stimuli) is carried by the instantaneous mean firing rate of the neural spikes. The results of the mathematical analysis performed in this work match several experimental findings.
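The spike-train model above can be illustrated with a minimal simulation (not taken from the thesis; the rate function and post-synaptic kernel here are hypothetical): spikes are drawn from an inhomogeneous Poisson process by Lewis-Shedler thinning, then shaped by an exponentially decaying kernel to obtain a filtered Poisson (shot-noise) process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stimulus-driven firing rate (spikes/s): a raised sinusoid.
rate = lambda t: 40.0 + 30.0 * np.sin(2 * np.pi * 2.0 * t)
rate_max = 70.0          # upper bound on the rate, needed for thinning
T = 2.0                  # observation window (s)

# Inhomogeneous Poisson spikes via Lewis-Shedler thinning: generate
# candidates at the maximum rate, keep each with probability rate(t)/rate_max.
n_cand = rng.poisson(rate_max * T)
cand = np.sort(rng.uniform(0.0, T, n_cand))
spikes = cand[rng.uniform(0.0, rate_max, n_cand) < rate(cand)]

# Filtered Poisson (shot-noise) process: each spike contributes an
# exponentially decaying kernel h(t) = exp(-t / tau).
tau, dt = 0.02, 1e-3
t_grid = np.arange(0.0, T, dt)
signal = np.zeros_like(t_grid)
for s in spikes:
    m = t_grid >= s
    signal[m] += np.exp(-(t_grid[m] - s) / tau)

# The empirical mean rate should sit near the time average of rate(t), 40 Hz.
print(len(spikes) / T)
```

The instantaneous mean of `signal` tracks the modulating rate, which is the sense in which the stimulus is "carried by the instantaneous mean firing rate" in the model.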

* PhD thesis in Information and Communication Engineering. Tutor: Professor Guido di Blasio.

† Part of this work was carried out at the Signal Processing Laboratory (LTS), Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland.

A random non-linear input-output sensitivity law has been proved. It can explain the non-linear behavior found in a number of psychophysical tests. The performance of correlation function estimators based on the neural signals has been derived to investigate the perceptive properties of the human brain. The observation time window has been chosen comparable to the visual perception time (a few tenths of a second). Two important remarks are in order. The estimates are nearly efficient when the input process is sufficiently correlated (i.e., the input bandwidth is below 20-30 Hz), while the estimation error increases for wider-band input signals. Such a trend fully agrees with the experimental evidence: the eye's inability to follow fast variations (such as occur with TV or cinema images) is well known. Moreover, the physiological mean firing rate of neural signals satisfies an optimality tuning condition based on the minimum mean square error of the correlation estimates. Therefore, the physiological neural coding system is optimal according to such a criterion.
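The role of the observation window can be sketched with a toy Poisson counter (a hypothetical constant-rate stand-in, not the estimator analyzed in the thesis): the rate estimate from spike counts is unbiased for any window length, but its error shrinks as the window grows toward the perception time scale.

```python
import numpy as np

rng = np.random.default_rng(2)

lam = 50.0  # constant mean firing rate (spikes/s), a stand-in stimulus level

def rate_estimates(T, trials=4000):
    """Estimate the firing rate from spike counts in a window of length T."""
    return rng.poisson(lam * T, trials) / T

# Short window (~ sensation scale) vs. long window (~ perception scale).
short = rate_estimates(0.05)
long_ = rate_estimates(0.5)

# Both estimators are unbiased, but the longer observation window gives a
# much smaller estimation error: var = lam / T for a Poisson counter,
# i.e. std ~ sqrt(50/0.05) = 31.6 vs. sqrt(50/0.5) = 10.
print(short.std(), long_.std())
```

The same trade-off drives the bandwidth remark in the abstract: a fast-varying input cannot be assumed constant over a long window, so the estimator is forced onto the short-window, high-error side.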

Vol. 20, No. 2, June 1990



The random model of neural coding can explain some behaviors of the human visual system. The human brain can estimate the input signal and the input statistics only from the outputs of the neural coders. Since the coded signals are random in nature, the estimation performance depends on how long the input signal or the input statistics can be assumed constant. Neurophysiological studies show that a visual stimulus is detected in some tens of milliseconds (sensation time), while the perception process needs some hundreds of milliseconds (perception time). In fact, unlike sensation, visual perception involves the evaluation of higher-order statistics. In order to achieve equivalent perceptive and sensitive distortions, image sequences can be subsampled at a rate of 5-6 Hz, but the inter-frame variations need to be displayed at a rate of 50-60 Hz. Such a compound visual effect can be obtained in practice by interlacing the pixels from subsequent frames.
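The subsample-and-interlace idea can be sketched numerically (a toy rotating-mosaic scheme; the 10-phase pattern and frame sizes are assumptions, not the thesis design): each displayed frame refreshes only one tenth of its pixel sites, so every site is fully updated at 1/10 of the frame rate while some inter-frame variation still reaches the display every frame.

```python
import numpy as np

# Toy illustration: one "fresh" update per pixel out of every 10 frames
# (50 Hz -> 5 Hz per site), refreshed in a rotating 10-phase mosaic.
H, W, N = 8, 10, 20
frames = np.arange(N)[:, None, None] * np.ones((H, W))  # frame k is all-k

phase = (np.arange(H)[:, None] + np.arange(W)[None, :]) % 10  # pixel phases
display = np.empty_like(frames)
for k in range(N):
    if k == 0:
        display[0] = frames[0]
        continue
    display[k] = display[k - 1]
    display[k][phase == k % 10] = frames[k][phase == k % 10]  # refresh 1/10

# After one full cycle every pixel has been refreshed, so each site lags
# the source by at most 10 frames while 1/10 of the sites update per frame.
print(display[19].min(), display[19].max())   # -> 10.0 19.0
```

At frame 19, the oldest pixel value on screen dates from frame 10 and the newest from frame 19, which is the compound effect the abstract describes.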

Second-generation coding methods transfer information from the source space to the perceptual space. Such a transformation is non-linear and depends on the observed objects. Since it is not possible to directly measure an object-dependent perceptual distortion, image sequences are first separated into components where distortion can be evaluated, then analyzed and coded. For such a purpose, a 'bit-sharing law', based on the minimization of a cost/resolution ratio, has been derived to split the image sequence into lowpass and highpass spatial frequency components. While the spatial lowpass-filtered image sequence (corresponding to the local average luminance) is subsampled, the directional decomposition of the highpass components (lines and edges) is used as a perceptual basis.

The detection of lines and edges is very important for the performance of the coding method because the visual system is particularly sensitive to contour elements. According to estimation theory, the linear detection of lines and edges in each frame should be accomplished by sets of filters matched to oriented spatial profiles. The filtering should be performed in the spatial domain, where contour elements are separable and the spatial distortion due to the amplitude and direction quantizations can be directly evaluated. In a practical implementation, only one set of matched filters is actually needed, because a line can be regarded as the derivative of an edge along the transversal direction. The same block configuration can be employed to extract the line and edge endpoints, detected as the maxima of the derivative along the direction of each considered edge or line. Moreover, non-linear (median) directional filtering is used to enhance the directional structures before the extraction algorithms are applied.
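A minimal oriented matched-filter bank (a generic Sobel-style sketch, not the actual filters of the thesis) shows the selection rule: the orientation whose filter responds most strongly classifies the local contour element.

```python
import numpy as np

# A tiny bank of 3x3 filters matched to oriented step-edge profiles
# (0, 45, 90, 135 degrees); the maximum response selects the direction.
k0 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)   # vertical edge
k45 = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], float)
k90 = k0.T                                                    # horizontal edge
k135 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], float)
bank = {0: k0, 45: k45, 90: k90, 135: k135}

def local_orientation(patch):
    """Return the bank orientation with the strongest matched response."""
    scores = {a: abs((k * patch).sum()) for a, k in bank.items()}
    return max(scores, key=scores.get)

# A vertical step edge: dark left half, bright right column.
patch = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1]], float)
print(local_orientation(patch))   # -> 0
```

In the same spirit, a line detector can be obtained as the transversal derivative of the corresponding edge filter, which is why the abstract notes that a single set of matched filters suffices.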

The coding is based on near-optimal estimators which retain only the innovation part of the information, and is well suited for differential pulse code modulation. Simple time zero-order predictors are usually a very good approximation of the optimal estimators for all the parameters to be transmitted in slowly varying image sequences, such as scenes consisting of objects and/or people moving on a still background without camera motion. The whole method has been applied to standard one-second image sequences with the above characteristics. The estimated compression ratio is approximately 320:1 (0.025 bits per pixel), allowing a transmission rate of about 41 kbit/s. The resulting image quality is reasonably good.
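The zero-order-predictor idea can be sketched on synthetic data (a hypothetical 32 x 32 toy sequence, not the thesis test material): with a still background, predicting "no change" leaves a residual that is nonzero only where motion occurs, which is what makes very low bit rates plausible for such scenes.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy sequence: a still background with a small moving object -- the case
# where a time zero-order predictor (predict "no change") works best.
N, H, W = 5, 32, 32
bg = rng.integers(0, 256, (H, W)).astype(float)
frames = np.repeat(bg[None], N, axis=0)
for k in range(N):
    frames[k, 10:14, 2 + k:6 + k] = 255.0     # 4x4 object sliding right

# DPCM with a zero-order temporal predictor: send frame 0, then residuals.
residuals = np.diff(frames, axis=0)
active = np.count_nonzero(residuals) / residuals.size
print(active)   # only a small fraction of samples carries innovation

# The quoted figures are mutually consistent: an 8 bit/pixel source coded
# at 0.025 bit/pixel gives 8 / 0.025 = 320:1.
print(8 / 0.025)   # -> 320.0
```

Under these conditions, well under 2% of the residual samples are nonzero, so almost all of the bit budget can go to the innovation part of the information.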

Signal Processing