Insertion of impairments in test video sequences for quality assessment based on psychovisual characteristics

DESCRIPTION
Presentation of published work at the AIMS 2014 congress in Madrid, session 2.A: Image, Speech and Signal Processing. Tuesday, 18 November 2014, 15:30.

TRANSCRIPT
Insertion of Impairments in Test Video
Sequences for Quality Assessment
Based on Psychovisual Characteristics
Juan Pedro López Velasco, Juan Antonio Rodrigo,
David Jiménez and José Manuel Menéndez
Universidad Politécnica de Madrid
Madrid, 18th November 2014
Index
• Introduction: problem description
• Artificially impaired video sequence generation:
  – Impairment and artifact insertion process
  – Creation of masks based on ROIs
• Results and examples of mask application
• Example of future work for a psychovisual model
• Conclusions
Problem description (I)
• Assessing video quality is still a complex task.
• Video Quality Assessment needs to correspond
to human perception.
• Visual attention is focused on concrete areas of
an image as demonstrated with fixation maps.
[Figure: original image, fixation map, and image with visual attention weights]
• Most pixel-based metrics do not show sufficient correlation between objective and subjective results; algorithms need to match human perception when analyzing the quality of a video sequence.
• For example, these four frames have the same MSE.
Problem description (II)
[Four frames with identical MSE: high blocking, high blurring (defocus), salt-and-pepper noise, and a JPEG encoding artifact]
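The claim that very different distortions can share one MSE value is easy to reproduce. A minimal numpy sketch with synthetic frames and made-up values: the same error energy placed in a corner or in the center yields an identical MSE, even though observers judge the two cases very differently.

```python
import numpy as np

# Reference frame and two distorted versions: identical error energy,
# different locations (corner vs. center).
ref = np.zeros((12, 12))
corner = ref.copy()
corner[:4, :4] = 10.0            # 4x4 error patch in a corner
center = ref.copy()
center[4:8, 4:8] = 10.0          # same-size patch in the center

def mse(a, b):
    """Plain pixel-based mean squared error."""
    return float(np.mean((a - b) ** 2))

# Both distortions produce exactly the same MSE.
print(mse(ref, corner), mse(ref, center))
```

This is the blind spot of pixel-based metrics the presentation addresses: MSE carries no information about *where* the error sits.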
Problem description (III)
• Video quality metrics should correlate with visual attention and with psychovisual models adapted to specific artifacts and how they are visualized.
Problem description (IV)
• But…
  – How do we evaluate specific artifacts and the effects of hiding or highlighting them?
• Answer:
  – With databases created to analyze these specific areas, adapted to specific artifacts.
  – Subjective assessment reveals the relative importance of each artifact/ROI combination.
Problem description (V)
Scheme of artificially impaired video sequence generation:
• The original video sequence is passed through a distortion stage to obtain an impaired video sequence.
• The original and impaired sequences are then combined through a feature mask, or its inverse, to produce the artificially impaired sequence.
• Two sequences are generated for each distortion: one case and the opposite one, as in the next example.
Example of artificially impaired sequences
• Impaired area (with a blocking artifact) located in the human-face ROI.
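The generation scheme boils down to a per-pixel blend: the distorted frame replaces the original wherever the feature mask is set, and the inverse mask produces the opposite case. A minimal numpy sketch (the function name and toy values are mine, not from the paper):

```python
import numpy as np

def composite(original, distorted, mask):
    """Take distorted pixels where mask == 1, original pixels elsewhere."""
    mask = mask.astype(bool)
    out = original.copy()
    out[mask] = distorted[mask]
    return out

# Tiny synthetic example: the distortion only appears inside the masked ROI.
original = np.full((4, 4), 100, dtype=np.uint8)
distorted = np.full((4, 4), 50, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1                                   # feature mask (e.g. a face ROI)

impaired = composite(original, distorted, mask)          # one case
impaired_inv = composite(original, distorted, 1 - mask)  # the opposite case
```

Running the same blend with `mask` and with `1 - mask` yields the two sequences per distortion that the scheme calls for.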
Impairment and artifact insertion process
• The original video sequence is passed through a feature/artifact distortion stage to obtain the impaired video sequence.
• Distortions applied: blocking, blurring and ringing.
Artifact simulation
• Blocking: simulated with an 8×8 mosaic filter.
• Blurring: simulated with a Gaussian low-pass filter.
• Ringing: simulated with a JPEG codification filter.
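The first two simulators can be sketched in plain numpy: the mosaic filter replaces each 8×8 tile by its mean, and the Gaussian low-pass filter is a separable convolution. (Ringing needs an actual DCT codec pass, so it is omitted here; kernel size and sigma are my assumptions.)

```python
import numpy as np

def mosaic(frame, block=8):
    """Blocking artifact: replace each block x block tile by its mean."""
    out = frame.astype(np.float64).copy()
    h, w = out.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = out[y:y + block, x:x + block].mean()
    return out

def gaussian_blur(frame, sigma=1.5):
    """Blurring artifact: separable Gaussian low-pass filter."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    # Convolve rows, then columns, with the 1-D kernel.
    rows = np.apply_along_axis(np.convolve, 1, frame.astype(np.float64), k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")
```

Note that `np.convolve(..., mode="same")` zero-pads, so frame borders darken slightly; a production simulator would use edge replication instead.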
Creation of masks based on ROIs (I)
• Types of regions of interest for masks: motion, spatial detail, faces, position and color.
• Feature detection on the original video sequence produces a feature mask and an inverse feature mask.
Motion mask
• For motion detection, the temporal information in consecutive frames is analyzed: a pixel enters the mask when its value changes between consecutive frames:

  Pix(x, y) ∈ Mask_i   if   F_i(x, y) − F_{i−1}(x, y) ≠ 0

[Figure: original frame and the motion mask based on temporal information]
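The frame-difference rule translates directly to numpy; a small threshold in place of the strict ≠ 0 test (my addition, to tolerate sensor noise) is the usual practical tweak:

```python
import numpy as np

def motion_mask(prev_frame, cur_frame, thresh=0):
    """Pixel (x, y) joins the mask when F_i(x, y) - F_{i-1}(x, y)
    differs from zero by more than `thresh`."""
    diff = cur_frame.astype(np.int16) - prev_frame.astype(np.int16)
    return (np.abs(diff) > thresh).astype(np.uint8)

# Synthetic pair of frames with a single changing pixel.
prev = np.zeros((4, 4), dtype=np.uint8)
cur = prev.copy()
cur[1, 1] = 200
mask = motion_mask(prev, cur)
```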
Spatial detail mask
• Textures, edges and objects in motion can hide or highlight certain impairments, such as blocking or blurring artifacts.
• The Canny algorithm is used to create binary masks that separate homogeneous areas from high-frequency areas.

[Figure: original frame and the spatial detail mask based on the Canny algorithm]
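The paper uses Canny; as a self-contained stand-in (my simplification), a plain gradient-magnitude threshold with central differences already separates homogeneous from high-frequency areas:

```python
import numpy as np

def spatial_detail_mask(img, thresh=20.0):
    """Binary mask: 1 where the local gradient magnitude exceeds thresh."""
    f = img.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]   # horizontal central difference
    gy[1:-1, :] = f[2:, :] - f[:-2, :]   # vertical central difference
    return (np.hypot(gx, gy) > thresh).astype(np.uint8)

# A vertical step edge: only pixels next to the edge enter the mask.
img = np.zeros((8, 8))
img[:, 4:] = 255
mask = spatial_detail_mask(img)
```

Canny adds Gaussian smoothing, non-maximum suppression and hysteresis on top of this, producing thinner and cleaner edges.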
Pixel position masks (I)
• The image is divided into nine sections, as indicated in the research by Nojiri et al.

[Figure: Nojiri's section distribution]

• The objective is to analyze the influence of pixel position depending on the area in which the pixel is located.
• Three types of masks are created, depending on whether pixels are located in a corner, lateral or central area.
Pixel position masks (II)
[Figure: corner mask, lateral mask and central mask]
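Nojiri's nine-section layout reduces to a 3×3 grid whose cells are grouped into the three masks; a numpy sketch (function name is mine):

```python
import numpy as np

def position_masks(h, w):
    """Split an h x w frame into a 3x3 grid and group the cells into
    corner (4 cells), lateral (4 cells) and central (1 cell) masks."""
    ys = [0, h // 3, 2 * h // 3, h]
    xs = [0, w // 3, 2 * w // 3, w]
    corner = np.zeros((h, w), dtype=np.uint8)
    lateral = np.zeros((h, w), dtype=np.uint8)
    central = np.zeros((h, w), dtype=np.uint8)
    for i in range(3):
        for j in range(3):
            cell = (slice(ys[i], ys[i + 1]), slice(xs[j], xs[j + 1]))
            if i == 1 and j == 1:
                central[cell] = 1      # middle cell
            elif i == 1 or j == 1:
                lateral[cell] = 1      # edge-adjacent cells
            else:
                corner[cell] = 1       # the four corner cells
    return corner, lateral, central

corner, lateral, central = position_masks(9, 9)
```

The three masks partition the frame, so every pixel belongs to exactly one of them.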
Facial mask
• The Haar feature-based cascade classifier included in OpenCV (a boosted cascade of simple features) is used for face detection.

[Figure: face detection and the resulting face mask]
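The detector itself is OpenCV's Haar cascade (`cv2.CascadeClassifier.detectMultiScale`, which returns (x, y, w, h) boxes); turning those boxes into a binary mask is then a few lines of numpy. The example box below is made up:

```python
import numpy as np

def face_mask(shape, boxes):
    """Binary mask from (x, y, w, h) face boxes, e.g. the output of
    cv2.CascadeClassifier(...).detectMultiScale(gray_frame)."""
    mask = np.zeros(shape, dtype=np.uint8)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = 1
    return mask

# One hypothetical detection on a 10x10 frame.
mask = face_mask((10, 10), [(2, 3, 4, 4)])
```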
Color masks
• Ranges of colors should be analyzed to determine the weight of this factor in visual algorithms.
• The mask contains the pixels corresponding to the chosen color, plus those whose similarity to it falls within a threshold.
• Three color ranges define masks: red, blue and green.

[Figure: original frame and the shades-of-red mask]
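A "shades of red" mask can be sketched as a distance threshold around a reference color; the paper does not specify its similarity measure, so the Euclidean distance in RGB, the reference color and the threshold here are all assumptions:

```python
import numpy as np

def color_mask(frame_rgb, ref_color, thresh=60.0):
    """1 where a pixel lies within Euclidean distance thresh of ref_color."""
    diff = frame_rgb.astype(np.float64) - np.asarray(ref_color, dtype=np.float64)
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= thresh).astype(np.uint8)

# Toy 2x2 RGB frame: one reddish pixel, one blue pixel, two black pixels.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = (230, 20, 20)   # a shade of red
frame[0, 1] = (20, 20, 230)   # blue
mask = color_mask(frame, ref_color=(255, 0, 0))
```

A perceptually better variant would threshold hue in HSV space instead of RGB distance, which is often done in practice.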
Results
• Results based on subjective tests are analyzed to demonstrate the validity of the test sequences.
• Test sequences: “News Report” (faces), “Barrier” (motion) and “Crowd” (pixel position).
Examples for different effects
• Three FR metrics (PSNR, Blur and MSE) are analyzed in parallel with the MOS result, rated from 5 (excellent) to 1 (poor).
• The following are examples where FR metrics obtain poor correlation with the subjective results.
| Sequence | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Faces ROI, D. | Faces ROI, Inv. |
|---|---|---|---|---|---|
| News Report | PSNR | 47.93 | 37.58 | 46.82 | 34.52 |
| | Blur | 0.44 | 3.63 | 0.38 | 5.17 |
| | MSE | 0.67 | 1.93 | 0.10 | 2.30 |
| | MOS | 4.81 | 1.54 | 1.33 | 3.78 |
| Sequence | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Motion ROI, D. | Motion ROI, Inv. |
|---|---|---|---|---|---|
| Barrier | PSNR | 49.82 | 33.19 | 39.85 | 34.24 |
| | Blur | 0.27 | 8.36 | 1.97 | 6.24 |
| | MSE | 0.51 | 3.34 | 0.359 | 2.98 |
| | MOS | 4.77 | 1.33 | 3.11 | 3.89 |
| Seq. | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Center D. | Center Inv. | Lateral D. | Lateral Inv. | Corner D. | Corner Inv. |
|---|---|---|---|---|---|---|---|---|---|
| Crowd | PSNR | 34.33 | 25.34 | 30.74 | 26.82 | 33.87 | 26.00 | 35.95 | 25.88 |
| | Blur | 3.44 | 22.55 | 6.27 | 15.33 | 2.60 | 19.44 | 0.95 | 22.47 |
| | MSE | 3.55 | 8.76 | 2.30 | 6.21 | 1.21 | 7.30 | 0.64 | 7.87 |
| | MOS | 4.68 | 1.22 | 1.44 | 2.44 | 3.78 | 1.33 | 4.11 | 1.22 |
Example 1: Faces
• When the distortion is located in the areas corresponding to human faces, the subjective MOS values are lower (1.33) than when it is located in the rest of the picture and the faces appear sharp (3.78). This effect is completely opposite to the behavior of PSNR (46.82 vs. 34.52) or MSE (0.10 vs. 2.30).
| Sequence | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Faces ROI, D. | Faces ROI, Inv. |
|---|---|---|---|---|---|
| News Report | PSNR | 47.93 | 37.58 | 46.82 | 34.52 |
| | Blur | 0.44 | 3.63 | 0.38 | 5.17 |
| | MSE | 0.67 | 1.93 | 0.10 | 2.30 |
| | MOS | 4.81 | 1.54 | 1.33 | 3.78 |
Example 2: Motion
• A similar situation occurs when analyzing motion in the “Barrier” sequence.
| Sequence | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Motion ROI, D. | Motion ROI, Inv. |
|---|---|---|---|---|---|
| Barrier | PSNR | 49.82 | 33.19 | 39.85 | 34.24 |
| | Blur | 0.27 | 8.36 | 1.97 | 6.24 |
| | MSE | 0.51 | 3.34 | 0.359 | 2.98 |
| | MOS | 4.77 | 1.33 | 3.11 | 3.89 |
Example 3: Pixel position
• Distortions located in a corner, lateral or central area are compared in the sequence “Crowd”.
• For observers, even a high distortion located in a corner is insignificant; when the impairment is located in the central area, however, opinion scores decrease to 1.44.
• PSNR and MSE reflect distortion in proportion to the size of the impaired area, while the impact on the human eye depends on the position of that area.
| Seq. | FR Metric | H.264 75 Mbps | H.264 500 Kbps | Center D. | Center Inv. | Lateral D. | Lateral Inv. | Corner D. | Corner Inv. |
|---|---|---|---|---|---|---|---|---|---|
| Crowd | PSNR | 34.33 | 25.34 | 30.74 | 26.82 | 33.87 | 26.00 | 35.95 | 25.88 |
| | Blur | 3.44 | 22.55 | 6.27 | 15.33 | 2.60 | 19.44 | 0.95 | 22.47 |
| | MSE | 3.55 | 8.76 | 2.30 | 6.21 | 1.21 | 7.30 | 0.64 | 7.87 |
| | MOS | 4.68 | 1.22 | 1.44 | 2.44 | 3.78 | 1.33 | 4.11 | 1.22 |
Example of future work for
psychovisual model (I)
[Figure: original frame from the sequence “News Report”]
Example of future work for
psychovisual model (II)
[Figure: motion mask, spatial detail mask, pixel position mask and face mask]
Example of future work for
psychovisual model (III)
[Figure: psychovisual model, a combination of the four masks]
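One way to read "combination of the four masks" is a per-pixel weighted sum that turns the binary masks into a perceptual weighting map. The weights below are placeholders to be fitted from the subjective tests, not values from the paper:

```python
import numpy as np

def psychovisual_map(masks, weights):
    """Weighted per-pixel combination of binary masks, normalized to [0, 1]."""
    acc = np.zeros(masks[0].shape, dtype=np.float64)
    for mask, w in zip(masks, weights):
        acc += w * mask.astype(np.float64)
    return acc / sum(weights)

# Toy 6x6 masks standing in for the four ROI detectors.
h, w = 6, 6
motion = np.zeros((h, w), dtype=np.uint8);   motion[:3, :] = 1
detail = np.zeros((h, w), dtype=np.uint8);   detail[:, :3] = 1
faces = np.zeros((h, w), dtype=np.uint8);    faces[2:4, 2:4] = 1
position = np.zeros((h, w), dtype=np.uint8); position[2:4, 2:4] = 1

weights = [1.0, 1.0, 2.0, 1.0]   # hypothetical relative importances
pmap = psychovisual_map([motion, detail, faces, position], weights)
```

A metric would then weight each pixel's error by `pmap` instead of treating all pixels equally, which is exactly the direction the conclusions point to.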
Conclusions
• Algorithms are not adapted to the subjective response of the human eye.
• Subjective tests revealed the importance of certain specific regions.
• Psychovisual models adapted to visual attention obtain better correlation when weighting pixels than when treating them all equally.
• The versatility of the process makes it possible to analyze new artifacts beyond the ones included in the paper.
Thank you for your attention!