TRANSCRIPT
Tutorial for APSIPA 2013
Perceptual Quality Evaluation for Image and Video: from Modules to Systems
Weisi Lin
Email: [email protected]
School of Computer Engineering
Nanyang Technological University, Singapore
Question 1: Why are pictures important in our life & work?
• Physiology/Psychology: ~50% of the cerebral cortex is for vision; vision is the major channel for us to experience the world
• Visual representations: the most efficient way to represent information, even when people speak different languages
• Increasing availability: digital cameras, anytime, everywhere, anyhow…
• Large interest & commercial value: movies, television, Internet/social/mobile media, gaming, content search, advertisements, surveillance, politics, scientific research, medical applications, military, …
Question 2: Why is picture appreciation by machines important?
• Quality assurance (as a standalone module)
– product inspection
– test equipment
– on-line, in-service monitoring
– visual/multimedia algorithm/system benchmarking
• Technology development & optimization (embedded in a system)
– VCD/DVD/HDTV/3DTV, multimedia, mulsemedia (multiple sensorial media)
– computer graphics/animation
– computational photography
– visual/multimedia transmission
– …
Traditional Visual Signal Quality Measures (still widely used now)
• MSE (Mean Square Error)
• SNR (Signal-to-Noise Ratio)
• PSNR (Peak SNR)
• QoS (Quality of Service)
• …
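These traditional measures are simple to compute; a minimal NumPy sketch (the function names are ours, not from the talk):

```python
import numpy as np

def mse(ref: np.ndarray, test: np.ndarray) -> float:
    """Mean square error between two equally sized grayscale images."""
    diff = ref.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; 'peak' is the maximum pixel value A."""
    m = mse(ref, test)
    if m == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / m)
```

A uniform offset of 10 gray levels gives MSE = 100 regardless of content, which is exactly the kind of blindness the next slides illustrate.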
Problems with the existing metrics
[Figure: (a) original image; (b) Gaussian noise (MSE = 324); (c) brightened; (d) JPEG compressed]
Image Quality Assessment (more examples)
All images have nearly the same MSE
[Figure: three design paradigms for visual systems:
– device-centric: power, memory, display, …
– network-centric: bit rate, error rate, packet loss, delay, …
– perception-centric: effective, efficient, useful, enjoyable, natural, …]
Gap in most current systems:
– Target: human consumption, appreciation & interaction
– Technical design: non-perceptual criteria
Human-centric visual evaluation
• the majority of visual content we handle is for human consumption
• human perception is effective and efficient, so machines that emulate its functioning have technical advantages
• an increasing need: harmonious human-machine interaction
A part of a bigger “picture”
Multimedia & Mulsemedia (multiple sensorial media):
– Audition (hearing)
– Vision (seeing)
– Touch (taction)
– Olfaction (smelling)
– Gustation (tasting)
Special Issue on “Multiple Sensorial (MulSeMedia) Multi-modal Media: Advances and Applications”, Transactions on Multimedia Computing, Communications and Applications
Paper Submission: 18/11/2013
Guest Editors:
George Ghinea (Brunel University, UK)
Stephen Gulliver (Univ. of Reading, UK)
Christian Timmerer (Alpen-Adria-Universität, Klagenfurt, Austria)
Weisi Lin (Nanyang Technological University, Singapore)
Possibilities of Perceptual Evaluation
• Subjective viewing tests
– ITU-R BT.500 standard
– MOS (mean opinion score)
– Shortcomings:
• expensive, time consuming
• not suitable for automatic, in-loop/service, on-line real-time processing, e.g., encoding, transmission, relaying, etc.
• not always reliable, depending on viewers' physical conditions, emotional states, personal experience, display context
Solution: an objective (by machine) measure to emulate MOS!
[Figure: machine-based picture processing tasks ordered by difficulty:
– Artificial vision: which is where, and what is done and how?
– Representation: compression (“zipping”) & restoration
– Pixel manipulations: cropping, editing (addition/subtraction/size change), object boundary detection, edge enhancement, histogram equalization, …]
Picture quality evaluation
• 1st-party evaluation
– by the photographer or image maker
• 2nd-party evaluation
– by the subject of an image
• 3rd-party evaluation
– by neither the photographer nor the subject
– the general and most meaningful situation
[Figure: from modules to systems]
Outline of the rest of this talk
1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion
Which square is brighter, A or B?
Adelson's “Checker-shadow illusion”
http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
Color Contrast: Background color can affect visual perception [http://www.psy.ritsumei.ac.jp/~akitaoka/shikisai2005.html]
It appears that a = d or b = c in color, but actually b = d!
Shepard's paradox (for sound): the finishing pitch is the same as the starting pitch (synthesized by Jean-Claude Risset)
[Figure: spectrum of Shepard's ascending paradox; the same spectrum looped 5 times]
Our ear cannot perceive where the sample starts and finishes: the pitch appears to keep increasing even though the sample loops back to the beginning and starts over; it is impossible to tell where the sample begins and ends.
Useful HVS Properties
o Sensitivity to structural changes
o Masking effects
o Saturation effect
o Role of visual attention
o Worst-case effect
Useful HVS Properties (cont.)
• Sensitivity to structural changes: features like edges and contours play a key role in visual quality assessment
[Figure: original image, noisy image, blurred image; lower visual quality due to edge damage]
Useful HVS Properties (cont.)
• Masking effects: the visual impact of the same distortion can be different depending on signal content
[Figure: (a), (b); the same distortion is more annoying to the eye in one image than in the other, due to texture masking]
Useful HVS Properties (cont.)
• Texture masking: the effect of distortion is reduced by texture
[Figure: (a) original image; (b) image with lowest distortion; (c), (d) image with highest distortion]
Useful HVS Properties (cont.)
• Saturation effect: sensitivity to perceived distortion decreases at high distortion levels
[Figure: (a) original image; (b) blurred image; (c) higher level of blurring than in (b); the perceived distortion level in (b) and (c) is nevertheless nearly the same]
Useful HVS Properties (cont.)
• Role of visual attention: distortion in regions attracting human attention is more annoying than in non-attentional ones
[Figure: (a) original image; (b) attentional region (face) distorted; (c) non-attentional region distorted]
Observe that the distortion in image (b) is more annoying than in image (c), because the face is an attentional region compared with the table.
Useful HVS Properties (cont.)
• Worst-case effect: human eyes tend to focus more on the distorted portions (in images and videos)
• Explains the quality fluctuation effect: bad frames have a higher impact on perceived quality
• An intuitive example: a small dot (or scratch) on a mirror or windscreen attracts more eye attention, although the area covered by the dot (scratch) could be a thousand times smaller than the whole object (mirror, windscreen)
• Human eyes “penalize” more for distorted or bad-quality regions/frames/areas.
Masking in stereo viewing (Wu, et al. '13)
[Figure: left view and right view; coding artifacts in the right view are completely masked in 3-D viewing, but there is severe loss of depth perception.
H.264/AVC (High Profile): right view at 10 Mbps, 2 Mbps and 512 kbps, with a cropped portion of each shown.]
Perceptual Characteristics & Phenomena Gallery (for visual and audio aspects):
http://fyp-demo-gallery.appspot.com/index.html
Outline of the rest of this talk
1. Relevant Human Visual Perception
2. Basic Computational Modules
• Signal decomposition
• Just-noticeable difference (JND)
• Visual attention (VA)
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion
Temporal Decomposition
• Physiological evidence
– two main visual pathways
– visual cortex
• Signal decomposition, implemented as FIR/IIR filters
• sustained (low-pass) channel
• transient (band-pass) channel
Spatial Decomposition
– Filters: Gabor, cortex, wavelets, Gaussian/steerable pyramids
– Stimuli: orientations, frequencies
(Simoncelli et al. '92)
Just-noticeable Difference (JND)
• JND: the visibility threshold below which changes cannot be detected by the typical HVS
[Figure: noise injection; original image; white-noise injected (29.00 dB); noise shaping guided by JND (29.05 dB)] (Wu, et al. '13)
Factors for JND: visual sensitivity varies with spatial frequency (the spatial Contrast Sensitivity Function, CSF)
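The talk does not give the CSF formula; one oft-used analytic approximation is Mannos & Sakrison's fit, sketched here for illustration (the constants are from that model, not from this talk):

```python
import numpy as np

def csf_mannos_sakrison(f_cpd: np.ndarray) -> np.ndarray:
    """Contrast sensitivity vs. spatial frequency f (cycles/degree).

    Mannos & Sakrison's analytic fit: a band-pass curve peaking at
    mid frequencies (around 8 cpd), falling off for very low and
    very high frequencies.
    """
    f = np.asarray(f_cpd, dtype=np.float64)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-np.power(0.114 * f, 1.1))

f = np.array([1.0, 8.0, 30.0])   # low, mid, high spatial frequencies
s = csf_mannos_sakrison(f)
# band-pass behavior: mid frequencies are seen best
```

In JND models, such a curve sets the base visibility threshold per subband before masking adjustments.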
Factors for JND: Contrast Masking
[Figure: (a) a weak visual stimulus alone: can be seen; (b) a masking signal; (c) combining (a) and (b): the stimulus cannot be seen; (d) combining (a) and (b) with increased contrast in (a): the stimulus can be seen.] (Wu, et al. '13)
Fusion of different factors
• Multiplication (block k, subband b, frame j)
• Exponentiation: not very often used
• Addition (pixel location n)
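The additive option can be sketched as follows. This follows the style of the nonlinear additivity model for masking (NAMM) of Yang et al.: plain addition over-counts where two masking effects overlap, so a fraction of the smaller factor is deducted. The gain constant here is illustrative, not taken from the talk:

```python
import numpy as np

def fuse_jnd_additive(t_lum: np.ndarray, t_tex: np.ndarray,
                      overlap_gain: float = 0.3) -> np.ndarray:
    """Additive fusion of two JND factors at each pixel location n.

    t_lum: luminance-adaptation threshold map
    t_tex: texture-masking threshold map
    A fraction of the smaller factor is subtracted to compensate for
    the overlap between the two masking effects (NAMM-style).
    """
    return t_lum + t_tex - overlap_gain * np.minimum(t_lum, t_tex)
```

With `overlap_gain = 0`, this degenerates to plain addition; with a larger gain, the fused threshold approaches the stronger single factor.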
The key in accurate JND determination
• To distinguish smooth and edge regions (Yang, et al. '05)
• To distinguish texture and edge (Liu, et al. '10)
• To further distinguish texture into ordered (structural) and disordered regions (Wu, et al. '13)
Visual Attention (VA)
• Selectivity
– selective awareness of the sensory environment
– selective responsiveness to visual stimuli
• Two types
– bottom-up: external stimuli
– top-down: task/experience related
Auto-generation of the VA map
[Figure: motion, face-eye, skin color, contrast and texture cues fused into a VA map] (Lu, et al. '05, IEEE T-IP)
Itti's Bottom-up Visual Attention (VA) Model
Improved framework for video VA determination (Fang, et al. '13)
Adaptive uncertainty evaluation: decide which (spatial or temporal) saliency contributes more to the final saliency
Alternative approach to detect bottom-up VA (Hou & Zhang '07): for an image I(x, y), the VA (saliency) map is computed from the Fourier transform of the image.
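The slide's equations did not survive extraction; the published spectral residual method can be sketched as follows (our NumPy-only implementation, with a simple box filter standing in for the local averaging and smoothing steps):

```python
import numpy as np

def _box3(a: np.ndarray) -> np.ndarray:
    """3x3 box average with circular boundaries (adequate for spectra)."""
    return sum(np.roll(np.roll(a, dx, 0), dy, 1)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)) / 9.0

def spectral_residual_saliency(img: np.ndarray) -> np.ndarray:
    """Bottom-up saliency map via the spectral residual (Hou & Zhang '07).

    The log-amplitude spectrum minus its local average keeps only the
    'unexpected' spectral content; transforming back with the original
    phase highlights salient regions.
    """
    f = np.fft.fft2(img.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-12)
    phase = np.angle(f)
    residual = log_amp - _box3(log_amp)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return _box3(sal)  # mild smoothing of the saliency map
```

The appeal of this approach is that it needs no trained features or tuned channel weights, only two FFTs.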
More on VA modeling…
• Influence from audio/speech
• Integration of “aural attention”
– Ma, et al. '05
– You, et al. '07
• VA detection model in the compressed domain (Fang, et al. '13)
• Data-driven approaches (emerging)
Verification with eye tracking
[Figure: various eye trackers]
Outline of the rest of this talk
1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion
Visual Quality Gauge: where a traditional metric fails

MAE = (1 / (XY)) Σ_{x=1}^{X} Σ_{y=1}^{Y} |d(x, y)|

MSE = (1 / (XY)) Σ_{x=1}^{X} Σ_{y=1}^{Y} d²(x, y)

PSNR = 10 lg (A² / MSE)

where d(x, y) is the pixel difference between the two images and A is the peak signal value.

Major reasons for failure:
(1) Not every change in an image is noticeable;
(2) Not every pixel/region in an image receives the same attention level;
(3) Not every change leads to distortion (otherwise, many edge-sharpening and post-processing algorithms would not have been developed);
(4) Not every change yields the same extent of perceptual effect for the same magnitude of change (due to spatial/temporal/chrominance masking).
Classification for PVQMs
According to methodology:
– Vision-based metrics
– Signal-driven metrics (more often used now)
– Learning-based metrics (emerging)
According to reference requirement (3 possibilities: FR, RR and NR):
– Full-reference (FR) metrics
– Reduced-reference (RR) metrics
– No-reference (NR) metrics
[Diagram: the PVQM takes the distorted image (plus the reference image, where required) and outputs a quality score]
Objective Image Quality Assessment
• Vision-based models: the early approach; tries to model the HVS; generally based on data from psychophysical experiments
• Signal-driven models: based on extraction and analysis of image features; the focus is on image content and distortion analysis rather than fundamental vision modeling
• Learning-based models: emerging; new metric development; fusion of multiple existing metrics
Common Operations: Feature Extraction
Raw image data (less organized; hidden underlying structure)
→ analysis / feature extraction (may use domain-specific prior knowledge), with mathematical/engineering tools (Fourier transform, wavelets, KPCA, …)
→ transformed/processed data (more organized; easier interpretation; reduced dimensions)
Vision-based Models
• attractive in principle: incorporate relevant HVS properties pertaining to visual quality, e.g., the masking effects discussed earlier
• major limitations:
– limited understanding of the HVS and its intricate mechanisms
– metrics can be complex and computationally expensive
Signal-driven Models (FR, RR or NR)
[Diagram: feature extraction from the distorted image (and the reference image, where required), followed by feature pooling (cognitive mapping) to produce the quality score]
• may also incorporate appropriate HVS properties, like JND, VA, various masking effects, and so on
• recently, more research effort has gone into signal-driven models
Widely-acknowledged visual metric: SSIM (Structural SIMilarity)
For any two image blocks x and y, SSIM combines:
– luminance similarity
– contrast similarity (e.g., blurring)
– structural similarity (edge loss or false edges)
Evaluated for overlapped or non-overlapped blocks
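The slide's formula was lost in extraction; the standard single-window form of SSIM for a block pair, with the usual K1 = 0.01, K2 = 0.03 stabilizing constants, can be sketched as:

```python
import numpy as np

def ssim_block(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    """SSIM for a pair of image blocks x, y (single-window form).

    Combines the luminance term (2*mx*my + C1)/(mx^2 + my^2 + C1) with
    the merged contrast/structure term (2*cov + C2)/(vx + vy + C2).
    """
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

In practice the block-wise scores are computed over a sliding window and averaged (or otherwise pooled) into one quality index.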
Noticeable Contrast Changes

c(x, y) = 0, if |I(x, y) − Ī(x, y)| ≤ jnd(x, y)
c(x, y) = (I(x, y) − Ī(x, y)) / jnd(x, y), otherwise

where Ī(x, y) is calculated in an image neighborhood.
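This piecewise definition maps directly to code; a minimal sketch (the function name is ours):

```python
import numpy as np

def noticeable_contrast_change(I: np.ndarray, I_mean: np.ndarray,
                               jnd: np.ndarray) -> np.ndarray:
    """c(x, y): zero where the contrast change stays within the JND,
    otherwise the change expressed in JND units.

    I      : image intensities
    I_mean : local neighborhood means, the Ī(x, y) above
    jnd    : per-pixel JND thresholds
    """
    delta = I.astype(np.float64) - I_mean
    c = delta / jnd
    c[np.abs(delta) <= jnd] = 0.0  # sub-threshold changes are invisible
    return c
```

Expressing changes in JND units makes them comparable across regions with very different masking levels.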
A Visual Quality Metric (for video with both distortion & enhancement)
• Discrimination of c(x, y):
– c⁺_ne: c increase at non-edge pixels: degradation
– c⁻_ne: c decrease at non-edge pixels: degradation
– c⁺_e: c increase at edges: enhancement
– c⁻_e: c decrease at edge contrast: the worst degradation

D = α1 c⁺_ne + α2 c⁻_ne − α3 c⁺_e + α4 c⁻_e, with α4 > max(α1, α2) > α3 > 0 (Lin, et al. '05)

• D reduces to the mean absolute error (MAE) measure if
– JND is constant, and
– different contrast changes are not differentiated
An emerging class of metrics: machine learning-based approaches
To tackle problems of feature pooling in the spatial or spatiotemporal domain.
• Currently-employed techniques:
– simple summation
– Minkowski combination; linear (i.e., weighted) combination
– visual attention based weightings
• Problem: these impose constraints on the relationship between features and the quality score
– a simple summation or averaging of features implicitly constrains the relationship to be linear
– Minkowski summation for spatial pooling of the features/errors implicitly assumes that errors at different locations are statistically independent
Machine learning for feature pooling
• use of machine learning is general, more systematic and reasonable
• more databases are available for training
• effective feature extraction: still a key
• Support Vector Regression: encouraging results
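The talk reports Support Vector Regression; as a dependency-free stand-in that illustrates the same idea of *learning* a nonlinear mapping from a per-image feature vector to a MOS (instead of a hand-picked pooling rule), here is a kernel ridge regressor in plain NumPy (class and parameter names are ours):

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelRidgePooling:
    """Learned pooling: feature vector -> quality score (MOS).

    Stand-in for the SVR pooling mentioned in the talk; both avoid the
    linearity/independence assumptions of summation-style pooling.
    """
    def __init__(self, gamma: float = 1.0, lam: float = 1e-3):
        self.gamma, self.lam = gamma, lam

    def fit(self, X: np.ndarray, mos: np.ndarray) -> "KernelRidgePooling":
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        # ridge-regularized solve in the kernel space
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), mos)
        return self

    def predict(self, Xnew: np.ndarray) -> np.ndarray:
        return rbf_kernel(Xnew, self.X, self.gamma) @ self.alpha
```

Either regressor is only as good as the extracted features, which is why the slide stresses that effective feature extraction remains the key.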
Metric Benchmarking
• oft-used full-reference image quality metrics: SSIM, VIF, IFC, VSNR, FSIM, PSNR, …
• public image quality databases: LIVE, TID, A57, WIQ, CSIQ, IVC, Toyama, …
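Benchmarking on these databases boils down to correlating metric outputs with subjective scores, as in the Pearson/Spearman tables that follow. A minimal sketch (toy numbers; the rank step ignores ties for brevity):

```python
import numpy as np

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson linear correlation coefficient (CP)."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank-order correlation: Pearson on the ranks."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(np.float64)
    return pearson(rank(a), rank(b))

# metric scores vs. subjective MOS for a handful of hypothetical images
scores = np.array([0.92, 0.85, 0.60, 0.45, 0.30])
mos    = np.array([4.5, 4.1, 3.0, 2.2, 1.5])
```

Pearson rewards linear agreement with MOS (usually after a nonlinear fitting step), while Spearman only requires monotonic agreement.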
[Table: description of image quality databases. n0: number of original images; n: number of test images; R: image resolution (note: LIVE has many images of size 768x512, but also of other sizes like 480x720, 632x505, 634x505, 618x453 and 610x488); S: type of subjective quality score.]
Video quality databases
Comparison of Video Quality Databases

Database | Year | SRC (# of reference videos) | HRC (# of test conditions) | Total # of test videos | Subjective Testing Method | Subjective Score
VQEG FR-TV-I [23] | 2000 | 20 | 16 | 320 | DSCQS | DMOS (0 ~ 100)
IRCCyN/IVC 1080i [24] | 2008 | 24 | 7 | 192 | ACR | MOS (1 ~ 5)
IRCCyN/IVC SD RoI [25] | 2009 | 6 | 14 | 84 | ACR | MOS (1 ~ 5)
EPFL-PoliMI [26] | 2009 | 16 | 9 | 156 | ACR | MOS (0 ~ 5)
LIVE [27] | 2009 | 10 | 15 | 150 | ACR | DMOS (0 ~ 100)
LIVE Wireless [28] | 2009 | 10 | 16 | 160 | SSCQE | DMOS (0 ~ 100)
MMSP 3D Video [29] | 2010 | 6 | 5 | 30 | SSCQE | MOS (0 ~ 100)
MMSP SVD [30] | 2010 | 3 | 24 | 72 | PC | MOS (0 ~ 100)
VQEG HDTV [31] | 2010 | 45 | 15 | 675 | ACR | MOS (0 ~ 5), DMOS (1 ~ 5)

Also: retargeting databases (Ma, et al. '12); 3D video quality databases (Shao, et al. '12)
[Figures: Pearson and Spearman coefficients of FR signal-driven image quality metrics on the image databases] (Lin & Kuo '10)

Pearson coefficient for 5 distortion types in the TID image database (mean intensity shift, contrast change, image denoising, non-eccentricity pattern noise, and local block-wise distortions of different intensity):
• PSNR: not capable of predicting the quality for this sub-dataset at all (CP < 0.3)
• all PVQMs have lower CP here, although they do much better than PSNR
(Lin & Kuo '10)
Performance Comparison with Learning-based Metrics
[Figure: PSNR, SSIM, Ref [73], MSVD, VIF, IFC, VSNR, Qvector and Qfull compared across the LIVE, CSIQ, IVC, Toyama, A57, TID and WIQ databases.
(a) CP (Pearson correlation) comparison on different image databases; (b) RMSE (root MSE) for the CSIQ, IVC, A57 & TID databases; (c) RMSE for the LIVE and WIQ databases.] (Narwaria & Lin '10)
Outline of the rest of this talk
1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications (uses of modules or metrics)
5. Summary & Further Discussion
Use of JND: control of quantization in compression
q = 2 × JND → maximum error < JND (Hontsch & Karam '02; Zhang, et al. '05; Wu, et al. '06)
perceptually lossless (Wu, et al. '06): 1.370 bpp, 45.0303 dB
Rate-perceptual-distortion (RpD) optimization (Wu, et al. '06)
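The q = 2 × JND rule works because round-to-nearest uniform quantization bounds the reconstruction error by q/2, i.e., by the JND itself; a minimal sketch (function name is ours):

```python
import numpy as np

def jnd_quantize(coeff: np.ndarray, jnd: np.ndarray) -> np.ndarray:
    """Uniform quantization with per-coefficient step q = 2 * JND.

    Round-to-nearest keeps |reconstructed - original| <= q/2 = JND,
    i.e., at or below the visibility threshold, which is the basis
    of perceptually lossless coding.
    """
    q = 2.0 * jnd
    return np.round(coeff / q) * q
```

The larger the local JND (strong masking), the coarser the step and the fewer bits spent, without a visible penalty.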
Perceptual Motion Estimation and Residue Filtering (Yang, et al. '05)
Motion search: pruned when the difference < JND
Residues: discarded when they are < JND
VA-modulated JND for Video Coding (Yang, et al. '05)
Pre-processing for coding
• Much work so far: to optimize coders
• New thinking: to optimize the signal for compression
Compressibility ∝ signal variance
One-dimensional illustration of preprocessing using JNDs
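The one-dimensional illustration can be sketched as follows. The idea: samples that sit within one JND of their local mean are perceptually indistinguishable from that mean, so replacing them reduces signal variance (and thus the bit cost) with no visible change. The window size and the replace-by-mean rule are our simplification:

```python
import numpy as np

def jnd_preprocess_1d(signal: np.ndarray, jnd: float, win: int = 5) -> np.ndarray:
    """Flatten perceptually invisible detail before encoding (1-D sketch).

    Each sample within 'jnd' of its local mean is replaced by that mean;
    every change stays sub-threshold, while variance drops.
    """
    pad = win // 2
    padded = np.pad(signal.astype(np.float64), pad, mode="edge")
    local_mean = np.convolve(padded, np.ones(win) / win, mode="valid")
    out = signal.astype(np.float64).copy()
    mask = np.abs(out - local_mean) <= jnd
    out[mask] = local_mean[mask]
    return out
```

The same principle extends to 2-D images, where the JND map is spatially varying rather than a single constant.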
Most Eye-pleasing Edge Sharpness
• edge sharpening: an optimal edge contrast of ~2.6 JND is most eye-pleasing; right-shifted if c⁺_ne also increases
• a less ad hoc approach
[Figure: perceived quality vs. extent of sharpening Sr,0 for the 'church', 'face', 'lena' and 'car' images, with the average behavior.] (Lin, et al. '06)
Other uses of perceptual models
• Image/video post-processing (Wu, et al. '13)
• Watermarking (Zhang, et al. '11)
• Prioritized transmission (Wu, et al. '13)
– discard less important packets
– apply stronger error protection to more important packets
– retransmit only important packets that have been lost
• Content retrieval
Image re-targeting with VA models
[Figure: original images; results of Seam Carving, Ren's and Wolf's methods, and VA-guided retargeting] (Fang, et al. '13)
Rapid target location via a VA model (Imamoglu, et al. '12)
Applications to computer graphics
• computer graphics: an actively developing area
– part of multimedia
– computational complexity
– mobile/cloud graphics
“The goal of computer graphics isn't to control light, but to control our perception of it. Light is merely a carrier of the information we gather by perception.” (Tumblin and Ferwerda, 2001)
Perceptually guided graphics rendering
Possibilities:
• two consecutive intermediate images are compared to see which regions need more samples (Bolin and Meyer, 1998)
• computation stops when the difference < JND (Ramasubramanian, et al. '99)
[Figure: 32 vs. 64 samples/pixel; 530 samples per pixel, 218.5 sec vs. 390 spp, 177.5 sec; areas with less attention receive fewer samples] (Lu, et al. '13)
Deployment in commercial products: Sarnoff's JNDmetrix™; Tektronix's PQA200/500
[Diagram: luminance and chrominance fields each undergo pyramid decomposition (levels 0–3 for luminance; levels 0–6 for chrominance), temporal and spatial filtering, contrast computation and contrast gain control (with interaction between the luminance and chrominance processing paths), producing luminance and chrominance JND maps.] (Lubin '95, Sarnoff '97)
Industrial Deployment: Visual Quality Monitoring System
• in-service testing for mobile devices
– PDAs
– handphones
• in conjunction with a channel simulator
Outline of the rest of this talk
1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion
Summary of this talk
– Filling the gap in current technology:
• user-oriented
• perceptually-inspired
• human-friendly machines
– A new dimension of improvement in many visual processing tasks
• room for further improvement with existing technology: diminishing
– A differentiating factor for commercial products
Possible research ahead:
– Model advancement
o temporal and color models
o alternative transforms: SVD (singular value decomposition), NMF (non-negative matrix factorization), over-complete transforms
o multiple strategies or multi-metric fusion approaches
o modified PSNR or SSIM
o new forms of signals: HDTV, 3D/stereo/free-view TV, olfactory sensations, mobile/IP TV
o no-reference models
– Modeling for audio, speech, olfaction, …
Possible research ahead (cont'd):
– Joint modeling (multi-modality)
o audio/speech, text, tactile, olfaction, and so on
o toward truly Multimedia, or MulSeMedia!
– Learning-based methodology
o cloud media, big data & data-driven approaches
o deep learning, transfer learning & incremental learning
o effective data collection: labeled data, unlabeled data
– Less investigated scenarios
o image retrieval
o robot navigation
– Perceptual computer graphics
o high dynamic range (HDR) imaging & tone mapping
o mobile graphics
o post-processing
Different views on the role of visual attention (VA)
• no doubt: VA is important to HVS perception
• however, it has been argued that VA consideration is not always beneficial, at least for simple weighting (Ninassi, et al. '07)
• distortion may change the subjects' eye fixations and durations (Vu, et al. '08)
• visual quality may be influenced not only by attentional regions, but also by non-attentional ones (You, et al. '10)
• still an open issue for research
Issues Related to Viewing Distance (L)
• limited research on the influence of L
• VSNR: L = 3.5 times the image height, claimed to be reasonable for typical viewing conditions
• Multi-scale approaches:
– SSIM: downsampling both the reference and test images into different resolutions; however, multi-scale SSIM does not always yield better results than its single-scale version
– IFC and VIF: steerable pyramid transform
Issues Related to Viewing Distance (L), cont'd
• Multi-scale approaches:
o just a way to compromise among the effects of different L settings
o it is still a problem how to pool the calculated errors from different scales and decouple the overlap among different scales
• a challenge for future research: to account for viewing conditions (display resolution, ambient illumination and viewing distance)
References for this talk
Surveys:
• W. Lin, C.-C. Jay Kuo, “Perceptual Visual Quality Metrics: A Survey”, J. of Visual Communication and Image Representation, 22(4):297-312, 2011.
• H. R. Wu, A. Reibman, W. Lin, F. Pereira, S. S. Hemami, “Perceptual Visual Signal Compression and Transmission”, Proc. of the IEEE, September 2013.
• T.-J. Liu, Y.-C. Lin, W. Lin, C.-C. Jay Kuo, “Visual Quality Assessment: Recent Developments, Coding Applications and Future Trends”, APSIPA Trans. on Signal and Information Processing, in press.
• L. Ma, C. Deng, K. N. Ngan, and W. Lin, “Recent Advances and Challenges of Visual Signal Quality Assessment”, China Communications, in press.
Authored book:
• L. M. Zhang and W. Lin, Modeling Selective Visual Attention: Techniques and Applications, John Wiley & Sons, 2013.
References for this talk (cont'd)
Book chapters:
• W. Lin, “Computational Models for Just-noticeable Difference”, Chapter 9 in Digital Video Image Quality and Perceptual Coding, eds. H. R. Wu and K. R. Rao, CRC Press, 2006.
• W. Lin, “Gauging Image and Video Quality in Industrial Applications”, Chapter 6 in Advances of Computational Intelligence in Industrial Systems, eds. Y. Liu, et al., Springer-Verlag, Heidelberg, 2008.
• M. Paul, W. Lin, “Computer vision aided video coding”, in Advanced Video Communications Over Wireless Networks, C. Zhu and Y. Li (eds.), CRC Press, 2012.
Some special issues:
• W. Lin, T. Ebrahimi, P. C. Loizou, S. Möller, A. R. Reibman, “Introduction to the Special Issue on New Subjective and Objective Methodologies for Audio and Visual Signal Processing”, IEEE Journal of Selected Topics in Signal Processing, 6(6):614-615, 2012.
• W. Zeng and W. Lin, “QoE Modeling and Applications for Multimedia Systems”, ZTE Communications, Vol. 11(1), 2013.
• T. Dagiuklas, W. Lin and A. Ksentini, “QoE Aware Optimization in Mobile Networks”, IEEE COMSOC MMTC E-Letter, Vol. 8, No. 2, March 2013.