
A New Method for Blocking Artifacts Detection Based on Human Visual System Features

Objective Video Quality Assessment

Luis Fernando Abanto León
Department of Access Technologies and Radiopropagation
INICTEL - UNI
Lima, Peru
[email protected]

Guillermo Kemper Vásquez
Department of Access Technologies and Radiopropagation
INICTEL - UNI
Lima, Peru
[email protected]

Joel Telles Castillo
Department of Access Technologies and Radiopropagation
INICTEL - UNI
Lima, Peru
[email protected]

Abstract—Block-based coding has become the state-of-the-art approach for image and video compression. However, when compression is performed at low bitrates, annoying blocking artifacts appear, caused by the coarse quantization of DCT coefficients. In this paper we propose a No-Reference method for spatially locating blocking artifacts as well as an objective quality measure for images and video signals using neural networks. The proposed method takes into account Human Visual System features in order to achieve better correlation with subjective opinion. In the experiments, we have attained a correlation coefficient of about 0.90 between subjective scores and the proposed objective quality metric. Compared with other objective methods, our method has proven to be more consistent with subjective visual perception.

Index Terms—Blocking artifacts, human visual system, video compression, video quality assessment.

I. INTRODUCTION

The Discrete Cosine Transform (DCT) has been widely used in image and video processing, especially for lossy data compression, because the DCT has been proven to have a strong energy compaction property [22], i.e. most of the signal information tends to be concentrated in a few low-frequency components of the DCT, approaching the optimal Karhunen-Loeve Transform. Although suboptimal when compared to the Karhunen-Loeve Transform, the DCT has the advantage of being usable without the need to calculate covariance matrices and eigenvectors, which are time-consuming processes. For these reasons, the DCT is the baseline of most modern image and video coding techniques, such as MPEG, H.26x and JPEG.

One of the most annoying visual disturbances is blocking artifacts, which are caused by the coarse and independent quantization of DCT coefficients when compression is performed at low bitrates. The spatial location of blocking artifacts is of great interest for many applications, e.g. blockiness post-filtering [30][32][36], on-line monitoring of communication systems and applications [19][20][25], and objective video/image quality assessment [7][10][11][17][18].

Traditional approaches for measuring video quality, such as the Peak Signal-to-Noise Ratio (PSNR) and the Mean Squared Error (MSE), do not effectively correlate with human visual perception. Consequently, much effort has been devoted to developing objective methods that correlate well with human visual perception. However, most related work has focused on developing Full-Reference (FR) methods [33], which use the original source for comparison with the distorted versions (compressed/decompressed videos). The fact that original sources are not always available at the receiving end limits the applications of FR methods. On the other hand, No-Reference (NR) methods [1][2][3][21][29] do not require the original source for evaluation and are of much more interest because of their potential applications; however, designing NR methods is a difficult task due to the lack of understanding of the

Human Visual System [4][14][16][27][28].

Many methods that were conceived for image and video

quality assessment consider a block-based analysis, i.e. the analysis for detecting blocking artifacts is carried out on the boundaries of 8x8 non-overlapping blocks. However, when dealing with video, blocking artifacts do not appear on a fixed grid because of motion and frame prediction. Hence, none of the existing methods is appropriate for accurately locating blocking artifacts. In this sense, we propose both 1) a No-Reference method for spatially locating blocking artifacts, which, besides being an essential part of our method, may also serve as a building block for blockiness post-filtering techniques, and 2) an objective measure for image and video quality evaluation using neural networks. Our method takes into account Human Visual System features in order to obtain better correlation with subjective opinion. Among the main contributions of this paper are: the introduction of a modified edge detection method for effectively locating blocking artifacts, the use of an overlapping-blocks analysis and the design of perceptual rules for blockiness detection. Finally, we compute two metrics: the first is mathematically expressed as the ratio of distorted pixels to the total pixels; the second is based on the Spatial Just-Noticeable-Distortion approach used in [12][37].


The remainder of this paper is organized as follows. In the next section we describe the methodology that we followed to perform the subjective tests. In Section III we explain the proposed method for locating blocking artifacts as well as the final metric for measuring image and video quality. In Section IV we describe the ANN characteristics and the training algorithm we used. In Section V, the results of the experiments are shown. Finally, in Section VI, the conclusions of this investigation are summarized.

II. SUBJECTIVE TESTS

Subjective tests were performed according to the ITU-R BT.500 Recommendation. Tests were conducted with the aim of building a database and using the collected information for two different purposes: the first half of the collected data was used for training the neural network, while the remaining data was used for testing the neural network performance, i.e. to evaluate how well the objective quality assessment reflects human visual characteristics.

At the beginning of each session, an explanation is given to the observers about the type of assessment, the grading scale, the sequence and timing [6]. Observers were presented with the range and type of the impairments to be assessed. Observers were asked to base their judgment on the overall impression given by the video sequence, and to express these judgments in terms of a non-categorical numerical scale that ranged from 0 to 100 [39].

Tests were carried out with the collaboration of 15 volunteers. Video sequences comprised a wide range of bitrates: 192 Kbps, 384 Kbps, 768 Kbps, 1 Mbps, 2 Mbps, 4 Mbps, 8 Mbps, 10 Mbps and 12 Mbps. All the sequences were processed by the MPEG-4 codec. The video samples that were used in the tests, Aspen, ControlledBurn, RedKayak, RushFieldCuts, SnowMountain, SpeedBag, and TouchdownPass, can be found at [5].

Subjective tests were performed only for evaluating video sequences. Subjective scores for images were obtained from the experiments conducted in [38].

III. A NEW METHOD FOR BLOCKING ARTIFACTS DETECTION

When dealing with images, blocking artifacts manifest themselves as abrupt discontinuities between neighboring blocks, so if present this kind of disturbance can always be found in a fixed position (fixed grid). However, when dealing with video, blocking artifacts have no fixed position of appearance due to motion estimation. In order to identify potential regions affected by blocking artifacts, we carefully process video frames using a modified version of one-dimensional filters, which are based on the first-order partial derivatives approach. Fig. 1 illustrates the generalized block diagram of the proposed method. In the first stage, the Modified Gradient Map as well as the Low-Pass Filtered Frame are computed. In the second stage, HVS (Human Visual System) masking properties are taken into account to calculate the Perceptual Rules, which give rise to the Binary Distortion Map (BDM). In the final stage two metrics are extracted, one of them obtained from the BDM and the other one obtained from the SJND approach. These two metrics are arranged as inputs to the neural network to finally produce a quality measure.

A. Modified Gradient Map

One-dimensional filters obtained from the approximation of the first-order partial derivatives are of much interest for computing edges, since they involve less computational cost than convolution with bidimensional masks. Also, these filters do not cause edge displacements, as occurs under certain conditions with traditional filters such as Canny, Sobel and Prewitt, used in [8][9][31] for artifacts detection. Another major reason for using one-dimensional filters is that bidimensional masks for edge detection cause edge widening. Although one-dimensional filters provide many advantages over typical mask filters, they are sometimes ineffective at correctly identifying edges. In this sense, we present the following case in order to exemplify the typical drawbacks that arise when using one-dimensional filters. We then propose a modified version of these filters in which the main drawbacks are overcome.

Let us assume that Fig. 2 represents an excerpt of a monochromatic image for which edges must be computed using classical one-dimensional filters. Consider that L(i, j) represents a gray level value at position (i, j), where i

    Fig.1. Block diagram of the proposed method


∈ [0, M−1] and j ∈ [0, N−1], M and N being the height and width of the image, respectively.

Fig.2. Excerpt of a monochromatic image used to illustrate the drawbacks of classical one-dimensional filters

Equation (1) represents a simplified approach commonly used for gradient estimation. This method is widely used in many papers [35] for computing the gradient map. The gradient map is expressed as the contribution of the vertical and horizontal edges.

$$\nabla L(i,j) = G_h(i,j) + G_v(i,j) \qquad (1)$$

The term Gh(i,j) is expressed as the horizontal difference of two adjacent pixels, as shown in (2).

$$G_h(i,j) = L(i,j+1) - L(i,j) \qquad (2)$$

Equation (2) has been used to compute the horizontal edges of the image excerpt shown in Fig. 2. The resulting gradient map is shown in Fig. 3. From Fig. 2 we can notice that three sharp transitions occur: from 29 to 10, from 10 to 30 and from 30 to 11. Values between 8 and 11 could be considered as a flat region with almost the same luminance intensity, so transitions between these values do not represent relevant peaks. We can easily deduce from Fig. 2 that two noticeable edges with values 29 and 30 are distinguished from the background; however, Fig. 3 exhibits just one noticeable edge, when the transition from 19 to 2 occurs. We can say that the tested approach does not truly represent the actual luminance variations of the image illustrated in Fig. 2.
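As an illustrative sketch (not part of the original description), the following Python snippet computes the classical one-dimensional horizontal gradient of (2) for a small synthetic row; the pixel values are hypothetical and only meant to reproduce the flat-region/isolated-peak pattern discussed above.

```python
import numpy as np

def horizontal_gradient(L):
    """Classical one-dimensional horizontal gradient, Eq. (2):
    G_h(i, j) = L(i, j+1) - L(i, j) (magnitude shown)."""
    G = np.zeros_like(L, dtype=float)
    G[:, :-1] = np.abs(L[:, 1:] - L[:, :-1])  # forward difference along each row
    return G

# Hypothetical 1x6 excerpt: two isolated bright pixels (29, 30) on a flat 8-11 background.
L = np.array([[10, 29, 10, 30, 11, 9]], dtype=float)
print(horizontal_gradient(L))  # -> [[19. 19. 20. 19.  2.  0.]]
# Nearly every position responds with a similar magnitude, so the two
# isolated bright pixels are not clearly separated, as discussed for Fig. 3.
```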

Fig.3. Gradient map of the excerpt in Fig. 2 computed with the classical one-dimensional approach

In order to overcome this drawback, some conditions have been added to the original one-dimensional filtering method. The conditions are expressed in (4) to (8) and the Horizontal Gradient Map is computed as expressed in (3).

$$HM(i,j)=\begin{cases}\max\!\big(\left|L(i,j)-L(i,j+1)\right|,\ \left|L(i,j+1)-L(i,j+2)\right|\big), & \text{if } C_1 \wedge C_2\\[3pt] 0, & \text{if } C_3 \wedge C_4\\[3pt] \left|L(i,j)-L(i,j+1)\right|, & \text{if } C_5 = 1\end{cases} \qquad (3)$$

where

$$C_1=\begin{cases}1, & \text{if } L(i,j+1) > L(i,j)\\ 0, & \text{otherwise}\end{cases} \qquad (4)$$

$$C_2=\begin{cases}1, & \text{if } L(i,j+1) > L(i,j+2)\\ 0, & \text{otherwise}\end{cases} \qquad (5)$$

$$C_3=\begin{cases}1, & \text{if } L(i,j)-L(i,j+1) = 0\\ 0, & \text{otherwise}\end{cases} \qquad (6)$$

$$C_4=\begin{cases}1, & \text{if } \left|L(i,j)-L(i,j+1)\right| \le BJND\big(L(i,j)\big)\\ 0, & \text{otherwise}\end{cases} \qquad (7)$$

$$C_5=\begin{cases}1, & \text{if } C_1+C_2 < 2 \ \text{and}\ C_3+C_4 < 2\\ 0, & \text{otherwise}\end{cases} \qquad (8)$$

HM represents the gradient map computed along the horizontal direction. In the same way, VM is computed by analyzing the vertical direction, in order to obtain the Vertical Gradient Map. Finally, the Total Gradient Map is computed as expressed in (9).

$$GM(i,j)=\min\!\left(\sqrt{HM(i,j)^2+VM(i,j)^2},\ 255\right) \qquad (9)$$

The BJND values in (7) were experimentally obtained following the methods explained in [12][37], in which noise of fixed amplitude is gradually added to an image of constant gray level. The method for computing BJND values is further explained below.

Fig.4. Gradient map of the excerpt in Fig. 2 computed with the proposed modified method

Fig. 4 shows the result of computing edges with the proposed method. As can be noticed, edges are clearly defined, whereas adjacent pixels with similar gray level values have been considered as a flat region, resulting in zero-valued pixels in the Total Gradient Map.
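For reference, a minimal sketch of how the horizontal and vertical maps can be fused into the Total Gradient Map of (9), assuming HM and VM have already been computed as NumPy arrays:

```python
import numpy as np

def total_gradient_map(HM, VM):
    """Fuse horizontal and vertical gradient maps, Eq. (9):
    GM(i, j) = min(sqrt(HM^2 + VM^2), 255)."""
    GM = np.sqrt(HM.astype(float) ** 2 + VM.astype(float) ** 2)
    return np.minimum(GM, 255.0)  # clip to the 8-bit luminance range
```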

B. Calculation of BJND values

Considering only monochromatic images in the spatial domain, there are two main factors affecting the visibility threshold of distortions among adjacent pixels. One of them is the average background luminance behind the pixel to be tested. The other one is the spatial non-uniformity, or spatial activity, of the background [12].

The JND (Just-Noticeable Difference) has been defined as the amount of luminance that needs to be added to a pixel so that the intensity of the pixel can be discriminated from the background. In this paper we define the BJND (Block Just-Noticeable Difference) as the minimum amount of luminance that has to be added to the pixels along the block boundaries so that a block can be distinguished from the background. The purpose of identifying BJND values is to discriminate perceptually sharp transitions between adjacent blocks.


The original method for finding JND values is based on the two perceptual factors that were described above: background luminance and spatial non-uniformity. However, this section only considers JND values due to the influence of background luminance.

In the original method for computing JND values, which is explained in [12], a small square of size 32x32 pixels is positioned in the center of an image of constant gray level. Noise of fixed amplitude is randomly added to or subtracted from each pixel within the square. The noise amplitude is gradually increased until the noise becomes just noticeable; this amplitude represents the visibility threshold of the noise within a constant gray level background. The process is carried out for each possible gray level, comprising all the values between [0:255]. We can say that this methodology for finding Just-Noticeable Differences is consistent with many other experimental results that support the fact that human visual perception is sensitive to luminance contrast rather than absolute luminance [23][24].

It is worth saying that the aforementioned method was developed for finding JND values at pixel precision, i.e. the visibility threshold is estimated considering the fact that a pixel has to be distinguished from the constant gray level background. However, if we are dealing with images or video, and assuming that they have been processed by block-based compression algorithms, it would be more appropriate to find JND values considering bigger regions instead of pixels. In this sense, we have conducted the experiments as described in the original method, with the difference that, instead of adding random noise of size 1x1, we added noise of size 4x4 within a square of size 64x64. As is logical to assume, the JND values that were obtained were lower than those from the original method; this fact can be justified by the size of the noise that was used, i.e. the visibility thresholds were lower because the noise was spatially larger, hence more noticeable. The experimental outcomes of this experiment were called BJND.
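The following Python sketch (assuming NumPy) generates one test stimulus of the kind described above: a constant gray image with 4x4 noise patches of a given amplitude placed inside a central 64x64 square. The image size, patch placement and random seed are illustrative assumptions, not the exact experimental protocol.

```python
import numpy as np

def bjnd_stimulus(gray_level, amplitude, img_size=256, square=64, patch=4, seed=0):
    """Constant-gray image with +/-amplitude noise in 4x4 patches
    inside a centered 64x64 square (experiment of Sec. III-B)."""
    rng = np.random.default_rng(seed)
    img = np.full((img_size, img_size), float(gray_level))
    top = (img_size - square) // 2
    for r in range(top, top + square, patch):
        for c in range(top, top + square, patch):
            sign = 1.0 if rng.random() < 0.5 else -1.0   # randomly add or subtract
            img[r:r + patch, c:c + patch] += sign * amplitude
    return np.clip(img, 0, 255)

# Example: a mid-gray background with noise of amplitude 3.
stimulus = bjnd_stimulus(gray_level=128, amplitude=3)
```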

Note that in this paper we have extended the experiment proposed in [12], which was originally intended for luminance masking, in order to obtain BJND values. BJND values are used to effectively compute the gradient map of images, so that an edge can be classified as noticeable or not.

C. Low-Pass Filtering

In this stage, a smoothed version of the frame under examination is computed. The outcome of this stage is called the Low-Pass Filtered Map, which is later used for developing the SJND approach.

Fig.5. Low-pass Filter Mask

Fig. 5 shows the low-pass filter mask that is used to estimate a smoothed version of the image. The low-pass filtered image is computed as shown in (10), where S is equal to 4. In (10), L represents the luminance frame and C is the mask of size 9x9.

$$F(i,j)=\frac{\displaystyle\sum_{x=0}^{2S}\sum_{y=0}^{2S} L\big(i-S+x,\ j-S+y\big)\,C(x,y)}{\displaystyle\sum_{x=0}^{2S}\sum_{y=0}^{2S} C(x,y)} \qquad (10)$$
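A minimal sketch of the normalized weighted smoothing of (10), assuming the 9x9 mask C of Fig. 5 is supplied as a NumPy array (the actual mask coefficients are shown only in the figure and are not reproduced here); the border handling is an assumption.

```python
import numpy as np

def low_pass_filtered_map(L, C, S=4):
    """Normalized weighted average of Eq. (10) with a (2S+1)x(2S+1) mask C."""
    M, N = L.shape
    padded = np.pad(L.astype(float), S, mode="edge")  # border handling is an assumption
    F = np.empty((M, N), dtype=float)
    norm = C.sum()
    for i in range(M):
        for j in range(N):
            window = padded[i:i + 2 * S + 1, j:j + 2 * S + 1]
            F[i, j] = (window * C).sum() / norm
    return F
```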

D. Spatial Masking

Spatial masking occurs when the ability to discriminate amplitude errors or distortions is reduced due to the spatial content in which the artifact is positioned [15]. Masking is usually explained by a decrease in the effective gain of the early visual system. Neurons in the primary visual cortex have a receptive field composed of excitatory and inhibitory regions. Contrast gain control can be modeled by an excitatory nonlinearity where masking occurs through the inhibitory effects of the pool of responses from other neurons [25]. Two well-studied properties of the HVS which are highly relevant to the perception of blocking artifacts are luminance masking and texture masking.

Incorporating relevant and accurately known properties of human perception into the method should prove effective for judging the visibility of the blocking effect [26].

a) Luminance Masking

Luminance masking properties have been considered in the proposed method in order to reflect the image/video quality more accurately.

It was found that the human visual system's sensitivity to variations in luminance mainly depends on the local mean luminance [34]. When the background luminance is low, the visibility threshold is high; therefore distortions become imperceptible or barely distinguishable. The same effect occurs when the background luminance is high, so we can summarize that high visibility thresholds apply in both very dark and very bright regions, and low thresholds in regions of medium gray levels.

Fig.6. An example of luminance masking: (a) foreground 10 on background 0; (b) foreground 90 on background 80

Fig. 6 illustrates an example of luminance masking. The difference in luminance between the foreground and the background is the same for both illustrations. In the first illustration the background has a value of 0 and its foreground has a value of 10, whereas in the second illustration the foreground has a luminance level of 90 and its surrounding


background has a value of 80. It can be deduced from Fig. 6 that the edge in illustration (b) is more noticeable than that in illustration (a) because of the contrast between the foreground and background, so if present a distortion would be more visible in the second illustration.

We have conducted the experiments that are explained in [12] in order to calculate the threshold curve that models masking due to luminance. The experiments are similar to those described in Section III.B, where noise is gradually added to or subtracted from an image with constant gray background until the noise or distortion becomes just noticeable. Note that although this method has already been used above for another purpose, namely to calculate the BJND thresholds, this section uses the same method for the purpose for which it was originally created, i.e. to estimate pixel thresholds due to luminance masking.

Note that this method is aimed at evaluating the contrast between a pixel and its mean background level. The tests were performed with the collaboration of nine volunteers; the average thresholds were then computed and plotted as seen in Fig. 7.

Fig.7. Visibility threshold due to luminance masking, plotted as a function of the background gray level

The curve in Fig. 7 shows the visibility thresholds for different background levels. The resulting model has a minimum threshold around a gray level of 80 and has high threshold values at the extremes. The obtained curve is consistent with other models that show the same behavior around 80.

The threshold curve was normalized and rescaled to obtain a weighting function, which can be used to estimate pixel noticeability. The weighting factors range from 0 to 1.

$$W_l(i,j)=\begin{cases}A_1\,I(i,j)^2+B_1\,I(i,j)+C_1, & \text{for the lower gray levels}\\[3pt]A_2\,I(i,j)^2+B_2\,I(i,j)+C_2, & \text{for the higher gray levels}\end{cases} \qquad (11)$$

In (11) assume that A1 = 0.000103895, B1 = 0.0025918, C1 = 0.009428690, A2 = 0.0000220653, B2 = -0.0125636 and C2 = 1.885972. Consider that I represents the input to the weighting function. In this case I could be the luminance level of a pixel, or it may be the average luminance of a group of neighboring pixels, as in our case. This function is used later to compute the Perceptual Rules.

b) Texture Masking

Another important mechanism of spatial masking is texture masking. This paper incorporates the spatial masking theory into the proposed method. Spatial masking suggests that distortions in images may be either hidden or revealed, depending on the region in which they occur. Noise, for example, is plainly visible in smooth parts, whereas it is almost imperceptible in more spatially active regions [25][29].

Experimental results have shown that high values of texture masking lead to low values of visibility. Distortion visibility decreases near regions with spatial details.

$$TM(i,j)=f\big(I(i,j);\ S_1,S_2,S_3,S_4,S_5,S_6\big) \qquad (12)$$

i.e. the texture-masking threshold TM(i,j) is a fitted function of the local activity I(i,j), parameterized by the constants S1 to S6 given below.

Contrary to (11), where Wl represents the pixel's visibility weight, TM in (12) represents the visibility threshold. Consider the following constants: S1 = 100, S2 = 120.73853, S3 = 0.026, S4 = 0.062362, S5 = -20.806316 and S6 = 1.1. Assume that I represents the input to the threshold function; in this paper I has been treated as the variance of the luminance level within a group of pixels.

E. Temporal Masking

It was considered appropriate to incorporate into the method the effect of temporal masking, which is mainly based on the interframe difference. As described in [37], larger inter-frame luminance differences result in a larger temporal masking effect. In this section we briefly describe the experiments that were conducted in [37] in order to derive a function able to represent the effect of masking due to temporal variations between frames. As a first approach to the problem, it was suggested to carry out experiments to find the Temporal Just-Noticeable Differences, which represent the maximum tolerable thresholds of human perception to distortions when partially or totally masked by luminance variations between consecutive frames. In this sense, the experiments were launched running a video sequence at 30 frames per second, in which a square of constant gray level moved horizontally over a background with a different luminance level. Noise of fixed amplitude was randomly added to or subtracted from each pixel in certain small regions. The visibility thresholds for identifying distortions are determined as a function of the interframe variations and the average background luminance, as seen in (13). The original temporal threshold curve is explained in [37]; in this paper we have transformed that threshold curve into a weighting function by inverting the coefficients and adding an offset value Z. This weighting function represents the noticeability of distortions when partially or fully masked by interframe differences, and yields values between 0 and 1. The slightly modified weighting function is expressed in (13).


$$TJND(i,j,t)=\begin{cases}\max\!\left(H_1\exp\!\left(\dfrac{-0.15\,\big(255+\Delta(i,j,t)\big)}{2}\right),\ 2\right)+Z, & \text{if }\Delta(i,j,t)\le 0\\[10pt]\max\!\left(H_2\exp\!\left(\dfrac{-0.15\,\big(255-\Delta(i,j,t)\big)}{2}\right),\ 2\right)+Z, & \text{if }\Delta(i,j,t)> 0\end{cases} \qquad (13)$$

With respect to (13), assume that H1 = 8, H2 = 3.2 and Z = 5.2090; a further model constant is set to 0.8. Also assume that the average luminance difference between two consecutive frames, Δ(i,j,t), is defined as

$$\Delta(i,j,t)=\frac{\big(L(i,j,t)-L(i,j,t-1)\big)+\big(F(i,j,t)-F(i,j,t-1)\big)}{2} \qquad (14)$$

where L(i,j,t) represents the luminance of the pixel at position (i,j) at instant t, and F(i,j,t) represents the value of the pixel in the Low-pass Filtered Map at position (i,j) at instant t. The computation of the Low-pass Filtered Map F(i,j) is given in (10).
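A small NumPy sketch of this interframe term, following the reconstruction of (14) above (the exact form of (14) is partially inferred from the garbled source):

```python
import numpy as np

def interframe_difference(L_t, L_prev, F_t, F_prev):
    """Average luminance difference between consecutive frames, Eq. (14):
    the mean of the raw and the low-pass filtered interframe differences."""
    return ((L_t.astype(float) - L_prev.astype(float)) +
            (np.asarray(F_t, dtype=float) - np.asarray(F_prev, dtype=float))) / 2.0
```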

Due to the fact that H1 > H2, we can infer from (13) that high-to-low luminance transitions induce a more relevant masking effect than low-to-high transitions.

F. Perceptual Rules

Five perceptual rules were developed in order to efficiently detect blocking artifacts. These rules are applied in both directions, horizontal and vertical. Each rule is composed of nine conditions. If one of the rules meets all its conditions, the pixels under examination are considered to be disturbed by blocking artifacts. The nine conditions that form part of each rule are built from eight intermediate metrics, which are calculated over some geometrical regions of the Modified Gradient Map and the Low-Pass Filtered Map, illustrated in Fig. 8 and Fig. 9 respectively. We have decided to perform the frame analysis using a 3x5 sliding window in order to avoid missing artifacts displaced by motion and frame prediction. Unlike other methods, this paper employs an overlapping-blocks analysis, which has better detection performance than the non-overlapping approach when dealing with video.

    Fig.8. Modified Gradient Map

With reference to Fig. 8, G1 and G2 represent regions of size 3x2. Likewise, GA, GB and GE represent regions of size 3x1.

    Fig.9. Low-pass Filtered Luminance Map

With respect to Fig. 9, the regions labeled as FA, FB and FE represent blocks of 3x1 pixels.

To perform the analysis in the vertical direction, i.e. with the aim of identifying potentially disturbed regions located vertically, we use a 5x3 sliding window instead of the 3x5 window that is used for the horizontal analysis.

The eight intermediate metrics are obtained from the regions illustrated in Fig. 8 and Fig. 9, and are computed according to (15)-(22). In (15), (16) and (17), uG1, uG2 and uGE represent the average gradient of the regions G1, G2 and GE, respectively. Equations (18) to (20) represent the spatial activity in the regions labeled as GE, GB and GA in the Modified Gradient Map. These three parameters are computed by using the region variance and the Texture Masking function already defined in (12). In (21), lmFE represents the average luminance visibility factor, computed over the regions FA, FB and FE. For computing the luminance visibility weight, the Luminance Masking function defined in (11) was used. Finally, (22) represents the visibility factor caused by the interframe difference; here we used the Temporal Masking function defined in (13).

$$uG_1=\mathrm{Mean}(G_1) \qquad (15)$$

$$uG_2=\mathrm{Mean}(G_2) \qquad (16)$$

$$uG_E=\mathrm{Mean}(G_E) \qquad (17)$$

$$sm_{G_E}=\mathrm{TextureMasking}\big(\mathrm{Variance}(G_E)\big) \qquad (18)$$

$$sm_{G_B}=\mathrm{TextureMasking}\big(\mathrm{Variance}(G_B)\big) \qquad (19)$$

$$sm_{G_A}=\mathrm{TextureMasking}\big(\mathrm{Variance}(G_A)\big) \qquad (20)$$


$$lm_{F_E}=\mathrm{LuminanceMasking}\big(\mathrm{Mean}(F_B,F_E,F_A)\big) \qquad (21)$$

$$tm_{F_E}=\mathrm{TemporalMasking}\big(\mathrm{Mean}(F_B,F_E,F_A)\big) \qquad (22)$$

Equations (23) to (31) represent the conditions on which the rules are based.

$$\theta_{n,k}=\begin{cases}1, & \text{if the }k\text{-th feature satisfies its threshold }T_{n,k}\\[2pt]0, & \text{otherwise}\end{cases} \qquad (23)\text{–}(31)$$

where, for k = 1, ..., 9, the features tested are, in order: uG1 (23), uG2 (24), uGE against a first bound (25), uGE against a second bound (26), sm_GE (27), lm_FE (28), tm_FE (29), sm_GB (30) and sm_GA (31), and T_n,k denotes the threshold associated with rule n and condition k (see TABLE I).

According to (32), if the nine conditions meet their requirements, then the rule is considered true. Assume that n ranges from 1 to 5. The parameters used in the conditions are defined in TABLE I.

$$rule_n=\begin{cases}\text{true}, & \text{if }\displaystyle\sum_{k=1}^{9}\theta_{n,k}=9\\[4pt]\text{false}, & \text{otherwise}\end{cases} \qquad (32)$$

The five rules have to be evaluated at each displacement of the window. As mentioned before, the window is displaced in steps of one pixel, whether the analysis is carried out horizontally or vertically, i.e. the overlap between the current position and the next position of the sliding window is 80%.
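The following Python sketch illustrates only the mechanics of the overlapping analysis: a 3x5 window slides in one-pixel steps, a feature vector is extracted for the current position, the five rules of TABLE I are checked, and the Binary Disturbance Map is set to 1 wherever any rule holds. The feature extraction and the per-condition comparisons (whose exact inequalities correspond to (23)-(31)) are left as injected callables, since they depend on the maps defined earlier; the window-wide update of the BDM is an assumption.

```python
import numpy as np

def build_bdm(frame_shape, extract_features, check_rule, thresholds, win=(3, 5)):
    """Slide a win-sized window one pixel at a time and mark the BDM
    wherever any of the five perceptual rules is satisfied (Sec. III-F, III-H).

    extract_features(i, j) -> feature vector for the window anchored at (i, j)
    check_rule(features, row) -> True when all nine conditions of (23)-(31)
                                 hold for one threshold row of TABLE I
    thresholds: 5x9 array holding the rows of TABLE I
    """
    M, N = frame_shape
    h, w = win
    bdm = np.zeros((M, N), dtype=np.uint8)
    for i in range(M - h + 1):
        for j in range(N - w + 1):          # one-pixel steps -> 80% overlap
            feats = extract_features(i, j)
            if any(check_rule(feats, row) for row in thresholds):
                bdm[i:i + h, j:j + w] = 1   # mark the window region (assumed update rule)
    return bdm
```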

TABLE I. PARAMETERS USED IN THE CONDITIONS

 n | T_n,1 | T_n,2 | T_n,3 | T_n,4 | T_n,5  | T_n,6 | T_n,7 | T_n,8 | T_n,9
 1 |  0.5  |  0.5  |  3.0  | 12.0  | 0.2174 | 0.30  | 0.78  | 1.0   | 1.0
 2 |  2.0  |  2.0  |  7.2  | 20.0  | 0.2174 | 0.17  | 0.77  | 1.0   | 1.0
 3 |  4.0  |  4.0  | 15.4  | 20.0  | 0.2853 | 0.17  | 0.78  | 1.0   | 1.0
 4 |  5.0  |  5.0  | 15.0  | 20.0  | 0.4076 | 0.2   | 0.79  | 1.0   | 1.0
 5 |  5.0  |  5.0  |  5.0  | 20.0  | 0.21   | 0.4   | 0.78  | 0.14  | 0.14

G. SJND Approach

In this stage we develop the Spatial Just-Noticeable Difference (SJND) method employed in [12][37], whose main purpose is to estimate the JND profile for measuring the perceptual redundancies inherent in an image. The SJND threshold is modeled as a function of luminance contrast and texture masking. The perceptual SJND model is given by the following expressions.

$$SJND(i,j)=\max\Big(f_1\big(F(i,j),\,GM(i,j)\big),\ W_l\big(F(i,j)\big)\Big) \qquad (33)$$

where f1(F(i,j), GM(i,j)) estimates the spatial masking due to texture, whereas Wl(F(i,j)) estimates the luminance contrast. Note that GM(i,j) and F(i,j) have already been defined in (9) and (10) respectively; in the same way, Wl has been defined in (11). The expression f1(F(i,j), GM(i,j)) is defined in (34).

$$f_1\big(F(i,j),\,GM(i,j)\big)=GM(i,j)\,\alpha\big(F(i,j)\big)+\beta\big(F(i,j)\big) \qquad (34)$$

Parameters α(F(i,j)) and β(F(i,j)) are defined in (35) and (36), respectively.

$$\alpha\big(F(i,j)\big)=0.0001\,F(i,j)+0.115 \qquad (35)$$

$$\beta\big(F(i,j)\big)=0.25-0.01\,F(i,j) \qquad (36)$$

Consider that α and β are background-luminance-dependent functions. These two parameters are further explained in [12].
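A compact NumPy sketch of the SJND profile as reconstructed in (33)-(36), reusing the low-pass map F, the gradient map GM and a luminance weighting function like the one sketched earlier; the coefficients follow the reconstruction above and should be read with the same caution.

```python
import numpy as np

def sjnd_map(F, GM, luminance_weight):
    """Spatial JND profile, Eqs. (33)-(36):
    SJND = max( texture term f1(F, GM), luminance term W_l(F) )."""
    F = np.asarray(F, dtype=float)
    alpha = 0.0001 * F + 0.115              # Eq. (35)
    beta = 0.25 - 0.01 * F                  # Eq. (36), coefficients as reconstructed
    f1 = np.asarray(GM, dtype=float) * alpha + beta  # Eq. (34): texture masking term
    return np.maximum(f1, luminance_weight(F))       # Eq. (33)
```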

H. Binary Disturbance Map

The Binary Disturbance Map is directly related to the fulfillment of the Perceptual Rules. The Binary Disturbance Map (BDM) is a matrix with the same dimensions as the frame or image under evaluation. The BDM is formed by evaluating the Perceptual Rules that have already been mentioned. For any region, if any of the five rules meets all of its conditions, then that region is considered to be distorted by blocking artifacts, and the BDM is updated with the value 1 at the positions where distortions were encountered. Thus the BDM represents the presence or absence of distortions with the binary values 1 and 0, respectively. Results of this method are illustrated in the Appendix.

I. Metrics Computation

In this section the two metrics used for estimating video quality are described.


a) Distorted Pixels-to-Total Pixels Ratio

The proposed metric is expressed as the ratio between the total distorted pixels in the frame and the total pixels in the frame.

$$DTR=\frac{\displaystyle\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} BDM(i,j)}{M\cdot N} \qquad (37)$$

where M and N represent the height and width of the frame, respectively. From (37) we can clearly see that the denominator represents the total number of pixels in the image. As mentioned before, BDM is a matrix with the same dimensions as the frame and contains binary values which indicate the presence or absence of distortions, represented by 1 or 0. So the numerator in (37) represents the total number of distorted pixels. The resulting value is a number in the range [0:1].
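A one-function NumPy sketch of (37):

```python
import numpy as np

def distorted_to_total_ratio(bdm):
    """Eq. (37): fraction of pixels marked as distorted in the BDM."""
    bdm = np.asarray(bdm)
    return float(bdm.sum()) / bdm.size
```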

b) Average Pixel Distortion

SJND values represent the maximum tolerable thresholds just before a distortion can be noticed; using this concept we have developed the Average Pixel Distortion (APD) metric.

$$APD=\frac{\displaystyle\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\min\!\Big(\max\!\big(GM(i,j)-SJND(i,j),\ 0\big),\ 15\Big)\,BDM(i,j)}{\displaystyle\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} BDM(i,j)} \qquad (38)$$

This metric is based on the computation of the surplus, or difference, of GM over the SJND map, i.e. the numerator represents the total distortion of the image. The denominator represents the number of distorted pixels, so the APD metric represents the average distortion per pixel. Typical values for this metric range between 0 and 15.

IV. NEURAL NETWORK TRAINING

We used a three-layer perceptron neural network, with 2 linear neurons at the input, 12 sigmoidal neurons in the hidden layer and one single neuron at the output, which yields the final quality estimation of the image or video under evaluation. We used the Resilient Propagation algorithm [13] for training the network. The neural network was trained with the results obtained from the subjective tests as well as with the experimental outcomes of [38], i.e. the network was trained considering both image and video results. The LIVE Image Database is divided into two groups, Database 1 and Database 2. Both databases contain images distorted with blocking artifacts; however, Database 2 was only used for training. Database 1 is used for testing.
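For orientation, a minimal NumPy sketch of the 2-12-1 topology described above (linear inputs, sigmoidal hidden layer, single output). The weights would in practice be obtained with Resilient Propagation (Rprop) [13]; the training loop is omitted here, so the randomly initialized weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(12, 2)), np.zeros(12)   # input -> hidden (12 sigmoidal units)
W2, b2 = rng.normal(size=(1, 12)), np.zeros(1)    # hidden -> output (1 unit)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_quality(dtr, apd):
    """Forward pass of the 2-12-1 perceptron: inputs are the DTR and APD metrics,
    output is the estimated quality score (weights here are untrained placeholders)."""
    x = np.array([dtr, apd], dtype=float)
    h = sigmoid(W1 @ x + b1)
    return float(W2 @ h + b2)
```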

V. EXPERIMENTAL RESULTS

To test the performance of the proposed method, the JPEG-distortion subset of LIVE Database 1 was used, provided by the Laboratory for Image and Video Engineering [38]. Database 1 contains 29 original high-definition images and 233 compressed pictures which comprise a wide range of distortions. A sample of seven original images was compressed under different quality factors (QF), using the imwrite MATLAB command.
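The per-QF compression step can be reproduced with, e.g., Pillow in Python as a stand-in for the MATLAB imwrite call (the quality scales of the two implementations differ slightly, and the listed quality factors are only illustrative):

```python
from PIL import Image

def compress_at_quality_factors(src_path, out_prefix, quality_factors=(10, 20, 30, 50, 70, 90)):
    """Save JPEG versions of one source image at several quality factors."""
    img = Image.open(src_path).convert("RGB")
    for qf in quality_factors:
        img.save(f"{out_prefix}_qf{qf}.jpg", format="JPEG", quality=qf)
```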

Fig. 10 illustrates the relationship between the quality factor and the values obtained with the proposed metric.

Fig.10. Proposed quality metric as a function of the JPEG quality factor for the Lena, Parrots, Woman and hat, Lighthouse, Caps, Buildings and Monarch images

From Fig. 10 we can infer that the proposed metric is consistent with other experiments which suggest that there is a saturation threshold beyond which observers will not notice much additional difference between images/videos compressed at a higher quality. Based on the illustration, this threshold can be situated at a quality factor between 50 and 60.

We formed two small datasets with 9 samples each, Dataset A and Dataset B. Each dataset was formed by randomly selecting images from the 233 pictures contained in LIVE Database 1. Fig. 11 and Fig. 12 show the results for Dataset A and Dataset B when compared with their respective subjective scores. The achieved correlations for the two datasets are 0.876 and 0.879.

Fig.11. Comparison of subjective and objective scores for Dataset A (image quality versus image index)


Fig.12. Comparison of subjective and objective scores for Dataset B (image quality versus image index)

We then tested the method under a stricter scenario, using both images and video sequences as inputs. Fig. 13 shows a total of 190 objective evaluations, achieving a correlation of 0.8286.

Fig.13. Comparison of subjective scores and the objective metric (proposed method)

TABLE II also shows the results that were obtained when comparing the proposed method with competing approaches. We randomly selected 13 pictures from LIVE Image Database 1 to test the following competing approaches, DFIMAGE [31], S [8], NPBM [40] and WSBM [9], and compared them with the subjective scores. The competing methods that were implemented show an inverse relationship with the subjective opinion since they are impairment metrics, whereas the proposed metric is a quality metric. We can deduce from the experiments that the proposed method outperforms the implemented approaches.

TABLE II. COMPARISON WITH COMPETING METHODS

 Image Number | DFIMAGE |   S    |  NPBM  |  WSBM  | Proposed Method | Subjective Score
       7      | 0.1236  | 0.444  | 0.0929 | 2.5018 |      70.09      |      74.3
      16      | 0.3326  | 0.4445 | 0.2743 | 2.8735 |      61.93      |      38.0
      37      | 1.5766  | 0.9993 | 0.4323 | 6.6912 |      37.74      |      32.5
      49      | 0.3642  | 0.1082 | 0.4724 | 3.1186 |      61.27      |      57.75
      54      | 0.2415  | 0.5942 | 0.0942 | 2.9487 |      71.36      |      79.65
      69      | 0.8038  | 0.3016 | 0.2235 | 7.6194 |      57.3       |      41.65
      81      | 0.4048  | 0.6282 | 0.3392 | 3.6066 |      60.84      |      46.4
     101      | 0.1795  | 0.0019 | 0.2006 | 2.2221 |      76.06      |      75.35
     113      | 0.0888  | 0.4337 | 0.1152 | 2.5344 |      71.33      |      78.3
     125      | 0.346   | 0.0507 | 0.1873 | 3.0378 |      81.11      |      76.07
     133      | 0.3463  | 0.5302 | 0.1104 | 3.0492 |      75.41      |      84.38
     152      | 0.8518  | 0.8336 | 0.4508 | 5.2679 |      50.7       |      40.92
     178      | 0.7687  | 0.8108 | 0.1213 | 5.7435 |      64.57      |      67.07
  Corr. Coef. | -0.692  | -0.405 | -0.767 | -0.641 |      0.875      |

VI. CONCLUSIONS

In regard to the objective method for image and video quality assessment that we propose, the results correlate well with the Mean Subjective Opinion, at about 90%.

With respect to the method for blocking artifacts location, we think that the block-based analysis with overlapping is the best approach to detect and estimate blocking artifacts in video sequences. We believe that the introduction of additional HVS features could further improve the correlation with subjective opinion.

    REFERENCES

[1] M. Slanina, V. Ricny and R. Forchheimer, "A Novel Metric for H.264/AVC No-Reference Quality Assessment," 14th International Workshop on Systems, Signals and Image Processing, June 2007.

[2] F. Yang, S. Wan, Y. Chang, and H. R. Wu, "A Novel Objective No-Reference Metric for Digital Video Quality Assessment," IEEE Signal Processing Letters, vol. 12, no. 10, October 2005.

[3] N. Suresh, R. Palaniappan, P. Mane and N. Jayant, "Testing of a No-Reference VQ Metric: Monitoring Quality and Detecting Visible Artifacts," Fourth International Workshop on Video Processing and Quality Metrics, 2009.

[4] C. Yang, L. Zhao, Z. Liao, "Spatial-temporal Distortion Metrics for Video," World Congress on Computer Science and Information Engineering, vol. 7, April 2009.

[5] The Video Quality Experts Group (VQEG) samples, available at www.its.bldrdoc.gov/vqeg/.

[6] Y. Kim, J. Han, H. Kim and J. Shi, "Novel No-Reference Video Quality Assessment Metric with Estimation of Dynamic Range Distortion," 12th International Conference on Advanced Communication Technology (ICACT), February 2010.

[7] M. C. Q. Farias and S. K. Mitra, "No-Reference Video Quality Metric Based on Artifact Measurements," International Conference on Image Processing (ICIP), September 2005.

[8] C. Perra, F. Massidda, and D. D. Giusto, "Image Blockiness Evaluation Based on Sobel Operator," IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, vol. 1, pp. 1002-1006, 2008.


[9] Z. Hua, Z. Yiran, and T. Xiang, "A Weighted Sobel Operator-based No-Reference Blockiness Metric," IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 2008.

[10] L. Abate and G. Ramponi, "Measurement of the blocking artefact in video frames," 15th IEEE Mediterranean Electrotechnical Conference, April 2010.

[11] H. R. Wu and M. Yuen, "A Generalized Block-Edge Impairment Metric for Video Coding," IEEE Signal Processing Letters, vol. 4, no. 11, November 1997.

[12] C. Chou and Y. Li, "A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable-Distortion Profile," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 6, December 1995.

[13] M. Riedmiller and H. Braun, "Rprop - A Fast Adaptive Learning Algorithm," Proceedings of the International Symposium on Computer and Information Science VII, 1992.

[14] R. V. Babu and A. Perkis, "An HVS-Based No-Reference Perceptual Quality Assessment of JPEG Coded Images Using Neural Networks," IEEE International Conference on Image Processing, September 2005.

[15] A. Wang, G. Jiang, X. Wang and M. Yu, "New No-Reference Blocking Artifacts Metric Based on Human Visual System," International Conference on Wireless Communications and Signal Processing, November 2009.

[16] X. Song and Y. Yang, "A New No-Reference Assessment Metric of Blocking Artifacts Based on HVS Masking Effect," 2nd International Congress on Image and Signal Processing, October 2009.

[17] E. Ong, W. Lin, Z. Lu and S. Yao, "Colour Perceptual Video Quality Metric," IEEE International Conference on Image Processing, September 2005.

[18] C. Keimel, T. Oelbaum and K. Diepold, "No-Reference Video Quality Evaluation for High-Definition Video," IEEE International Conference on Acoustics, Speech and Signal Processing, April 2009.

[19] J. Yao, Y. Zhang, G. Xu and M. Jin, "No-reference Objective Quality Assessment for Video Communication Services Based on Feature Extraction," 2nd International Congress on Image and Signal Processing, October 2009.

[20] R. V. Babu, A. S. Bopardikar, A. Perkis and O. I. Hillestad, "No-Reference Metrics for Video Streaming Applications," 14th International Packet Video Workshop, December 2004.

[21] Y. Tian and M. Zhu, "Analysis and Modelling of No-Reference Video Quality Assessment," International Conference on Computer and Automation Engineering, pp. 108-112, 2009.

[22] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, London, U.K., Academic Press, 1990.

[23] P. Moon and D. Spencer, "The visual effect of nonuniform surrounds," J. Opt. Soc. Amer., vol. 35, pp. 233-248, March 1945.

[24] A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, NJ, Prentice Hall, 1989.

[25] S. Rimac-Drlje, D. Zagar and G. Martinovic, "Spatial Masking and Perceived Video Quality in Multimedia Applications," 16th International Conference on Systems, Signals and Image Processing, December 2009.

[26] G. Buchsbaum, "An Analytical Derivation of Visual Nonlinearity," IEEE Transactions on Biomedical Engineering, vol. BME-27, no. 5, May 1980.

[27] S. A. Karunasekera and N. G. Kingsbury, "A Distortion Measure for Blocking Artifacts in Images Based on Human Visual Sensitivity," IEEE Transactions on Image Processing, vol. 4, pp. 713-724, June 1995.

[28] Z. Wang and A. C. Bovik, "Blind measurement of blocking artifacts in images," in Proc. IEEE Int. Conf. Image Processing, Vancouver, BC, Canada, Oct. 2000, pp. 981-984.

[29] R. Dosselmann and X. Yang, "A Prototype No-Reference Video Quality System," Fourth Canadian Conference on Computer and Robot Vision, 2007.

[30] Q. B. Do, M. Luong and A. Beghdadi, "Coding Artifacts Reduction Method Based on a Perceptually-Weighted Variational Approach," IEEE ISSPIT, December 2009.

[31] F. Pan, X. Lin, S. Rahardja, E. P. Ong, and W. S. Lin, "Measuring blocking artifacts using edge direction information," IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1491-1494, June 2004.

[32] H. C. Kim, O. Uguzhan and T. G. Chang, "Post-filtering of DCT Coded Images Using Fuzzy Blockiness Detector and Linear Interpolation," IEEE Transactions on Consumer Electronics, vol. 53, no. 3, pp. 1125-1129, August 2007.

[33] S. Suthaharan, S. W. Kim and K. R. Rao, "A New Quality Metric Based on Just-Noticeable Difference, Perceptual Regions, Edge Extraction and Human Vision," Canadian J. of Electrical and Computer Eng., vol. 30, no. 2, pp. 81-88, 2005.

[34] H. Liu and I. Heynderickx, "A Perceptually Relevant No-Reference Blockiness Metric Based on Local Image Characteristics," EURASIP Journal on Advances in Signal Processing, vol. 2009, January 2009.

[35] F. Pan, X. Lin, S. Rahardja, W. Lin, E. Ong, S. Yao, Z. Lu and X. Yang, "A Locally-Adaptive Algorithm for Measuring Blocking Artifacts in Images and Videos," Signal Processing: Image Communication, vol. 19, pp. 499-506, 2004.

[36] I. O. Kirenko and L. Shao, "Local Objective Metrics of Blocking Artifacts Visibility for Adaptive Repair of Compressed Video Signals," Proceedings of the 8th IEEE International Conference on Multimedia and Expo (ICME), July 2007.

[37] Z. Chen and C. Guillemot, "Perceptually-Friendly H.264/AVC Video Coding Based on Foveated Just-Noticeable-Distortion Model," IEEE Transactions on Circuits and Systems for Video Technology, September 2010.

[38] H. R. Sheikh, Z. Wang, L. Cormack and A. C. Bovik, "LIVE Image Quality Assessment Database," http://live.ece.utexas.edu/research/quality/

[39] ITU-R Recommendation BT.500-10, "Methodology for the Subjective Assessment of the Quality of Television Pictures," 2000.

[40] H. Liu and I. Heynderickx, "A No-Reference Perceptual Blockiness Metric," IEEE ICASSP 2008, The 33rd International Conference on Acoustics, Speech, and Signal Processing, March 2008.


    APPENDIX

    Results of the proposed approach for spatially locating blocking artifacts.

    Fig.14. Compressed Lena with quality factor QF = 20 Fig.15. Detected Regions disturbed by Blocking Artifacts

    Fig.16. Binary Disturbance Map for Lena with QF = 20 Fig.17. Improved Edge Detection Map


    Fig.18. Compressed Parrots with quality factor QF = 25 Fig.19. Detected Regions with Blocking Artifacts

    Fig.20. Compressed Monarch with quality factor QF = 20 Fig.21. Detected Regions with Blocking Artifacts

    Fig.22. Compressed Rapids with quality factor QF = 85 Fig.23. Detected Regions with Blocking Artifacts


    Fig.24. Compressed Caps with quality factor QF = 10

    Fig.25. Detected Regions with Blocking Artifacts


    Fig.26. Compressed Bikes with quality factor QF = 90

    Fig.27. Detected Regions with Blocking Artifacts


    Fig.28. Frame from a compressed video using MPEG-4 codec

    Fig.29. Detected Regions with Blocking Disturbance


    Fig.30. Frame from compressed Mobile and Calendar using MPEG-4 codec

    Fig.31. Detected Regions with Blocking Disturbance