subjective assessment of hdtv with super...

5
SUBJECTIVE ASSESSMENT OF HDTV WITH SUPER RESOLUTION FUNCTION Seiichi Gohshi Kogakuin University 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, 163-8677, Japan [email protected] Isao Echizen National Institute of Informatics 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan [email protected] ABSTRACT Super Resolution (SR) has become common, and some recent HDTV sets and digital cameras are equipped with it. SR is a sales point of commercial products. However, the resolution of HDTV sets equipped with an SR function has not been tested as to whether it is actually better than those of HDTV sets without the function, in part because the resolution difference between HDTV sets is not always clearly visible. This paper proposes a subjective assessment for this purpose. The method is a combination of Scheffe’s paired comparison and BT.500. Using this method, we performed a subjective assessment on an HDTV set with the SR function and other sets. The assessment data was statistically analyzed, and the results prove that the HDTV set with the SR function was not superior in resolution to the others. 1. INTRODUCTION Digital HDTV broadcasting has started in many countries, and large LCD TV sets have become common. The image quality of these systems is much higher than those of analogue systems such as NTSC and PAL. LCD HDTV manufacturers are selling various HDTV systems, and their catalogues are filled with their own sales points, such as Super Resolution (SR) [3][4][7], 240-Hz frame rates, etc. However, it is not easy to tell the difference in image quality between HDTV sets of different TV manufacturers. In this paper, we will focus on HDTV sets equipped with an SR function and see whether SR can actually make a difference in image quality as regards the resolution. According to the webpage of one manufacturer selling these sets, their sort of SR is a typical form called SR image reconstruction (SRR) [3][7][8]. Moreover, although there are several types of SR, SRR is the only SR technology that has been embodied in a real-time system. We can theoretically analyze the resolution of the video if the HDTV set can be made to output video signals before and after the SR signal process. However, the SR processed image is only sent to the LCD, and there is no method to take the unprocessed image from the set. Subjective assessments are the alternative way to clarify the capabilities of HDTVs equipped with SR. These methods are used to establish the performance of television systems, and they measure the reactions of volunteers who view the systems. One of the most common and useful subjective assessments is BT.500 [1]. BT.500 defines a double stimulus continuous quality scale (DSCQS), and it performs well at evaluating artifacts such as mosquito noise and block noise caused by video coding such as MPEG-2 and MPEG-4. Moreover, the assessments of BT.500 played a significant role in deciding the bit rate for digital HDTV broadcasting. However, BT.500 has been standardized to evaluate the relationship between the video stream bitrate and subjective image quality, and only one display can be used during the entire assessment test. This means that there is just one display but several video sources, since the bitrate is the only parameter. For our purpose, we have to use a number of HDTV sets, since there is no further digital output after the SR process. The same video should thus be distributed to different HDTV sets in order to compare their image quality, but there is no standard for this sort of evaluation. The purpose of this assessment is to determine whether the SR function of commercial HDTV sets makes a difference in image quality or not. To be able to make a comparison, we decided that the essential capabilities of BT.500 must be combined with other measuring factors since the DSCQS by itself is not appropriate. We thought that a paired comparison would be useful for our purpose. The notion of paired comparison is exploited whenever we go to a store and do comparison shopping of similar items. Moreover, shoppers would likely want to compare the image qualities of two (or more) TV sets if all other features such as price, reliability, etc., are equivalent. The paired comparison does have an issue in that a lot of time would be consumed if Proceedings of Seventh International Workshop on Video Processing and Quality Metrics for Consumer Electronics January 30-February 1, 2013, Scottsdale, Arizona VPQM2013 32

Upload: hadan

Post on 16-Mar-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

SUBJECTIVE ASSESSMENT OF HDTV WITH SUPER RESOLUTION FUNCTION

Seiichi Gohshi

Kogakuin University1-24-2 Nishi-Shinjuku, Shinjuku-ku,

Tokyo, 163-8677, [email protected]

Isao Echizen

National Institute of Informatics2-1-2, Hitotsubashi, Chiyoda-ku,

Tokyo, 101-8430, [email protected]

ABSTRACT

Super Resolution (SR) has become common, and some recent HDTV sets and digital cameras are equipped with it. SRis a sales point of commercial products. However, the resolution of HDTV sets equipped with an SR function has not beentested as to whether it is actually better than those of HDTV sets without the function, in part because the resolution differencebetween HDTV sets is not always clearly visible. This paper proposes a subjective assessment for this purpose. The methodis a combination of Scheffe’s paired comparison and BT.500. Using this method, we performed a subjective assessment onan HDTV set with the SR function and other sets. The assessment data was statistically analyzed, and the results prove thatthe HDTV set with the SR function was not superior in resolution to the others.

1. INTRODUCTION

Digital HDTV broadcasting has started in many countries, and large LCD TV sets have become common. The image qualityof these systems is much higher than those of analogue systems such as NTSC and PAL. LCD HDTV manufacturers areselling various HDTV systems, and their catalogues are filled with their own sales points, such as Super Resolution (SR)[3][4][7], 240-Hz frame rates, etc. However, it is not easy to tell the difference in image quality between HDTV sets ofdifferent TV manufacturers. In this paper, we will focus on HDTV sets equipped with an SR function and see whether SRcan actually make a difference in image quality as regards the resolution. According to the webpage of one manufacturerselling these sets, their sort of SR is a typical form called SR image reconstruction (SRR) [3][7][8]. Moreover, although thereare several types of SR, SRR is the only SR technology that has been embodied in a real-time system. We can theoreticallyanalyze the resolution of the video if the HDTV set can be made to output video signals before and after the SR signal process.However, the SR processed image is only sent to the LCD, and there is no method to take the unprocessed image from theset.

Subjective assessments are the alternative way to clarify the capabilities of HDTVs equipped with SR. These methods areused to establish the performance of television systems, and they measure the reactions of volunteers who view the systems.One of the most common and useful subjective assessments is BT.500 [1]. BT.500 defines a double stimulus continuousquality scale (DSCQS), and it performs well at evaluating artifacts such as mosquito noise and block noise caused by videocoding such as MPEG-2 and MPEG-4. Moreover, the assessments of BT.500 played a significant role in deciding the bit ratefor digital HDTV broadcasting. However, BT.500 has been standardized to evaluate the relationship between the video streambitrate and subjective image quality, and only one display can be used during the entire assessment test. This means that thereis just one display but several video sources, since the bitrate is the only parameter. For our purpose, we have to use a numberof HDTV sets, since there is no further digital output after the SR process. The same video should thus be distributed todifferent HDTV sets in order to compare their image quality, but there is no standard for this sort of evaluation. The purposeof this assessment is to determine whether the SR function of commercial HDTV sets makes a difference in image qualityor not. To be able to make a comparison, we decided that the essential capabilities of BT.500 must be combined with othermeasuring factors since the DSCQS by itself is not appropriate. We thought that a paired comparison would be useful forour purpose. The notion of paired comparison is exploited whenever we go to a store and do comparison shopping of similaritems. Moreover, shoppers would likely want to compare the image qualities of two (or more) TV sets if all other features suchas price, reliability, etc., are equivalent. The paired comparison does have an issue in that a lot of time would be consumed if

Proceedings of Seventh International Workshop on Video Processing and Quality Metrics for Consumer Electronics January 30-February 1, 2013, Scottsdale, Arizona

VPQM201332

A B• Synchronized HDTV video is provided to TV A and TV B• Observers can move freely to assess resolution

Fig. 1. Assessment Fig. 2. Paired HDTV assessment

we wanted to compare numerous HDTV sets. The number of TV manufacturers with established brands, however, is limited,and here, we only compare one HDTV set with an SR function with two other HDTV sets.

This paper is organized as follows. Section 2 discusses the eligibility of the observers in the assessment and the lengthof the test video in reference to BT.500. These are essential points in a subjective assessment and are defined in BT.500.Section 3 describes the candidate HDTV sets with SR capability and the selection criteria. Section 4 explains the subjectiveassessment. Section 5 explains how SR works on HDTV receiver sets. Section 6 describes the statistical analysis of thesubjective assessment, and Section 7 is the conclusion.

2. OBSERVERS AND LENGTH OF TEST SEQUENCE

BT.500 was used when digital video coding technology was implemented in broadcasting, and it was an important standard atthe time digital broadcasting was just starting. The assessment method for the SR function of HDTV sets should thus adhereto BT.500 as closely as possible. Two essential aspects of our study followed the guidelines laid out in BT.500 as to howwe selected the observers and how we determined the length of the test video sequences. BT.500 specifies that the observersmust be non-video experts who do not work in the video industry. This means that non-specialist observers can recognizedifferences in quality since BT.500 is widely used to assess the video image quality, even now. Although many analysts andcritics of HDTV image quality can easily recognize differences in image quality, non-video experts, general people, can doso as well.

BT.500 call for the length of each video sequence for the assessment to be from 10 to 15 seconds long. The ITE/ARIBHi-Vision Test Sequences (ITE sequences) were made for HDTV assessments, and the period of each sequence is 15 seconds[5][6]. Although these sequences have been used for subjectively assessing HDTV video coding technologies, they are inYUV422 format. We have to process the sequences with the same MPEG-2 video encoder as broadcasting companies areusing, including the horizontal resolution conversion. The sequences for the assessment should be selected from terrestrialbroadcasting content because of the issues discussed above. However, it is not easy to find appropriate sequences in theactual broadcasting content. The appropriate sequences must have very high frequency elements that are details in videothat has no panning or tilting during 10 seconds. Fore this purpose, various pieces of terrestrial HDTV broadcasting contentwere recorded on BluRay disc (BD) players, and many subjective assessments were conducted on them in order to select theappropriate sequences. Five sequences, each lasting from 10 to 15 seconds, were selected. The sequences are described insection 3.2.

3. SUBJECTIVE ASSESSMENT

3.1. Scheffe’s Paired Comparison

Before describing the assessment procedure, we should note that the ’pre-assessments’ conducted beforehand proved that theresults of paired comparisons were reproducible. Figure 2 illustrates the assessment procedure. In one assessment, observersassess a pair of HDTV sets. Synchronized HDTV video is sent to both sets. Observers can move freely and check the imagequality. Although BT.500 defines the lighting conditions, they are much darker than in typical living rooms. As HDTV setsare usually set in a living room, a normal lighting condition for a living room was chosen instead. The observer is shown twoHDTV sets and asked to choose the one with the higher resolution. The same video sequence is repeated until the observermakes the decision between the two sets. Three TV sets were selected for the experiment (the selection included the TVwith SR), A round-robin paired comparison was conducted since Scheffe’s paired comparison was to be used. That is, theassessment for one sequence is done three times by each observer, using the pair of HDTV A and HDTV B, the pair of

33

Fig. 3. Sequence 1 (Talk) Fig. 4. Sequence 2 (Fireworks) Fig. 5. Sequence 3 (Comedy show)

Fig. 6. Sequence 4 (Football) Fig. 7. Sequence 5 (Travel)

HDTV B and HDTV C, and the pair of HDTV C and HDTV A. Observers score +1 for better resolution and -1 for poorerresolution in each assessment. They can move freely during the assessment until they make their decision. Moreover, eachof the observers assessed the image quality on their own without anyone else in the test room. Figure 2 shows a photo of theexperiment.

3.2. Test sequences

Still images such as test patterns and high-resolution photos are usually used to assess the resolution of displays. However,SR on an HDTV set is not used to view still images but to enjoy digital HDTV broadcasting content. The test video sequenceswere made from terrestrial HDTV digital broadcasting content in Japan. Content was recorded on BD at HDTV resolution,and the repeat function of the BD player was used to show the sequences to the observers during the assessment. A limitedamount of recorded broadcasting content was deemed appropriate for this assessment since most of it did not show anydifferences in the pre-assessment involving several observers. Five sequences were selected, and images from them areshown in Figures 3 to 7. The rectangles in each figure are high resolution areas that were the objectives of the assessments.

Observers were asked to concentrate on watching the texture in the rectangles and decide which HDTV set had the higherresolution. Even using these video sequences it was not possible to detect differences in resolution outside of the rectanglesin Figure 3 to Figure 7. If we were to watch general broadcasting content without knowing what to look for, we would notbe able to find differences in resolution between a set with the SR function and one without it. Observers were also askedto evaluate only the resolution and ignore other things. With regard to sequence 1 shown in Figure 3, observers were askedto focus on the rectangular area and evaluate the resolution of the letters on the clothes. The letters have high frequencyelements, and frequency elements that exceed the Nyquist frequency must be created if SR is to work. All of these areasin Figures 3 to Figure 7 were indicted to the observers. Sequence 2 in Figure 4 shows the details of fireworks exploding.Sequence 3 in Figure5 shows wrinkles on a person’s face, whereas sequence 4 shown in Figure 6 shows the audience in astadium. Sequence 5 in Figure 7 shows the cracks and details of the stones.

4. SR FOR HDTV

There are two types of digital HDTV broadcasting in Japan: satellite broadcasting and terrestrial broadcasting. Althoughthe horizontal resolution of satellite HDTV broadcasting is 1920, the horizontal resolution of terrestrial HDTV broadcastingis 1440, due to bandwidth limitations. A terrestrial broadcasting HDTV receiver decodes the stream at 1440x1080 andexpands it to 1920x1080 with digital-to-digital conversion using a digital filter. There are no high frequency spectra interrestrial HDTV broadcasting since it is encoded and decoded at 1440 horizontal resolution and the digital filter changes itto 1920. SR on an HDTV set can create high-frequency elements when enlarging from 1440 TV lines to 1920 TV lines inthe frequency domain[4]. A manufacturer’s official web site defines SR as ’A technology that can create the high frequencyspectra exceeding the Nyquist frequency precisely.’ If there are digital outputs of raw HDTV video signal before the SRsignal processing and after the SR signal processing, we can compare them and find out whether the SR function really worksor not. However, there is no evidence that what the manufacturer says on the website is true or not since the HDTV video

34

Table 1. Analysis of varianceFactors Deviation Degrees of freedom Biaseddeviation F0 F1%

Stimuli Sα 14.29333333 (n− 1) 2 7.146666667 28.17045455 99.48564957

Stimuli× observers Sα(k) 113.7066667 (n− 1)(N − 1) 48 2.368888889 9.337594697** 1.888507472

Combination Sβ 3.226666667 (n− 1)((n− 2)/2 12 0.268888889 1.059895833 3.503207193

Residual Sε 18.77333333 n2N − n2/2− 2nN 74 0.253693694 − −Overall result Sτ 150 150 - - -

signalbefore the SR processing is not shown for the sake of comparison. That is, if we had images before and after SR,we could compare them and find out whether the SR function really works or not. Resolution is one of the most importantcharacteristics of displays. The image quality of HDTV sets with an SR function must be better than that of TV sets withoutit.

5. STATISTICAL ANALYSIS

Twenty-five observers participated in the assessment. All of them were university students ranging in age from 20 to 23(average, 21). Prior to each test, a training session was held to introduce them to the test methodology of using broadcastingcontent that had high-resolution areas. The assessment used three stimuli since three TV sets were used. The results ofSequence 1 (Talk) are shown in Table 1. The Deviations column cells and the Biased Deviations column cells had to becalculated in order to analyze the assessments [2]. In this table, N means the number of stimuli (three TV sets), and nmeans the number of observers (25). The values in the F column are from an F distribution table. The F values werederived using the degree of freedom of the residual (74) and those of the second parameter (2, 48, and 12). Here, F at a 1%provability is shown.F0 values are obtained with the values of biased deviation. The biased deviation of the stimuli value(7.146666667) was divided by the biased deviation of the residual value (0.253693694). Thus, theF0 value of the stimulibecame 28.1704545. TheF0 value of Stimuli× observers is calculated in a similar fashion . The biased deviation of theStimuli × observers (2.368888889) was divided with by the biased deviation of the residual value (0.253693694). TheF0

value of the Stimuli× observers was 9.337594697. IfF0 is bigger than F, there is a significant difference, and indeed the tableshows that theF0 values of the stimuli are bigger than those of F. There is a significant difference in the Stimuli× observersrow (1%). The yardstick method can be used only for significant differences in the analysis of variance[2]. Although thedetails of the analysis process cannot be discussed in full due to space limitations, it is a typical combination of Scheffe’spaired comparison [2]. The Y value can be derived using equation (1).

Yα = q

√Vε

2nN(1)

Different tables give the provability at 1% and 5% of the F distribution [2]. TheYα values are as follows.

Yα0.01 = 0.165662 (2)

Yα0.05 = 0.125016 (3)

Theα values for TV sets (αHDTV A, αHDTV B andαHDTV C) were determined from the degree of freedom and Table 2. Table2 is called the cross table, and it is used in the process of the Scheffe’s paired comparison [2]. There are two degreesof freedom: the number of observers (25) and the number of HDTV sets (3). TheαHDTV A is the value in Table 2, i.e.,(X·j· −Xi··,HDTV A)(−28) divided by 2nN, i.e., the stimuli values (HDTV sets:n = 3, observers:N = 25).

αHDTV A = −28/(2 ∗ 25 ∗ 3) = −0.18667 (4)

αHDTV B andαHDTV C arecalculated in a similar way.

αHDTV B = 0.24 (5)

αHDTV C = −0.05333 (6)

35

Theyardstick values are shown in Figure 8. The differences in the yardstick values in relation to theYα0.01 andYα0.05

are as follows[2]αHDTV B − αHDTV A > Yα0.01 (7)

αHDTV C − αHDTV A > Yα0.05 (8)

Here, the percentage 1% means a false provability of 99% and 5% means 95%. Equation (8) means that the resolutionof HDTV set A with SR is inferior to that of HDTV set B at a provability of 99%. Equation (7) means that the resolution ofHDTV set A with SR is inferior to that of HDTV set C at a provability of 95%. Thus, it is statistically proven that the HDTVset with SR was actually poorer in resolution in comparison with the other TV sets. Although other assessment results cannotbe discussed due to space limitations, all of the results are similar and the HDTV set with SR cannot be evaluated on the scoreof resolution.

HHHHHHDTV A HDTV B HDTV C Xi··

HDTV AHHHHH

7 7 14

HDTV B -7HHHHH

-11 -18

HDTV C -7 11HHHHH

4

X·j· -14 18 -4 X···X·j· −Xi·· -28 36 -8 0

Table 2. Cross table

A BC0 21-1-2***

Fig. 8. Results of experiment

6. CONCLUSION

Our subjective assessment of HDTV with an SR function used Scheffe’s paired comparison, observers who were not videoexperts, as called for by BT.500, and content chosen from terrestrial digital HDTV broadcasting. The assessment results werestatistically analyzed (analysis of variance). A yardstick method was conducted about the points of significant difference. Itwas statistically proven that the SR function on the HDTV set did not improve the resolution. This result accords with mostobservers’ opinions just after the assessment test. The assessment method described here can be used for other items such asframe rate conversion from 60 Hz to 240 Hz and noise reduction on digital HDTV sets.

7. REFERENCES

[1] http://www.itu.int/rec/R-REC-BT.500/en

[2] Tadahiko Fukuda and Ryoko Fukuda , ”Ergonomics handbook”, ISBN978-4-86079-036-3, Scientist press co.ltd, Tokyo,2009. (in Japanese)

[3] N. Matsumoto and T. Ida, ”A Study on One Frame Reconstruction-based Super-resolution Using Image Segmentation”,IEICE Technical Report, SIP2008-6,IE2008-6 (2008-04) (in Japanese).

[4] http://www.toshiba.co.jp/regza/detail/superresolution /resolution.html(in Japanese)

[5] http://www.ite.or.jp/en/

[6] http://www.nes.or.jp/gaiyo/pdf/itehyoujundougasample.pdf

[7] http://www.toshiba.co.jp/regza/detail/highresolution /mechanism.html (in Japanese)

[8] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, ”Fast and Robust Multi-frame Super-resolution”, IEEE Transactionson Image Processing, vol. 13, no. 10, pp. 1327-1344, October 2004.

36