key frame extraction using faber-schauder...

Key Frame Extraction using Faber-Schauder Wavelet

ASSMA AZEROUALFaculty of ScienceComputer Systems

and Vision LaboratoryMOROCCO

[email protected]

KARIM AFDELFaculty of ScienceComputer Systems

and Vision LaboratoryMOROCCO

[email protected]

EL HAJJI MOHAMEDFaculty of Science

IRF-SIC LaboratoryMOROCCO

[email protected]

HASSAN DOUZIFaculty of Science

IRF-SIC LaboratoryMOROCCO

[email protected]

Abstract: Key frames are the most representative frames (images) of a video. Different areas of video processinguse them, such as video indexing, video retrieval and video summary. In this paper we propose a novel approach forkey frames extraction from video, based on Faber-Schauder coefficients [1] and Singular Value Decomposition.The frame is characterized uniquely by his contours and its near textures.These contours and its near texturescontain an important concentration of dominant coefficients which are used to select the dominant blocks. Anysubstantial change in a video frame will result in a change of their edges and the neighboring textures of theseedges, therefore an important change in the density of the blocks coefficients, especially in the dominant blockswhich contain a big density of faber-Schauder coefficients .Then this frame is considered as a key frame. Firstthe dominant blocks [2][3] of every frame are computed, and then feature vectors are extracted from the dominantblocks image of each frame and arranged in a feature matrix. Singular Value Decomposition is used to calculatesliding windows ranks of those matrices. Finally the computed ranks are traced to extract key frames of a video[4]. Experimental results show that the proposed approach is robust against a large range of digital effects usedduring shot transition.

Key–Words: Key Frame Extraction, FSDWT, Singular Value Decomposition

1 Introduction

With the development of information technology, datais no longer limited to text or sound, but it’s alsorecord and used as video . Recently there is a huge in-crease in video number on the Internet, which requiresto seek proper techniques for the management andstorage of video data. A video is composed by a seriesof basic units called frames (images) at a certain rate,for the human eye the rate at which it can distinguishimages is 20 FPS (frame per seconds). Hence, films ortelevision programs are projected at an average rate of30 FPS. All the frames on a video have the same sizeand the time is equal between each two frames. Theseries of interrelated consecutive frames taken contin-uously by a single camera and representing a contin-uous action in time and space are called video shot[5]. these shots are joined together by editing op-erations which can contain transition effects or not.There are two kinds of shot changes namely, abruptchanges and gradual transitions. Abrupt changes usu-ally result from camera breaks, while gradual transi-tions are produced with artificial editing effects, suchas fading, dissolve and wipe [5]. One frame is suffi-cient to present the important informations of the shot,that frame is named key frame, the figure 1 illustrate

Figure 1: The structure of a video

the structure of a video.Advances in digital content distribution and dig-

ital video recorders, made the digital content record-ing easy. However, the coast on time and memoryis expensive when we work on the full video frames.Furthermore in the case of video summarization theuser may not have enough time to watch the entirevideo. Therefore, many research works have beendone about the key frame extraction to perform wellvideo processing like video summarization,video an-notation, creating chapter titles in DVDs,video trans-

Advances in Information Science and Computer Engineering

ISBN: 978-1-61804-276-7 317

mission, video indexing, and prints from video [6] .Many methods exist for key frame extraction.

Some of the techniques compute the difference be-tween two transition frames based on color histogramand intensity histogram [7], and other techniques arebased on color features and Singular Value Decompo-sition [4][8] The approaches computing the differencebetween two frames attempt to compare that differ-ence with a threshold -often not automatically fixed-to chose a key frame but this method is available forthe abrupt transitions and not for gradual transitions.

The approaches based on color feature attemptto calculate an histogram of video frames presentedin Red-Green-Blue (RGB) color space, or Hue-Saturation-Value (HSV) color space, or other colorspace, after that, comparisons between frames his-tograms take place to chose the key frames. Howeverthis method is computationally expensive and sensi-tive to brightness and camera color effects. Hence,this method can give a false alarm on key frame orextract some frames presenting a non important infor-mation.

To address these problems, this paper proposes anovel key frame extraction algorithm based on Faber-Schauder Discrete Wavelet Transform (FSDWT) andSVD.

The algorithm extracts the block dominant im-age features of each video frame and constructs a2D feature matrix. Then the matrix is factorized us-ing SVD. Finally key frames are extracted based onthe traced rank. The advantages of the algorithm arethe low computational requirements, the robustnessagainst the gradual transitions and non-sensitivity tobrightness.

2 Background And Theory2.1 FSDWTThe FSDWT is a mixed scales representation of aninteger wavelet transform [1]. It based on the LiftingScheme [9] without any boundary treatment. We canconsider a image as a sequence f0 = (f0m,n)m,n∈Z ofZ2, transform FSWT is done in three steps as shownin the following figure:

Figure 2: Lifting Scheme

The Scheme lifting of the FSDWT [1] is given by

the following algorithm:

f0 = fij for i, j ∈ Zfor 1 ≤ k ≤ Nf0ij = fk−1

gkij = (gk1ij , gk2ij , g

k3ij )

gk1ij = fk−12i+1,2j −12(f

k−12i,2j + fk−12i+2,2j)

gk2ij = fk−12i,2j+1 −12(f

k−12i,2j + fk−12i+2,2j+2)

gk3ij = fk−12i+1,2j+1 −14(f

k−12i,2j + fk−12i,2j+2+

fk−12i+2,2j + fk−12i+2,2j+2)

(1)

Textured regions and contours are efficiently de-tected by FSDWT. It redistributes the image containedinformation which is mostly carried in the dominantcoefficients. To facilitate the selection of theses dom-inant coefficients in all sub- bands, we use mixed-scales representation which puts each coefficient atthe point where its related basis function reaches itsmaximum. So, a coherent image can be visuallyobtained with edges and textured regions formed bydominant coefficients. These regions are representedby a high density of dominant coefficients. Theypresent more stability for any transformation keepingvisual characteristics of the image [1].

In [1], El Hajji and al. use standard deviation ofmixed scales DWT coefficients σ1 and local deviationσ2 for given 8x8 block as a rule to detect a dominantblock: if σ2 ≥ σ1 then the block is dominant. Formore precision and to fix automatically the thresh-old used in the algorithm we use the OTSU threshold[10] in the place of standard deviation of mixed scalesDWT coefficients and 4x4 blocks. The dominant co-efficient blocks are located around the image contoursand textured zones near to contours, as shown in Fig-ure 3. The original image is presented in figure 3-a, then the figure 3-d was obtained by assigning agray color to the positions of the image’s pixels corre-sponding to the dominant blocks using standard devia-tion of mixed scales DWT coefficients, we remark thatthis presentation is not precise. The figure 3-c is ob-tained by assigning a gray color to the positions of theimage’s pixels corresponding to the dominant blocksusing OTSU threshold, this presentation is more pre-cise than the other one in figure 2-d. Finally the figure3-b presents the zones of image 3-a associated withthe dominant blocks presented in figure 3-c.◦ In the first step we compute the Faber-Schauder

DWT coefficients.

◦ In the second step we divide the image into 4x4blocks.

◦ In the third step we calculate the local deviation ofeach block.


ISBN: 978-1-61804-276-7 318

(a) Original Image (b) Selected zones in origi-nal image corresponding tothe dominant blocks usingOTSU threshold

(c) The dominant blocks us-ing OTSU threshold

(d) The dominant blocks us-ing standard deviation ofmixed scales DWT coeffi-cients

Figure 3: Comparison between the standard deviation of mixedscales DWT coefficients method and OTSU threshold method.

◦ Finally we compare the local deviation to the OTSUthreshold α. If σ ≥ α a block is considered dom-inant, otherwise this block contain a big density ofcoefficients which are related to image contours andtextured zones near to contours.

2.2 SVDThe decomposition into singular values is based on alinear algebra theorem which tells us that any m x nmatrix A with m≥ n can be factored as in (2) where Uis an m x m orthogonal matrix, V T is the transposedmatrix of an n x n orthogonal matrix V, and S is an mx n matrix with singular values on the diagonal.

A = USV T (2)

The matrix S can be presented as in (3). For i = 1,2, 3,...,n, σn are called Singular Values of matrix A.

A =

σ1 0 ... 00 σ2 ... 0...

......

0 ... 0 σn0 ... 0 0

, (3)

and σ1 ≥ σ2 ≥ ... ≥ σnThere are many properties of SVD from the view-

point of image processing applications :◦ The singular values of an image have very good sta-

bility, that is, when a small perturbation is added toan image, its Singular values do not change signifi-cantly [11].

◦ Each singular value specifies the luminance of animage layer while the corresponding pair of singularvectors specifies the geometry of the image [11].

◦ Singular values represent intrinsic algebraic imageproperties [11].

◦ Singular values represent the image energy, and wecan approximate an image by only the first fewterms.

◦ The first term of singular values will have the largestimpact on approximating image, followed by thesecond term, then the third term, etc.

3 Proposed MethodThe proposed method for extraction of key frames isbased on FSDWT and SVD. In [4], W. Abd-Almageeduses a sliding-window SVD approach based on Hue-Saturation-Value (HSV) color space of video frame.However, this approach is sensitive to change of framebrightness and frame color. To solve these problem weuse the dominant blocks of a video frame in the placeof his HSV presentation. The dominant blocks are lo-cated at the frame contours and textures around; theycharacterize uniquely the frame and give us a goodprecision when we extract the key frames.

Firstly, we convert the video to gray color, afterthat we compute the dominant blocks of each videoframe. Then we select the dominant blocks zones inframe using the OTSU threshold α.

Secondly, an histogram Ht of length l (the num-ber of histogram bins) is computed for the video frameat time t, next build a Nxl feature matrix Xt for everyframe at time t > N as shown in (4), N is a win-dow width and can be the maximum number of framesused in a transition in the video.

Xt =

Ht

Ht−1

...Ht−N+1

, and t = N, ..., T (4)

Xt is a matrix feature varying in the time, pre-senting the feature of the current frame and previousN-1 frames and T is the total number of video frames.


ISBN: 978-1-61804-276-7 319

Thirdly, we use SVD to factorize the matrix Xt asshown in (5):

Xt = USV T (5)

Let the singular values be S1, S2, ..., SN , with S1being the maximum singular value. The rank rtofXt

is the number of Si that satisfy the condition as shownin (6):

SiS1

> τ (6)

τ is a user-defined threshold limiting the num-ber of key frame extracted according to the precisionliked.

Tracing the computed ranks over time, we candraw two scenarios. The first one, if the rank of thecurrent feature matrix, Xt , is greater than the pre-vious one, Xt−1 , and then the visual content of thecurrent video frame is different than the content of theprevious frame, since the first singular values presentthe most informations contained in Xt , hence the in-crease in number of singular values that satisfy thecondition (6) means that there is a change in the con-tent of Xt , otherwise the current frame is enough dif-ferent to be considered as a key frame. The secondscenario, if the rank of the current feature matrix, Xt ,is smaller than the rank of previous matrix,Xt−1 , andthen the visual content of the video has been stable.

Finally, we have two conclusions. First, the frameat which the rank rt = 1 and rt+1 > rt , is the endingof shot. Second, between two consecutive shots, theframe at which the rank is maximum is extracted as akey frame and presents the start of shot. The algorithmis illustrated in figure 4.

The algorithm is initialized with the first N framesthat are used to compute Xt = N ,then the main al-gorithm loop starts at N+1.

4 Experimental ResultsThe results of the proposed key frame extraction algo-rithm are presented in this section. We used C++ andOpenCV library to implement the shot boundary de-tection and key frame extraction algorithm. A videosoccer of 5253 frames was used to validate the pro-posed approach.

With frame size 320 x 240 and frame rate 30 fps.The algorithm produces the correct key frames. Forthe video in our example as shown in Figure 5, thenumber of frames dissolve effect transition is 3 toswitch from frame number 505 to frame number 509,at the frame 506 the rank = 1 and the rank of the framenumber 507 is 5, so the frame number is the ending ofshot, then the rank increases to 3 at frame number 509

Figure 4: The new approach algorithm

which is the key frame. The algorithm selects a sta-ble key frame even if it was a dissolve transition. Thealgorithm extract 67 key frame from 5253, some ofthem are shown in Figure 6.

The performances are evaluated based on (7) and(8). Using a window of width N = 6 and threshold0.05, we obtained an average recall of 97.05 % and aprecision of 98.50 % and 1.25 % of video frames areextracted as a key frames. The Comparative results ofthe key frame extraction with the methods in ([4],[7])is illustrated in Table I.

Recall =Correct

Correct+Missed(7)

Precision =Correct

Correct+ FalseAlarms(8)


ISBN: 978-1-61804-276-7 320

Figure 5: Dissolve effect from frame number 505 to frame number 509.

Figure 6: Some video key frames, we obtain 67 key frames from a video of 5253 frames


ISBN: 978-1-61804-276-7 321

Table 1: Experimental results

Method key framenumber

Averagerecall

Precision

Ourmethod

67 98.41% 92.53%

LiuFeipengmethod

36 60.34% 97.22%

W.Abd-Almageedmethod

81 83.87% 64.19%

5 ConclusionIn this paper, a video key frame extraction and bound-ary shot detection algorithm is proposed. In the pro-posed approach a Faber-Schauder dominant blocks ofeach video frame is computed to construct a featurematrix. Then a sliding window SVD is used to com-pute the rank of the current feature matrix. By tracingthe computed rank we can detect the end of shot andthe start of shot which can be extracted us a key frame.

Experimental results shows that our algorithm isrobust against the transition effects like dissolve oneused in some videos like sports ones. More experi-ments should be done to replace the threshold usingin the phase of computing rank, by a threshold fixedautomatically.

Acknowledgment

This work was supported by the Centre National pourla Recherche Scientifique et Technique (CNRST),funded by the Moroccan government.

References:

[1] H. Douzi, D. Mammass, F. Nouboud, ”Faber-Schauder wavelet transformation applicationto edge detection and image characteriza-tion,” Journal of Mathematical Imaging and Vi-sion Kluwer Academic Press, pp 91-102 ,Vol.14(2),2001.

[2] M. El Hajji, H. Douzi, D. Mammas, R. Harba,F. Ros, A New Image Watermarking AlgorithmBased on Mixed Scales Wavelets, J.Electron.Imaging. 21(1), 013003 (Feb 27, 2012).

[3] M. Hajji , H. Douzi , R. Harba, Watermark-ing Based on the Density Coefficients of Faber-Schauder Wavelets, Proceedings of the 3rd in-ternational conference on Image and Signal Pro-

cessing, July 01-03, 2008, Cherbourg-Octeville,France.

[4] W. Abd-Almageed, Online, simultaneous shotboundary detection and key frame extraction forsports videos using rank tracing, Image Process-ing, 2008. ICIP 2008. 15th IEEE InternationalConference on , vol., no., pp.3200,3203, 12-15Oct. 2008.

[5] Guozhu Liu, Junming Zhao, Key Frame Extrac-tion from MPEG Video Stream,Third Interna-tional Symposium on Information Processing,2011.

[6] C. T. Dang, M. Kumar, H. Radha, Key FrameExtraction from Consumer Videos Using Epit-ome, Image Processing (ICIP), 19th IEEE Inter-national Conference on. pp. 93-96, September2012.

[7] Video Boundary DetectionPart 1 AbruptTransition and Its Matlab Implementation,http://www.roman10.net/video-boundary-detectionpart-1-abrupt-transitions-and-its-matlab-implementation/.

[8] A.V.Kumthekar, Prof. J.K. Patil, Key frame ex-traction using color histogram method, Interna-tional Journal of Scientific Research Engi neer-ing and Technology (IJSRET) Volume 2 Issue 4pp 207-214, ISSN 22780882, july.2013.

[9] W. Sweldens, ”The lifting scheme: A construc-tion of second generation wavelets,” SIAM Jour-nal on Mathematical Analysis, vol. 29, no.2, pp.511546, 1998.

[10] N. Otsu,A threshold selection method from greyscale histogram, IEEE Trans. on SMC, Vol.1,pp. 62-66, 1979.

[11] K. Bhagyashri,Joshi M. Y. ,Robust Image Water-marking based on Singular


ISBN: 978-1-61804-276-7 322

key frame extraction using faber-schauder...

Documents