
STATIONARY BACKGROUND GENERATION IN MPEG COMPRESSED VIDEO SEQUENCES

Ramazan Savas Aygun and Aidong Zhang
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260 USA

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published version is available at http://dx.doi.org/10.1109/ICME.2001.1237817.

ABSTRACT

The development of the new video coding standard, MPEG-4, has triggered many video segmentation algorithms that address the generation of video object planes (VOPs). The background of a video scene is one kind of VOP, on which all other video objects are layered. In this paper, we propose a method for generating the stationary background of an MPEG compressed video sequence. If the objects move frequently and all components of the background are visible in the video sequence, the background macroblocks can be constructed using the Discrete Cosine Transform (DCT) DC coefficients of the blocks. After the stationary background is generated, the moving objects can be extracted by taking the difference between the frames and the background.

1. INTRODUCTION

The MPEG-4 [5] video standard introduces the notion of the Video Object Plane (VOP) to support content-based access to video streams. Each video object is represented as a VOP, and the segmentation of video objects is one of the core parts of MPEG-4 video encoding. The background is itself a VOP, on which all other VOPs are layered. The VOPs are often not known beforehand, and they need to be extracted from the image sequences.

Most of the video segmentation algorithms that address VOP extraction have emerged due to the new video coding standard, MPEG-4. Most of the proposed methods investigate the segmentation of video objects and are computation intensive. Kim et al. proposed a method where VOPs are extracted by first generating moving edge maps [2]. The method proposed in [1] consists of a motion detection phase employing higher-order statistics and a regularization phase to achieve spatial continuity. VOPs are generated from an estimated change detection mask (CDM) in [3]; a buffer is used to increase temporal stability by labeling each pixel as changed if it belonged to an object at least once in the last L change detection masks. The model in [4] uses the edge pixels of an edge image for object detection: an object tracker matches this binary model against subsequent frames, and the model is updated every frame to accommodate rotation and changes in the shape of the object. After the video objects are extracted, the remaining part of the image is considered the background.

Our focus is different from the previous work in this field. We concentrate on the generation of the stationary background, which will provide flexibility in MPEG-4 video construction and editing. In this paper, we propose an algorithm which generates the background from a video sequence and then detects the moving objects on it. We consider the background to be the image with the moving objects removed and replaced by the real background behind them; this is not just the separation of the background from the moving objects, since we also recover the background hidden behind the moving objects from the video sequence. The background can be constructed only if all of its components appear somewhere in the video sequence; because of the moving objects, not all parts will appear in the same frame. We perform our operations at the macroblock level, using only the DCT DC coefficients of an MPEG-1 video stream.
This approach fits well to applications where the background does not change often, such as lectures or halls recorded by a static camera. In these applications, the background remains the same, and the objects move enough that all parts of the background become visible over the course of the video sequence.

2. STATIONARY BACKGROUND GENERATION

2.1. MPEG Video Stream

An MPEG-1 video stream is composed of I, P and B frames, where P and B frames exploit the similarity between frames. The background can only be detected once a moving object changes its location, so that the background hidden behind the object becomes visible. Consecutive frames usually differ only slightly, and processing every frame is time consuming; generating the background from frames sampled at intervals is therefore faster. P and B frames assume very little change with respect to the frames they depend on, and their macroblocks are decoded using macroblocks in the I frames.




Figure 1: Blocks of a macroblock in MPEG frames.

So, I frames are better candidates for the background construction, since there will be enough object displacement in the video sequence and they do not depend on any other frame. However, if the mobile objects are small (i.e., they can fit within a macroblock) and their displacement is also small, then further processing is needed; in that case, P and B frames must be processed as well. In this work, we address large mobile objects (i.e., objects that cannot fit into a single macroblock).

In MPEG-1, each macroblock of an I-frame is composed of 6 blocks: 4 luminance blocks and 2 color components (Figure 1). Blocks are compressed using the Discrete Cosine Transform (DCT), and each block is represented with DC and AC coefficients. Our algorithm works at the level of macroblocks rather than pixels, since the compression is performed at the level of macroblocks. Let $MB_{ij}(n)$ be the macroblock at the $i$th row and $j$th column of frame $n$, and let $DC_{ij}(n)$ denote a DC coefficient of $MB_{ij}(n)$. $MB_{ij}(n)$ may be represented with its DC coefficients as follows:

$\langle Y^{1}_{ij}(n), Y^{2}_{ij}(n), Y^{3}_{ij}(n), Y^{4}_{ij}(n), Cb_{ij}(n), Cr_{ij}(n) \rangle$

where $Y^{1}$, $Y^{2}$, $Y^{3}$, $Y^{4}$, $Cb$ and $Cr$ are the DC coefficients of the blocks shown in Figure 1.
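As a concrete illustration, this six-coefficient representation can be carried as a small record. The following is a minimal Python sketch; the field and method names are ours, not the paper's:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MacroblockDC:
        """DC coefficients of one I-frame macroblock: the four luminance
        blocks and the two chrominance blocks of Figure 1."""
        y1: float
        y2: float
        y3: float
        y4: float
        cb: float
        cr: float

        def as_vector(self):
            # Ordered as <Y1, Y2, Y3, Y4, Cb, Cr>, matching the
            # representation given above.
            return (self.y1, self.y2, self.y3, self.y4, self.cb, self.cr)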

2.2. The Algorithm

The background generation has three steps:

1. the generation of the background from a video shot;
2. the merging of the same backgrounds generated from different video shots which belong to the same video scene;
3. the merging of the same backgrounds that appear in different video scenes.

In this paper, we focus on the generation of the background from a video shot. Our algorithm has three phases: the clustering of the macroblocks, the selection of the cluster which may contain the background macroblock, and the selection of the background macroblock from that cluster. The algorithm is as follows:

/* clusters[r][c] keeps all clusters of macroblocks that appeared at the r-th row and c-th column */

/* cluster… */
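The pseudocode above is abbreviated, so the following Python sketch fills in one plausible clustering rule under our own assumptions: each macroblock location keeps a list of clusters, a new DC vector joins the nearest cluster when it lies within a hypothetical JOIN_THRESHOLD of that cluster's centroid, and a new cluster is opened otherwise. The function dist can be any of the distance measures of Section 2.2.1.

    JOIN_THRESHOLD = 32.0  # hypothetical value; not fixed by the paper

    class Cluster:
        """One cluster of DC vectors observed at a fixed macroblock location."""
        def __init__(self):
            self.members = []     # (i_frame_index, dc_vector) pairs
            self.centroid = None  # running mean of the member DC vectors

        def add(self, frame_idx, dc):
            self.members.append((frame_idx, dc))
            n = len(self.members)
            if self.centroid is None:
                self.centroid = list(dc)
            else:
                # Incremental mean keeps the centroid exact at every step.
                self.centroid = [c + (v - c) / n
                                 for c, v in zip(self.centroid, dc)]

    def assign_macroblock(clusters, frame_idx, dc, dist):
        """Place one DC vector into clusters[r][c] (already indexed by location)."""
        best = min(clusters, key=lambda cl: dist(cl.centroid, dc), default=None)
        if best is not None and dist(best.centroid, dc) <= JOIN_THRESHOLD:
            best.add(frame_idx, dc)
        else:
            fresh = Cluster()
            fresh.add(frame_idx, dc)
            clusters.append(fresh)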

2.2.1. Distance Measures

The second method assigns the same weight to the chrominance and the luminance and takes the average of the luminance coefficients. The average of the luminance DC coefficients of $MB_{ij}(n)$ is denoted by $Y^{avg}_{ij}(n)$, which is $\frac{1}{4}\sum_{q=1}^{4} Y^{q}_{ij}(n)$. Writing $\Delta X_{ij}(n,m)$ for the difference of a coefficient $X$ between frames $n$ and $m$, the distance is computed as

$\sqrt{\Delta Y^{avg}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

The DC coefficient is 8 times the average of the values in the block, so a DC coefficient is a smoothing of the values in the block, and it may be better to preserve the differences as much as possible. Instead of averaging the luminance values, the maximum luminance difference $\Delta Y^{max}_{ij}(n,m)$, which is $\max_{q=1,\ldots,4} \Delta Y^{q}_{ij}(n,m)$, may be evaluated. The distance then becomes

$\sqrt{\Delta Y^{max}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

Sometimes it may only be necessary to consider sharp changes in the sequence. Instead of the maximum luminance difference, the minimum luminance difference $\Delta Y^{min}_{ij}(n,m)$, which is $\min_{q=1,\ldots,4} \Delta Y^{q}_{ij}(n,m)$, can be taken. The distance then becomes

$\sqrt{\Delta Y^{min}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

The maximum selective measure is used if the difference between two macroblocks needs to be emphasized as much as possible; the following function is used as the maximum distance:

$\max\left(\Delta Y^{max}_{ij}(n,m), \Delta Cb_{ij}(n,m), \Delta Cr_{ij}(n,m)\right).$

Sharp changes at a macroblock can be detected using the minimum distance measure:

$\min\left(\Delta Y^{min}_{ij}(n,m), \Delta Cb_{ij}(n,m), \Delta Cr_{ij}(n,m)\right).$
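A direct transcription of these measures into Python (a sketch; macroblock vectors are ordered <Y1, Y2, Y3, Y4, Cb, Cr> as in Section 2.1, and the function names are ours):

    from math import sqrt

    def _deltas(a, b):
        """Componentwise absolute differences of two <Y1..Y4, Cb, Cr> vectors."""
        dy = [abs(x - y) for x, y in zip(a[:4], b[:4])]  # the four |ΔY_q|
        return dy, abs(a[4] - b[4]), abs(a[5] - b[5])    # ..., |ΔCb|, |ΔCr|

    def dist_avg(a, b):
        # Averaging variant: equal weight to chrominance and mean luminance.
        d_avg = abs(sum(a[:4]) - sum(b[:4])) / 4.0       # |ΔY_avg|
        _, dcb, dcr = _deltas(a, b)
        return sqrt(d_avg ** 2 + dcb ** 2 + dcr ** 2)

    def dist_max_luma(a, b):
        # Keeps the largest luminance difference instead of smoothing it away.
        dy, dcb, dcr = _deltas(a, b)
        return sqrt(max(dy) ** 2 + dcb ** 2 + dcr ** 2)

    def dist_min_luma(a, b):
        # Reacts only to changes shared by all four luminance blocks.
        dy, dcb, dcr = _deltas(a, b)
        return sqrt(min(dy) ** 2 + dcb ** 2 + dcr ** 2)

    def dist_maximum(a, b):
        # Maximum selective measure: emphasize the difference as much as possible.
        dy, dcb, dcr = _deltas(a, b)
        return max(max(dy), dcb, dcr)

    def dist_minimum(a, b):
        # Minimum selective measure: detects only sharp, macroblock-wide changes.
        dy, dcb, dcr = _deltas(a, b)
        return min(min(dy), dcb, dcr)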

2.2.2. The Cluster Selection

If the number of clusters is more than one for a macroblock location, there is a moving object at that macroblock. If the number of clusters is one, then there is no movement at that macroblock, and the cluster is the only candidate to contain the background macroblock.

Two basic factors are used for the selection of the cluster: frequency and continuity. The frequency of a cluster is the number of elements in that cluster. The continuity of a cluster denotes the maximum length of a sequence of macroblocks of the cluster that appeared consecutively. The clusters are first chosen according to their frequency; if there is a tie, the cluster with the higher continuity is selected. Sometimes there may be no or very little motion across succeeding I-frames. All such frames are considered as the same frame, and they have the effect of a single frame on the frequency and the continuity of the clusters.
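In code, the selection rule reads as follows; this is a sketch reusing the Cluster record from the clustering sketch in Section 2.2, and it assumes near-identical succeeding I-frames were already collapsed into one index:

    def continuity(cluster):
        """Longest run of consecutive I-frame indices among the members."""
        frames = sorted({f for f, _ in cluster.members})
        best = run = 1 if frames else 0
        for prev, cur in zip(frames, frames[1:]):
            run = run + 1 if cur == prev + 1 else 1
            best = max(best, run)
        return best

    def select_cluster(clusters):
        # One cluster means no motion: it is the only background candidate.
        if len(clusters) == 1:
            return clusters[0]
        # Otherwise: highest frequency first, continuity as the tie-breaker.
        return max(clusters, key=lambda cl: (len(cl.members), continuity(cl)))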

Figure 2: Estimation of the DC coefficient of the reference macroblock.

2.2.3. The Background Macroblock Selection

The background macroblock is chosen from the cluster in one of four ways. Three of them consider only luminance; the fourth considers both luminance and chrominance. In the first case, if the moving object absorbs light (non-shining), it causes darkness on the background, so it is better to choose the candidate with the higher luminance: low luminance is due to the object in the environment. In the second case, if the moving object is a light source or reflects light, it causes the background to shine, so it is better to choose the candidate with the lower luminance: high luminance is caused by the object in the environment. In the third case, if the type of the object is unknown, or it may have the properties of both shining and non-shining objects, it may be better to be neutral, so the background block closest to the mean is chosen. Finally, if color is considered, the background macroblock closest to the mean of all the block coefficients is the candidate.
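The four rules can be sketched as a single selector over the candidate DC vectors of the chosen cluster; the mode names are ours:

    def luma_avg(c):
        """Average luminance DC value of a <Y1..Y4, Cb, Cr> vector."""
        return sum(c[:4]) / 4.0

    def select_background_mb(candidates, mode="neutral"):
        """Pick the background macroblock from a cluster's candidates."""
        n = len(candidates)
        mean_luma = sum(luma_avg(c) for c in candidates) / n
        if mode == "non_shining":
            # Dark, light-absorbing object: prefer the brightest candidate.
            return max(candidates, key=luma_avg)
        if mode == "shining":
            # Light-emitting or reflecting object: prefer the darkest one.
            return min(candidates, key=luma_avg)
        if mode == "neutral":
            # Unknown object type: stay close to the mean luminance.
            return min(candidates, key=lambda c: abs(luma_avg(c) - mean_luma))
        # mode == "color": closest to the mean of all six DC coefficients.
        mean6 = [sum(col) / n for col in zip(*candidates)]
        return min(candidates,
                   key=lambda c: sum((x - m) ** 2 for x, m in zip(c, mean6)))

For example, select_background_mb([dc for _, dc in cluster.members], mode="color") applies the fourth rule.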

2.2.4. Enhancement with Motion Vectors

The motion vector indicates whether the macroblock moves to other parts of the frame. Although the term "motion vectors" is used in MPEG streams, motion vectors do not exactly express the displacement of a macroblock; rather, they give the location of the closest-matching macroblock in the previous or next frame. This is crucial if there is a pattern in the background: although the macroblock does not move, since its pattern is the same as that of some other macroblock, the motion vector may point to that macroblock. If the macroblock was previously located elsewhere, this usually indicates the parts where moving objects exist; but if there is a pattern in the frame, it is also possible that the macroblock points to another location sharing the same pattern. We apply two simple methods to detect this situation.

Since we perform our operations at the macroblock level, we do not have the exact DCT coefficients of this block. One way to deal with this is to check all macroblocks that intersect with this macroblock: if all of them have the same characteristics as in the predicted frame, the macroblock has not moved. Another way is to estimate the DC coefficients of the referenced block; how to estimate them from the blocks that intersect with the referenced block is shown in [6].


The estimation is done by weighting the DC coefficients of the intersecting blocks by the area each shares with the reference block $B_{ref}$ (Figure 2):

$DC_{ref} = \sum_{q=1}^{4} w_{q} \, DC_{q}$

where $DC_{q}$ gives the DC value of a block in Figure 2 and $w_{q}$ is the ratio of the region covered by block $B_{q}$ to the area of the whole block ($8 \times 8 = 64$ pixels).

Let $MB_{ij}(n)$ and $MB_{ij}(m)$ represent the referenced macroblock in the referenced frame and in the current frame, respectively. If

$dist(MB_{ij}(n), MB_{ij}(m)) < T,$

where $dist(\cdot)$ is a distance function and $T$ is a threshold, this implies that the referenced block has not changed in the current frame. This also implies that $MB_{ij}(n)$ was affected by the moving object. So, $MB_{ref}(m)$ is a candidate for the background macroblock; otherwise, it would be assumed to be part of a moving object.
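A sketch of this test, with a hypothetical threshold value since the paper does not fix one:

    T = 16.0  # hypothetical threshold

    def classify_by_reference(mb_ref_n, mb_ref_m, mb_cur_m, dist):
        """If the referenced macroblock is unchanged between frames n and m,
        the current macroblock is a background candidate; otherwise it is
        assumed to belong to a moving object."""
        if dist(mb_ref_n, mb_ref_m) < T:
            return ("background_candidate", mb_cur_m)
        return ("moving_object", mb_cur_m)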

2.3. Merging Backgrounds

The problem of merging backgrounds is in fact the problem of merging macroblocks, since clusters are generated for each macroblock location. Merging can be done from scratch, as if no clusters existed; the procedure above for generating clusters from a single video shot can then be used. A cluster merging approach can also be used, in which the clusters that share the same characteristics are merged. The merging can be performed with any of the existing methods in the literature; we merge clusters according to the closeness of their centroids.
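Under the centroid-closeness rule, a minimal per-location merge might look like the following sketch, reusing the Cluster record from Section 2.2; the threshold is again a hypothetical value:

    MERGE_THRESHOLD = 16.0  # hypothetical

    def merge_location(clusters_a, clusters_b, dist):
        """Merge two shots' cluster lists for one macroblock location:
        clusters whose centroids are close are fused, others are kept."""
        merged = list(clusters_a)
        for cl_b in clusters_b:
            near = next((cl_a for cl_a in merged
                         if dist(cl_a.centroid, cl_b.centroid) < MERGE_THRESHOLD),
                        None)
            if near is None:
                merged.append(cl_b)          # no close counterpart: keep as-is
            else:
                for frame_idx, dc in cl_b.members:
                    near.add(frame_idx, dc)  # fuse members; centroid updates itself
        return merged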

    3. RESULTS

We used video streams recorded from lectures. Each lecture is stored as an MPEG-1 video stream. The coding pattern of the streams is IBBPBBPBBPBBPBB, and the frame rate is 15 frames per second. Figure 3 shows the phases of how the background is detected: the first row displays the frames that are encountered, and the second row shows the phases of background generation. Other examples are omitted here due to the space limit.

Figure 3: Video sequence and background generation.

Our observations showed that if the object moves enough, the background can be constructed within the early frames of the clip. In the given example, the background is generated after processing 13 I-frames; the 120 B-frames and 48 P-frames in between are skipped. Our results also showed that chrominance must be included both in the distance computation and in the selection of the background macroblock from the cluster: if only luminance coefficients are considered, wrong macroblocks with color distortions may be selected.

    4. CONCLUSION AND FUTURE WORK

An algorithm is presented for the generation of the stationary background from a compressed video sequence. The algorithm is based on the clustering of macroblocks. Experiments showed that backgrounds can be extracted using the DC coefficients of MPEG streams. In this paper, we did not consider camera operations; our next step will be the generation of the background in the presence of camera motion.

    5. REFERENCES

[1] A. Neri et al. Automatic moving object and background separation. Signal Processing, 66:219–232, 1998.

[2] C. Kim and J.-N. Hwang. An integrated scheme for object-based video abstraction. In ACM Multimedia, 2000.

[3] R. Mech and M. Wollborn. A noise robust method for segmentation of moving objects in video sequences. In ICASSP 97, pages 2657–2660, April 1997.

[4] T. Meier and K. N. Ngan. Automatic video sequence segmentation using object tracking.

[5] T. Sikora. The MPEG-4 video standard verification model. IEEE Trans. Circuits Syst. Video Technology, 7:19–31, February 1997.

[6] B.-L. Yeo and S.-F. Chang. Rapid scene analysis on compressed video. IEEE Trans. Circuits Syst. Video Technology, 5(6):533–544, December 1995.