
STATIONARY BACKGROUND GENERATION IN MPEG COMPRESSED VIDEO SEQUENCES

Ramazan Savas Aygun and Aidong Zhang
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260 USA

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published version is available at http://dx.doi.org/10.1109/ICME.2001.1237817.

ABSTRACT

The development of the new video coding standard, MPEG-4, has triggered many video segmentation algorithms that address the generation of video object planes (VOPs). The background of a video scene is one kind of VOP, on which all other video objects are layered. In this paper, we propose a method for generating the stationary background of an MPEG compressed video sequence. If the objects move frequently and all components of the background are visible in the video sequence, the background macroblocks can be constructed using the Discrete Cosine Transform (DCT) DC coefficients of the blocks. After the stationary background is generated, the moving objects can be extracted by taking the difference between the frames and the background.

1. INTRODUCTION

The MPEG-4 [5] video standard introduces the notion of the Video Object Plane (VOP) to support content-based access to video streams. Each video object is represented as a VOP, and the segmentation of video objects is one of the core parts of MPEG-4 video encoding. The background is itself a VOP, on which all other VOPs are layered. The VOPs are often not known beforehand, and they need to be extracted from the image sequences.

Most of the video segmentation algorithms that address VOP extraction have emerged due to the new video coding standard, MPEG-4. Most of the proposed methods investigate the segmentation of video objects and are computation intensive. Kim et al. proposed a method where VOPs are extracted by first generating moving edge maps [2]. The method proposed in [1] consists of a motion detection phase employing higher-order statistics and a regularization phase to achieve spatial continuity. VOPs are generated from an estimated change detection mask (CDM) in [3]; a buffer is used to increase temporal stability by labeling each pixel as changed if it belonged to an object at least once in the last L change detection masks. The model in [4] uses the edge pixels of an edge image for object detection: an object tracker matches this binary model against subsequent frames, and the model is updated every frame to accommodate rotation and changes in the shape of the object. After the video objects are extracted, the remaining part of the image is considered the background.

Our focus is different from the previous work in this field. We concentrate on the generation of the stationary background, which will provide flexibility in MPEG-4 video construction and editing. In this paper, we propose an algorithm which generates the background from a video sequence and then detects the moving objects on it. We consider the background to be the image with the moving objects removed and replaced by the real background behind them; this is not just the separation of the background from the moving objects, since we also recover the background hidden behind the moving objects from the video sequence. The background can be constructed only if all of its components appear somewhere in the video sequence; because of the moving objects, not all parts will appear in the same frame. We perform our operations at the macroblock level, using only the DCT DC coefficients of an MPEG-1 video stream.
This approach fits well to applications where the background does not change often, such as lectures or halls recorded by a static camera. In these applications, the background remains the same, and the objects move enough that all parts of the background become visible over the course of the video sequence.

2. STATIONARY BACKGROUND GENERATION

2.1. MPEG Video Stream

An MPEG-1 video stream is composed of I, P and B frames, where P and B frames exploit the similarity between frames. The background can only be detected once a moving object changes its location, so that the background hidden behind the object becomes visible. Consecutive frames usually differ only slightly, and processing every frame is time consuming; generating the background from frames sampled at intervals is therefore faster. P and B frames assume very little change with respect to the frames they depend on, and their macroblocks are decoded using macroblocks in the I frames.




Figure 1: Blocks of a macroblock in MPEG frames.

So, I frames are better candidates for the background construction, since there will be enough object displacement in the video sequence and they do not depend on any other frame. However, if the mobile objects are small (i.e., they can fit within a macroblock) and their displacement is also small, then further processing is needed; in that case, P and B frames must be processed as well. In this work, we address large mobile objects (i.e., objects that cannot fit into a single macroblock).

In MPEG-1, each macroblock of an I-frame is composed of 6 blocks: 4 luminance blocks and 2 color components (Figure 1). Blocks are compressed using the Discrete Cosine Transform (DCT), and each block is represented with DC and AC coefficients. Our algorithm works at the level of macroblocks rather than pixels, since the compression is performed at the level of macroblocks. Let $MB_{ij}(n)$ be the macroblock at the $i$th row and $j$th column of frame $n$, and let $DC_{ij}(n)$ denote a DC coefficient of $MB_{ij}(n)$. $MB_{ij}(n)$ may be represented with its DC coefficients as follows:

$\langle Y^{1}_{ij}(n), Y^{2}_{ij}(n), Y^{3}_{ij}(n), Y^{4}_{ij}(n), Cb_{ij}(n), Cr_{ij}(n) \rangle$

where $Y^{1}$, $Y^{2}$, $Y^{3}$, $Y^{4}$, $Cb$ and $Cr$ are the DC coefficients of the blocks shown in Figure 1.
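As a concrete illustration, this six-coefficient representation can be carried as a small record. The following is a minimal Python sketch; the field and method names are ours, not the paper's:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MacroblockDC:
        """DC coefficients of one I-frame macroblock: the four luminance
        blocks and the two chrominance blocks of Figure 1."""
        y1: float
        y2: float
        y3: float
        y4: float
        cb: float
        cr: float

        def as_vector(self):
            # Ordered as <Y1, Y2, Y3, Y4, Cb, Cr>, matching the
            # representation given above.
            return (self.y1, self.y2, self.y3, self.y4, self.cb, self.cr)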

2.2. The Algorithm

The background generation has three steps:

1. the generation of the background from a video shot;
2. the merging of the same backgrounds generated from different video shots which belong to the same video scene;
3. the merging of the same backgrounds that appear in different video scenes.

In this paper, we focus on the generation of the background from a video shot. Our algorithm has three phases: the clustering of the macroblocks, the selection of the cluster which may contain the background macroblock, and the selection of the background macroblock from that cluster. The algorithm is as follows:

/* clusters[r][c] keeps all clusters of macroblocks that appeared at the r-th row and c-th column */

/* cluster… */
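The pseudocode above is abbreviated, so the following Python sketch fills in one plausible clustering rule under our own assumptions: each macroblock location keeps a list of clusters, a new DC vector joins the nearest cluster when it lies within a hypothetical JOIN_THRESHOLD of that cluster's centroid, and a new cluster is opened otherwise. The function dist can be any of the distance measures of Section 2.2.1.

    JOIN_THRESHOLD = 32.0  # hypothetical value; not fixed by the paper

    class Cluster:
        """One cluster of DC vectors observed at a fixed macroblock location."""
        def __init__(self):
            self.members = []     # (i_frame_index, dc_vector) pairs
            self.centroid = None  # running mean of the member DC vectors

        def add(self, frame_idx, dc):
            self.members.append((frame_idx, dc))
            n = len(self.members)
            if self.centroid is None:
                self.centroid = list(dc)
            else:
                # Incremental mean keeps the centroid exact at every step.
                self.centroid = [c + (v - c) / n
                                 for c, v in zip(self.centroid, dc)]

    def assign_macroblock(clusters, frame_idx, dc, dist):
        """Place one DC vector into clusters[r][c] (already indexed by location)."""
        best = min(clusters, key=lambda cl: dist(cl.centroid, dc), default=None)
        if best is not None and dist(best.centroid, dc) <= JOIN_THRESHOLD:
            best.add(frame_idx, dc)
        else:
            fresh = Cluster()
            fresh.add(frame_idx, dc)
            clusters.append(fresh)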

2.2.1. Distance Measures

The second method assigns the same weight to the chrominance and the luminance and takes the average of the luminance coefficients. The average of the luminance DC coefficients of $MB_{ij}(n)$ is denoted by $Y^{avg}_{ij}(n)$, which is $\frac{1}{4}\sum_{q=1}^{4} Y^{q}_{ij}(n)$. Writing $\Delta X_{ij}(n,m)$ for the difference of a coefficient $X$ between frames $n$ and $m$, the distance is computed as

$\sqrt{\Delta Y^{avg}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

The DC coefficient is 8 times the average of the values in the block, so a DC coefficient is a smoothing of the values in the block, and it may be better to preserve the differences as much as possible. Instead of averaging the luminance values, the maximum luminance difference $\Delta Y^{max}_{ij}(n,m)$, which is $\max_{q=1,\ldots,4} \Delta Y^{q}_{ij}(n,m)$, may be evaluated. The distance then becomes

$\sqrt{\Delta Y^{max}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

Sometimes it may only be necessary to consider sharp changes in the sequence. Instead of the maximum luminance difference, the minimum luminance difference $\Delta Y^{min}_{ij}(n,m)$, which is $\min_{q=1,\ldots,4} \Delta Y^{q}_{ij}(n,m)$, can be taken. The distance then becomes

$\sqrt{\Delta Y^{min}_{ij}(n,m)^{2} + \Delta Cb_{ij}(n,m)^{2} + \Delta Cr_{ij}(n,m)^{2}}.$

The maximum selective measure is used if the difference between two macroblocks needs to be emphasized as much as possible; the following function is used as the maximum distance:

$\max\left(\Delta Y^{max}_{ij}(n,m), \Delta Cb_{ij}(n,m), \Delta Cr_{ij}(n,m)\right).$

Sharp changes at a macroblock can be detected using the minimum distance measure:

$\min\left(\Delta Y^{min}_{ij}(n,m), \Delta Cb_{ij}(n,m), \Delta Cr_{ij}(n,m)\right).$
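A direct transcription of these measures into Python (a sketch; macroblock vectors are ordered <Y1, Y2, Y3, Y4, Cb, Cr> as in Section 2.1, and the function names are ours):

    from math import sqrt

    def _deltas(a, b):
        """Componentwise absolute differences of two <Y1..Y4, Cb, Cr> vectors."""
        dy = [abs(x - y) for x, y in zip(a[:4], b[:4])]  # the four |ΔY_q|
        return dy, abs(a[4] - b[4]), abs(a[5] - b[5])    # ..., |ΔCb|, |ΔCr|

    def dist_avg(a, b):
        # Averaging variant: equal weight to chrominance and mean luminance.
        d_avg = abs(sum(a[:4]) - sum(b[:4])) / 4.0       # |ΔY_avg|
        _, dcb, dcr = _deltas(a, b)
        return sqrt(d_avg ** 2 + dcb ** 2 + dcr ** 2)

    def dist_max_luma(a, b):
        # Keeps the largest luminance difference instead of smoothing it away.
        dy, dcb, dcr = _deltas(a, b)
        return sqrt(max(dy) ** 2 + dcb ** 2 + dcr ** 2)

    def dist_min_luma(a, b):
        # Reacts only to changes shared by all four luminance blocks.
        dy, dcb, dcr = _deltas(a, b)
        return sqrt(min(dy) ** 2 + dcb ** 2 + dcr ** 2)

    def dist_maximum(a, b):
        # Maximum selective measure: emphasize the difference as much as possible.
        dy, dcb, dcr = _deltas(a, b)
        return max(max(dy), dcb, dcr)

    def dist_minimum(a, b):
        # Minimum selective measure: detects only sharp, macroblock-wide changes.
        dy, dcb, dcr = _deltas(a, b)
        return min(min(dy), dcb, dcr)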

2.2.2. The Cluster Selection

If the number of clusters is more than one for a macroblock location, there is a moving object at that macroblock. If the number of clusters is one, then there is no movement at that macroblock, and the cluster is the only candidate to contain the background macroblock.

Two basic factors are used for the selection of the cluster: frequency and continuity. The frequency of a cluster is the number of elements in that cluster. The continuity of a cluster denotes the maximum length of a sequence of macroblocks of the cluster that appeared consecutively. The clusters are first chosen according to their frequency; if there is a tie, the cluster with the higher continuity is selected. Sometimes there may be no or very little motion across succeeding I-frames. All such frames are considered as the same frame, and they have the effect of a single frame on the frequency and the continuity of the clusters.
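In code, the selection rule reads as follows; this is a sketch reusing the Cluster record from the clustering sketch in Section 2.2, and it assumes near-identical succeeding I-frames were already collapsed into one index:

    def continuity(cluster):
        """Longest run of consecutive I-frame indices among the members."""
        frames = sorted({f for f, _ in cluster.members})
        best = run = 1 if frames else 0
        for prev, cur in zip(frames, frames[1:]):
            run = run + 1 if cur == prev + 1 else 1
            best = max(best, run)
        return best

    def select_cluster(clusters):
        # One cluster means no motion: it is the only background candidate.
        if len(clusters) == 1:
            return clusters[0]
        # Otherwise: highest frequency first, continuity as the tie-breaker.
        return max(clusters, key=lambda cl: (len(cl.members), continuity(cl)))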

Figure 2: Estimation of the DC coefficient of the reference macroblock.

2.2.3. The Background Macroblock Selection

The background macroblock is chosen from the cluster in one of four ways. Three of them consider only luminance; the fourth considers both luminance and chrominance. In the first case, if the moving object absorbs light (non-shining), it causes darkness on the background, so it is better to choose the candidate with the higher luminance: low luminance is due to the object in the environment. In the second case, if the moving object is a light source or reflects light, it causes the background to shine, so it is better to choose the candidate with the lower luminance: high luminance is caused by the object in the environment. In the third case, if the type of the object is unknown, or it may have the properties of both shining and non-shining objects, it may be better to be neutral, so the background block closest to the mean is chosen. Finally, if color is considered, the background macroblock closest to the mean of all the block coefficients is the candidate.
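The four rules can be sketched as a single selector over the candidate DC vectors of the chosen cluster; the mode names are ours:

    def luma_avg(c):
        """Average luminance DC value of a <Y1..Y4, Cb, Cr> vector."""
        return sum(c[:4]) / 4.0

    def select_background_mb(candidates, mode="neutral"):
        """Pick the background macroblock from a cluster's candidates."""
        n = len(candidates)
        mean_luma = sum(luma_avg(c) for c in candidates) / n
        if mode == "non_shining":
            # Dark, light-absorbing object: prefer the brightest candidate.
            return max(candidates, key=luma_avg)
        if mode == "shining":
            # Light-emitting or reflecting object: prefer the darkest one.
            return min(candidates, key=luma_avg)
        if mode == "neutral":
            # Unknown object type: stay close to the mean luminance.
            return min(candidates, key=lambda c: abs(luma_avg(c) - mean_luma))
        # mode == "color": closest to the mean of all six DC coefficients.
        mean6 = [sum(col) / n for col in zip(*candidates)]
        return min(candidates,
                   key=lambda c: sum((x - m) ** 2 for x, m in zip(c, mean6)))

For example, select_background_mb([dc for _, dc in cluster.members], mode="color") applies the fourth rule.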

2.2.4. Enhancement with Motion Vectors

The motion vector indicates whether the macroblock moves to other parts of the frame. Although the term "motion vectors" is used in MPEG streams, motion vectors do not exactly express the displacement of a macroblock; rather, they give the location of the closest-matching macroblock in the previous or next frame. This is crucial if there is a pattern in the background: although the macroblock does not move, since its pattern is the same as that of some other macroblock, the motion vector may point to that macroblock. If the macroblock was previously located elsewhere, this usually indicates the parts where moving objects exist; but if there is a pattern in the frame, it is also possible that the macroblock points to another location sharing the same pattern. We apply two simple methods to detect this situation.

Since we perform our operations at the macroblock level, we do not have the exact DCT coefficients of this block. One way to deal with this is to check all macroblocks that intersect with this macroblock: if all of them have the same characteristics as in the predicted frame, the macroblock has not moved. Another way is to estimate the DC coefficients of the referenced block; how to estimate them from the blocks that intersect with the referenced block is shown in [6].


The estimation is done by weighting the DC coefficients of the intersecting blocks by the area each shares with the reference block $B_{ref}$ (Figure 2):

$DC_{ref} = \sum_{q=1}^{4} w_{q} \, DC_{q}$

where $DC_{q}$ gives the DC value of a block in Figure 2 and $w_{q}$ is the ratio of the region covered by block $B_{q}$ to the area of the whole block ($8 \times 8 = 64$ pixels).

Let $MB_{ij}(n)$ and $MB_{ij}(m)$ represent the referenced macroblock in the referenced frame and in the current frame, respectively. If

$dist(MB_{ij}(n), MB_{ij}(m)) < T,$

where $dist(\cdot)$ is a distance function and $T$ is a threshold, this implies that the referenced block has not changed in the current frame. This also implies that $MB_{ij}(n)$ was affected by the moving object. So, $MB_{ref}(m)$ is a candidate for the background macroblock; otherwise, it would be assumed to be part of a moving object.
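A sketch of this test, with a hypothetical threshold value since the paper does not fix one:

    T = 16.0  # hypothetical threshold

    def classify_by_reference(mb_ref_n, mb_ref_m, mb_cur_m, dist):
        """If the referenced macroblock is unchanged between frames n and m,
        the current macroblock is a background candidate; otherwise it is
        assumed to belong to a moving object."""
        if dist(mb_ref_n, mb_ref_m) < T:
            return ("background_candidate", mb_cur_m)
        return ("moving_object", mb_cur_m)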

2.3. Merging Backgrounds

The problem of merging backgrounds is in fact the problem of merging macroblocks, since clusters are generated for each macroblock location. Merging can be done from scratch, as if no clusters existed; the procedure above for generating clusters from a single video shot can then be used. A cluster merging approach can also be used, in which the clusters that share the same characteristics are merged. The merging can be performed with any of the existing methods in the literature; we merge clusters according to the closeness of their centroids.
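Under the centroid-closeness rule, a minimal per-location merge might look like the following sketch, reusing the Cluster record from Section 2.2; the threshold is again a hypothetical value:

    MERGE_THRESHOLD = 16.0  # hypothetical

    def merge_location(clusters_a, clusters_b, dist):
        """Merge two shots' cluster lists for one macroblock location:
        clusters whose centroids are close are fused, others are kept."""
        merged = list(clusters_a)
        for cl_b in clusters_b:
            near = next((cl_a for cl_a in merged
                         if dist(cl_a.centroid, cl_b.centroid) < MERGE_THRESHOLD),
                        None)
            if near is None:
                merged.append(cl_b)          # no close counterpart: keep as-is
            else:
                for frame_idx, dc in cl_b.members:
                    near.add(frame_idx, dc)  # fuse members; centroid updates itself
        return merged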

    3. RESULTS

We used video streams recorded from lectures. Each lecture is stored as an MPEG-1 video stream. The coding pattern of the streams is IBBPBBPBBPBBPBB, and the frame rate is 15 frames per second. Figure 3 shows the phases of how the background is detected: the first row displays the frames that are encountered, and the second row shows the phases of background generation. Other examples are omitted here due to the space limit.

Figure 3: Video sequence and background generation.

Our observations showed that if the object moves enough, the background can be constructed within the early frames of the clip. In the given example, the background is generated after processing 13 I-frames; the 120 B-frames and 48 P-frames in between are skipped. Our results also showed that chrominance must be included both in the distance computation and in the selection of the background macroblock from the cluster: if only luminance coefficients are considered, wrong macroblocks with color distortions may be selected.

    4. CONCLUSION AND FUTURE WORK

An algorithm is presented for the generation of the stationary background from a compressed video sequence. The algorithm is based on the clustering of macroblocks. Experiments showed that backgrounds can be extracted using the DC coefficients of MPEG streams. In this paper, we did not consider camera operations; our next step will be the generation of the background in the presence of camera motion.

    5. REFERENCES

[1] A. Neri et al. Automatic moving object and background separation. Signal Processing, 66:219–232, 1998.

[2] C. Kim and J.-N. Hwang. An integrated scheme for object-based video abstraction. In ACM Multimedia, 2000.

[3] R. Mech and M. Wollborn. A noise robust method for segmentation of moving objects in video sequences. In ICASSP 97, pages 2657–2660, April 1997.

[4] T. Meier and K. N. Ngan. Automatic video sequence segmentation using object tracking.

[5] T. Sikora. The MPEG-4 video standard verification model. IEEE Trans. Circuits Syst. Video Technology, 7:19–31, February 1997.

[6] B.-L. Yeo and S.-F. Chang. Rapid scene analysis on compressed video. IEEE Trans. Circuits Syst. Video Technology, 5(6):533–544, December 1995.