automatic tv advertisement detection from mpeg bitstream

8
Pattern Recognition 35 (2002) 2719 – 2726 www.elsevier.com/locate/patcog Automatic TV advertisement detection from MPEG bitstream David A. Sadlier , Sean Marlow, Noel O’Connor, Noel Murphy Centre for Digital Video Processing, Dublin City University, Dublin, Republic of Ireland Received 31 October 2001; accepted 31 October 2001 Abstract The Centre for Digital Video Processing at Dublin City University conducts concentrated research and development in the area of digital video management. The current stage of development is demonstrated on our Web-based digital video system called F schl ar (Proceedings of the Content based Multimedia Information Access, RIAO 2000, Vol. 2, Paris, France, 12–14 April 2000, p. 1390), which provides for ecient recording, analysing, browsing and viewing of digitally captured television programmes. Advertisement breaks during or between television programmes are typically recognised by a series of ‘black’ video frames simultaneously accompanying a depression in audio volume which separate each advertisement from one another by recurrently occurring before and after each individual advertisement. It is the regular prevalence of these ags that enables automatic dierentiation between what is programme and what is a commercial break. This paper reports on the progress made in the development of this idea into an advertisement detector system that automatically detects the commercial breaks from the bitstream of digitally captured television broadcasts. ? 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: MPEG-1; Black=silent frames; DC coecients; Subband scalefactors 1. Introduction For the development of an ecient video browsing= view- ing tool for digitised television, it is desirable to present the user with the option of skipping irrelevant content. A typical television programme may be accompanied by beginning= end credits with one or more ad-breaks somewhere in the middle. To the user, these features of a programme= video are generally regarded as an insignicant part of the recorded material. Hence, by detecting the ad-breaks, the eciency of programme browsing= viewing may be increased. Adver- tisement breaks may be isolated from actual programme ma- terial by the ags that most terrestrial and some satellite television companies wave during their broadcast: a series of ‘black’ video frames simultaneously accompanied by a decrease in the audio signal occurring before and after each individual advertisement [1]. Corresponding author. Tel.: +353-1-700-5871; fax: +353-1- 704-5508. E-mail address: [email protected] (D.A. Sadlier). The F schl ar system captures television broadcasts and encodes the programmes according to the MPEG-1 digital video standard with the audio signal coded in line with the Layer-II prole. An inherently dark or ‘black’ frame of a video may be recognised by its luminance histogram, which would be typically characterised by having most of its ‘power’ at the bottom end of the pixel amplitude spectrum, correspond- ing to black= dark pixels. Thus, by comparing an average pixel value, representing an entire frame, against some given threshold, a decision on whether that frame may be consid- ered ‘black’ or not, may be made. Furthermore, a depression in audio volume for a particular video frame may be recognised as follows: A summation of the absolute value of all the audio sam- ples corresponding to one video frame may be dened as the ‘audio level’ for that frame, i.e. for a relatively silent frame, a low audio level would be expected. Thus, by comparing this audio level to some threshold, an audio depression (of magnitude dened by threshold) may be detected. The abovementioned criteria propose a simplistic 0031-3203/02/$22.00 ? 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII:S0031-3203(01)00251-5

Upload: david-a-sadlier

Post on 02-Jul-2016

221 views

Category:

Documents


1 download

TRANSCRIPT

Pattern Recognition 35 (2002) 2719–2726www.elsevier.com/locate/patcog

Automatic TV advertisement detection from MPEG bitstream

David A. Sadlier ∗, Sean Marlow, Noel O’Connor, Noel MurphyCentre for Digital Video Processing, Dublin City University, Dublin, Republic of Ireland

Received 31 October 2001; accepted 31 October 2001

Abstract

The Centre for Digital Video Processing at Dublin City University conducts concentrated research and development in thearea of digital video management. The current stage of development is demonstrated on our Web-based digital video systemcalled F��schl�ar (Proceedings of the Content based Multimedia Information Access, RIAO 2000, Vol. 2, Paris, France, 12–14April 2000, p. 1390), which provides for e9cient recording, analysing, browsing and viewing of digitally captured televisionprogrammes.

Advertisement breaks during or between television programmes are typically recognised by a series of ‘black’ video framessimultaneously accompanying a depression in audio volume which separate each advertisement from one another by recurrentlyoccurring before and after each individual advertisement. It is the regular prevalence of these <ags that enables automaticdi=erentiation between what is programme and what is a commercial break. This paper reports on the progress made in thedevelopment of this idea into an advertisement detector system that automatically detects the commercial breaks from thebitstream of digitally captured television broadcasts. ? 2002 Pattern Recognition Society. Published by Elsevier Science Ltd.All rights reserved.

Keywords: MPEG-1; Black=silent frames; DC coe9cients; Subband scalefactors

1. Introduction

For the development of an e9cient video browsing=view-ing tool for digitised television, it is desirable to present theuser with the option of skipping irrelevant content. A typicaltelevision programme may be accompanied by beginning=end credits with one or more ad-breaks somewhere in themiddle. To the user, these features of a programme=video aregenerally regarded as an insigni@cant part of the recordedmaterial. Hence, by detecting the ad-breaks, the e9ciencyof programme browsing=viewing may be increased. Adver-tisement breaks may be isolated from actual programme ma-terial by the <ags that most terrestrial and some satellitetelevision companies wave during their broadcast: a seriesof ‘black’ video frames simultaneously accompanied by adecrease in the audio signal occurring before and after eachindividual advertisement [1].

∗ Corresponding author. Tel.: +353-1-700-5871; fax: +353-1-704-5508.

E-mail address: [email protected] (D.A. Sadlier).

The F��schl�ar system captures television broadcasts andencodes the programmes according to the MPEG-1 digitalvideo standard with the audio signal coded in line with theLayer-II pro@le.

An inherently dark or ‘black’ frame of a video may berecognised by its luminance histogram, which would betypically characterised by having most of its ‘power’ at thebottom end of the pixel amplitude spectrum, correspond-ing to black=dark pixels. Thus, by comparing an averagepixel value, representing an entire frame, against some giventhreshold, a decision on whether that frame may be consid-ered ‘black’ or not, may be made.

Furthermore, a depression in audio volume for a particularvideo frame may be recognised as follows:

A summation of the absolute value of all the audio sam-ples corresponding to one video frame may be de@ned asthe ‘audio level’ for that frame, i.e. for a relatively silentframe, a low audio level would be expected. Thus, bycomparing this audio level to some threshold, an audiodepression (of magnitude de@ned by threshold) may bedetected. The abovementioned criteria propose a simplistic

0031-3203/02/$22.00 ? 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.PII: S0031 -3203(01)00251 -5

2720 D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726

approach to the task of locating within a television pro-gramme, groups of black-frames=audio-depressions, whichshould provide for e9cient detection of advertisementbreaks.

However, the method does require direct access to bothvideo pixels and audio samples. Therefore it necessitates afull decode of the captured programme from its compressedformat [MPEG-1 (Layer-II)], which is highly undesirablefrom a computational point of view [1].

It was proposed that the same assessment and classi@ca-tion of the individual frames of a captured television signalmight be more e9ciently made as follows:

• For video; an examination of the DC Discrete CosineTransform (DC-DCT) coe9cients of a frame, which rep-resent the weight of its zero-frequency content, with aview to establishing whether or not the frame is inher-ently dark enough to be labelled ‘black’. The DCT formsan integral part of the compression process used in theMPEG-1 standard.

• For Audio; an inspection of the weight of the scalefac-tors of the signal’s (low) frequency subbands with a viewto establishing whether or not a video frame’s accompa-nying audio signal power is minimal enough for it to belabelled ‘silent’. (In the MPEG-1 Layer-II standard forcompression of audio signals, the signal frequency spec-trum is divided uniformly into 32 subbands which arethen coded independently from one another.)

It was envisaged that black-frame=silence detection via theabovementioned principles would provide for advertisementdetection from captured and encoded television programmesto the same degree of accuracy as could be obtained by themethods requiring implementation of a full audio=visualdecode, but with the computational burden signi@cantlyreduced.

2. Background

The Digital Video Indexing Project at Dublin City Uni-versity is a continuing research endeavor aimed at develop-ing innovative technologies fundamental to the realisationof e9cient video content management.

To demonstrate our research on digital video, we havedeveloped a Web-based digital video system which we callF��schl�ar [from the Irish f��s (vision) and chl�ar (programme)].At present a user can pre-set the recording of TV broadcastprogrammes and can choose from a set of di=erent browserinterfaces which allow navigation through the recordedprogrammes. As our research develops we will plug inincreased options such as personalisation and programmerecommendation, automatic recording, SMS=WAP=PDAalerting (SMS—Short messaging service; WAP—Wirelessapplication protocol; PDA—personal digital assistant),searching, summarising and so on.

To initiate the recording of a programme, a user browsesthe TV schedule and selects those programmes to berecorded—our system will then automatically record (digi-tally) that programme at broadcast time, much the same as ahome video cassette recorder. After we record a programme,we then automatically segment it using our shot boundarydetection technique based on colour histogram comparison,so that the content becomes easily browsable through ourvarious user interfaces. The analysed programme is thenadded to our archive of recorded programmes which a usercan scroll through and then select one for browse=playback.As a user browses through a programme he=she can thenstream the video to their desktop.

3. ‘Black=silent’ frames

3.1. The MPEG standard

The Moving Pictures Experts Group (MPEG), whomeet under the International Standards Organisation (ISO),generate the international standards for digital video andaudio compression. MPEG-1 [2] is a standard in @ve partswhich individually address the issues of audio=visual mul-tiplexing, video coding, audio coding, bitstream testing,and software implementation. A detailed description ofthe MPEG-1 standard for audio and video compressionmay be found via Refs. [2–4] and is thus not dealt withhere.

3.2. Black frame detection

F��schl�ar captures analog television signals and encodesthem according to the MPEG-1 standard. An MPEG-1video frame is divided into slices, which are subdivided intomacroblocks which each contain 6 blocks of (8× 8) pixelstransformed by a 2D-DCT, four of which provide luminanceinformation, leaving two for chrominance information.The video data is either explicitly given for the frame(I-frame) or provided implicitly via forward prediction(P-frame) or forward=backward prediction (B-frame).

The four luminance blocks (Y-blocks) of each macro-block provide the essential information on how dark or‘black’ the macroblock e=ectively is, hence indicating howdark or ‘black’ the overall frame may be.

Each Y-block consists of a DC coe9cient, which rep-resents its mean luminance intensity, and a number ofAC coe9cients, which represent its non-zero frequencycontent. The DC value corresponds to an average inten-sity value for each block, to which the visual perceptionis highly sensitive, whereas, various frequency <uctu-ations may go unnoticed. It was thus assumed that adecision on the inherent darkness of a block could bemade, with acceptable accuracy, via examination of the

D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726 2721

DC coe9cients exclusively, i.e. AC variations may beignored.

The proposal was that an average luminance intensityvalue for each frame could be determined from the DC-DCTcoe9cients provided within individual Y-block. This valuewas expected to be relatively low for inherently dark framesand higher for brighter frames. Thus by thresholding thisvalue, a decision on whether the frame is ‘black’ or not maybe made.

3.3. Silent frame detection

FOPschlOar captures television audio signals and encodesthem according to the MPEG-1 Layer-II compression algo-rithm which encodes as follows:

The frequency spectrum of the audio signal (sampled at32, 44.1, or 48 kHz) is @rst divided uniformly into 32 sub-bands which approximate the ear’s critical bands. Thesesubbands are then individually assigned a bit-allocation ac-cording to the audibility of quantisation noise within thatband (a pyschoacoustic model of the ear analyses the audiosignal and provides this information to the quantiser).

Layer-II frames consist of 1152 samples; three groups of12 samples from each of 32 subbands. A group of 12 sam-ples gets a bit allocation and, if this is non-zero, a scalefac-tor. Scalefactors are weights that scale groups of 12 sam-ples such that they fully use the range of the quantiser (theencoder uses a di=erent scalefactor for each of the threegroups of 12 samples within each subband only if neces-sary). The scalefactor for such a group is determined by thenext largest value (given in a look-up table) to the maximumof the absolute values of the 12 samples, thus it provides anindication of the maximum power exhibited by any one ofthe 12 samples within the group.

The proposal was that an audio power level for eachvideo frame could be determined by superposition of thescalefactors corresponding to the groups of audio samplesto which the frame is associated. This power level was ex-pected to be relatively low for ‘silent’ frames and higher forlouder frames. Thus by thresholding this value, a decisionon whether the frame is ‘silent’ or not may be made.

Further still, it was expected that examination of just thebottom end of the frequency spectrum would provide suf-@cient information on which this decision could still beaccurately made, since it is typical of an audio signal tohave most of its energy corresponding to relatively lowfrequencies.

By trial and error examination of various audio signals, itwas decided that at least 10 of the low frequency subbands’scalefactors must be included in the silence investiga-tions such that the results rendered were both sensible andconsistent.

(Since the maximum frequency encoded by the MPEG-1Layer-II standard is 20 kHz, the @rst 10 subbands cor-responds to an examination of the energy present from0–6 kHz.)

4. Ad-break detection

4.1. Recognition of advertisement breaks

As explained, the occurrence of black-frame=audio-depression series may indicate the existence of an ad-break.However, it is possible, and maybe quite probable, thatthese indicators also occur during the valuable material ofthe programme itself. For example, they are not uncommonwhen news programmes cut back and forth from anchor-person to news reports, or during scene changes during asoap opera. To combat this problem, and its consequenceof detection=removal of valuable programme content, somestrict conditions had to be enforced.

• It was noted that the ‘black=silent’ frame series occurringbetween individual advertisements tended to be of at leastsix frames in length. Thus to aid against detection of freakblack-frame=audio-depression occurrences not associatedto ad-breaks which may sporadically occur during pro-grammes, it was decided only to recognise them if theyexhibit a series of at least six consecutive ‘black=quiet’frames i.e. a series of 10 ‘black=quiet’ frames betweenadjacent advertisements would be detected, while a seriesof @ve occurring between anchor person and news reportwould not.

• Upon examination of 20+ advertisement breaks fromvarious television stations, the longest advertisementrecorded was that of 76 s, with approximate averageadvertisement duration of 25 s. Thus it was decided thatupon detection of a series of (at least six) ‘black=silent’frames, if another distinct series was not detected withina window of 90 s, then the initial series must thereforenot correspond to an ad-break and should be ignored.This prevented against recognition of rogue ‘black=quiet’frame series which may randomly occur during relevantprogramme material.A consequence of this condition was that the systemwould fail upon occurrence of advertisements that werelonger than 90 s in duration (76 s was the longest adver-tisement recorded so 90 s provided for some tolerance).Frame rate for FOPschlOar is 25 frames=s. Thus 90 s corre-sponds to 2250 frames.

• Upon examination of 20+ advertisement breaks from var-ious television stations, it was determined that the averagenumber of individual advertisements within an ad-breakwas approximately seven.It was decided, to further prevent against the possibilityof relevant programme material being mistakenly recog-nised as advertisement, that the recognition process wouldnot succeed if the number of advertisements within onead-break was less than three.i.e. upon detection of a series of (at least six) ‘black=quiet’frames, at least three more series must be detected, withinthe respective 90 s windows of each other, for the over-all detection to be recognised as an ad-break (detection

2722 D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726

of four ‘black=quiet’ frame series corresponds to three in-dividual advertisements). This prevented against possiblemistaken recognition due to the unlikely event of up tothree rogue series (consisting of at least six consecutive‘black=quiet’ frames) occurring within 90 s of each other,during relevant programme material.

The above three conditions would be individually weak sincethey focus on features which are undoubtedly inconsistentattributes of a television broadcast. However it is the com-bined e=ect of all three clauses which is expected to providethe success in accurately preventing mis-recognition of pro-gramme content for advertisement material. Fig. 1 explainshow detection of sporadic ‘black=quiet’ frame series are in-terpreted and recognised (shaded frames represent frameswhich are both ‘black’ and ‘silent’).

4.2. Black-frame and silent-frame thresholds

A number of threshold techniques were investigated.However, the following adaptive method provided mostconsistency in the results obtained, and was thus chosen asthe appropriate scheme.

For video, an overall mean DC-DCT value was calculatedby averaging over all individual frames (=DC-DCTavg). The‘black-frame’ threshold was then expressed as some factortimes this number. By trial and error examination of variousad-break clips, the video threshold which gave the mostsensible and consistent results was

Vth = 0:48DC-DCTavg: (1)

For audio, an overall mean audio level value was calculatedby averaging over all individual frames (=audio levelavg).The ‘silent-frame’ threshold was then expressed as somepercentage of this number. By trial and error examination ofvarious ad-break clips, the audio threshold which gave themost sensible and consistent results was

Ath = 0:073audio levelavg: (2)

4.3. Procedure

4.3.1. Video examination

• The DC-DCT coe9cients of each Y-block of each framewere stripped from the video bitstream of the MPEG @le.

• An average luminance DC-DCT coe9cient was then cal-culated for each video frame of the sequence.

• The overall mean value of the average frame coe9cientsfor the clip was determined. The ‘black-frame’ thresholdwas then de@ned as the value corresponding to 48% ofthis number (see Eq. (1)).

• Each frame’s average DC-DCT coe9cient value wascompared to the threshold and if equal=less than, thenthe frame was labelled ‘black’.

4.3.2. Audio examination

• The scalefactors corresponding to the @rst 10 subbands ofthe encoded audio signal were stripped from the bitstream.

• An audio level for each video frame was determined byaveraging the scalefactors corresponding to its associatedaudio signal.

• The overall mean value of the video frame audio lev-els for the entire clip was determined. The ‘silent-frame’threshold was then de@ned as the value corresponding to7.3% of this number (see Eq. (2)).

• Each video frame’s representative audio level was com-pared to the threshold and if equal=less than, then theframe was labelled ‘silent’.

4.3.3. Search for simultaneous ‘black=silent’ frames

• Unless four distinct series of at least six consecutive si-multaneously ‘black=silent’ frames were detected within90 s (2250 frames) of each other, any such series wereignored.

• Upon detection of four distinct series of at least six con-secutive simultaneously ‘black=silent’ frames, within 90 s(2250 frames) of each other, the minimum requirementsfor such a sequence to be recognised as an ad-break havebeen completed.

• Further such series detected within the same allowablewindow [90 s (2250 frames)] from the end of the previ-ous would also be recognised as being part of the samead-break.

• Until there is no further detection of such series withinthe allowable window, all detected series are recognisedas corresponding to the same ad-break.

• The detected ad-break is then said to have begun with the@rst ‘black=silent’ frame of the @rst detected=recognisedseries and ended with the last frame of the lastdetected=recognised series.

• The process begins again upon detection of the next(unrelated) series.

5. Results and examination

5.1. Test material

F��schl�ar provided 10 short television programme clipsfrom four di=erent channels [labelled (a)–(d)] in MPEG-1Layer-II format. The recordings were meticulously chosensuch that they exhibited signi@cant content diversity with atleast one complete ad-break somewhere in the middle.

5.2. Results

The above mentioned procedures were executed on all 10clips.

D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726 2723

Fig. 1. Interpretation of ‘black=silent’ frame series.

Evaluation of the system was performed by comparison ofresults achieved against a manual record of the true locationof the ad-breaks, to the nearest second, within each clip.

Results are tabulated in Table 1.(For shorthand purposes, a detected series of at least

six consecutive ‘black=silent’ video frames as previouslydescribed, is henceforth labelled as a >ag.)

5.3. Result examination

The absence of falsely identi?ed content is evident fromthe results in Table 1. In clip ‘(b) Sports Show’, the detectedend of the ad-break is at 770 s, but the true end is later, at790 s. This is recorded as 20 missed seconds of ad-breakmaterial. The total number of seconds in the true ad-break(=790 − 603) = 187 s. The total number of seconds de-tected as ad-break (=770 − 603) = 167 s. A superpositionof these quantities over all 10 clips was performed to givefour overall values, which are illustrated in Fig. 2.

5.3.1. Precision and recallFor a better insight into the individual accuracy of each

clip, two important @gures of merit for the ad-detection

system were calculated:

• The Recall measure looks at the percentage of detectedmaterial corresponding to true ad-breaks.

Recall

=100

[Length of ad-break (seconds)−No: seconds missed

]

Length of ad-break (seconds)

• The Precision measure is a percentage showing howaccurate the system is at exclusively detecting ad-breakmaterial.

Precision

=100

[Length of ad-break (seconds)−No: seconds missed

] Length of ad-break (seconds)−No: seconds missed+No: seconds falsely identi@ed

Example:For clip ‘(b) Sports Show’ the Recall=Precision @gures

were calculated as follows (from the information in Table 1):

2724 D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726

Table 1Results of experimentation on 10 video clips provided by 4 television stations comprising 11 advertisement breaks

(Channel) Length No. of No. of <ags True DetectedDescription of clip <ags detected, ad-break ad-break

(seconds) detected not corresponding to location in clip location in clipad-breaks (second–second) (second–second)

(a) Chat show 600 5 0 193–281 193–281

(a) News broadcast 1200 4 0 468–538 468–516

(a) Music show 600 5 1 257–347 258–347

(b) News broadcast 1200 11 0 113–351 113–351

(b) Soap opera 1200 7 0 779–935 779–935

(b) Sports show 1200 11 3 603–790 603–770

(c) Youth magazine 1800 14 7 185–272 185–272

(c) Game show 1500 7 2 659–772 659–772

(d) Cartoon 1800 21 1 798–975 798–975and and1415–1660 1415–1660

(d) Comedy quiz 1800 6 0 774–934 774–913

Fig. 2. Total number of seconds corresponding to ad-breaks, detected ad-breaks, missed ad-breaks and falsely identi@ed ad-breaks.

Length of ad-break (=790− 603) = 187 s

No: seconds falsely identi@ed = 0 s

No: seconds missed = 20 s (ad-break-end detected 20 s

early)

Precision = 100[(187− 20)=(187− 20 + 0)] = 100;

Recall = 100[(187− 20)=187] = 89:3:

Results following similar calculations performed on allclips are presented in Table 2.

6. Conclusions

6.1. Result evaluation

In all, 10 clips comprising 315 min of digital video, in-corporating 11 ad-breaks, were analysed. The following arethe main points to be noted:

D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726 2725

Table 2Precision and Recall values based on results in Table 1

Clip Precision Recall

(a) Chat show 100 100(a) News broadcast 100 68.6(a) Music show 100 98.8(b) News broadcast 100 100(b) Soap opera 100 100(b) Sports show 100 89.3(c) Youth magazine 100 100(c) Game show 100 100(d) Cartoon (1) 100 100

(2) 100 100(d) Comedy quiz 100 86.9

• The system detected the occurrence of all 11 ad-breaks.• A total of 14 ‘black=silent’ <ags which were not associated

with ad-breaks were detected (seven of these occurring inone clip alone). However, the number of falsely detectedad-breaks was zero.

• All detections attained a Precision percentage of 100%.• eight-out-of-11 of the detections attained a Recall per-

centage ¿ 98%.

Clips ‘(b) Sports Show, (d) Comedy Quiz, and (a) NewsBroadcast’ performed relatively poorly compared to oth-ers. It was noted that their common downfall was thead-break-end being detected prematurely. Manual investi-gation showed that the reason for this was that for all threeclips, the @nal <ag in the ad-breaks (occurring between @nalad and return of programme) consisted of less than 6 con-secutive frames, thus violating one of the three conditionsimposed on the detection process. Consequently, the sys-tem did not detect the occurrence of the @nal advertisementwithin the ad-break, but identi@ed the ad-break-end with theend of the last detected <ag (which actually correspondedto the end of the penultimate advertisement).

Numbers of individual advertisements comprising overallad-breaks were counted for the three clips. It was then con-cluded that clips ‘(b) Sports Show’ and ‘(d) Comedy Quiz’resulted in one-out-of-seven and one-out-of-six ads missedwithin their respective ad-breaks, giving Recall percentagesof 89.3 and 86.9. However, clip ‘(a) News Broadcast’ re-sulted in one-out-of-four ads missed within its ad-break,which represented non-detection of a much larger propor-tion of the overall ad-break. Hence, this clip only attained ameagre Recall percentage of 68.6.

The consistently high Precision @gures indicated highsuccess in the prevention of mis-recognition of content notcorresponding to ad-breaks.

6.2. Further work

The system succeeded very well until, on three-out-of-11occasions, the conditions incorporated to combat false

identi@cation of ad-breaks, actually prevented the detectionof the @nal advertisement within their respective ad-breaks.For this reason (and due to the fact that the system willfail for (i) occurrence of individual advertisements longerthan 90 s in duration, and (ii) ad-breaks consisting of lessthan three advertisements), any future work could possiblyinvolve further optimisation of the trade-o= between consis-tently precise ad-break detection and the prevention of falseidenti@cation, either by changing the parameters in the ex-isting conditions or by replacing them with improved ones.

In introducing this subject, it was mentioned that mostadvertising television stations feature the characteristic of‘black=silent’ frame gap in between individual advertise-ments. In fact, of the six advertising television channelscaptured by F��schl�ar, two do not exhibit this trait. Theyinstead have each individual advertisement run directlyinto each other with no pauses in between. Consequently,the discussed method of ad-break detection would fail forthese broadcasts and so an alternate method is required. Aproposed technique is to examine the rate of shot-cuts overan entire programme. This rate is expected to be notablyhigh during ad-breaks since advertisements characteris-tically exhibit a consistently high rate of activity. This,coupled with some timing aspects (e.g. broadcast timeis usually sold in discrete units) could provide su9cientinformation enabling an alternative method for automaticadvertisement=programme di=erentiation.

References

[1] D. Carroll, Advertisement detection, Internal Technical Report,Dublin City University, 2000 (contact corresponding author).

[2] The O9cial MPEG Committee Website. http://www.cselt.it/mpeg/

[3] D. Marshall, Video and Audio Compression Website.http://www.cs.cf.ac.uk/Dave/Multimedia/node196.html

[4] K.R. Rao, J.J. Hwang, Techniques and Standards for Image,Video and Audio Coding, Prentice-Hall, Englewood Cli=s, NJ,1996.

Further Reading

B. Bourquin, M. Frey, R. Wetzel, NOMAD Project.http://www.fatalfx.com/nomad/R. Lienhart, C. Kuhmunch, W. E=elsberg, On the detection andrecognition of television commercials, Proceedings of the IEEEConference on Multimedia Computing and Systems, Ottawa,Canada, 1996, pp. 509–516.D. Fry, E. Hampshire, T. Hargrove, Automatic detection ofcommercials within pre-recorded cartoon shows. Preliminaryproject design report. http://www.cse.scu.edu/projects/2000-01/project12/Y. Li, C.-C. Jay Kuo, Detecting the commercial breaks in realTV programs based on audiovisual information, Proceedings of theSPIE Vol. 4210, Internet Multimedia Management Systems 2000,pp. 225–236.

2726 D.A. Sadlier et al. / Pattern Recognition 35 (2002) 2719–2726

About the Author—DAVID A. SADLIER received his Bachelor of Electronic Engineering Degree from the National University of Irelandin 2000. His research interests are mainly in signal processing for digital audio=video analysis and is currently pursuing a Masters degreeof E. Eng in this @eld at Dublin City University.

About the Author—SEAN MARLOW, MIEI, CEng, received a B.Sc. (Hons) degree in Electrical and Electronic Engineering from Queen’sUniversity, Belfast, in 1976 and a Ph.D. in Signal Processing from the same University in 1979. From 1979 to 1980 he was assistantlecturer in University College, Cork. From 1980 to the present he was employed at Dublin City University (formerly NIHE Dublin), @rst aslecturer, then as senior lecturer in electronic engineering. Dr. Marlow is joint Irish representative on the COST 211ter (Video Compressionfor Telecomms) Management Committee. He is co-Director of the Visual Media Processing Group and Centre for Digital Video Processing.

About the Author—NOEL O’CONNOR has been a lecturer in the School of Electronic Engineering at Dublin City University Since July1999. He is currently the Programme Chair for the School’s new BEng in Digital Media Engineering degree programme. Noel is the Irishrepresentative to the European COST 211 action and has also been an Irish representative to the ISO=IEC MPEG standards body on a numberof occasions. His plans for future research include continuing his work on video compression, video object segmentation and investigatingthe role of image and video analysis techniques in future multimedia archiving and indexing applications.

About the Author—NOEL MURPHY received his degree in Theoretical Physics from Trinity College, Dublin, in 1985 and subsequentlybegan work as an investigative researcher in Computer Vision at the School of Electronic Engineering in DCU. In 1986 he began asa part-time lecturer at the School while he @nished his Ph.D. on an information theoretic approach to visual perception. He has held apermanent lecturer position in the School since 1994.