Towards Yahoo! Challenge: A Generic Event Detection and Segmentation System for Video Navigation and Search

Tianzhu Zhang 1,2, Changsheng Xu 1,2, Guangyu Zhu 3, Si Liu 1,2,3, Hanqing Lu 1,2

1 National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China

2 China-Singapore Institute of Digital Media, Singapore 119613, Singapore

3 Department of Electrical and Computer Engineering, National University of Singapore, Singapore

{tzzhang, csxu, sliu, luhq}@nlpr.ia.ac.cn, [email protected]

ABSTRACT

We have developed a generic event detection and segmentation system for video navigation and search to address part of the challenge proposed by Yahoo!. We will first introduce the most significant features of our system and then describe why our system presents a novel and interesting solution in relation to the Yahoo! challenge.

1. INTRODUCTION TO OUR SYSTEM

Our system, shown in Fig. 1, is able to detect events and segment videos based on their content, and it is generic across various video domains (e.g., sports, news, and movies). The event-based video segmentation system can significantly facilitate video navigation and search in large-scale video corpora. Details of the technical approach and experimental results can be found in our full paper [1]. Here we highlight the most significant features of our system:

1. Event Recognition: Given a video, our system can analyze its content and determine its event categories. In this way, each video can be labeled according to its content rather than its textual metadata.

2. Event Segmentation/Localization: For an event in a video sequence, our system can precisely locate its start and end boundaries. Therefore, interesting events in a video can be localized, and the video can be segmented according to its semantic concepts.

3. Semantic Navigation and Search: Based on event detection (event recognition and localization), our system also provides semantic video navigation and search, so users can conveniently search and browse videos according to their personal preferences. A minimal sketch of how detection results can drive segmentation and navigation is given after this list.
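The following Python sketch is purely illustrative and is not taken from the paper: it assumes a hypothetical EventSegment record and shows how a list of detected events with start/end boundaries could support segment-by-segment browsing and label-based search.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EventSegment:
    """A detected event with its category and temporal boundaries (hypothetical structure)."""
    label: str    # event category, e.g. "B_FreeThrow", "Kissing", "DrivingCar"
    start: float  # start boundary within the video, in seconds
    end: float    # end boundary within the video, in seconds
    score: float  # detector confidence

def segment_video(events: List[EventSegment]) -> List[EventSegment]:
    """Order detected events by start time so the video can be browsed segment by segment."""
    return sorted(events, key=lambda e: e.start)

def search_events(events: List[EventSegment], query_label: str) -> List[EventSegment]:
    """Return only the segments whose event category matches the user's query."""
    return [e for e in events if e.label == query_label]

# Example: jump straight to the free-throw events in a basketball video.
detections = [
    EventSegment("ApplaudHand", 26.0, 31.0, 0.84),
    EventSegment("B_FreeThrow", 12.0, 25.5, 0.91),
]
for seg in search_events(segment_video(detections), "B_FreeThrow"):
    print(f"{seg.label}: {seg.start:.1f}s - {seg.end:.1f}s")
```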

2. ADVANTAGES OF OUR SYSTEM

There are three challenges for robust automatic video segmentation based on event detection:

(1) Current video search portals mostly rely on the title, tags, or surrounding page text of a video, ignoring the rich information within the video itself.

(2) A next-generation video search engine needs to present users with the sections of a video that interest them, for convenient browsing.

(3) Generic algorithms are needed to automatically generate navigation for videos from various domains.


Figure 1: Video event detection and segmentation system (input videos from the sports, news, and movie domains; event recognition; event localization; semantic navigation and search). For better viewing, please see the original color PDF file.

To solve these challenging problems, we propose a Graph-based Semi-Supervised Multiple Instance Learning (GSSMIL) algorithm for video navigation and search in different video domains based on event detection, which has the following advantages:

• We semantically align text and video information to learn the event model and then propagate labels to videos with or without text information. Since the learned model is based on video content, it can support video navigation and search using content information within the video.

• We formulate event detection as a multiple-instance representation problem, which is effective and well suited to localizing interesting events within a video.

• To obtain effective event models across diverse video genres, we design a generic GSSMIL algorithm that integrates expert-labeled data obtained by text analysis with unlabeled data collected from the Internet, which improves detection performance and alleviates the training-data insufficiency problem by exploiting Internet data sources. (An illustrative sketch combining these ideas follows this list.)
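The paper defers technical details to [1]; the Python sketch below is a rough illustration under our own assumptions, not the actual GSSMIL algorithm. It combines the two ingredients named above: each video is treated as a bag of segment-level instance features (multiple-instance representation), and labels from a few expert-labeled bags are propagated to unlabeled Internet bags over a similarity graph.

```python
import numpy as np

def bag_feature(instances: np.ndarray) -> np.ndarray:
    """Multiple-instance view: a video (bag) is a set of segment features (instances).
    For illustration we summarize a bag by its mean instance; the real model is richer."""
    return instances.mean(axis=0)

def propagate_labels(bags, labels, alpha=0.9, iters=50, sigma=1.0):
    """Graph-based semi-supervised label propagation over bags.
    labels: +1/-1 for expert-labeled bags, 0 for unlabeled Internet bags."""
    X = np.stack([bag_feature(b) for b in bags])
    # Gaussian affinity between bag-level features.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-12))
    S = Dinv @ W @ Dinv
    Y = np.asarray(labels, dtype=float)
    F = Y.copy()
    for _ in range(iters):
        # Spread label scores along the graph while staying anchored to the known labels.
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return np.sign(F)  # predicted event label for every bag, labeled or not

# Example: two labeled bags (one positive, one negative) and one unlabeled Internet bag.
rng = np.random.default_rng(0)
bags = [rng.normal(1.0, 0.1, (5, 8)), rng.normal(-1.0, 0.1, (4, 8)), rng.normal(0.9, 0.1, (6, 8))]
print(propagate_labels(bags, [+1, -1, 0]))  # the unlabeled bag should inherit the +1 label
```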

REFERENCES

[1] T. Zhang, C. Xu, G. Zhu, S. Liu, and H. Lu. A generic framework for event detection in various video domains. In ACM International Conference on Multimedia (ACM MM), 2010. (Full Paper).