

Pattern Recognition 35 (2002) 945–965
www.elsevier.com/locate/patcog

A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video

Sameer Antani (a), Rangachar Kasturi (a, *), Ramesh Jain (b)

(a) Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
(b) Praja, Inc., 10455-B Pacific Center Court, San Diego, CA 92121, USA

Received 5 June 2000; received in revised form 16 February 2001; accepted 19 March 2001

Abstract

The need for content-based access to image and video information from media archives has captured the attention of researchers in recent years. Research efforts have led to the development of methods that provide access to image and video data. These methods have their roots in pattern recognition. The methods are used to determine the similarity in the visual information content extracted from low level features. These features are then clustered for generation of database indices. This paper presents a comprehensive survey on the use of these pattern recognition methods which enable image and video retrieval by content. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Content-based retrieval; Pattern recognition; Image databases; Video databases

1. Introduction

There has been a growing demand for image and video data in applications, due to significant improvements in processing technology, network subsystems and the availability of large storage systems. This demand for visual data has spurred significant interest in the research community to develop methods to archive, query and retrieve this data based on its content. Such systems, called content based image (and video) retrieval (CBIR) systems or visual information retrieval systems (VIRS) [1], employ many pattern recognition methods developed over the years. Here the term pattern recognition methods refers to their applicability in feature extraction, feature clustering, generation of database indices, and determining similarity in content between the query and database elements. Other surveys on techniques for content based retrieval of image and video have been

* Corresponding author. Tel.: +1-814-863-4254; fax: +1-814-865-3176.

E-mail address: [email protected] (R. Kasturi).

presented in Refs. [2–6]. A discussion of issues relevant to visual information retrieval and to capturing the semantics present in images and video is presented in Ref. [7]. Applications where CBIR systems can be effectively utilized range from scientific and medical imaging to geographic information systems and video-on-demand systems, among numerous others. Commercial CBIR systems have been developed by Virage Inc., 1 MediaSite Inc. 2 and IBM. In this paper we present significant contributions published in the literature that use pattern recognition methods for providing content based access to visual information. In view of the large number of methods that can perform specific tasks in the content based access to video problem, several researchers have presented critical evaluations of the state of the art in content based retrieval methods for image and video. An evaluation of various image features and their performance in image retrieval has been done by Di Lecce and Guerriero [8]. Mandal et al. [9] present a critical evaluation

1 Virage: http://www.virage.com.
2 MediaSite: http://www.mediasite.com.



Fig. 1. Typical present day image and video database system.

of image and video indexing techniques that operate in the compressed domain. Gargi et al. [10] characterize the performance of video shot change detection methods that use color histogram based techniques as well as motion based methods in the MPEG 3 compressed and uncompressed domains.

Research in retrieval of non-geometric pictorial

information began in the early 1980s. A relational database system that used pattern recognition and image manipulation was developed for retrieving LANDSAT images [11]. This system also introduced the preliminary concepts of query-by-pictorial-example (QPE), which has since matured to be better known as query-by-example (QBE). Since then, research has come a long way in developing various methods that retrieve visual information by its content and that depend on an integrated feature extraction/object recognition subsystem. Features such as color, shape and texture are used by the pattern recognition subsystem to select a good match in response to the user query. Present day systems can at best define the visual content with partial semantics. In essence these are forms of quantifiable measurements of features extracted from image and video data. Humans, on the other hand, attach semantic meaning to visual content. While the methods for extracting/matching images and video remain primarily statistical in nature [12], CBIR systems attach the

3 MPEG: Motion Picture Experts Group, http://www.mpeg.org.

annotations to the extracted features for pseudo-semantic retrieval.

Importance of pattern recognition methods in image and video databases: Image and video databases differ in many aspects from traditional databases containing text information. Image and video data can be perceived differently, which makes it very difficult to maintain complete content information of images or video. Queries tend to be complex and are better expressed in natural language form. A focus for the computer vision and pattern recognition community is therefore to develop methods which can convert the query specification into a meaningful set of query features. A typical present day image and video database system block diagram is shown in Fig. 1. The figure shows that the database system has two main points of interface: the database generation subsystem and the database query subsystem. Pattern recognition methods are applied in both these subsystems. The inputs to the database generation subsystem are image and video data. The necessary features are extracted from this data and then appropriately clustered to generate indices to be used by the database query subsystem. The database query subsystem matches a user query, using appropriate features, to the indexed elements of the database.
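To make the block diagram of Fig. 1 concrete, the following minimal sketch wires the two subsystems together, assuming quantized color histograms as the feature and plain k-means for index generation; the feature choice, the clustering method and all function names here are illustrative, not details prescribed by the survey.

import numpy as np

def color_histogram(image, bins=8):
    # Quantized RGB histogram, normalized to unit sum.
    h, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    return (h / h.sum()).ravel()

def build_index(images, k=4, iters=20):
    # Database generation subsystem: extract features, then cluster them
    # with plain k-means; the cluster labels act as the database index.
    feats = np.stack([color_histogram(im) for im in images])
    centers = feats[np.random.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return feats, centers, labels

def query(image, feats, centers, labels):
    # Database query subsystem: match the query feature against the
    # nearest cluster only, then rank that cluster's members.
    q = color_histogram(image)
    c = np.argmin(((centers - q) ** 2).sum(-1))
    members = np.flatnonzero(labels == c)
    return members[np.argsort(((feats[members] - q) ** 2).sum(-1))]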

Although many methods have been developed for querying images that make use of exotic features extracted from the visual information, successful methods need to be fine tuned for each application. In traditional object recognition the expected geometry, shape, texture, and color of the object are known beforehand. But,


with a real image and video database system, the data is sourced from diverse environments. This leads us to face the reality that a fixed, a priori decided set of features will not work well. Also, typical users of such systems may not be able to comprehend the complex methods used to implement the visual information retrieval. Hence a more intuitive interface needs to be given to the users, so that the query can be mapped to appropriate parameters for the retrieval subsystem. All of the above lead to the need for a middle layer between the user query and the retrieval subsystem that translates the query into the use of an appropriate feature to query the image. This feature should be easily computable in real time and also be able to capture the essence of the human query.

The remainder of the paper is organized as follows.

Section 2 briefly describes some of the successful image/video database systems. Section 3 discusses some methods and issues related to similarity matching. We have separated the discussion of multimedia database systems into image database and video database systems. Section 4 describes the features used and the pattern recognition methods employed for image databases. Section 5, on the other hand, describes the classification methods used for features extracted from video data. We present conclusions in Section 6.

2. Image and video database systems: examples

Several successful multimedia systems developed in recent years are described in the literature. In this section we highlight some of the mature systems, the features they use and the primary classification method(s) utilized by them. A book by Gong [13] discusses some of the image database systems. Other systems are mentioned in Sections 4 and 5 for their contributions to the development of methods for content based image and video retrieval.

Content based retrieval engine (CORE): In Ref. [14],

the authors describe and discuss the requirements of multimedia information systems. A multimedia information system is composed of multimedia objects, defined as six-tuples, for which it must offer creation, management, retrieval, processing, presentation and usability functions. The authors admit that a multimedia object is a complex component that is difficult to describe. It is audio-visual in nature and hence can have multiple interpretations. Its description is context sensitive, and hierarchical relationships can be formed between these objects. Content based retrieval is classified into four distinct categories: visual retrieval, similarity retrieval, fuzzy retrieval and text retrieval. Each of these is addressed in the CORE architecture. It supports color based and word/phonetics based retrieval. It also supports fuzzy matching along with a feedback loop for

additional reliability. Two real world systems have been developed on the CORE concept: Computer Aided Facial Image Inference and Retrieval (CAFIIR) and the System for Trademark Archival and Retrieval (STAR) [15].

WebSeek: WebSeek 4 is a prototype image and video

search engine which collects images and videos from the Web and catalogs them. It also provides tools for searching and browsing [16] using various content based retrieval techniques that incorporate the use of color, texture and other properties. The queries can be a combination of text based searches along with image searches. Relevance feedback mechanisms are used to enhance performance.

VideoQ: VideoQ is a web enabled content based video

search system [17]. VideoQ 5 expands the traditional search methods (e.g., keywords and subject navigation) with a novel search technique that allows users to search compressed video based on a rich set of visual features and spatio-temporal relationships. Visual features include color, texture, shape, and motion. The spatio-temporal video object query extends the principle of query by sketch in image databases to a temporal sketch that defines the object motion path.

Blobworld: Blobworld 6 is a system for content based

image retrieval by finding coherent regions that may correspond to objects [18]. It is claimed that current image retrieval systems do not perform well in two key areas: finding images based on regions of interest (objects), and presenting results in a comprehensible way, since the system appears to the user as a black box. The Blobworld system performs search on parts of an image, which is treated as a group of blobs roughly representing homogeneous color or texture regions. Each pixel in the image is described using a vector of 8 entries: the color in L*a*b* space, the texture properties of contrast, anisotropy and polarity, and the position (x, y). The image is treated as a mixture of Gaussians and is segmented using the Expectation-Maximization algorithm. The user can query Blobworld using simple queries that find images similar to a selected blob, or can create complex queries using other blobs.

The multimedia analysis and retrieval system

(MARS): The multimedia analysis and retrieval system (MARS) [19] allows image queries using color, texture, shape and layout features. 2D color histograms in the HSV color space are used as the color feature; texture is represented using the coarseness, contrast and directionality (CCD) values. Shape based queries are matched using modified Fourier descriptors (MFD). The layout queries use a combination of the above features with respect to the image. The retrieval methods are Fuzzy Boolean Retrieval, where the distance between

4 WebSeek: http://www.ctr.columbia.edu/webseek/.
5 VideoQ: http://www.ctr.columbia.edu/VideoQ.
6 BlobWorld: http://elib.cs.berkeley.edu/photos/blobworld.


the query image and the database image is the degree of membership to the fuzzy set of images that match the specified feature, and Probabilistic Boolean Retrieval, which determines the measure of probability that the image matches the user's information need. Recent updates to the system include query formulation and expansion along with relevance feedback.

PicToSeek: The PicToSeek 7 system [20] uses the

color variation, saturation, transition strength, background and grayness as the indexing features to provide content based access to images. The system is based on the philosophy that the color based image indexing systems published in the literature do not take into account the camera position, geometry and illumination. The color transition strength is the number of hue changes in the image. The HSV color system is selected since it has the value parameter, which represents the image intensity and which can be extended to provide intensity invariant matching. A reflection model is constructed for each object in the image in the sensor space, and photometric color invariant features are computed from it that are used for illumination invariant image retrieval. In their experiments, histogram cross-correlation provided the best results. Sub-image matching is provided through the use of color-invariant active contours (snakes). The system also uses a color invariant gradient scheme for retrieval of multi-valued images. The PicToSeek system provides a combination of color, texture and shape features for image retrieval.

Content-based image retrieval in digital libraries

(C-BIRD): The C-BIRD 8 system presents an illumination invariant approach to retrieval of color images through a color channel normalization step [21]. The color information is reduced from 3 planes by projecting them onto a single plane connecting the unity values on the R, G and B color axes. While usage of the chromaticity plane provides illumination invariance, it fails to capture chrominance changes in the illumination. This is achieved by applying L2 normalization to the R, G and B vectors formed by taking the value of each of these components at each pixel location. Histogram intersection is used for image matching. The histogram size is reduced by using the discrete cosine transform. Object search is facilitated through feature localization and a three step matching algorithm which operates on image locales that are formed of dominant color regions. The texture in these regions is determined by computing the edge separation and directionality, while the shape is represented using the generalized Hough transform.

Query-by-image-content (QBIC): The query-by-image-(and video)-content (QBIC) system developed at the IBM Almaden Research Center is described as a set of technologies and associated software that allows a user

7 PicToSeek: http://www.wins.uva.nl/research/isis/zomax/.
8 C-BIRD: http://jupiter.cs.sfu.ca/cbird/.

to search, browse and retrieve image, graphic and video data from large on-line collections [22]. The system allows image and video databases to be queried using visual features such as color, layout and texture that are matched against a meaningfully clustered database of precomputed features. A special feature in QBIC is the use of gray level images, where the spatial distribution of gray level texture and edge information is used. The feature vector thus generated has a very high dimension (sometimes over 100). QBIC allows QBE and query-by-user-sketch type queries. Video retrieval is supported by generation of video storyboards, which consist of representative frames selected from subsequences separated by significant changes such as scene cuts or gradual transitions. Technologies from QBIC are combined with results from speech recognition research to form an integrated system called CueVideo [23].

Informedia digital video library: The Informedia Digital Video Library Project 9 at Carnegie Mellon University is a full-fledged digital video library under development [24]. The developers of the system project that a typical digital library user's interests will lie in short video clips and content-specific segments (skims). The library developers have designed methods to create a short synopsis of each video. Language understanding is applied to the audio track to extract meaningful keywords. Each video in the database is then represented as a group of representative frames extracted from the video at points of significant activity. This activity may be abrupt scene breaks, some form of rapid camera movement, gradual changes from one scene to another, and points in the video where the keywords appear. Caption text 10 is also extracted from these frames, adding to the set of indices for the video.

Other systems: Chabot 11 is a picture retrieval system

which uses a relational database management system for storing and managing the images and their associated textual data [25]. To implement retrieval from the current collection of images, Chabot integrates the use of stored text and other data types with content-based analysis of the images to perform "concept queries". The project has a mix of annotated and un-annotated images, and aims to develop methods for retrieving them based on color, texture and shape. The Manchester multimedia system [26] uses geometric features such as area, perimeter, length, bounding box, longest chord, etc., of the image objects as features. VisualGREP [27] is a system for comparing and retrieving video sequences. The features used by this system are the color atmosphere represented as a color coherence vector (CCV), the motion intensity represented by the edge change ratio (ECR), and the presence of a frontal

9 Informedia: http://www.informedia.cs.cmu.edu.
10 Text added in video post-production by graphic editing machines.
11 Chabot: http://www.cs.berkeley.edu/~ginger/chabot.html.


face in the image. Cox et al. describe the PicHunter system in Ref. [28], which is based on a Bayesian framework used to capture relevance feedback and improve search results.

3. Similarity measures

This section describes the similarity measures used for matching visual information and the approaches taken to improve similarity results. The similarity is determined as a distance between some extracted feature or a vector that combines several features. The most common scheme adopted is the use of histograms, which are deemed similar if their distance is less than or equal to a preset distance threshold. Some of the common histogram comparison tests are described here. In other cases, the features are clustered based on their equivalency, which is defined as the distance between these features in some multi-dimensional space. The most commonly used distance measures are the Euclidean distance, the Chamfer distance, the Hausdorff distance, the Mahalanobis distance, etc. The clustering methods commonly used are k-means clustering, fuzzy k-means, the minimum spanning tree (MST), etc. Details on these can be found in a pattern classification text such as Ref. [29]. A review of statistical pattern recognition and of clustering and classification techniques is given in Ref. [30].

If H_curr and H_prev represent histograms having

n bins and i is the index into the histograms, then the difference between these histograms is given by D as shown in Eq. (1). Histogram intersection is defined as in Eq. (2), where (when normalized) a value close to 1.0 represents similarity. The Yakimovsky test was proposed to detect the presence of an edge at the boundary of two regions. The expression for the Yakimovsky likelihood ratio is given by Eq. (3); σ_1^2 and σ_2^2 are the individual variances of the histograms, while σ_0^2 is the variance of the histogram generated from the pooled data. The numbers m and n are the numbers of elements in the histograms. A low value of y indicates a high similarity:

D = \sum_{i=0}^{n} |H_{curr}(i) - H_{prev}(i)|,    (1)

D_{int} = \frac{\sum_{i=1}^{n} \min(H_{curr}(i), H_{prev}(i))}{\sum_{i=1}^{n} H_{prev}(i)},    (2)

y = \frac{(\sigma_0^2)^{m+n}}{(\sigma_1^2)^m (\sigma_2^2)^n}.    (3)

The "2 test for comparing two histograms as proposed byNagasaka and Tanaka [31] is given by Eq. (4), where lowvalue for "2 indicates a good match. The Kolmogorov–Smirnov test, given by Eq. (5), is based on the cumulative

distribution of the two sets of data. Let CHprev(j) andCHcurr(j) represent the cumulated number of entities upto the jth bin of the histograms of the previous and currentframe given by Hprev and Hcurr . A low value of D resultsfor a good match. The Kuiper test, de6ned in Eq. (6),is similar to the Kolmogorov–Smirnov test, but is moresensitive to the tails of the distributions:

"2 =n∑

i=1

(Hcurr(i)− Hprev(i))2

(Hcurr(i) +Hprev(i))2; (4)

D=maxj

|CHprev(j)− CHcurr(j)|; (5)

D=maxj

[CHprev(j)− CHcurr(j)]

+maxj

[CHcurr(j)− CHprev(j)]; (6)

d=(I −M)TA(I −M): (7)

The perceptual similarity metric [32], defined in Eq. (7), is similar to the quadratic distance function. Here I and M are histograms with n bins. Matrix A = [a_{ij}] contains similarity weighting coefficients between the colors corresponding to bins i and j. Other metrics seen in the literature are the Kantorovich metric (also referred to as the Hutchinson metric) [33], the Choquet integral distance [34], the Minkowski distance metric given by Eq. (8), three special cases of the L_M metric (L_1, L_2 and L_∞), and the Canberra distance metric shown in Eq. (9). The Czekanowski coefficient, defined in Eq. (10), is applicable only to vectors with non-negative components. In Eqs. (8)–(10), p is the dimension of the vector x̃_i and x_i^k is the kth element of x̃_i [35]:

d_M(i, j) = \left( \sum_{k=1}^{p} |x_i^k - x_j^k|^M \right)^{1/M},    (8)

d_c(i, j) = \sum_{k=1}^{p} \frac{|x_i^k - x_j^k|}{|x_i^k + x_j^k|},    (9)

d_z(i, j) = 1 - \frac{2 \sum_{k=1}^{p} \min(x_i^k, x_j^k)}{\sum_{k=1}^{p} (x_i^k + x_j^k)}.    (10)
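For reference, the measures above transcribe directly into NumPy. The sketch below implements Eqs. (1), (2) and (4) for histograms and Eqs. (8)–(10) for feature vectors; the small epsilon guards against division by zero are our addition.

import numpy as np

def hist_difference(h_curr, h_prev):                 # Eq. (1)
    return np.abs(h_curr - h_prev).sum()

def hist_intersection(h_curr, h_prev):               # Eq. (2)
    return np.minimum(h_curr, h_prev).sum() / h_prev.sum()

def chi_squared(h_curr, h_prev, eps=1e-12):          # Eq. (4)
    return (((h_curr - h_prev) ** 2) /
            ((h_curr + h_prev) ** 2 + eps)).sum()

def minkowski(x, y, m=2):                            # Eq. (8)
    return (np.abs(x - y) ** m).sum() ** (1.0 / m)

def canberra(x, y, eps=1e-12):                       # Eq. (9)
    return (np.abs(x - y) / (np.abs(x + y) + eps)).sum()

def czekanowski(x, y):                               # Eq. (10); requires x, y >= 0
    return 1.0 - 2.0 * np.minimum(x, y).sum() / (x + y).sum()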

Brunelli and Mich [36] present a study on the effectiveness of the use of histograms in image retrieval. Stricker [37] has studied the discrimination power of histogram based color indexing techniques. In his work he makes the observation that histogram based techniques work effectively only if the histograms are sparse. He also determines lower and upper bounds on the number of histograms that fit in the defined color space and suggests ways to determine the threshold based on these bounds.


3.1. Improving similarity results

Content based searches can be classified into three types: the target search, where a specific image is sought; the category search, where one or more images from a category are sought; and open-ended browsing, in which the user seeks an image by specifying visually important properties [38]. Histograms and other features can be used for target searches, while open-ended browsing can be achieved by allowing the user to select a visual item and finding other similar visual information. Relevance feedback has been proposed as a method for improving category based searches [39–42]. At each step the user marks the search results as relevant or irrelevant to the specified query. The image database learns from the user response and categorizes the images based on user input. Meilhac and Nastar [38] present the use of relevance feedback in category searches. The authors use a Parzen window estimator for density estimation and the Bayes inference rule to compute the probability of relevance versus the probability of non-relevance in the database. The strategy has been used in SurfImage, 12 an interactive content-based image retrieval system.
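As a schematic illustration of the Parzen-plus-Bayes strategy of Ref. [38], the sketch below scores a candidate feature vector against the user-marked relevant and irrelevant sets; the Gaussian kernel, the bandwidth h and the equal priors are assumptions made here for brevity, not details taken from the paper.

import numpy as np

def parzen_density(x, samples, h=0.1):
    # Parzen window estimate at x from the marked samples (one per row).
    if len(samples) == 0:
        return 1e-12
    d2 = ((samples - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * h * h)).mean()

def relevance_probability(x, relevant, irrelevant, h=0.1):
    # Bayes inference rule with equal priors: P(relevant | x).
    pr = parzen_density(x, relevant, h)
    pi = parzen_density(x, irrelevant, h)
    return pr / (pr + pi)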

Minka and Picard [43] observe that it is often difficult to select the right set of features to extract to make the image database robust. They use multiple feature models in a system that learns from the user queries and responses about the type of feature that would best serve the user. The problem of using multiple features is reduced by a multi-stage clustering and weight generation process. The stages closest to the user are trained the fastest, and the adaptations are propagated to earlier stages, improving overall performance with each use. A classification based approach using the Fisher linear discriminant and the Karhunen–Loeve transform is described in Ref. [44].

3.2. Determining visual semantics

Santini and Jain [45] present a study on some of the problems found in attempting to extract meaning from images in a database. They argue that image semantics is a poorly defined entity. It is more fruitful to derive a correlation between the intended user semantics and the simple perceptual clues that can be extracted. It is contended that the user should have a global view of the placement of images in the database and also be able to manipulate its environment in a simple and intuitive way. They point out that specifying "more texture" and "less color" is not intuitive to the user in the effect it will have on the database response to the query. This places every image in the context of other images that, given the current similarity criterion, are similar to it. Two interfaces

12 SurfImage Demo: http://www-rocq.inria.fr/cgi-bin/imedia/surfimage.cgi.

are proposed for meaningful results to be possible, viz., the direct manipulation interface and the visual concepts interface.

Direct manipulation allows the manipulation of the image locations in various clusters, thereby forming new semantics and allowing different similarity interpretations. Such an interface requires three kinds of spaces to be defined, viz., the feature space, the query space and the display space. When specifying visual concepts, the user can select equivalent images from the database and place them in a bucket, thus creating new semantics. The equivalency of the images in the bucket is decided by the user on a concept meaningful to the current application. But this equivalency could be in a high dimensional feature space, and mapping it to a display space could be a challenging task. For such a specification to be possible, the database must allow arbitrary similarity measures to be accommodated and also be able to make optimal use of the semantics induced by the use of such visual concepts.

3.3. Similarity measures

Similarity measures for selecting images based on characteristics of human similarity measurement are studied in Ref. [46]. The authors apply results from human psychological studies to develop a metric for qualifying image similarity. The basis for their work is the need to find a similarity measure relatively independent of the feature space used. They state that if S_A and S_B are two stimuli, represented as vectors in a space of suitable dimension, then the similarity between the two can be measured by a psychological distance function d(S_A, S_B). They introduce the difference between perceived dissimilarity d and judged dissimilarity δ. The two are related by a monotonically decreasing function g given by δ(S_A, S_B) = g[d(S_A, S_B)]. The requirements of the distance function are constancy of self-similarity, minimality, and symmetrical distance between the stimuli.

fuzzy set-theoretic similarity measures. They also intro-duce three operators which are similar in nature to theand, or and not operators used in relational databases butare able to powerfully express complex similarity queries.It is shown that the assumption made by most similaritybased methods that the feature space is Euclidean is in-correct. They analyze some similarity measures proposedin the psychological literature to model human similar-ity perception, and show that all of them challenge theEuclidean assumption in non-trivial ways. The sugges-tion is that similarity criteria must work not only for im-ages very similar to the query, as would be the case formatching, but for all the images reasonably similar to thequery. The global characteristics of the distance measureused are much more important in this case.


The propositions used to assess the similarity are predicates. For textures, a wavelet transform of the image is taken, and the energy (sum of the squares of the coefficients) in each sub-band is computed and included in the feature vector. To apply the similarity theory, a limited number of predicate-like features such as luminosity, scale, verticality and horizontality are computed. The distance between two images can be defined according to the fuzzy Tversky model:

S(\chi, \psi) = \sum_{i=1}^{p} \max\{\mu_i(\chi), \mu_i(\psi)\} - \alpha \sum_{i=1}^{p} \max\{\mu_i(\chi) - \mu_i(\psi), 0\} + \beta \sum_{i=1}^{p} \max\{\mu_i(\psi) - \mu_i(\chi), 0\},    (11)

where p is the number of predicates, μ is the truth value vector formed on the predicates applied to images χ and ψ, and α and β are constants which decide the importance of the results.
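Transcribed directly, Eq. (11) as reconstructed above becomes the short sketch below; mu_chi and mu_psi hold the truth values of the p predicates evaluated on the two images, and alpha and beta are the weighting constants.

import numpy as np

def fuzzy_tversky(mu_chi, mu_psi, alpha=0.5, beta=0.5):
    # Eq. (11): shared predicate strength minus/plus the weighted
    # asymmetric difference terms, as reconstructed above.
    common = np.maximum(mu_chi, mu_psi).sum()
    chi_not_psi = np.maximum(mu_chi - mu_psi, 0).sum()
    psi_not_chi = np.maximum(mu_psi - mu_chi, 0).sum()
    return common - alpha * chi_not_psi + beta * psi_not_chi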

4. Pattern recognition methods for image databases

The methods described in the literature have been found to use three types of features extracted from the image: color based, shape based and texture based. Some systems use a combination of features to index the image database.

4.1. Color based features

Color has been the most widely used feature in CBIR systems. It is a strong cue for retrieval of images and is also computationally the least intensive. Color indexing methods have been studied using many color spaces, viz., RGB, YUV, HSV, L*u*v*, L*a*b*, the opponent color space and the Munsell space. The effectiveness of color lies in it being an identifying feature that is local to the image and largely independent of view and resolution. The major obstacles that must be overcome to find a good image match are variation in viewpoint, occlusion, and varying image resolution. Swain and Ballard [47] use histogram intersection to match the query image and the database image. Histogram intersection is robust against these problems, and they describe a preprocessing method to overcome changes in lighting conditions.

In the PICASSO system [48] the image is hierarchically organized into non-overlapping blocks. The clustering of these blocks is then done on the basis of color energy in the L*u*v* color space. Another approach, similar to the Blobworld project, is to segment the image into regions by segmenting it into non-overlapping blocks of a fixed size and merging them on overall similarity [49]. A histogram is then formed for each such region, and a family of histograms describes the image. Image matching is done in two ways: chromatic matching is done by histogram intersection, and geometric matching is done by comparing the areas occupied by the two regions. The similarity score is the weighted average of the chromatic and geometric matching. Other color based features derived from block based segmentation of the image are the mean, standard deviation, RMS, skew and kurtosis of the image pixels I(p) in a window [50]. These are represented using a five-dimensional feature vector. The image similarity is then a weighted Euclidean distance between the corresponding feature vectors. The similarity score using these statistical feature vectors is given by

S_{stat}(A, B) = \sqrt{ \sum_{f=1}^{F} w_f \, (d_f(f, A) - d_f(f, B))^2 }.    (12)

In Eq. (12), d_f(f, I) denotes the value of the distribution of feature f in image I, F denotes the number of features used to describe the color distribution, and w_f is the weight of the fth feature, which is larger for features of smaller variance. Lin et al. [51] use a 2D pseudo-hidden Markov model (2D-PHMM) for color based CBIR. The reason for using a 2D-PHMM is its flexibility in image specification and partial image matching. Two 1-D HMMs are used to represent the rows and the columns of the quantized color images. Müller et al. [52] present an improvement by combining shape and color into a single statistical model using HMMs for rotation invariant image retrieval.
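A minimal sketch of this block-statistics scheme, assuming gray level windows and inverse-variance weights supplied by the caller, is given below.

import numpy as np

def window_stats(window):
    # Five-dimensional statistical feature vector of a gray level window:
    # mean, standard deviation, RMS, skew and kurtosis.
    p = window.astype(float).ravel()
    mu, sd = p.mean(), p.std()
    rms = np.sqrt((p ** 2).mean())
    skew = ((p - mu) ** 3).mean() / (sd ** 3 + 1e-12)
    kurt = ((p - mu) ** 4).mean() / (sd ** 4 + 1e-12)
    return np.array([mu, sd, rms, skew, kurt])

def s_stat(feats_a, feats_b, weights):
    # Weighted Euclidean distance of Eq. (12).
    return np.sqrt((weights * (feats_a - feats_b) ** 2).sum())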

Pass et al. [53] have developed color coherence histograms to overcome the matching problems with standard color histograms. Color coherence vectors (CCV) include spatial information along with the color density information. Each histogram bin j is replaced with a tuple (α_j, β_j), where α is the number of pixels of the color in bin j that are coherent, i.e., belong to a contiguous region of largely that color. The β number indicates the number of incoherent pixels of that color. They also propose the use of joint histograms for determining image similarity. A joint histogram is defined as a multi-dimensional histogram where each entry counts the number of pixels in the image that share some particular set of features such as color, edge density, texture, gradient magnitude and rank. Huang et al. [54] propose the use of color correlograms over CCVs. A color correlogram expresses the spatial correlation of pairs of colors with distance. It is defined as

\gamma^{(k)}_{c_i, c_j}(I) \triangleq \Pr_{p_1 \in I_{c_i},\, p_2 \in I} \left[ \, p_2 \in I_{c_j} \,\middle|\, |p_1 - p_2| = k \, \right].    (13)


Eq. (13) gives the probability that, in the image I, the color of the pixel p_2 at distance k from the pixel p_1 with color c_i will be c_j. This captures the spatial correlation between colors in an image, making the measure robust against changes of viewpoint, zoom-in, zoom-out, etc.
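As an illustration, the brute-force sketch below estimates an auto-correlogram, the common same-color restriction of Eq. (13), on an image of quantized color labels; sampling only four points of each L-infinity ring is a simplification made here for brevity (the full ring has 8k pixels).

import numpy as np

def autocorrelogram(labels, n_colors, ks=(1, 3, 5)):
    # labels: 2-D array of quantized color indices.
    H, W = labels.shape
    out = np.zeros((n_colors, len(ks)))
    for ki, k in enumerate(ks):
        for c in range(n_colors):
            ys, xs = np.nonzero(labels == c)
            hits = total = 0
            for y, x in zip(ys, xs):
                # Probe four extremes of the L-infinity ring at distance k.
                for dy, dx in ((0, k), (0, -k), (k, 0), (-k, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        total += 1
                        hits += labels[ny, nx] == c
            out[c, ki] = hits / total if total else 0.0
    return out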

Tao and Grosky [55] propose a Delaunay triangulation based approach to classifying image regions based on color. Cinque et al. [56] use spatial-chromatic histograms for determining image similarity. Each entry in this histogram consists of a tuple with three entries: the probability of the color, the barycenter of the pixels with the same color, which is akin to the centroid, and the standard deviation in the spread of the pixels of the same color in the image. The image matching is done by separately considering the color and spatial information about the pixels in a weighted function. Chahir and Chen [57] segment the image to determine color-homogeneous objects and use the spatial relationships between these regions for image retrieval. Smith and Li [58] use composite region templates (CRTs) to identify and characterize the scenes in the images by the spatial relationships of the regions contained in the image. The image library pools the CRTs from each image and creates classes based on the frequency of the labels. Bayesian inference is used to determine similarity. Brambilla et al. [59] use multi-resolution wavelet transforms in the modified L*u*v* space to obtain image signatures for use in content based image retrieval applications. The multi-resolution wavelet decomposition is performed using the Haar wavelet transform by filtering consecutively along horizontal and vertical directions. Image similarity is determined through a supervised learning approach.

Vailaya et al. [60] propose classification of images into various classes which can be easily identified. They use Bayes decision theory to classify vacation photographs into indoor versus outdoor and scenery versus building images. The class-conditional probability density functions are estimated under a vector quantization (VQ) framework. An MDL-type principle is used to determine the optimal size of the vector codebook. Androutsos et al. [61] drive home the importance of being able to specify colors which should be excluded from the image retrieval query. Most color based image retrieval systems allow users to specify colors in the target image, but do not handle specifications for color exclusion. The authors index images on color vectors in the RGB color space defining regions in the image. The similarity metric is based on the angular distance between the query vector and the target vector and is given by

\beta(x_i, x_j) = \exp\left( -\alpha \left( 1 - \left[ 1 - \frac{2}{\pi} \cos^{-1}\left( \frac{x_i \cdot x_j}{|x_i| \, |x_j|} \right) \right] \left[ 1 - \frac{|x_i - x_j|}{\sqrt{3 \cdot 255^2}} \right] \right) \right).    (14)

In Eq. (14), x_i and x_j are the three-dimensional color vectors being compared, α is a design parameter, and 2/π and √(3 · 255²) are normalizing factors. A vector defining the minimum such distance for each query color is formed. This vector then defines the multi-dimensional query space. From the target images selected by this process, those images which have minimum distance to the excluded colors are then removed.

D^M_{q,i} = |\tilde{f}_q - \tilde{f}_i| = \sum_{R,G,B} |\mu_q - \mu_i|,    (15)

D^E_{q,i} = \sqrt{ (\tilde{f}_q - \tilde{f}_i)^2 } = \sqrt{ \sum_{R,G,B} (\mu_q - \mu_i)^2 }.    (16)

While histogram and other matching methods are very effective for color based matching, color clustering provides a healthy alternative. In Refs. [62–64], several approaches to color based image retrieval are discussed. The distance method uses a feature vector formed of the mean values obtained from the 1-D histograms (normalized for the number of pixels) of each color component R, G and B, given by f̃ = (μ_R, μ_G, μ_B), where μ_i is the mean of the distribution of color component i. The distance between the query image q with feature vector f̃_q and the database image i with feature vector f̃_i is computed using the Manhattan distance measure, given by D^M_{q,i}, and the Euclidean distance measure, given by D^E_{q,i}, as shown in Eqs. (15) and (16), respectively. If all the colors of the images in the database can be perceptually approximated, the reference color method can be used. In this case the feature vector is given by f̃ = (ω_1, ω_2, ..., ω_n), where ω_i is the relative pixel frequency of the reference color i. A weighted Euclidean distance measure is used to compute the similarity between the query image and the database image. The drawback of the reference color table based method is that one has to know all the colors of all the images in the database. This makes modifications to the database difficult and is also infeasible for real world images and large archives. In Ref. [63], the authors present a color clustering based approach, in the L*u*v* space, to the determination of image similarity. Such approaches are good for images with few dominant colors. Two classifiers are discussed for image matching: the minimum distance classifier adjusted for pixel population, and a classifier based on the Markov random field process. The latter uses the spatial correlation property of image pixels. This method was found to be an improvement over methods proposed earlier and does very well on color images without requiring a priori information about the images. Kankanhalli et al. [65] present a method for combining color and spatial clustering for image retrieval. Some other methods for clustering color images can be found in Refs. [66,67]. A comparison and study of color clustering for image retrieval is presented in Refs. [68,69].
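The distance method reduces to a few lines; the sketch below computes the mean-color vector f̃ = (μ_R, μ_G, μ_B) and the distances of Eqs. (15) and (16).

import numpy as np

def mean_color_vector(image):
    # image: H x W x 3 RGB array -> (mu_R, mu_G, mu_B).
    return image.reshape(-1, 3).mean(axis=0)

def manhattan_distance(fq, fi):                      # Eq. (15)
    return np.abs(fq - fi).sum()

def euclidean_distance(fq, fi):                      # Eq. (16)
    return np.sqrt(((fq - fi) ** 2).sum())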

S. Antani et al. / Pattern Recognition 35 (2002) 945–965 953

Hierarchical color clustering has been proposed as a more robust approach to indexing and retrieval of images [70,71]. Several problems have been identified with traditional histogram based similarity measures. The computational complexity is directly related to the histogram resolution. Color quantization reduces this complexity, but also causes loss of information and suffers from increased sensitivity to quantization boundaries. Wang and Kuo [70] develop a hierarchical feature representation and comparison scheme for image retrieval. The similarity between images is defined as the distance between the features at the kth resolution.

4.2. Shape based features

Different approaches are taken to matching shapes by CBIR systems. Some researchers have projected their use as a matching tool in QBE type queries. Others have projected their use for query-by-user-sketch type queries. The argument for the latter is that in a user sketch the human perception of image similarity is inherent, so the image matching sub-system does not need to develop models of human measures of similarity. One approach adopts the use of deformable image templates to match user sketches to the database images [72–74]. Since the user sketch may not be an exact match of the shape in the database, the method elastically deforms the user template to match the image contours. An image for which the template has to undergo minimal deformation, or loses minimum energy, is considered the best match. A low match means that the template is lying in areas where the image gradient is 0. By maximizing the matching function and minimizing the elastic deformation energy, a match can be found. The distance between two images (or image regions) is given by

D(T, I, u) = \int\!\!\int_S (T(u_1(x_1, x_2), u_2(x_1, x_2)) - I(x_1, x_2))^2 \, d(x_1, x_2),    (17)

J(f) = \int\!\!\int_S ((f_{x_1 x_1})^2 + 2(f_{x_1 x_2})^2 + (f_{x_2 x_2})^2) \, d(x_1, x_2),    (18)

F = \lambda (J(u_1) + J(u_2)) + D(T, I, u).    (19)

In Eq. (17), x_1 and x_2 are the coordinates of some point on the grid on surface S, and u = (u_1, u_2) defines the deforming function causing the template and the target image to match, resulting in new coordinates given by u_1(x_1, x_2) and u_2(x_1, x_2) for template T and image I. The total amount of bending of the surface defined by (x_1, x_2, f(x_1, x_2)) is measured as in Eq. (18). The deformation energy is thus J(u_1) + J(u_2). The balance between the amount of warp and the energy associated with it is given by Eq. (19), where λ controls the stiffness of the template. Adoram and Lew [75] use gradient vector flow (GVF) based active contours (snakes) to retrieve objects by shape. They note that deformable templates are highly dependent on their initialization and are unable to handle concavities. The authors present results obtained by combining GVF snakes with invariant moments. Günsel and Tekalp [76] define a shape similarity based directly on the elements of the mismatch matrix derived from the eigenshape decomposition. A proximity matrix is formed using the eigenshape representation of objects. The distance between the eigenvectors of the query and target object proximity matrices forms the mismatch matrix. The elements of the mismatch matrix indicate the matched feature points. These are then used to determine the similarity between the shapes.

A different approach to CBIR based on shape has been through the use of implicit polynomials for effective representation of geometric shape structures [77]. Implicit polynomials are robust, stable and exhibit invariant properties. The method is based on fitting a polynomial to a curve patch. A vector consisting of the parameters of this curve is used to match the image to the query. A typical database would contain the boundary curve vectors at various resolutions to make the matching robust. Alferez and Wang [78] present a method to index shapes which is invariant to affine transformations, rigid-body motion, perspective transforms and changes in illumination. They use a parameterized spline and wavelets to describe the objects. Petrakis and Milios [79] use a dynamic programming based approach for matching shapes at various levels of shape resolution. Mokhtarian and Abbasi [80] apply curvature scale space based matching for retrieval of shapes under affine transform.

Sharvit et al. [81] propose the use of shock structures to describe shapes. They describe the symmetry-based representation as one which retains the advantages of the local, edge-based correlation approaches, as well as of global deformable models. It is termed an intermediate representation. Two benefits of this approach have been outlined: the computation of similarity between shapes, and the hierarchical symmetries captured in a graph structure. Rui et al. [82] propose the use of multiple matching methods to make the retrieval robust. They define the requirements of the parameters as invariance and compact form of representation. The authors define a modified Fourier descriptor (MFD) which is an interpolated form of the low frequency coefficients of the Fourier descriptor normalized to unit arc-length. They also calculate the orientation of the major axis. The matching of the images is then done using the Euclidean distance, MFD matching, the Chamfer distance and the Hausdorff distance. Although these matching tools have been used in this system, they can also be used to match shapes which have been specified using other appropriate descriptors.
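As a baseline illustration of Fourier descriptor matching (a generic variant, not the exact MFD normalization of Rui et al.), the sketch below drops the DC term for translation invariance, divides by the first harmonic's magnitude for scale invariance, and keeps only magnitudes so that rotation and starting point do not matter.

import numpy as np

def fourier_descriptor(boundary, n_coeffs=16):
    # boundary: (N, 2) array of ordered (x, y) contour points, N > n_coeffs.
    z = boundary[:, 0] + 1j * boundary[:, 1]
    F = np.fft.fft(z)
    return np.abs(F[1:n_coeffs + 1]) / (np.abs(F[1]) + 1e-12)

def shape_distance(b1, b2, n_coeffs=16):
    # Euclidean distance between normalized descriptor vectors.
    d1, d2 = (fourier_descriptor(b, n_coeffs) for b in (b1, b2))
    return np.sqrt(((d1 - d2) ** 2).sum())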


Jain and Vailaya present a study of shape based retrieval methods with respect to trademark image databases [83]. An invariant-based shape retrieval approach has been presented by Kliot and Rivlin [84]. Semi-local multi-valued invariant signatures are used to describe the images. Such a representation, when used with containment trees, a data structure introduced by the authors, allows for matching shapes which have undergone a change in viewpoint or are under partial occlusion. It also allows retrieval by sketch. The invariant shape reparameterization is done by applying various transforms (translation, rotation, scale) to the curve signature.

Translation, rotation and scale invariance, which is imperative for shape based retrieval, can also be achieved through the use of Fourier–Mellin descriptors. Derrode et al. [85] base their system on these and describe them as stable under small shape distortions and numerical approximations. The analytical Fourier–Mellin transform (AFMT) is used to extract the Fourier–Mellin descriptors. For an irradiance function f(r, θ) representing a gray level image over R², with the origin of the polar coordinates located at the image centroid, the AFMT of f is given in Eq. (20). For all k in Z, v in R, and σ > 0, f is assumed to be square summable under the measure dr dθ / r:

M_f^\sigma(k, v) = \frac{1}{2\pi} \int_0^{\infty} \int_0^{2\pi} f(r, \theta) \, r^{\sigma - iv} \, e^{-ik\theta} \, d\theta \, \frac{dr}{r}.    (20)
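A naive numerical reading of Eq. (20) samples the image on a polar grid about its centroid and accumulates the integrand; the discretization below is purely illustrative and makes no claim to the numerical care taken in Ref. [85].

import numpy as np

def afmt_coefficient(image, k, v, sigma=0.5, n_r=64, n_t=128):
    # image: 2-D gray level array; returns the complex AFMT value M(k, v).
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = image.sum() + 1e-12
    cy, cx = (ys * image).sum() / total, (xs * image).sum() / total
    r_max = min(cy, cx, H - 1 - cy, W - 1 - cx)
    rs = np.linspace(r_max / n_r, r_max, n_r)        # skip r = 0
    ts = np.linspace(0, 2 * np.pi, n_t, endpoint=False)
    dr, dt = rs[1] - rs[0], ts[1] - ts[0]
    acc = 0.0 + 0.0j
    for r in rs:
        yy = np.clip(np.round(cy + r * np.sin(ts)).astype(int), 0, H - 1)
        xx = np.clip(np.round(cx + r * np.cos(ts)).astype(int), 0, W - 1)
        f = image[yy, xx]
        # f(r, theta) * r^(sigma - iv) * exp(-ik theta) * dtheta * (dr / r)
        acc += (f * np.exp(-1j * k * ts)).sum() * dt \
               * (r + 0j) ** (sigma - 1j * v) * dr / r
    return acc / (2 * np.pi)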

4.3. Texture based features

The visual characteristics of homogeneous regions of real-world images are often identified as texture. These regions may contain unique visual patterns or spatial arrangements of pixels which gray level or color in a region alone may not sufficiently describe. Typically, textures have been found to have strong statistical and/or structural properties. Textures have been expressed using several methods. One system uses the quadrature mirror filter (QMF) representation of the textures on a quad-tree segmentation of the image [86]. Fisher's discriminant analysis is then used to determine a good discriminant function for the texture features. The Mahalanobis distance is used to classify the features. An image can be described by means of different orders of statistics of the gray values of the pixels inside a neighborhood [87]. The features extracted from the image histogram, called the first order features, are mean, standard deviation, third moment and entropy. The second order features are homogeneity, contrast, entropy, correlation, directionality and uniformity of the gray level pixels. Also included is the use of several other third order statistics from run-length matrices. A vector composed of these features is then classified based on the Euclidean distance.

The Gabor filter and wavelets can also be used to generate the feature vector for the texture [88]. The assumption is that the texture regions are locally homogeneous. The feature vector is then constructed from multiple scales and orientations, comprising the means and standard deviations at each (a sketch follows below). Other work by Puzicha et al. [89] uses Gabor filters to generate the image coefficients. The probability distribution function of the coefficients is used in several distance measures to determine image similarity. The distances used are the Kolmogorov–Smirnov distance, the χ²-statistic, the Jeffrey divergence, and weighted mean and variance, among others. Mandal et al. [90] propose a fast wavelet histogram technique to index texture images. The proposed technique reduces the computation time compared to other wavelet histogram techniques. A rotation, translation and scale invariant approach to content based retrieval of images is presented by Milanese and Cherbuliez [91]. The Fourier power spectrum coefficients transformed to a logarithmic–polar coordinate system are used as the image signature for image matching. The image matching is done using the Euclidean distance measure. Strobel et al. [92] have developed a modified maximum a posteriori (MMAP) algorithm to discriminate between hard to separate textures. The authors use the normalized first order features, two circular Moran autocorrelation features and Pentland's fractal dimension, to which local mean and variance have been added. The Euclidean distance metric is used to determine the match. To decorrelate the features, the singular value decomposition (SVD) algorithm is used.
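The Gabor-energy features referenced above (means and standard deviations of filter responses over a bank of scales and orientations) can be sketched as follows; the cosine kernel form, the frequency choices and the use of scipy.signal.convolve2d are assumptions of this sketch, not specifics from Ref. [88].

import numpy as np
from scipy.signal import convolve2d  # assumed available

def gabor_kernel(freq, theta, sigma=2.0, size=15):
    # Even (cosine) Gabor kernel at the given frequency and orientation.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr)

def gabor_features(image, freqs=(0.1, 0.2, 0.4), n_theta=4):
    # Mean and standard deviation of the response at each scale/orientation.
    feats = []
    for f in freqs:
        for t in range(n_theta):
            kern = gabor_kernel(f, np.pi * t / n_theta)
            resp = convolve2d(image, kern, mode='same', boundary='symm')
            feats += [np.abs(resp).mean(), resp.std()]
    return np.array(feats)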

In the Texture Photobook of Pentland et al. [93], the authors use the model developed by Liu and Picard [94] based on the Wold decomposition for regular stationary stochastic processes in 2D images. This model generates parameters that are close if the images are similar. If the image is treated as a homogeneous 2D discrete random field, then the 2D Wold-like decomposition is a sum of three mutually orthogonal components which qualitatively appear as periodicity, directionality and randomness, respectively. The Photobook consists of three stages. The first stage determines if there is a strong periodic structure. The second stage of processing occurs for periodic images on the peaks of their Fourier transform magnitudes. The third stage of processing is applied when the image is not highly structural. The complexity component is modeled by use of a multi-scale simultaneous autoregressive (SAR) model. The SAR parameters of different textures are compared using the Mahalanobis distance measure.

4.4. Combination of features

Several approaches take advantage of the different features by applying them together. The features are often combined into a single feature vector. The distance between the images is then determined by the distance between the feature vectors.


Zhong and Jain [95] present a method for combining color, texture and shape for retrieval of objects from an image database without preindexing the database. Image matching is done in the discrete cosine transform (DCT) domain (assuming that the stored images are in the JPEG 13 format). Color and texture are used to prune the database, and then deformable template matching is performed to identify images that have the specified shape. Mojsilovic et al. [96] propose a method which combines the color and texture information for matching and retrieval of images. The authors extract various features such as the overall color, directionality, regularity, purity, etc., and build a rule based grammar for identifying each image:

w_i = \left( 0.5 + \frac{0.5 \, ff}{\max\{ff\}} \right) \log\left( \frac{N}{n} \right),    (21)

S(Q, D) = \frac{\sum_{k=1}^{t} \min\{w_{Q_k}, w_{D_k}\}}{\sum_{k=1}^{t} w_{Q_k}}.    (22)

The PicToSeek system developed by Gevers and Smeulders uses a combination of color, texture and shape features for image retrieval. The images are described by a set of features and the weights associated with them. The weights are computed as the product of the feature's frequency and the inverse collection frequency factor, given by Eq. (21), where the feature frequency is given by ff, N is the number of images in the database, and n denotes the number of images to which a feature value is assigned. The image similarity is determined by Eq. (22), where Q is the query image vector of features and associated weights and D is the database image vector, each having t features.
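Eqs. (21) and (22) transcribe directly; here ff holds the per-feature frequencies of one image, N the database size, and n the per-feature count of images in which the feature value occurs.

import numpy as np

def feature_weights(ff, N, n):                       # Eq. (21)
    return (0.5 + 0.5 * ff / ff.max()) * np.log(N / n)

def weighted_similarity(wq, wd):                     # Eq. (22)
    return np.minimum(wq, wd).sum() / wq.sum()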

Sclaroff et al. [97] propose combining visual and textual cues for retrieval of images from the World Wide Web. Words surrounding an image in an HTML document, or those included in the title, etc., may serve to describe the image. This information is associated with the image through the use of the latent semantic indexing technique. A normalized color histogram in the quantized L*u*v* space and the texture orientation distribution using steerable pyramids form cues that are used to visually describe each image. The aggregate visual vector is composed of 12 vectors that are computed by applying color and texture features to the entire image and five sub-image regions. It is assumed that the vectors are independent and have a Gaussian distribution. SVD is applied to these vectors over a randomly chosen subset of the image database, and the average and covariance values for each subvector are computed. The dimensionality of the vector components is reduced by using principal component analysis. The final global similarity measure is a linear combination of the text and visual cues in a

13 JPEG: Joint Photographic Experts Group, http://www.jpeg.org.

normalized Minkowski distance. Relevance feedback is used to weight queries. Berman and Shapiro describe a flexible image database system (FIDS) [98] with the aim of reducing the computation time required to calculate the distance measures between images in a large database. FIDS presents the user with multiple measures for determining similarity and presents a method for combining these measures. The triangle inequality is used to minimize the index searches, which are initially performed using keys which describe certain image features. The distance measures in FIDS compare the color, the local binary partition texture measure, edges, wavelets, etc.

5. Pattern recognition methods for video databases

Pattern recognition methods are applied at several stages in the generation of video indices and in video retrieval. Video data has the feature of time in addition to the spatial information contained in each frame. Typically, a video sequence is made up of several subsequences, which are uniform in overall content between their start and end points marked by cuts or shot changes. A scene change is defined as a point where the scene content changes, e.g. an indoor to an outdoor scene. Sometimes these end points may also be gradual changes, called by the generic name gradual transitions. Gradual transitions are slow changes from the scene in one subsequence to the next that are further classified as blends, wipes, dissolves, etc. Thus, a video database can be indexed at the cut (or transition) points. Some methods cluster similar frames, while others take the approach of finding peaks in the difference plot. This process of finding scene changes in a video sequence is also called indexing. Some methods go beyond this step by determining the camera motion parameters and classifying them as zooms, pans, etc. It has been observed that if the subsequences in the video clips can be identified and the scene change points marked, then mosaicing the data within a subsequence can provide a representation suitable for efficient retrieval. A video storyboard can be created by selecting a representative (key) frame from each shot. Visualization of the pictorial content and generation of storyboards has been presented by Yeung and Yeo [99]. They define video visualization as a joint process of video analysis and subsequent generation of a representative visual presentation for viewing and understanding purposes.

5.1. Video shot detection techniques

The simplest, and probably the most noisy, method for detecting scene changes is to perform pixel level differencing between two succeeding video frames. If the percentage change in the pixels is greater than a preset threshold, then a cut is declared. Another simple method is to perform the likelihood ratio test [3], given by Eq. (23). In this, each frame is divided into k equally sized blocks. Then, the mean (μ) and variance (σ²) for each block are calculated. If the likelihood λ is greater than a threshold T, then the count of the blocks changed is incremented. The process is repeated for all the blocks. If the number of blocks changed is greater than another threshold T_B, then a scene break is declared. This method is a little more robust than the pixel differencing method, but is insensitive to two blocks being different yet having the same density functions:

\lambda = \frac{\left[ (\sigma_i^2 + \sigma_{i+1}^2)/2 + \left( (\mu_i - \mu_{i+1})/2 \right)^2 \right]^2}{\sigma_i^2 \, \sigma_{i+1}^2}. \qquad (23)
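A direct implementation of this test might look as follows (grayscale frames, illustrative thresholds; a small constant guards against zero variance in flat blocks):

```python
import numpy as np

def likelihood_ratio_cut(frame_a, frame_b, k=4, T=3.0, TB=6):
    """Block-based likelihood ratio test of Eq. (23).
    Frames are split into a k x k grid; a block is counted as changed when
    its likelihood ratio exceeds T, and a cut is declared when more than
    TB blocks have changed."""
    h, w = frame_a.shape
    bh, bw = h // k, w // k
    changed = 0
    for r in range(k):
        for c in range(k):
            a = frame_a[r*bh:(r+1)*bh, c*bw:(c+1)*bw].astype(float)
            b = frame_b[r*bh:(r+1)*bh, c*bw:(c+1)*bw].astype(float)
            var_a, var_b = a.var(), b.var()
            num = ((var_a + var_b) / 2 + ((a.mean() - b.mean()) / 2) ** 2) ** 2
            den = var_a * var_b + 1e-9  # guard against flat blocks
            if num / den > T:
                changed += 1
    return changed > TB
```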

Many methods have been developed that use the color and/or intensity content of the video frame to classify scene changes [100]. Smith and Kanade [24] describe the use of a comparative histogram difference measure for marking scene breaks; the peaks in the difference plot are detected using a threshold. Zabih et al. [101] detect cuts, fades and dissolves by counting the new and old edge pixels. Hampapur et al. [102] use chromatic scaling to determine the locations of cuts, dissolves, fades, etc. Kim et al. [103] present an approach to scene change detection that applies morphological operations to gray scale and binarized forms of the frames in a video sequence.
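As an illustration of the histogram differencing family above, a minimal gray-level sketch; the published methods operate on color histograms with a variety of comparison measures:

```python
import numpy as np

def histogram_difference(frame_a, frame_b, bins=64):
    """Sum of absolute bin-to-bin differences of gray-level histograms,
    normalized by frame size; peaks in this signal over time suggest cuts."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return np.abs(ha - hb).sum() / frame_a.size
```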

DCT coefficients from compressed video frames are also used to determine dissimilarity between frames [104–110]. One method applies standard statistical measures such as the Yakimovsky test, χ² test, and the Kolmogorov–Smirnov test to the histograms of average block intensities to detect shot changes. Another uses the error power of the frame as a feature to determine differences with other frames. The ratio of intra-coded to inter-coded macroblocks and the ratio of backward to forward prediction motion vectors can also be used to detect shot changes in P- and B-frames. An intensity difference plot can also be used to detect gradual transitions; for these the ideal curve is parabolic, with a dip in the curve marking the center of a dissolve. Such a curve is expected because the variance of frame intensity gradually increases, then stabilizes, and gradually decreases into the new shot.
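A rough sketch of a dissolve detector built on this variance behavior, with window and drop parameters of our own choosing rather than published values:

```python
import numpy as np

def find_dissolve_centers(frames, window=15, min_drop=0.25):
    """Mark frame t as a dissolve center if the intensity variance at t is
    the local minimum of its window and sits well below the variance at
    both window ends, mirroring the parabolic dip described above."""
    variances = np.array([f.var() for f in frames])
    centers = []
    for t in range(window, len(frames) - window):
        left, right, mid = variances[t - window], variances[t + window], variances[t]
        window_min = variances[t - window:t + window + 1].min()
        if mid == window_min and mid < (1 - min_drop) * min(left, right):
            centers.append(t)
    return centers
```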

Hanjalic and Zhang [111] choose to minimize the average detection error probability criterion for detecting shot changes in video. The problem is reduced to deciding between two hypotheses: no shot boundary, or shot boundary. They present a statistical framework for detecting shots and a variety of gradual transitions. The a priori information used in the error probability function is derived from statistical studies of the shot lengths of a large number of motion pictures. These are fit to a Poisson function controlled by a parameter that is varied for different video types. Other shot detection methods are found in Refs. [112–116]. A comparison of various shot detection metrics is presented in [10,117–119].

5.2. Motion based video classification

The methods described in this section detect motion from compressed and uncompressed video. The motion information is used to detect shot changes (cuts), detect activity in the video, or detect camera motion parameters (such as zooms, pans, etc.). Video sequences can be temporally segmented into shots using local and global motion features, which can then be used for determining shot similarity [120].

Akutsu et al. [121] have developed a motion vector based video indexing method. Motion vectors refer to the vectors generated by a feature moving from one location in a frame to another location in a succeeding frame. Cut detection is based on the average inter-frame correlation coefficient computed from the motion vectors. Two measures are proposed to determine the coefficient: the first is the inter-frame similarity based on motion; the other is the ratio of velocity to the motion distance in each frame, which represents motion smoothness. Camera operations are detected by applying motion analysis to the motion vectors using the Hough transform. A spatial line in the image space is represented as a point in the Hough space; conversely, a sinusoid in the Hough space represents a group of lines in the image space intersecting at the same point. In this case, the lines intersect at the convergence/divergence point of the motion vectors. Thus, if the Hough transform of the motion vectors is least-squares fit to a sinusoid, the point of divergence/convergence of the vectors indicates the type of camera motion. Another method for detecting video zooms through camera motion analysis is presented by Fischer et al. [122]. Other block based motion estimation techniques for detecting cuts that operate on uncompressed video can be found in Refs. [123,124]. In the Informedia Project [24], trackable features of the object are used to form flow vectors; the mean length, mean phase and phase variance of the flow vectors determine whether the scene is a static scene, a pan or a zoom. MPEG motion vectors are also used to detect the type of camera motion [106,125]. MPEG motion flow extraction is simple and fast compared to optical flow and is suited for dense motion flows since each motion vector covers a 16 × 16 sub-image. Fatemi et al. [126] detect cuts and gradual transitions by dividing the video sequence into (overlapping) subsequences of 3 consecutive frames with a fourth predicted frame.
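As an illustration of the flow-statistics heuristic used in the Informedia Project, the following sketch classifies a shot from its flow vectors, assumed given as (dx, dy) pairs; the thresholds are illustrative:

```python
import numpy as np

def classify_camera_motion(flows, mag_thresh=0.5, phase_var_thresh=0.5):
    """flows: (N, 2) array of (dx, dy) flow vectors for tracked features.
    Small mean magnitude suggests a static scene; coherent flow directions
    suggest a pan; widely spread directions suggest a zoom."""
    flows = np.asarray(flows, dtype=float)
    magnitudes = np.hypot(flows[:, 0], flows[:, 1])
    if magnitudes.mean() < mag_thresh:
        return "static"
    phases = np.arctan2(flows[:, 1], flows[:, 0])
    # circular variance of the flow directions: 0 = perfectly coherent
    phase_var = 1 - np.hypot(np.cos(phases).mean(), np.sin(phases).mean())
    return "pan" if phase_var < phase_var_thresh else "zoom"
```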

Naphade et al. [127] use the histogram difference metric (HDM) and the spatial difference metric (SDM) as features in k-means clustering. They contend that a shot change occurs when both these values are reasonably large. Günsel and Tekalp [128] use the sum of the bin-to-bin differences in the color histograms of the current and the next frame, and the difference between the current frame histogram and the mean of the histograms of the previous k frames, to determine shot changes and select keyframes. If either component for a given frame is greater than the threshold, the frame is labeled as a keyframe; a shot change is declared if the first component is greater than the threshold. The threshold is determined automatically by treating shot change detection as a two-class problem, an approach originally used to binarize images. Deng and Manjunath [129] describe the video object based recognition system NeTra-V. The system operates on MPEG compressed videos and performs region segmentation based on color and texture properties of I-frames. These regions are then tracked across the frames, and a low-level region representation is generated for the video sequences.

M_{p,q}[I](m) = \sum_{k,l} I(k, l)\, \delta(m + qk - pl). \qquad (24)

Coudert et al. [130] present a method for fast analysis of video sequences using the Mojette transform given by Eq. (24), where I(k, l) denotes the image pixel, δ is the Dirac function and m is the Mojette index. The direction of the projection is determined by the integers p and q, with tan(φ) = −p/q. In essence, M(m) is the summation of the intensity pixels along the line defined by m + qk − pl = 0. The transform is a discrete version of the Radon transform. Motion is estimated from the projected motion in the transform domain, and the analysis is performed on the 1D representation of the image. Dağtaş et al. [131] present motion based video retrieval methods based on trajectory and trail models. For the trajectory model, they describe methods to retrieve objects based on the location, scale and temporal characteristics of the trajectory motion. For retrieval of video clips independent of the temporal characteristics of the object motion, the trail based model is proposed. The approach uses the Euclidean distance measure and the Fourier transform for convolution.
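Returning to the Mojette transform of Eq. (24), a direct (unoptimized) sketch accumulates each pixel into the bin of the discrete line it lies on:

```python
import numpy as np

def mojette_projection(image, p, q):
    """Discrete Mojette projection of Eq. (24): each pixel I(k, l) is
    accumulated into the bin m = p*l - q*k, i.e. summed along the lines
    m + q*k - p*l = 0. Returns a dict mapping bin index m to the sum."""
    bins = {}
    for k in range(image.shape[0]):
        for l in range(image.shape[1]):
            m = p * l - q * k
            bins[m] = bins.get(m, 0.0) + float(image[k, l])
    return bins
```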

Wu et al. [132] present an algorithm for detecting wipes in a video sequence. Wipes are used frequently in television news programs to partition stories, and detecting such graphic effects is useful for abstracting video. The method sums the average pixel difference in DC images extracted from MPEG compressed video in the direction orthogonal to the wipe; a wipe creates a plateau in the difference plot which can be detected using the standard deviation. Corridoni and Del Bimbo [133] present methods for indexing movie content by detecting special effects such as fades, wipes, dissolves and mattes. Fades and dissolves are modeled as the change in pixel intensity over the duration of the fade or dissolve. Wipes are detected by applying cut detection techniques to a series of frames and analyzing the difference graph. Mattes are treated as a special case of dissolves. A comparison of several methods for detection of special effects is covered in Ref. [134].

5.3. Video abstraction

Once the shot and scene changes in a video have been detected and the camera and object motion characterized, it is necessary to select keyframes from the scenes or shots to generate a video storyboard. Hence, the research community has turned its attention to the problem of keyframe selection. It is important to note that selecting a fixed frame from every shot is not a complete solution to the problem of video classification or abstraction; keyframes should be selected from the salient shot changes only. A comprehensive study of existing methods for determining keyframes and an optimal clustering based approach is presented in Ref. [135].

Yeung et al. [136] use time-constrained clustering and scene transition graphs (STG) to group similar shots together and provide the basis for story analysis. Video is segmented into shots which are then clustered based on temporal closeness and visual similarity; the STGs are then used for forming the story units in videos. Rui et al. [137] take a more generalized approach to clustering similar shots. The approach is to determine the visual similarity between shots and the temporal difference between shots; the overall similarity between shots is a weighted sum of the shot color similarity and the shot activity similarity.
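A sketch of such a weighted combination; the weights, the temporal decay, and the similarity inputs are placeholders rather than the published formulation:

```python
def shot_similarity(color_sim, activity_sim, temporal_gap,
                    w_color=0.6, w_activity=0.4, decay=0.01):
    """Overall shot similarity as a weighted sum of color and activity
    similarity, attenuated by the temporal distance between the shots."""
    visual = w_color * color_sim + w_activity * activity_sim
    return visual / (1.0 + decay * temporal_gap)
```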

Lienhart et al. [138] use video abstraction methods for classifying movies. Subsequent to shot segmentation, the movie subsequences are clustered together based on color similarity. Scene information is extracted based on audio signals and associated dialog. Special events such as explosions, close-up shots of actors, gunfire, etc., and credit titles are extracted for indexing the movie. Corridoni and Del Bimbo [133] also form a structured representation of movies; their approach detects shot changes and gradual transitions such as fades, dissolves, wipes and mattes. Methods to separate commercials from television broadcasts have been presented in Refs. [139–141].

Zhuang et al. [142] use unsupervised clustering to cluster the first frame of every shot. The keyframes within each cluster provide a notion of similarity while providing indices into the video. Hanjalic et al. [143] break video sequences into logically similar segments based on visual similarity between the shots. The similarity between shots is determined by registering the similarity between MPEG DC frames in the L∗u∗v∗ color space. In related work, Hanjalic and Zhang [135] present a method for determining the optimal number of clusters for a video. Each frame in the video is described by a D-dimensional feature vector, where D depends on the sequence length. The video is initially clustered into N clusters, whose value is determined from the lengths of the video sequences in the database. In order to find an optimal number of clusters, the authors perform cluster validity analysis and present a cluster separation measure given by Eq. (25):

\rho(n) = \frac{1}{n} \sum_{i=1}^{n} \max_{1 \le j \le n,\, j \ne i} \left( \frac{\alpha_i + \alpha_j}{\mu_{ij}} \right), \quad n \ge 2, \qquad (25)

where

\alpha_i = \left\{ \frac{1}{E_i} \sum_{k=1}^{E_i} \left| \tilde{\psi}(k \mid k \in i) - \tilde{\psi}(c_i) \right|^{q_1} \right\}^{1/q_1},

\mu_{ij} = \left\{ \sum_{\lambda=1}^{D} \left| \varphi_\lambda(c_i) - \varphi_\lambda(c_j) \right|^{q_2} \right\}^{1/q_2}.

Here E_i and α_i are the number of elements and the dispersion of cluster i with centroid c_i, respectively, and μ_ij is the Minkowski metric characterizing the centroids of clusters i and j, with 1, …, N clusters under consideration. ψ̃ is the D-dimensional feature vector with components φ_λ. Different metrics are obtained for the parameters q_1 and q_2. The suitable cluster structure for a video is determined by analyzing the normalized curve of ρ(n). The keyframes are then selected by choosing the frames closest to the cluster centroids.
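A sketch of Eq. (25) under one reading of the dispersion term, with the element-to-centroid difference taken as the Euclidean distance before the Minkowski averaging; labels are assumed to be a NumPy integer array:

```python
import numpy as np

def cluster_separation(features, labels, centroids, q1=2, q2=2):
    """Cluster separation measure rho(n) of Eq. (25).
    features: (num_frames, D) array; labels: cluster index per frame;
    centroids: (n, D) array, n >= 2. Smaller ratios of within-cluster
    dispersion to centroid distance indicate better separated clusters."""
    n = len(centroids)
    alpha = []
    for i in range(n):
        dists = np.linalg.norm(features[labels == i] - centroids[i], axis=1)
        alpha.append(np.mean(dists ** q1) ** (1.0 / q1))  # dispersion alpha_i
    total = 0.0
    for i in range(n):
        total += max(
            (alpha[i] + alpha[j]) /
            (np.sum(np.abs(centroids[i] - centroids[j]) ** q2) ** (1.0 / q2))
            for j in range(n) if j != i
        )
    return total / n
```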

Chen et al. [144] describe ViBE, a video indexing and browsing environment. Video indexing is performed using generalized traces: a generalized trace is a high dimensional feature vector containing the histogram intersection for each of the YUV color components with respect to the previous frame, their standard deviations for the current frame, and statistics from the MPEG stream about the numbers of intracoded, forward predicted and bidirectionally predicted macroblocks. This feature vector is used in a binary regression tree to determine shot boundaries. Each shot is then represented using a shot tree formed using Ward's clustering algorithm, also known as the minimum variance method, and a similarity pyramid of shots is formed to aid browsing. Bouthemy et al. [145] describe the DiVAN project, which aims at building a distributed architecture for managing and providing access to large television archives and devotes part of its effort to automatic segmentation of video. The video sequence is segmented into shots and gradual transitions, along with camera motion analysis, and dominant colors are used to index keyframes. Chang et al. [146] present a keyframe selection algorithm based on the semi-Hausdorff distance defined in Eq. (26), where glb denotes the greatest lower bound, ε is the error bound, and A, B ⊂ χ are two point sets in a metric space (χ, d) with a predefined distance function d. The authors also present keyframe extraction algorithms based on this keyframe selection criterion and methods to structure video for efficient browsing:

d_{SH}(A, B) = \mathrm{glb}\{\varepsilon \mid A \subset U(B, \varepsilon)\}, \qquad (26)

where

U(B, \varepsilon) = \bigcup_{x \in B} B_d(x, \varepsilon), \qquad B_d(x, \varepsilon) = \{\, y \mid d(x, y) \le \varepsilon \,\}.

Sahouria and Zakhor [147] present methods to describe videos using principal component analysis and to generate high level scene descriptions without shot segmentation. They also use HMMs to analyze the motion vectors extracted from MPEG P-frames to analyze sport sequences. Vasconcelos and Lippman [148] develop statistical models for shot duration and shot activity in video; these models are described as the first steps towards the semantic characterization of video. The authors develop Bayesian procedures for the tasks of video segmentation and semantic characterization, since knowledge of the video is available as prior knowledge from the shot activity.

5.4. Video retrieval

Video shot segmentation and abstraction are two parts of the larger problem of content based video retrieval. Once the video has been segmented into shots, which may be represented by keyframes, scene transition graphs or multi-level representations, it is necessary to provide methods for similarity based retrieval. Jain et al. [149] present a method for retrieval of video clips which may be longer than a single shot. The method allows retrieval based on keyframe similarity, akin to the image database approach; a video clip is generated as a result by merging the shot boundaries of those shots whose keyframes are highly similar to the query specification. In another approach, subsampled versions of video shots are compared to a query clip. Image database matching techniques are extended for content based video retrieval in Ref. [150], where fuzzy classification and relevance feedback are used for retrieval. A multi-dimensional feature vector formed from the color and motion segmentation of each video frame is used to extract key shots and keyframes.
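A sketch of retrieval in this spirit: shots whose keyframes are similar to the query are selected, and temporally adjacent selections are merged into result clips. The shot representation and similarity callback here are placeholders:

```python
def retrieve_clips(shots, query_feature, similarity, threshold=0.8):
    """shots: list of (start_frame, end_frame, keyframe_feature) tuples.
    Shots whose keyframes are similar to the query are selected, and
    temporally adjacent selections are merged into longer result clips."""
    selected = [(s, e) for s, e, f in shots
                if similarity(f, query_feature) >= threshold]
    clips = []
    for start, end in sorted(selected):
        if clips and start <= clips[-1][1] + 1:  # adjacent or overlapping
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips
```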

5.5. Performance evaluation of video shot detection methods

When a number of methods exist for performing a certain task, it is necessary to determine which method performs well on a generalized data set. Towards this, we have conducted a thorough evaluation of video shot segmentation methods that use color, motion and compression parameters [10,151]. In this paper we present some results from the evaluation; the reader is encouraged to refer to our original publications, wherein details on the methods and the evaluation are provided.


Table 1
Cut detection performance: recall (%) at precision = 95%

Color space   B2B1D   B2B2D   B2B3D   CHI1D   CHI2D   CHI3D   INT1D   INT2D   INT3D   AVG
RGB           60      68      45      45      60      65      –       –       –       10
HSV           65      63      68      63      53      54      61      61      63      15
YIQ           68      37      62      62      23      49      64      50      60      14
LAB           70      51      65      59      42      50      64      57      63      15
LUV           68      37      64      59      29      47      67      49      67      14
MTM           69      65      68      59      54      55      67      63      70      13
XYZ           60      65      64      47      57      35      68      68      57      8
OPP           65      34      57      60      34      45      64      50      67      14
YYY           55      46      45      –       –       –       –       –       –       6

Table 2
Cut detection performance of MPEG algorithms

Algorithm   Detects   MDs   FAs     Recall (%)   Precision (%)
MPEG-A      932       27    14820   97           6
MPEG-B      473       486   3059    49           13
MPEG-C      286       673   795     30           26
MPEG-D      754       205   105     79           88
MPEG-E      862       97    4904    90           15
MPEG-F      792       167   663     83           54

Other comparisons can be found in Refs. [8,9]. Four histogram comparison measures, bin-to-bin difference (B2B), χ² test (CHI), histogram intersection (INT), and difference of average colors (AVG), were evaluated on eight color spaces, viz. the RGB, HSV, YIQ, XYZ, L∗a∗b∗, L∗u∗v∗, Munsell and Opponent color spaces. Each of the histogram comparison measures was tested in 1, 2 and 3 dimensions. Six methods that use MPEG compression parameters were also evaluated [105,110,152–154,109]; these are labeled MPEG-A through MPEG-F, respectively. Finally, three algorithms that use block matching on uncompressed video data were evaluated [121,124,155], labeled Algorithm Block-A through Algorithm Block-C.
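The recall and precision figures in Table 2 follow directly from the detects (D), missed detections (MD) and false alarms (FA): recall = D/(D + MD) and precision = D/(D + FA). For MPEG-D, for instance, 754/(754 + 205) ≈ 79% and 754/(754 + 105) ≈ 88%:

```python
def recall_precision(detects, missed, false_alarms):
    """Recall = D / (D + MD); Precision = D / (D + FA)."""
    recall = detects / (detects + missed)
    precision = detects / (detects + false_alarms)
    return recall, precision

# e.g. recall_precision(754, 205, 105) -> (0.786..., 0.877...)
```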

As shown in Table 1, histogram intersection is the best among the differencing methods. The CHI method performed significantly worse and is thus not suitable for coarse histogram differencing for shot-change detection. Amongst the bin-to-bin histogram comparison methods, the 1D variant is less computationally intensive and hence more attractive. Amongst the MPEG algorithms (see Table 2), MPEG-D is clearly the best method for cut detection, with both high precision and recall. Several other comparisons can be found in our work [10].

6. Summary and conclusions

This paper has surveyed the state of the art in methods developed for providing content-based access to visual information, which may be in the form of images or video stored in archives. These methods rely on pattern recognition for feature extraction, clustering, and matching of the features. Pattern recognition thus plays a significant role in content-based retrieval and has applications in more than one sub-system of the database. Yet, very little work has been done on addressing the issue of human perception of visual data content. The approaches taken by the computer vision, image analysis and pattern recognition communities have been bottom up. It is necessary not only to develop better pattern recognition methods to capture the visually important features of an image, but also to make them simple, efficient and easily mapped to human queries. As presented above, efforts are being made to extend queries beyond the use of simple color, shape, texture, and/or motion based features; specification of queries using spatial and temporal fuzzy relationships between objects and sub-image blocks is being explored. Another aspect of content-based retrieval is the development of useful and efficient indexing structures. Commonly, image databases cluster the features to enhance the similarity search. Similarity searches are often generalized k-nearest neighbor searches where performance and accuracy can easily be traded. Many approaches rely on clustering of indices which are then stored in a commercial database management system; others use modified forms of tree structures. Robust structures allowing the use of features and fuzzy queries need to be developed, and quantitative and qualitative performance analyses published. The Pattern Recognition community, with its rich tradition of making fundamental contributions to diverse applications, thus has a great opportunity to make breakthrough advances in content-based image and video information retrieval systems.

References

[1] A. Gupta, R. Jain, Visual information retrieval, Commun. ACM 40 (5) (1997) 70–79.

[2] Y. Rui, T.S. Huang, S.-F. Chang, Image retrieval: current techniques, promising directions, and open issues, J. Visual Commun. Image Representation 10 (1) (1999) 39–62.

[3] G. Ahanger, T.D.C. Little, A survey of technologies forparsing and indexing digital video, J. Visual Commun.Image Representation 7 (1) (1996) 28–43.

[4] R. Brunelli, O. Mich, C.M. Modena, A survey on theautomatic indexing of video data, J. Visual Commun.Image Representation 10 (2) (1999) 78–112.

[5] S. Antani, R. Kasturi, R. Jain, Pattern recognition methods in image and video databases: past, present and future, Joint IAPR International Workshops SSPR and SPR, Lecture Notes in Computer Science, Vol. 1451, 1998, pp. 31–58.

[6] F. Idris, S. Panchanathan, Review of image andvideo indexing techniques, J. Visual Commun. ImageRepresentation 8 (2) (1997) 146–166.

[7] C. Colombo, A. Del Bimbo, P. Pala, Semantics in visualinformation retrieval, IEEE Multimedia 6 (3) (1999)38–53.

[8] A. Di Lecce, V. Guerriero, An evaluation of the effectiveness of image features for image retrieval, J. Visual Commun. Image Representation 10 (4) (1999) 351–362.

[9] M.K. Mandal, F. Idris, S. Panchanathan, A critical evaluation of image and video indexing techniques in the compressed domain, Image Vision Comput. 17 (7) (1999) 513–529.

[10] U. Gargi, R. Kasturi, S.H. Strayer, Performance characterization of video-shot-change detection methods, IEEE Trans. Circuits Systems Video Technol. 10 (1) (2000) 1–13.

[11] N.-S. Chang, K.-S. Fu, Query-by-pictorial-example,IEEE Trans. Software Eng. 6 (6) (1980) 519–524.

[12] A. Gupta, S. Santini, R. Jain, In search of informationin visual media, Commun. ACM 40 (12) (1997) 34–42.

[13] Y. Gong, Intelligent Image Databases—Towards Advanced Image Retrieval, Kluwer Academic Publishers, Boston, 1998.

[14] J.K. Wu, B.M. Mehtre, C.P. Lam, A. Desai Narasimhalu, J.J. Gao, CORE: a content-based retrieval engine for multimedia information systems, Multimedia Systems 3 (1) (1995) 25–41.

[15] J.K. Wu, P. Lam, B.M. Mehtre, Y.J. Gao, A.D. Narasimhalu, Content based retrieval for trademark registration, Multimedia Tools Appl. 3 (3) (1996) 245–267.

[16] S.-F. Chang, J.R. Smith, M. Beigi, A. Benitez, Visualinformation retrieval from large distributed onlinerepositories, Commun. ACM 40 (12) (1997) 62–71.

[17] D. Zhong, S.-F. Chang, An integrated approach forcontent-based video object segmentation and retrieval,IEEE Trans. Circuits Systems Video Technol. 9 (8)(1999) 1259–1268.

[18] C. Carson, M. Thomas, S. Belongie, J.M. Hellerstein, J. Malik, Blobworld: a system for region-based image indexing and retrieval, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 509–516.

[19] K. Porkaew, M. Ortega, S. Mehrotra, Queryreformulation for content based multimedia retrieval inMARS, IEEE International Conference on MultimediaComputing Systems, Vol. 2, 1999, pp. 747–751.

[20] T. Gevers, A.W.M. Smeulders, Pictoseek: combiningcolor and shape invariant features for image retrieval,IEEE Trans. Image Process. 9 (1) (2000) 102–119.

[21] Z.-N. Li, O.R. ZaTiane, Z. Tauber, Illumination invarianceand object model in content-based image and videoretrieval, J. Visual Commun. Image Representation 10(3) (1999) 219–244.

[22] W. Niblack, X. Zhu, J.L. Hafner, T. Breuel, et al., Updates to the QBIC system, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VI, Vol. SPIE 3312, 1997, pp. 150–161.

[23] D. Ponceleon, S. Srinivasan, A. Amir, D. Petkovic, D. Diklic, Key to effective video retrieval: effective cataloging and browsing, ACM International Conference on Multimedia, 1998, pp. 99–107.

[24] M.A. Smith, T. Kanade, Video skimming for quickbrowsing based on audio and image characterization,Technical Report CMU-CS-95-186 Carnegie MellonUniversity, 1995.

[25] V. Ogle, M. Stonebraker, Chabot: retrieval from arelational database of images, IEEE Comput. 28 (9)(1995) 40–48.

[26] C. Goble, M. O'Docherty, P. Crowther, M. Ireton, J. Oakley, C. Xydeas, The Manchester multimedia information system, Proceedings of the EDBT '92 Conference on Advances in Database Technology, Vol. 580, 1994, pp. 39–55.

[27] R. Lienhart, W. Effelsberg, R. Jain, VisualGREP: a systematic method to compare and retrieve video sequences, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VI, Vol. SPIE 3312, 1997, pp. 271–282.

[28] I.J. Cox, M.L. Miller, T.P. Minka, T.V. Papathomas, P.N. Yianilos, The Bayesian image retrieval system, PicHunter: theory, implementation and psychophysical experiments, IEEE Trans. Image Process. 9 (1) (2000) 20–37.

[29] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd Edition, Wiley, Inc., New York, NY, 2000.

[30] A.K. Jain, R.P.W. Duin, J. Mao, Statistical patternrecognition: a review, IEEE Trans. Pattern Anal. Mach.Intell. 22 (1) (2000) 4–37.

[31] A. Nagasaka, Y. Tanaka, Automatic video indexing andfull-video search for object appearances, Proceedingsof IFIP 2nd Working Conference on Visual DatabaseSystems, 1992, pp. 113–127.

[32] J. Hafner, H.S. Sawhney, W. Equitz, M. Flickner, W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intell. 17 (7) (1995) 729–736.

[33] K. Chen, S. Demko, R. Xie, Similarity-based retrievalof images using color histograms, Proceedings of


IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, Vol. SPIE 3656, 1999.

[34] M. Popescu, P. Gader, Image content retrieval fromimage databases using feature integration by Choquetintegral, Proceedings of IS&T=SPIE Conference onStorage and Retrieval for Image and Video DatabasesVII, Vol. SPIE 3656, 1999.

[35] D. Androutsos, K.N. Plataniotis, A.N. Venetsanopoulos,Distance measures for color image retrieval, Proceedingsof International Conference on Image Processing, 1998,pp. 770–774.

[36] R. Brunelli, O. Mich, On the use of histograms for imageretrieval, IEEE International Conference on MultimediaComputing Systems, Vol. 2, 1999, pp. 143–147.

[37] M. Stricker, Bounds of the discrimination power ofcolor indexing techniques, Proceedings of IS&T=SPIEConference on Storage and Retrieval for Image andVideo Databases II, Vol. SPIE 2185, 1994, pp. 15–24.

[38] C. Meilhac, C. Nastar, Relevance feedback and category search in image databases, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 512–517.

[39] G. Ciocca, R. Schettini, Using relevance feedback mechanism to improve content-based image retrieval, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 107–114.

[40] E. Di Sciascio, G. Mingolla, M. Mongiello, Content-based image retrieval over the Web using query bysketch and relevance feedback, Third InternationalConference on Visual Information and InformationSystems (VISUAL’99), Lecture Notes in ComputerScience, Vol. 1614, 1999, pp. 123–130.

[41] J. Peng, B. Bhanu, S. Qing, Probabilistic featurerelevance learning for content-based image retrieval,Comput. Vision Image Understanding 75 (1–2) (1999)150–164.

[42] D. Squire, W. Müller, H. Müller, Relevance feedback and term weighting schemes for content-based image retrieval, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 549–556.

[43] T.P. Minka, R.W. Picard, Interactive learning with asociety of models, Pattern Recognition 30 (4) (1997)565–581.

[44] M.S. Lew, D.P. Huijsmans, D. Denteneer, Contentbased image retrieval: KLT, projections, or templates,Proceedings of the First International Workshop onImage Databases and Multi-Media Search, 1996,pp. 27–34.

[45] S. Santini, R. Jain, Beyond query by example,ACM International Conference on Multimedia, 1998,pp. 345–350.

[46] S. Santini, R. Jain, Similarity measures, IEEE Trans.Pattern Anal. Mach. Intell. 21 (9) (1999) 871–883.

[47] M.J. Swain, D.H. Ballard, Color indexing, Int. J. Comput.Vision 7 (1) (1991) 11–32.

[48] A. Del Bimbo, M. Mugnaini, P. Pala, F. Turco, Visual querying by color perceptive regions, Pattern Recognition 31 (9) (1998) 1241–1253.

[49] C. Colombo, A. Del Bimbo, Color-induced imagerepresentation and retrieval, Pattern Recognition 32 (10)(1999) 1685–1696.

[50] A. Soffer, Image categorization using N × M-grams, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases V, Vol. SPIE 3022, 1997, pp. 121–132.

[51] H.C. Lin, L.L. Wang, S.N. Yang, Color image retrieval based on hidden Markov models, IEEE Trans. Image Process. 6 (2) (1997) 332–339.

[52] S. Müller, S. Eickeler, G. Rigoll, Multimedia database retrieval using hand-drawn sketches, International Conference on Document Analysis and Recognition, 1999, pp. 289–292.

[53] G. Pass, R. Zabih, Comparing images using jointhistograms, ACM J. Multimedia Systems 7 (2) (1999)119–128.

[54] J. Huang, S. Ravi Kumar, M. Mitra, W.-J. Zhu, R. Zabih,Spatial color indexing and applications, Int. J. Comput.Vision 35 (3) (1999) 245–268.

[55] Y. Tao, W.I. Grosky, Spatial color indexing: a novel approach for content-based image retrieval, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 530–535.

[56] L. Cinque, S. Levialdi, K.A. Olsen, A. Pellicanò, Color-based image retrieval using spatial-chromatic histograms, IEEE International Conference on Multimedia Computing Systems, Vol. 2, 1999, pp. 969–973.

[57] Y. Chahir, L. Chen, Efficient content-based image retrieval based on color homogeneous objects segmentation and their spatial relationship characterization, IEEE International Conference on Multimedia Computing Systems, Vol. 2, 1999, pp. 969–973.

[58] J.R. Smith, C.-S. Li, Image classification and querying using composite region templates, Comput. Vision Image Understanding 75 (1–2) (1999) 165–174.

[59] C. Brambilla, A. Della Ventura, I. Gagliardi, R. Schettini, Multiresolution wavelet transform and supervised learning for content-based image retrieval, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 183–188.

[60] A. Vailaya, A. Jain, H.-J. Zhang, On image classification: city images vs. landscapes, Pattern Recognition 31 (12) (1998) 1921–1935.

[61] D. Androutsos, K.N. Plataniotis, A.N. Venetsanopoulos,A novel vector-based approach to color image retrievalusing a vector angular-based distance metric, Comput.Vision Image Understanding 75 (1–2) (1999) 46–58.

[62] G.P. Babu, B.M. Mehtre, M.S. Kankanhalli, Color indexing for efficient image retrieval, Multimedia Tools Appl. 1 (4) (1995) 327–348.

[63] M.S. Kankanhalli, B.M. Mehtre, J.K. Wu, Cluster basedcolor matching for image retrieval, Pattern Recognition29 (4) (1996) 701–708.

[64] B.M. Mehtre, M.S. Kankanhalli, A.D. Narasimhalu,G.C. Man, Color matching for image retrieval, PatternRecognition Lett. 16 (3) (1995) 325–331.

[65] M.S. Kankanhalli, B.M. Mehtre, H.Y. Huang, Color andspatial feature for content-based image retrieval, PatternRecognition Lett. 20 (1) (1999) 109–118.

[66] S. Krishnamachari, M. Abdel-Mottaleb, A scalable algorithm for image retrieval by color, Proceedings of


the International Conference on Image Processing, 1998,pp. 119–122.

[67] K. Hirata, S. Mukherjea, W.-S. Li, Y. Hara, Integrating image matching and classification for multimedia retrieval, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 257–263.

[68] P. Scheunders, A comparison of clustering algorithmsapplied to color image quantization, Pattern RecognitionLett. 18 (11–13) (1997) 1379–1384.

[69] J. Wang, W.-J. Yang, R. Acharya, Color clustering techniques for color-content-based image retrieval from image databases, IEEE International Conference on Multimedia Computing Systems, 1997, pp. 442–449.

[70] J. Wang, W.-J. Yang, R. Acharya, Efficient access to and retrieval from a shape image database, Workshop on Content Based Access to Image and Video Libraries, 1998, pp. 63–67.

[71] U. Gargi, R. Kasturi, Image database querying usinga multi-scale localized color-representation, Workshopon Content Based Access to Image and Video Libraries,1999, pp. 28–32.

[72] A. Del Bimbo, P. Pala, Effective image retrieval using deformable templates, Proceedings of the International Conference on Pattern Recognition, 1996, pp. 120–124.

[73] S. Sclaroff, Deformable prototypes for encoding shape categories in image databases, Pattern Recognition 30 (4) (1997) 627–641.

[74] P. Pala, S. Santini, Image retrieval by shape and texture,Pattern Recognition 32 (3) (1999) 517–527.

[75] M. Adoram, M.S. Lew, IRUS: image retrieval usingshape, IEEE International Conference on MultimediaComputing Systems, Vol. 2, 1999, pp. 597–602.

[76] B. Günsel, A. Murat Tekalp, Shape similarity matching for query-by-example, Pattern Recognition 31 (7) (1998) 931–944.

[77] Z. Lei, T. Tasdizen, D. Cooper, Object signature curveand invariant shape patches for geometric indexinginto pictorial databases, Proceedings of IS&T=SPIEConference on Multimedia Storage and ArchivingSystems II, Vol. SPIE 3229, 1997, pp. 232–243.

[78] R. Alferez, Y.-F. Wang, Image indexing and retrieval using image-derived, geometrically and illumination invariant features, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 177–182.

[79] E.G.M. Petrakis, E. Milios, Efficient retrieval by shape content, IEEE International Conference on Multimedia Computing Systems, Vol. 2, 1999, pp. 616–621.

[80] F. Mokhtarian, S. Abbasi, Retrieval of similar shapes under affine transform, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 566–574.

[81] D. Sharvit, J. Chan, H. Tek, B.B. Kimia, Symmetry-basedindexing of image databases, J. Visual Commun. ImageRepresentation 9 (4) (1998) 366–380.

[82] Y. Rui, T.S. Huang, S. Mehrotra, M. Ortega, Automaticmatching tool selection using relevance feedback inMARS, Second International Conference on VisualInformation Systems (VISUAL’97), 1997, pp. 109–116.

[83] A.K. Jain, A. Vailaya, Shape-based retrieval: a case studywith trademark image databases, Pattern Recognition 31(9) (1998) 1369–1390.

[84] M. Kliot, E. Rivlin, Invariant-based shape retrieval in pictorial databases, Comput. Vision Image Understanding 71 (2) (1998) 182–197.

[85] S. Derrode, M. Daoudi, F. Ghorbel, Invariantcontent-based image retrieval using a complete setof Fourier–Mellin descriptors, IEEE InternationalConference on Multimedia Computing Systems, Vol. 2,1999, pp. 877–881.

[86] J.R. Smith, S.-F. Chang, Quad-tree segmentation for texture-based image query, ACM International Conference on Multimedia, 1994, pp. 279–286.

[87] M. Borchani, G. Stamon, Use of texture features for image classification and retrieval, Proceedings of IS&T/SPIE Conference on Multimedia Storage and Archiving Systems II, Vol. SPIE 3229, 1997, pp. 401–406.

[88] B.S. Manjunath, W.Y. Ma, Texture features for browsingand retrieval of image data, IEEE Trans. Pattern Anal.Mach. Intell. 18 (8) (1996) 837–842.

[89] J. Puzicha, T. Hofmann, J.M. Buhmann, Non-parametric similarity measures for unsupervised texture segmentation and image retrieval, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 267–272.

[90] M.K. Mandal, T. Aboulnasr, S. Panchanathan, Fastwavelet histogram techniques for image indexing,Comput. Vision Image Understanding 75 (1–2) (1999)99–110.

[91] R. Milanese, M. Cherbuliez, A rotation, translation, andscale invariant approach to content-based image retrieval,J. Visual Commun. Image Representation 10 (2) (1999)186–196.

[92] N. Strobel, C.S. Li, V. Castelli, MMAP: modified maximum a posteriori algorithm for image segmentation in large image/video databases, Proceedings of IEEE International Conference on Image Processing, 1997, pp. 196–199.

[93] A.P. Pentland, R.W. Picard, S. Sclaroff, Photobook: content-based manipulation of image databases, Int. J. Comput. Vision 18 (3) (1996) 233–254.

[94] F. Liu, R.W. Picard, Periodicity, directionality, andrandomness: Wold features for image modeling andretrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (7)(1996) 722–733.

[95] Y. Zhong, A. Jain, Object localization using color, textureand shape, Pattern Recognition 33 (4) (2000) 671–684.

[96] A. Mojsilovic, J. Kovacevic, J. Hu, R.J. Safranek, S.K. Ganapathy, Matching and retrieval based on vocabulary and grammar of color patterns, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 189–194.

[97] S. Sclaroff, M. La Cascia, S. Sethi, L. Taycher, Unifying textual and visual cues for content-based image retrieval on the World Wide Web, Comput. Vision Image Understanding 75 (1–2) (1999) 86–98.

[98] A.P. Berman, L.G. Shapiro, A flexible image database system for content-based retrieval, Comput. Vision Image Understanding 75 (1–2) (1999) 175–195.


[99] M.M. Yeung, B.-L. Yeo, Video visualization for compactpresentation and fast browsing of pictorial content, IEEETrans. Circuits Systems Video Technol. 7 (5) (1997)771–785.

[100] Y. Gong, An accurate and robust method for detecting video shot boundaries, IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 850–854.

[101] R. Zabih, R. Miller, K. Mai, A feature-based algorithm for detecting and classifying scene breaks, ACM International Conference on Multimedia, 1995, pp. 189–200.

[102] A. Hampapur, R. Jain, T. Weymouth, Production modelbased digital video segmentation, J. Multimedia ToolsAppl. 1 (1) (1995) 9–46.

[103] W.M. Kim, S.M.-H. Song, H. Kim, C. Song, B.W. Kwoi, S.G. Kim, A morphological approach to scene change detection and digital video storage and retrieval, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, Vol. SPIE 3656, 1999, pp. 733–743.

[104] F. Arman, A. Hsu, M.-Y. Chiu, Feature managementfor large video databases, Proceedings of IS&T=SPIEConference on Storage and Retrieval for Image andVideo Databases I, Vol. SPIE 1908, 1993, pp. 2–12.

[105] H.J. Zhang et al., Video parsing using compressed data,SPIE Symposium on Electronic Imaging Science andTechnology: Image and Video Processing II, 1994, pp.142–149.

[106] N.V. Patel, I.K. Sethi, Video shot detection andcharacterization for video databases, (Special issue onmultimedia), Pattern Recognition 30 (1997) 583–592.

[107] H.C. Liu, G.L. Zick, Scene decomposition of MPEG compressed video, SPIE/IS&T Symposium on Electronic Imaging Science and Technology: Digital Video Compression: Algorithms and Technologies, Vol. 2419, 1995, pp. 26–37.

[108] B. Yeo, B. Liu, Rapid scene analysis on compressedvideo, IEEE Trans. Circuits Systems Video Technol.5 (6) (1995) 533–544.

[109] K. Shen, E.J. Delp, A fast algorithm for video parsingusing MPEG compressed sequences, IEEE InternationalConference on Image Processing, October 1995,pp. 252–255.

[110] J. Meng, Y. Juan, S.F. Chang, Scene change detection in a MPEG compressed video sequence, SPIE/IS&T Symposium on Electronic Imaging Science and Technology: Digital Video Compression: Algorithms and Technologies, Vol. 2419, 1995.

[111] A. Hanjalic, H. Zhang, Optimal shot boundary detectionbased on robust statistical methods, IEEE InternationalConference on Multimedia Computing Systems, Vol. 2,1999, pp. 710–714.

[112] K. Tse, J. Wei, S. Panchanathan, A scene changedetection algorithm for MPEG compressed videosequences, Canadian Conference on Electrical andComputer Engineering (CCECE’95), Vol. 2, 1995,pp. 827–830.

[113] V. Kobla, D.S. Doermann, K.-I. Lin, C. Faloutsos, Compressed domain video indexing techniques using DCT and motion vector information in MPEG video, Proceedings of IS&T/SPIE Conference on Storage and

Retrieval for Image and Video Databases V, Vol. SPIE3022, 1997, pp. 200–211.

[114] Y. Yusoff, J. Kittler, W. Christmas, Combining multiple experts for classifying shot changes in video sequences, IEEE International Conference on Multimedia Computing Systems, Vol. 2, 1999, pp. 700–704.

[115] H.H. Yu, W. Wolf, A hierarchical multiresolution video shot transition detection scheme, Comput. Vision Image Understanding 75 (1/2) (1999) 196–213.

[116] M.K. Mandal, S. Panchanathan, Video segmentation in the wavelet domain, Proceedings of IS&T/SPIE Conference on Multimedia Storage and Archiving Systems III, Vol. SPIE 3527, 1998, pp. 262–270.

[117] R.M. Ford, A quantitative comparison of shot boundary detection metrics, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, Vol. SPIE 3656, 1999, pp. 666–676.

[118] R. Lienhart, Comparison of automatic shot boundary detection algorithms, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, Vol. SPIE 3656, 1999, pp. 290–301.

[119] J.S. Boreczky, L.A. Rowe, Comparison of videoshot boundary detection techniques, Proceedings ofIS&T=SPIE Conference on Storage and Retrieval forImage and Video Databases IV, Vol. SPIE 2670, 1996,pp. 170–179.

[120] R. Fablet, P. Bouthemy, Motion-based feature extraction and ascendant hierarchical classification for video indexing and retrieval, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 221–228.

[121] A. Akutsu et al., Video indexing using motion vectors,Proceedings of SPIE Visual Communications and ImageProcessing, Vol. 1818, 1992, pp. 1522–1530.

[122] S. Fischer, I. Rimac, R. Steinmetz, Automatic recognition of camera zooms, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 253–260.

[123] M. Cherfaoui, C. Bertin, Temporal segmentation ofvideos: a new approach, SPIE=IS&T Symposium onElectronic Imaging Science and Technology: DigitalVideo Compression: Algorithms and Technologies, Vol.2419, 1995.

[124] B. Shahraray, Scene change detection and content-basedsampling of video sequences, SPIE=IS&T Symposiumon Electronic Imaging Science and Technology: DigitalVideo Compression: Algorithms and Technologies, Vol.2419, 1995, pp. 2–13.

[125] E. Ardizzone, M. La Cascia, A. Avanzato, A. Bruna,Video indexing using MPEG motion compensationvectors, IEEE International Conference on MultimediaComputing Systems, Vol. 2, 1999, pp. 725–729.

[126] O. Fatemi, S. Zhang, S. Panchanathan, Optical flow based model for scene cut detection, Canadian Conference on Electrical and Computer Engineering, Vol. 1, 1996, pp. 470–473.

[127] M.R. Naphade, R. Mehrotra, A.M. Ferman, J. Warnick, T.S. Huang, A.M. Tekalp, A high-performance shot boundary detection algorithm using multiple cues,


Proceedings of IEEE International Conference on ImageProcessing, 1998, pp. 884–887.

[128] B. Günsel, A. Murat Tekalp, Content-based video abstraction, Proceedings of IEEE International Conference on Image Processing, 1998, pp. 128–132.

[129] Y. Deng, B.S. Manjunath, NeTra-V: toward an object-based video representation, IEEE Trans. Circuits Systems Video Technol. 8 (5) (1998) 616–627.

[130] F. Coudert, J. Benois-Pineau, P.-Y. Le Lann, D. Barba, Binkey: a system for video content analysis "on the fly", IEEE International Conference on Multimedia Computing Systems, Vol. 1, 1999, pp. 679–684.

[131] S. Dağtaş, W. Al-Khatib, A. Ghafoor, R.L. Kashyap, Models for motion-based video indexing and retrieval, IEEE Trans. Image Process. 9 (1) (2000) 88–101.

[132] M. Wu, W. Wolf, B. Liu, An algorithm for wipedetection, Proceedings of IEEE International Conferenceon Image Processing, 1998, pp. 893–897.

[133] J.M. Corridoni, A. Del Bimbo, Structured representation and automatic indexing of movie information content, Pattern Recognition 31 (12) (1998) 2027–2045.

[134] V. Kobla, D. DeMenthon, D. Doermann, Special effect edit detection using video trails: a comparison with existing techniques, Proceedings of IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII, Vol. SPIE 3656, 1999, pp. 302–313.

[135] A. Hanjalic, H. Zhang, An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis, IEEE Trans. Circuits Systems Video Technol. 9 (8) (1999) 1280–1289.

[136] M. Yeung, B.-L. Yeo, B. Liu, Segmentation of video by clustering and graph analysis, Comput. Vision Image Understanding 71 (1) (1998) 94–109.

[137] Y. Rui, T.S. Huang, S. Mehrotra, Exploring video structure beyond the shots, IEEE International Conference on Multimedia Computing Systems, 1998, pp. 237–240.

[138] R. Lienhart, S. Pfeiffer, W. Effelsberg, Video abstracting, Commun. ACM 40 (12) (1997) 54–62.

[139] C. Colombo, A. Del Bimbo, P. Pala, Retrievalof commercials by video semantics, Proceedings ofIEEE Conference on Computer Vision and PatternRecognition, 1998, pp. 572–577.

[140] R. Lienhart, C. Kuhmünch, W. Effelsberg, On the detection and recognition of television commercials, IEEE International Conference on Multimedia Computing Systems, 1997, pp. 509–516.

[141] J.M. Sánchez, X. Binefa, J. Vitrià, P. Radeva, Local color analysis for scene break detection applied to TV commercials recognition, Third International Conference on Visual Information and Information Systems (VISUAL'99), Lecture Notes in Computer Science, Vol. 1614, 1999, pp. 237–244.

[142] Y. Zhuang, Y. Rui, T.S. Huang, S. Mehrotra, Adaptive key frame extraction using unsupervised clustering,

Proceedings of IEEE International Conference on ImageProcessing, 1998, pp. 866–870.

[143] A. Hanjalic, R.L. Lagendijk, J. Biemond, Automaticallysegmenting movies into logical story units, ThirdInternational Conference on Visual Information andInformation Systems (VISUAL’99), Lecture Notes inComputer Science, Vol. 1614, 1999, pp. 229–236.

[144] J.-Y. Chen, C. Taskiran, A. Albiol, C.A. Bouman, E.J.Delp, ViBE: a video indexing and browsing environment,Proceedings of IS&T=SPIE Conference on MultimediaStorage and Archiving Systems IV, Vol. SPIE 3846,1999, pp. 148–164.

[145] P. Bouthemy, C. Garcia, R. Ronfard, G. Tziritas,E. Veneau, D. Zugaj, Scene segmentation and imagefeature extraction for video indexing and retrieval, ThirdInternational Conference on Visual Information andInformation Systems (VISUAL’99), Lecture Notes inComputer Science, Vol. 1614, 1999, pp. 245–252.

[146] H.S. Chang, S. Sull, S.U. Lee, Efficient video indexing scheme for content-based retrieval, IEEE Trans. Circuits Systems Video Technol. 9 (8) (1999) 1269–1279.

[147] E. Sahouria, A. Zakhor, Content analysis of video usingprincipal components, IEEE Trans. Circuits SystemsVideo Technol. 9 (8) (1999) 1290–1298.

[148] N. Vasconcelos, A. Lippman, Statistical models of videostructure for content analysis and characterization, IEEETrans. Image Process. 9 (1) (2000) 3–19.

[149] A. Jain, A. Vailaya, W. Xiong, Query by video clip,Proceedings of International Conference on PatternRecognition, 1998, pp. 909–911.

[150] Y.S. Avrithis, A.D. Doulamis, N.D. Doulamis, S.D. Kollias, A stochastic framework for optimal key frame extraction from MPEG video databases, Comput. Vision Image Understanding 75 (1/2) (1999) 3–24.

[151] U. Gargi, R. Kasturi, S. Antani, Performance characterization and comparison of video indexing algorithms, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998, pp. 559–565.

[152] H.C. Liu, G.L. Zick, Automated determination of scenechanges in MPEG compressed video, ISCAS—IEEEInternational Symposium on Circuits and Systems,1995.

[153] B.-L. Yeo, B. Liu, A unified approach to temporal segmentation of motion JPEG and MPEG compressed video, IEEE International Conference on Multimedia Computing Systems, 1995, pp. 81–89.

[154] I.K. Sethi, N.V. Patel, A statistical approach to scenechange detection, Proceedings of IS&T=SPIE Conferenceon Storage and Retrieval for Image and Video DatabasesIII, Vol. SPIE 2420, 1995, pp. 329–338.

[155] R. Kasturi, R.C. Jain, Computer Vision: Principles, IEEEComputer Society Press, Silver Spring, MD, 1991, pp.469–480.

About the Author—SAMEER ANTANI has been a Ph.D. candidate with the Department of Computer Science and Engineering at The Pennsylvania State University since 1995. He received his Master of Engineering degree in 1998 and obtained his B.E. (Computer) from the University of Pune, India, in 1994. His research has been in the area of automated content analysis of visual information. His interests are in Computer Vision, Content Analysis of Visual and Multimedia Information, Document Image Analysis, Human Computer Interaction, Image Processing and Computer Graphics. Sameer has been a student member of the IEEE for the past 10 years.


About the Author—RANGACHAR KASTURI is a Professor of Computer Science and Engineering and Electrical Engineering at the Pennsylvania State University. He received his degrees in electrical engineering from Texas Tech University (Ph.D., 1982 and M.S., 1980) and from Bangalore University, India (B.E., 1968). He worked as a research and development engineer in India during 1968–78. He served as a Visiting Scholar at the University of Michigan during 1989–90 and as a Fulbright Lecturer at the Indian Institute of Science, Bangalore, during 1999.

Dr. Kasturi is an author of the textbook Machine Vision, McGraw-Hill, 1995. He has also authored/edited the books Computer Vision: Principles and Applications, Document Image Analysis, Image Analysis Applications, and Graphics Recognition: Methods and Applications. He has directed many projects in document image analysis, image sequence analysis, and video indexing. His earlier work in image restoration resulted in the design of many optimal filters for recovering images corrupted by signal-dependent film-grain noise.

Dr. Kasturi is serving as the Vice President, Publications, of the IEEE Computer Society. He is also serving as the First Vice President of the International Association for Pattern Recognition (IAPR). He was the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence during 1995–98 and the Editor-in-Chief of the Machine Vision and Applications journal during 1993–94. He served in the IEEE Computer Society's Distinguished Visitor Program during 1987–90.

Dr. Kasturi is a General Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001 and theGeneral Chair: Technical Program for the International Conference on Pattern Recognition (ICPR), 2002. He was a General Chairfor the International Conference on Document Analysis and Recognition (ICDAR), 1999.

Dr. Kasturi is a Fellow of the IEEE and a Fellow of IAPR. He received the Penn State Engineering Society Outstanding Research Award in 1996 and was elected to the Texas Tech Electrical Engineering Academy in 1997. He is a Golden Core Member of the IEEE Computer Society and has received its Meritorious Service Award and Certificates of Appreciation.

About the Author—RAMESH JAIN, Ph.D., is both an entrepreneur and a research scientist. He was the founding chairman of Virage (NASDAQ: VRGE), a San Mateo-based company developing systems for media management solutions and visual information management.

Dr. Jain was also the chairman of Imageware Inc., an Ann Arbor, Michigan-based company dedicated to linking virtual andphysical worlds by providing solutions for surface modeling, reverse engineering, rapid prototyping and inspection.

Dr. Jain is a Professor Emeritus of Electrical and Computer Engineering, and Computer Science and Engineering, at the University of California, San Diego. He is a world-renowned pioneer in multimedia information systems, image databases, machine vision, and intelligent systems. While a professor of engineering and computer science at the University of Michigan, Ann Arbor and the University of California, San Diego, he founded and directed artificial intelligence and multimedia information systems labs. He is also the founding editor of IEEE MultiMedia magazine. He has also been affiliated with Stanford University, IBM Almaden Research Labs, General Motors Research Labs, Wayne State University, University of Texas at Austin, University of Hamburg, Germany, and Indian Institute of Technology, Kharagpur, India.

Ramesh is a Fellow of IEEE, AAAI, and the Society of Photo-Optical Instrumentation Engineers, and a member of ACM, Pattern Recognition Society, Cognitive Science Society, Optical Society of America and Society of Manufacturing Engineers. He has been involved in the organization of several professional conferences and workshops, and has served on the editorial boards of many journals. He was the founding Editor-in-Chief of IEEE MultiMedia, and is on the editorial boards of Machine Vision and Applications, Pattern Recognition, and Image and Vision Computing. He received his Ph.D. from IIT, Kharagpur in 1975 and his B.E. from Nagpur University in 1969.