
SIViP (2012) 6:247–258
DOI 10.1007/s11760-010-0195-3

ORIGINAL PAPER

Spatio-temporal object relationships in image sequences using adjacency matrices

Bijan G. Mobasseri · Preethi Krishnamurthy

Received: 27 August 2009 / Revised: 8 September 2010 / Accepted: 3 October 2010 / Published online: 15 December 2010
© Springer-Verlag London Limited 2010

B. G. Mobasseri · P. Krishnamurthy
Department of Electrical and Computer Engineering, Villanova University, Villanova, PA 19085, USA
e-mail: [email protected]

Abstract In this work, we bring together object tracking and digital watermarking to solve the spatio-temporal object adjacency problem in image sequences. Spatio-temporal relationships are established by embedding objects with unique digital watermarks and then propagating the watermarks frame by frame. Watermark propagation is accomplished by an existing object tracking module, so that a tracked object acquires its watermark from the correspondences established by the object tracker. The spatio-temporally marked image sequences can then be searched to establish spatial and temporal adjacency among objects without using traditional spatio-temporal graphs. Borrowing from graph theory, we construct binary adjacency matrices among tracked objects and develop interpretation rules to establish a track history for each object. Track history can be used to determine the arrival of new objects in frames or the changing of spatial and temporal positions of objects with respect to each other as they move through time and space.

Keywords Spatio-temporal graphs · Adjacency matrix · Object tracking · Watermarking

1 Introduction

Automated visual object tracking is a well-studied discipline [1]. The problem is traditionally defined as the ability to locate objects of interest and estimate their trajectories across frames. When objects are modeled as point targets, probabilistic state-space methods such as Kalman filters [2] and particle filters [3,4] have been effective tools. When objects can be extracted as regions in frames, kernel tracking is a more effective and robust approach. Examples of kernel tracking are template matching, such as block matching, and, more recently, mean-shift tracking [5,6].

The demands and requirements of tracking algorithms have changed considerably over time. The proliferation of ground-based surveillance video, unmanned aerial vehicles (UAVs), and vastly different challenges in law enforcement, defense, and homeland security require capabilities that go beyond simple frame-to-frame tracking of objects. For example, loitering aerial platforms provide persistent staring reconnaissance, surveillance, and target acquisition for hours and perhaps days. During such long exposures, events span large geographical areas that need to be connected and interpreted. Traditional target-tracking algorithms are primarily concerned with frame-to-frame tracking. However, it is no longer sufficient to merely track targets; what is needed is Tagging, Tracking and Locating (TT&L) systems [7]. In such a framework, one needs a "return address": the ability to retrace the route and determine the origin and/or path taken by the object of interest.

Establishing track history has been studied mainly through the construction of spatio-temporal graphs [8]. An indexing structure that ties all objects within frames as well as across frames is needed. The best-known framework for this purpose is a variation of spatio-temporal (ST) graphs such as ST region graphs (STRG) [8]. STRG is an extension of the region adjacency graph (RAG). A RAG is defined by a set of nodes and edges: the nodes represent objects, and the edges represent adjacency or connection between objects. A RAG is thus a spatial connectivity map of the objects in a frame. STRG is simply a stack of RAGs in time and is represented by a 6-tuple: nodes, spatial edges, temporal edges, node attributes, spatial edge attributes, and temporal edge attributes. For object tracking, the availability of the STRG for the entire length of the video is necessary to determine the objects' spatio-temporal relationships. However, constructing, maintaining, and storing this structure is complicated and time-consuming. In 1 h of video at 30 frames per second, tracking N objects yields an STRG with 3,600 × 30 × N = 108,000N nodes; if each node is described by a 6-tuple, there are 648,000 items per tracked object to store.

Digital watermarking has been used in still images and video for a variety of purposes, including ownership information, metadata binding, fingerprinting, data hiding, and tracing [9]. Signal processing forensics is an important offshoot of watermarking [10]. In this work, we propose the concept of a spatio-temporal watermark and a potential application for it. The digital watermarking of images is an example of a still, spatial watermark. The watermarking of video would seem to qualify as a spatio-temporal watermark, but it does not have to be: video watermarking is often an extension of image watermarking, whereby still-image watermarking algorithms are applied to individual frames. We define a spatio-temporal watermark as a watermark that is attached to features or structures in a frame and then propagates throughout the video. Such a video is self-contained, carrying forensic information that can be used to perform a data mining of sorts on the video. Although some video watermarks do propagate through video, this phenomenon is usually considered an undesirable side effect that needs to be corrected. For example, when the watermark is embedded in the DC differential coefficients of macroblocks [11], the propagation of the watermark combined with differential decoding will impact quality; therefore, methods are often proposed to offset watermark propagation.

In this work, we build upon previous work in [12,13] and use digital watermarking of video objects as a compact alternative to STRG. The algorithm rides on top of a target-tracking module, embeds unique watermarks in tracked objects, and propagates each watermark throughout the video. In effect, targets are stamped with digital signatures that can later be searched for. A track history and adjacency procedure is developed to extract the watermark from tracked objects and establish temporal and spatial adjacency across multiple frames using the concept of an adjacency matrix. New interpretation rules are developed that use the structure of adjacency matrices to establish object relationships. The approach consists of six steps: (1) object detection, (2) tracking, (3) object watermarking, (4) watermark propagation, (5) adjacency matrix construction, and (6) adjacency matrix interpretation.

2 Object tagging and track history

Watermarking has been used for the authentication and fingerprinting of multimedia signals for some time now [14], but its application in the context of object tracking is new. In the following, we develop an integrated framework for object tracking, watermarking, and watermark propagation.

2.1 Object tracking framework

Object tracking is often formulated as a state-space estimation problem. At initialization, objects are localized either manually or by a model-driven segmentation stage. Assume there are N objects in each frame and the video consists of K frames. Object n in frame k is characterized by a state vector $\{x_k^{(n)},\ n = 1, \ldots, N,\ k = 1, \ldots, K\}$. We define the extent of object n in frame k by a bounding box $R_k^{(n)}$, where $x_k^{(n)}$ points to its centroid. From $x_{k-1}^{(n)}$, the tracker recursively estimates $x_k^{(n)} = f(x_{k-1}^{(n)}, v_{k-1})$, where $v_{k-1}$ is the system noise vector. In general, f is a vector-valued, nonlinear, time-varying function [5]. At the conclusion of this step, the tracker has generated updated state vectors in frame k for every object found in frame k − 1. Generally, tracking consists of a prediction stage and an update stage: the prior probability density function (pdf) of the state vector in frame k is computed using measurements through frame k − 1, and this pdf is then updated by Bayes' rule once a new measurement in frame k becomes available. If the state-space equation is linear, Kalman filtering produces an optimal estimate of object location. The function of the tracking algorithm is to establish correspondence among the same objects in consecutive frames, i.e., $\{R_{k-1}^{(n)} \Leftrightarrow R_k^{(n)},\ n = 1, \ldots, N\}$. Among the many tracking algorithms, we have selected mean-shift tracking [5] for its effectiveness and ease of implementation.

2.2 Object tagging and watermark propagation

Once correspondence is established, each object receives a unique tag. Tagging object n is accomplished by watermarking $R_k^{(n)}$ to produce $R_k^{(w_n)} = g(R_k^{(n)}, w_n)$, where g is the embedding function and $w_n$ is the watermark to be embedded in the nth object. Object support areas in surveillance videos obtained from standoff distances are small; often, a 16 × 16 or 32 × 32 block of pixels covers most of the target. Smaller blocks have less impact on the frame but are harder to detect. We have used blocks as small as 4 × 4 with success. Objects are first initialized in the first frame. Initialization can take place manually or by applying target models; we used masks extracted from real data for initialization. We have selected the additive spread-spectrum watermarking model [15] to watermark targets. The idea is to spread the watermark bits over a range of frequencies in the cover image. In this approach, objects are watermarked by additively modifying the mid-frequency DCT coefficients of the object block $R_k$. The watermark is spread by a pseudo-random sequence generated by a secure key. The block is then inverse transformed and replaces the original block. The watermarking of object blocks must be reasonably robust and must not produce a visible, adverse impact on image quality. Robustness is achieved by selecting the mid-frequency coefficients for watermarking; the strength of the watermark provides a balance between robustness and visibility. Although we use [15], there are a number of variations of spread-spectrum watermarking. Two more recent variations are improved spread spectrum [16] and double-sided watermarking [17]. Both exhibit more robustness than additive spread spectrum, but for the purposes of this work [15] has proven to be satisfactory.
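A minimal sketch of the additive spread-spectrum embedding just described. The block size, the particular mid-frequency band, and the strength alpha are illustrative assumptions, not the settings of our experiments:

```python
import numpy as np
from scipy.fft import dct, idct

def embed_block(block, key, alpha=2.0):
    """Additively embed a key-dependent PN sequence in the
    mid-frequency DCT coefficients of one object block."""
    c = dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")
    m = block.shape[0]
    # Assumed mid-frequency band: anti-diagonal strip m/2 <= i+j < m
    ii, jj = np.indices((m, m))
    band = (ii + jj >= m // 2) & (ii + jj < m)
    w = np.random.default_rng(key).standard_normal(band.sum())
    c[band] += alpha * w                      # additive spread spectrum
    return idct(idct(c, axis=1, norm="ortho"), axis=0, norm="ortho")
```

Detection then correlates the same key-generated sequence against candidate blocks, as described next.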

Watermark detection is implemented by a correlation detector. To detect the nth object, a sliding block correlation of $w_n$ with the frame is calculated. The correlation will show a peak if an image block contains $w_n$. However, this outcome is not guaranteed unless the watermark sequences are chosen properly. Three conditions may arise at the detection stage: (1) false positives, (2) missed objects, and (3) mislabeled objects. False positives occur when an unmarked data block is falsely identified as carrying a watermark. Misses occur when the watermark embedded in the object is not detected. Mislabeling occurs when one object watermark is mistaken for another. To minimize such outcomes, the following conditions must be met:

$\langle w_i, w_j \rangle \approx 0 \quad \forall i, j,\ i \neq j \qquad (1)$

$\langle w_i, w_i \rangle > \langle w_i, w_j \rangle \quad \forall (i, j),\ i \neq j \qquad (2)$

$\langle w_i, R_k^{w_i} \rangle > \langle w_i, R_k^{w_j} \rangle \quad \forall i \neq j \qquad (3)$

$\langle w_i, I_k(x, y) \rangle < \langle w_i, R_k^{w_i} \rangle \qquad (4)$

where $I_k(x, y)$ is unmarked image data. The expression in (1) assures near-orthogonality of the watermark sequences; in our experience, normally distributed random numbers generated with different seeds are near-orthogonal for the purposes of this work. The expression in (2) ensures that a watermark's autocorrelation is higher than its cross-correlation with other watermarks; this condition helps keep object mislabeling low. The expression in (3) ensures that the property in (2) holds when the watermark is embedded in the object, so this condition also helps reduce the mislabeling of objects. The expression in (4) ensures that watermark correlation with unmarked image data is smaller than correlation with a watermarked block; this condition reduces the chance of false detection.
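Conditions (1)–(2) can be sanity-checked numerically. As noted above, Gaussian sequences generated from different seeds are near-orthogonal; a small illustration (sequence length assumed):

```python
import numpy as np

N, L = 6, 1024                       # number of targets, length (assumed)
W = [np.random.default_rng(s).standard_normal(L) for s in range(N)]
G = np.array([[wi @ wj for wj in W] for wi in W])   # Gram matrix
# Autocorrelation (~L on the diagonal) dominates cross terms (~sqrt(L))
assert all(G[i, i] > abs(G[i, j])
           for i in range(N) for j in range(N) if i != j)
```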

Track watermarking summary

1. Initialize tracking by locating objects in frame k, $\{x_k^{(n)},\ n = 1, \ldots, N\}$.
2. Identify the region of support for each object, $\{R_k^{(n)},\ n = 1, \ldots, N\}$.
3. Embed $\{R_k^{(n)},\ n = 1, \ldots, N\}$ with $\{w_n,\ n = 1, \ldots, N\}$ to produce $\{R_k^{(w_n)},\ n = 1, \ldots, N\}$.
4. Establish correspondence among objects in frames k and k + 1.
5. Propagate the watermarks $\{w_n,\ n = 1, \ldots, N\}$ from frame k to k + 1 for corresponding objects.
6. Repeat 1–5 for all frames.

At the conclusion of the process, all objects are uniquely watermarked. Each watermark is also propagated throughout the video such that each object carries its own tag from frame to frame. The video is now self-contained with track history information. The question now is how to extract this track history and recover object adjacency information.
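Steps 1–6 can be summarized in code. The sketch below assumes a tracker object that returns per-object bounding boxes and an embed() routine like the one sketched in Sect. 2.2; all names are hypothetical:

```python
def tag_video(frames, tracker, watermark_keys):
    """Embed each tracked object's watermark and propagate it
    frame to frame via the tracker's correspondences."""
    boxes = tracker.initialize(frames[0])            # hypothetical API
    for n, box in enumerate(boxes):
        embed(frames[0], box, watermark_keys[n])     # tag R_1^(n) with w_n
    for k in range(1, len(frames)):
        boxes = tracker.update(frames[k])            # R_{k-1} <-> R_k
        for n, box in enumerate(boxes):
            embed(frames[k], box, watermark_keys[n]) # propagate w_n
    return frames
```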

3 Watermark detection and track recovery

To establish spatio-temporal links among objects, a watermark detection stage is implemented. Detecting an image block that contains a specific watermark $w_i$ is tantamount to detecting object i. Watermark detection does not have to be performed in every frame, because object relationships are often needed over a subset of frames or just a pair of frames. Therefore, watermark recovery is performed on an as-needed basis. This is an important departure from spatio-temporal graphs, where the tree needs to be built and stored for the entire video. Watermark detection is implemented as a blind correlation detector. A blind watermark detector does not require the original unmarked object block; it is sufficient to know the watermark sequence embedded in the object. As the location of the object is not known, the detector is implemented using the inner product of the watermark with a sliding block across the frame. The detector output should peak when an image block contains the desired watermark. We model object adjacency by the four compass directions; that is, an object may be located to the north, east, west, or south of another object. Define two outer product matrices, $V_{we}$ and $V_{ns}$: $V_{we}$ localizes objects in the west–east (W–E) direction and $V_{ns}$ localizes objects in the north–south (N–S) direction. Compass directions and the orientation of the frame are available from the imaging platform. For the rest of this work, we use N–S to mean vertical positioning and W–E to mean horizontal positioning of objects in the frame. Define

$$V_{[ns,we]} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} \begin{bmatrix} B_1 & B_2 & \cdots & B_m & \cdots & B_p \end{bmatrix}
= \begin{bmatrix}
\langle w_1, B_1 \rangle & \langle w_1, B_2 \rangle & \cdots & \langle w_1, B_m \rangle & \cdots & \langle w_1, B_p \rangle \\
\langle w_2, B_1 \rangle & \langle w_2, B_2 \rangle & \cdots & \langle w_2, B_m \rangle & \cdots & \langle w_2, B_p \rangle \\
\vdots & \vdots & & \vdots & & \vdots \\
\langle w_N, B_1 \rangle & \langle w_N, B_2 \rangle & \cdots & \langle w_N, B_m \rangle & \cdots & \langle w_N, B_p \rangle
\end{bmatrix} \qquad (5)$$


Fig. 1 Object localization by watermark detection. $V_{we}$ localizes objects from left to right as (A, B, C, D); $V_{ns}$ localizes objects from top to bottom as (B, D, A, C)

Each element is the inner product of a watermark sequence with an image block. $B_j$ is the jth sequentially numbered position of the sliding block, where $B_1$ is the top-left and $B_p$ the bottom-right position. The number of sliding block positions is $p = (M - m + 1)^2$, where M × M is the frame size and m × m is the block size. The sliding block operation is performed column-wise for $V_{we}$ and row-wise for $V_{ns}$ (Fig. 1); blocks are numbered column-wise and row-wise, respectively. If all N targets are present in the frame, each row will show a peak in some column. The column where $V_{we}$ peaks localizes the object horizontally in the frame; the column where $V_{ns}$ peaks localizes the object vertically. Therefore, for every row r, there is a column $c_r^*$ where the sliding block correlation peaks. Form an N × 2 matrix where each row consists of the row and the column where $V_{we}$ or $V_{ns}$ peaks:

$q = [r, c_r^*] \qquad (6)$

where by definition $r = [1, 2, \ldots, N]^T$. The key property of (6) is the column location of the peaks in each row. For $V_{we}$, a small column number indicates that the corresponding object is more to the west of the other objects. Similarly, for $V_{ns}$, a small column number indicates that the corresponding object is more to the north. To automatically sequence the detected targets from west to east or north to south, sort (6) by the second column:

$q_s = [r_s, c_{r_s}^*] \qquad (7)$

$r_s$ contains the geographically ordered objects. For a numerical example, let us apply this procedure to Fig. 1. After evaluating (5), (6) is given by

$q = [(A, 300), (B, 100), (C, 400), (D, 200)] \qquad (8)$

$q_s = [(B, 100), (D, 200), (A, 300), (C, 400)] \qquad (9)$

Detected objects are located north to south in the following order, $r_s = [B, D, A, C]$. In summary:

1. Sort each row of (5) in descending order.
2. Extract the row and column indices of the first (largest) element of each sorted row.
3. Sort the extracted (row, column) pairs by the column.
4. The row numbers then indicate object locations from left to right of the frame (for $V_{we}$) or top to bottom (for $V_{ns}$); a coded version of this procedure is sketched below.
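A sketch of the sliding-block detector of (5) and the four-step ordering procedure (NumPy; the frame is assumed square and single-channel):

```python
import numpy as np

def sliding_correlation(frame, watermarks, m, scan="column"):
    """V from Eq. (5): row r holds the inner product of watermark w_r
    with every m x m sliding block B_j of the frame. Blocks are numbered
    column-wise for V_we and row-wise for V_ns."""
    M = frame.shape[0]                      # assumes an M x M frame
    n_pos = M - m + 1
    V = np.zeros((len(watermarks), n_pos * n_pos))
    idx = 0
    for a in range(n_pos):
        for b in range(n_pos):
            # column-wise: a indexes columns, b rows; row-wise: reversed
            y, x = (b, a) if scan == "column" else (a, b)
            block = frame[y:y + m, x:x + m].ravel()
            for r, w in enumerate(watermarks):
                V[r, idx] = np.dot(w, block)
            idx += 1
    return V

def order_objects(V):
    """Eqs. (6)-(7): peak column per row, then argsort -> r_s."""
    return np.argsort(V.argmax(axis=1))
```

With the Fig. 1 example, peak columns (300, 100, 400, 200) for objects (A, B, C, D) sort to the order B, D, A, C.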

4 Spatio-temporal object relationships using adjacency matrices

We have looked at three frameworks for modeling adjacency: (1) graphs, (2) adjacency matrices, and (3) linked lists. We have selected adjacency matrices for their simple structure and their ease of interpretation and implementation. Adjacency matrices are a special case of binary matrices. Binary matrices have been studied in the context of switching circuits [18] and more recently in coding theory [19]. Binary matrices find applications in non-numerical data as well; for example, six such datasets are mentioned in [20]. Other applications include frequent item sets [21] and graph partitioning [22]. A wide variety of operations can be performed on binary matrices to extract useful information; an eigendecomposition of binary matrices is defined in the form of a binary PCA [20]. The adjacency matrix of a finite directed or undirected graph G consisting of n nodes is an n × n binary matrix $A = \{a_{ij},\ i, j = 1, \ldots, n\}$, where the off-diagonal entry $a_{ij}$ is 0 if there is no link between nodes i and j and 1 otherwise. Diagonal entries are all zero. For undirected graphs, A is symmetric, again with 0s on the diagonal. Each graph has a unique adjacency matrix and vice versa. Figure 2 shows several examples of graphs and their corresponding adjacency matrices. Adjacency matrices describe their corresponding graphs in a compact way; for example, the number of 1s in row i is equal to the number of nodes connected to node i.

Fig. 2 Graphs and their adjacency matrices

We will use binary matrices as a tool to establish intraframe and interframe relationships among objects. In intraframe mode, object locations with respect to each other are defined for a single frame. In interframe mode, object locations with respect to each other are defined for multiple frames as the objects' positions change over time. For example, in aerial video footage, vehicle positions in two arbitrary frames change as a result of the passing, stopping, or entry of new vehicles in the scene. We define a derivative of the adjacency matrix to establish such spatio-temporal relationships.

4.1 Target adjacency matrices

In the context of target tracking, $T = \{T_i,\ i = 1, \ldots, N\}$ represents N tracked targets in the video frames. We develop a layered Target Adjacency Matrix (TAM) that can be used to establish target positions relative to each other in each frame, as well as position changes over time. TAM is a binary matrix of dimensions N × N × l, where N is the number of targets and l is the number of layers. The definition of a layer varies by application. In this work, we use two layers defined along the north–south and west–east directions, because we would like to localize objects relative to others in each video frame. Each TAM layer can be represented by an adjacency matrix, shown in (10).

$$A_{\mathrm{Layer}} = \begin{bmatrix} 0 & a_{12} & \cdots & a_{1N} \\ a_{21} & 0 & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & 0 \end{bmatrix}, \quad \mathrm{Layer} = \{ns, we\} \qquad (10)$$

For $A_{ns}$, if $a_{ij} = 1$ then $T_i$ is to the south of $T_j$. Similarly, in $A_{we}$, if $a_{ij} = 1$ then $T_i$ appears to the east of $T_j$. As an example, consider a frame with three targets $T_1$, $T_2$ and $T_3$, shown in Fig. 3. The adjacency matrices in (11) describe the targets' spatial relationships in a compact way. The occurrences of 0s and 1s are consistent with the definitions just given. We will show that these matrices can be formed automatically in software using data extracted from the sliding block correlation detector.

$$A_{ns} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix}, \quad A_{we} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix} \qquad (11)$$

Fig. 3 Frame with three targets. $T_1$ is north of $T_2$ and $T_3$; $T_3$ is to the east of $T_1$ and $T_2$. These relationships are captured in the target adjacency matrices

4.2 Populating TAM

Target adjacency matrices can be populated automatically from the data available in (5). For a given frame with N targets, the number of rows and columns of both layers of TAM is equal to N. The diagonal values of the matrix are initialized to zero, as the adjacency of a target with itself is not considered. The starting point for populating TAM is (7). From the data available in (5), target adjacency is captured in the array $r_s = [r_1, r_2, \ldots, r_N]$. First, $A_{ns}$ is set to an N × N matrix of zeros. $A_{ns}$ is then populated one row at a time. Starting with row $r_1$, all columns are set to 0 because the object designated by $r_1$ is to the north of all others. Row $r_2$ is populated by setting column $r_1$ to 1. Row $r_3$ is populated by setting columns $r_1$ and $r_2$ to 1, and so on. In general,

$A_{ns}(r_1, r_j) = 0, \quad j = 2, \ldots, N \qquad (12)$

$A_{ns}(r_i, r_j) = 1, \quad i = 2, \ldots, N,\ j = 1, \ldots, i - 1 \qquad (13)$

The same procedure yields $A_{we}$.
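Equations (12)–(13) translate directly into code; a sketch, with targets indexed from 0:

```python
import numpy as np

def populate_tam(rs):
    """Build one TAM layer (e.g. A_ns) from the geographically
    ordered target indices r_s, per Eqs. (12)-(13)."""
    N = len(rs)
    A = np.zeros((N, N), dtype=int)
    for i in range(1, N):           # row r_i
        for j in range(i):          # set columns r_1 .. r_{i-1} to 1
            A[rs[i], rs[j]] = 1
    return A

# Fig. 1 north-south order (B, D, A, C), with labels A=0 .. D=3:
# populate_tam([1, 3, 0, 2]) reproduces A_ns for that example.
```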

123

Page 6: Spatio-temporal object relationships in image sequences using adjacency matrices

252 SIViP (2012) 6:247–258

4.3 Intraframe TAM

Target adjacency matrices have special properties that help paint a complete picture of target dispositions in a single frame or of their movements across frames. We define intra-TAM as a matrix computed from a single frame and used to establish spatial relationships in that frame. Interframe TAM (inter-TAM) is obtained from two different frames and used to establish how targets have moved with respect to each other over time. These properties can be extracted by simple algorithms to provide an automated interpretation tool.

For intra-TAM:

1. In $A_{ns}$, if $a_{ij} = 1$, $T_i$ is to the south of $T_j$.
2. In $A_{we}$, if $a_{ij} = 1$, $T_i$ is to the east of $T_j$.
3. In $A_{ns}$, the column with the maximum number of 1s corresponds to the most northerly target.
4. In $A_{we}$, the column with the maximum number of 1s corresponds to the most westerly target.
5. If the ith row and ith column are all 0s, then $T_i$ is not present in the frame.
6. The sum of column i in $A_{ns}$ is the number of targets to the south of $T_i$.
7. The sum of column i in $A_{we}$ is the number of targets to the west of $T_i$.
8. All zeros in column i of $A_{ns}$ or $A_{we}$ means that $T_i$ is the most southerly or most easterly target in the scene, respectively.

These rules can be coded directly; a sketch follows.
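An illustrative implementation of rules 3–8 for one layer (NumPy assumed):

```python
import numpy as np

def interpret_intra(A):
    """Apply intra-TAM column rules to one layer (A_ns or A_we)."""
    col_sums = A.sum(axis=0)                 # rules 6-7: targets south/west of T_i
    extreme = int(col_sums.argmax())         # rules 3-4: most northerly/westerly
    absent = [i for i in range(A.shape[0])   # rule 5: empty row and column
              if not A[i, :].any() and not A[:, i].any()]
    return extreme, col_sums, absent
```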

4.4 Interframe TAM

Interframe TAM (inter-TAM) describes the position changes among targets in the time spanned between two arbitrary frames. Inter-TAM is constructed by XNORing two intraframe TAMs:

$\text{interTAM} = \text{TAM}_i \odot \text{TAM}_j \qquad (14)$

where ⊙ denotes the element-wise XNOR (complemented XOR), so that an entry of 1 marks an unchanged relative position. Inspection of inter-TAM reveals the following properties:

1. An all-1 matrix indicates that there have been no changes in target positions in any direction between the two frames.
2. All 1s in column i implies that the relative position of target i with respect to the other targets has not changed between the two frames.
3. The number of 0s in column i equals the number of position changes of target i relative to the other targets.
4. In column i, the row indices of the 1s indicate no change in the position of target i relative to the targets at those row indices.
5. If column i has a 0 entry in row j, then target i has changed position with respect to target j.
6. If column i has an entry of 1 in row j, then the position of target i has not changed relative to target j.
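Equation (14) and properties 1–6 reduce to an element-wise XNOR and column scans; a sketch:

```python
import numpy as np

def inter_tam(tam_a, tam_b):
    """Eq. (14): element-wise XNOR; 1 = relative position unchanged."""
    return (~np.logical_xor(tam_a.astype(bool), tam_b.astype(bool))).astype(int)

def changed_with(inter, i):
    """Property 5: targets j whose order relative to target i changed."""
    return np.where(inter[:, i] == 0)[0].tolist()
```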

4.5 Algorithm summary

Target adjacency can be established in seven steps, as follows:

1. Track targets in the video.
2. Embed digital watermarks in all targets in the reference frame and propagate them throughout the frames.
3. Perform a sliding block correlation for every watermark and store it in a matrix.
4. Calculate the peak index values of the correlation matrix.
5. Populate the TAM layers from the sorted peak index values.
6. For a single frame, follow the interpretation rules for intraframe TAM.
7. For multiple frames, follow the interpretation rules for interframe TAM.

5 Experimental results

The proposed algorithm is implemented on 1820 frames of Egtest01 in the CMU dataset [23]. Four randomly selected frames are shown in Fig. 4. The idea is to establish sequencing and changes of position among the vehicles across frames using track watermarks.

5.1 Tracking

We have selected the mean-shift tracking algorithm [5]. In order to begin tracking, an initialization step is needed. Five of the six cars are initialized in frame 1 (Fig. 5a). However, the sixth car (in red) is initialized later in the video, when it makes its first appearance in frame 64 (Fig. 5b). The location and size of these cars are given by the masks shown in Fig. 6. Initialization of all the cars at their entry or re-entry into the video is done using the masks shown in Fig. 6, superimposed on the frame of the car's first appearance. The masks shown in Fig. 6a–e are generated from Fig. 5a, and the mask shown in Fig. 6f is generated from Fig. 5b. The histograms of these cars are computed and stored for the computation of the Bhattacharyya coefficient in future frames. The cars' density estimates are weighted by a monotonically decreasing Epanechnikov kernel [24]. The pixel values within the region encompassing a car are normalized, and the Euclidean distance of each of these pixels from the center of the target is computed; smaller weights are assigned to pixels farther from the center.


Fig. 4 Sample frames from the video. The goal is to establish the spatio-temporal relationships among vehicles. The ambiguity surrounding the identity of vehicles and position changes in different frames underlines the problem

Fig. 5 Frames used for generating masks

Fig. 6 Car masks giving the shape, size, and location


Fig. 7 Tracked cars are circled. Same-color circles indicate the same car

Fig. 8 Two frames with each car carrying a unique watermark. Cars carrying the same watermark in two different frames are in fact the same cars

Track information extracted at this time forms the basis for building the conventional spatio-temporal tree of tracked cars. It is at this point that our algorithm departs from that approach. A sample of tracked cars appears in Fig. 7.

5.2 Track watermarking

For each frame, the tracker establishes a match between corresponding cars in adjacent frames. The centroids of matching cars are then available in the two frames. Spread-spectrum watermarking is used to mark the tracked cars. For the six cars tracked over the span of the video, six unique watermarks are defined. Each watermark is used to tag a car by DCT watermarking of a 4 × 4 block of pixels residing inside a bounding box containing the car. Choosing larger blocks increases spread spectrum's processing gain; this gain provides more robustness when the video is subjected to compression. For applications involving aerial footage of vehicular traffic, 8 × 8 blocks are perhaps as large as one needs. In practice, blocks need to be just big enough to contain the vehicles. In [15] it is reported that the bit error rate for spread-spectrum watermarking is near zero for a JPEG Q-factor above 70 with 100% embedding density. The baseline JPEG Q-factor, the threshold above which compression is deemed imperceptible, is 75 [25]. Lowering the embedding density to 50% allows error-free watermark recovery for Q-factors down to 55.

The watermarks are sequences of random numbers with small cross-correlation coefficients. This property helps reduce false matches during detection. Commonly, m-sequences are used, although we have observed no performance gain compared to sequences drawn from a Gaussian distribution. Once all cars are watermarked in the current frame, the watermarks propagate to matching cars in the following frames. As the tracker proceeds, the watermarks propagate from frame to frame. Figure 8 shows two frames with each car carrying a unique watermark. Numbers attached to the cars serve as a visual identification of the embedded watermarks.

Fig. 9 Correlation plots with medium and high SWR

Fig. 10 Peak plots to obtain the peaks of $V_{we}$ for all cars in frame 1

5.3 Watermark detection

At this point, the video is a self-contained entity that can be used to establish complete track history and object relationships. The first task for the decoder is the evaluation of (5) for frame 1 or another reference frame. Each row of (5) is the cross-correlation of one watermark with a sliding block in the frame. If target i is present in the frame, then row i must show a peak. There are cases of false targets, wrong targets, or missed targets. Since individual watermark sequences are not perfectly orthogonal, targets could be misidentified for each other; however, in our experiments we have not encountered any. On the other hand, for high signal-to-watermark ratios (SWR, i.e., a weak watermark), the watermark-background correlation may mask the peak and lead to a missed target. Figure 9 shows watermark correlation plots at two SWRs; clearly, SWR = 57 dB is much more prone to detection error than SWR = 30 dB. Once the watermarks are detected, (5) can be populated. In (5), the column locations of the peaks are of interest. A graphical representation of the rows of (5) is shown in Fig. 10. The peak locations are tabulated in Table 1. To populate the TAMs, we sort this table by peak location, as shown in Tables 2 and 3. In $V_{we}$, the row with the smallest column number for its peak shows that the corresponding car is to the west of all the other cars. Similarly, the car with the smallest peak column in $V_{ns}$ is the most northerly of all the cars. For a more formal interpretation, binary target adjacency matrices are formed from the data following (12)–(13).

Table 1 Peak index values

  Car i   V_we peak index   V_ns peak index
  1       18,480            59,160
  2       69,150            74,460
  3       122,300           78,240
  4       173,200           169,100
  5       185,700           118,100

Table 2 Sorted V_we peak index values

  Car i   V_we peak index
  1       18,480
  2       69,150
  3       122,300
  4       173,200
  5       185,700

Table 3 Sorted V_ns peak index values

  Car i   V_ns peak index
  1       59,160
  2       74,460
  3       78,240
  5       118,100
  4       169,100

From the data reported in the tables, the two layers (west–east and north–south) of TAM are obtained for frame 1 and shown in (15).

$$A_{ns} = \begin{bmatrix} 0&0&0&0&0 \\ 1&0&0&0&0 \\ 1&1&0&0&0 \\ 1&1&1&0&1 \\ 1&1&1&0&0 \end{bmatrix}, \quad A_{we} = \begin{bmatrix} 0&0&0&0&0 \\ 1&0&0&0&0 \\ 1&1&0&0&0 \\ 1&1&1&0&0 \\ 1&1&1&1&0 \end{bmatrix} \qquad (15)$$

5.4 Interpretation of intraframe TAM

Simple operations on the matrices in (15) can completely describe the spatial relationships among the vehicles. For example, looking at $A_{ns}$, car 1 is to the north of all the other cars because all entries in column 1 (except the diagonal) are 1s. Car 4 is to the south of all the other cars because column 4 is an all-zero column. Summing (15) column-wise provides adjacency information as well: in $A_{ns}$, the column with the largest sum corresponds to the car to the north of all others, which is car 1. Similar interpretations can be carried out on $A_{we}$; for example, the column in $A_{we}$ corresponding to the car to the west of all others has the largest sum, which is also car 1. These rules can be easily coded for machine interpretation of adjacency information.
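As a quick check of these rules, the column sums of $A_{ns}$ in (15) can be computed directly (a usage illustration of the intra-TAM rules; NumPy assumed):

```python
import numpy as np

A_ns = np.array([[0, 0, 0, 0, 0],
                 [1, 0, 0, 0, 0],
                 [1, 1, 0, 0, 0],
                 [1, 1, 1, 0, 1],
                 [1, 1, 1, 0, 0]])
sums = A_ns.sum(axis=0)        # -> [4, 3, 2, 0, 1]
print(sums.argmax() + 1)       # car 1: largest column sum, most northerly
print(sums.argmin() + 1)       # car 4: all-zero column, most southerly
```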

5.5 Interpretation of interframe TAM

Frames 68 and 202 are pulled, and their corresponding target adjacency matrices are computed. The frames are shown in Fig. 11. The inter-TAM for the two frames is constructed from the individual intra-TAMs, as shown in (16). The interpretation for north–south adjacency is as follows:

1. From column 1 → car 1's position remains the same w.r.t. cars 2, 3, and 6, but varies w.r.t. cars 4 and 5.
2. From column 2 → car 2's position varies w.r.t. all cars other than car 1 and car 6.
3. From column 3 → car 3's position remains the same w.r.t. cars 1, 5, and 6, but varies w.r.t. cars 2 and 4.
4. From column 4 → car 4's position varies w.r.t. all other cars except car 5.
5. From column 5 → car 5's position remains the same w.r.t. cars 3 and 4, but varies w.r.t. cars 1, 2, and 6.
6. From column 6 → car 6's position remains the same w.r.t. the other cars except cars 4 and 5.

Fig. 11 Frames used for interframe TAM calculation

North–south layer:

  Frame 68            Frame 202           XNOR
  [0 0 0 0 0 1]       [0 0 0 1 1 1]       [1 1 1 0 0 1]
  [1 0 0 0 0 1]       [1 0 1 1 1 1]       [1 1 0 0 0 1]
  [1 1 0 0 1 1]       [1 0 0 1 1 1]       [1 0 1 0 1 1]
  [1 1 1 0 1 1]       [0 0 0 0 1 0]       [0 0 0 1 1 0]
  [1 1 0 0 0 1]       [0 0 0 0 0 0]       [0 0 1 1 1 0]
  [0 0 0 0 0 0]       [0 0 0 1 1 0]       [1 1 1 0 0 1]

West–east layer:

  Frame 68            Frame 202           XNOR
  [0 1 0 0 0 1]       [0 0 0 0 0 1]       [1 0 1 1 1 1]
  [0 0 0 0 0 0]       [1 0 0 0 0 1]       [0 1 1 1 1 0]
  [1 1 0 0 0 1]       [1 1 0 1 1 1]       [1 1 1 0 0 1]
  [1 1 1 0 0 1]       [1 1 0 0 1 1]       [1 1 0 1 0 1]
  [1 1 1 1 0 1]       [1 1 0 0 0 1]       [1 1 0 0 1 1]
  [0 1 0 0 0 0]       [0 0 0 0 0 0]       [1 0 1 1 1 1]
                                                        (16)

The interpretation for west–east adjacency is as follows:

1. From column 1 → car 1 has not changed position w.r.t. cars 3, 4, 5, and 6, but varies w.r.t. car 2.
2. From column 2 → car 2's position varies w.r.t. cars 1 and 6.
3. From column 3 → car 3's position remains the same w.r.t. cars 1, 2, and 6.
4. From column 4 → car 4's position varies w.r.t. all cars other than cars 1, 2, and 6.
5. From column 5 → car 5's position remains the same w.r.t. cars 1, 2, and 6.
6. From column 6 → car 6's position varies only w.r.t. car 2.

6 Conclusions

In this work, we have brought together elements of object tracking, watermarking, and spatio-temporal relationships within a single framework. We have produced a self-contained video that can be processed to establish object tracks, track history, and relationships among objects without building, maintaining, and storing spatio-temporal graphs. Instead, the relationships among objects are captured in adjacency matrices, a well-established framework used extensively in graph theory. We first populate the adjacency matrices by identifying the watermark in the detected objects and then propose interpretation rules for machine extraction of inter- and intraframe adjacency information among objects. The success of the algorithm is clearly dependent on the accuracy of the tracking module but is independent of the tracking algorithm itself. As such, the proposed object adjacency algorithm can be an addition to an existing object tracking effort.

Acknowledgments This work was supported in part by the Center for Advanced Communications, Villanova University, Villanova, PA.

References

1. Yilmaz, A., Javed, O., Shah, M.: Object tracking: a survey. ACM Comput. Surv. 38(4) (2006)

2. Broida, T.J., Chellappa, R.: Estimation of object motion parameters from noisy images. IEEE Trans. Pattern Anal. Mach. Intell. 8(1), 90–99 (1986)

3. Kitagawa, G.: Non-Gaussian state-space modeling of nonstationary time series. J. Am. Stat. Assoc. 82(400), 1032–1041 (1987)

4. Hue, C., Le Cadre, J.P., Perez, P.: Tracking multiple objects with particle filtering. IEEE Trans. Aerosp. Electron. Syst. 38(3), 791–812 (2002)

5. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-based object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 564–575 (2003)

6. Wang, J., Yagi, Y.: Adaptive mean-shift tracking with auxiliary particles. IEEE Trans. Syst. Man Cybern. Part B Cybern. 39(6), 1578–1589 (2009)

7. Tether, T.: Statement by the Director, Defense Advanced Research Projects Agency, to the Subcommittee on Terrorism, Unconventional Threats and Capabilities, House Armed Services Committee, United States House of Representatives. http://www.dod.gov/dodgc/olc/docs/testTether080313.pdf, March 13 (2008)

8. Lee, J., Oh, J., Hwang, S.: STRG-index: spatio-temporal region graph indexing for large video databases. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 718–729 (2005)

9. Cox, I., Miller, M., Bloom, J., Fridrich, J.: Digital Watermarking and Steganography, 2nd edn. Morgan Kaufmann, Los Altos (2007)

10. Fridrich, J.: Digital image forensics. IEEE Signal Process. Mag. 26(2), 26–37 (2009)

11. Liu, H., Shao, F., Huang, J.: An MPEG-2 video watermarking algorithm with compensation in bit stream. In: Digital Rights Management: Technologies, Issues, Challenges and Systems, vol. 3919, pp. 123–134. Springer, Berlin (2006)

12. Mobasseri, B.G., Krishnamurthy, P.: Track history development by combining watermarking and target tracking. Proc. SPIE 6962, 696209 (2008)

13. Mobasseri, B.G., Krishnamurthy, P.: Establishing target track history by digital watermarking. Proc. SPIE 6819, 68190W (2008)

14. Hartung, F., Kutter, M.: Multimedia watermarking techniques. Proc. IEEE 87(7), 1079–1107 (1999)

15. Kutter, M., Winkler, S.: A vision-based masking model for spread-spectrum image watermarking. IEEE Trans. Image Process. 11(1), 16–25 (2002)

16. Malvar, H.S., Florêncio, D.A.F.: Improved spread spectrum: a new modulation technique for robust watermarking. IEEE Trans. Signal Process. 51(4), 898–905 (2003)

17. Zhong, J., Huang, S.: Double-sided watermark embedding and detection. IEEE Trans. Inf. Forensics Secur. 2(3), 297–310 (2007)

18. Harrison, M.A.: On the number of classes of binary matrices. IEEE Trans. Comput. C-22(12), 1048–1052 (1973)

19. Wadayama, T.: On undetected error probability of binary matrix ensembles. CoRR abs/0705.3995 (2007)

20. de Leeuw, J.: Principal component analysis of binary data by iterated singular value decomposition. Comput. Stat. Data Anal. 50(1), 21–39 (2006)

21. Yang, G.: The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 344–353. ACM, New York (2004)

22. Walter, C.D.: Adjacency matrices. SIAM J. Algebraic Discrete Methods 7(1), 18–29 (1986)

23. Collins, R.T., Zhou, X., Teh, S.K.: An open-source tracking testbed and evaluation web site. In: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, pp. 17–24 (2005)

24. Epanechnikov, V.A.: Nonparametric estimation of a multidimensional probability density. Theory Probab. Appl. 14, 153–158 (1969)

25. Said, A., Guleryuz, O.: Exact JPEG recompression. In: Proceedings of Visual Information Processing and Communication, SPIE 7543 (2010)
