ieee transactions on pattern analysis and...

Network Consistent Data AssociationAnirban Chakraborty,Member, IEEE, Abir Das, Student Member, IEEE, and

Amit K. Roy-Chowdhury, Senior Member, IEEE

Abstract—Existing data association techniques mostly focus on matching pairs of data-point sets and then repeating this processalong space-time to achieve long term correspondences. However, in many problems such as person re-identification, a set ofdata-points may be observed at multiple spatio-temporal locations and/or by multiple agents in a network and simply combining thelocal pairwise association results between sets of data-points often leads to inconsistencies over the global space-time horizons. In thispaper, we propose a Novel Network Consistent Data Association (NCDA) framework formulated as an optimization problem that notonly maintains consistency in association results across the network, but also improves the pairwise data association accuracies. Theproposed NCDA can be solved as a binary integer program leading to a globally optimal solution and is capable of handling thechallenging data-association scenario where the number of data-points varies across different sets of instances in the network. We alsopresent an online implementation of NCDA method that can dynamically associate new observations to already observed data-pointsin an iterative fashion, while maintaining network consistency. We have tested both the batch and the online NCDA in two applicationareas—person re-identification and spatio-temporal cell tracking and observed consistent and highly accurate data association resultsin all the cases.

Index Terms—Data association, network consistency, integer program, person re-identification, spatio-temporal cell tracking

Ç

1 INTRODUCTION

IN many computer vision problems such as tracking, re-identification etc., associating detected targets across

space and/or time is of utmost importance. Most data asso-ciation approaches try to find correspondences betweenpairs of instances of a set of datapoints and repeat this pro-cess along space/time to obtain long term correspondences.However, this local approach for finding correspondencesmay lead to inconsistencies over the global space-time hori-zons. The goal of this paper is to show how globally consis-tent correspondence results can be obtained by enforcingsuitable network-level constraints over the entire set ofobservation data points.

The notion of Network Consistency is applicable to awide class of data association problems, especially wherethe set of all observations can be partitioned into multiplenon-overlapping subsets such that no two observationsfrom within the same subset may be associated with oneanother. The target of such a problem is to estimate pair-wise associations between observations belonging to dif-ferent subsets, over all possible pairs of such subsets. Theprobable links associating pairs of observations form a net-work of associated observations and thus, the study of

analyzing the feasibility of pairwise associations in contextto the overall network can be termed as Network ConsistentData Association (NCDA). The class of data associationproblems where such consistency is of utmost importanceincludes person re-identification, feature matching acrossmultiple sensors (such as multi-robot localization, map-ping, exploration etc.) and/or across multiple time points(tracking), feature matching in high dimensional biologicaldatasets (such as 3D+t cell tracking) and many more. Weexplain the problem more precisely through two of suchexamples below.

Consider the well-studied person re-identification prob-lem where the objective is to associate targets across cam-eras with non overlapping field-of-views (FoVs). Mostwidely used approaches focus on pairwise re-identification,i.e., association between two camera FoVs. Even if the re-identification accuracy for each camera pair is high, it mightcontain many global association inconsistencies over theentire network if three or more cameras are considered.Matches between targets given independently by every pairof cameras might not conform to one another and, in turn,may lead to inconsistent mappings. Thus, in person re-identification across a camera network, multiple paths ofcorrespondences may exist between targets from any twocameras, but ultimately all these paths must point to thesame correspondence maps for each target in each camera.An example scenario is shown in Fig. 1a. Even though cam-era pairs 1-2 and 2-3 have correct re-identification of the tar-get, the false match between the targets in camera pair 1-3makes the overall re-identification across the triplet incon-sistent. It can be noted that the error in re-identificationmanifests itself through inconsistency across the network,and hence by enforcing consistency the pairwise accuraciescan be improved as well.

Multi-view feature tracking is another application areawhere consistent data association is important. One such

! A. Chakraborty is with the Department of Diagnostic Radiology, NationalUniversity of Singapore, Singapore 119077. This work was done when hewas at the University of California, Riverside, CA 92521.E-mail: [email protected].

! A. Das is with the University of Massachusetts Lowell.E-mail: [email protected].

! A. Roy-Chowdhury is with the Department of Electrical and ComputerEngineering, University of California, Riverside, CA 92521.E-mail: [email protected].

Manuscript received 31 July 2014; revised 8 June 2015; accepted 16 Sept.2015. Date of publication 15 Oct. 2015; date of current version 11 Aug. 2016.Recommended for acceptance by Y. Moses.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TPAMI.2015.2491922

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 38, NO. 9, SEPTEMBER 2016 1859

0162-8828! 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

mailto:

mailto:

mailto:

feature tracking problem is the spatio-temporal cell track-ing. Using confocal microscopes, multicellular biological tis-sues are often imaged at multiple time points to observe thegrowth of hundreds of individual cells in the tissue. At eachtime point, cells within the tissue are imaged at various con-focal planes, thus resulting in a (3D+t) stack of images. Eachcell, therefore, may have projections on different spatio-temporal planes. A cell tracker aims to find correspondencesbetween cell image slices along both ‘z’ (depth of the tissue)and time and hence the problem is same as any multi-viewfeature tracking problem. Because of the multi-dimensionalnature of this tracking problem, spatial and temporal corre-spondences obtained by choosing themost similar candidatefor each cell independently do not guarantee consistentresults. Note that, as in the case of re-identification, a 2D cellsegment in any spatio-temporal image slice must not havemore than one match in any other spatio-temporal imageand if at least one spatio-temporal path exists in the networkthat associates two cell slices, they must be projections of thesame cell onto two image planes. Example of network-levelassociation inconsistencies in the (3D+t) cell tracking prob-lem is shown in Fig. 1b.

The examples above represent the class of relevant dataassociation problems where network consistency has to beenforced and where all the data points are available beforethe associations between them are estimated. Besides these,for many other dynamic data association problemswheremoreobservations become available with time, the consistencyneeds to be applied in an online fashion. Dynamic featuretracking (across space and time), tracklet association, onlinespatio-temporal cell tracking etc. are some of such problemsand the proposed data association method has to beequippedwith both online and offlinemode of operations.

Contribution of the present work. We propose a novel con-sistent data association scheme over a network withindividual observations (i.e., targets) as nodes. We pose the

problem as an optimization problem that minimizes theglobal cost for associating pairs of targets on the entire net-work constrained by a set of consistency criteria and termthis as network consistent data association.

The inputs to NCDA are pairwise similarity scoresbetween targets. Unlike assigning a match for which thesimilarity score is maximum, our formulation picks theassignments for which the total similarity of all matches isthe maximum, under the constraint that there is no inconsis-tency in the assignment among any two sets of targets giventhe assignments between all other sets of targets across thenetwork. The problem is translated into a binary integerprogram that can be solved using standard methods [1].

The proposed NCDA method is further generalized to amore challenging scenario in data association where thenumber of targets may vary across different sets of instancesin the network. For example, in person re-identificationproblems all persons may not appear in all the cameras orin case of cell tracking, 2D projections of new cells appeardeeper into a tissue. The objective function and the con-straints are modified to incorporate probable one-to-nonemappings without jeopardizing the network consistency.

For dynamic data association problems, we propose anonline version of the generalized NCDA method. We showthat the online NCDA uses the same core ideas as the batchversion, but the size of the optimization problem solved ateach time point is substantially smaller than that of the batchversion—in terms of both the number of variables optimizedand the number of constraints. The online NCDA method isable to associate newer observations available over timewith those from the past while strictly maintaining the net-work consistency.

We show the general applicability of the proposedmethod by testing it in two previously mentioned computervision application domains, viz., person re-identificationand spatio-temporal cell tracking. We describe how each ofthese challenges can be mapped to the exact same NCDAproblem, which can then be solved to generate unambigu-ous and more accurate data-association results.

2 RELATED WORK

After discussing about the related work in each of the appli-cation areas, we shall highlight the differences between thecurrent submission with our earlier paper [2] that intro-duced network consistency in person re-identification.

Person re-identification. The existing person re-identificationapproaches are camera pairwise and they can be roughlydivided into three categories: (i) discriminative signaturebased methods [3], [4], [5], [6], (ii) metric learning basedmethods [7], [8], [9], and (iii) transformation learning basedmethods [10], [11]. Person specific discriminative signa-tures are computed using multiple local features (color,shape and texture) [4], [5], [6], [12], [13]. Metric learningbased methods try to improve the re-identification perfor-mance by learning optimal non-Euclidean metric definedon pairs of true and wrong matches [14], [15], [16]. Worksexploring transformation of features between cameraslearn a brightness transfer function (BTF) between appear-ance features [11], a subspace of the computed BTFs [10],linear color variations model [17], or a cumulativeBTF [18] between cameras. As the above methods do not

Fig. 1. Example of network inconsistency in data association. (a) Personre-identification: Among the three possible re-identification results, twoare correct. Match of the target from camera 1 to camera 3 can be foundin two ways. The first one is the direct pairwise re-identification resultbetween cameras 1 and 3 (shown as ‘Path 1’), and the second one isthe indirect re-identification result in camera 3 given via the matched per-son in camera 2 (shown as ‘Path 2’). The two outcomes do not matchand thus the overall associations of the target across three cameras isnot consistent. (b) Network inconsistency in spatio-temporal cell track-ing: In this schematic, association results between 2D projections ofthe same 3D cell on four spatio-temporal image planes are analyzed.The pairwise associations need to be consistent across the loop overthe four image slices. This consistency can be used to obtain corre-spondences when there are no direct pairwise matches or to correctwrong ones. For example, the correspondence between the same cell inimage slice 1 and slice 3 (broken arrow) is established via an indirectpath (solid arrows) through slices 2 and 4.

1860 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 38, NO. 9, SEPTEMBER 2016

take consistency into account, applying them to a cameranetwork does not give consistent re-identification. Sincethe proposed method is built upon the pairwise similar-ity scores, any of the above methods can be the buildingblock to generate the camera pairwise similarity betweenthe targets.

Spatio-temporal cell tracking. There has been some work onautomated tracking and segmentation of cells in time-lapseimages, for both plants and animals. Some of the well-known approaches for segmenting and tracking cells areactive contours based methods [19], [20], [21], [22], [23], Sof-tassign method [24], [25], tracking based on associationbetween detections[26], [27], [28], multiple hypothesesbased tracking[29], joint detection and tracking[30], [31]. In[32], [33], a spatio-temporal tracking algorithm for Arabi-dopsis SAM was proposed, where relative positional infor-mation of neighboring cells were used to generate uniquefeatures for each cell. In [34], the spatio-temporal cell track-ing problem is posed as an inference problem on a condi-tional random field (CRF). However, most of these methodshave focused on slice to slice/pairwise cell tracking. Themethod in [32] utilizes indirect paths between any two slicesto improve the pairwise tracking accuracy. However, thismethod does not ensure spatio-temporally consistent associ-ation results. The proposed NCDA method yields globallyoptimal and consistent correspondences between 2D cell sli-ces when built on any method that can generate similarityscores between cells, such as [34].

Other relevant work: Some recent work aims to find pointcorrespondences in monocular image sequences [35] orlinks detections in a tracking scenario by solving a con-strained flow optimization [36] or using sparse appearancepreserving tracklets [37]. Another flow based method formulti target tracking was presented in [38], which allowsfor one-to-many/many-to-one matchings and therefore cankeep track of targets even when they merge into groups.With known flow direction, a flow formulation of a data-association problem will yield consistent results. But indata-association problems with no temporal or spatial lay-out information (e.g., person re-identification), the flowdirections are not natural and thus the performance maywidely vary with different choices of temporal or spatialflow. Using the transitivity of correspondence, point corre-spondence problem was addressed in a distributed as wellas computationally efficient manner [39]. However,‘Consistency’ and ‘transitivity’ being complementary toeach other, less computation comes at the cost of local con-flicts and mismatch cycles in absence of any consistencyconstraints, requiring a heuristics based approach to correctthe conflicts subsequently. The proposed NCDA approach,on the other hand, uses maximal information by enforcingconsistency and produces a globally optimal solution with-out needing to correct the correspondences at later stages.

In a very recent paper [2], we have introduced the net-work consistency in solving the person re-identificationproblem. In [2], the method and the constraints used in theinteger program are specific to that particular problem (re-identification). However, in this paper, we provide a gener-alized problem formulation for solving any network leveldata association problem first, and describe the way thegeneralized constraints can be simplified further for

problems in specific application areas. Moreover, in thispaper we introduce a novel online implementation of theNCDA for solving dynamic/online data association prob-lems. We show how the design of the optimization problemcan reduce the size of the problem per iteration than that ofthe batch generalized NCDA method. Besides the re-identi-fication problem, we also show applications of this dataassociation method in the (3D+t) cell tracking problem andhow the generalized constraints can be translated into theirproblem specific forms.

3 THE NETWORK CONSISTENT DATA ASSOCIATION

PROBLEM

3.1 Notations and Terminologies1.Node.A node is a datapoint/target that needs to be associ-ated with other datapoints via NCDA. For person re-identification problems, a node represents a target in theFoV of a camera, whereas, in cell tracking problem a node isa 2D segmented cell (at any given spatio-temporal location).

2. Group. A ‘group’ is a collection of nodes. A node cannever be associated with any other node from the samegroup it belongs to. For example, in a typical person re-identification problem, the set of all targets appearing in theFoV of the same camera is a group and for spatio-temporaltracking, the collection of 2D cell segmentations in one imageslice can be assumed a group. Thus, a node is a member of agroup. Let the ith node in the group g be denoted asPg

i .3. Similarity score matrix. This is a matrix data structure

containing feature similarity scores between nodes belong-ing to two different groups. Therefore, for each pair ofgroups in a network there is one such matrix. Let Cðp;qÞ

denote the similarity matrix between groups p and q. Then

ði; jÞth element in Cðp;qÞ is the similarity score between thenodes Pp

i and Pqj .

4. Assignment matrix. We need to know whether thenodes Pp

i and Pqj are associated or not, 8i; j ¼ f1; . . . ; ng and

8p; q ¼ f1; . . . ;mg. The associations between targets acrossgroups can be represented using ‘Assignment matrices’,one for each pair of groups. Each element xp;q

i;j of the assign-

ment matrix Xðp;qÞ between the group pair ðp; qÞ is defined asfollows:

xp;qi;j ¼ 1; if Pp

i and Pqj are the same targets,

0; otherwise:

!(1)

If the same set of targets is observed in each group, then

Xðp;qÞ is a permutation matrix, i.e., only one element per rowand per column is 1, all the others are 0. Mathematically,8xp;qi;j 2 f0; 1g

Xn

j¼1

xp;qi;j ¼ 1 8i ¼ 1 to n;

Xn

i¼1

xp;qi;j ¼ 1 8j ¼ 1 to n: (2)

5. Edge. An ‘edge’ between two nodes Ppi and Pq

j fromtwo different groups of nodes is constructed between theith node in group p and the jth node in group q. It shouldbe noted that there will be no edge between the nodes of thesame group. There are two attributes connected to each

CHAKRABORTY ETAL.: NETWORKCONSISTENT DATA ASSOCIATION 1861

edge. They are the similarity score cp;qi;j and the association

value xp;qi;j .

6. Path. A ‘path’ between two nodes ðPpi ;P

qjÞ is a set of

edges that connect the nodes Ppi and Pq

j without traveling

through a node twice. Moreover, each node on a pathbelongs to a different group. A path between Pp

i and Pqj can

be represented as the set of edges eðPpi ;P

qjÞ ¼ fðPp

i ;PraÞ;

ðPra;Ps

bÞ; . . . ; ðPtc;P

qjÞg, where fPr

a;Psb; . . . ;Pt

cg are the set of

intermediate nodes on the path between Ppi and Pq

j . The set

of association values on all the edges between the nodes isdenoted as L, i.e., xp;q

i;j 2 L; 8i; j ¼ ½1; . . . ; n&; 8p; q ¼ ½1; . . . ;m& and p < q. Finally, the set of all paths between any twonodes Pp

i and Pqj is represented as EðPp

i ;PqjÞ and the zth

path is eðzÞðPpi ;P

qjÞ 2 EðPp

i ;PqjÞ.

3.2 NCDA Objective Function and ConstraintsLet us first discuss the problem of one-to-one data associa-tion where the number of datapoints per group is constantand each datapoint from one group would have exactly onematch in another group. This type of data association prob-lem is often relevant to the person re-identification datasets,where the same set of persons appears across the FoVs of allthe cameras. Later, we shall present a more generalized ver-sion of NCDA where number of datapoints belonging todifferent groups may vary and therefore a datapoint may ormay not have a match in another group.

For the pair of groups ðp; qÞ, the sum of the similarityscores of association is given by

Pni;j¼1 c

p;qi;j x

p;qi;j . Summing

over all possible pairs of groups, the global similarity scorecan be written as

C ¼Xm

p;q¼1p< q

Xn

i;j¼1

cp;qi;j xp;qi;j : (3)

The set of constraints are as follows.1. Pairwise association constraint. For the one-to-one associ-

ation scenario, a datapoint from the group p can have onlyone match from another group q. This is mathematicallyexpressed by the set of equations (2). This is true for all pos-sible pairs of data groups and can be expressed as

Xn

j¼1

xp;qi;j ¼ 1 8i ¼ 1 to n 8p; q ¼ 1 to m; p < q;

Xn

i¼1

xp;qi;j ¼ 1 8j ¼ 1 to n 8p; q ¼ 1 to m; p < q:

(4)

2. Loop constraint. This constraint comes from the consis-tency requirement. If two nodes are indirectly associatedvia nodes in other groups, then these two nodes must alsobe directly associated. Therefore, given two nodes Pp

i andPq

j , it can be noted that for consistency, a logical ‘AND’ rela-

tionship between the association value xp;qi;j and the set of

association values fxp;ri;a ; x

r;sa;b; . . . ; x

t;qc;jg of any possible path

between the nodes has to be maintained. The associationvalue between the two nodes Pp

i and Pqj has to be 1 if the

association values corresponding to all the edges of any pos-sible path between these two nodes are 1. Keeping thebinary nature of the association variables and the pairwiseassociation constraint in mind the relationship can be

compactly expressed as

xp;qi;j '

X

ðPrk;Ps

lÞ2eðzÞðPp

i ;Pqj Þ

xr;sk;l

0

B@

1

CA( jeðzÞðPpi ;P

qjÞjþ1; (5)

8 paths eðzÞðPpi ;P

qjÞ 2 EðPp

i ;PqjÞ, where jeðzÞðPp

i ;PqjÞj denotes

the cardinality of the path jeðzÞðPpi ;P

qjÞj, i.e., the number of

edges in the path. The relationship holds true for all i andall j. For the case of a triplet of cameras the constraint inEqn. (5) simplifies to

xp;qi;j ' xp;r

i;k þ xr;qk;j ( 2þ 1 ¼ xp;ri;k þ xr;qk;j ( 1: (6)

An example from person re-identification involving threecameras and two persons is illustrated with the help ofFig. 2. Say, the raw similarity scores between pairs of targetsacross cameras suggest associations between ðP1

1;P21Þ;

ðP21;P3

1Þ and ðP12;P3

1Þ independently. However, when theseassociations are combined together over the entire network,it leads to an infeasible scenario - P1

1 and P12 are the same

person. This infeasibility is also correctly captured through

the constraint in Eqn. (6), i.e., x1;31;1 ¼ 0 but x1;2

1;1 þ x2;31;1 ( 1 ¼ 1,

thus violating the loop constraint.For a generic scenario involving a large number of

groups of nodes where similarity scores between every pairof groups may not be available the loop constraint equations(i.e., Eqn. (5)) have to hold for every possible triplet, quartet(and so on) of groups. On the other hand, if the similarityscores between all nodes for every possible pair of groupsare available, the loop constraints on quartets and higherorder loops are not necessary. If loop constraint is satisfiedfor every triplet of groups then it automatically ensures con-sistency for every possible combination of groups takingthree or more of them. So, in such a case, the loop constraintfor the network can be written as,

xp;qi;j ' xp;r

i;k þ xr;qk;j ( 1;

8 i; j; k ¼ ½1; . . .n&; 8 p; q; r ¼ ½1; . . .m&; and p < r < q:(7)

In spatio-temporal cell tracking problem, 3D structures oftightly packed multilayer tissues are imaged at variousdepths and at multiple observational time points. Thisyields a (3D+t) data structure of images, each of which

Fig. 2. An illustrative example showing the importance of the loop con-straint in a data-association problem. It presents a simple person re-identification scenario in a camera network involving two persons (datapoints) in three cameras (groups).


contains 2D spatio-temporal projections of numerous 3Dcells. More details on the structure of the data can be foundat Section 6.2. Now, the target is to estimate correspond-ences between these 2D cellular projections along bothspace and time, i.e., between images at various depths atthe same time point or at the same depth across consecutivetime points. Thus, the entire spatio-temporal cell trackingnetwork can be exhaustively partitioned into quartets of cellslices. One such quartet is shown (by arrows) in Fig. 1b.Unlike triplets as in person re-identification case, the num-ber of edges on an indirect path between any two nodes in aquartet is 3 and hence the value of jeðzÞðPp

i ;PqjÞj in Eqn. (5) is

3. The loop constraints for the cell tracking problem can,therefore, be derived from Eqn. (5) as

xp;qi;j ' xp;r

i;k þ xr;sk;l þ xs;ql;j ( 3þ 1

¼ xp;ri;k þ xr;s

k;l þ xs;ql;j ( 2;

8 i; j; k; l ¼ ½1; . . .n&; 8 p; q; r; s ¼ ½1; . . .m&;and p < r < s < q:

(8)

Note that the above expressions are valid when there areequal number of nodes (cells/persons) in all groups and anexact one-to-one correspondence between each pair ofgroups is expected.

3.3 Overall Problem for One-to-One AssociationsBy combining the objective function in Eqn. (3) with the con-straints in Eqn. (4) and Eqn. (5), we pose the overall optimi-zation problem for the case of one-to-one mapping betweengroups as

argmaxxp;qi;j

i;j¼½1;***;n&p;q¼½1;***;m&

Xm

p;q¼1p< q

Xn

i;j¼1

cp;qi;j xp;qi;j

0

B@

1

CA

subject toXn

j¼1

xp;qi;j ¼ 1 8i ¼ ½1; . . . ; n&;

8p; q ¼ ½1; . . . ; m&; p < q;

Xn

i¼1

xp;qi;j ¼ 1 8j ¼ ½1; . . . ; n& 8p; q ¼ ½1; . . . ;m&; p < q;

xp;qi;j '

X

ðPrk;Ps

lÞ2eðzÞðPp

i ;Pqj Þ

xr;sk;l

0

B@

1

CA( jeðzÞðPpi ;P

qjÞjþ 1;

8 i; j ¼ ½1; . . .n&; 8 p; q ¼ ½1; . . .m&; and p < q;


qjÞ 2 EðPp

i ;PqjÞ;

xp;qi;j 2 f0; 1g 8i; j ¼ ½1; . . . ; n&; 8p; q ¼ ½1; . . . ;m&; p < q:

(9)

The above optimization problem for optimal and consistentdata association is a binary integer program. The exact andsimplified form of the integer program is discussed in thesupplementary material.

4 NCDA FOR VARIABLE NUMBER OF DATA-POINTS IN EACH GROUP

As explained in the previous sub-section, the NCDA can beachieved by solving the binary IP formulated in Eqn. (9).

However, the assumption of one-to-one association betweentargets across groups may not be valid in many practicalscenarios, especially when there are unequal numbers ofdatapoints in different groups. For re-identification in acamera network, there may be situations when every persondoes not go through the FoV of every camera. For thespatio-temporal cell tracking problems, there could be vari-able number of segmented cell slices on the images at differ-ent spatio-temporal locations. In such cases, a datapointmay not have association with any datapoint from anothergroup and hence the values of assignment variables in everyrow or column of the assignment matrix can all be 0. How-ever, a one-to-many association is still infeasible as before.For re-identification, a person from any camera p can haveat most one match from another camera q. As a result, thepairwise association constraints now change from equalitiesto inequalities as follows,

Xnq

j¼1

xp;qi;j + 1 8i ¼ ½1; . . . ; np& 8p; q ¼ ½1; . . . ;m&; p < q;

Xnp

i¼1

xp;qi;j + 1 8j ¼ ½1; . . . ; nq& 8p; q ¼ ½1; . . . ;m&; p < q;

(10)

where, np snd np are the number of nodes (datapoints) ingroups p and q respectively.

However, with this generalization, it is easy to see thatthe objective function (in Eqn. (9)) is no longer valid. Eventhough the provision of ‘no match’ is now available, theoptimal solution will try to get as many associations as pos-sible across the network. This is due to the fact that the cur-rent objective function assigns reward to both true positive(TP) (correctly associating a datapoint across groups) andfalse positive (FP) associations. Thus the optimal solutionmay contain many false positive associations. This situationcan be avoided by incorporating a modification in the objec-tive function as follows:

Xm

p;q¼1p< q

Xnp;nq

i;j¼1

ðcp;qi;j ( kÞxp;qi;j ; (11)

where k is any value in the range of the similarity scores.This modification leverages upon the idea that, typically,similarity scores for most of the true positive matches inthe data would be much larger than majority of the falsepositive matches. In the new cost function, instead ofrewarding all positive associations we give reward to mostof the true positives, but impose penalties on the false posi-tives. As the rewards for all true positive matches are dis-counted by the same amount k and as there is penalty forfalse positive associations, the new cost function gives usoptimal results for both ‘match’ and ‘no-match’ cases. Thechoice of the parameter k depends on the similarity scoresgenerated by the chosen method, and thus can vary fromone pairwise similarity score generating method toanother. Ideally, the distributions of similarity scores of theTPs and FPs are non-overlapping and k can be any realnumber from the region separating these two distributions.However, for practical scenarios where TP and FP scoresoverlap, an optimal k can be learned from training data.A simple method to choose k could be running NCDA for


different values of k over the training data and choosing theone giving the maximum accuracy on the cross validationdata. So, for this more generalized case, the NCDA problemcan be formulated as follows:

argmaxxp;qi;j

i¼½1;...;np &j¼½1;...;nq &p;q¼½1;...;m&

Xm

p;q¼1p< q

Xnp;nq

i;j¼1

"cp;qi;j ( k

#xp;qi;j

0

B@

1

CA

subject toXnq

j¼1

xp;qi;j + 1 8i ¼ ½1; . . . ; np&;

8p; q ¼ ½1; . . . ;m&; p < q;

Xnp

i¼1

xp;qi;j + 1 8j ¼ ½1; . . . ; nq& 8p; q ¼ ½1; . . . ;m&; p < q;

xp;qi;j '

X

ðPrk;Ps

lÞ2eðzÞðPp

i ;Pqj Þ

xr;sk;l

0

B@

1

CA( jeðzÞðPpi ;P

qjÞjþ 1;

8 i ¼ ½1; . . . ; np&; j ¼ ½1; . . . ; nq&;8 p; q ¼ ½1; . . .m&; and p < q;


qjÞ 2 EðPp

i ;PqjÞ;

xp;qi;j 2 f0; 1g 8i ¼ ½1; . . . ; np&; j ¼ ½1; . . . ; nq&;

8p; q ¼ ½1; . . . ;m&; p < q: (12)

5 ONLINE IMPLEMENTATION OF NCDA

5.1 MotivationThe generalized NCDA implementation, as presented inSection 4, is aimed at solving data association problemswhere all the observations are available and the target isto establish an optimal set of correspondences betweenthem while maintaining network consistency. It is imple-mented as a batch method and the size of the problemincreases quite dramatically with increase in the numberof observations.

For example, a typical person re-id system is designed torun on datasets where all the observations across multiplecamera FoVs are given. However, in a realistic setting, newobservations are obtained with time and the data associa-tion method must be capable of assigning ids to observa-tions as and when they become available.

In this section, we present an online formulation of theNCDAmethod. The online formulation is a direct theoreticalextension of the batch problem (Eqn. (12)), as all the con-straints (pairwise/loop) from the batchNCDA are preserved.Additionally, the online implementation of NCDA is capableof handling another very important and realistic scenariothat the batch NCDA is not designed to. In a person re-id or amulti-view tracking problem, the same target may reappearin the same camera FoV after passing through the FoVs ofsome other camera(s) in the network. Unlike the batchmethod, the online NCDA can correctly re-id the target whilemaintaining global consistency.

5.2 MethodThe definitions of nodes, groups, edges and paths follow fromSection 3.1. Let us assume that there are m groups of

observations upto time point t and the number of unique

observations in group k is nðtÞk ; k ¼ 1; 2; . . .m. Thus, until

time t, the total number of unique observations is

NðtÞ ¼Pm

1 nðtÞk . Let us also assume that the N ðtÞ observa-

tions are already associated and the association is repre-

sented using a set of estimated labels xp;qi;j ¼ ðtÞ xp;q

i;j ; 8i ¼½1; . . .nðtÞ

p &; 8j ¼ ½1; . . .nðtÞq &; p; q ¼ ½1; . . .m&; p < q.

In the next time window ½t; tþ w&, say, there are lðwÞ

new observations across different groups and the objec-tive is to associate these new observations to the alreadyobserved targets and among each other. Now, some of

these lðwÞ observations may have temporal or spatial over-lap with some other new observation and therefore maynot be associated with each other. As example, in personre-identification/multi-camera tracking problem, multipleobservations may have temporal overlap between them.Likewise, in spatio-temporal cell tracking, cells in a 2Dimage slice have spatio-temporal overlap and hence can-

not have associations among one another. The lðwÞ newobservations can therefore be partitioned into s subsetswhere no two observations within a subset may havecome from the same target. Thus, nmþ1 þ nmþ2 þ * * *nmþs ¼ lðwÞ, where np is the number of unique observa-tions in the pth subset.

Following the definitions in Section 3.1, each of these ssubsets can be called a virtual/dummy ‘group’. Thus in theaforementioned time window, the data-association problem

can be solved using NCDA with a total of NðtÞ þ lðwÞ nodes(observations) andmþ s groups.

Time evolution of the online NCDA and the set ofunlabeled edges are shown in Fig. 3. Edges are con-structed between each node in a dummy group and thenodes in the other mþ s( 1 groups. Now the target is toassign labels (0/1) on each of these unlabeled edges main-taining optimality and network consistency. As the data-

association between all the past observations (N ðtÞ in mgroups) is already solved, all the labeled edges can betreated as a set of additional constraints in NCDA. Asthese additional constraints use the already assignedlabels for edges present till time t, the resulting IP onlysolves for the edge labels each of which involves (at least)

one node from the new lðwÞ nodes. For re-identificationproblems, moving new observations into dummy groupsalso enables us to associate observations of the same tar-get re-appearing in the same camera FoV.

Objective function. The objective function is similar tothat of generalized NCDA (Eqn. (11)), though it isdefined only on the set of unlabeled edges for the onlineimplementation, i.e.,

Xmþs

p;q¼mþ1p<q

Xnp;nq

i;j¼1

ðcp;qi;j ( kÞxp;qi;j þ

Xmþs;m

p¼mþ1;q¼1

Xnp;nq

i;j¼1

ðcp;qi;j ( kÞxp;qi;j : (13)

The first part of the objective function is defined over allunlabeled edges between nodes of the dummy groups(mþ 1; . . .mþ s), whereas, the second part constitutes ofunlabeled edges between the nodes in the dummy groups


and the past observed nodes (in groups 1; . . .m). Let, the setcontaining all unlabeled edges at any iteration be repre-sented as Eu.

Pairwise association constraints. The set of pairwiseassociation constraints between pairs of groups ofobservations is defined as in Eqn. (10), except the factthat at least one of the groups must be a dummy group.Mathematically,

Xnq

j¼1

xp;qi;j + 1 8i ¼ ½1; . . . ; np&; 8ðp; qÞ 2 EðwÞ;

Xnp

i¼1

xp;qi;j + 1 8j ¼ ½1; . . . ; nq&; 8ðp; qÞ 2 EðwÞ;

(14)

where the pairs of groups (EðwÞ) are given as

EðwÞ ¼ fðp; qÞ : p; q 2 ½1; . . .mþ s&; p < qgnfðp; qÞ : p + m; q + mg:

(15)

Loop constraints. The loop constraints remain the same asin Eqn. (5). However, we only need a much smaller subsetin the online NCDA implementation as each of theseinequality constraints must involve at least one unlabelededge, i.e., at least one edge from the set fðPp

i ;PqjÞ [

eðzÞðPpi ;P

qjÞg must belong to the set of unlabeled edges Eu.

Thus, mathematically, fðPpi ;P

qjÞ [ eðzÞðPp

i ; PqjÞg \ Eu 6¼ ;

and, the loop constraints are, therefore,

xp;qi;j '

X

ðPrk;Ps

lÞ2eðzÞðPp

i ;Pqj Þ

xr;sk;l

0

B@

1

CA( jeðzÞðPpi ;P

qjÞjþ 1

and;$"

Ppi ;P

qj

#[ eðzÞ

"Pp

i ;Pqj

#%\ Eu 6¼ ;;

8 i ¼ ½1; . . . ; np&; j ¼ ½1; . . . ; nq&;8 p; q ¼ ½1; . . .mþ s&; and p < q;


qjÞ 2 EðPp

i ;PqjÞ:

(16)

Associations between past observations. All observations(N ðtÞ) upto time t are associated with one another and theseset of already estimated labels are imposed in the onlineNCDA as linear constraints, i.e.,

xp;qi;j ¼ ðtÞxp;q

i;j ; 8i ¼ ½1; . . .np&; 8j ¼ ½1; . . .nq&;p; q ¼ ½1; . . .m&; p < q:

(17)

So, by combining Eqns. (13), (14), (15), (16) and (17),the online NCDA problem for the time window ½t; tþ w&is given as follows:

argmaxxp;qi;j

i¼½1;...;np &; j¼½1;...;nq &ðp;qÞ2EðwÞ

Xmþs

p;q¼mþ1p< q

Xnp;nq

i;j¼1

ðcp;qi;j ( kÞxp;qi;j

0

B@

þXmþs;m

p¼mþ1;q¼1

Xnp;nq

i;j¼1

ðcp;qi;j ( kÞxp;qi;j

1

CA

subject toXnq

j¼1

xp;qi;j + 1 8i ¼ ½1; . . . ; np&; 8ðp; qÞ 2 EðwÞ;

Xnp

i¼1

xp;qi;j + 1 8j ¼ ½1; . . . ; nq&; 8ðp; qÞ 2 EðwÞ;

xp;qi;j '

X

ðPrk;Ps

lÞ2eðzÞðPp

i ;Pqj Þ

xr;sk;l

0

B@

1

CA( jeðzÞðPpi ;P

qjÞjþ 1

and; fðPpi ;P

qjÞ [ eðzÞðPp

i ;PqjÞg \ Eu 6¼ ;;

8 i ¼ ½1; . . . ; np&; j ¼ ½1; . . . ; nq&;8 p; q ¼ ½1; . . .mþ s&; and p < q;


qjÞ 2 EðPp

i ;PqjÞ

xp;qi;j ¼ ðtÞxp;q

i;j ; 8i ¼ ½1; . . .np&; 8j ¼ ½1; . . .nq&;p; q ¼ ½1; . . .m&; p < q; and;

xp;qi;j 2 f0; 1g 8i ¼ ½1; . . .np&; j ¼ ½1; . . .nq&; ðp; qÞ 2 EðwÞ:

(18)

Once the association labels are obtained by solvingEqn. (18), the dummy groups are dissolved and the newobservations, labeled according to the association results,are put back to the original groups they belong to. If, accord-ing to the newly estimated labels, multiple observationscome from same target, they are clubbed together into onenode using any suitable fusion strategy. The associationresults are further used to re-estimate edge labels after thisobservation re-assignment step. This set of labels, denoted

Fig. 3. A schematic showing time evolution of the online NCDA imple-mentation. (a) The green target appeared in both group 1 (G1) andgroup 2 (G2) until time t and the black target only appeared in group 3(G3). Data association problem is solved until time t using NCDA andthe labels are shown. (b) Two new observations (from the blue and thegreen target) are observed simultaneously at time point tþ 1 in groups1 and 3 respectively. The past observations are blurred out. (c) At timetþ 2, no observation in G1 or G3, but an observation (from the blacktarget) is obtained in G2. All past observations are blurred includingthe ones from time tþ 1. (d) NCDA is run to associate the three newobservations obtained in the time window ½t; tþ 2& to the ones obtaineduntil t. Two dummy groups (DG1 and DG2) are created for the newobservations—DG1 holds the observations from the blue and thegreen target as they have time overlap and DG2 holds the observationfrom the black target as it has no space/time overlap with the othertwo and hence can be legally associated with either of them. Nodesfrom the dummy groups are connected via edges (dashed lines) toone another and to the nodes corresponding to all past observations.Online NCDA (as in Eqn. (18)) is run with past associations (solidlines) as constraints and the new labels are estimated (only label 1sare shown).


as ðtþwÞxp;qi;j , is used in the next iteration of the online method

once the new set of observations are available. Please notethat information on timestamps of the ‘past’ observations isnot remembered once the associations are done and is notused anywhere in the online NCDA in the subsequent timesteps. Also, as observable from Eqn. (18), the number ofunknown labels being optimized as well as the total numberof constraints used in online NCDA at each time window issubstantially less than that in batch NCDA based imple-mentations, thereby reducing both the time complexity andthe memory requirements.

6 EXPERIMENTS AND RESULTS

In this section, we evaluate the NCDA method on two dif-ferent computer vision application areas, viz., 1. person re-identification and 2. spatio-temporal cell tracking. Analysisof the results in each application area is provided in therespective subsections.

6.1 Person Re-IdentificationDatasets and performance measures. To validate our approach,we performed experiments on two benchmark datasets—WARD [6] and RAiD [2]. Though state-of-the-art methodsfor person re-identification e.g., [4], [40], [41] evaluate theirperformances using other datasets too (e.g., ETHZ, CAV-IAR4REID, CUHK) these do not fit our purposes since theseare either two camera datasets or several sequences of dif-ferent two camera datasets. Results are shown in terms ofrecognition rate as cumulative matching characteristic(CMC) curves and normalized area under curve (nAUC)values (provided in the supplementary), as is the common

practice in the literature. The CMC curve is a plot of the rec-ognition percentage versus the ranking score and representsthe expectation of finding the correct match inside top tmatches. nAUC gives an overall score of how well a re-iden-tification method performs irrespective of the dataset size.In the case where every person is not present in all cameras,we show the accuracy as total number of true positives(true matches) and true negatives (true non matches)divided by the total number of unique people present. Allthe results used for comparison were either taken from thecorresponding works, by running publicly available codes,or obtained from the authors.

Pairwise similarity score generation. The camera pairwisesimilarity score generation starts with extracting appear-ance features in the form of HSV color histogram from theimages of the targets. The low level features are extracted asproposed in [4]. Given the extracted features, we generatethe similarity scores by learning the way features get trans-formed as proposed in [42]. To keep the procedure simple,we used only HSV appearance features unlike [42] whereseveral other appearance and texture features are studied.Due to the use of HSV features only, we differentiate thismethod of similarity score generation from [42] by callingour similarity score generation method as feature transform(FT). In addition to the feature transformation basedmethod, similarity scores are also generated using the pub-licly available code of a recent work—ICT [3] where pair-wise re-identification was posed as a classification problemin the feature space formed of concatenated features of per-sons viewed in two different cameras. More details aboutthe similarity score generation is provided in [42] and inthe supplementary materials.

6.1.1 WARD Dataset

The WARD dataset [6] has 4,786 images of 70 different peo-ple acquired in a real surveillance scenario in three non-overlapping cameras. This dataset has a huge illuminationvariation apart from resolution and pose changes. The cam-eras here are denoted as camera 1; 2 and 3. Figs. 4a, 4b and4c compare the performance for camera pairs 1( 2; 1( 3;and 2( 3 respectively. The 70 people in this dataset areequally divided into training and test sets of 35 personseach. The proposed approach is compared with the meth-ods SDALF [4], ICT [3] and WACN [6]. The legends ‘NCDAon FT’ and ‘NCDA on ICT’ imply that the NCDA algorithmis applied on similarity scores generated by learning the fea-ture transformation and by ICT respectively. For all 3 cam-era pairs the proposed method outperforms the rest withrank 1 recognition percentage as high as 61.71 percent forthe camera pair 2-3. The next runner up is the methodapplying only FT which has the recognition percentage of50.29 percent for rank 1.

To show how NCDA yields consistent re-identificationwhere pairwise method fails, two example cases are pro-vided in Fig. 5. At first, re-identification is performed onthree camera pairs independently on the WARD data by FTmethod. In the first example, though the camera pairs 1-2and 2-3 gave correct association (red dashed lines) for boththe targets, the incorrect associations between camera pair1-3 (red dashed line) make the re-identification across the 3cameras inconsistent. Similarly, in the second example,

Fig. 4. CMC curves for the WARD dataset. Results and comparisonsin (a), (b) and (c) are shown for the camera pairs 1-2, 1-3, and 2-3respectively.


incorrect associations between targets across camera pair1-2 make the overall re-identification results inconsistent.However, in both the cases, NCDA exploits the consistencyrequirement and makes the resultant re-identification acrossthree cameras correct (shown using green arrows).

6.1.2 RAiD Dataset

This dataset [2] was collected using two indoor (camera 1and 2) and two outdoor (camera 3 and 4) cameras. It haslarge illumination variation that is not present in most ofthe publicly available benchmark datasets. Forty-one sub-jects were asked to walk through these four cameras and6,920 images of 41 persons are present in it.

The proposed approach is compared with the samemethods as for the WARD dataset. 21 persons were used fortraining while the rest 20 were used in testing. Figs. 6a, 6b,6c, 6d, 6e, 6f compare the performance for camera pairs 1-2,1-3, 1-4, 2-3, 2-4 and 3-4 respectively. We see that the pro-posed method performs better than all the rest for both thecases when there is not much appearance variation (forcamera pair 1-2 where both cameras are indoor and for cam-era pair 3-4 where both cameras are outdoor) and whenthere is significant lighting variation (for the rest four cam-era pairs). Expectedly, for camera pairs 1-2 and 3-4 the per-formance of the proposed method is the best. For the indoorcamera pair 1-2 the proposed method applied on similarityscores generated by feature transformation (NCDA on FT)and on the similarity scores by ICT (NCDA on ICT) achieve86 and 89 percent rank 1 performance respectively. For theoutdoor camera pair 3-4 the same two methods achieve 79and 68 percent rank 1 performance respectively. For the restof the cases where there is significant illumination variationthe proposed method is superior to all.

In all the camera pairs, the top two performances comefrom the NCDA method applied on two different camerapairwise similarity scores generating methods. It can fur-ther be seen that for camera pairs with large illuminationvariation (i.e., 1-3, 1-4, 2-3 and 2-4) the performanceimprovement is significantly large. For camera pair 1-3 therank 1 performance shoots up to 67 and 60 percent onapplication of NCDA algorithm to FT and ICT comparedto their original rank 1 performance of 26 and 28 percentrespectively. Clearly, imposing consistency improves theoverall performance with the best absolute accuracyachieved for camera pairs consisiting of only indoor oronly outdoor cameras. On the other hand, the relative

improvement is significantly large in case of large illumina-tion variation between the two cameras.

6.1.3 Re-Identification with Variable Number of Persons

Next we evaluate the performance of the proposed methodfor the generalized setting when all the people may not bepresent in all cameras. For this purpose, from the RAiDdataset we chose two cameras (namely camera 3 and 4) andremoved 8 (40 percent out of the test set containing 20 peo-ple) randomly chosen people keeping all the persons intactin camera 1 and 2. For this experiment the accuracy of theproposed method is shown with similarity scores asobtained by learning the feature transformation between

the camera pairs. The accuracy is calculated by taking both

true positive and true negative matches into account and it

is expressed as ð#true positiveþ#true negativeÞ#of unique people in the testset .

Since the existing methods do not report re-identifica-tion results on variable number of persons nor is the code

Fig. 6. CMC curves for RAiD dataset. In (a), (b), (c), (d), (e), (f) compari-sons are shown for the camera pairs 1-2 (both indoor), 1-3 (indoor-outdoor), 1-4 (indoor-outdoor), 2-3 (indoor-outdoor), 2-4 (indoor-outdoor)and 3-4 (both outdoor) respectively.

Fig. 5. Two examples of correction of inconsistent re-identification fromWARD dataset. The red dashed lines denote re-identifications per-formed on three camera pairs independently by FT. The green solid linesshow the re-identification results on application of NCDA on FT. TheNCDA algorithm exploits the consistency requirement and makes theresultant re-identification across three cameras correct.


available which we can modify easily to incorporate such ascenario, we can not provide a comparison of performancehere. However we show the performance of the proposedmethod for different values of k. The value of k is learntusing two random partitions of the training data in thesame scenario (i.e., removing 40 percent of the people fromcamera 3 and 4). The average accuracy over these two ran-dom partitions for varying k for all the six cameras areshown in Fig. 7a. As shown, the accuracy remains more orless constant till k ¼ 0:25. After that, the accuracy for cam-era pairs having the same people falls rapidly, but for therest of the cameras where the number of people are vari-able remains significantly constant. This is due to the factthat the reward for ‘no match’ increases with the value ofk and for camera pair 1-2 and 3-4 there is no ‘no match’case. So, for these two camera pairs, the optimization prob-lem (in Eqn. (12)) reaches the global maxima at the cost ofassigning 0 label to some of the true associations (for whichthe similarity scores are on the lower side). So any value ofk in the range ð0( 0:25Þ will be a reasonable choice. Theaccuracy of all the 6 pairs of cameras for k ¼ 0:1 and 0:2 isshown in Fig. 7b, where it can be seen that the perfor-mance is significantly high and does not vary much withdifferent values of k.

6.1.4 Online Re-Identification

In this section, we present results on application of theonline NCDA in an online person re-identification problem.We use the RAiD dataset again for this purpose. We ran-domly choose 20 subjects for training while the remaining21 form the test set. The results are averaged over five suchrandom partitions.

In a classical person re-identification problem setup, allobservations are assumed available at runtime and the asso-ciation problem is solved in batch. This makes the timeinformation for the observations redundant. However, theRAiD dataset contains timestamps associated with observa-tions and we utilize the timestamps to generate the tempo-ral data stream in the online experimental setup. Weassume that the input to the NCDA method is a set of track-lets (a temporally consecutive series of observations/imagesfrom the same target obtained from within the same cameraFoV), which were made available to the NCDA at theirrespective times of appearance. Moreover, as explained inSection 5.2, the tracklets which are temporally overlapping(even at different camera FoVs) may not be associated withone another. Hence, during runtime of the online NCDA,tracklets having temporal overlaps were clustered into the

same dummy group, with the pairwise similarity scorescomputed as described earlier in Section 6.1.

In the test set, there were a total of 84 tracklets for 21 tar-gets. Based on time overlap, the 84 tracklets were clusteredinto 35 groups on an average. These groups of observationswere time ordered and at each iteration of the onlineNCDA, one such group is introduced as input, Each groupmay contain more than one tracklet. At each iteration, the

new tracklets are associated with the previously observed

ones and labeled accordingly. The association accuracy at

each iteration is estimated as ð#true positiveþ# true negativeÞ#of unique people in the testset . The

variation of estimated accuracy with increasing number of

observations is plotted in Fig. 8. It can be observed that theassociation accuracy is more than 65 percent and remainsconstant on average over time. Therefore, as new data isbeing observed, the data association results do not deterio-rate on account of the larger and larger datasets—thus indi-cating robustness of the proposed online NCDAmethod.

6.2 Spatio-Temporal Cell Tracking—A Multi-ViewFeature Tracking Problem

Dataset. For the experiments performed in the presentstudy, the 3D structure of the tissues are imaged using sin-gle-photon confocal laser scanning microscope and we havespecially dealt with the ‘Shoot Apical Meristem’ (SAM) ofthe plants that showcase all the challenges associated withany spatio-temporal cell tracking problem in a tightlypacked multilayer, multicellular tissue. By changing thedepth of the focal plane, CLSM can provide in-focus imagesfrom various depths of the specimen. To make the cells visi-ble under laser, fluorescent dyes are used. The set of images,thus obtained at each time point, constitute a 3D stack, alsoknown as the ‘Z-stack’. Each Z-stack is imaged at a timeinterval of 3 hours and it is comprised of a series of opticalcross sections of SAMs that are separated by 1.5 mm. Thus,in this 4D image stack, every cell can have 2D projections onvarious ‘z-planes’ and the same cell can be imaged at multi-ple time points. The problem of cell tracking is to associatethese spatio-temporal projections of the individual cells inthe tissue along with detection of cell division events.

Pairwise similarity score generation. Each 2D image slice inthe 4D confocal image stack is segmented into individualcell slices using an adaptive Watershed segmentationmethod [43] and are temporally registered using a land-mark-based registration scheme [44]. The similarity scores

Fig. 7. Performance of the NCDA algorithm after removing 40 percent ofthe people from both camera 3 and 4 in the RAiD dataset. In (a) re-identi-fication accuracy on the training data is shown for camera pairs by vary-ing k after removing 40 percent of the training persons. (b) shows there-identification accuracy on the test data for the chosen values ofk ¼ 0:1 and 0:2 when 40 percent of the test people were not present.

Fig. 8. Application of online NCDA for the online re-id problem. Meanassociation accuracies with standard deviation are plotted for increasingnumber of observed tracklets in the online setup. The online NCDAmaintains a high accuracy (, 65 percent) even as the number ofobserved tracklets increases.


between 2D cell slices in spatio-temporally neighboringimages are obtained using the method described in [34].First, cell division events are detected between every pair oftemporally neighboring images by utilizing the fact thatcombined shape of children cells is typically similar to thatof the parent. Then, for every spatially/temporally neigh-boring pairs of images a spatial graph is built on one of theimages, where the 2D cells are nodes and any two neighbor-ing cells share a link. Note that this graph does not containthe parent/children cells detected in the previous step. Foreach cell in first image, a set of candidate cells (one probablematch in this set) is obtained from the second image. Now,a conditional random field is defined on this graph, and thenode and edge potentials are computed based on shape sim-ilarity between cells and their candidates and by utilizingtight spatial topology of the tissue respectively. A loopybelief propagation is run on this CRF and the marginal pos-teriors, thus obtained, are treated as similarity scoresbetween the cells and their candidates. This method isrepeated for all spatially/temporally consecutive pairs ofimage slices. More details on this can be found in [34] andalso in the supplementary materials.

Establishing network consistency. Now, the objective is toobtain consistent associations between the 2D cell slices inthe entire spatio-temporal image stack using the similarityscores generated via previous method. Each 2D image slice(containing a cluster of tightly packed cell slices) is treatedas a ‘group’ and individual 2D cells on these slices are thenodes, as before. Also, for any given image slice, similarityscores are computed only to its immediate spatio-temporalneighboring slices (i.e., slice above, slice below, slice at same‘z’ at previous time point and the same at next time point).This architecture yields a network of image slices (groups)that can be exhaustively covered using quartets of groups.Fig. 9a shows one such quartet in a large network. Pleasenote that, unlike the person re-identification problem, theloop constraints for the cell tracking problem cannot be

expressed as triplets, as similarity scores are not generatedbetween temporally neighboring image slices that lie on dif-ferent ‘z-planes’. Using the marginal posteriors as similarityscores between a cell slice to its spatial/temporal candidates,we run NCDA for generating complete optimal 4D spatio-temporal correspondences between 2D cell slices.

Analysis of results. The effect of NCDA towards improve-ment of spatio-temporal tracking results is shown in Fig. 9.In Fig. 9a, a sample 2X2 block of images of Arabidopsis SAMare shown, which contains two spatially neighboring imageslices at each of two consecutive time points of observation.Pairs of image slices are chosen and CRFs are formed foreach of the pairs (I11 ( I12; I12 ( I22; I21 ( I22 and I11 ( I21).Now,marginal posteriors are estimated using LBP andMAPinferences are drawn to generate pairwise correspondences.When these pairwise associations are combined together,spatio-temporally infeasible associations are observed for anumber of cells. For example, correct associations are foundbetween cell 15 in I11 and cell 20 in I12, cell 20 in I12 and cell25 in I22, cell 25 in I22 and cell 18 in I21. Therefore, for spatio-temporal feasibility, cell 15 in I11 and cell 18 in I21 must alsobe associated. However, according to the aforementionedMAP inference, no associations for cell cell 15 from I11 isfound in I21. Similar infeasibilities are observed for cells 3and 44 in I11. The network consistent data association tech-nique, when applied on the previously computed marginalposteriors for pairs of images, corrects these infeasibilitiesand establishes the associations. Fig. 9b shows similar resultson a 2X3 confocal image stack, where the false negative asso-ciations (broken arrows) are corrected using the generalizedNCDA for both spatial/temporal tracking.

Although the number of network inconsistencies mayseema small (3 in the 2X2 stack and 4 in 2X3 stack) percentageof the total number of cells, it is of utmost importance thateach such error is rectified. A few inconsistencies per slicemay add up to a large number of errors in a typical confocalstack consisting of thousands of 2D cell slices. Moreover, a

Fig. 9. Effect of NCDA towards improvement of spatio-temporal tracking results. (a) The figure shows a spatio-temporal 2X2 block of confocalimages. Pairwise assignments between cells in spatial or temporal pairs of images are obtained by performing MAP inference on graphs formed onevery image slice. Infeasible 4D assignments are observed when these pairwise associations are combined over the stack. The solid arrows repre-sent correct associations between cells and the broken arrows depict no association which is incorrect and cause the infeasibility. Our proposed dataassociation approach establishes consistency in association and corrects these errors. (b) Similar results are observed in a 2X3 confocal stack.


tracking error not only affects the corresponding cell lineage,but may also affect the tracking accuracies for a number of itsneighbors in the tightly packedmultilayer tissue.

6.2.1 Online Spatio-Temporal Cell Tracking

Typically in CLSM based live cell imaging, the observationsare obtained every 3-6 hours. As explained in Section 6.2, eachof these observations is a 3D stack of confocal images contain-ing 2D projections of cells. Instead of waiting for the entire 4Dstack to be collected, especially when it often takes 3-4 days,the online NCDA can generate correspondence results when-ever a new 3D observation is obtained. In the cell trackingproblem, the graph is built only between temporally/spatiallyneighboring pairs of images. Hence, when a new 3D stack ofimages is available at the observational timepoint t, the onlineNCDA is used to find correspondences between spatiallyneighboring 2D cell slices in the 3D stack at t and betweenthese cells and that in the stack at t( 1. Note that the spatialcorrespondences between cells at time t( 1 are already estab-lished in previous iteration of the onlineNCDA.

In a 4D confocal image stack, total of 1,170 2D unique cel-lular projections are observed across 4 time points and 3 z-sli-ces. With each time point, more cells are observed and thecell tracking accuracies with number of observations, asobtained by using the online NCDA, are plotted in Fig. 10.As in the case for re-identification, the accuracies are esti-mated as a function of both true positives and true negatives.As observed, the accuracy increases and stabilizes as moreobservations become available, thereby establishing robust-ness of the online NCDA in case of biological cell trackingproblem, too.

To make a quantitative comparison of the proposedNCDA over the state-of-the-art, we compared against thelocal graph based cell tracking method [32] on the sameimage stack. The output of this method are associationresults between pairs of 2D image slices and associationaccuracies are plotted in the same figure (Fig. 10). It can beobserved that NCDA outperforms [32] substantially overtime as the latter does not enforce additional consistencyconstraints over the pairwise association results.

7 CONCLUSION

When sets of data-points are observed at multiple spatio-temporal locations, pairwise data-association may often

lead to infeasible scenarios over the global space-timehorizon. To overcome this, we have proposed a general-ized data-association method as a binary integer programon a graph. This proposed NCDA method not only main-tains consistency across the global network of data-pointsets, but also improves the pairwise data-association accu-racy, even when the number of data-points varies acrossdifferent sets of instances in the network. We have alsoproposed a novel mathematical framework for establish-ing network consistency in online data association prob-lems. The online NCDA builds on similar ideas used forthe generalized batch method, but solves a much smalleroptimization problem at each iteration. Two applicationsof the batch and online version of proposed NCDA areshown in person re-identification and (3D+t) cell tracking.Analysis of the results indicates robustness of the methodas well as significant improvements in accuracy over thestate-of-the-arts.

ACKNOWLEDGMENTS

This work was partially supported by NSF grants IIS-1316934 and CNS-1544969. The work was done when theauthors A. Chakraborty and A. Das were at the Universityof California, Riverside, CA 92521.

REFERENCES

[1] A. Schrijver, Theory of Linear and Integer Programming. New York,NY, USA: Wiley, 1998.

[2] A. Das, A. Chakraborty, and A. Roy-Chowdhury, “Consistent re-identification in a camera network,” in Proc. Eur. Conf. Comput.Vision, 2014, pp. 330–345.

[3] T. Avraham, I. Gurvich, M. Lindenbaum, and S. Markovitch,“Learning implicit transfer for person re-identification,” in Proc.Eur. Conf. Comput. Vision, Workshops Demonstrations, 2012,pp. 381–390.

[4] L. Bazzani, M. Cristani, and V. Murino, “Symmetry-driven accu-mulation of local features for human characterization and re-iden-tification,” Comput. Vision Image Understanding, vol. 117, no. 2,pp. 130–144, Nov. 2013.

[5] C. Liu, S. Gong, C. C. Loy, and X. Lin, “Person re-identification:What features are important?” in Proc. Eur. Conf. Comput. Vision,Workshops Demonstrations, 2012, pp. 391–401.

[6] N. Martinel and C. Micheloni, “Re-identify people in wide areacamera network,” in Proc. Comput. Vision Pattern Recog. Workshops,Providence, RI, USA, Jun. 2012, pp. 31–36.

[7] A. Bellet, A. Habrard, and M. Sebban, “A survey on metric learn-ing for feature vectors and structured data,” ArXiv e-prints, 2013.

[8] A. Alavi, Y. Yang, M. Harandi, and C. Sanderson, “Multi-shot per-son re-identification via relational Stein divergence,” in Proc. Int.Conf. Image Process., 2013, pp. 3542–3546.

[9] L. Yang and R. Jin, “Distance metric learning : A comprehen-sive survey,” Michigan State University, East Lansing, MI,USA, 2006. www.cs.cmu.edu/~liuy/frame_survey_v2.pdf

[10] O. Javed, K. Shafique, Z. Rasheed, and M. Shah, “Modeling inter-camera spacetime and appearance relationships for trackingacross non-overlapping views,” Comput. Vision Image Understand-ing, vol. 109, no. 2, pp. 146–162, Feb. 2008.

[11] F. Porikli and M. Hill, “Inter-camera color calibration using cross-correlation model function,” in Proc. Int. Conf. Image Process., 2003,pp. 133–136.

[12] I. Kviatkovsky, A. Adam, and E. Rivlin, “Color invariants for per-son re-identification,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 35, no. 7, pp. 1622–1634, Jul. 2013.

[13] R. Zhao, W. Ouyang, and X. Wang, “Unsupervised salience learn-ing for person re-identification,” in Proc. IEEE Int. Comput. VisionPattern Recog., 2013, pp. 3586–3593.

[14] M. Dikmen, E. Akbas, T. S. Huang, and N. Ahuja, “Pedestrian rec-ognition with a learned metric,” in Proc. Asian Conf. Comput.Vision, 2010, pp. 501–512.

Fig. 10. Application of online NCDA in Online spatio-temporal cell track-ing problem and comparison with Local graph based cell tracker [32].Cell tracking accuracies with increasing amount of observations(expressed as percentage of total number of observations) are plottedfor both [32] and the proposed online NCDA. The online NCDA outper-forms [32] and the margin increases with more observations.


[15] W. Li, R. Zhao, and X. Wang, “Human reidentification with trans-ferred metric learning,” in Proc. Asian Conf. Comput. Vision, 2012,pp. 31–44.

[16] S. Pedagadi, J. Orwell, and S. Velastin, “Local Fisher discriminantanalysis for pedestrian re-identification,” in Proc. Comput. VisionPattern Recog., 2013, pp. 3318–3325.

[17] A. Gilbert and R. Bowden, “Tracking objects across cameras byincrementally learning inter-camera colour calibration and patternsof activity,” in Proc. Eur. Conf. Comput. Vision, 2006, pp. 125–136.

[18] B. Prosser, S. Gong, and T. Xiang, “Multi-camera matching usingbi-directional cumulative brightness transfer functions,” in Proc.Brit. Mach. Vision Conf., Sep. 2008, pp. 64.1–64.10.

[19] O. Dzyubachyk, W. Van Cappellen, J. Essers, W. Niessen, andE. Meijering, “Advanced level-set-based cell tracking in time-lapse fluorescence microscopy,” IEEE Trans. Med. Imaging, vol. 29,no. 3, pp. 852–867, Mar. 2010.

[20] K. Li and T. Kanade, “Cell population tracking and lineage con-struction using multiple-model dynamics filters and spatiotempo-ral optimization,” in Proc. 2nd Int. Workshop Microscopic ImageAnal. Appl. Biol., 2007.

[21] K. Li, M. Chen, T. Kanade, E. Miller, L. Weiss, and P. Campbell,“Cell population tracking and lineage constructionwith spatiotem-poral context,”Med. Image Anal., vol. 12, no. 5, pp. 546–566, 2008.

[22] D. R. Padfield, J. Rittscher, N. Thomas, and B. Roysam, “Spatio-temporal cell cycle phase analysis using level sets and fast march-ing methods.”Med. Image Anal., vol. 13, no. 1, pp. 143–155, 2009.

[23] A. Dufour, V. Shinin, S. Tajbakhsh, N. Guillen-Aghion, J. C.Olivo-Marin, and C. Zimmer, “Segmenting and tracking fluores-cent cells in dynamic 3-D microscopy with coupled activesurfaces,” IEEE Trans. Image Process., vol. 14, no. 9, pp. 1396–1410,Sep. 2005.

[24] H. Chui and A. Rangarajan, “A new algorithm for non-rigid pointmatching,” in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2000,pp. 44–51.

[25] V. Gor, M. Elowitz, T. Bacarian, and E. Mjolsness, “Tracking cellsignals in fluorescent images,” in Proc. IEEE Comput. Soc. Conf.Comput. Vision Pattern Recog. Workshops, 2005, p. 142.

[26] N. N. Kachouie, P. Fieguth, J. Ramunas, and E. Jervis,“Probabilistic model-based cell tracking,” Int. J. Biomed. Imaging,vol. 2006, p. 12186, 2006.

[27] T. Kirubarajan, Y. Bar-Shalom, andK. R. Pattipati, “Multiassignmentfor tracking a large number of overlapping objects,” IEEE Trans.Aerosp. Electron. Syst., vol. 37, no. 1, pp. 2–21, Jan. 2001.

[28] R. Bise, Z. Yin, and T. Kanade, “Reliable cell tracking by globaldata association,” in Proc. Int. Symp. Biomed. Imaging, From Nano toMacro, 2011, pp. 1004–1010.

[29] L. Liang, H. Shen, P. Rompolas, V. Greco, P. D. Camilli, and J. S.Duncan, “A multiple hypothesis based method for particle track-ing and its extension for cell segmentation,” Inf. Process. Med.Imaging, vol. 23, pp. 98–109, 2013.

[30] D. Delibaltov, S. Karthikeyan, V. Jagadeesh, and B. S. Manjunath,“Robust biological image sequence analysis using graph basedapproaches,” in Proc. Conf. Rec. 46th Asilomar Conf. Signals, Syst.Comput., 2012, pp. 1588–1592.

[31] S. Karthikeyan, D. Delibaltov, U. Gaur, M. Jiang, D. Williams, andB. Manjunath, “Unified probabilistic framework for simultaneousdetection and tracking of multiple objects with application to bio-image sequences,” in Proc. 19th IEEE Int. Conf. Image Process.,2012, pp. 1349–1352.

[32] M. Liu, R. K. Yadav, A. Roy-Chowdhury, and G. V. Reddy,“Automated tracking of stem cell lineages of arabidopsis shootapex using local graph matching,” Plant J., vol. 62, pp. 135–147,2010.

[33] M. Liu, A. Chakraborty, D. Singh, R. K. Yadav, G. Meenakshisun-daram, G. V. Reddy, and A. K. Roy-Chowdhury, “Adaptive cellsegmentation and tracking for volumetric confocal microscopyimages of a developing plant meristem,” Molecular Plant, vol. 4,no. 5, pp. 922–931, 2011.

[34] A. Chakraborty and A. K. Roy-Chowdhury, “A conditional ran-dom field model for tracking in densely packed cell structures,”in Proc. IEEE nt. Conf. Image Process., 2014, pp. 451–455.

[35] K. Shafique and M. Shah, “A noniterative greedy algorithm formultiframe point correspondence,” IEEE Trans. Pattern Anal.Mach. Intell., vol. 27, no. 1, pp. 51–65, Jan. 2005.

[36] J. Berclaz, F. Fleuret, E. Turetken, and P. Fua, “Multiple objecttracking using k-shortest paths optimization,” IEEE Trans. PatternAnal. Mach. Intell., vol. 33, no. 9, pp. 1806–1819, Sep. 2011.

[37] H. Ben Shitrit, J. Berclaz, F. Fleuret, and P. Fua, “Tracking multiplepeople under global appearance constraints,” in Proc. Int. Conf.Comput. Vision, 2011, pp. 137–144.

[38] J. F. Henriques, R. Caseiro, and J. Batista, “Globally optimal solu-tion to multi-object tracking with merged measurements,” in Proc.IEEE Int. Conf. Comput. Vision, 2011, pp. 2470–2477.

[39] S. Avidan, Y. Moses, and Y. Moses, “Centralized and distributedmulti-view correspondence,” Int. J. Comput. Vision, vol. 71, no. 1,pp. 49–69, 2007.

[40] D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino,“Custom pictorial structures for re-identification,” in Proc. Brit.Mach. Vision Conf., 2011, pp. 68.1–68.11.

[41] W. Li and X. Wang, “Locally aligned feature transforms acrossviews,” in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2013,pp. 3594–3601.

[42] N. Martinel, A. Das, C. Micheloni, and A. K. Roy-Chowdhury,“Re-identification in the function space of feature warps,” IEEETrans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1656–1669,Aug. 2015.

[43] K.Mkrtchyan,D. Singh,M. Liu,G. V. Reddy,A. K. Roy-Chowdhury,andM.Gopi, “Efficient cell segmentation and tracking of developingplant meristem,” in Proc. 18th IEEE Int. Conf. Image Process., 2011,pp. 2165–2168.

[44] K. Mkrtchyan, A. Chakraborty, and A. K. Roy-Chowdhury,“Automated registration of live imaging stacks of arabidopsis,” inProc. Int. Symp. Biomed. Imaging, 2013, pp. 672–675.

Anirban Chakraborty received the BE degreefrom Jadavpur University, India, in 2007 and theMS and PhD degree from the University ofCalifornia, Riverside, in 2010 and 2014, respec-tively, all in electrical engineering. He is currentlya research fellow at the Department of Diagnos-tic Radiology, National University of Singapore.His research interests include computer vision,pattern recognition, biomedical image analysis,applications of probabilistic graphical models,stochastic processes, and optimization in visionproblems. He is a member of the IEEE.

Abir Das received the BE degree from JadavpurUniversity, India, in 2007, and the MS and PhDdegrees from the University of California, River-side, in 2013 and 2015, respectively, in electricalengineering. He is currently a postdoctoralresearcher at the Department of Computer Sci-ence, University of Massachusetts, Lowell. Hismain research interests include computer vision,person reidentification, multicamera multitargettracking, and activity description using machine-learning-based methods. He is a student memberof the IEEE.

Amit K. Roy-Chowdhury received the bach-elor’s degree in electrical engineering fromJadavpur University, Calcutta, India, the master’sdegree in systems science and automation fromthe Indian Institute of Science, Bangalore, India,and the PhD degree in electrical engineeringfrom the University of Maryland, College Park. Heis currently a professor of electrical engineeringat the University of California, Riverside. Hisresearch interests include image processing andanalysis, computer vision, and video communica-

tions and statistical methods for signal analysis. His current researchprojects include intelligent camera networks, wide-area scene analysis,motion analysis in video, activity recognition and search, video-basedbiometrics (face and gait), biological video analysis, and distributedvideo compression. He is the author of the book Camera Networks: TheAcquisition and Analysis of Videos over Wide Areas. He has been onthe organizing and program committees of multiple conferences andserves on the editorial boards of a number of journals related to hisresearch area.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.


ieee transactions on pattern analysis and...

Documents