Cleaning trajectory data of RFID-monitored objects through conditioning under integrity constraints

B. Fazzinga, S. Flesca, F. Furfaro, F. Parisi
DIMES – University of Calabria

17th International Conference on Extending Database Technology (EDBT)
Athens, Greece, March 24-28, 2014

Scenario

RFID technology is widely used to track moving objects (goods in a supply chain, people inside buildings, luggage in airports, etc.)

How RFID-based tracking works: tags and readers
• Tags emit radio signals encoding identifying information;
• Readers detect the presence of tags through their antennas.

LIMITATION: even when inside the detection range of an antenna, a tag may not be detected (malfunctions, reflections, interference)

Scenario

[Figure: a tag and two readers, r1 and r2]

The tag may be detected by: {r1, r2}, {r1}, or {r2}

Interpreting the readings

For each tag, the result of the tracking task is a sequence of readings R1,…,RT
Each Ri is the set of readers that detected the tag at time point i

The collected data must be translated into sequences of positions (i.e., TRAJECTORIES)
Positions of interest can be room names, cells over a grid, etc.

Time point   Set of readers   Position
1            {r1, r2}         Corridor
2            {r1, r2}         Corridor
3            {r3}             Coffee room
4            {r3, r4}         Coffee room
5            {r1}             Corridor
6            {r1, r2}         Corridor

From sequences of readings to trajectories

Issues to deal with

[Figure: map with locations L0-L4 and readers r0, r1, r5]

• No one-to-one correspondence between readers and locations:
  readers can cover portions of different locations;
  some zones may be covered by no reader.
• False negatives: a tag may not be detected even if it is in the range of an antenna.
• An object detected by a set of readers can be in different locations.
• An undetected object can be anywhere!
• A sequence of detections can be generated by different trajectories: which is the actual one?

Table of detections

Time             1          2          3
Set of readers   {r1, r5}   {r1, r5}   {r0}

A naive probabilistic interpretation of the readings

• Consider the time points separately (independence assumption)
• Model the correspondence between locations and sets of readers as a PDF pa(l|R)
• pa(l|R) is easy to obtain from the position of the readers and their physical model

[Figure: map with locations L0-L4 and readers r0, r1, r5]

Table of detections

Time                        1                  2                  3
Set of readers              {r1, r5}           {r1, r5}           {r0}
Locations & Probabilities   L1, 50%; L4, 50%   L1, 50%; L4, 50%   L0, 100%

• Time 1: pa(L1|{r1,r5}) = 50%, pa(L4|{r1,r5}) = 50%
• Time 2: the same as the previous time point
• Time 3: the detection range of r0 is entirely inside L0, so pa(L0|{r0}) = 100%
Probabilistic trajectories

[Figure: map with locations L0-L4 and readers r0, r1, r5]

Table of detections

Time                        1                  2                  3
Set of readers              {r1, r5}           {r1, r5}           {r0}
Locations & Probabilities   L1, 50%; L4, 50%   L1, 50%; L4, 50%   L0, 100%

4 corresponding trajectories:

t1: L1–L1–L0 p=25%
t2: L1–L4–L0 p=25%
t3: L4–L1–L0 p=25%
t4: L4–L4–L0 p=25%

p(t1) = pa(L1|{r1,r5}) × pa(L1|{r1,r5}) × pa(L0|{r0}) = 50% × 50% × 100% = 25%

… but some trajectories turn out to be impossible when looking at the map!

Considering time points as independent (thus disregarding spatio-temporal correlations) yielded inadmissible interpretations:

1) Three trajectories must be discarded;
2) The probabilities of the remaining ones must be revised.

Use integrity constraints!
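The a-priori probabilities listed for t1, ..., t4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `apriori_trajectories` and the dictionary layout are assumptions made here, not part of the talk.

```python
from fractions import Fraction
from itertools import product


def apriori_trajectories(candidates):
    """Enumerate every trajectory compatible with the readings.

    candidates[i] maps each location compatible with the reading at time
    point i+1 to its a-priori probability pa(l|R). Under the independence
    assumption, the probability of a trajectory is the product of the
    per-time-point probabilities.
    """
    result = {}
    for combo in product(*(c.items() for c in candidates)):
        trajectory = tuple(loc for loc, _ in combo)
        p = Fraction(1)
        for _, q in combo:
            p *= q
        result[trajectory] = p
    return result


half = Fraction(1, 2)
readings = [
    {"L1": half, "L4": half},   # time 1, readers {r1, r5}
    {"L1": half, "L4": half},   # time 2, readers {r1, r5}
    {"L0": Fraction(1)},        # time 3, readers {r0}
]
trajectories = apriori_trajectories(readings)
for t, p in trajectories.items():
    print("-".join(t), p)
```

Exact fractions are used so that the four probabilities sum to exactly 1; nothing here looks at the map yet, which is why impossible trajectories still receive 25%.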

The trajectory cleaning problem

Start from the probabilistic trajectories resulting from interpreting the readings by considering them independently

Discard the impossible trajectories

Revise the probabilities of the possible trajectories


Integrity constraints

[Figure: map with locations L0-L4]

• DU (direct unreachability): DU(L’, L’’) means that there is no direct connection from L’ to L’’.
  Example: DU(L0, L4)
• TT (traveling time): TT(L’, L’’, T) means that T is the minimum number of time points needed to go from L’ to L’’.
  Example: TT(L0, L4, 4)
• LT (latency): LT(L, T) means that T is the minimum number of time points for which an object, once it has entered L, must stay at L.
  Example: LT(L0, 3)
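The three constraint types can be read as checks over a whole trajectory. The sketch below is one possible reading, under assumptions that the slides do not spell out: a trajectory is a list with one location per time point, TT is violated when the object is at L’ at time i and at L’’ at a later time j with j - i < T, and LT is only checked for stays that the object actually leaves within the observed window. All function names are hypothetical.

```python
def violates_du(traj, du):
    """du: set of (src, dst) pairs with no direct connection."""
    return any(a != b and (a, b) in du for a, b in zip(traj, traj[1:]))


def violates_tt(traj, tt):
    """tt: dict (src, dst) -> minimum number of time points to travel."""
    for i, src in enumerate(traj):
        for j in range(i + 1, len(traj)):
            need = tt.get((src, traj[j]))
            if need is not None and j - i < need:
                return True
    return False


def violates_lt(traj, lt):
    """lt: dict location -> minimum stay length (in time points).

    Assumption: a stay still in progress at the last time point is not
    checked, because it may continue after the observed window.
    """
    i, n = 0, len(traj)
    while i < n:
        j = i
        while j + 1 < n and traj[j + 1] == traj[i]:
            j += 1
        left_within_window = j < n - 1
        if left_within_window and traj[i] in lt and (j - i + 1) < lt[traj[i]]:
            return True
        i = j + 1
    return False


def satisfies(traj, du=frozenset(), tt=None, lt=None):
    """True if the trajectory violates none of the given constraints."""
    return not (violates_du(traj, du)
                or violates_tt(traj, tt or {})
                or violates_lt(traj, lt or {}))


# Constraints taken from the examples on the slides.
print(satisfies(["L0", "L4", "L4"], du={("L0", "L4")}))        # DU(L0, L4)
print(satisfies(["L0", "L1", "L4"], tt={("L1", "L4"): 4}))     # TT(L1, L4, 4)
print(satisfies(["L1", "L0", "L1"], lt={"L0": 3}))             # LT(L0, 3)
```

Checking whole trajectories this way is what the naive algorithm would do; the CT-graph performs the same checks incrementally while it is built.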

The trajectory cleaning problem

• Start from the probabilistic trajectories resulting from interpreting the readings by considering them independently
• Discard the impossible trajectories (use integrity constraints)
• Revise the probabilities of the possible trajectories (probabilistic conditioning)

Conditioning probabilities

Given a PDF p(X) and an event E, the conditioning problem is that of evaluating p(X|E)

In probabilistic DBs, conditioning is a way of enforcing integrity constraints over a DB where the independence assumption is used. In this case, E is the event that the constraints are satisfied.

General framework for conditioning probabilistic DBs:

C. Koch, D. Olteanu: Conditioning probabilistic databases. PVLDB 1(1). 2008.

The general conditioning/confidence computation problem is NP-hard on succinct representations

Example

[Figure: map with locations L0-L4]

Let IC be the set of DU constraints implied by the map

t1: L1–L1–L0 pa(t1)=25%
t2: L1–L4–L0 pa(t2)=25%
t3: L4–L1–L0 pa(t3)=25%
t4: L4–L4–L0 pa(t4)=25%

Three out of four trajectories (t2, t3, t4) are discarded

The a-priori probability of t1 is revised as p(t1) = pa(t1|IC) = 0.25/0.25 = 100%

Example 2

[Figure: map with locations L0-L4]

Let IC be the set of DU and TT constraints, containing TT(L1, L4, 4)

t1: L0–L1–L1 pa(t1)=50%
t2: L0–L1–L2 pa(t2)=25%
t3: L0–L1–L4 pa(t3)=25%

Trajectory t3 violates TT(L1, L4, 4)

The a-priori probabilities of t1 and t2 are revised as:
p(t1) = 0.5/(0.5+0.25) = 66.7%
p(t2) = 0.25/(0.5+0.25) = 33.3%

t1 is twice as probable as t2, like before conditioning
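The revision in Example 2 is a renormalization over the trajectories that survive the constraints. A minimal sketch follows, assuming the a-priori probabilities are already available as a dictionary; `condition` and `respects_tt` are names introduced here for illustration only.

```python
from fractions import Fraction


def condition(apriori, is_valid):
    """Drop invalid trajectories and renormalize the remaining ones."""
    valid = {t: p for t, p in apriori.items() if is_valid(t)}
    total = sum(valid.values())
    if total == 0:
        raise ValueError("no trajectory satisfies the constraints")
    return {t: p / total for t, p in valid.items()}


def respects_tt(traj, src="L1", dst="L4", min_points=4):
    """Check TT(L1, L4, 4): dst may not follow src by fewer than 4 time points."""
    for i, a in enumerate(traj):
        for j in range(i + 1, len(traj)):
            if a == src and traj[j] == dst and j - i < min_points:
                return False
    return True


apriori = {
    ("L0", "L1", "L1"): Fraction(1, 2),   # t1
    ("L0", "L1", "L2"): Fraction(1, 4),   # t2
    ("L0", "L1", "L4"): Fraction(1, 4),   # t3, violates TT(L1, L4, 4)
}
revised = condition(apriori, respects_tt)
for t, p in revised.items():
    print("-".join(t), p)
```

Because every surviving probability is divided by the same total, the ratio between t1 and t2 is unchanged by conditioning, as the slide observes.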

Naive cleaning algorithm

1) Generate all the possible trajectories compatible with the sequence of readings;
2) Discard the trajectories violating IC;
3) Compute, for each valid trajectory t, its a-priori probability;
4) Revise the probabilities of the trajectories satisfying IC.

INFEASIBLE! For instance:
• Time interval length = 10 min; reading rate = 2 s^-1;
• Trajectory duration: 2 × 60 × 10 = 1200 time points;
• Avg number of locations compatible with each reading: 2;
• NUMBER OF TRAJECTORIES: 2^1200 ≈ 1.7 ∙ 10^361
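The size estimate can be checked directly with Python's arbitrary-precision integers; the variable names below are illustrative only.

```python
# 10 minutes at 2 readings per second, 2 candidate locations per reading.
time_points = 2 * 60 * 10
candidates_per_reading = 2

number_of_trajectories = candidates_per_reading ** time_points
digits = str(number_of_trajectories)

# Print the count in scientific notation without converting to float,
# since the value is far beyond the range of a double.
print(f"{time_points} time points")
print(f"about {digits[0]}.{digits[1]} x 10^{len(digits) - 1} trajectories")
```

The exponent grows linearly with the number of time points, so even shorter monitoring windows are out of reach for explicit enumeration.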

Our approach: CT-GRAPH

Conditioned Trajectory-GRAPH

• Each node is a possible location at a time point
• Source nodes are associated with the probability of representing the starting point
• Each edge is a transition between two consecutive time points
• Edges are associated with the probability of the transition

[Figure: example CT-graph over time points τ=1,…,5, with source nodes over L1 (probability 0.6) and L2 (probability 0.4), and edge probabilities 1, 0.2 and 0.8]

One-to-one correspondence between valid trajectories and source-to-destination paths

The revised probability of a trajectory is the product of the probabilities along the corresponding source-to-destination path, e.g. 0.4 × 1 × 1 × 0.8 × 1 = 0.32

Can we obtain a CT-graph by just creating, for each time point τ, a node for each location compatible with the reading at τ?

At the same time point, different nodes may refer to the same location…
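The path rule stated for the CT-graph (source probability times the edge probabilities along the path) is easy to sketch. The function name is an assumption introduced here, and exact fractions stand in for the decimals shown on the slide.

```python
from fractions import Fraction
from math import prod


def path_probability(source_probability, edge_probabilities):
    """Revised probability of the trajectory encoded by one CT-graph path."""
    return source_probability * prod(edge_probabilities)


# The path used on the slide: 0.4 x 1 x 1 x 0.8 x 1
p = path_probability(Fraction(2, 5), [1, 1, Fraction(4, 5), 1])
print(p, float(p))
```

Since the probabilities stored in the graph are already conditioned, no further normalization is needed when reading a trajectory's probability off a path.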

Building a CT-graph: a 2-phase algorithm

Forward phase (progressively builds a graph proceeding from τ=1 to τ=T)
• For each node n at time τ, create its successors at time τ+1. A successor is a node that represents a location compatible with R_{τ+1}, and that can prolong the trajectories ending at n without violating any constraint;
• Set the probabilities according to the a-priori PDFs.

The result is a graph where nodes represent locations that are compatible with the current reading and the «past».

Some nodes may have no successor!

Backward phase
• Iteratively remove nodes having no successors and their ingoing edges;
• Revise probabilities to take into account node removals.

Time        1                    2                  3
Readings    {r1}                 {r2}               {r3}
Locations   L1, 6/10; L2, 4/10   L3, 1/3; L4, 2/3   L3, 2/3; L5, 1/3

Constraints: DU(L2,L3), DU(L4,L5); LT(L4,2); TT(L1,L5,3)

FORWARD PHASE

Each node stores its location, the stay duration δ, the list TL and its loss; source nodes also store their a-priori probability p.
• δ is defined only for nodes over locations involved in latency constraints; it represents the duration of the current stay at the location of the node.
• TL is the list of the locations visited so far, each with the time point of the last departure from it; only locations involved in TT constraints are in TL.
• The loss of a node is the sum of the probabilities of the candidate successors which have not been materialized.
• Same location, but different «history»: different nodes. Same location, same history: same node!

Graph at the end of the forward phase:

τ=1: n1 (L1, TL empty, p=6/10, loss=0); n2 (L2, TL empty, p=4/10, loss=1/3)
τ=2: n3 (L3, TL=<1,L1>, loss=1/3); n4 (L4, δ=0, TL=<1,L1>, loss=1); n5 (L4, δ=0, TL empty, loss=1)
τ=3: n6 (L3, TL=<1,L1>)

Edges: <n1,n3> 1/3; <n1,n4> 2/3; <n2,n5> 2/3; <n3,n6> 2/3

BACKWARD PHASE

For each node with loss>0:
• Revise the probabilities of outgoing edges (if any);
• Revise the probabilities of ingoing edges (according to the loss);
• Propagate the loss to the preceding nodes.

On the example:
• n5 has loss=1: it is removed together with its ingoing edge, and the loss of n2 becomes 1;
• n4 has loss=1: it is removed together with its ingoing edge, and the loss of n1 becomes 2/3;
• n3 has loss=1/3.
  New probability of the outgoing edge <n3,n6>: (2/3)/(2/3) = 1
  New probability of the ingoing edge <n1,n3>: Old × (1 – n3.loss) = 1/3 × 2/3 = 2/9
  New loss of the preceding node n1: n1.loss = 1 – 2/9 = 7/9
• n2 has loss=1: it is removed;
• n1 has loss=7/9: the probability of its outgoing edge <n1,n3> is revised to (2/9)/(1 – 7/9) = 1, and n1, the only remaining source node, gets p=1.

7 out of 8 trajectories have been discarded!

The remaining trajectory has probability 1
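The numbers of the backward phase can be reproduced with a short sketch. It is a compact reformulation, not necessarily the authors' exact procedure: instead of propagating losses step by step, it computes for each node the "kept" mass 1 - loss bottom-up and then renormalizes edges and sources. The graph below is the one obtained at the end of the forward phase; the function name `backward_phase` is an assumption.

```python
from fractions import Fraction as F


def backward_phase(levels, source_p, edges):
    """levels: node names per time point (first = sources, last = destinations).
    edges: node -> {successor: probability set by the forward phase}.
    Returns revised source probabilities, revised edges, and kept = 1 - loss."""
    kept = {n: F(1) for n in levels[-1]}
    new_edges = {}
    for level in reversed(levels[:-1]):
        for n in level:
            out = edges.get(n, {})
            mass = sum((p * kept.get(c, F(0)) for c, p in out.items()), F(0))
            kept[n] = mass
            if mass > 0:
                # Keep only successors that survive, renormalized by the kept mass.
                new_edges[n] = {c: p * kept[c] / mass
                                for c, p in out.items() if kept.get(c, F(0)) > 0}
    total = sum((source_p[s] * kept[s] for s in levels[0]), F(0))
    if total == 0:
        raise ValueError("no valid trajectory")
    new_sources = {s: source_p[s] * kept[s] / total
                   for s in levels[0] if kept[s] > 0}
    return new_sources, new_edges, kept


levels = [["n1", "n2"], ["n3", "n4", "n5"], ["n6"]]
source_p = {"n1": F(6, 10), "n2": F(4, 10)}
edges = {
    "n1": {"n3": F(1, 3), "n4": F(2, 3)},
    "n2": {"n5": F(2, 3)},
    "n3": {"n6": F(2, 3)},
}
new_sources, new_edges, kept = backward_phase(levels, source_p, edges)
print(new_sources)
print(new_edges)
print({n: 1 - k for n, k in kept.items()})   # final losses
```

On this example the sketch yields loss 7/9 for n1 and 1/3 for n3, and a single surviving path n1, n3, n6 with probability 1, matching the slide.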

Experimental analysis

2 synthetic data sets
• Syn1: 4-floor building, each floor: 300 m², 11 locations;
• Syn2: 4-floor building, the size of each floor is twice that of Syn1.

[Figure: floor map with locations L0-L10 and readers r0-r11]

Different trajectory lengths: 10 min, 60 min, 90 min, 120 min

Different detection ranges of antennas: 2.0 m, 2.5 m, 3.0 m

Number of trajectories: 50 trajectories for each length and detection range

Integrity constraints: three sets DU, LT, TT, automatically generated (vmax = 4 m/s; minimum latency at each room: 2 sec)

We ran our algorithm in three different settings:

CTG(DU): only DU constraints are exploited;

CTG(DU+LT): DU and LT constraints are exploited;

CTG(DU+LT+TT): DU, LT and TT constraints are exploited.

Cleaning times (SYN 1)

• Cleaning time is linear w.r.t. the trajectory length, independently from the constraints;

• Cleaning time gets larger as the set of constraints is enlarged (larger number of nodes over the same location at each time point)


• As the detection range of the antennas is increased, the number of missing detections decreases. This yields less uncertainty!

Cleaning times (SYN 1 vs. SYN 2)

• Cleaning time is only marginally affected by the size of the map

Effectiveness of the cleaning task

• The effectiveness was measured as the average accuracy of the answers of a workload of stay queries evaluated over the CT-Graph

• A stay query q is of the form: «Where was the object at time point t?»

• The accuracy of the answer of q evaluated over a ct-graph G is the overall probability assigned to nodes over location L at time t, where L is the actual location

Effectiveness (SYN 1)

• The effectiveness is independent of the trajectory length
• Considering also LT constraints improves the effectiveness compared with DU only
• Considering also TT constraints improves the effectiveness compared with DU+LT

Effectiveness (SYN 1)

• The greater the detection range, the higher the accuracy of the query answers, for every set of constraints

Comparison with other cleaning techniques

Term of comparison: Metropolis-Hastings sampler with constraints (MH-C)

Starting from a valid trajectory, other valid trajectories are generated by randomly perturbing each time point of the previous valid trajectory (only perturbations keeping the new trajectory still valid are accepted)

After a whole valid trajectory is generated, it is put in the sample set if its likelihood is a reasonable improvement of the previous trajectory

Our approach is compared with an MH-C sampler with the same storage space bound (the generation of samples is halted when the memory space occupied by the samples is equal to the size of the CT-Graph built by our algorithm)

H. Chen, W.-S. Ku, H. Wang, M.-T. Sun; Leveraging Spatio-temporal Redundancy for RFID Data Cleansing; SIGMOD 2010.

Comparison with MH-C: efficiency

• MH-C is more efficient at every trajectory length

Comparison with MH-C: efficiency

• MH-C is more efficient at every detection range

Comparison with MH-C: effectiveness

• Our approach is more effective in cleaning the trajectories

Future work

Use prospection to reduce the number of nodes over the same location
• Reduce the size of the CT-graph
• Reduce the construction times

Try to exploit correlations among different tags (as in the supply chain scenario)

Thank you!


The integrity constraints are all easy to obtain:
• Direct unreachability: they follow from the topology of the map;
• Traveling time: they follow from the maximum speed of the monitored objects and the distances between locations (in the indoor scenario: door-to-door distances, obstructed distances, etc.);
• Latency: they follow from the nature of a location, and from the importance given to short-length stays.

Obtaining pa(l|R)
• Construct a grid over the map (cells: 0.5 m × 0.5 m)
• Keep a tag for 30 sec in each cell
• For each cell c and reader r, the number of times the tag was detected by reader r is recorded into an array F[r,c]

Computational complexity

The time complexity is O(n) (n is the number of time points)

• The number of nodes built by the forward phase at each level is bounded by a constant, depending on the number of locations and the constraints:
  the values of δ are in [0..maxLT], where maxLT is the maximum duration among those specified in a latency constraint;
  TL may contain at most one entry for each location; for a location L, only entries <x,L> are considered, where x<maxTT (maxTT is the maximum duration specified in the TT constraints).
• The backward phase performs a constant number of operations for each node.

Answering stay queries over a CT-Graph

Q: Where was the tag at time point τ=4?

[Figure: the example CT-graph over time points τ=1,…,5]

1) Compute the probability of the stay represented by each node at τ=4 (actually, this can be pre-computed for each node!):
   p = 0.6 × 1 × 1 × 1 = 0.6 (node over L4)
   p = 0.4 × 1 × 1 × 0.2 = 0.08 (node over L4)
   p = 0.4 × 1 × 1 × 0.8 = 0.32 (node over L5)
2) Sum the probabilities associated to the same location: L4: 0.6 + 0.08 = 0.68; L5: 0.32
3) Return the so obtained PDF: p(L4) = 68%; p(L5) = 32%
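The stay-query procedure can be sketched as a forward pass that pushes probability mass from the sources to the queried time point and then groups it by location. The graph below is a reconstruction consistent with the three products shown for τ=4; node names such as `a1` or `b3` are hypothetical, and only the part of the graph up to τ=4 is modeled.

```python
from collections import defaultdict
from fractions import Fraction as F


def stay_query(levels, location, source_p, edges, t):
    """Return the PDF over locations at time point t (1-based)."""
    prob = dict(source_p)
    for level in levels[: t - 1]:
        nxt = defaultdict(F)          # F() is Fraction(0)
        for n in level:
            for child, p in edges.get(n, {}).items():
                nxt[child] += prob.get(n, F(0)) * p
        prob = nxt
    answer = defaultdict(F)
    for n, p in prob.items():
        answer[location[n]] += p      # nodes over the same location are summed
    return dict(answer)


levels = [["a1", "b1"], ["a2", "b2"], ["a3", "b3"], ["a4", "b4", "c4"]]
location = {"a1": "L1", "a2": "L1", "a3": "L4", "a4": "L4",
            "b1": "L2", "b2": "L2", "b3": "L4", "b4": "L4", "c4": "L5"}
source_p = {"a1": F(3, 5), "b1": F(2, 5)}
edges = {
    "a1": {"a2": F(1)}, "a2": {"a3": F(1)}, "a3": {"a4": F(1)},
    "b1": {"b2": F(1)}, "b2": {"b3": F(1)},
    "b3": {"b4": F(1, 5), "c4": F(4, 5)},
}
pdf = stay_query(levels, location, source_p, edges, t=4)
print({loc: float(p) for loc, p in pdf.items()})
```

The per-node probabilities computed in the loop are exactly the values the slide says can be pre-computed, so a production version could cache them once per graph.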



Consider the time points separately (independence assumption)

The association <locations, set of readers> at each time point is naturally modeled by means of a PDF pa(l|R)

pa(l|R) represents the probability that an object is at location l given that it has been detected by the set R of readers

pa(l|R) does not depend on time, but takes into account only the positions of the readers, the topology of the locations, and the physical model of the reader (i.e., reading rate vs. distance)

pa(l|R) will be said to be the a-priori probability distribution and assumed to be given (it is easy to obtain, and in several ways)



In our scenario

• Let R1…Rn be the sequence of readings (each Ri is the set of readers that detected the object at time point i)
• Let t = t[1]…t[n] be the generic trajectory compatible with the readings (t[i] denotes the location of t at time point i)
• The probability of t resulting from the independence assumption (a-priori probability) is:
  pa(t) = pa(t[1]|R1) × … × pa(t[n]|Rn)
• Conditioning pa(t) to a set IC of integrity constraints means revising it as p(t) = pa(t|IC is satisfied)
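Written out, the conditioning step is ordinary conditional probability restricted to the valid trajectories. In the restatement below, $p^a$ stands for pa, and the notation $t \models IC$ ("t satisfies every constraint in IC") is introduced here for convenience and does not appear on the slides.

```latex
p^a(t) \;=\; \prod_{i=1}^{n} p^a\bigl(t[i] \mid R_i\bigr),
\qquad
p(t) \;=\; p^a(t \mid IC) \;=\;
\begin{cases}
\dfrac{p^a(t)}{\sum_{t' \models IC} p^a(t')} & \text{if } t \models IC,\\[2ex]
0 & \text{otherwise.}
\end{cases}
```

With the values of Example 2 this gives $p(t_1) = 0.5/(0.5+0.25) = 2/3$ and $p(t_2) = 0.25/(0.5+0.25) = 1/3$.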

Checking constraints while building the CT-graph

Every node is associated with a summary of the «past», used to decide which of the alternative interpretations of the future time points are consistent

For a node representing the presence at L at time point τ, we store:
• δ: the duration of the current stay at L (if an LT constraint is defined over L). It is used to discard a location L’ different from L as a possible next location!
• TL: the list of the locations visited so far and involved in TT constraints, each with the time point of the departure from it. It is used to discard a location L’ as a possible next location if L’ does not satisfy a TT constraint involving some location in TL!
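A forward phase driven by the per-node summary (δ, TL) can be sketched as follows. Several details are assumptions made for this sketch rather than statements from the talk: δ starts at 0 when a location is entered (including at the first time point), a location with LT(L, T) may be left only when δ+1 ≥ T, and TT(L’, L’’, T) forbids arriving at L’’ fewer than T time points after the last departure from L’. All names are hypothetical.

```python
from fractions import Fraction as F


def successor(node, new_loc, tau, du, lt, tt, tt_sources):
    """Return the successor node at tau+1, or None if a constraint is violated."""
    loc, delta, tl = node
    if new_loc == loc:                                   # the stay continues
        new_delta = None if delta is None else min(delta + 1, lt[loc])
        return (loc, new_delta, tl)
    if (loc, new_loc) in du:                             # DU violated
        return None
    if delta is not None and delta + 1 < lt[loc]:        # LT violated
        return None
    departures = dict(tl)
    if loc in tt_sources:
        departures[loc] = tau                            # last departure from loc
    for src, left_at in departures.items():              # TT violated?
        need = tt.get((src, new_loc))
        if need is not None and (tau + 1) - left_at < need:
            return None
    new_delta = 0 if new_loc in lt else None
    return (new_loc, new_delta, frozenset(departures.items()))


def forward_phase(cands, du, lt, tt):
    """cands[i]: {location: a-priori probability} for time point i+1."""
    tt_sources = {src for src, _ in tt}
    source_p = {(loc, 0 if loc in lt else None, frozenset()): p
                for loc, p in cands[0].items()}
    levels, edges, loss = [list(source_p)], {}, {}
    for tau in range(1, len(cands)):
        nxt = {}                                         # dict keeps insertion order
        for node in levels[-1]:
            edges[node], loss[node] = {}, F(0)
            for new_loc, p in cands[tau].items():
                succ = successor(node, new_loc, tau, du, lt, tt, tt_sources)
                if succ is None:
                    loss[node] += p                      # candidate not materialized
                else:
                    nxt[succ] = True                     # same summary, same node
                    edges[node][succ] = p
        levels.append(list(nxt))
    return source_p, levels, edges, loss


cands = [{"L1": F(6, 10), "L2": F(4, 10)},
         {"L3": F(1, 3), "L4": F(2, 3)},
         {"L3": F(2, 3), "L5": F(1, 3)}]
source_p, levels, edges, loss = forward_phase(
    cands, du={("L2", "L3"), ("L4", "L5")}, lt={"L4": 2}, tt={("L1", "L5"): 3})
print([len(level) for level in levels])
print(loss)
```

Run on the three-time-point example with constraints DU(L2,L3), DU(L4,L5), LT(L4,2) and TT(L1,L5,3), this sketch builds six nodes and assigns losses 0, 1/3, 1/3, 1 and 1 to the nodes of the first two levels.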






Example

[Figure: map with locations L1-L5 and readers r1, r2, r4]

Time             1                  2                  3
Set of readers   {r1}               {r2}               {r4}
Locations        L1, 50%; L2, 50%   L3, 50%; L4, 50%   L3, 50%; L5, 50%

Building a CT-graph: a 2-phase algorithm

Forward phase
• τ=1: consider as source nodes the locations L1, …, Lk compatible with R1; assign to them the a-priori probabilities pa(L1|R1), …, pa(Lk|R1);
• From τ to τ+1: for each node n at τ, build its successors (locations compatible with R_{τ+1} that can prolong the trajectories ending at n without violating any constraint); connect n with each node n’ in the just created set of successors with an edge having weight pa(L’|R_{τ+1}), where L’ is the location of n’.

Backward phase
• Iteratively remove non-destination nodes having no successors;
• Revise probabilities to take into account node removals.

Scenario

Example
• A (piece of a) map containing 5 locations
• Several readers (we show 3 of them)
• A person o equipped with a tag is moving

[Figure: map with locations L0-L4 and readers r0, r1, r5]

• At t=1, o is detected by both r1 and r5
• Then o moves southward
• At t=2, o is in the area covered by both r1 and r5, but is detected by r1 only
• Then o moves southward
• At t=3, o is detected by r0
• o keeps moving
• At t=4, even if inside the detection range of r0, o is not detected

Table of detections

Time             1          2      3      4
Set of readers   {r1, r5}   {r1}   {r0}   (none)

PROBLEM: How can a sequence of detections be «interpreted» (i.e., translated into a trajectory)?

Scenario

[Figure: map with locations L0-L4 and readers r0, r1, r5]

Table of detections

Time                 1          2          3
Set of readers       {r1, r5}   {r1, r5}   {r0}
Possible locations   L1, L4     L1, L4     L0

Probabilistically associating locations with readings

pa(L1|{r1,r5}) = 50%
pa(L4|{r1,r5}) = 50%
