
Post on 25-Sep-2018


Extended Markov Tracking solution for Strategic/Tactical Paradigm

Zinovi Rabinovich, Jeffrey S. Rosenschein

School of Engineering and Computer Sciences, Hebrew University in Jerusalem

Extended Markov Tracking – p.1/28


Agenda

Strategic/Tactical Paradigm

Markov Environment Model

Extended Markov Tracking (EMT)

Tactical Solution by EMT

Example application

Strategic Morphology

Multi-agent EMT

Multi-agent example

Future directions


Strategic/Tactical Paradigm

The Strategic/Tactical Paradigm breaks a continual planning/control problem into two basic levels:

The Strategic level deals with system formalization and the definition of ideal development.

The Tactical level deals not with the high-level tasks, but only with the ideal system development, which it attempts to implement.

The levels continually interact: the Tactical level reports its degree of success, while the Strategic level updates and modifies the model parameters and the tactical target.

This paradigm is evident in many existing planning and control algorithms, and goes far back in time, even to epic stories like “Ulysses’ journey”.


Scylla and Charybdis


Ulysses journey

Halfway up the cliff there is a cave, misty-looking and turned toward Erebos and the dark, the very direction from which, O shining Odysseus, you and your men will be steering your hollow ship; and from the hollow ship no vigorous young man with a bow could shoot to the hole in the cliffside. In that cavern Scylla lives, whose howling is terror. Her voice indeed is only as loud as a new-born puppy could make, but she herself is an evil monster. No one, not even a god encountering her, could be glad at that sight.


She has twelve feet, and all of them wave in the air. She has six necks upon her, grown to great length, and upon each neck there is a horrible head, with teeth in it, set in three rows close together and stiff, full of black death. Her body from the waist down is holed up inside the hollow cavern, but she holds her heads poked out and away from the terrible hollow, and there she fishes, peering all over the cliff side, looking for dolphins or dogfish to catch or anything bigger, some sea monster, of whom Amphitrite keeps so many; never can sailors boast aloud that their ship has passed her without any loss of men, for with each of her heads she snatches one man away and carries him off from the dark-prowed vessel.


The other cliff is lower; you will see, Odysseus, for they lie close together, you could even cast with an arrow across. There is a great fig tree grows there, dense with foliage, and under this shining Charybdis sucks down the black water. For three times a day she flows it up, and three times she sucks it terribly down; may you not be there when she sucks down water, for not even the Earthshaker could rescue you out of that evil.

But sailing your ship swiftly drive her past and avoid her and make for Scylla’s rock instead, since it is far better to mourn six friends lost out of your ship than the whole company.


There are two dangers awaiting for you:
Centuries took, but they know what they do.
Sailors, who dared them - none to be seen,
For all they have perished, if stories are keen.
Mortal, of dangers first, is for all,
Your crew will be fewer by second’s one toll.
Take careful decision charting your course,
Make it much closer to the second of those.


Ulysses journey due to Strategy

It is the task of the Strategic level of our paradigm to formalize the problem, and it has to produce two outputs:

A formal description of the environment

The ideal development of the formal environment: the tactical target

The environment will be described by a Partially Observable Markov Environment Model, which will be our preferred choice.


Formal Model

Partially Observable Markov Environment Model $\langle S, s_0, A, T, O, \Omega \rangle$

$S$ - the set of system states, $s_0 \in S$ the initial system state

$A$ - the set of applicable actions

$T : S \times A \to \Pi(S)$ - the system transition function

$O$ - the set of possible observations

$\Omega : S \times A \times S \to \Pi(O)$ - the observation probability distribution

The system develops in epochs. At each epoch an observation is received, drawn according to the observation probability distribution, which is parametrized by the transition the system underwent.

Basic beliefs about the system state at time $t$ are expressed by a probability vector $\vec{p}_t \in \Pi(S)$.


How model fits Strategy

It is quite easy to see how the ship’s motion is described by a Markovian model $\langle S, s_0, A, T, O, \Omega \rangle$:

The system states $S$ represent different distances from Scylla and Charybdis.

The violent sea randomly throws the ship around ($T$), but the steering rod ($A$) can influence the tendency of motion left and right.

There’s so much commotion ($\Omega$), you’ll be lucky to have a distant clue of what’s going on ($O$).

This makes it easy to describe the second output of the Strategic level: the tactical target. We’d like the ship to always return to a prescribed distance between Scylla and Charybdis.

Under the Markov Environment Model it’s simply the (conditional) distribution $p(s' \mid s) = 1 \iff s' = \text{ideal}$.


Ulysses journey due to Tactics


What journey??

The Tactical level deals not with the problem, but only with its abstract representation.

Given a tactical target, make the formal system develop according to that target.

How?

Try to understand how the system actually develops

Correct it by means of action application


The need to know what others see

It is important to underline that we attempt to correct the impression of the dynamics created by the actual development of the system state.

But we do not have exact knowledge of that impression and have to estimate it from noisy data.

First, estimating the state itself - Markov Tracking

Second, estimating the cause of the change - Dynamics Tracking

We term the overall combination Extended Markov Tracking (EMT).


Markov Tracking

In a Markov model, ‘tracking’ simply means the fusion of observation data, action data, and current beliefs about the system state into new beliefs.

Since the system develops in discrete time epochs, and the observation distribution is known, it can be done using the so-called Bayesian update

$$p_{t+1}(s) \propto p(o \mid s, a) \sum_{s'} T(s \mid a, s')\, p_t(s')$$

Markov tracking is used on its own merits in various solutions of Markov Decision Problems (MDPs)
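
The Bayesian update can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the talk: the function name `markov_tracking_update` and the list-of-lists layout (`transition[s_prev][s]` holding $T(s \mid a, s')$ for the action already applied, `obs_likelihood[s]` holding $p(o \mid s, a)$ for the observation already received) are assumptions made for the example.

```python
from typing import List, Sequence


def markov_tracking_update(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    obs_likelihood: Sequence[float],
) -> List[float]:
    """One belief update for a fixed action a and a received observation o.

    belief[s_prev]         -- p_t(s')
    transition[s_prev][s]  -- T(s | a, s') for the applied action (assumed layout)
    obs_likelihood[s]      -- p(o | s, a) for the received observation
    """
    n = len(belief)
    # Prediction: push the old belief through the transition model.
    predicted = [
        sum(transition[s_prev][s] * belief[s_prev] for s_prev in range(n))
        for s in range(n)
    ]
    # Correction: weight by the observation likelihood, then normalise,
    # which turns the proportionality in the formula into an equality.
    unnormalised = [obs_likelihood[s] * predicted[s] for s in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    new_belief = markov_tracking_update(
        belief=[0.5, 0.5],
        transition=[[0.9, 0.1], [0.2, 0.8]],
        obs_likelihood=[0.75, 0.25],
    )
    print(new_belief)
```

The explicit normalisation step is the only design choice here; everything else is a direct transcription of the sum in the update rule.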


Markov Decision Problem (MDP)

Econometric extension of the Partially Observable Markov Environment Model

Reward function $R : S \times A \to \mathbb{R}$

Optimality measure, e.g., maximize average accumulated discounted reward

Algorithms exist that can solve MDPs and provide an action selection procedure to obtain the required optimality

A wide range of system behaviors can be induced by action selection procedures dictated by a wise choice of reward function.

Problem: high computational complexity of solving MDPs in partially observable environments


Dynamics Tracking

It can be done by methods of Machine Learning

Graphical Models

Decision Trees

Neural Networks

All are computationally hard


Information Theory

$$D_{KL}(p \| q) = \sum_x p(x) \log\left(\frac{p(x)}{q(x)}\right)$$

measures the cost of an encoding guided by distribution $q$ of a data source governed by distribution $p$


Degree of Mental Change

$$D_{KL}(p \| q) = \sum_x p(x) \log\left(\frac{p(x)}{q(x)}\right)$$

measures the degree of mental change required to move from old beliefs $q$ to new beliefs $p$
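
As a small illustration of the divergence above, the following Python sketch computes $D_{KL}(p \| q)$ for two discrete distributions. The function name `kl_divergence`, the use of the natural logarithm, and the conventions for zero entries are assumptions of the example; the slides do not fix a log base.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in nats for two discrete distributions of equal length.

    Conventions (assumed here): terms with p(x) = 0 contribute nothing,
    and p(x) > 0 with q(x) = 0 makes the divergence infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


if __name__ == "__main__":
    old_beliefs = [0.5, 0.5]
    new_beliefs = [0.9, 0.1]
    # Moving from old beliefs q to new beliefs p, as on the slide.
    print(kl_divergence(new_beliefs, old_beliefs))
    # The measure is not symmetric in its arguments.
    print(kl_divergence(old_beliefs, new_beliefs))
```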


Extended Markov Tracking

Given that our beliefs about the system state have changed from $p_t$ to $p_{t+1}$, and our previous beliefs about the exhibited system dynamics are $PD_t$, the update of this belief is the solution of the following:

$$PD_{t+1} = \arg\min_Q E_{p_{t+1}(s)}\left[ D_{KL}\big(Q(\cdot \mid s) \,\|\, PD_t(\cdot \mid s)\big) \right]$$

s.t.

$$p_{t+1} = Q \cdot p_t$$

Denote $PD_{t+1} = H[p_{t+1}, p_t, PD_t]$.
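
To make the optimization problem concrete, the sketch below evaluates its two ingredients for a candidate $Q$: the expected divergence being minimized and the constraint $p_{t+1} = Q \cdot p_t$. It does not solve the minimization, which needs a constrained optimizer. The names `emt_objective` and `satisfies_constraint`, the row layout `Q[s][s_next]` for $Q(s' \mid s)$, and the natural logarithm are assumptions of the example.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # row s holds a distribution over s' given s


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in nats; 0 * log(0 / q) is taken as 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def emt_objective(Q: Matrix, PD: Matrix, p_next: Sequence[float]) -> float:
    """E_{p_{t+1}(s)}[ D_KL(Q(.|s) || PD_t(.|s)) ], weighted as written on the slide."""
    return sum(
        weight * kl_divergence(Q[s], PD[s])
        for s, weight in enumerate(p_next)
        if weight > 0.0
    )


def satisfies_constraint(
    Q: Matrix, p_prev: Sequence[float], p_next: Sequence[float], tol: float = 1e-9
) -> bool:
    """Check p_{t+1}(s') = sum_s Q(s'|s) p_t(s) for every s'."""
    n = len(p_prev)
    for s_next in range(n):
        pushed = sum(Q[s][s_next] * p_prev[s] for s in range(n))
        if abs(pushed - p_next[s_next]) > tol:
            return False
    return True


if __name__ == "__main__":
    PD = [[0.9, 0.1], [0.1, 0.9]]   # previous dynamics estimate
    Q = [[0.5, 0.5], [0.5, 0.5]]    # a candidate explanation of the change
    print(satisfies_constraint(Q, [1.0, 0.0], [0.5, 0.5]))
    print(emt_objective(Q, PD, [0.5, 0.5]))
```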


Tactical solution

Now that we have a means of understanding how the system develops, we can attempt to fix it to our liking, the liking of the tactical target $r : S \to \Pi(S)$

The tactical solution performs a continual loop of:

EMT estimation of system development

Action choice

$$a^* = \arg\min_a D_{KL}\big( H[T_a * p_t, p_t, PD_t] \,\|\, r \big)$$

Application of $a^*$.
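
The action-choice step of this loop might be sketched as follows. The sketch only shows the structure of the arg min. The EMT operator $H$ and the divergence between a dynamics estimate and the target are passed in as the hypothetical callables `emt_update` and `divergence`, since the slides do not spell out how either is computed. The names `predict` and `choose_action` are likewise assumptions of the example.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence

Matrix = Sequence[Sequence[float]]  # row s holds a distribution over s' given s


def predict(transition: Matrix, belief: Sequence[float]) -> List[float]:
    """T_a * p_t: the belief expected after applying one action."""
    n = len(belief)
    return [
        sum(transition[s][s_next] * belief[s] for s in range(n))
        for s_next in range(n)
    ]


def choose_action(
    transitions: Dict[Hashable, Matrix],
    belief: Sequence[float],
    dynamics: Matrix,
    target: Matrix,
    emt_update: Callable[[Sequence[float], Sequence[float], Matrix], Matrix],
    divergence: Callable[[Matrix, Matrix], float],
) -> Hashable:
    """Return the action whose predicted EMT estimate is closest to the target.

    emt_update(p_next, p, PD) stands in for H[p_next, p, PD] and
    divergence(PD, r) for D_KL(PD || r); both are supplied by the caller.
    """
    best_action, best_score = None, math.inf
    for action, transition in transitions.items():
        predicted_belief = predict(transition, belief)
        predicted_dynamics = emt_update(predicted_belief, belief, dynamics)
        score = divergence(predicted_dynamics, target)
        if best_action is None or score < best_score:
            best_action, best_score = action, score
    return best_action
```

In a full loop, the chosen action would be applied, the belief updated by Markov tracking, and the dynamics estimate refreshed by EMT before the next choice.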


Example: Steering the Ship

We tilt the steering rod to vary the ship’s tendency to move left and right, so as to keep it at a specified distance from the dangerous cliffs.


Environment Model instantiation: $\langle S, s_0, A, T, O, \Omega \rangle$

$S = [0 : n]$, $s_0 = \lfloor \frac{n}{2} \rfloor$, $n = 12$

$A \subset [0, 1]$, a finite set symmetric with respect to $0.5$, $|A| = 11$

$T(s' \mid s, a)$ is computed in such a way that the ship can make three probabilistic steps, where $(p_-, p_+)$, the probabilities of a single step left or right, conform to the following:

$p$ is constant for all $s, s' \in S$ and $a \in A$

$p_- = a(1 - p)$ and $p_+ = (1 - a)(1 - p)$

$O = S$, $\Omega(o \mid s', a, s) = \frac{1}{3}$ iff $|s' - o| \le 1$


Ideal dynamics: $r(s' \mid s) = 1 \iff s' = \lfloor \frac{n}{2} \rfloor$
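
One possible way to instantiate this model in Python is sketched below. Several details are assumptions made for the example rather than taken from the slides: $p$ is read as the probability of not moving in a single step (since $p_- + p_+ = 1 - p$), a step that would leave $[0 : n]$ keeps the ship at the boundary, the action set is the evenly spaced grid $0, 0.1, \ldots, 1$, the value $p = 0.5$ in the usage lines is arbitrary, and the observation noise is renormalised at the two ends where fewer than three neighbours exist.

```python
from typing import List

N = 12                                   # S = [0 : n] with n = 12
ACTIONS = [i / 10 for i in range(11)]    # assumed grid, symmetric about 0.5


def single_step(n: int, a: float, p: float) -> List[List[float]]:
    """One probabilistic step: stay with p, left with a(1-p), right with (1-a)(1-p)."""
    p_left, p_right = a * (1.0 - p), (1.0 - a) * (1.0 - p)
    step = [[0.0] * (n + 1) for _ in range(n + 1)]
    for s in range(n + 1):
        step[s][s] += p
        step[s][max(s - 1, 0)] += p_left    # assumption: clamp at the left end
        step[s][min(s + 1, n)] += p_right   # assumption: clamp at the right end
    return step


def compose(first: List[List[float]], second: List[List[float]]) -> List[List[float]]:
    """Chain two step matrices (row s is the distribution over s' given s)."""
    size = len(first)
    return [
        [sum(first[s][m] * second[m][s2] for m in range(size)) for s2 in range(size)]
        for s in range(size)
    ]


def transition_matrix(n: int, a: float, p: float, steps: int = 3) -> List[List[float]]:
    """T(s'|s, a) as three consecutive single steps."""
    step = single_step(n, a, p)
    result = step
    for _ in range(steps - 1):
        result = compose(result, step)
    return result


def observation_distribution(n: int, s_next: int) -> List[float]:
    """Uniform noise over observations o with |s' - o| <= 1 (1/3 each in the interior)."""
    neighbours = [o for o in range(n + 1) if abs(s_next - o) <= 1]
    weight = 1.0 / len(neighbours)          # assumption: renormalise at the ends
    return [weight if o in neighbours else 0.0 for o in range(n + 1)]


if __name__ == "__main__":
    T = transition_matrix(N, a=0.3, p=0.5)
    print([round(x, 3) for x in T[N // 2]])
    print(observation_distribution(N, N // 2))
```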


Example: Results

[Figure: cumulative probability (vertical axis) versus distance from global goal position (horizontal axis), with curves labelled Tactical and POMDP.]


Strategic Morphology

Now that EMT works, we can ask: what kind of tactical targets can it really solve?

Those it can get from the Strategic level.

There are several major dimensions for tactical task specifications:

Tactical target complexity

Model’s action complexity

Model’s state observability


Target complexity

Tactical target complexity refers to the structure of the conditional distribution $r : S \to \Pi(S)$.

State distribution: the tactical target $r$ expresses a preference over states, rather than state transitions. Essentially it means that the original task was expressed by some vector $p \in \Pi(S)$ and $p = r * p$.

Complete Dynamics: the tactical target is an unstructured set of preferences. It may have been created by normalization from a reward function $R : S \times S \to \mathbb{R}$
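
A short worked step shows why a state-distribution target satisfies $p = r * p$. It assumes the simplest form such a target can take, namely $r(s' \mid s) = p(s')$ for every current state $s$. That form is an assumption of this example, chosen because it ignores the transition's starting point as the slide describes.

```latex
\begin{aligned}
(r * p)(s') &= \sum_{s \in S} r(s' \mid s)\, p(s) \\
            &= \sum_{s \in S} p(s')\, p(s) \\
            &= p(s') \sum_{s \in S} p(s) \\
            &= p(s')
\end{aligned}
```

The last step uses only the fact that $p$ is a probability vector, so its entries sum to one.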


Action complexity

Action complexity refers to the composition of the action space. It is in fact possible to replace $A$ by a Cartesian product $A_1 \times \cdots \times A_n$, expressing the presence of multiple active parties: agents.

For example, in the case of Ulysses’ ship, one may think of controlling it by the coordinated action of the rowers.

EMT in this case may deal with different degrees of social awareness

Complete social awareness: each and every agent selects actions with respect to the presence and potential of the other agents.

Unaware scenario: agents are unaware of other agents’ presence, and respond to them only indirectly, through modifications of the environment.


Observations complexity

As part of a multi-agent scenario, it is also possible to split the observations $O = O_1 \times \cdots \times O_n$ and/or the system state $S = S_1 \times \cdots \times S_n$, disengaging and decorrelating the knowledge of different agents within the system.

The degree of mutual dependency between what agents know and experience provides an additional complexity dimension for the Strategic level to use.


Example: Springed bar problem

Consider a long bar resting its ends on two equal springs, with two agents of equal mass standing on the bar. Their task is to shift themselves around so that the bar becomes level.

Formally, the system state is described by the positions of the two agents on the bar, $S = [1 : d_{max}]^2$, where $d_{max}$ is the length of the bar in “steps”, and the initial state is an unbalanced one, $s_0 = (1, \frac{d_{max}}{2} + 1)$. The action sets are $A_i = \{left, stay, right\}$, and the transition probability is built according to the physics of motion.


Observations

1. $O_i = S$, that is, all positions of the two agents; $\Omega_1 = \Omega_2$, and it creates uniform noise over the immediate neighborhood of the real joint position of the agents.

2. $O_i = [1 : d_{max}]$ and represents the position of the observing agent. $\Omega_i$ creates uniform noise over the immediate neighborhood of the observing agent’s real position.


Multi-agent EMT

EMT has to be modified only slightly to admit a multi-agent scenario with complete social awareness and decorrelated observations and state. The agent has to consider the complete Cartesian product of actions $A = A_1 \times \cdots \times A_n$, but perform only its own part.

Multi-Agent EMT loop:

EMT estimation of system development based on local (agent-specific) data.

Action choice

$$\vec{a}^* = \arg\min_{\vec{a}} D_{KL}\big( H[T_{\vec{a}} * p_t, p_t, PD_t] \,\|\, r \big)$$

Application of $a^*_i$ from $\vec{a}^* = (a^*_1, \ldots, a^*_n)$.
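
The joint-action step can be illustrated with a short Python sketch: an agent enumerates the full Cartesian product of action sets, picks the best joint action, and then performs only its own component. The function name `choose_own_action` is hypothetical, and the caller-supplied `score` callable stands in for $D_{KL}(H[T_{\vec{a}} * p_t, p_t, PD_t] \| r)$, which is not implemented here.

```python
import itertools
import math
from typing import Callable, Hashable, Sequence, Tuple


def choose_own_action(
    action_sets: Sequence[Sequence[Hashable]],
    agent_index: int,
    score: Callable[[Tuple[Hashable, ...]], float],
) -> Tuple[Tuple[Hashable, ...], Hashable]:
    """Minimise score over A_1 x ... x A_n and return (joint action, own part).

    score(joint_action) is a stand-in for the divergence between the
    predicted EMT estimate under that joint action and the tactical target.
    """
    best_joint, best_score = None, math.inf
    for joint in itertools.product(*action_sets):
        value = score(joint)
        if best_joint is None or value < best_score:
            best_joint, best_score = joint, value
    return best_joint, best_joint[agent_index]


if __name__ == "__main__":
    moves = {"left": -1, "stay": 0, "right": 1}
    positions = (1, 9)   # illustrative positions, not taken from the slides
    centre = 8

    def toy_score(joint):
        # Hypothetical score: distance of the agents' centre of mass from the centre.
        new_positions = [pos + moves[a] for pos, a in zip(positions, joint)]
        return abs(sum(new_positions) / 2 - centre)

    joint, own = choose_own_action([list(moves), list(moves)], 0, toy_score)
    print(joint, own)
```

Enumerating the full product is exponential in the number of agents, which is acceptable for the two-agent example but would need pruning for larger teams.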


Results: First Observational scenario

In this first observation scenario, agents converge to a symmetric position around the ideal center of mass

[Figure: position on the bar ($1 \le pos \le d_{max} = 15$) versus time step, with curves for Agent 1, Agent 2, and the Center of Mass.]


[Figure: deviation with confidence bar (values $\times 10^4$) versus time step.]


Results: Second Observational scenario

In the second observation scenario, the agents have found an equilibrium point, where each agent occupies the far end of the bar, thus balancing it.

[Figure: position on the bar ($1 \le pos \le d_{max} = 15$) versus time step, with curves for Agent 1, Agent 2, and the Center of Mass.]


[Figure: deviation versus time step.]


whEre May iT lead?

There are still many questions to be answered and applications to be tested.

Can EMT be optimized for parametric models?

Can EMT handle balancing between multiple targets?

$$a^* = \arg\min_a \left[ D_{KL}( H[\ldots] \,\|\, r_1) + D_{KL}( H[\ldots] \,\|\, r_2) \right]$$

Can EMT handle a negative target? We’d like to avoid certain dynamics

$$a^* = \arg\max_a D_{KL}( H[\ldots] \,\|\, r)$$

$$a^* = \arg\min_a D_{KL}( H[\ldots] \,\|\, 1 - r)$$


where Else May iT lead?

There are also questions regarding the Strategic/Tactical Paradigm itself

Are there alternative tactical solutions?

How do they compare to EMT’s performance?

Is there any optimality measure that we may think of?

More specifically, for the Strategic level:

If the Tactical level fails to achieve its target, is there any alternative tactical target that the Strategic level can provide?

Can the Strategic level claim that this tactical target is best? Most easily achieved?
