Introduction to Collectives

Kagan Tumer

NASA Ames Research Center

[email protected]

http://ic.arc.nasa.gov/~kagan

http://ic.arc.nasa.gov/projects/COIN/index.html

(Joint work with David Wolpert)

CDCS 2002 K. Tumer 2

Ames Research Center

Outline

• Introduction to collectives
– Definition / Motivation
– A naturally occurring example

• Illustration of theory of collectives I
– Central equation of collectives

• Interlude 1:
– Autonomous defects problem (Johnson and Challet)

• Illustration of theory of collectives II
– Aristocrat utility
– Wonderful life utility

• Interlude 2:
– El Farol bar problem: System equilibria and global optima
– Collective of rovers: Scientific return maximization

• Final thoughts

Motivation

• Most complex systems not only can be, but need to be, viewed as collectives. Examples include:
– Control of a constellation of communication satellites
– Routing data/vehicles over a communication network/highway
– Dynamic data migration over large distributed databases
– Dynamic job scheduling across a (very) large computer grid
– Coordination of rovers/submersibles on Mars/Europa
– Control of the elements of an amorphous computer/telescope
– Construction of parallel algorithms for optimization problems
– Autonomous defects problem

Collectives

• A Collective is
– A (perhaps massive) set of agents;
– All of which have “personal” utilities they are trying to achieve;
– Together with a world utility function measuring the full system’s performance.

• Given that the agents are good at optimizing their personal utilities, the crucial problem is an inverse problem:

How should one set (and potentially update) the personal utility functions of the agents so that they “cooperate unintentionally” and optimize the world utility?

Natural Example: Human Economy

• World utility is GDP
– Agents are the individual humans
– Agents try to maximize their own “personal” utilities

• Design problem is:
– How to modify personal utilities of the agents through incentives or regulations (e.g., tax breaks, SEC regulations against insider trading, antitrust laws) to achieve high GDP?
– Note: A. Greenspan does not tell each individual what to do.

• Economics hamstrung by “pre-set agents”
– No such restrictions for an artificial collective

Outline, next section: Illustration of Theory of Collectives I (Central Equation of Collectives)

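Nomenclature

• $\eta$: an agent
• $\zeta$: state of all agents across all time
• $\zeta_{\eta,t}$: state of agent $\eta$ at time $t$
• $\zeta_{\hat{\eta},t}$: state of all agents other than $\eta$ at time $t$

[Figure: agents' states laid out over times $t_0$ to $t_n$, with example labels $\zeta_{\eta_1,t_0}$ and $\zeta_{\hat{\eta}_4,t_0}$ for agent $\eta_4$]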

Key Concepts for Collectives

• Intelligence: Percentage of states that would have resulted in agent $\eta$ having a worse utility (e.g., SAT-like percentile concept).

• Learnability: Signal-to-noise measure. Quantifies how sensitive an agent’s personal utility function is to a change in its state.

• Factoredness: Degree to which an agent’s personal utility is aligned with the world utility (e.g., quantifies “if you get rich, world benefits” concept).
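The percentile reading of intelligence can be sketched in a few lines. This is only an illustration under stated assumptions: the agent has a finite list of alternative states, and `intelligence`, `toy_utility`, and `candidates` are hypothetical names that do not come from the talk.

```python
def intelligence(utility, actual_state, alternative_states):
    """Fraction of alternative states that would have given a worse utility."""
    actual = utility(actual_state)
    worse = sum(1 for s in alternative_states if utility(s) < actual)
    return worse / len(alternative_states)


# Hypothetical toy utility: the agent prefers states close to 3.
def toy_utility(state):
    return -abs(state - 3)


candidates = [0, 1, 2, 3, 4, 5]
# State 2 beats states 0, 1 and 5, so half of the candidates are worse.
score = intelligence(toy_utility, actual_state=2, alternative_states=candidates)
```

A score near 1 means the agent did about as well as it could have; a score near 0 means almost any other state would have served it better.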

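Central Equation of Collectives

• Our ability to control the system consists of setting some parameters $s$ (e.g., agents' goals):

$$P(G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s)$$

– Term 1, $P(G \mid \vec{\epsilon}_G, s)$: Explore vs. Exploit (Operations Research)
– Term 2, $P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s)$: Factoredness (Economics)
– Term 3, $P(\vec{\epsilon}_g \mid s)$: Learnability (Machine Learning)

– $\vec{\epsilon}_G$ and $\vec{\epsilon}_g$ are intelligences for the agents w.r.t. the world utility ($G$) and their personal utilities ($g$), respectively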

Outline, next section: Interlude 1 (Autonomous defects problem, Johnson and Challet)

Autonomous Defects Problem

• Given a collection of faulty devices, how to choose the subset of those devices that, when combined with each other, gives optimal performance (Johnson & Challet).

$$G(\zeta) = \frac{\left| \sum_{j=1}^{N} n_j a_j \right|}{\sum_{k=1}^{N} n_k}$$

– $n_k$: action of agent $k$ ($n_k \in \{0, 1\}$)
– $a_j$: distortion of component $j$

• Collective approach: Identify each agent with a component.
• Question: what utility should each agent try to maximize?
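A small sketch of the world utility for this problem, assuming G is the magnitude of the summed distortion of the selected components divided by the number of selected components, with lower values being better. The distortion values and the brute-force `best_subset` helper are hypothetical and only practical for very small N; they are not the collective approach itself.

```python
from itertools import product


def world_utility(actions, distortions):
    """G = |sum_j n_j * a_j| / sum_k n_k for binary actions n and distortions a."""
    chosen = sum(actions)
    if chosen == 0:
        raise ValueError("at least one component must be selected")
    total = sum(n * a for n, a in zip(actions, distortions))
    return abs(total) / chosen


def best_subset(distortions):
    """Exhaustive search over all non-empty subsets (exponential in N)."""
    candidates = (n for n in product((0, 1), repeat=len(distortions)) if any(n))
    return min(candidates, key=lambda n: world_utility(n, distortions))


distortions = [0.5, -0.25, 0.75, -1.0]   # hypothetical component errors
all_on = world_utility([1, 1, 1, 1], distortions)   # errors cancel exactly
single = world_utility([1, 0, 0, 0], distortions)   # one component alone
```

The exhaustive search has 2^N - 1 candidates, which is why the N=100 and N=1000 cases call for agents that each decide their own n_k.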

[Figure: Autonomous Defects Problem (N=100)]

[Figure: Autonomous Defects Problem (N=1000)]

[Figure: Autonomous Defects Problem: Scaling]

Outline, next section: Illustration of Theory of Collectives II (Aristocrat utility, Wonderful life utility)

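Personal Utility

• Recall central equation:

$$P(G \mid s) = \int d\vec{\epsilon}_G \, P(G \mid \vec{\epsilon}_G, s) \int d\vec{\epsilon}_g \, P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s) \, P(\vec{\epsilon}_g \mid s)$$

– Term 2, $P(\vec{\epsilon}_G \mid \vec{\epsilon}_g, s)$: Factoredness
– Term 3, $P(\vec{\epsilon}_g \mid s)$: Learnability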

• Solve for personal utility g that maximizes learnability, while constrained to the set of factored utilities

Aristocrat Utility

• One can solve for factored $U$ with maximal learnability, i.e., a $U$ with good terms 2 and 3 in the central equation:

$$AU_\eta(\zeta) \equiv G(\zeta) - E[G(\zeta) \mid \zeta_{\hat{\eta}}]$$

• Intuitively, AU reflects the difference between the actual $G$ and the average $G$ (averaged over all actions you could take).

• For simplicity, when evaluating AU here, we make the following approximation:

$$AU_\eta(\zeta) = G(\zeta) - \sum_i p_i \, G(\zeta_{\hat{\eta}}, CL_\eta^{\vec{s}_i}), \qquad p_i = \frac{1}{\text{number of possible actions for } \eta}$$
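The uniform-average form of AU can be sketched as follows: the agent's action is replaced by each of its possible actions in turn, G is averaged over those replacements with equal weights, and the average is subtracted from the actual G. The congestion-style `world_utility` below is a hypothetical stand-in, as are the function names and the example joint action.

```python
import math


def world_utility(joint_action, n_actions=3, capacity=2.0):
    """Hypothetical congestion-style G: sum over actions of x * exp(-x / capacity)."""
    counts = [joint_action.count(a) for a in range(n_actions)]
    return sum(x * math.exp(-x / capacity) for x in counts)


def aristocrat_utility(joint_action, agent, n_actions=3):
    """G(actual) minus the uniform average of G over the agent's possible actions."""
    actual = world_utility(joint_action, n_actions)
    expected = 0.0
    for a in range(n_actions):
        alt = list(joint_action)
        alt[agent] = a                     # replace only this agent's action
        expected += world_utility(alt, n_actions) / n_actions
    return actual - expected


joint = [0, 2, 1, 2]                       # hypothetical actions of four agents
au_agent_1 = aristocrat_utility(joint, agent=1)
```

Because the subtracted average does not depend on the agent's own action, any change of action moves AU and G by the same amount, which is the sense in which this utility is factored.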

Clamping

• Clamping parameter $CL_\eta^{\vec{v}}$: replace $\eta$’s state (taken to be a unary vector) with constant vector $\vec{v}$
• Clamping creates a new “virtual” worldline
• In general $\vec{v}$ need not be a “legal” state for $\eta$
• Example: four agents, three actions. Agent 2 clamps to “average action” vector $\vec{a} = (.33\ .33\ .33)$:

[Figure: example state matrix before and after clamping]
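The clamping example can be sketched with one row per agent and one column per action. The agents' actual actions are not recoverable from the slide, so the `actions` list below is hypothetical, as are the helper names; agent 2 on the slide corresponds to row index 1 here.

```python
def one_hot(action, n_actions):
    """Unary (one-hot) vector for a single chosen action."""
    return [1.0 if a == action else 0.0 for a in range(n_actions)]


def clamp(state, agent, v):
    """Return a copy of the joint state with one agent's row replaced by v."""
    clamped = [row[:] for row in state]
    clamped[agent] = list(v)
    return clamped


actions = [0, 2, 1, 2]                      # hypothetical actions of four agents
state = [one_hot(a, 3) for a in actions]
average_action = [0.33, 0.33, 0.33]         # not a legal one-hot state
virtual = clamp(state, agent=1, v=average_action)
```

The original `state` is left untouched, which mirrors the point that clamping produces a separate "virtual" worldline rather than changing what actually happened.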

Wonderful Life Utility

• The Wonderful Life Utility (WLU) for $\eta$ is given by:

$$WLU_\eta(\zeta) \equiv G(\zeta) - G(\zeta_{\hat{\eta}}, CL_\eta^{\vec{v}})$$

– Clamping to “null” action ($\vec{v} = \vec{0}$) removes player from system (hence the name).
– Clamping to “average” action disturbs overall system minimally (can be viewed as approximation to AU).
– Theorem: WLU is factored regardless of $\vec{v}$
– Intuitively, WLU measures the impact of agent $\eta$ on the world
  • Difference between world as it is, and world without $\eta$
  • Difference between world as it is, and world where $\eta$ takes average action
– WLU is a “virtual” operation. System is not re-evolved.
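A minimal sketch of WLU for two clamping vectors, null and average. The world utility here is a hypothetical congestion-style function of per-action totals, and the joint state is invented for illustration; only the structure "actual G minus G of the clamped state" is taken from the slide.

```python
import math


def world_utility(state, capacity=2.0):
    """Hypothetical congestion-style G computed from per-action totals."""
    totals = [sum(col) for col in zip(*state)]
    return sum(x * math.exp(-x / capacity) for x in totals)


def clamp(state, agent, v):
    """Copy of the joint state with one agent's row replaced by v."""
    clamped = [row[:] for row in state]
    clamped[agent] = list(v)
    return clamped


def wonderful_life_utility(state, agent, v):
    """WLU = G(actual state) - G(state with the agent's row clamped to v)."""
    return world_utility(state) - world_utility(clamp(state, agent, v))


def one_hot(action, n_actions=3):
    return [1.0 if a == action else 0.0 for a in range(n_actions)]


state = [one_hot(a) for a in (0, 2, 1, 2)]   # hypothetical actions
null_v = [0.0, 0.0, 0.0]
avg_v = [1 / 3, 1 / 3, 1 / 3]

wlu_null = wonderful_life_utility(state, agent=1, v=null_v)
wlu_avg = wonderful_life_utility(state, agent=1, v=avg_v)
```

The clamped term ignores the agent's own row, so it is the same whatever the agent does. A change in the agent's action therefore changes WLU by exactly the change in G, for any choice of the clamping vector.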

Outline, next section: Interlude 2 (El Farol bar problem: System equilibria and global optima)

El Farol Bar Problem

• Congestion game: A game where agents share the same action space, and world utility is a function purely of how many agents take each action.

• Illustrative Example: Arthur’s El Farol bar problem:
– At each time step, each agent decides whether to attend a bar:

• If agent attends and bar is below capacity, agent gets reward

• If agent stays home and bar is above capacity, agent gets reward

– Problem is particularly interesting because rational agents cannot all correctly predict attendance:

• If most agents predict attendance will be low and therefore attend, attendance will be high

• If most agents predict high attendance and therefore do not attend …

Modified El Farol Bar Problem

• Each week agents select one of seven nights to attend a bar

$$G(\zeta) = \sum_t R_t, \qquad R_t = \sum_{k=1}^{7} x_k(\zeta_t)\, e^{-x_k(\zeta_t)/c}$$

– $R_t$: reward for week $t$
– $x_k(\zeta_t)\, e^{-x_k(\zeta_t)/c}$: reward for night $k$ at week $t$
– $x_k(\zeta_t)$: attendance for night $k$ at week $t$
– $c$: capacity of bar

• Further modifications:
– Each week each agent selects two nights to attend bar.
– ...
– Each week each agent selects six nights to attend bar.
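The world utility for the modified bar problem can be sketched directly, assuming each night contributes x * exp(-x / c) for attendance x and capacity c, summed over the seven nights and over weeks. Function names and the sample attendance figures are hypothetical.

```python
import math


def weekly_reward(attendance, capacity):
    """R_t: sum over nights of x_k * exp(-x_k / c)."""
    return sum(x * math.exp(-x / capacity) for x in attendance)


def world_utility(history, capacity):
    """G: sum of weekly rewards over all weeks in the history."""
    return sum(weekly_reward(week, capacity) for week in history)


capacity = 3.0
balanced_week = [3] * 7                      # every night exactly at capacity
lopsided_week = [15, 1, 1, 1, 1, 1, 1]       # same 21 agents, one crowded night
g = world_utility([balanced_week, lopsided_week], capacity)
```

The per-night term x * exp(-x / c) is largest at x = c, so for a fixed number of agents the reward favors spreading attendance so that nights sit near capacity.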

Personal Utility Functions

• Two conventional utilities:
– Uniform Division (UD): Divide each night’s total reward among all agents that attended that night (the “natural” reward)
– Team Game (TG): Total world reward at time $t$ ($R_t$)

• Three collective-based utilities:
– WL 0: WL utility with clamping parameter set to vector of 0s (world utility minus “world utility without me”)
– WL 1: WL utility with clamping parameter set to vector of 1s (world utility minus “world utility where I attend every night”)
– WL a: WL utility with clamping parameter set to vector of average action (world utility minus “world utility where I do what is expected of me”)
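The five utilities can be compared on a single week with a short sketch. It assumes the one-night-per-week variant, a nightly reward of x * exp(-x / c), and a uniform average action of 1/7 per night for WL a; these choices, the function names, and the attendance figures are assumptions made for illustration.

```python
import math


def weekly_reward(attendance, capacity):
    return sum(x * math.exp(-x / capacity) for x in attendance)


def personal_utilities(attendance, night, capacity):
    """Utilities for one agent that attended `night` (one night per week)."""
    n_nights = len(attendance)
    world = weekly_reward(attendance, capacity)
    # Attendance produced by everyone except this agent.
    others = list(attendance)
    others[night] -= 1

    def clamped_world(v):
        # World reward if this agent's action were replaced by vector v.
        return weekly_reward([o + vi for o, vi in zip(others, v)], capacity)

    return {
        "UD": math.exp(-attendance[night] / capacity),   # night reward / attendees
        "TG": world,
        "WL0": world - clamped_world([0.0] * n_nights),
        "WL1": world - clamped_world([1.0] * n_nights),
        "WLa": world - clamped_world([1.0 / n_nights] * n_nights),
    }


attendance = [30, 2, 5, 5, 6, 6, 6]          # hypothetical week, 60 agents
crowded = personal_utilities(attendance, night=0, capacity=3.0)
quiet = personal_utilities(attendance, night=1, capacity=3.0)
```

In this example TG is identical for both agents and so carries no information about their individual choices, while WL 0 is negative for the agent on the overcrowded night and positive for the agent on the under-capacity night.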

[Figure: Bar Problem: Utility Comparison (attend one night, 60 agents, c=3)]

[Figure: Typical Daily Bar Attendance. Daily attendance (axis 0 to 140) by day of week for WLU, TG and UD (c=6; t=1000 s; number of agents = 168)]

[Figure: Scaling Properties (attend one night); c=2,3,4,6,8,10,15, respectively]

[Figure: Performance vs. # of Nights to Attend; 60 agents; c=3,6,8,10,10,12,15, respectively]

Collectives of Rovers

• Design a collective of autonomous agents to gather scientific information (e.g., rovers on Mars, submersibles under Europa)

– Some areas have more valuable information than others

– World Utility: Total importance-weighted information collected

– Both the individual rovers and the collective need to be flexible so they can adapt to new circumstances

– Collective-based payoff utilities result in better performance than more “natural” approaches

World Utility

• Token value function:

$$V(L, \Theta) = \sum_{x,y} \Theta_{x,y} \min(1, L_{x,y})$$

– $L$: Location matrix for all agents
– $L_\eta$: Location matrix of agent $\eta$
– $L_{\eta,t}^{a}$: Location matrix of agent $\eta$ at time $t$, had it taken action $a$ at $t-1$
– $\Theta$: Initial token configuration

• World Utility:

$$G(\zeta) = V(L, \Theta)$$

• Note: Agents’ payoff utilities reduce to figuring out what “L” to use.
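The token value function can be sketched on a small grid, assuming the location matrix holds visit counts per cell and the token matrix holds the value at each cell. The 2x2 grids and the name `token_value` are hypothetical.

```python
def token_value(location, tokens):
    """V(L, Theta): each token counts once if its cell was visited at least once."""
    return sum(
        theta * min(1, visits)
        for loc_row, tok_row in zip(location, tokens)
        for visits, theta in zip(loc_row, tok_row)
    )


tokens = [[3, 1],
          [0, 4]]          # hypothetical token values per cell
location = [[2, 0],
            [0, 1]]        # two visits to the 3-token cell, one to the 4-token cell
value = token_value(location, tokens)
```

The `min(1, ...)` term is what makes repeat visits worthless: the doubly visited cell contributes its token value only once.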

Payoff Utilities

• Selfish Utility:

$$SU_\eta(\zeta) = V(L_\eta, \Theta)$$

• Team Game Utility:

$$TG_\eta(\zeta) = V(L, \Theta)$$

• Collectives-Based Utility (theoretical):

$$AU_\eta(\zeta) = G(\zeta) - \sum_{\vec{a} \in \vec{A}_\eta} p_{\vec{a}} \, V(L_{\hat{\eta}} + L_\eta^{\vec{a}}, \Theta)$$

• Collectives-Based Utility (practical):

$$WLU_\eta^{\vec{0}}(\zeta) = G(\zeta) - V(L_{\hat{\eta}}, \Theta)$$

$$WLU_\eta^{\vec{a}}(\zeta) = G(\zeta) - V\Big(L_{\hat{\eta}} + \sum_{\vec{a} \in \vec{A}_\eta} p_{\vec{a}} \, L_\eta^{\vec{a}}, \, \Theta\Big)$$
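A sketch of three of the payoff utilities (selfish, team game, and WLU with the null clamping vector) on a toy grid. It assumes per-rover location matrices of visit counts whose sum is the joint location matrix; the two-rover scenario and all names are hypothetical.

```python
def token_value(location, tokens):
    """V(L, Theta): each token counts once if at least one rover visited its cell."""
    return sum(
        theta * min(1, visits)
        for loc_row, tok_row in zip(location, tokens)
        for visits, theta in zip(loc_row, tok_row)
    )


def add_matrices(matrices, shape):
    """Element-wise sum of location matrices (all zeros if the list is empty)."""
    rows, cols = shape
    total = [[0] * cols for _ in range(rows)]
    for m in matrices:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += m[i][j]
    return total


def payoff_utilities(agent_locations, agent, tokens):
    shape = (len(tokens), len(tokens[0]))
    everyone = add_matrices(agent_locations, shape)
    others = add_matrices(
        [m for i, m in enumerate(agent_locations) if i != agent], shape
    )
    g = token_value(everyone, tokens)
    return {
        "SU": token_value(agent_locations[agent], tokens),   # own visits only
        "TG": g,                                             # full world utility
        "WLU0": g - token_value(others, tokens),             # world minus "world without me"
    }


tokens = [[5, 0], [0, 2]]
rover_a = [[1, 0], [0, 1]]   # visits both token cells
rover_b = [[1, 0], [0, 0]]   # only repeats rover_a's visit to the 5-token cell
utilities_a = payoff_utilities([rover_a, rover_b], 0, tokens)
utilities_b = payoff_utilities([rover_a, rover_b], 1, tokens)
```

In this toy case the redundant rover still earns a selfish utility of 5 and the same team game utility as its partner, but its WLU is 0, reflecting that the world would have collected the same tokens without it.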

[Figure: Utility Comparison in Rover Domain; 100 rovers on a 32x32 grid]

[Figure: Scaling Properties in Rover Domain]

Summary

• Given a world utility, deploying RL algorithms provides a solution to the distributed design problem. But what utilities does one use?

• Theory of collectives shows how to configure and/or update the personal utilities of the agents so that they “unintentionally cooperate” to optimize the world utility

• Personal utilities based on collectives successfully applied to many domains (e.g., autonomous rovers, constellations of communication satellites, data routing, autonomous defects)

• Performance gains due to using collectives-based utilities increase with size of problem

• A fully fleshed science of collectives would benefit from and have applications to many other sciences