ics 806-week 10
TRANSCRIPT
-
8/14/2019 ics 806-Week 10
1/21
November 2009 School of Computing and
Informatics
MSC COMPUTER SCIENCE
Multi-agent systems learning
Session Topics
1. Defining Learning
2. How Humans Learn
3. Machine Learning
4. Reinforcement Learning
5. Q-Learning
6. Learning in Multi-Agent Systems
7. End of Session Exercises
-
Multi-agent systems learning
Learning by humans - Educational perspective (Wikipedia)
Learning is the process of gaining understanding that
leads to the modification of attitudes and behaviors
through the acquisition of knowledge, skills and values,
through study and experience.
Learning causes a change of behavior that is persistent, measurable, and
specified, or allows an individual to formulate a new mental construct or
revise a prior mental construct (conceptual knowledge such as attitudes or
values).
It depends on experience and leads to long-term changes
in behavior potential.
-
ICS 806 - MULTI-AGENT SYSTEMS
Multi-agent systems learning
Learning by humans - Educational perspective (Wikipedia)
Basic learning processes
Bloom's Taxonomy divides the learning process into a six-level hierarchy,
where knowledge is the lowest order of cognition and evaluation the
highest:
1. Knowledge is the memory of previously-learned materials such as facts,
terms, basic concepts, and answers.
2. Comprehension is the understanding of facts and ideas by organization,
comparison, translation, interpretation, and description.
3. Application is the use of new knowledge to solve problems.
4. Analysis is the examination and division of information into parts by
identifying motives or causes. A person can analyze by making
inferences and finding evidence to support generalizations.
5. Synthesis is the compilation of information in a new way by combining
elements into patterns or proposing alternative solutions.
6. Evaluation is the presentation and defense of opinions by making judgments
about information, validity of ideas, or quality of work based on a set of criteria.
-
Ways by which human learning may take place
Learning by example. Imitation of a role model is a natural mechanism for
infants and children when learning from experience.
Learning by worked examples. Worked examples in books show
exactly how the author solved problems, step by step, for example in
mathematics. Such examples may help explain methods in different
ways.
Learning by teaching. Teachers share new lesson content with groups
of students, who may prepare on their part in order to teach this
content to the rest of the class.
Learning which alternative methods exist. Sometimes different methods can be
applied to solve a particular problem. These methods need to be pointed out by
the teacher, in which case the student should also be made aware of how to
select the "best" method from among those available, and which textbooks are
likely to be especially helpful.
Learning which shortcuts exist to solve specific problems. Sometimes
shortcuts exist that can reduce by many hours the solution of practical
problems. For example, maxima and minima of functions can be obtained "the
hard way" by a whole series of numerical calculations, while the use of
calculus is often a shortcut.
-
Overview of Learning in Artificial Intelligence
Machine learning (Wikipedia)
Machine learning is a subfield of artificial intelligence. It focuses on the
development of algorithms and techniques that allow computers to
"learn". At a general level, there are two types of learning: inductive and
deductive. Inductive machine learning methods create computer
programs by extracting rules and patterns out of massive data sets.
Application areas are several, including natural language processing,
search engines, medical diagnosis, bioinformatics and cheminformatics,
detecting credit card fraud, stock market analysis, classifying DNA
sequences, speech and handwriting recognition, object recognition in
computer vision, game playing and robot locomotion.
-
Types of machine learning algorithms
Machine learning algorithms are commonly grouped as follows:
Supervised learning --- where the algorithm generates a function that
maps inputs to desired outputs. One standard formulation of the
supervised learning task is the classification problem: the learner is
required to learn (to approximate the behavior of) a function which maps
a vector into one of several classes by looking at several input-output
examples of the function.
Unsupervised learning --- which models a set of inputs: labeled examples
are not available.
Semi-supervised learning --- which combines both labeled and unlabeled
examples to generate an appropriate function or classifier.
Reinforcement learning --- where the algorithm learns a policy of how to act
given an observation of the world. Every action has some impact in the
environment, and the environment provides feedback that guides the learning
algorithm.
Transduction --- similar to supervised learning, but does not explicitly
construct a function: instead, it tries to predict new outputs based on
training inputs, training outputs, and new inputs.
Learning to learn --- where the algorithm learns its own inductive bias
based on previous experience.
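The supervised setting above can be made concrete in a few lines. The following is a minimal sketch, assuming nothing beyond the standard library: a 1-nearest-neighbour classifier that approximates an input-to-class function by looking at labelled input-output examples. The toy data are hypothetical, purely for illustration.

```python
# Minimal supervised-learning sketch: a 1-nearest-neighbour classifier.
# The training data below are hypothetical toy examples.

def euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_predict(train, x):
    """Return the label of the training example closest to input x.

    train is a list of (vector, label) pairs -- the labelled examples
    a supervised learner generalizes from.
    """
    _, label = min(train, key=lambda pair: euclidean(pair[0], x))
    return label

# Labelled examples: inputs in R^2 mapped to one of two classes.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]

print(knn_predict(train, (0.05, 0.1)))  # close to the "A" cluster
print(knn_predict(train, (0.95, 1.0)))  # close to the "B" cluster
```

Unsupervised learning would drop the labels and instead group the input vectors by similarity; the distance function above would still be the core ingredient.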
-
Reinforcement learning
Reinforcement learning refers to a class of problems in machine learning
which postulate an agent exploring an environment in which the agent
perceives its current state and takes actions. The environment, in return,
provides a reward (which can be positive or negative).
Reinforcement learning algorithms attempt to find a policy for maximizing
cumulative reward for the agent over the course of solving a problem.
-
Basic reinforcement learning model:
1. a set of environment states S;
2. a set of actions A; and
3. a set of scalar "rewards" in ℝ.
At each time t, the agent perceives its state s_t ∈ S and the set of possible
actions A(s_t). It chooses an action a ∈ A(s_t) and receives from the
environment the new state s_{t+1} and a reward r_{t+1}. Based on these
interactions, the reinforcement learning agent must develop a policy
π : S → A which maximizes the quantity R = r_0 + r_1 + ... + r_n.
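The interaction loop just described can be sketched directly in code. The two-state environment below is a hypothetical toy, not from the notes; the point is the shape of the loop: perceive s_t, choose an action from A(s_t), receive s_{t+1} and r_{t+1}, and accumulate the return R.

```python
import random

# Sketch of the basic RL interaction loop on a hypothetical 2-state toy
# environment: action 1 moves toward state 1, which pays reward 1.

def step(state, action):
    """Environment: returns (next_state, reward)."""
    if action == 1:
        return 1, (1 if state == 1 else 0)
    return 0, 0

def actions(state):
    """A(s): the actions available in state s (here the same everywhere)."""
    return [0, 1]

random.seed(0)
state, total_reward = 0, 0               # s_0 and the return R
for t in range(10):
    action = random.choice(actions(state))  # a placeholder (random) policy
    state, reward = step(state, action)     # environment: s_{t+1}, r_{t+1}
    total_reward += reward

print(total_reward)
```

A reinforcement learning algorithm replaces the random choice with a learned policy that maximizes `total_reward`; the slides that follow show how Q-learning does this.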
-
(Figure: the agent-environment interaction diagram. From Sutton and Barto,
Reinforcement Learning: An Introduction)
-
Q-learning
Q-learning is a reinforcement learning technique. It involves learning an
action-value function that gives the expected utility of taking a given
action in a given state and following a fixed policy thereafter. An
advantage of Q-learning is that it is able to compare the expected utility of
the available actions without requiring a model of the environment.
Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') - Q(s,a) ]
where: s - current state, s' - next state, a - action, a' - action of the next
state, r - immediate reward, α - learning rate, γ - discount factor,
Q(s,a) - expected discounted reinforcement of taking action a in state s.
-
Q-Learning Algorithm
The core of the algorithm is a value iteration update.
For each state s from the state set S, and for each action a from the action
set A, we can calculate an update to its expected discounted reward with
the following expression:
Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') - Q(s,a) ]
where r is an observed real reward, α is a convergence rate such that
0 < α < 1, and γ is a discount rate such that 0 < γ < 1.
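As a sketch, the value iteration update can be written as a single function over a table of Q-values; the function name and the numeric example below are hypothetical illustrations, not part of the slides.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One Q-learning update on a table Q (a dict keyed by (state, action)):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

# Starting from an empty table, one update moves Q(s0, a0) halfway
# (alpha = 0.5) toward the observed reward r = 1.
Q = {}
print(q_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"]))  # 0.5
```

Repeating the same update moves the estimate further toward r + γ max_{a'} Q(s',a'), which is how the table converges under the conditions on α and γ above.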
-
Begin
For each s, a, initialize table entry Q(s,a) ← 0
Observe the current state s
Repeat
    Select an action a and execute it
    Receive immediate reward r
    Observe the new state s'
    Update the table entry:
        Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') - Q(s,a) ]
    s ← s'
End
-
Russell & Norvig's Q-learning algorithm, p. 614
function Q-LEARNING-AGENT(e) returns an action
  static: Q, a table of action values
          N, a table of state-action frequencies
          a, the last action taken
          i, the previous state visited
          r, the reward received in state i
  j ← STATE[e]
  if i is non-null then
      N[a, i] ← N[a, i] + 1
      Q[a, i] ← Q[a, i] + α(r + max_{a'} Q[a', j] - Q[a, i])
  if TERMINAL?[e] then
      i ← null
  else
      i ← j; r ← REWARD[e]; a ← arg max_{a'} f(Q[a', j], N[a', j])
  return a
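The pseudocode above can be made runnable in a few lines. The sketch below runs tabular Q-learning with ε-greedy exploration on a hypothetical 4-state chain in which moving right from the last state pays reward 1; the environment and the constants are illustrative assumptions, not from the slides.

```python
import random

# Tabular Q-learning on a hypothetical 4-state chain: states 0..3,
# actions -1 (left) and +1 (right); reaching state 3 pays reward 1
# and ends the episode. Constants are illustrative.
N_STATES, ACTIONS = 4, (-1, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(s, a):
    """Environment: move along the chain (clamped at the ends);
    reward 1 on reaching the terminal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(1)
# Initialize every table entry Q(s, a) to 0.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy action selection: mostly greedy, sometimes random.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q[(s, a2)])
        s2, r = step(s, a)
        # The value iteration update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
        s = s2

# The greedy policy read off the table should prefer +1 (right) in every
# non-terminal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Note that no model of `step` is used by the learner: the table is built purely from observed (s, a, r, s') transitions, which is the model-free advantage mentioned earlier.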
-
Multi-agent systems learning (Stone & Veloso (1997))
ML techniques can be directly applied in multi-agent systems.
Multi-agent learning is more concerned with learning issues that arise
because of the multi-agent aspect of a given domain.
Multi-agent learning is learning that is done by several agents and that
becomes possible only because several agents are present. For example,
in football it involves learning those items that are related to group
actions, or individual actions for the sake of the group (Stone & Veloso (1997)).
Classification of agents (Stone and Veloso (1997)):
homogeneous non-communicating;
heterogeneous non-communicating; and
heterogeneous communicating agents.
-
Homogeneous non-communicating agents
Issues
Reactive vs. deliberative agents
Local or global perspective
Modeling of other agents' states
How to affect others
Learning opportunities
Enable others' actions
Sensor data
Other agents' sensor data
Learning Techniques and areas
Reactive behaviors for formation maintenance. Balch [7]
Local knowledge sometimes better. Roychowdhury [67]
(Limited) Recursive Modeling Method (RMM). Durfee [24]
Don't model others, just pay attention to reward. Schmidhuber [77]
Stigmergy. Holland / Goldman and Rosenschein [39, 31]
Q-learning for behaviors like foraging, homing, etc. Mataric [52]
-
Heterogeneous non-communicating agents
Issues
Benevolence vs. competitiveness
Stable vs. evolving agents (arms race, credit/blame)
Modeling of others' goals, actions, and knowledge
Resource management (interdependent actions)
Social conventions
Roles
Learning opportunities
Credit/blame in competitive scenarios
Behaviors that blend well with team
Prediction of others' actions
Dynamic role assumption
-
Heterogeneous non-communicating agents
Learning Techniques and areas
Game theory, iterative play. By Mor and Rosenschein / Sandholm and Crites
Minimax-Q.
Competitive coevolution. By Haynes and Sen / Grefenstette and Daley / Rosin and Belew
Deduce intentions through observation. By Huber and Durfee
Autoepistemic reasoning (ignorance). By Permpoontanalarp
Model as a team (individual role). By Tambe
Social reasoning: depend on others for goal (game theory). By Sichman and Demazeau
GAs to deal with Braess' paradox (more resources, worse). By Arora and Sen
Multiagent RL for adaptive load balancing. By Schaerf, Shoham, and Tennenholtz
Focal points / emergent conventions. By Fenster et al. / Walker and Wooldridge
Design agents to play different roles. By Prasad et al.
-
Heterogeneous communicating agents
Issues
Understanding each other
Planning communicative acts
Benevolence vs. competitiveness
Resource management (schedule coordination)
Commitment/decommitment
Truth in communication
Learning opportunities
Evolving language
Effects of speech acts on global dynamics
Communication utility and truthfulness
Commitment utility
-
Heterogeneous communicating agents
Learning and related Techniques
Language protocols: KIF for content, by Genesereth and Fikes; KQML for
message format, by Finin et al.
Speech acts. By Cohen and Levesque / Lux and Steiner
Learning social behaviors. By Mataric
Bayesian learning in negotiation: model others. By Zeng and Sycara
Multiagent Q-learning. By Weiss
Training other agents' Q-functions (track driving). By Clouse
Minimize the need for training. By Potter and Grefenstette
Cooperative coevolution. By Bull et al.
Contract nets for electronic commerce. By Sandholm and Lesser
Market-based systems. By Huberman and Clearwater
Generalized Partial Global Planning (GPGP). By Decker
Internal, Social, and Collective (role) commitments. By Castelfranchi
Commitment states (potential, pre, and actual) as planning states. By Haddadi
Belief/Desire/Intention (BDI) model: OASIS. By Rao and Georgeff
BDI commitments only over intentions. By Rao and Georgeff
Coalitions. By Zlotkin and Rosenschein / Shehory and Kraus / Sandholm and Lesser
Reasoning about truthfulness. By Sandholm and Lesser / Rosenschein
-
A learning agent: adapted from Russell & Norvig, p. 526.
The figure shows four components connected to the Environment through
sensors and effectors:
CRITIC - judges how well the agent is doing and sends feedback to the
learning element
LEARNING ELEMENT - makes improvements; it receives knowledge from and
passes changes to the performance element, and sets learning goals for the
problem generator
PROBLEM GENERATOR - suggests actions that may lead to new experiences
PERFORMANCE ELEMENT - selects the external action
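The four components of the figure can be sketched as a minimal object design. The class names follow the figure, but the method names and the toy logic below are hypothetical illustrations of the division of labour, not an API from Russell & Norvig.

```python
# Minimal sketch of the learning-agent architecture (component names from
# the figure; all concrete behavior is hypothetical).

class PerformanceElement:
    """Selects the external action from the current percept."""
    def __init__(self):
        self.preferred = "explore"
    def select_action(self, percept):
        return self.preferred

class Critic:
    """Tells the learning element how well the agent is doing."""
    def feedback(self, percept):
        return 1.0 if percept == "goal" else 0.0

class LearningElement:
    """Uses the critic's feedback to improve the performance element."""
    def improve(self, performance, reward):
        if reward > 0:
            performance.preferred = "exploit"

class ProblemGenerator:
    """Suggests actions that may lead to new, informative experiences."""
    def suggest(self):
        return "try-something-new"

perf, critic, learner = PerformanceElement(), Critic(), LearningElement()
for percept in ["start", "goal"]:
    # Critic scores each percept; the learning element adjusts behavior.
    learner.improve(perf, critic.feedback(percept))
print(perf.select_action("start"))
```

The Q-learning agent earlier in the session is one instantiation of this scheme: the reward signal plays the critic, the table update is the learning element, and ε-greedy selection combines the performance element with a problem generator.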
-
END OF SESSION (WEEK 11) EXERCISES
1. Define the term learning
2. Describe how human beings learn
3. What is machine learning?
4. Outline various types of machine learning algorithms
5. What is reinforcement learning?
6. What is Q-learning?
7. Formulate algorithms for basic reinforcement learning and Q-learning
8. Describe learning in Multi-Agent Systems