7/30/2019 Giannone Nao Learning
Overview: environment
Robotic agent: NAO, a humanoid robot produced by Aldebaran
Application: robotic soccer
SDK
Simulator
Overview: (sub)tasks
Vision Module: process raw data from the environment
Modelling Module: elaborate raw data to obtain more reliable information
Behaviour Control Module: decide the best behaviour to accomplish the agent's goal
Motion Control Module: actuate the robot motors accordingly
Environment
At first!
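The four modules above form a sense, model, decide, act chain. A minimal sketch of that chain follows; every function body, the `WorldModel` field and the 0.2 threshold are hypothetical placeholders chosen for illustration, not the SPQR implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorldModel:
    """Filtered estimate produced by the modelling step (hypothetical field)."""
    ball_distance: float


def vision(raw_frame: List[float]) -> List[float]:
    # Vision: turn raw sensor data into percepts (here: drop invalid readings).
    return [r for r in raw_frame if r >= 0.0]


def modelling(percepts: List[float]) -> WorldModel:
    # Modelling: fuse noisy percepts into a more reliable estimate (here: mean).
    if not percepts:
        return WorldModel(ball_distance=float("inf"))
    return WorldModel(ball_distance=sum(percepts) / len(percepts))


def behaviour(model: WorldModel) -> str:
    # Behaviour control: pick the action that serves the agent's goal.
    return "walk" if model.ball_distance > 0.2 else "stand"


def motion(command: str) -> Tuple[str, float]:
    # Motion control: translate the decision into a motor-level target speed.
    return (command, 0.1 if command == "walk" else 0.0)


def step(raw_frame: List[float]) -> Tuple[str, float]:
    """One pass through the whole chain, from raw data to actuation."""
    return motion(behaviour(modelling(vision(raw_frame))))
```

Calling `step([1.0, -1.0, 3.0])` discards the invalid reading, averages the rest and ends with a walk command.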
Make Nao walk: how?
Nao is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (NaoQi Motion Proxy) and partially customized by tuning some parameters
Main advantage: ready to use (to be tuned)
Drawback: based on an unknown walk model
No flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques
A simple walking RAgent for Nao
Motion Control Module
NaoQi Adaptor
Simple Behaviour Module: switches between two states, walk and stand
Smemy
SPQR Walking Library
NAO (NaoQi)
Webots Client
TCP channel
WEBOTS
uses
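The two-state behaviour described on this slide can be sketched as a tiny state machine. The class name, the `walk_requested` flag and the default speed are assumptions made for the example; the returned pair follows the (v, ω) velocity-command convention of the walking engine.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    STAND = "stand"
    WALK = "walk"


class SimpleBehaviour:
    """Hypothetical two-state behaviour: the robot either walks or stands."""

    def __init__(self, walk_speed: float = 0.1) -> None:
        self.state = State.STAND
        self.walk_speed = walk_speed  # assumed linear velocity while walking

    def update(self, walk_requested: bool) -> Tuple[float, float]:
        # Switch state, then emit the velocity command (v, omega) for it.
        self.state = State.WALK if walk_requested else State.STAND
        if self.state is State.WALK:
            return (self.walk_speed, 0.0)
        return (0.0, 0.0)
```

The motion control module would consume the returned command on every cycle.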
SPQR Walking Engine Model
NAO model characteristics:
21 degrees of freedom
No actuated trunk
No dynamic model available
Velocity commands (v, ω): v is the linear velocity, ω is the angular velocity
We follow the Static Walking Pattern: use an a-priori definition of the desired trajectories, defined by:
Choose a set of output variables: 3D coordinates of selected points of the robot
Choose and parametrize the desired trajectories for these variables at each phase of the gait
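To illustrate what an a-priori parametrized trajectory can look like, here is a sketch of a swing-foot path in the xz plane. The curve shape (cosine in x, half sine in z) and the parameter names `step_length` and `swing_height` are assumptions for the example; the slides do not give the actual SPQR trajectory equations.

```python
import math
from typing import List, Tuple


def swing_foot_xz(phase: float, step_length: float, swing_height: float) -> Tuple[float, float]:
    """Return the (x, z) foot position for a swing phase in [0, 1].

    x moves smoothly from 0 to step_length; z lifts to swing_height at
    mid-swing and returns to the ground at the end.
    """
    if not 0.0 <= phase <= 1.0:
        raise ValueError("phase must be in [0, 1]")
    x = step_length * (1.0 - math.cos(math.pi * phase)) / 2.0
    z = swing_height * math.sin(math.pi * phase)
    return x, z


def sample_trajectory(n: int, step_length: float, swing_height: float) -> List[Tuple[float, float]]:
    """Sample n + 1 evenly spaced points along the swing trajectory."""
    return [swing_foot_xz(i / n, step_length, swing_height) for i in range(n + 1)]
```

Tuning the walk then amounts to choosing values for such parameters, which is what makes the model suitable for automated learning.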
SPQR walking subtasks and parameters
Foot trajectories in the xz plane: Xtot, Xsw0, Xds, Zst, Zsw
Center of mass trajectory in lateral direction: Yft, Yss, Yds, Kr
Hip yaw/pitch control (turn): Hyp
Arm control: Ks
Biped walking: double support phase, swing phase, SS%
Walk tuning: main issues
Possible choices:
By hand
By using machine learning techniques
Machine learning seems the best solution:
Less human interaction
Explores the search space in a more systematic way
But some aspects need care:
You need to define an effective fitness function
You need to choose the right algorithm to explore the parameter space
Only a limited number of experiments can be done on a real robot
SPQR Learning System Architecture
Learner
Learning library
RAgent
Walking library
uses
uses
Real Nao
Webots
Data to evaluate the fitness
Fitness
Iteration
experiments
(GPS)
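The diagram indicates that position data (GPS) from each experiment is used to evaluate the fitness. One plausible fitness function is sketched below; the formula (forward progress minus lateral drift, divided by trial duration) is an assumption for illustration, since the slides do not state the actual definition.

```python
from typing import List, Tuple


def walk_fitness(gps_track: List[Tuple[float, float]], duration_s: float) -> float:
    """Score one walking trial from (x, y) position samples in metres.

    Rewards displacement along x (the assumed walking direction) and
    penalizes sideways drift along y. Returns 0.0 for unusable trials.
    """
    if len(gps_track) < 2 or duration_s <= 0.0:
        return 0.0
    (x_start, y_start), (x_end, y_end) = gps_track[0], gps_track[-1]
    forward = x_end - x_start
    drift = abs(y_end - y_start)
    return (forward - drift) / duration_s
```

The Learner would receive one such value per evaluated parameter set and use it to produce the next iteration.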
SPQR Learner
First iteration?
Yes: return the initial iteration and the iteration information
No: apply the chosen algorithm (strategy), then return the next iteration and the iteration information
Available strategies:
Policy Gradient (e.g., PGPR)
Nelder-Mead Simplex Method
Genetic Algorithm
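The Learner flow above can be sketched as follows: the first call returns the initial iteration, and later calls apply the strategy to the measured fitness values before returning the next one. The strategy used here is a finite-difference policy gradient in the style of Kohl and Stone; whether the SPQR Learner uses exactly this variant is an assumption, and all names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


class Learner:
    """Returns the set of policies (parameter vectors) to evaluate next."""

    def __init__(self, initial: Sequence[float], epsilon: Sequence[float],
                 n_policies: int = 16, step_size: float = 0.1, seed: int = 0) -> None:
        self.current = list(initial)
        self.epsilon = list(epsilon)      # per-parameter perturbation size
        self.n_policies = n_policies
        self.step_size = step_size
        self.rng = random.Random(seed)
        self.signs: List[List[int]] = []  # perturbation signs of the last iteration
        self.first_iteration = True

    def next_iteration(self, fitnesses: Optional[Sequence[float]] = None) -> List[List[float]]:
        if self.first_iteration:
            self.first_iteration = False  # nothing to learn from yet
        else:
            if fitnesses is None or len(fitnesses) != len(self.signs):
                raise ValueError("one fitness value per evaluated policy is required")
            self.current = self._policy_gradient_step(fitnesses)
        return self._perturbed_policies()

    def _perturbed_policies(self) -> List[List[float]]:
        # Perturb every parameter by -eps, 0 or +eps, independently per policy.
        self.signs = [[self.rng.choice((-1, 0, 1)) for _ in self.current]
                      for _ in range(self.n_policies)]
        return [[p + s * e for p, s, e in zip(self.current, row, self.epsilon)]
                for row in self.signs]

    def _policy_gradient_step(self, fitnesses: Sequence[float]) -> List[float]:
        adjustment = []
        for n in range(len(self.current)):
            groups = {-1: [], 0: [], 1: []}
            for row, f in zip(self.signs, fitnesses):
                groups[row[n]].append(f)
            if not groups[1] or not groups[-1]:
                adjustment.append(0.0)    # no evidence for this parameter
                continue
            avg_plus = sum(groups[1]) / len(groups[1])
            avg_minus = sum(groups[-1]) / len(groups[-1])
            if groups[0]:
                avg_zero = sum(groups[0]) / len(groups[0])
                if avg_zero > avg_plus and avg_zero > avg_minus:
                    adjustment.append(0.0)  # leaving it unchanged looks best
                    continue
            adjustment.append(avg_plus - avg_minus)
        norm = math.sqrt(sum(a * a for a in adjustment))
        if norm == 0.0:
            return list(self.current)
        # Move a fixed step along the normalized gradient estimate.
        return [p + self.step_size * a / norm for p, a in zip(self.current, adjustment)]
```

A driver loop would call `next_iteration()` once, evaluate each returned policy on the simulator or the robot, and pass the resulting fitness list to the following call. Other strategies (Nelder-Mead, genetic algorithm) could replace `_policy_gradient_step` behind the same interface.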
Enhancing PG: PGPR
At each iteration i, the gradient estimate can be used to obtain a metric for measuring the relevance of the parameters.
Given the relevance and a threshold T, PGPR prunes the less relevant parameters in the next iterations.
forgetting factor
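The relevance formula itself is not preserved in this text. As a rough sketch of the idea only, the code below assumes that relevance is an exponentially weighted average of each parameter's share of the gradient magnitude, with the forgetting factor weighting older iterations; the function names and this specific form are assumptions, not the published PGPR definition.

```python
from typing import List, Sequence


def update_relevance(relevance: Sequence[float], gradient: Sequence[float],
                     forgetting: float = 0.5) -> List[float]:
    """Blend the previous relevance with the current normalized |gradient|."""
    total = sum(abs(g) for g in gradient)
    if total == 0.0:
        share = [0.0] * len(gradient)
    else:
        share = [abs(g) / total for g in gradient]
    return [forgetting * r + (1.0 - forgetting) * s
            for r, s in zip(relevance, share)]


def active_parameters(relevance: Sequence[float], threshold: float) -> List[int]:
    """Indices of parameters kept for the next iterations (relevance >= T)."""
    return [i for i, r in enumerate(relevance) if r >= threshold]
```

Pruned parameters would simply stop being perturbed, which reduces the number of experiments needed per iteration.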
Simulators in learning tasks
Advantages
You can test the gait model and the learning algorithm without being biased by noise
Limits
The results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may be less effective on the real robot (e.g., the simulator does not take asymmetries into account, and its models are not very accurate)
Results (1)
Five sessions of PG, 20 iterations each, all starting from the same initial configuration
SS%, Ks, Yft have been set to hand-tuned values
16 policies for each iteration
Fitness increases in a regular way
Low variance among the five simulations
Results (2)
Final parameter sets for the five PG runs
Five runs of PGPR
Plotted parameters: Zsw, Xs, Kr, Xsw0
Bibliography
A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo. Policy Gradient Learning for a Humanoid Soccer Robot. Accepted for Journal of Robotics and Autonomous Systems.
A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara. An extended policy gradient algorithm for robot task learning. Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
A. Cherubini, F. Giannone, and L. Iocchi. Layered learning for a soccer legged robot helped with a 3D simulator. Proc. of 11th International RoboCup Symposium, 2007.
http://openrdk.sourceforge.net
http://www.aldebaran-robotics.com/
http://spqr.dis.uniroma1.it
Any questions?