TRANSCRIPT
Gradient Sampling for Improved Action Selection and Path Synthesis
Ian M. Mitchell
Department of Computer Science, The University of British Columbia
November 2016
http://www.cs.ubc.ca/~mitchell
Copyright 2016 by Ian M. Mitchell. This work is made available under the terms of the Creative Commons Attribution 4.0 International license:
http://creativecommons.org/licenses/by/4.0/
Ian M. Mitchell — 1
Shortest Path via the Value Function
• Assume isotropic holonomic vehicle
(d/dt) x(t) = ẋ(t) = u(t), ‖u(t)‖ ≤ 1.
• Plan paths to target set T, optimal by cost metric
ψ(x0) = inf_{x(·)} ∫_{t0}^{tf} c(x(s)) ds,
tf = argmin { s | x(s) ∈ T }.
• Value function ψ(x) satisfies the Eikonal equation
‖∇ψ(x)‖ = c(x), for x ∈ Ω \ T ;
ψ(x) = 0, for x ∈ T .
Top: Goal location (blue circle) and obstacles (grey).
Middle: Contours of value function.
Bottom: Gradients of value function (subsampled grid).
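The Eikonal equation above can be solved numerically in several ways; as a minimal sketch (not necessarily the solver used in the talk), Dijkstra's algorithm on a 4-connected grid gives a first-order approximation of ψ with ψ = 0 on the target cells. The cost field and grid spacing h below are illustrative assumptions.

```python
import heapq
import numpy as np

def eikonal_dijkstra(cost, targets, h=1.0):
    # Approximate psi solving ||grad psi|| = c(x) with psi = 0 on the target,
    # using Dijkstra on a 4-connected grid; each edge costs the average of the
    # two cell costs times the grid spacing h.
    psi = np.full(cost.shape, np.inf)
    pq = []
    for t in targets:
        psi[t] = 0.0
        heapq.heappush(pq, (0.0, t))
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if d > psi[i, j]:
            continue  # stale queue entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < cost.shape[0] and 0 <= nj < cost.shape[1]:
                nd = d + h * 0.5 * (cost[i, j] + cost[ni, nj])
                if nd < psi[ni, nj]:
                    psi[ni, nj] = nd
                    heapq.heappush(pq, (nd, (ni, nj)))
    return psi
```

With unit cost this reduces to (Manhattan) distance to the target; a proper fast marching method would instead use an upwind update to reduce grid-direction bias.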
Path Extraction from Value Function
• Given the value function, the optimal state feedback action is
u∗(x) = −∇ψ(x) / ‖∇ψ(x)‖.
• Typical robot makes decisions on a periodic cycle with period ∆t, so the path is given by
ti+1 = ti + ∆t,
x(ti+1) = x(ti) + ∆t u∗(x(ti)).
• Even variable step integrators for ẋ(t) = u∗(x(t)) struggle.
Top: Fixed stepsize explicit (forward Euler).
Middle: Adaptive stepsize implicit (ode15s).
Bottom: Sampled gradient algorithm.
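The fixed-stepsize path extraction above can be sketched as a forward Euler loop; here grad_psi is an assumed callable returning the value-function gradient (analytic in this sketch; in practice it would be interpolated from the grid).

```python
import numpy as np

def extract_path(grad_psi, x0, dt=0.1, steps=100, tol=1e-2):
    # Forward Euler on xdot = u*(x) = -grad psi(x) / ||grad psi(x)||.
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        g = grad_psi(x)
        n = np.linalg.norm(g)
        if n < tol:
            break  # near a stationary point of psi; stop integrating
        x = x - dt * g / n
        path.append(x.copy())
    return np.array(path)
```

For example, with psi(x) = ½‖x‖² (so grad_psi(x) = x) the path marches from x0 toward the minimum at the origin in steps of length dt.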
Not Just for Static HJ PDEs!
Time dependent HJ PDEs arise in:
• Finite horizon optimal control.
• Reachable sets and viability kernels for formal verification of continuous and hybrid state systems.
(See game of two identical vehicles / softwalls slide deck.)
Outline
1. Motivation: Value Functions and Action / Path Synthesis
2. Background: Gradient Sampling and Particle Filters
3. Gradient Sampling Particle Filter
4. Dealing with Stationary Points
5. Concluding Remarks
Gradient Sampling for Nonsmooth Optimization I
Gradient sampling algorithm [Burke, Lewis & Overton, SIOPT 2005]
• Evaluate the gradient at K random samples within an ε-ball of the current point x(ti)
x(k)(ti) = x(ti) + ε δx(k),
p(k)(ti) = ∇ψ(x(k)(ti)).
• Determine consensus direction
p∗(ti) = argmin_{p ∈ P(ti)} ‖p‖,
P(ti) = conv{p(1)(ti), . . . , p(K)(ti)}.
P(ti) approximates the Clarke subdifferential at x(ti).
Gradient samples (yellow) and consensus direction (red).
Plotted in state space.
Plotted in gradient space. Convex hull (blue) also shown.
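The consensus direction is the minimum-norm point of the convex hull of the gradient samples. As a sketch (one of several ways to solve this small QP), it can be written as a simplex-constrained minimization over sample weights using SciPy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

def consensus_direction(grads):
    # grads: (K, d) array of sampled gradients p^(k).
    # Min-norm point of conv{p^(1), ..., p^(K)}: minimize ||sum_k w_k p^(k)||^2
    # over weights w on the probability simplex.
    K = grads.shape[0]

    def obj(w):
        v = w @ grads
        return float(v @ v)

    res = minimize(
        obj,
        np.full(K, 1.0 / K),                 # start at uniform weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * K,             # w_k >= 0
        constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},),
    )
    return res.x @ grads
```

If the hull contains the origin, the returned vector has (numerically) zero norm, signalling that no consensus direction exists.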
Gradient Sampling for Nonsmooth Optimization II
Gradient sampling algorithm [Burke, Lewis & Overton, SIOPT 2005]
If ‖p∗(ti)‖ = 0
• There is a Clarke ε-stationary point inside the sampling ball.
• Shrink ε and resample.
If ‖p∗(ti)‖ ≠ 0
• Choose step length s by Armijo line search along p∗(ti).
• Set new point
x(ti+1) = x(ti) − s p∗(ti) / ‖p∗(ti)‖.
Gradient samples (yellow) and consensus direction (red).
Plotted in state space.
Plotted in gradient space. Convex hull (blue) also shown.
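The Armijo line search used for the step length can be sketched as standard backtracking; here psi is the objective, d the (unit) search direction, and slope the directional derivative ∇ψ·d, assumed negative for a descent direction. The parameters s0, beta, sigma are conventional defaults, not values from the talk.

```python
def armijo_step(psi, x, d, slope, s0=1.0, beta=0.5, sigma=1e-4):
    # Backtrack until the Armijo sufficient-decrease condition holds:
    #   psi(x + s d) <= psi(x) + sigma * s * slope,   slope = grad psi . d < 0.
    s = s0
    f0 = psi(x)
    while psi(x + s * d) > f0 + sigma * s * slope and s > 1e-12:
        s *= beta  # shrink the trial step
    return s
```

For example, minimizing psi(x) = x² from x = 1 along d = −1 (slope = −2), the full step s = 1 already satisfies the condition and is accepted.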
Particle Filters
Monte Carlo localization (MCL) [Thrun, Burgard & Fox, Probabilistic Robotics, 2005] is often used to estimate the current state of mobile robots.
• State estimate is a collection of weighted samples
(w(k)(t), x(k)(t)).
• Predict: Draw new sample state x(k)(ti+1) when action u(ti) is taken
x(k)(ti+1) ∼ p(x(ti+1) | x(k)(ti), u(ti)).
• Correct: Update weights w(k)(ti+1) when sensor reading arrives
w(k)(ti+1) = p(sensor reading | x(k)(ti+1)) w(k)(ti).
• Resample states and reset weights regularly.
We always work with particle cloud after resampling (when all weights are unity).
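The predict / correct / resample cycle above can be sketched as a single MCL step; the single-integrator motion model, the noise scale, and the user-supplied likelihood function are illustrative assumptions.

```python
import numpy as np

def pf_step(particles, u, dt, z, motion_noise, likelihood, rng):
    # Predict: propagate each particle through noisy dynamics x+ = x + dt*u + w.
    pred = particles + dt * u + rng.normal(scale=motion_noise, size=particles.shape)
    # Correct: weight each particle by the sensor likelihood p(z | x).
    w = np.array([likelihood(z, x) for x in pred])
    w = w / w.sum()
    # Resample with probability proportional to weight and reset all weights
    # to unity -- the particle cloud the GSPF works with.
    idx = rng.choice(len(pred), size=len(pred), p=w)
    return pred[idx]
```

A Gaussian likelihood centered on the true position concentrates the resampled cloud near the measurement.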
Outline
1. Motivation: Value Functions and Action / Path Synthesis
2. Background: Gradient Sampling and Particle Filters
3. Gradient Sampling Particle Filter
4. Dealing with Stationary Points
5. Concluding Remarks
The Gradient Sampling Particle Filter (GSPF)
• Sample the gradients at the particle locations.
• If ‖p∗(ti)‖ ≠ 0, then p∗(ti) is a consensus descent direction for the current state estimate.
Choosing action by steepest descent on the expected state.
Choosing action by GSPF.
Simulated traversal of a narrow corridor. Estimated (blue) and true (green) paths shown.
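The GSPF action selection can be sketched as follows: the particle locations stand in for the ε-ball samples of the original gradient sampling algorithm. Here min_norm_hull is an assumed helper computing the minimum-norm point of the convex hull of the gradient samples (e.g., via a small QP).

```python
import numpy as np

def gspf_policy(particles, grad_psi, min_norm_hull, eps=1e-6):
    # Sample the value-function gradient at each particle location.
    grads = np.array([grad_psi(x) for x in particles])
    # Consensus: minimum-norm point of the convex hull of the samples.
    p_star = min_norm_hull(grads)
    n = np.linalg.norm(p_star)
    if n < eps:
        return None       # no consensus direction; handle stationary point
    return -p_star / n    # unit-magnitude descent action
```

When all gradient samples agree (e.g., a constant gradient field), any reasonable min_norm_hull returns that shared gradient and the action is its negated unit vector.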
Narrow Corridor Simulation
Run in ROS (Robot Operating System) with simulated motion and sensor noise (ROS/Gazebo) and particle filter localization (ROS/AMCL).
Occupancy grid constructed with laser rangefinder(s) and SLAM package.
Cost function c(x) (higher cost is darker blue).
Value function ψ(x) (higher value is red). Vertical component of ∇ψ(x) (yellow is downward, brown is upward).
Narrow Corridor Simulation
Compare paths taken when choosing the action by the AMCL expected state (roughly the mean of particle locations) or by the GSPF.
• Chattering remains even as the step size is reduced.
Choosing action by steepest descent on the expected state.
Choosing action by GSPF.
Visualizations shown in the videos:
• Main window: Estimated state and path (blue), actual state and path (green), particle cloud (pink), action choices (red arrows), sensor readings (red dots), vertical component of ∇ψ(x) (background).
• Lower left window: Gradient space visualization of gradient samples (yellow) and action choice (red).
Outline
1. Motivation: Value Functions and Action / Path Synthesis
2. Background: Gradient Sampling and Particle Filters
3. Gradient Sampling Particle Filter
4. Dealing with Stationary Points
5. Concluding Remarks
Finite Wall Scenario
If ‖p∗(ti)‖ = 0, there is no consensus direction.
The finite wall scenario displays the two typical types of stationary points:
• Minimum (left side): Path is complete(?)
• Saddle point (right side): Seek a descent direction.
Cost. Value approximation.
Classify the Stationary Point
Quadratic ansatz for the value function in a neighborhood of the samples
ψ(x) = (1/2) (x − xc)ᵀ A (x − xc).
• Fit to the existing gradient samples
∇ψ(x) = A(x− xc).
• Solve by least squares
minA,b‖p(k)(ti)−Ax(k)(ti)− b‖
and set xc = A−1b.
• Examine eigenvalues {λj}, j = 1, . . . , d, of A
  - If all λj > 0, local minimum.
  - If any λj < 0, the corresponding eigenvectors are descent directions.
ψ(x) at minimum.
ψ(x) at saddle.
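The fit-and-classify step can be sketched as a linear least-squares problem: the ansatz gradient ∇ψ(x) = A(x − xc) is written as A x − b with b = A xc, which is linear in the entries of A and b. Symmetrizing A before the eigendecomposition is an implementation choice, since a Hessian should be symmetric.

```python
import numpy as np

def classify_stationary_point(xs, ps):
    # xs: (K, d) sample locations; ps: (K, d) sampled gradients p^(k).
    # Fit grad psi(x) ~= A x - b by least squares, then examine eig(A).
    K, d = xs.shape
    rows, rhs = [], []
    for x, p in zip(xs, ps):
        for i in range(d):
            row = np.zeros(d * d + d)
            row[i * d:(i + 1) * d] = x   # entries of row i of A
            row[d * d + i] = -1.0        # coefficient of b_i
            rows.append(row)
            rhs.append(p[i])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    A = sol[:d * d].reshape(d, d)
    b = sol[d * d:]
    xc = np.linalg.solve(A, b)                   # estimated stationary point
    lam, vecs = np.linalg.eigh(0.5 * (A + A.T))  # symmetrize, then eigendecompose
    return xc, lam, vecs
```

With noise-free samples from a quadratic ψ the fit is exact: xc, the eigenvalues, and the descent eigenvectors (those with λj < 0) are all recovered.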
Classification Experiments: Minimum
State space view of path.
State space gradient samples (gold) and eigenvectors of the Hessian of ψ(x) (blue). Inward pointing eigenvector arrow pairs correspond to positive eigenvalues.
Classification Experiments: Saddle
State space view of path.
Gradient space convex hull.
State space eigenvectors of the Hessian of ψ(x).
Resolve the Stationary Point
Three potential responses to detection of a stationary point:
• Stop: If it is a minimum and localization is sufficiently accurate.
• Reduce sampling radius: Collect additional sensor data to improve the localization estimate.
  - Rate and/or quality of sensing can be reduced when a consensus direction is available.
  - Localization should be improved by independent sensor data.
• Vote: If it is a saddle and improved localization is infeasible.
  - Let v be the eigenvector associated with a negative eigenvalue and α = Σ_k sign(−vᵀ p(k)).
  - Travel in direction sign(α)v.
[Plot: Action Angle vs. Particle Covariance (angle and covariance vs. time). Decreasing localization covariance (green) allows for identification of a consensus direction (blue). In the grey region, no consensus direction was found.]
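The voting rule above can be sketched directly: each gradient sample votes for +v or −v according to the sign of −vᵀp(k), and the robot travels with the majority so that the chosen direction descends for most samples.

```python
import numpy as np

def vote_direction(v, grads):
    # v: eigenvector of the fitted Hessian with a negative eigenvalue.
    # grads: (K, d) gradient samples p^(k).
    # alpha = sum_k sign(-v . p^(k)); travel in direction sign(alpha) * v.
    alpha = np.sum(np.sign(-grads @ v))
    return np.sign(alpha) * v if alpha != 0 else v  # tie-break: keep +v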
Finite Wall Simulation
Generate paths for the finite wall scenario using each of the two resolution procedures for the saddle point.
Resolution by using an improved sensor. Resolution by voting.
Visualizations shown in the videos:
• Main window: Estimated state and path (blue), actual state and path (green), particle cloud (pink), action choices (red arrows), gradient samples (yellow), sensor readings (red dots) and eigenvectors of the Hessian approximation (blue arrows).
• Lower left window: Gradient space visualization of gradient samples (yellow) and action choice (red).
Outline
1. Motivation: Value Functions and Action / Path Synthesis
2. Background: Gradient Sampling and Particle Filters
3. Gradient Sampling Particle Filter
4. Dealing with Stationary Points
5. Concluding Remarks
Gradient Sampling is not just for Value Functions
Actions synthesized by nearest neighbor lookup on RRT* tree.
No Time For. . .
Monotone acceptance ordered upwind method [Alton & Mitchell, JSC 2012].
No Time For. . .
Mixed implicit explicit formulation of reach sets to reduce computational dimension [Mitchell, HSCC 2011].
No Time For. . .
[Diagram: infusion pump and controller.]
Parametric approximations of viability and discriminating kernels with applications to verification of automated anesthesia delivery [Maidens et al, Automatica 2013].
No Time For. . .
Shared control smart wheelchair for older adults with cognitive impairments [Viswanathan et al, Autonomous Robots 2016].
[Plots: state trajectories x1-x6 and inputs u1, u2 over 20 seconds.]
Ensuring safety for human-in-the-loop flight control of low sample rate indoor quadrotors [Mitchell et al, CDC 2016].
Conclusions
Gradient sampling particle filter (GSPF)
• Utilizes the natural uncertainty in the system state to reduce chattering due to a non-smooth value function and/or numerical approximation.
• Easily implemented on top of existing planners and state estimators.
• To appear in [Traft & Mitchell, CDC 2016].
Future work
• Nonholonomic dynamics.
• Convergence proof.
• Scalability to more particles.
• Set-valued actions.
• Human-in-the-loop control and interface.