Kevin Forbes: Optimizing Flocking Controllers Using Gradient Descent
TRANSCRIPT
Motivation
• Flocking models can animate complex scenes in a cost-effective way
• But, they are hard to control
  – There are many parameters that interact in non-intuitive ways
  – Animators find good values by trial and error
• Can we use machine learning techniques to optimize the parameters instead of setting them by hand?
Background – Flocking model
Reynolds (1987): Flocks, Herds, and Schools: A Distributed Behavioral Model
Reynolds (1999): Steering Behaviors For Autonomous Characters
• Each agent can “see” other agents in its neighbourhood
• Motion derived from weighted combination of force vectors
[Diagrams: Alignment, Cohesion, Separation]
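The weighted-combination rule above can be sketched in a few lines (a minimal sketch, assuming 2-D agents stored as NumPy arrays; the perception radius and the inverse-square separation weighting are illustrative choices, not the talk's exact formulation):

```python
import numpy as np

def steering_force(i, pos, vel, alpha, radius=5.0):
    """Combined flocking force on agent i; alpha = (w_align, w_cohere, w_sep)."""
    # Neighbourhood: all other agents within the perception radius.
    d = np.linalg.norm(pos - pos[i], axis=1)
    mask = (d > 0) & (d < radius)
    if not mask.any():
        return np.zeros(2)
    # Alignment: steer toward the neighbours' mean velocity.
    align = vel[mask].mean(axis=0) - vel[i]
    # Cohesion: steer toward the neighbours' centre of mass.
    cohere = pos[mask].mean(axis=0) - pos[i]
    # Separation: steer away from neighbours, weighted by 1/d^2 (illustrative).
    sep = ((pos[i] - pos[mask]) / d[mask, None] ** 2).sum(axis=0)
    w_align, w_cohere, w_sep = alpha
    return w_align * align + w_cohere * cohere + w_sep * sep
```

The learnable parameters are exactly the three weights, which is what makes the controller a candidate for policy search.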
Background – Learning model
Lawrence (2003): Efficient Gradient Estimation for Motor Control Learning
Policy Search: Finds optimal settings of a system’s control parameter vector, as evaluated by some objective function
Stochastic elements in the system result in noisy gradient estimates, but there are techniques to limit their effects.
Simple 2-parameter example
• Axes: values of control parameters
• Color: value of objective function
• Blue arrows: negative gradient of objective function
• Red line: result of gradient descent
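The picture described above can be reproduced on a toy two-parameter problem (the quadratic bowl here is an illustrative stand-in for the objective, not the talk's actual function):

```python
import numpy as np

def objective(theta):
    """Toy 2-parameter objective: elongated quadratic bowl, minimum at (3, -2)."""
    x, y = theta
    return (x - 3.0) ** 2 + 4.0 * (y + 2.0) ** 2

def gradient(theta):
    """Analytic gradient of the toy objective."""
    x, y = theta
    return np.array([2.0 * (x - 3.0), 8.0 * (y + 2.0)])

def descend(theta0, lr=0.1, steps=100):
    """Follow the negative gradient; the returned path is the 'red line'."""
    path = [np.asarray(theta0, dtype=float)]
    for _ in range(steps):
        path.append(path[-1] - lr * gradient(path[-1]))
    return np.array(path)

path = descend([0.0, 0.0])
```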
Project Steps
1. Define physical agent model
2. Define flocking forces
3. Define objective function
4. Take derivatives of all system elements w.r.t. all control parameters
5. Do policy search
1. Agent Model
Recursive definition: the base case is the system’s initial condition.
If there are no stochastic forces, the system is deterministic (w.r.t. the initial conditions).
The flock’s policy is defined by the alpha vector.
Position, Velocity and Acceleration defined as in Reynolds (1999):
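The recursive state definition amounts to a forward integration step per frame; a minimal sketch (the time step, unit mass default, and semi-implicit position update are illustrative assumptions, not necessarily the project's exact scheme):

```python
import numpy as np

def euler_step(pos, vel, force, mass=1.0, dt=0.1):
    """One step of the recursive definition: state_{t+1} is a function of
    state_t and the applied force (deterministic if the force is)."""
    acc = force / mass            # a_t = F_t / m
    new_vel = vel + dt * acc      # v_{t+1} = v_t + dt * a_t
    new_pos = pos + dt * new_vel  # x_{t+1} = x_t + dt * v_{t+1} (semi-implicit)
    return new_pos, new_vel
```

Because each step is an elementary differentiable function of the previous state, derivatives w.r.t. the control parameters can be propagated through the recursion (step 4 above).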
2. Forces
The simulator includes the following forces:
Flocking Forces: Cohesion*, Separation*, Alignment
Single-Agent Forces: Noise, Drag*
Environmental Forces: Obstacle Avoidance, Goal Seeking*
* Implemented with learnable coefficients (so far)
3. Objective Function
The exact function used depends upon the goals of the particular animation
I used the following objective function for the flock at time t:
The neighbourhood function implied here (and in the force calculations) will come back to haunt us on the next slide. . .
4. Derivatives
To estimate the gradient of the objective function, the objective function must be differentiable.
We can build an appropriate N-function by multiplying transformed sigmoids together:
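A product of transformed sigmoids of the kind described can be sketched as follows (the radius and sharpness constants are illustrative assumptions, not the project's actual values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def smooth_neighbourhood(d, radius=5.0, sharpness=4.0):
    """Differentiable stand-in for the hard indicator [0 < d < radius]:
    one sigmoid turns on just above d = 0, the other turns off near d = radius.
    Larger sharpness approximates the hard cutoff more closely."""
    return sigmoid(sharpness * d) * sigmoid(sharpness * (radius - d))
```

Inside the neighbourhood the product is near 1, well outside it is near 0, and the transition is smooth everywhere, so the gradient exists at every distance.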
Other derivative-related wrinkles:
• Cannot use max/min truncations
• Numerical stability issues
• Increased memory requirements
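One standard differentiable substitute for a hard max, not necessarily the one used in this project, is a log-sum-exp smoothing; it also illustrates the numerical-stability trade-off the slide mentions:

```python
import math

def smooth_max(a, b, k=10.0):
    """Smooth approximation of max(a, b) via log-sum-exp.
    Larger k gives a closer but numerically stiffer approximation."""
    m = max(a, b)  # subtract the true max only for numerical stability
    return m + math.log(math.exp(k * (a - m)) + math.exp(k * (b - m))) / k
```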
5. Policy Search
Use Monte Carlo to estimate the expected value of the gradient:
This assumes that the only random variables are the initial conditions. A less-noisy estimate can be made if the distribution of the stochastic forces in the model is taken into account using importance sampling.
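The Monte Carlo step amounts to averaging per-rollout gradients over sampled initial conditions; a sketch in which the hypothetical `rollout_gradient` callback stands in for a full simulation plus its analytic derivative:

```python
import random

def monte_carlo_gradient(alpha, rollout_gradient, n=10, seed=0):
    """Estimate E[dJ/d_alpha] by averaging the gradients of n rollouts,
    each started from a randomly sampled initial condition."""
    rng = random.Random(seed)
    total = [0.0] * len(alpha)
    for _ in range(n):
        init = [rng.gauss(0.0, 1.0) for _ in alpha]  # sampled initial condition
        g = rollout_gradient(alpha, init)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n for t in total]
```

The estimate's variance falls as 1/n, which is why the experiments below trade N extra simulations per descent step for a steadier gradient.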
The Simulator
Features:
• Forward flocking simulation
• Policy learning and mapping
• Optional OpenGL visualization
• Spatial sorting gives good performance
Limitations:
• Wraparound
• Not all forces are learnable yet
• Buggy neighbourhood function derivative
Experimental Method
Simple gradient descent:
• Initialize flock, assign a random alpha
• Run simulation (N times)
• Step (with annealing) in the negative gradient direction
• Reset flock
Steps 2-4 are repeated for a fixed number of iterations
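The loop above can be sketched as follows (the decay schedule and iteration count are illustrative, and the hypothetical `estimate_gradient` callback stands in for the N-rollout Monte Carlo estimate):

```python
def policy_search(alpha0, estimate_gradient, lr=0.5, decay=0.95, iters=50):
    """Annealed gradient descent on the control parameter vector alpha:
    step against the estimated gradient, shrinking the step each iteration."""
    alpha = list(alpha0)
    for it in range(iters):
        g = estimate_gradient(alpha)  # run N simulations, average gradients
        step = lr * (decay ** it)     # annealing schedule
        alpha = [a - step * gi for a, gi in zip(alpha, g)]
        # the flock is reset before the next gradient estimate
    return alpha
```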
Results - ia
Simple system test:
• 1 agent
• 2 forces: seek and drag
• Seek target in front of agent; agent initially moving towards target
• Simulate 2 minutes
• No noise
• Take best of 10 descents
Results - ib
Simple system test w/noise:
• Same as before, but set the wander force to strength 1
• Used N=10 for the Monte Carlo estimate
Effects of noise:
• Optimal seek force is larger
• Both surface and descent path are noisy
Results - ii
More complicated system:
• 2 agents
• Seek and drag fixed at 10 and 0.1
• Learn cohesion, separation
• Seek target orbiting the agents’ start position
• Simulate 2 minutes
• Target distances of 5 and 10
• Noise in initial conditions
Results:
• Target distance does influence the optimized values
• Search often gets caught in foothills
Results - iii
Higher dimensions:
• 10 agents
• Learn cohesion, separation, seek, drag
• Otherwise the same as the last test
Results:
• Objective function is being optimized (albeit slowly)!
• Target distance is matched!