© 2010 Autodesk
Massive Parallelism in AI: Throughput versus Realtime
Pierre Pontevia, 10th March 2010
Agenda
Where are we today
The pathfinding challenge : from throughput to realtime
MASAI : the premises of an AI massive parallel solution
WHERE ARE WE TODAY?
Where are we today?
Parallel programming has become a reality for game developers since the arrival of “next gen” consoles (2005-2006)
Since then, a lot of new languages and programming models have been suggested to better tackle parallelism,
And new hardware is being announced, shaping the future of consoles…
So this is a good moment to see how parallelism could be revisited for the games of tomorrow… with a special focus on pathfinding
As a start, the 13 dwarves should help us to find the right parallel pattern
The 13 dwarves are an initiative from the University of California, Berkeley, to help achieve high parallelism
A dwarf is an algorithmic method that captures a pattern of computation and communication
The 1st exercise is to identify which dwarves match the problems involved in pathfinding
As a start, the 13 dwarves should help us to find the right parallel pattern (cont’d)
Dwarf: Description
1. Dense Linear Algebra: Data are dense matrices or vectors
2. Sparse Linear Algebra: Data sets include many zero values. Data is usually stored in compressed matrices to reduce the storage and bandwidth requirements to access all of the nonzero values
3. Spectral Methods: Data are in the frequency domain, as opposed to time or spatial domains
4. N-Body Methods: Depends on interactions between many discrete points. Variations include particle-particle methods
5. Structured Grids: Represented by a regular grid; points on grid are conceptually updated together. It has high spatial locality
6. Unstructured Grids: An irregular grid where data locations are selected, usually by underlying characteristics of the application
7. Monte Carlo: Calculations depend on statistical results of repeated random trials
8. Combinational Logic: Functions that are implemented with logical functions and stored state
9. Graph Traversal: Visits many nodes in a graph by following successive edges. These applications typically involve many levels of indirection, and a relatively small amount of computation
10. Dynamic Programming: Computes a solution by solving simpler overlapping sub problems. Particularly useful in optimization problems with a large set of feasible solutions
11. Backtrack and Branch + Bound: Finds an optimal solution by recursively dividing the feasible region into sub domains, and then pruning sub problems that are suboptimal
12. Construct Graphical Models: Constructs graphs that represent random variables as nodes and conditional dependencies as edges. Examples include Bayesian networks and Hidden Markov Models
13. Finite State Machine: A system whose behavior is defined by states, transitions defined by inputs and the current state, and events associated with transitions or states
Recent languages and programming models provide guidance for parallel implementation
Data Parallelism for homogeneous architectures
• OpenMP
• TBB
• Ct
Data Parallelism for heterogeneous architectures
• CUDA
• OpenCL
• DirectCompute
• SPURS
• RapidMind
PC clusters
• MPI
• MapReduce
Concurrent Programming
• PPL, Asynchronous Agents
• Grand Central Dispatch
However, there are specific constraints in video games that affect parallel design…
Memory resource constraints: how much scratch memory is required by the solver?
Concurrent memory access: computations are done on data which can change significantly from frame to frame
Data lifetime / persistence: things are volatile by nature
Reactivity / time delay / frequency constraints: when do you really need the result of your computation?
Interruptibility: the system can change its mind; 80% of the path goals are never reached
…and even more constraints when you develop middleware
Multiple coexisting models
  Several middleware with several threading models
  Not blocking is not enough -> fine tuning issues
  SPURS everywhere?
Multiple HW targets
  PC is different from the Xbox 360 console, which is different from the PlayStation® 3 (PS3) console
  Multiple exclusive programming languages
A gap analysis on existing solutions shows that no one solution fits the video game context perfectly
No model really treats memory as a limiting resource in the design of parallel solutions
No model takes into account time as a dimension of the problem
All the approaches are very throughput oriented
THE PATHFINDING CHALLENGE : FROM THROUGHPUT TO REALTIME
Pathfinding in a nutshell
Pipeline: Path Planning -> Path Smoothing -> DA(*) & Steering
Path Planning: LOW FREQUENCY (0.1 Hz)
• Input: topology, current position, destination
• Output: valid path
Path Smoothing: MEDIUM FREQUENCY (2 Hz)
• Input: current position, destination
• Output: target point
DA & Steering: HIGH FREQUENCY (10 Hz)
• Input: current position, target point
• Output: new target point
(*): DA - Dynamic Avoidance
Pathfinding is made of different solvers with different characteristics
3 categories of solvers:
• A*, Graph Traversal: low frequency / large input and work memory
• Trajectory Smoothing: medium frequency / optional
• DA / Steering: high frequency / critical
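[Chart: work memory requirements (vertical axis) against frequency (horizontal axis). A* and graph traversals: more than 500 K of work memory, low frequency. Smoothing, DA and steering: less than 5 K, higher frequency. Frequency axis ticks as extracted: 0.2, 3, 10.]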
There are 2 kinds of data parallelism in pathfinding
Number of characters: the number of solver jobs increases linearly with the number of characters
Size of graph: graph traversal related solvers can use a Dwarf 9 (graph traversal) solving pattern
A first approach could be a single frame batch paradigm (throughput) compatible with most programming models
[Figure: Pathfinder – Entity 1 posts requests into four middleware queues (Path Request Queue, Target Request Queue, DA Request Queue, Steering Request Queue). The framework turns them into tasks (SearchPath Task, SelectTarget Task, Compute DA Task, Compute Steering Task) placed on the PPM queue, and the PPM (Parallel Programming Model) runs them as a pool of Compute Kernels.]
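The single-frame batch idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `run_frame`, the dictionary layout of the queues and the use of a thread pool as the parallel programming model are all hypothetical choices, not part of the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def run_frame(queues, kernels, workers=2):
    """Drain every request queue within one frame and return results per queue.

    queues  maps a queue name to a list of pending requests (hypothetical layout).
    kernels maps the same name to the function that solves one request.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, requests in queues.items():
            # All requests of one kind are independent, so they can run in
            # parallel; pool.map keeps the results in request order.
            results[name] = list(pool.map(kernels[name], requests))
            requests.clear()  # the whole batch is consumed in this frame
    return results
```

Every queue is emptied before the frame ends, which is what makes this a throughput-oriented design: nothing is carried over to the next frame.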
Each task request has a context composed of character data, global data, and potentially customized objects
Searching Path
  Context: Start & Destination, Movement Model, Constraint, LPF(*) Shortcut, Pathdata, potentially all PathObjects
  Output: Path
Selecting Target
  Context: Current Pos, Current Target, Path, Movement Model, Constraint, LPF(*) Shortcut, PathObjects of the path, Pathdata
  Output: Target Pos
Computing DA Target
  Context: Current Pos, Current Target, Movement Model, Cluster of entities, Pathdata
  Output: DA Target Pos
Steering
  Context: Current Pos, Current DA Target, Movement Model, Current PathObject, LPF Shortcut
  Output: Wanted Speed & Yaw
Legend categories in the figure: Character Context, Global Data, Customizable, Output
(*): LPF - Obstacle Avoidance
However, as the number of solvers can be limited by memory…
[Figure: ComputePath, ComputeTargetPoint, ComputeDA TgtPoint and ComputeSteering jobs laid out on Thread 1 and Thread 2]
…a throughput maximization approach to parallelization can be capped by Amdahl’s law
[Figure: three timelines compared. Serial - no memory limitation (Thread 1). Parallel - no memory limitation (Thread 1, Thread 2). Parallel - memory constrained environment (Thread 1, Thread 2).]
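The cap can be made concrete with Amdahl's law: if a fraction p of the work can run in parallel on n threads, the speedup is at most 1 / ((1 - p) + p / n). The sketch below is illustrative only; the fractions are made-up values, not measurements from the talk, and it assumes that solver jobs which cannot run concurrently for lack of work memory count toward the serial fraction.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)

# Hypothetical numbers: with ample memory 90% of the frame's pathfinding work
# parallelizes; in a memory constrained environment only 50% does.
unconstrained = amdahl_speedup(0.9, 2)
constrained = amdahl_speedup(0.5, 2)
```

With two threads the hypothetical unconstrained case stays below a 2x speedup, and the constrained case falls to about 1.33x, no matter how well the parallel part is tuned.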
To avoid that, the pathfinding solution needs to find more task parallelism along the time dimension
Moving from
“How to solve all the work within a frame”
To
“How to distribute work across several frames”
A good illustration is describing Pathfinding as a statechart with 4 orthogonal states
[Statechart figure]
Top level: Stopped and Active; Path Not Found and Has Arrived lead to Stopped
Active contains 4 orthogonal regions:
Path Planning: No Path, Searching Path, Path Found (transitions: New Destination, Path Found, Has arrived)
Target Selection: No Target, Selecting Target, Target Found (transitions: Path Updated, Target Found, Has arrived)
DA Target: No DA Target, Computing DA Target, DA Target Computed (transitions: Target Updated, DA Target Found, Has arrived)
Steering: No Steering, Computing Steering, Steering Computed (transitions: DA Target Updated, Steering Computed, Has arrived)
Events passed between regions: New Destination, New Pos, Path Updated, Target Updated, DA Target Updated
External inputs: New Destination, Pos updated
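A statechart of this shape can be sketched in a few lines. The state and event names follow the slide, but the wiring between regions (each region's completion event triggering the next region) is an assumption read off the transition labels, the generic "idle / computing / done" states stand in for the three named states of each region, and the New Pos events are left out for brevity.

```python
# Each orthogonal region: (trigger event, completion event, event emitted on completion).
# The chaining between regions is an assumption, not taken from the talk.
REGIONS = {
    "PathPlanning":    ("NewDestination",  "PathFound",        "PathUpdated"),
    "TargetSelection": ("PathUpdated",     "TargetFound",      "TargetUpdated"),
    "DATarget":        ("TargetUpdated",   "DATargetFound",    "DATargetUpdated"),
    "Steering":        ("DATargetUpdated", "SteeringComputed", None),
}

class PathfindingStatechart:
    def __init__(self):
        # Every region starts idle ("No Path", "No Target", ...).
        self.state = {name: "idle" for name in REGIONS}

    def dispatch(self, event):
        """Feed one event to all regions; return the events they emit."""
        emitted = []
        for name, (trigger, done, emits) in REGIONS.items():
            if event == "HasArrived":
                self.state[name] = "idle"
            elif event == trigger:
                self.state[name] = "computing"
            elif event == done and self.state[name] == "computing":
                self.state[name] = "done"
                if emits:
                    emitted.append(emits)
        return emitted

    def run(self, event):
        """Dispatch an event and every event it causes, in FIFO order."""
        pending = [event]
        while pending:
            pending.extend(self.dispatch(pending.pop(0)))
```

Because each region only reacts to its own events, the four solvers can make progress in different frames, which is the point of spreading work over time.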
It is still compatible with the previous approach, but multiframe (no longer capped by Amdahl’s law)
[Figure: the Path, Target, DA and Steering request queues (MiddleWare Queue, Framework) feed the SearchPath, SelectTarget, Compute DA and Compute Steering tasks. Each entity carries its own copy of the pathfinding statechart (Active state with its Path Planning, Target Selection, DA Target and Steering regions); three copies are shown.]
But now we have 3 new problems
Problem 1: How to guarantee that high frequency steering solvers return a value on time?
Problem 2: How to deal with the multiframe volatility and dynamic nature of data?
Problem 3 : What computation triggering logic do we want?
Problem 1 is a scheduling problem for realtime systems
Problem 1 can be reworded as follows: “How to guarantee a deadline for each pathfinding solver request compatible with the frequency of the solver?”
This is very close to the definition of realtime software as found on Wikipedia:
“In computer science, real-time computing (RTC), or "reactive computing", is the study of hardware and software systems that are subject to a "real-time constraint", i.e., operational deadlines from event to system response”
The good news is that there is a good body of literature on realtime scheduling!
To answer problem 1 we restate pathfinding solvers in a realtime formalism…
Realtime formalism: a task x is defined by 4 parameters
  x.s: starting time
  x.d: deadline
  x.e: execution requirement
  x.p: execution period
Adapting to pathfinding solvers:
  Need to assume all tasks are periodic
    Easy for smoothing, steering or DA solvers
    More tricky for A* and other graph traversal solvers
  Need to have an estimate of each core solver job duration
    Again quite simple for smoothing, steering or DA solvers
    Much less easy for A* and other graph traversal solvers -> need to decompose graph traversal tasks into subtasks of constant duration
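One way to decompose a graph traversal into subtasks of roughly constant duration is to bound the number of node expansions per call and keep the open and closed sets between calls. The sketch below assumes a 4-connected grid with unit costs and uses the expansion budget as a proxy for duration; the class name `SlicedAStar` and its interface are hypothetical, not taken from the talk.

```python
import heapq

class SlicedAStar:
    """A* on a 4-connected grid that pops at most `budget` nodes per step()."""

    def __init__(self, walls, width, height, start, goal):
        self.walls, self.width, self.height = walls, width, height
        self.goal = goal
        self.cost = {start: 0}
        self.parent = {start: None}
        self.open = [(self._h(start), start)]  # work memory kept across calls
        self.closed = set()
        self.path = None  # filled in when the search succeeds

    def _h(self, node):
        # Manhattan distance: admissible on a 4-connected unit-cost grid.
        return abs(node[0] - self.goal[0]) + abs(node[1] - self.goal[1])

    def step(self, budget):
        """Run one bounded subtask. Returns 'found', 'failed' or 'running'."""
        if self.path is not None:
            return "found"
        for _ in range(budget):
            if not self.open:
                return "failed"
            _, node = heapq.heappop(self.open)
            if node in self.closed:
                continue  # stale heap entry, still counted against the budget
            if node == self.goal:
                self.path = []
                while node is not None:
                    self.path.append(node)
                    node = self.parent[node]
                self.path.reverse()
                return "found"
            self.closed.add(node)
            x, y = node
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if not (0 <= nxt[0] < self.width and 0 <= nxt[1] < self.height):
                    continue
                if nxt in self.walls or nxt in self.closed:
                    continue
                new_cost = self.cost[node] + 1
                if new_cost < self.cost.get(nxt, float("inf")):
                    self.cost[nxt] = new_cost
                    self.parent[nxt] = node
                    heapq.heappush(self.open, (new_cost + self._h(nxt), nxt))
        return "running"
```

A scheduler can then call `step(budget)` once per period and treat each call as one job with a predictable execution requirement; dropping the object is enough to interrupt the search.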
…and select a scheduling algorithm
P-fairness scheduling scheme (S.K. Baruah, N.K. Cohen, C.G. Plaxton, D.A. Varvel)
  Defines a notion of proportionate progress called P-fairness
  Uses it to define an efficient algorithm solving the periodic scheduling problem
Cache-aware P-fair based scheduling scheme (J.H. Anderson, J.M. Calandrino, U.M. Devi)
  Extends the P-fairness approach to avoid co-scheduling threads that would worsen the performance of shared caches
Task-grouping P-fair based scheduling scheme (J.H. Anderson, J.M. Calandrino)
  Extends the P-fairness approach to encourage grouping of tasks that share a common working set
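To give a feel for proportionate progress, the sketch below schedules periodic tasks in unit time slots. Each task has a weight e / p, is split into unit subtasks with pseudo-releases and pseudo-deadlines, and at every slot the subtasks with the earliest pseudo-deadlines are picked. This is a simplified earliest-pseudo-deadline-first variant without the tie-breaking rules of the published P-fair algorithms, so it should be read as an illustration of the idea rather than the cited algorithm; the function names are hypothetical.

```python
from fractions import Fraction

def epdf_schedule(tasks, processors, horizon):
    """Schedule periodic tasks in unit time slots, earliest pseudo-deadline first.

    tasks maps a name to (e, p): e slots of work needed every p slots.
    Returns one list of task names per slot.
    """
    done = {name: 0 for name in tasks}  # subtasks already executed
    schedule = []
    for t in range(horizon):
        ready = []
        for name, (e, p) in tasks.items():
            i = done[name] + 1            # index of the next subtask
            release = ((i - 1) * p) // e  # floor((i - 1) / weight)
            deadline = -((-i * p) // e)   # ceil(i / weight)
            if release <= t:
                ready.append((deadline, name))
        chosen = [name for _, name in sorted(ready)[:processors]]
        for name in chosen:
            done[name] += 1
        schedule.append(chosen)
    return schedule

def lags(tasks, schedule):
    """lag = weight * t - slots received so far; P-fairness needs -1 < lag < 1."""
    received = {name: 0 for name in tasks}
    out = []
    for t, slot in enumerate(schedule, start=1):
        for name in slot:
            received[name] += 1
        out.extend(Fraction(e, p) * t - received[name]
                   for name, (e, p) in tasks.items())
    return out
```

For example, with hypothetical tasks `{"steering": (1, 2), "da": (1, 3), "smoothing": (2, 3)}` on two processors, `lags` can be used to check that every task stays within one slot of its ideal share at all times.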
Answering problem 2 (volatile data) requires a better description of memory models
Programming models differ in the way they manage memory space
Homogeneous models: unified memory
Heterogeneous models: Host / Device space
Today only homogeneous models offer transparent memory management
For heterogeneous models, the developer still has to do a lot of work
Programming models differ in the way they manage memory space (cont’d)
[Figure: a Framework request queue feeding a Task, an OpenCL queue and four Compute Kernels, split between the Host Memory Space and the Device Memory Space]
There is a need for a locking mechanism between the framework and the kernel
Data state      Framework  Task     Kernel   Kernel     Task    Framework
                Request    Request  Request  Execution  Update  Update
Inserting Data  OK         OK       LOCK(1)  OK         OK      OK
Data Ready      OK         OK       OK       OK         OK      OK
Data Locked     OK         OK       OK       LOCK(2)    OK      OK
Removing Data   OK         OK       LOCK(1)  OK         OK      OK
(1) if Kernel uses data
(2) if Kernel accesses host memory
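The locking rules can be encoded as a small lookup. The sketch assumes the table reads as four data states against six processing stages, with a lock needed at Kernel Request while data is being inserted or removed (if the kernel uses the data) and at Kernel Execution while data is locked (if the kernel accesses host memory). The function name and parameters are hypothetical.

```python
STAGES = ("Framework Request", "Task Request", "Kernel Request",
          "Kernel Execution", "Task Update", "Framework Update")

def needs_lock(data_state, stage, kernel_uses_data=False, kernel_accesses_host=False):
    """Return True when the given stage must take a lock in the given data state."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if data_state in ("Inserting Data", "Removing Data"):
        # Only a kernel request touching the data has to wait.
        return stage == "Kernel Request" and kernel_uses_data
    if data_state == "Data Locked":
        # Only a running kernel that reads host memory is affected.
        return stage == "Kernel Execution" and kernel_accesses_host
    if data_state == "Data Ready":
        return False
    raise ValueError(f"unknown data state: {data_state!r}")
```

Under this reading, framework and task stages never block; only the kernel side pays for synchronization, and only when it actually touches the data concerned.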
It also requires a better description of user data
There are 3 types of user data:
Read Only Memory (e.g. navmesh in a static world)
  The solver needs to be aware of when user data is available and when it is garbage
Read / Write Memory (e.g. navmesh in a dynamic world)
  Same as the Read Only approach, extended to secure the data modification stages
Work Memory (e.g. open & closed sets for an A* solver)
  Located where the solver is really called
Data Lifecycle States
Data lifecycle states are introduced to handle the volatility and dynamic nature of R/O and R/W data
[State diagram]
States: Notifying Data To Be Inserted, Data in Insertion, Data Ready, Data Locked, Data in Removal, Notifying Data Removed, End
Transition labels: LOAD Notification, Ready for insertion, Data Inserted, On Dependency Insertion / Removal, Dependency Inserted / Removed, UNLOAD Notification, Data Removed
CRITICAL when data are not owned by middleware
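A lifecycle of this kind is naturally expressed as a transition table. In the sketch below the state and event names come from the slide, but how they connect is an assumption, "Not Loaded" is a hypothetical initial state added for the example, and the End state is left out.

```python
# (current state, event) -> next state.
# Names come from the slide; the wiring between them is assumed.
TRANSITIONS = {
    ("Not Loaded", "LOAD Notification"): "Notifying Data To Be Inserted",
    ("Notifying Data To Be Inserted", "Ready for insertion"): "Data in Insertion",
    ("Data in Insertion", "Data Inserted"): "Data Ready",
    ("Data Ready", "On Dependency Insertion / Removal"): "Data Locked",
    ("Data Locked", "Dependency Inserted / Removed"): "Data Ready",
    ("Data Ready", "UNLOAD Notification"): "Data in Removal",
    ("Data in Removal", "Data Removed"): "Notifying Data Removed",
}

class DataLifecycle:
    def __init__(self):
        self.state = "Not Loaded"

    def fire(self, event):
        """Apply one event; reject anything the table does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state

    def solver_may_use(self):
        """Simplification: solvers touch the data only while it is Data Ready."""
        return self.state == "Data Ready"
```

Making invalid transitions fail loudly matters most when the data is owned by the game rather than by the middleware, since the middleware cannot otherwise know that a buffer has been freed.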
Problem 3 (triggering logic) requires choosing between a pull and a push triggering mechanism
To limit computations over time, it is important to decide whether we want a pull or push triggering model
  In a push model, the system polls over all the characters to get a new steering policy
  In a pull model, the system gets update requirements from the game engine and only performs computations on the related characters
The pull model better controls the amount of computation, but is not really compatible with a realtime approach
The push model offers opportunities for optimization from a cache and task grouping point of view
MASAI : THE PREMISES OF AN AI MASSIVE PARALLEL SOLUTION
Guidelines for a new parallel programming model for realtime AI
• Extends the rationale described in the previous slides to the full AI
• Data / Message Flow based system
• Realtime P-fair Scheduling algorithm
• Compatible with heterogeneous programming models
• Push Triggering Mechanism
Introducing the concept of Working Unit
• A WU receives requests to process
• A WU communicates with another WU ONLY through strongly typed requests
• Requests are explicitly exposed in the WU interface
• A request can be synchronous or asynchronous (2 different implementations of the request)
• A WU is responsible for the serialization Host<->Device of its context
[Figure: a Working Unit made of Host Code and Device Code, with Owner / Children, Event Handler, Incoming Requests Queues, Context, Context Serializer, Requests Interface and Context Accessors]
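A Working Unit along these lines could look like the sketch below. It only covers typed requests, the explicit interface and the synchronous / asynchronous split; context serialization between host and device is left out. `CanGoRequest`, `WorkingUnit` and all method names are hypothetical, chosen for the example.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class CanGoRequest:
    """Hypothetical typed request: can a character go from start to end?"""
    start: tuple
    end: tuple

class WorkingUnit:
    def __init__(self, name):
        self.name = name
        self._handlers = {}    # request type -> handler: the explicit interface
        self._queue = deque()  # incoming asynchronous requests

    def expose(self, request_type, handler):
        """Declare that this unit accepts requests of the given type."""
        self._handlers[request_type] = handler

    def call(self, request):
        """Synchronous request: handled immediately, result returned."""
        handler = self._handlers.get(type(request))
        if handler is None:
            raise TypeError(f"{self.name} does not expose {type(request).__name__}")
        return handler(request)

    def post(self, request, on_done):
        """Asynchronous request: queued now, answered through a callback later."""
        if type(request) not in self._handlers:
            raise TypeError(f"{self.name} does not expose {type(request).__name__}")
        self._queue.append((request, on_done))

    def process(self, max_requests):
        """Run up to max_requests queued requests; return how many ran."""
        ran = 0
        while self._queue and ran < max_requests:
            request, on_done = self._queue.popleft()
            on_done(self.call(request))
            ran += 1
        return ran
```

Bounding `process` by a request count gives the scheduler a knob for how much work a unit does per time slot, and rejecting unexposed request types keeps communication between units restricted to the declared interface.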
The system works on a mixture of events and requests
[Figure: the Game Engine holds the game objects: Entity 1, Entity 2, Entity …; Brain 1, Brain 2, Brain …; PF 1, PF 2, PF …; World 1, World …. Working units, each with its own queue: Entity Update WU (Entity Update Queue), Brain Update WU (Brain Update Queue), Pathfinding WU (Pathfinding Update Queue), CanGo WU (CanGo Queue), World Update WU (World Update Queue), IsVisible WU (IsVisible Queue). Managers: Pathdata Mgr, Geometry Mgr. Arrows are of two kinds: Request and Event.]
The underlying architecture would rely on an event broadcaster and communicating components
[Figure: a Global Events Broadcaster above two Local Events Broadcasters; each Local Events Broadcaster serves four communicating components (CC): SearchPath CC, SelectTarget CC, ComputeDA CC and Steering CC]
Communicating Component = Working Unit for parallelism
Open challenges
Customized Objects vs. Data / Services model
Interruptibility
Multi-platform
Scheduling algorithm performance
And many more…
Multiplatform
Too many programming languages!
• C++
• C for OpenCL
• C for CUDA
• C99 for SPURS
• HLSL 5 for DirectX
• …
Which standards will emerge?
Which standards will be chosen in future consoles?
GAME DEVELOPER ZONE: www.the-area.com/gamedev