TRANSCRIPT
1
Scalable and transparent parallelization of multiplayer games
Bogdan Simion, MASc thesis
Department of Electrical and Computer Engineering
2
Multiplayer games
- Captivating, highly popular
- Dynamic artifacts
3
Multiplayer games
- Long playing times:
- More than 100k concurrent players
6
Multiplayer games
1. World of Warcraft: “I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now.”
2. Halo 2: “My longest playing streak was last summer, about 19 hours playing Halo 2 on my Xbox.”
- Long playing times
- More than 100k concurrent players
- Game server is the bottleneck
7
Server scaling
Game code parallelization is hard:
- Complex and highly dynamic code
- Concurrency issues (data races) require conservative synchronization
- Deadlocks
8
State-of-the-art
Parallel programming paradigms:
- Lock-based (pthreads)
- Transactional memory
Previous parallelizations of Quake:
- Lock-based [Abdelkhalek et al. '04] shows that false sharing is a challenge
10
Transactional Memory vs. Locks
Advantages:
- Simpler programming task
- Transparently ensures correct execution
- Shared data access tracking detects conflicts and aborts conflicting transactions
Disadvantages:
- Software (STM) access tracking overheads
- Never shown to be practical for real applications
11
Contributions
- Case study of parallelization for games: a synthetic version of Quake (SynQuake)
- We compare two approaches: lock-based and STM parallelizations
- We showcase the first realistic application where STM outperforms locks
12
Outline
- Application environment: SynQuake game (data structures, server architecture)
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions
13
Environment: SynQuake game
- Simplified version of Quake
- Entities: players, resources (apples), walls
- Emulated quests
14
SynQuake
- Players: can move and interact (eat, attack, flee, go to quest)
- Apples: food objects, increase life
- Walls: immutable, limit movement
Contains all the features found in Quake
15
Game map representation
- Fast retrieval of game objects
- Spatial data structure: areanode tree
16
Areanode tree
[Figure: the game map is split into halves A and B, then into quadrants A1, A2, B1, B2; the areanode tree mirrors this partition with a root node, internal nodes A and B, and leaves A1, A2, B1, B2]
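The areanode tree above can be sketched as a small binary space partition. This is an illustrative reconstruction, not the thesis code; the class name, split heuristic, and depth limit are assumptions:

```python
class AreaNode:
    """One node of an areanode tree: a rectangular map region.

    Internal nodes split their region in half (alternating axis);
    objects are pushed down to the deepest node that fully
    contains their bounding box.
    """

    def __init__(self, x0, y0, x1, y1, depth=0, max_depth=2):
        self.bounds = (x0, y0, x1, y1)
        self.objects = []          # objects stored at this node
        self.children = None
        if depth < max_depth:
            if depth % 2 == 0:     # alternate vertical / horizontal splits
                mx = (x0 + x1) / 2
                self.children = (AreaNode(x0, y0, mx, y1, depth + 1, max_depth),
                                 AreaNode(mx, y0, x1, y1, depth + 1, max_depth))
            else:
                my = (y0 + y1) / 2
                self.children = (AreaNode(x0, y0, x1, my, depth + 1, max_depth),
                                 AreaNode(x0, my, x1, y1, depth + 1, max_depth))

    def contains(self, box):
        x0, y0, x1, y1 = self.bounds
        bx0, by0, bx1, by1 = box
        return x0 <= bx0 and y0 <= by0 and bx1 <= x1 and by1 <= y1

    def insert(self, obj, box):
        """Push obj down to the single child that fully contains box;
        if it straddles a split, keep it at this (parent) node."""
        if self.children:
            for child in self.children:
                if child.contains(box):
                    child.insert(obj, box)
                    return
        self.objects.append(obj)

# A 1024x1024 map split into 4 leaves (A1, A2, B1, B2 in the slides)
root = AreaNode(0, 0, 1024, 1024)
root.insert("P1", (10, 10, 20, 20))      # fits inside one leaf
root.insert("P2", (500, 500, 530, 530))  # straddles both splits -> stays at root
```

Keeping straddling objects at internal nodes is what makes the parent nodes shared state during parallel processing.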
21
Server frame
[Figure: each server frame runs in three barrier-separated stages: (1) receive & process client requests, executed by all worker threads; (2) admin, single thread; (3) form & send replies (client updates)]
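The staged frame with barriers can be sketched with a thread barrier; the thread count, stage bodies, and logging are illustrative, not the server's actual code:

```python
import threading

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)
log = []
log_lock = threading.Lock()

def server_frame(tid):
    # Stage 1: receive & process client requests (all threads)
    with log_lock:
        log.append((1, tid))
    barrier.wait()
    # Stage 2: admin work (single thread), others wait at the barrier
    if tid == 0:
        log.append((2, tid))
    barrier.wait()
    # Stage 3: form & send replies / client updates (all threads)
    with log_lock:
        log.append((3, tid))

threads = [threading.Thread(target=server_frame, args=(t,))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The barriers guarantee every stage-1 entry precedes the admin entry, which precedes every stage-3 entry, matching the frame diagram.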
25
Parallelization in games
Quake - Lock-based synchronization [Abdelkhalek et al. 2004]
26
Parallelization: request processing
[Figure: the server frame as before; parallelization applies to stage 1, Receive & Process Requests]
27
Outline
- Application environment: SynQuake game
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions
28
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
29
Collision detection
• Player actions: move, shoot, etc.
• Calculate action bounding box
30
Action bounding box
[Figure: players P1, P2, P3 with short-range and long-range action bounding boxes; the boxes of nearby players overlap]
34
Player assignment
[Figure: players P1, P2, P3 handled by threads T1, T2, T3]
- If players P1, P2, P3 are assigned to distinct threads → synchronization required
- Long-range actions have a higher probability of causing conflicts
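Detecting conflicts between action bounding boxes reduces to axis-aligned box overlap; a minimal sketch, with player positions and radii made up for illustration:

```python
def action_box(x, y, radius):
    """Axis-aligned bounding box of an action centered at (x, y).
    A short-range action (e.g. move) has a small radius; a
    long-range action (e.g. shoot) has a large one."""
    return (x - radius, y - radius, x + radius, y + radius)

def overlap(a, b):
    """True if two boxes (x0, y0, x1, y1) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

p1 = action_box(100, 100, 10)   # P1: short-range move
p2 = action_box(150, 100, 60)   # P2: long-range shoot
p3 = action_box(300, 100, 10)   # P3: short-range move

# If overlapping players are handled by distinct threads,
# those threads must synchronize.
conflicts = (overlap(p1, p2), overlap(p2, p3), overlap(p1, p3))
```

Larger radii produce larger boxes, which is why long-range actions are more likely to overlap and force synchronization.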
35
False sharing
[Figure: a player's move range vs. its larger shoot range; the action bounding box acquired with locks is compared to the action bounding box with TM]
39
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
40
Synchronization algorithm: Locks
- Hold locks on parents as little as possible
- Deadlock-free algorithm
41
Synchronization algorithm: Locks
[Figure: areanode tree (root; internal nodes A, B; leaves A1, A2, B1, B2) holding players P1-P6; a player's area of interest overlaps some leaves; the overlapped leaves are locked, and the parents are locked only temporarily]
46
Synchronization: Locks vs. STM
Locks:
1. Determine overlapping leaves (L)
2. LOCK(L)
3. Process L
4. For each node P in overlapping parents:
     LOCK(P)
     Process P
     UNLOCK(P)
5. UNLOCK(L)
STM:
1. BEGIN_TRANSACTION
2. Determine overlapping leaves (L)
3. Process L
4. For each node P in overlapping parents:
     Process P
5. COMMIT_TRANSACTION
STM acquires ownership gradually → reduced false sharing
Consistency ensured transparently by the STM
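The lock-based pseudocode can be sketched with per-node locks: acquiring leaf locks in a fixed global order keeps the algorithm deadlock-free, and each parent lock is held only while that parent is processed. A simplified single-action sketch, not the thesis implementation (node names and the logging are assumptions):

```python
import threading

class Node:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.order = None  # position in the global lock order, set below

def process_action(leaves, parents, log):
    # Steps 1-2: lock overlapping leaves in a fixed global order
    # (consistent ordering across threads prevents deadlock)
    for leaf in sorted(leaves, key=lambda n: n.order):
        leaf.lock.acquire()
    # Step 3: process leaves while their locks are held
    for leaf in leaves:
        log.append(("process", leaf.name))
    # Step 4: lock each overlapping parent only as long as it is processed
    for parent in parents:
        with parent.lock:
            log.append(("process", parent.name))
    # Step 5: release the leaf locks
    for leaf in leaves:
        leaf.lock.release()

root, a, b = Node("root"), Node("A"), Node("B")
leaves = [Node(n) for n in ("A1", "A2", "B1", "B2")]
for i, n in enumerate([root, a, b] + leaves):
    n.order = i

log = []
process_action([leaves[0], leaves[1]], [a], log)  # action overlapping A1, A2 and parent A
```

Holding parent locks only briefly is the lock-based answer to the same contention the STM avoids by acquiring ownership gradually.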
47
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
48
Load balancing issues
- Assign tasks to threads
- Balance workload
[Figure: map quadrants assigned to threads T1-T4, with players P1-P4]
49
Load balancing issues
- Assign tasks to threads
- Cross-border conflicts → synchronization
[Figure: move and shoot actions crossing the borders between thread regions T1-T4]
50
Load balancing goals
Tradeoff:
- Balance workload among threads
- Reduce synchronization/true sharing
51
Load balancing policies
a) Round-robin
b) Spread
c) Static locality-aware
[Figure: a 1024×1024 map divided among threads 1-4 under each policy]
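The static policies amount to different mappings from map regions to threads. The deck does not spell out their exact semantics, so this toy sketch over 16 regions only contrasts an interleaved assignment with a contiguous one; the granularity and policy details are assumptions:

```python
def round_robin(num_regions, num_threads):
    """Assign consecutive regions to threads in rotation
    (interleaved: neighboring regions go to different threads)."""
    return [r % num_threads for r in range(num_regions)]

def contiguous(num_regions, num_threads):
    """Give each thread one contiguous block of regions
    (locality-friendly: neighbors mostly share a thread)."""
    per_thread = num_regions // num_threads
    return [min(r // per_thread, num_threads - 1)
            for r in range(num_regions)]

rr = round_robin(16, 4)   # threads interleaved across the map
cg = contiguous(16, 4)    # each thread owns one block of the map
```

Interleaving balances load across hotspots but maximizes cross-border conflicts; contiguous blocks reduce conflicts but can leave threads idle when players cluster.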
54
Locality-aware load balancing
- Dynamically detect player hotspots and adjust workload assignments
- Compromise between load balancing and reducing synchronization
55
Dynamic locality-aware LB
[Figure: game map and its graph representation]
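One way to approximate dynamic locality-aware balancing is to periodically count players per region and greedily hand whole regions to the least-loaded thread, so hotspot regions are redistributed without splitting them. This is a simplified greedy sketch, not the thesis algorithm; the region names and loads are made up:

```python
def rebalance(region_loads, num_threads):
    """Greedy rebalancing: assign each region, heaviest first, to the
    currently least-loaded thread. Keeping whole regions on one thread
    preserves locality and limits cross-thread synchronization."""
    thread_load = [0] * num_threads
    assignment = {}
    for region, load in sorted(region_loads.items(),
                               key=lambda kv: -kv[1]):
        t = thread_load.index(min(thread_load))  # least-loaded thread
        assignment[region] = t
        thread_load[t] += load
    return assignment, thread_load

# Player hotspot: region "A1" holds most players this frame
loads = {"A1": 90, "A2": 10, "B1": 30, "B2": 20}
assignment, per_thread = rebalance(loads, 4)
```

A hotspot region still dominates its thread's load, illustrating the compromise the slide names: perfect balance would require splitting the hotspot, which would increase synchronization.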
57
Experimental results
- Test scenarios
- Scaling: with and without physics computation
- The effect of load balancing on scaling
- The influence of locality-awareness
58
Quest scenarios
[Figure: a 1024×1024 map marking the locations of Quests 1-4]
60
Scalability
61
Processing times – without physics
62
Processing times – with physics
63
Load balancing
64
Quest scenarios (4 quadrants)
[Figure: static vs. dynamic assignment of map regions to threads 1-4]
65
Quest scenarios (4 splits)
[Figure: static vs. dynamic assignment of map regions to threads 1-4]
66
Quest scenarios (1 quadrant)
[Figure: static vs. dynamic assignment of map regions to threads 1-4]
67
Locality-aware load balancing (locks)
68
Conclusions
First application where STM outperforms locks:
- Overall performance of STM is better at 4 threads in all scenarios
- Reduced false sharing through on-the-fly collision detection
- Locality-aware load balancing reduces true sharing, but only for STM
69
Thank you!
70
Splitting components (1 center quest)
71
Load balancing (short range actions)
72
Locality-aware load balancing (STM)