pictor - sprite.utsa.edu
Post on 03-Feb-2022
Pictor: A Benchmarking Framework (Research Platform) for Cloud Gaming
Tianyi Liu*, Sen He*, Sunzhou Huang*, Danny Tsang*, Lingjia Tang, Jason Mars, Wei Wang*
* The University of Texas at San Antonio; University of Michigan, Ann Arbor
Modern-Day Computer Games
• If we would like to play video games, what should we do first?
Download & Install
Cloud 3D System Overview
[Diagram: a server and a client connected over the Internet. Server side: 3D applications, 2D/3D libs, server proxy. Client side: client proxy. Labeled flows: Calls, Cmds, Images, Inputs.]
Motivation & Goal
• Problem: 3D applications are becoming a major type of cloud workload. However, there is no standard benchmarking infrastructure or platform in this area to support public research.
• Goal: build a cloud 3D benchmarking platform.
Outline
1. Background: Cloud 3D System Basics
2. The Pictor Framework
  • Remote Desktop
  • Benchmark Suite
  • Intelligent Client
3. Evaluation
4. Cloud Security
Remote Desktop & Benchmark Suite
• System Software Platform
  • TurboVNC and VirtualGL 2.6
• Cloud 3D Benchmark Suite
  • Four 3D applications
  • Two Virtual Reality (VR) applications
  • All applications belong to different game genres

Application Area              Benchmark
Game: Racing                  SuperTuxKart (STK)
Game: Real-time Strategy      0 A.D. (0AD)
Game: First-person Shooter    Red Eclipse (RE)
Game: Online Battle Arena     Dota 2 (D2)
VR: Education/Game            InMind VR (IM)
VR: Health                    IMHOTEP (ITP)
Intelligent Client
[Diagram: human client vs. intelligent client. Human client: a human player produces the inputs. Intelligent client: CV and RNN components produce actions, which become the inputs.]
Overview: Cloud 3D System with Intelligent Client
[Diagram: a server and a client connected over the Internet. Server side: interactive apps, 2D/3D libs, server proxy. Client side: client proxy and the intelligent client (CV, RNN). Labeled flows: Calls, Cmds, Images, Inputs, Actions.]
Evaluation: Intelligent Client
• Accuracy
  • The intelligent client has high accuracy and can mimic human players faithfully.
  • The average error of the RTT distribution is 1.6%, and Dota 2 (D2) has the maximum error of 3.2%.
Fig 3.1 Round-trip time (RTT) distributions of the intelligent client (IC) and human players (H) for the 6 benchmarks.
Evaluation: Intelligent Client
• Performance
  • CV time is about 80 ms, and input generation time is about 2 ms.
  • The intelligent client can generate 800 actions per minute, which is fast enough to simulate professional game players (300 actions per minute at maximum).
Fig 3.2 Computer vision time (left) and input generation time (right).
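As a rough back-of-the-envelope check of the action rate, the sketch below assumes that each action needs one CV pass followed by one input generation, run serially. That serial model and the function name are assumptions for illustration, not something the slides state. With the rounded figures (80 ms and 2 ms) it gives roughly 730 actions per minute, in the same range as the reported 800 (which corresponds to about 75 ms per action) and well above the 300 actions per minute cited for professional players.

```python
def actions_per_minute(cv_ms: float, input_gen_ms: float) -> float:
    """Upper bound on actions per minute when every action costs one CV pass
    plus one input generation, executed one after the other (assumed model)."""
    per_action_ms = cv_ms + input_gen_ms
    if per_action_ms <= 0:
        raise ValueError("per-action time must be positive")
    return 60_000.0 / per_action_ms


# Rounded figures from the slide: CV about 80 ms, input generation about 2 ms.
rate = actions_per_minute(80.0, 2.0)
print(f"{rate:.0f} actions per minute")
```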
Pictor in our lab:
• YouTube videos:
  • https://www.youtube.com/watch?v=4VG0KgFgc_c
  • https://www.youtube.com/watch?v=-BnYlKonxJI
  • https://www.youtube.com/watch?v=mgz5tWt2_rc
Side Channel Attack Example
• Spectre
  • https://spectreattack.com/spectre.pdf
  • https://meltdownattack.com
  • https://www.youtube.com/watch?v=Phmt8UrofDY
Take-Home Messages:
• Cloud gaming is a promising future gaming format.
• You can play games anytime and anywhere without installation.
• Cloud gaming works well on cellphones, tablets, laptops, and desktops.
• Our cloud gaming research platform can make computers play games by themselves.
• Cloud gaming also has some security issues, but the attacks are difficult to realize in practice.
This presentation and recording belong to the authors. Contact: tianyi.liu@utsa.edu; wei.wang@utsa.edu
Evaluation: Pictor Framework
• Overhead
  • The FPS reduction is 2.73% on average, and 5.5% at maximum.
  • The Pictor framework has low overhead.
Fig 3.3 FPS reduction with the Pictor framework.
[Bar chart: FPS reduction (y-axis, -1% to 6%) for STK, 0AD, RE, D2, IM, ITP, and AVG. All bars are below 6%; the average is 2.73%.]
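The slides do not spell out how FPS reduction is computed. A minimal sketch, assuming it is the relative drop from the FPS measured without the framework to the FPS measured with it (the function name and the sample numbers are hypothetical):

```python
def fps_reduction_percent(baseline_fps: float, instrumented_fps: float) -> float:
    """Relative FPS drop in percent (assumed definition of 'FPS reduction')."""
    if baseline_fps <= 0:
        raise ValueError("baseline FPS must be positive")
    return (baseline_fps - instrumented_fps) / baseline_fps * 100.0


# Hypothetical measurement: 60 FPS without the framework, 58.5 FPS with it.
reduction = fps_reduction_percent(60.0, 58.5)
print(f"{reduction:.2f}%")
```

Under this definition a negative value means the instrumented run was slightly faster, which would explain why the chart axis starts below 0%.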
Resource Efficiency
• Hardware efficiency:
  • One GPU can serve 2 to 3 games while maintaining 25+ FPS.
• Power efficiency:
  • Power consumption per game also decreases significantly.
Fig 3.5 Average server and client FPS when running 1-4 instances of the same game on one server.
Fig 3.6 Per-instance power usage when running 1-4 instances of the same game on one server.
Performance: CPU Cycle Breakdown
• When running 1-4 instances of the same game, the figure shows:
  • IM has problems with instruction fetching compared to the other applications, since IM spends the largest portion of its cycles in the front end.
  • Serious contention is observed in the back end, especially for RE.
Fig 3.4 CPU cycle breakdown into back end, front end, bad speculation, and retiring.
Performance: RTT Breakdown
• Round-Trip Time (RTT) breakdown
  • Send-frame time and send-input time are the times spent transmitting frames and sending inputs over the network.
  • When running 1-4 instances of the same game, server time dominates the RTT, indicating that cloud system/hardware design is crucial to cloud 3D performance.
Fig 3.7 RTT broken into 3 parts: server time, send-frame time, and send-input time.
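The three-part breakdown can be sketched as a small record type. The class, its field names, and the sample values below are hypothetical; only the three components (server time, send-frame time, send-input time) come from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RttBreakdown:
    """One RTT sample split into the three parts named in Fig 3.7 (all in ms)."""
    server_ms: float
    send_frame_ms: float
    send_input_ms: float

    @property
    def rtt_ms(self) -> float:
        # RTT is taken as the sum of the three parts.
        return self.server_ms + self.send_frame_ms + self.send_input_ms

    def shares(self) -> Dict[str, float]:
        """Fraction of the RTT spent in each part."""
        total = self.rtt_ms
        if total <= 0:
            raise ValueError("RTT must be positive")
        return {
            "server": self.server_ms / total,
            "send_frame": self.send_frame_ms / total,
            "send_input": self.send_input_ms / total,
        }

    def dominant(self) -> str:
        """Name of the part with the largest share."""
        shares = self.shares()
        return max(shares, key=shares.get)


# Hypothetical sample, not a measurement from the talk.
sample = RttBreakdown(server_ms=40.0, send_frame_ms=15.0, send_input_ms=5.0)
print(sample.rtt_ms, sample.dominant())
```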
Performance: Server Time Breakdown
• Server time breakdown
  • When running 1-4 instances of the same game, application time is the most time-consuming part.
Fig 3.8 Server time breakdown when running 1-4 instances of the same game on one server.
Performance: App Time Breakdown
• Application time breakdown
  • When running 1-4 instances of the same game, frame copy needs to be optimized.
Fig 3.9 Application time breakdown when running 1-4 instances of the same game on one server.
The Pictor framework is effective for identifying system bottlenecks.
Caching
Observation 1: Timing readPixel() gives different results with the CPU clock and with the GPU clock. The difference is about 6~9 ms.
• XGetWindowAttributes() is called in readPixel(); it is extremely slow and consumes 6~9 ms.
• XGetWindowAttributes() is only used here to obtain the window size.
Key idea: call XGetWindowAttributes() only once and save the result for future use.
Insight: caching makes the pixel data copy stage shorter.
[Diagram: pipeline timelines before and after caching, with steps 1-9 labeled for frames N and N-1 across the stages Input Processing, Render, and Frame Copy and Sending. The time saved is marked "Benefit".]
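The caching idea can be sketched as follows. This is an illustration in Python rather than the Xlib code the slides refer to: `SlowWindowQuery` is a hypothetical stand-in for the XGetWindowAttributes() call, and `WindowSizeCache` and its `invalidate()` method are assumptions added for the example. Invalidation is included because a cached size becomes stale if the window is resized.

```python
from typing import Callable, Optional, Tuple

Size = Tuple[int, int]


class SlowWindowQuery:
    """Hypothetical stand-in for the slow window-attribute query; counts calls."""

    def __init__(self, size: Size) -> None:
        self.size = size
        self.calls = 0

    def __call__(self) -> Size:
        self.calls += 1  # in the real system, each call would cost 6~9 ms
        return self.size


class WindowSizeCache:
    """Queries the window size once and reuses the result for later frames."""

    def __init__(self, query: Callable[[], Size]) -> None:
        self._query = query
        self._size: Optional[Size] = None

    def get(self) -> Size:
        if self._size is None:  # only the first frame pays for the query
            self._size = self._query()
        return self._size

    def invalidate(self) -> None:
        """Drop the cached size, e.g. after a window resize (assumed hook)."""
        self._size = None


query = SlowWindowQuery((1920, 1080))
cache = WindowSizeCache(query)
sizes = [cache.get() for _ in range(100)]  # 100 frames, one real query
print(query.calls)
```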
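Parallelization
Observation 2: Pixel copy has two sequential steps: FrameBuffer -> PixelBuffer -> DRAM.
[Diagram: before, each frame moves from the framebuffer (FB) in video memory through a single pixel buffer (PBO) to DRAM; after, two pixel buffers (PBO1 and PBO2) sit between the FB and DRAM. CPU cores and the GPU are shown alongside.]
Key idea: parallelize the sequential steps.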
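To see why a second pixel buffer helps, the timing model below compares the strictly sequential copy with a two-buffer scheme in which the FrameBuffer -> PixelBuffer step of one frame overlaps the PixelBuffer -> DRAM step of the previous frame. This is a scheduling sketch only, not OpenGL code; the ping-pong buffer assignment, the function names, and the step durations are assumptions for illustration.

```python
def sequential_copy_time(frames: int, fb_to_pbo_ms: float, pbo_to_dram_ms: float) -> float:
    """Total time when both copy steps of every frame run back to back."""
    return frames * (fb_to_pbo_ms + pbo_to_dram_ms)


def double_buffered_copy_time(frames: int, fb_to_pbo_ms: float, pbo_to_dram_ms: float) -> float:
    """Total time with two pixel buffers used alternately (frame i uses buffer i % 2)."""
    fill_end = [0.0] * frames   # when FrameBuffer -> PixelBuffer finishes for frame i
    drain_end = [0.0] * frames  # when PixelBuffer -> DRAM finishes for frame i
    for i in range(frames):
        # The fill engine is busy until the previous frame's fill is done.
        fill_start = fill_end[i - 1] if i >= 1 else 0.0
        # The buffer being reused must already be drained (frame i - 2 used it).
        if i >= 2:
            fill_start = max(fill_start, drain_end[i - 2])
        fill_end[i] = fill_start + fb_to_pbo_ms
        # Draining needs this frame's data and a free drain engine.
        drain_start = fill_end[i]
        if i >= 1:
            drain_start = max(drain_start, drain_end[i - 1])
        drain_end[i] = drain_start + pbo_to_dram_ms
    return drain_end[-1] if frames else 0.0


# Hypothetical step durations: 2 ms to fill a buffer, 5 ms to drain it.
print(sequential_copy_time(4, 2.0, 5.0), double_buffered_copy_time(4, 2.0, 5.0))
```

In steady state the per-frame cost in this model drops from the sum of the two steps to the longer of the two.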
Effectiveness of Two Optimizations
• The optimizations improve server FPS by 57.7% on average and 115.23% at maximum.
• Client FPS improves by 7.4% on average.
• RTT is reduced by 8.5% on average.
Fig 3.10 Improved FPS/RTT with our optimizations.
Motivation & Goal
• Motivation: Cloud 3D applications are becoming a major type of cloud workload.
• Problem: there is no standard benchmarking infrastructure for this new area, including benchmark suites and performance evaluation tools.
• Challenges: a 3D benchmarking framework needs to
  1. generate human-like inputs to interact with random objects in 3D applications;
  2. track input processing and accurately measure the performance of each component in the cloud 3D system;
  3. be general and extensible, as required by fast-evolving 3D applications.
• Goal: an effective benchmarking framework for cloud 3D that addresses the challenges above.
Evaluation: Benchmark Diversity
The benchmark suite covers a variety of 3D applications with diverse behaviors and resource utilizations.

Application Area              Benchmark
Game: Racing                  SuperTuxKart (STK)
Game: Real-time Strategy      0 A.D. (0AD)
Game: First-person Shooter    Red Eclipse (RE)
Game: Online Battle Arena     Dota 2 (D2)
VR: Education/Game            InMind VR (IM)
VR: Health                    IMHOTEP (ITP)

-> Good diversity.
Cloud 3D System Pipeline (1)
[Diagram: the cloud 3D system (3D applications, 2D/3D libs, server proxy, Internet, client proxy) with steps 1-9 marked along the path of one frame. Two timelines, "Render and send Frame N" and "Render and send Frame N+1", run one after the other; each shows the stages Input Processing, Render (CPU Stage, GPU Stage), Frame Copy, and Frame Sending. Hook 1 through Hook 10 are placed along the pipeline.]
Recorded data:
• ID, RTT
• ID, VALID, T_hook1, T_hook2, T_hook3, T_hook4, T_hook5, T_hook6, T_hook7, T_hook8, T_hook9, T_hook10
• FPS
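The recorded fields suggest one record per tracked input with a timestamp from each of the ten hooks. A minimal sketch of such a record is shown below. The class, the method names, and the sample timestamps are hypothetical, and the slides do not say which hooks delimit which stage, so the code only reports the time between consecutive hooks.

```python
from dataclasses import dataclass
from typing import List

NUM_HOOKS = 10


@dataclass
class HookRecord:
    """One tracked input: ID, VALID flag, and timestamps T_hook1..T_hook10 (ms)."""
    input_id: int
    valid: bool
    t_hook: List[float]  # t_hook[k] is the timestamp taken at hook k + 1

    def __post_init__(self) -> None:
        if len(self.t_hook) != NUM_HOOKS:
            raise ValueError(f"expected {NUM_HOOKS} hook timestamps")

    def intervals_ms(self) -> List[float]:
        """Time spent between each pair of consecutive hooks."""
        return [later - earlier for earlier, later in zip(self.t_hook, self.t_hook[1:])]

    def span_ms(self) -> float:
        """Time from the first hook to the last hook."""
        return self.t_hook[-1] - self.t_hook[0]


def fps(frame_done_ms: List[float]) -> float:
    """Frames per second from a sorted list of frame completion timestamps (ms)."""
    if len(frame_done_ms) < 2:
        return 0.0
    elapsed_ms = frame_done_ms[-1] - frame_done_ms[0]
    if elapsed_ms <= 0:
        return 0.0
    return (len(frame_done_ms) - 1) * 1000.0 / elapsed_ms


# Hypothetical timestamps, not measurements from the talk.
record = HookRecord(input_id=7, valid=True,
                    t_hook=[0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0, 28.0, 36.0, 45.0])
print(record.intervals_ms(), record.span_ms(), fps([0.0, 40.0, 80.0, 120.0]))
```

Keeping the raw timestamps rather than precomputed durations lets the same record feed any of the breakdowns (RTT, server time, application time) once the hook-to-stage mapping is known.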
Cloud 3D System Pipeline (2)
[Diagram: pipelined execution with steps 1-9 labeled per frame. While frame N is rendered, frame N-1 is sent; while frame N+1 is rendered, frame N is sent; while frame N+2 is rendered, frame N+1 is sent. Stages shown: Input Processing, Render, GPU Stage, Frame Copy, Frame Sending.]
Experiment Setup
• System Software Platform
  • TurboVNC and VirtualGL 2.6
• Cloud 3D Benchmark Suite
  • Four 3D applications
  • Two VR applications
  • All applications belong to different game genres.
• Hardware
  • Server: Intel i7-7820X, 16 GB
  • NVIDIA GTX 1080 Ti GPU, 11 GB
  • Client: Intel i5-7400, 8 GB

Application Area              Benchmark
Game: Racing                  SuperTuxKart (STK)
Game: Real-time Strategy      0 A.D. (0AD)
Game: First-person Shooter    Red Eclipse (RE)
Game: Online Battle Arena     Dota 2 (D2)
VR: Education/Game            InMind VR (IM)
VR: Health                    IMHOTEP (ITP)