supercomputers and cloud games
DESCRIPTION
On September 19th, 2014, Shinra held its first developer event in Tokyo, titled “Supercomputers and Cloud Games.”
TRANSCRIPT
Supercomputers & cloud gaming
Shinra Technologies, Inc., Senior Vice President
Tetsuji Iwasaki
11/12/2014 © 2014 Shinra Technologies, Inc. All Rights Reserved. 1
About me
Tetsuji Iwasaki
Hobby: Beer
Started working in the industry in 1990; joined Square-Enix in 1994
Some famous titles: FFT / FFXI / Crysis
+17 game projects
Currently holding these positions:
2011 Square-Enix Holdings, Technology planning specialist
2012 Development director, Eidos Montreal
2014 Shinra Technologies, Inc., SVP (Technology)
What is cloud gaming?
[Diagram: controller input travels over the Internet to the data center; streaming video comes back]
「Mini Ninjas」© 2009 Eidos Interactive Ltd. Co-published by Eidos, Inc. and Warner Bros. Interactive Entertainment,
a division of Warner Bros. Home Entertainment Inc.
What is a supercomputer?
There is no clear definition…
What image of a supercomputer do you have in mind?
Reproduced from “Supercomputer K” (http://jp.fujitsu.com/about/tech/k/), accessed 2014/9/17
Let's see the top 10 (http://www.top500.org/)
1 Tianhe-2 (MilkyWay-2): TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT, China)
2 Titan: Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc., United States)
3 Sequoia: BlueGene/Q, Power BQC 16C 1.60GHz, Custom (IBM, United States)
4 K computer: SPARC64 VIIIfx 2.0GHz, Tofu interconnect (Fujitsu, Japan)
5 Mira: BlueGene/Q, Power BQC 16C 1.60GHz, Custom (IBM, United States)
6 Piz Daint: Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x (Cray Inc., Switzerland)
7 Stampede: PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P (Dell, United States)
8 JUQUEEN: BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect (IBM, Germany)
9 Vulcan: BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect (IBM, United States)
10 Cray XC30, Intel Xeon E5-2697v2 12C 2.7GHz, Aries interconnect (Cray Inc., United States)
Intel® Xeon®
IBM® Power® BQC
Fujitsu® SPARC®64 VIIIfx
NVIDIA® Tesla® / Intel® Xeon Phi™
The trend
General-purpose processors
85.4% of the TOP500 use Intel processors… I'm not sure exactly, but most of them are probably Xeon
Amazon EC2 is ranked 76th
Amazon EC2 C3 Instance cluster Intel Xeon E5-2680v2 10C 2.800GHz, 10G Ethernet
Supercomputers and GPUs
NVIDIA® Tesla®
TESLA GPU ACCELERATORS FOR SERVERS, http://www.nvidia.com/object/tesla-servers.html (accessed 2014-9-17)
Intel® Xeon Phi™ Coprocessor
Intel® Xeon Phi™ Coprocessor product specifications, http://www.intel.co.jp/content/www/jp/ja/processors/xeon/xeon-phi-detail.html (accessed 2014-9-17)
The impact of DEGIMA
*Tsuyoshi Hamada, Tetsu Narumi, Rio Yokota, Kenji Yasuoka and Keigo Nitadori. 42 TFlops Hierarchical N-body Simulations on GPUs with Applications in both Astrophysics and Turbulence. SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Article No. 62
*Introduction to DEGIMA (DEstination for Gpu Intensive MAchine), the Nagasaki University GPU cluster, https://www.cps-jp.org/seminar/fy2010/2010-12-01/hamada/pub/20101201_hamada_02.pdf, page 5 (accessed 2014-9-17)
Be careful, just in case
The value of a supercomputer can’t be judged by Linpack benchmark performance alone
Maintenance, usability, and the purpose of the calculations are not considered by the TOP500 ranking
But maybe people should pay more attention to the cost…
How to check supercomputers
1 Tianhe-2 (MilkyWay-2) TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P
TH-IVB-FEP Cluster -> system name
Intel Xeon E5-2692 12C 2.200GHz -> CPU name
TH Express-2 -> interconnect
Intel Xeon Phi 31S1P -> accelerator
The K computer’s interconnect “Tofu”: 6-dimensional mesh/torus
High-dimensional interconnect technology for supercomputers wins the Imperial Invention Prize, http://pr.fujitsu.com/jp/news/2014/05/29.html (accessed 2014-09-17)
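The field-by-field reading of the Tianhe-2 entry can be sketched in Python. This is a minimal illustration, not anything from the talk: the function name `parse_system_description` and the field names are hypothetical, and the positional mapping only holds for entries that list all four parts (the K computer entry, for example, lists only a CPU and an interconnect).

```python
from typing import Dict

# Hypothetical field names, following the four parts named on the slide.
FIELD_NAMES = ("system", "cpu", "interconnect", "accelerator")


def parse_system_description(description: str) -> Dict[str, str]:
    """Split a comma-separated TOP500 system description into named parts.

    Only handles descriptions with exactly four parts; anything else is
    rejected rather than guessed at.
    """
    parts = [part.strip() for part in description.split(",")]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} parts, got {len(parts)}: {parts}"
        )
    return dict(zip(FIELD_NAMES, parts))


tianhe2 = parse_system_description(
    "TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, "
    "TH Express-2, Intel Xeon Phi 31S1P"
)
for field, value in tianhe2.items():
    print(f"{field}: {value}")
```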
Questions so far?
Some of the Shinra System technology components
Remote rendering architecture
RDMA/TCP dual-protocol interconnect
Distribution models depending on game design
Remote rendering architecture
• Rendering on GPU server
• DirectX 11 API calls are executed on my laptop
Remote rendering architecture: process environment
[Diagram labels, in extraction order]
Game.exe (third-party): dinput.dll, dxgi.dll, d3d11.dll, nvwgf2umx.dll, nvlddmkm.sys
Renderer.exe: dxgi.dll, ws2_32.dll, d3d11.dll, nvwgf2umx.dll, nvlddmkm.sys
Also shown: …, ws2_32.dll, …, Fakedxgi.dll, Faked3d11.dll
Network card <-> Network card
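The idea of a stand-in graphics DLL that forwards calls to a separate renderer process can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class names, the JSON-with-length-prefix wire format, and the toy `clear`/`draw`/`present` calls are hypothetical and are not taken from the Shinra system or from Direct3D.

```python
import json
import struct


class CommandRecorder:
    """Plays the role of a 'fake' graphics library on the game side.

    Instead of executing a call, it serializes the call name and arguments
    into a byte buffer that could be sent over the network.
    """

    def __init__(self):
        self.buffer = bytearray()

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. API calls.
        def record(*args):
            payload = json.dumps({"call": name, "args": list(args)}).encode("utf-8")
            # 4-byte little-endian length prefix, then the payload.
            self.buffer += struct.pack("<I", len(payload)) + payload
        return record


def replay(stream: bytes, target) -> int:
    """Decode a recorded stream and invoke each call on the real renderer."""
    offset = 0
    count = 0
    while offset < len(stream):
        (length,) = struct.unpack_from("<I", stream, offset)
        offset += 4
        message = json.loads(stream[offset:offset + length].decode("utf-8"))
        offset += length
        getattr(target, message["call"])(*message["args"])
        count += 1
    return count


class LoggingRenderer:
    """Toy renderer that logs the calls it receives instead of drawing."""

    def __init__(self):
        self.log = []

    def clear(self, r, g, b):
        self.log.append(("clear", r, g, b))

    def draw(self, vertex_count):
        self.log.append(("draw", vertex_count))

    def present(self):
        self.log.append(("present",))


# Game side: calls are recorded, not executed.
recorder = CommandRecorder()
recorder.clear(0, 0, 0)
recorder.draw(3)
recorder.present()

# Renderer side: the byte stream is replayed against the real target.
renderer = LoggingRenderer()
replayed = replay(bytes(recorder.buffer), renderer)
```

In a real deployment the buffer would travel over a socket between the two machines; here it is passed in memory to keep the sketch self-contained.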
Remote rendering architecture
• Separate CPU & GPU servers
• Many users per logical unit
• Flexible architecture allows efficient CPU/GPU usage
[Diagram: CPU servers and GPU servers, grouped into a “logical unit of game system” and a “physical unit”]
CPU/GPU performance mismatch
[Figure labels: CPU, GPU]
The relationship between cost and performance
Twice as expensive doesn’t mean double the performance.
[Scatter plot with fitted curve y = 1037.3x^-0.826, R² = 0.9055; horizontal axis 0 to 160,000, vertical axis 0 to 0.4]
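The fitted curve can be evaluated directly to see what doubling x does. The sketch below uses only the coefficients printed on the slide. The transcript does not preserve the axis labels, so reading x as cost and y as performance per unit of cost is an assumption; under that reading, total performance scales as x times y.

```python
A = 1037.3   # coefficient of the fitted power law, from the slide
B = -0.826   # exponent of the fitted power law, from the slide


def fitted_y(x: float) -> float:
    """Evaluate the fitted curve y = A * x**B."""
    return A * x ** B


def doubling_factor_y() -> float:
    """Factor by which y changes when x doubles (independent of x)."""
    return 2.0 ** B


def doubling_factor_total() -> float:
    """Factor for x * y when x doubles.

    Assumption: y is performance per unit of cost, so x * y is total
    performance. The axis meaning is not stated in the transcript.
    """
    return 2.0 ** (1.0 + B)


print(f"y changes by a factor of {doubling_factor_y():.3f} when x doubles")
print(f"x*y changes by a factor of {doubling_factor_total():.3f} when x doubles")
```

Under the stated assumption, doubling the cost buys only about 13% more total performance, which is one way to read the slide's point.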
Rendering 60 games on one server
RDMA/TCP dual-protocol interconnect
The performance of a recent network card (TCP)
Comp01<->GPU01: effective bandwidth 8.8 Gbps
loopback([email protected]): effective bandwidth 3.59 Gbps
Unit size   Comp01<->GPU01 RTT (μsec)   loopback RTT (μsec)
4           42.09                       15.080261
8           41.75                       14.986181
16          42.18                       15.00307
32          41.86                       15.097176
64          42.69                       15.081717
128         42.91                       15.106041
256         43.35                       15.17368
512         44.6                        15.301775
1024        46.6                        15.67151
2048        64.19                       24.330402
4096        79.87                       30.921734
8192        140.06                      45.846207
16384       186.85                      79.473488
32768       291.19                      129.546127
65536       497.89                      227.030136
131072      909.93                      435.540619
262144      1800.49                     929.645325
524288      3483.36                     1904.819336
1048576     6841.73                     4009.06958
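One way to read these measurements is to separate a fixed latency from a per-unit cost: RTT is almost flat for small transfers and grows roughly linearly for large ones. The sketch below does this for a few Comp01<->GPU01 samples copied from the table (written with decimal points). The helper names are hypothetical, and the slide does not state what the unit size is measured in, so the result is expressed per unit.

```python
# (unit size, RTT in microseconds), Comp01<->GPU01 column of the table
SAMPLES = [
    (4, 42.09),
    (1024, 46.6),
    (524288, 3483.36),
    (1048576, 6841.73),
]


def fixed_latency_us(samples):
    """Approximate the fixed cost by the RTT of the smallest transfer."""
    return samples[0][1]


def marginal_cost_us_per_unit(samples):
    """Slope of RTT versus size between the two largest transfers."""
    (size_a, rtt_a), (size_b, rtt_b) = samples[-2], samples[-1]
    return (rtt_b - rtt_a) / (size_b - size_a)


base = fixed_latency_us(SAMPLES)
slope = marginal_cost_us_per_unit(SAMPLES)
small_growth = SAMPLES[1][1] - SAMPLES[0][1]

print(f"fixed latency: about {base} microseconds")
print(f"growth from size 4 to size 1024: {small_growth:.2f} microseconds")
print(f"marginal cost at large sizes: {slope:.5f} microseconds per unit")
```

The small growth between size 4 and size 1024 shows why, for short messages such as individual rendering commands, the fixed latency matters more than the bandwidth.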
Mellanox ConnectX-3
http://www.mellanox.com/page/products_dyn?product_family=127 (accessed 2014-9-17)
- Can use RDMA in an Ethernet environment
- The interconnect of Tianhe-2 (MilkyWay-2) uses RDMA as well
- Can skip most of the OS/driver layer and move memory directly to remote machines
The interconnect of the Shinra system
[Diagram labels, in extraction order: Game.exe (third-party), dinput.dll, nvwgf2umx.dll, Renderer.exe, dxgi.dll, ws2_32.dll, d3d11.dll, nvwgf2umx.dll, nvlddmkm.sys, …, ws2_32.dll, …, Fakedxgi.dll, Faked3d11.dll, video card]
Compression (500 µs, ratio 1:8)
Transmission to the Renderer
• Using TCP over Gigabit Ethernet (500 µs)
• Using RDMA over Converged Ethernet (50 µs)
Decompression (200 µs)
Delay ≈ 1.2 ms
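The delay figure can be checked by adding up the three stages. The stage timings below are the ones on the slide; the dictionary keys and function name are hypothetical. The slide's "≈ 1.2 ms" matches the TCP path. The RDMA total is derived here and is not stated on the slide.

```python
COMPRESSION_US = 500      # compression stage, from the slide
DECOMPRESSION_US = 200    # decompression stage, from the slide

# Transmission time per transport, from the slide.
TRANSMISSION_US = {
    "tcp_gigabit_ethernet": 500,
    "rdma_converged_ethernet": 50,
}


def pipeline_delay_ms(transport: str) -> float:
    """Sum compression, transmission and decompression, in milliseconds."""
    total_us = COMPRESSION_US + TRANSMISSION_US[transport] + DECOMPRESSION_US
    return total_us / 1000.0


for transport in TRANSMISSION_US:
    print(f"{transport}: {pipeline_delay_ms(transport)} ms")
```

With these numbers, compression and decompression account for 700 µs in either case, so once RDMA is used the transmission step is no longer the largest term.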
Distribution models depending on game design
Stand alone architecture
SS Architecture
MK Architecture
Stand alone architecture
[Diagram labels: Compute Server (Game.exe), Rendering Server (Rendering.exe, GPU, GPU), Rendering Commands, Input and Video over the internet]
SS Architecture
[Diagram labels: Compute Server (Server, Game ×4), Rendering Server (Remote Renderer, GPU, GPU), Rendering Commands, Input over the internet]
4 x Video Streams
MK Architecture
[Diagram labels: Compute Server (Game), User ×4, Rendering Server (Remote Renderer, GPU, GPU), Rendering Commands, Input over the internet]
4 users in a single process…
4 x Video Streams
We will provide a standardized SDK for these three architectures