Thesis no: BCS-2016-13

Upload: others

Post on 24-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Direct3D 11 vs 12 - DiVA portal1055786/FULLTEXT01.pdf · pass their output as input for the next stage in the pipeline; for example, the Input-Assembler reads geometric data from


Direct3D 11 vs 12

A Performance Comparison Using Basic Geometry

Mikael Olofsson

Faculty of Computing

Blekinge Institute of Technology

SE-371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science. The thesis is equivalent to 10 weeks of full-time studies.

Contact Information:

Author:
Mikael Olofsson
E-mail: [email protected]

University advisor:
Stefan Petersson
Dept. of Creative Technologies

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden

Internet : www.bth.se
Phone : +46 455 38 50 00
Fax : +46 455 38 50 57


Abstract

Context. Computer-rendered imagery, such as computer games, is a field in steady development. To render games, an application programming interface (API) is used to communicate with a graphics processing unit (GPU). Both the interfaces and the processing units are part of this steady development, which aims to push the limits of graphical rendering.

Objectives. This thesis investigates whether the Direct3D 12 API provides higher rendering performance than its predecessor, Direct3D 11.

Methods. The method used is an experiment in which a benchmark was developed that renders basic shaded geometry with both APIs while measuring their performance. The focus was on testing API interaction and comparing Direct3D 11 against Direct3D 12.

Results. Statistics gathered from the benchmark suggest that, in this experiment, Direct3D 11 offered the best rendering performance in the majority of the cases tested, although Direct3D 12 performed better in specific scenarios.

Conclusions. The benchmark gave results that contradict those of other studies. This could depend on the implementation, software or hardware used. In the tests, Direct3D 12 came closer to its Direct3D 11 counterpart when more cores were used. A platform with more processing cores available for parallel execution could reveal whether Direct3D 12 offers better performance in that experimental setting. In this study, Direct3D 12 was implemented to imitate Direct3D 11; if the implementation were further aligned with Direct3D 12 recommendations, different results might be observed. Further studies could be conducted to give a better evaluation of rendering performance.

Keywords: DirectX, Direct3D, rendering, performance, geometry


Contents

Abstract

1 Introduction
1.1 Background
1.2 Research Question

2 Programming Using Direct3D
2.1 Rendering Pipeline
2.1.1 Primitives
2.1.2 Shader Stages
2.2 Immediate and Deferred Rendering

3 Method
3.1 Benchmark
3.1.1 Application Structure
3.1.2 Test Definitions

4 Performance Tests
4.1 Test Parameters
4.2 Test 1
4.3 Test 1b
4.4 Test 2
4.5 Test 2b
4.6 Test 3
4.7 Test 3b
4.8 Test 4
4.9 Test 4b

5 Analysis and Discussion
5.1 Analysis
5.2 Discussion

6 Conclusions and Future Work

References


Chapter 1

Introduction

This chapter introduces the thesis and the motivation for the work. It gives the background to why the thesis topic was chosen and presents the research question derived from it.

1.1 Background

"The Graphics Processing Unit (GPU) have become important in providing processing power for high performance computing applications" [1]. One example of an application that uses this power to its benefit is a computer game. One of the challenges within this field is to provide users with a pleasing graphical experience. To achieve this, the GPU is used to render geometry and present it to the user. This is done through an API (application programming interface) that communicates with the GPU.

This thesis serves as a general benchmark comparison between two APIs. As software evolves, it becomes important to evaluate its performance. The knowledge gained from such an evaluation can help decide whether to adopt a newer API or keep using the existing one.

Direct3D 12 is the newest version of Direct3D. It is designed to be faster and more efficient than any previous version. In PC gaming, the main program thread often does most, and sometimes all, of the work. Direct3D 12 aims to make more efficient use of multi-core CPUs. One important factor when a developer chooses between APIs is that the majority of available PC gaming hardware already supports Direct3D 12. This means that many users will be able to play games developed with Direct3D 12 without the need for additional hardware [2].

Based on previous work in this area, Direct3D 12 should show a significant improvement over its predecessor. In an article published on the DirectX Developer Blog, the 3DMark benchmark produced results showing improved CPU utilization and a better distribution of work among threads [3].


1.2 Research Question

The goal of this thesis is to measure and compare performance between two APIs. This goal originates from the question:

Will standard usage of Direct3D 12 have higher rendering performance when compared to an equivalent Direct3D 11 implementation?

To evaluate the question, an experiment was conducted. Since the research question revolves around "standard usage", this term needs to be defined. For the purpose of this thesis, it is defined as rendering flat shaded geometry with Direct3D according to the recommendations in the MSDN documentation, with the aim of keeping the Direct3D 12 implementation as close as possible to its Direct3D 11 counterpart. Rendering performance is measured as the time required to render the geometry.

To conduct the experiment, a benchmark was developed in the C++ programming language. This benchmark generated the data used to evaluate performance.


Chapter 2

Programming Using Direct3D

This chapter presents the basics of Direct3D. It introduces the basic principles of the rendering pipeline, which parts of it were used by the benchmark in the experiment, and the two types of rendering: immediate rendering and deferred rendering.

2.1 Rendering Pipeline

The rendering pipeline refers to all the stages necessary to generate a 2D image given a geometric description of a scene with a positioned and oriented camera [4]. Figure 2.1 shows the stages of this pipeline, with the GPU memory resources on the right side. An arrow from the memory resource pool to a stage indicates that the stage can access the resources as input. The Pixel Shader stage and the Output-Merger stage have bidirectional arrows, indicating that both can read from and write to the GPU resources. As seen in the figure, most stages pass their output as input to the next stage in the pipeline; for example, the Input-Assembler reads geometric data from the resources and passes it to the Vertex Shader. For the purpose of this study, only a few vital parts are used to render basic shaded geometry; these are shown in Figure 2.2.


[Diagram: Input-Assembler Stage → Vertex Shader Stage → Hull Shader Stage → Tessellator Stage → Domain Shader Stage → Geometry Shader Stage → Rasterizer Stage → Pixel Shader Stage → Output-Merger Stage, with the Stream Output Stage branching off after the Geometry Shader Stage; Memory Resources (Buffer, Texture, Constant Buffer) on the right.]

Figure 2.1: Stages of the rendering pipeline [4]


[Diagram: Input-Assembler Stage → Vertex Shader Stage → Rasterizer Stage → Pixel Shader Stage → Output-Merger Stage; Memory Resources (Buffer, Texture, Constant Buffer) on the right.]

Figure 2.2: Simplified pipeline used in the benchmark application

2.1.1 Primitives

Triangles, lines and points are three basic primitives that can be used to render geometry [5]. What these primitives have in common is that each is defined by a number of vertices. Frank D. Luna states that "Mathematically, the vertices of a triangle are where two edges meet; the vertices of a line are the endpoints; for a single point, the point itself is the vertex." [5]. These primitives are the building blocks of 3D programming. The most common primitive in games is the triangle, from which many objects are built. Thus, every improvement to the pipeline that allows for more efficient rendering of triangles is valuable for performance in programs that render many objects.
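To make the link between vertices and primitives concrete, the following C++ sketch counts how many complete primitives a given vertex count produces for the common list and strip topologies. The `Topology` enum and the `primitiveCount` function are hypothetical helpers that only mirror Direct3D's `D3D_PRIMITIVE_TOPOLOGY_*` values; adjacency topologies and strip cuts are left out, and leftover vertices that cannot form a whole primitive are assumed to be discarded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of common Direct3D primitive topologies (adjacency variants omitted).
enum class Topology { PointList, LineList, LineStrip, TriangleList, TriangleStrip };

// Number of complete primitives formed by `vertexCount` vertices.
// Vertices that cannot complete a primitive are treated as discarded (assumption).
std::uint64_t primitiveCount(Topology topology, std::uint64_t vertexCount) {
    switch (topology) {
    case Topology::PointList:     return vertexCount;                             // one vertex per point
    case Topology::LineList:      return vertexCount / 2;                         // two endpoints per line
    case Topology::LineStrip:     return vertexCount >= 2 ? vertexCount - 1 : 0;  // each new vertex adds a line
    case Topology::TriangleList:  return vertexCount / 3;                         // three vertices per triangle
    case Topology::TriangleStrip: return vertexCount >= 3 ? vertexCount - 2 : 0;  // each new vertex adds a triangle
    }
    assert(false && "unknown topology");
    return 0;
}
```

With a point list, 400 000 vertices yield 400 000 points, whereas the same vertices read as a triangle list would yield 133 333 triangles.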

2.1.2 Shader Stages

A shader is a program that is executed in parallel by the multiple cores of the graphics card [5]. These programs can be written to shape the visual output as needed for the intended application. Depending on the implementation, shaders can be used to render a simple 2D interface or 3D objects with advanced lighting techniques.


Vertex Shader

The vertex shader program takes a vertex as input and outputs a vertex. What happens during this stage depends on the implementation of the program. A common case is that each input vertex is specified in world space and is transformed during the vertex shader stage to homogeneous clip space, in preparation for rendering the 2D representation of the world.
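As a CPU-side illustration of that transform, the sketch below multiplies a position by a matrix using the row-vector convention of HLSL's `mul(v, M)`. In a real application the matrix would combine world, view and projection transforms and the work would run in an HLSL vertex shader; here only a translation is used so the result is easy to check. `Vec4`, `Mat4`, `transformToClip` and `translation` are hypothetical helpers, not part of Direct3D or the benchmark.

```cpp
#include <array>
#include <cassert>

struct Vec4 { float x, y, z, w; };
// Row-major 4x4 matrix: m[row][col].
using Mat4 = std::array<std::array<float, 4>, 4>;

// Row vector times matrix (v * M), the convention used by mul(v, M) in HLSL.
Vec4 transformToClip(const Vec4& v, const Mat4& m) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * m[row][col];
    return {out[0], out[1], out[2], out[3]};
}

// Translation matrix for the row-vector convention: the offset sits in the last row.
Mat4 translation(float tx, float ty, float tz) {
    return {{{1, 0, 0, 0},
             {0, 1, 0, 0},
             {0, 0, 1, 0},
             {tx, ty, tz, 1}}};
}
```

Because the offset is stored in the last row, a point with w = 1 is moved while a direction with w = 0 would be left unchanged.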

Pixel Shader

The pixel shader program is executed for each pixel fragment. This is the shader stage that computes color according to the implementation; a simple implementation could return a constant color or a color based on interpolated vertex attributes. More advanced techniques, such as per-pixel lighting, shadows and reflections, can also be implemented here.
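The "interpolated vertex attributes" case can be sketched on the CPU. Below, `barycentric` computes the weights of a pixel position inside a screen-space triangle using 2D edge functions, and `interpolate` blends the per-vertex colors with those weights. This is plain linear interpolation; real hardware applies perspective-correct interpolation, which the sketch deliberately omits. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Color { float r, g, b; };

// Twice the signed area of triangle (a, b, p): the 2D "edge function".
float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Barycentric weights of p with respect to triangle (a, b, c).
std::array<float, 3> barycentric(Vec2 a, Vec2 b, Vec2 c, Vec2 p) {
    const float area = edge(a, b, c);
    assert(area != 0.0f && "degenerate triangle");
    return {edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area};
}

// Linear blend of the three vertex colors; the weights must sum to one.
Color interpolate(const std::array<Color, 3>& c, const std::array<float, 3>& w) {
    assert(std::fabs(w[0] + w[1] + w[2] - 1.0f) < 1e-4f);
    return {w[0] * c[0].r + w[1] * c[1].r + w[2] * c[2].r,
            w[0] * c[0].g + w[1] * c[1].g + w[2] * c[2].g,
            w[0] * c[0].b + w[1] * c[1].b + w[2] * c[2].b};
}
```

A constant-color shader, like the one used in the benchmark, simply skips this step and returns the same value for every fragment.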

2.2 Immediate and Deferred Rendering

Immediate rendering refers to calling rendering APIs or commands from a Direct3D device, which queues the commands in a buffer that is then executed on the GPU [6]. With deferred rendering, the commands are instead stored in a command buffer that can be played back at a later time. A deferred context is used to record both rendering and state-setting commands to a command list [6]. Multiple threads can work in parallel with deferred contexts, although each thread needs its own context and command list. Queuing up commands in this fashion causes Direct3D to generate some rendering overhead. The gain is that command lists execute much more efficiently during playback [6].
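The record-then-play-back pattern can be illustrated without Direct3D at all. The sketch below is only an analogy under stated assumptions: `Command`, `CommandList`, `recordInParallel` and `playBack` are hypothetical names, and `std::thread` plays the role of the worker threads that would each own a deferred context.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command; not a Direct3D type.
struct Command {
    std::string name;           // e.g. "SetState" or "Draw"
    std::uint32_t vertexCount;  // vertices drawn (0 for state commands)
};
using CommandList = std::vector<Command>;

// Each worker records into its own list, mirroring "one deferred context and
// one command list per thread", so no locking is needed while recording.
std::vector<CommandList> recordInParallel(unsigned threadCount, std::uint32_t verticesPerThread) {
    assert(threadCount > 0);
    std::vector<CommandList> lists(threadCount);  // sized up front, never resized by workers
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&lists, t, verticesPerThread] {
            lists[t].push_back({"SetState", 0});
            lists[t].push_back({"Draw", verticesPerThread});
        });
    }
    for (std::thread& w : workers) w.join();
    return lists;
}

// Playback runs on one thread in a fixed order, like executing command lists
// on the Direct3D 11 immediate context. Here it only totals the vertices drawn.
std::uint64_t playBack(const std::vector<CommandList>& lists) {
    std::uint64_t verticesSubmitted = 0;
    for (const CommandList& list : lists)
        for (const Command& cmd : list)
            verticesSubmitted += cmd.vertexCount;
    return verticesSubmitted;
}
```

In actual Direct3D 11 code, recording would go through deferred contexts created with `CreateDeferredContext` and closed with `FinishCommandList`, and playback would call `ExecuteCommandList` on the immediate context; the sketch reproduces only the ownership and ordering rules, not the API.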

Direct3D 11 uses the immediate context to play back the generated command lists, and only one command list can be processed at a time. Direct3D 12 instead uses a command queue for this. The difference between them is that Direct3D 11 submits commands to the buffer in a single-threaded manner, while Direct3D 12 allows this workload to be distributed across multiple threads [7].


Chapter 3

Method

This chapter focuses on the experiment conducted to measure the performance of the two APIs. The goal is to gather performance data, using an application that renders basic shaded geometry, in order to evaluate the difference in execution time between Direct3D 11 and Direct3D 12.

3.1 Benchmark

A benchmark application was written to render basic shaded geometry. The application uses either the Direct3D 11 or the Direct3D 12 pipeline to generate a screen image. Each image is filled with a number of points of a fixed color.

Each test uses three variables:

• Thread count

• Number of points rendered

• Which API is used

The primary focus is to study the performance of API interaction when rendering geometry. To keep the number of variables to a minimum, no culling or anti-aliasing of geometry is performed. Triangles are often used to visualize the geometry of objects in games, but in this thesis points were used instead. This choice is based on the fact that, during rasterization, points are interpreted as though they were composed of two triangles, which use triangle rasterization rules. Conveniently, there is no culling for points, which aligns with the aim of keeping the test at a basic level [8]. The only shader stages used are the vertex and pixel shaders, and their use is intentionally minimal: the vertex shader passes data to the pixel shader through the rasterization stage without doing any additional calculations.

Since one of the aims of Direct3D 12 is to use all CPU and GPU cores more efficiently, the application takes a multithreaded approach. The reasoning behind a multithreaded implementation is to use more of the CPU's capacity than a single-threaded application would. The concept used to achieve this was simple: the submitted graphical workload was distributed equally among the available threads. No other parts of the project were optimized with multithreading.
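The thesis does not state how a vertex count that does not divide evenly by the thread count is split. One reasonable scheme, assumed here, gives each thread a contiguous range and hands the remainder to the first threads, so that range sizes differ by at most one vertex. `Range` and `splitEvenly` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Range {
    std::uint32_t start;  // first vertex of this thread's share
    std::uint32_t count;  // number of vertices in the share
};

// Split `total` vertices into `threads` contiguous ranges whose sizes differ
// by at most one; the first `total % threads` ranges get one extra vertex.
std::vector<Range> splitEvenly(std::uint32_t total, std::uint32_t threads) {
    assert(threads > 0);
    std::vector<Range> ranges;
    ranges.reserve(threads);
    const std::uint32_t base = total / threads;
    const std::uint32_t extra = total % threads;
    std::uint32_t start = 0;
    for (std::uint32_t t = 0; t < threads; ++t) {
        const std::uint32_t count = base + (t < extra ? 1u : 0u);
        ranges.push_back({start, count});
        start += count;
    }
    return ranges;
}
```

Each range maps naturally onto one `Draw(count, start)` call on a Direct3D 11 context, or one `DrawInstanced(count, 1, start, 0)` call on a Direct3D 12 command list.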

Comparisons were made against Direct3D 11 and its threaded counterpart, the deferred context pipeline [9]. Together with the defined variables and the choice of geometry, this serves as a basis for evaluating and answering the research question.

Measurements were logged automatically by the benchmark. The values of the variables were defined before the program was run, to ensure that both APIs perform the same amount of work. Since the aim was to build an application in line with the concept of a benchmark tool, user interaction was kept to a minimum: the user specified the variables for a test before running it and had no further control over the data collection.

3.1.1 Application Structure

The application was designed with simplicity and correctness in mind, following the principles shown in the MSDN documentation [2]. The main program initializes the Windows interface and handles the basic message loop, leaving the rest of the processing time for API testing. The purpose of the main loop is to iterate over the tests, whose parameters are defined in a separate file. Each test uses a DirectX variable that represents the API interface. This variable is initialized with a vertex buffer sized for the maximum number of vertices used in that specific test, and it is used to render and measure the performance of each API. The execution time measured each frame is divided into two parts: one part that populates the command lists, referred to as time spent by the CPU, and a second part that covers the execution of the prepared lists, referred to as time spent by the GPU.
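The two-part frame measurement can be sketched with `std::chrono::steady_clock`. The callbacks are hypothetical hooks standing in for the benchmark's own code: `recordCommands` fills the command lists, and `executeAndWait` submits them and blocks until the GPU has finished, which is what makes the second interval a measure of GPU execution rather than of submission alone.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct FrameTimes {
    double cpuMs;  // time spent populating command lists
    double gpuMs;  // time from submission until the GPU reports completion
};

// Times one frame in two consecutive intervals. Both callbacks are
// hypothetical hooks; `executeAndWait` must block until the GPU is done.
FrameTimes measureFrame(const std::function<void()>& recordCommands,
                        const std::function<void()>& executeAndWait) {
    assert(recordCommands && executeAndWait);
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const auto t0 = Clock::now();
    recordCommands();   // CPU part: fill the command lists
    const auto t1 = Clock::now();
    executeAndWait();   // GPU part: execute the lists and wait for completion
    const auto t2 = Clock::now();

    return {Ms(t1 - t0).count(), Ms(t2 - t1).count()};
}
```

Averaging the returned values over many runs then gives the per-sample CPU and total times reported in Chapter 4.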

For each test, the API variable is initialized as the Direct3D 11 version during the first stage, then released and initialized as the Direct3D 12 version for the second stage. After each test, the measured times are written to a file.


Timer Class

The application uses a CPU timer to measure execution time. The basic concept is a class that uses the QueryPerformanceFrequency function to retrieve the frequency of the high-resolution performance counter. This frequency is then used together with the QueryPerformanceCounter function to obtain timestamps and calculate elapsed time. Measuring the time spent on GPU execution requires additional steps. For Direct3D 11, a timer class based on timestamp queries is used to force the CPU to wait until the GPU has finished executing. In Direct3D 12, this is achieved by using a fence in conjunction with the command queue's Signal function. Once the CPU is made to wait for GPU execution to finish, the CPU timer can be used to measure the execution time.
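A minimal timer along these lines might look as follows. It is a sketch, not the benchmark's class: the name `CpuTimer` is hypothetical, and on non-Windows platforms it falls back to `std::chrono::steady_clock` so that the example stays portable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical CPU timer. On Windows it uses QueryPerformanceFrequency and
// QueryPerformanceCounter; elsewhere it reads std::chrono::steady_clock.
class CpuTimer {
public:
    CpuTimer() {
#if defined(_WIN32)
        LARGE_INTEGER freq;
        QueryPerformanceFrequency(&freq);  // counter ticks per second, fixed at boot
        ticksPerSecond_ = freq.QuadPart;
#else
        ticksPerSecond_ = 1000000000;      // steady_clock is read in nanoseconds below
#endif
    }

    void start() { start_ = now(); }
    void stop() { stop_ = now(); }

    // Elapsed time between start() and stop(), in milliseconds.
    double elapsedMs() const {
        assert(stop_ >= start_);
        return static_cast<double>(stop_ - start_) * 1000.0 /
               static_cast<double>(ticksPerSecond_);
    }

private:
    std::int64_t now() const {
#if defined(_WIN32)
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        return counter.QuadPart;
#else
        using namespace std::chrono;
        return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
#endif
    }

    std::int64_t ticksPerSecond_ = 1;
    std::int64_t start_ = 0;
    std::int64_t stop_ = 0;
};
```

To time GPU work in Direct3D 12, `stop()` would be called only after a fence wait: `queue->Signal(fence, value)`, then, if `fence->GetCompletedValue()` is still below `value`, `fence->SetEventOnCompletion(value, event)` followed by `WaitForSingleObject(event, INFINITE)`.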

Shaders

The shaders used in the application have been minimized to reduce their impact on performance. In the vertex shader stage, the vertices are simply passed on to the rasterization stage, because their positions are already defined in homogeneous clip space coordinates. The pixel shader returns a constant color and does no additional work.
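Because the vertex shader passes clip-space positions straight through, where a point ends up on screen is decided entirely by the fixed-function steps that follow: the perspective divide and the viewport transform. The sketch below reproduces the standard Direct3D viewport mapping for an 800x600 window such as the one used in the tests. `clipToScreen` and its default parameters are illustrative assumptions, not code from the benchmark.

```cpp
#include <cassert>

struct ClipPos { float x, y, z, w; };
struct ScreenPos { float x, y, depth; };

// Perspective divide followed by the Direct3D viewport transform.
// NDC x and y run from -1 to 1 with +y pointing up, while screen y grows
// downwards from the top-left corner, hence the (1 - y) flip.
ScreenPos clipToScreen(ClipPos p, float width, float height,
                       float topLeftX = 0.0f, float topLeftY = 0.0f,
                       float minDepth = 0.0f, float maxDepth = 1.0f) {
    assert(p.w != 0.0f);
    const float ndcX = p.x / p.w;
    const float ndcY = p.y / p.w;
    const float ndcZ = p.z / p.w;
    return {topLeftX + (ndcX + 1.0f) * 0.5f * width,
            topLeftY + (1.0f - ndcY) * 0.5f * height,
            minDepth + ndcZ * (maxDepth - minDepth)};
}
```

A point at the clip-space origin with w = 1 lands at pixel (400, 300), the center of an 800x600 window, while (-1, 1) maps to the top-left corner.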

3.1.2 Test Definitions

Each test specifies a number of samples, an increment of vertices per sample, and the number of threads to use. In each test, the vertex buffer is initialized to the maximum size needed. Rendering starts at zero points, and for each sample the number of vertices to render increases by the chosen increment. To make the measured times more reliable, each test also has a variable that determines how many times it is run. The average of the measured times is calculated when data gathering is complete.
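The test definition described above can be expressed as a small data structure. The field and function names are hypothetical; the sketch only shows how the sampled vertex counts follow from the number of samples and the increment, and how the per-sample mean is taken. For example, a range of 0 to 400 000 vertices in steps of 4000 corresponds to 101 samples.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical description of one test; field names are illustrative only.
struct TestDefinition {
    std::uint32_t samples;    // number of vertex counts to measure
    std::uint32_t increment;  // vertices added per sample
    std::uint32_t threads;    // worker threads filling command lists
    std::uint32_t runs;       // repetitions averaged per sample
};

// Vertex counts rendered by the test: 0, increment, 2 * increment, ...
std::vector<std::uint32_t> vertexCounts(const TestDefinition& def) {
    std::vector<std::uint32_t> counts(def.samples);
    for (std::uint32_t i = 0; i < def.samples; ++i) counts[i] = i * def.increment;
    return counts;
}

// Arithmetic mean of the times measured for one sample over all runs.
double meanMs(const std::vector<double>& timesMs) {
    assert(!timesMs.empty());
    return std::accumulate(timesMs.begin(), timesMs.end(), 0.0) /
           static_cast<double>(timesMs.size());
}
```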


Chapter 4

Performance Tests

This chapter presents the data obtained by running the benchmark on a test platform. Each test includes conclusions. These conclusions are drawn from a single test platform and therefore cannot be used to estimate performance in general outside of the experiment conducted.

4.1 Test Parameters

Test computer specifications:

Component          Description
CPU                Intel(R) Xeon(R) CPU E5-1620 v4 3.50GHz (four cores)
GPU                NVIDIA GeForce GTX 1080, driver version 368.39
Operating system   Microsoft Windows 10 Build 10.0.10586.0
DirectX            Microsoft DirectX 11 and Microsoft DirectX 12
Development        Microsoft Visual Studio Community 2015 with Update 1

Every test was executed in windowed mode at a resolution of 800x600. Each test produces the same visual output. During measurement, the Present function of the swap chain was disabled to ensure that no synchronization was used when rendering frames.

Sections 4.2 to 4.9 present each test generated by the benchmark, together with the parameters used. Rendering performance is measured in milliseconds. The workload is separated into two categories, CPU and GPU. The CPU part starts measuring at the beginning of the render call and ends when all command lists have been filled with commands; the GPU part measures the execution of the lists on the GPU.

Each test renders zero to 400 000 vertices, in steps of 4000 vertices. This range was chosen after preliminary tests with both higher and lower counts showed linear patterns. Each base test rendered all vertices with one draw call per thread; complementary tests labeled "b" were executed with the same parameters but with one draw call per vertex instead. This is intended to shift the focus to API interaction. To make the measurements more stable, each test was executed 1000 times and the mean of the measured times was used for the results. Since the CPU has four cores with two threads running on each core, tests were limited to a maximum of eight threads.
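The contrast between the base tests and the "b" tests can be quantified with a short calculation. `drawCallsPerFrame` is a hypothetical helper; it assumes, as stated above, one draw call per thread in a base test and one per vertex in a "b" test, with the draw calls spread evenly over the threads.

```cpp
#include <cassert>
#include <cstdint>

struct DrawLoad {
    std::uint64_t total;         // draw calls per frame, all threads together
    std::uint64_t maxPerThread;  // draw calls recorded by the busiest thread
};

// Base tests issue one draw call per thread; "b" tests issue one per vertex,
// spread evenly over the threads.
DrawLoad drawCallsPerFrame(std::uint64_t vertices, std::uint64_t threads, bool perVertex) {
    assert(threads > 0);
    const std::uint64_t total = perVertex ? vertices : threads;
    return {total, (total + threads - 1) / threads};  // ceiling division
}
```

At 400 000 vertices a base test therefore issues at most eight draw calls per frame, while a "b" test issues 400 000, which is why the "b" variants put the weight on API interaction.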


4.2 Test 1

This test uses one thread to fill the command list, and one draw call for all vertices.

[Plot, two panels: "DX11" (series D3D11 CPU and D3D11 Total) and "DX12" (series D3D12 CPU and D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.7.]

Figure 4.1: Test 1, rendered with Direct3D 11 and Direct3D 12

[Plot "DX11 vs DX12": series D3D11 Total and D3D12 Total; x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.7.]

Figure 4.2: Test 1, total render time for both APIs

Test 1 Conclusion

In Figure 4.1, Direct3D 11 shows lower CPU execution time overall, while Direct3D 12 shows lower total execution time until around 144 000 points drawn. Both APIs show linear behavior, with constant time for the CPU portion of the execution, as seen in Figure 4.2.
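A crossover point such as the one around 144 000 points can be located programmatically from logged per-sample totals. The sketch below is hypothetical, and the numbers in its example are made up rather than taken from the benchmark logs. It returns the first sample index at which one series is no longer faster than the other; multiplying that index by the increment (4000 vertices) converts it to a vertex count.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the first index i where a[i] >= b[i], i.e. where series `a`
// (for example Direct3D 12 totals) stops being faster than series `b`
// (for example Direct3D 11 totals). Both must share the same sample points.
std::optional<std::size_t> firstCrossover(const std::vector<double>& a,
                                          const std::vector<double>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] >= b[i]) return i;
    return std::nullopt;  // `a` stays faster over the whole range
}
```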


4.3 Test 1b

This test uses one thread to fill the command list, and one draw call per vertex.

[Plot, two panels: "DX11" (series D3D11 CPU and D3D11 Total) and "DX12" (series D3D12 CPU and D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 14.]

Figure 4.3: Test 1b, rendered with Direct3D 11 and Direct3D 12

[Plot "DX11 vs DX12": series D3D11 Total and D3D12 Total; x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 14.]

Figure 4.4: Test 1b, total render time for both APIs

Test 1b Conclusion

In Figure 4.3, Direct3D 11 shows lower total execution time. Both APIs show linear behavior, as seen in Figure 4.4.


4.4 Test 2

This test uses two threads to fill the command lists, and two draw calls for the vertices, one per thread.

[Plot, two panels: "DX11" (series D3D11 CPU and D3D11 Total) and "DX12" (series D3D12 CPU and D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.6.]

Figure 4.5: Test 2, rendered with Direct3D 11 and Direct3D 12

[Plot "DX11 vs DX12": series D3D11 Total and D3D12 Total; x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.6.]

Figure 4.6: Test 2, total render time for both APIs

Test 2 Conclusion

In Figure 4.5, Direct3D 11 shows lower CPU execution time overall, while Direct3D 12 shows lower total execution time until around 160 000 points drawn. Both APIs show linear behavior, with constant time for the CPU portion of the execution, as seen in Figures 4.5 and 4.6.


4.5 Test 2b

This test uses two threads to fill the command lists, and one draw call per vertex.

[Plot, two panels: "DX11" (series D3D11 CPU and D3D11 Total) and "DX12" (series D3D12 CPU and D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 8.]

Figure 4.7: Test 2b, rendered with Direct3D 11 and Direct3D 12

[Plot "DX11 vs DX12": series D3D11 Total and D3D12 Total; x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 8.]

Figure 4.8: Test 2b, total render time for both APIs

Test 2b Conclusion

In Figure 4.7, Direct3D 11 shows lower total execution time. Both APIs show linear behavior, as seen in Figure 4.8. The difference in execution time is smaller than in Test 1b.


4.6 Test 3

This test uses four threads to fill the command lists, and four draw calls for the vertices, one per thread.

[Plot, two panels: "DX11" (series D3D11 CPU and D3D11 Total) and "DX12" (series D3D12 CPU and D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.6.]

Figure 4.9: Test 3, rendered with Direct3D 11 and Direct3D 12

[Plot "DX11 vs DX12": series D3D11 Total and D3D12 Total; x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 0.6.]

Figure 4.10: Test 3, total render time for both APIs

Test 3 Conclusion

In Figure 4.9, Direct3D 11 shows lower CPU execution time overall, while Direct3D 12 shows lower total execution time until around 224 000 points drawn. Both APIs show linear behavior, with constant time for the CPU portion of the execution, as seen in Figure 4.10.


4.7 Test 3b

This test uses four threads to fill the command lists and issues one draw call per vertex.

[Figure image not reproduced. Two panels, "DX11" (series: D3D11 CPU, D3D11 Total) and "DX12" (series: D3D12 CPU, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 6.]

Figure 4.11: Test 3b, rendered with Direct3D 11 and Direct3D 12

[Figure image not reproduced. One panel, "DX11 vs DX12" (series: D3D11 Total, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 6.]

Figure 4.12: Test 3b, total render time for both APIs

Test 3b Conclusion

In Figure 4.11, Direct3D 11 shows lower total execution time. Both APIs show linear behavior, as seen in Figure 4.12. The difference in execution time is smaller than in Test 2b.


4.8 Test 4

This test uses eight threads to fill the command lists, and the vertices are drawn with eight draw calls, one per thread.

[Figure image not reproduced. Two panels, "DX11" (series: D3D11 CPU, D3D11 Total) and "DX12" (series: D3D12 CPU, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 1.8.]

Figure 4.13: Test 4, rendered with Direct3D 11 and Direct3D 12

[Figure image not reproduced. One panel, "DX11 vs DX12" (series: D3D11 Total, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 1.8.]

Figure 4.14: Test 4, total render time for both APIs

Test 4 Conclusion

In Figure 4.13, Direct3D 11 shows lower total execution time. Both APIs show linear behavior, as seen in Figure 4.14.


4.9 Test 4b

This test uses eight threads to fill the command lists and issues one draw call per vertex.

[Figure image not reproduced. Two panels, "DX11" (series: D3D11 CPU, D3D11 Total) and "DX12" (series: D3D12 CPU, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 6.]

Figure 4.15: Test 4b, rendered with Direct3D 11 and Direct3D 12

[Figure image not reproduced. One panel, "DX11 vs DX12" (series: D3D11 Total, D3D12 Total); x-axis: vertices, 0 to 400 000; y-axis: milliseconds, 0 to 6.]

Figure 4.16: Test 4b, total render time for both APIs

Test 4b Conclusion

In Figure 4.15, Direct3D 11 shows lower total execution time, although the two APIs are very similar at the lowest vertex counts. Both APIs show linear behavior, as seen in Figure 4.16. The difference in execution time is smaller than in Test 2b.


Chapter 5

Analysis and Discussion

This chapter contains the analysis and discussion of the tests. The reflections here are based on the MSDN documentation and on experience gained while working on the thesis project.

5.1 Analysis

This thesis focuses on API interaction and the rendering of basic shaded geometry. The research question asked whether Direct3D 12 would achieve higher rendering performance than Direct3D 11 when rendering basic shaded geometry. The test results suggest that, in this specific experiment, Direct3D 12 does not automatically offer higher rendering performance. In most cases Direct3D 11 performs better, but in some cases Direct3D 12 is ahead.

In the tests that use one draw call per thread, Direct3D 12 executes faster at the lower end of the graphs in all cases except when eight threads are used. In the complementary tests, which are intended to shift the focus to API interaction, Direct3D 11 performs better in all cases. It is worth noting that this gap decreases significantly as more threads are used. In Test 4b the two APIs reach similar execution times at the lower vertex counts.
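A deliberately simple cost model can make the role of the complementary tests concrete. It is an assumption for illustration and is not fitted to the thesis data. Let $n$ be the number of vertices, $D$ the number of draw calls, $c_0$ a fixed per-frame cost, $c_d$ the cost of issuing one draw call and $c_v$ the cost per vertex (all hypothetical symbols):

$$t(n) \approx c_0 + c_d\,D + c_v\,n$$

With one draw call per thread and $T$ threads, $D = T$. With one draw call per vertex, $D = n$:

$$D = T:\quad t(n) \approx (c_0 + c_d\,T) + c_v\,n \qquad\qquad D = n:\quad t(n) \approx c_0 + (c_d + c_v)\,n$$

Under this model, the per-draw-call cost only shifts the intercept in the primary tests, but it enters the slope in the complementary tests. Any difference in draw-call overhead between the two APIs therefore grows with the vertex count.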

5.2 Discussion

This result somewhat contradicts other studies on the subject, which show that Direct3D 12 gives higher performance. One of these studies clearly shows that Direct3D 12 requires significantly less CPU time than Direct3D 11. In that study, the Star Swarm benchmark, developed by Oxide Games, stress-tests API efficiency [10].

One possible reason this is not evident in this experiment is that the benchmark application does not use Direct3D 12 to its full extent. The Direct3D 12 version of the benchmark was structured to mimic the functionality of Direct3D 11, which is most likely not the optimal use of Direct3D 12. The Direct3D 12 programming guide even states that using fences to wait for the previous frame to finish is not best practice [2]. Doing so leaves the CPU waiting and wastes valuable execution time that could be used in other ways.
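A common alternative is to keep several frames in flight. The CPU then waits only when it is about to reuse resources that the GPU may still be reading. The C++ below is a hypothetical, CPU-only model of that idea using standard threads. `SimulatedFence` stands in for an `ID3D12Fence`, whose real interface includes `Signal`, `GetCompletedValue` and `SetEventOnCompletion`. `RunFrames` and `framesInFlight` are illustrative names, not part of the thesis benchmark. The one-millisecond sleep is an arbitrary placeholder for GPU work.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical CPU-side stand-in for a Direct3D 12 fence: a counter that the
// simulated GPU raises when it finishes a frame and that the CPU can wait on.
class SimulatedFence {
public:
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            completed_ = value;
        }
        cv_.notify_all();
    }
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
    std::uint64_t Completed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

// Submits frameCount frames while allowing at most framesInFlight unfinished
// frames on the simulated GPU; framesInFlight == 1 mirrors waiting for the
// previous frame. Returns the most frames the CPU was ever ahead of the GPU.
std::uint64_t RunFrames(std::uint64_t frameCount, std::uint64_t framesInFlight) {
    assert(framesInFlight > 0);
    SimulatedFence fence;
    std::mutex queueMutex;
    std::condition_variable queueCv;
    std::deque<std::uint64_t> submitted;  // fence values of submitted frames
    bool finished = false;

    // Simulated GPU: executes frames in submission order, then signals the fence.
    std::thread gpu([&] {
        for (;;) {
            std::uint64_t frame;
            {
                std::unique_lock<std::mutex> lock(queueMutex);
                queueCv.wait(lock, [&] { return finished || !submitted.empty(); });
                if (submitted.empty()) return;  // finished and fully drained
                frame = submitted.front();
                submitted.pop_front();
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // placeholder GPU work
            fence.Signal(frame);
        }
    });

    std::uint64_t maxAhead = 0;
    for (std::uint64_t frame = 1; frame <= frameCount; ++frame) {
        // Block only if the frame that last used this frame's resource slot
        // (framesInFlight frames ago) has not finished on the GPU yet.
        if (frame > framesInFlight) fence.WaitFor(frame - framesInFlight);
        // ... command lists for this frame would be recorded here ...
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            submitted.push_back(frame);
        }
        queueCv.notify_one();
        maxAhead = std::max(maxAhead, frame - fence.Completed());
    }
    {
        std::lock_guard<std::mutex> lock(queueMutex);
        finished = true;
    }
    queueCv.notify_one();
    gpu.join();
    return maxAhead;
}
```

`RunFrames(100, 1)` reproduces the wait-every-frame pattern: the GPU sits idle while the CPU records the next frame. `RunFrames(100, 3)` lets the CPU record up to three frames ahead. In both cases the returned value never exceeds `framesInFlight`.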

An observation made while running the benchmark was that Direct3D 11 utilizes more of the CPU and allocates more threads for the application than Direct3D 12. This could indicate that the Direct3D 11 driver handles optimization automatically, while in Direct3D 12 the user has to be more explicit about resource usage. This reasoning aligns with the observation that the differences in performance are less noticeable when more threads are used, as the CPU then spends less time waiting during the Direct3D 12 execution.

In the primary tests, which use one draw call per thread, Direct3D 12 executed faster at the lower vertex counts, with the exception of the eight-thread test. This could be beneficial when rendering objects with less complex geometry, which is often the case in graphical rendering.

The testing approach in this thesis might not align with the recommended use of the Direct3D 12 API. One reason is that Direct3D 12 lets the settings of several pipeline stages be set together in a single pipeline state object, while the test specifies only the bare minimum needed to render shaded geometry. In Direct3D 11 these settings are set separately, which allows for more basic use. Direct3D 12 might therefore achieve higher efficiency in a workload that needs all stages of the pipeline.


Chapter 6

Conclusions and Future Work

The conclusion of this experiment is that Direct3D 12 did offer higher rendering performance, but only in specific cases, when rendering basic shaded geometry with a benchmark designed according to the method described in this thesis.

A further conclusion is that getting higher rendering performance from Direct3D 12 may not be as simple as imitating the implementation of a Direct3D 11 application. To use Direct3D 12 to its full extent, care must be taken to ensure that the application's design aligns with the recommended usage of the API. With Direct3D 12 the user has more control and more responsibility. The Direct3D 11 driver does much of this work for its user, for example offloading the render thread and optimizing resource residency [11].

Since the benchmark application shows performance that contradicts other studies, one direction for future work is to evaluate whether the benchmark uses the Direct3D 12 API in the intended manner. The hardware and software used could also be factors, so the developed benchmark could give different results on a platform other than the one used in this thesis. Because multi-core support is one of the aims of Direct3D 12, the study could also benefit from tests with more CPU cores available. This would reveal whether, given these test parameters, Direct3D 12 could give higher rendering performance than Direct3D 11 when more work can be done in parallel.


References

[1] K. Karimi, N. Dickson, and F. Hamze, A Performance Comparison of CUDA and OpenCL. Cornell University Library, 2010. [Online] Available from: http://arxiv.org/abs/1005.2581 Accessed: 22 April 2015.

[2] MSDN Direct3D 12 Programming Guide. Available from: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899121(v=vs.85).aspx Accessed: 20 September 2016.

[3] DirectX Developer Blog. Available from: http://blogs.msdn.com/b/directx/archive/2014/03/20/directx-12.aspx Accessed: 20 September 2016.

[4] MSDN Graphics Pipeline. Available from: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx Accessed: 20 September 2016.

[5] F. D. Luna, Introduction to 3D Game Programming with DirectX 11. Dulles: David Pallai, 2012.

[6] MSDN Rendering. Available from: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476892(v=vs.85).aspx Accessed: 20 September 2016.

[7] AMD DirectX 12. Available from: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476892(v=vs.85).aspx Accessed: 20 September 2016.

[8] MSDN Rasterization Rules. Available from: https://msdn.microsoft.com/en-us/library/windows/desktop/cc627092(v=vs.85).aspx Accessed: 20 September 2016.

[9] J. Zink, M. Pettineo, and H. J, Practical Rendering and Computation with Direct3D 11. A K Peters: CRC Press, 2011.

[10] DirectX 12 Performance Preview. Available from: http://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm Accessed: 20 September 2016.


[11] GDC Advanced Rendering with DirectX12. Available from: http://developer.download.nvidia.com/gameworks/events/GDC2016/AdvancedRenderingwithDirectX11andDirectX12.pdf Accessed: 21 September 2016.