SIGGRAPH Asia 2008 Modern OpenGL


Upload: mark-kilgard

Post on 06-May-2015


DESCRIPTION

A long-time implementer of OpenGL (Mark Kilgard, NVIDIA) and the system's original architect (Kurt Akeley, Microsoft) explain OpenGL's design and evolution. OpenGL's state machine is now a complex data-flow with multiple programmable stages. OpenGL practitioners can expect candid design explanations, advice for programming modern GPUs, and insight into OpenGL's future. These slides were presented at SIGGRAPH Asia 2008 for the "Modern OpenGL: Its Design and Evolution" course.

Course abstract: OpenGL was conceived in 1991 to provide an industry standard for programming the hardware graphics pipeline. The original design has evolved considerably over the last 17 years. Capabilities mandated by OpenGL, such as texture mapping and a stencil buffer, were present only on the world's most expensive graphics hardware back in 1991; these features are now pervasive in PCs and even available in several hand-held devices. Over that time, OpenGL's original fixed-function state machine has evolved into a complex data-flow including several application-programmable stages, and the performance of OpenGL has increased by 100x to over 1,000x in many important raw graphics operations. In this course, a long-time implementer of OpenGL and the system's original architect explain OpenGL's design and evolution. You will learn how the modern (post-2006) graphics hardware pipeline is exposed through OpenGL. You will hear Kurt Akeley's personal retrospective on OpenGL's development. You will learn nine ways to write better OpenGL programs. You will learn how modern OpenGL implementations operate. Finally, we discuss OpenGL's future evolution. Whether you program with OpenGL or with another API such as Direct3D, this course will give you new insights into graphics hardware architecture, programmable shading, and how to best take advantage of modern GPUs.

TRANSCRIPT

Page 1: SIGGRAPH Asia 2008 Modern OpenGL

Page 2: SIGGRAPH Asia 2008 Modern OpenGL

Mark J. Kilgard, NVIDIA
Kurt Akeley, Microsoft Research

13 December 2008, Singapore

Modern OpenGL: Its Design and Evolution

Page 3: SIGGRAPH Asia 2008 Modern OpenGL

Introductions

Page 4: SIGGRAPH Asia 2008 Modern OpenGL

Kurt Akeley

• Led development of OpenGL at Silicon Graphics (SGI)
• Co-founded SGI
• Led development of SGI’s high-end graphics hardware
• Co-author of OpenGL specification
• Returned to Stanford University to complete Ph.D.
• Co-developed Cg “C for graphics” language at NVIDIA
• Principal Researcher, Microsoft Research Silicon Valley
• Spent time at Microsoft Research Asia in Beijing
• Member of US National Academy of Engineering

Page 5: SIGGRAPH Asia 2008 Modern OpenGL

Mark Kilgard

• Principal System Software Engineer, NVIDIA, Austin, Texas
• Developed original OpenGL driver for 1st GeForce GPU
• Specified many key OpenGL extensions
• Works on Cg for portable programmable shading
• NVIDIA Distinguished Inventor
• Before NVIDIA, worked at Silicon Graphics
• Worked on X Window System integration for OpenGL
• Developed popular OpenGL Utility Toolkit (GLUT)
• Wrote book on OpenGL and X, co-authored Cg Tutorial

Page 6: SIGGRAPH Asia 2008 Modern OpenGL

Marc Levoy

• Moderator for our facilitated discussion
• Professor of Computer Science and Electrical Engineering
• Stanford University
• SIGGRAPH Computer Graphics Achievement Award
• ACM Fellow

Page 7: SIGGRAPH Asia 2008 Modern OpenGL

Course Schedule

• Modern OpenGL (Kilgard)
• OpenGL’s evolution: a personal retrospective (Akeley)
• Writing better OpenGL (Kilgard)
• Implementing OpenGL (Kilgard)
• OpenGL’s future evolution (Kilgard)
• OpenGL in Context (Akeley, Kilgard, Levoy)
  • Facilitated conversation

– Mid-session break –

Page 8: SIGGRAPH Asia 2008 Modern OpenGL

Check Out the Course Notes (1)

• Look to www.opengl.org web site for our final slides
• New Material
  • “An Incomplete History of OpenGL” (Kilgard)
    • How the OpenGL graphics system developed
  • “Using Vertex Buffer Objects Well” (Kilgard)
    • Learn how to use vertex buffer objects for high vertex processing rates

Page 9: SIGGRAPH Asia 2008 Modern OpenGL

Check Out the Course Notes (2)

• Paper Reprints
  • OpenGL design rationale from its specification co-authors (Segal, Akeley)
  • Realizing OpenGL: two implementations of one architecture (Kilgard)
  • Graphics hardware: GTX, RealityEngine, InfiniteReality, GeForce 6800
    • Key developments in graphics hardware design over last 20 years
  • GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers
  • “How GPUs Work” (Luebke, Humphreys)

Page 10: SIGGRAPH Asia 2008 Modern OpenGL

Modern OpenGL

Mark Kilgard

Principal System Software Engineer

NVIDIA

Page 11: SIGGRAPH Asia 2008 Modern OpenGL

Modern OpenGL

• History
  • How did OpenGL get where it is now?
• Present
  • Version 3.0
  • Functionality beyond 3.0

Page 12: SIGGRAPH Asia 2008 Modern OpenGL

An Overview History of OpenGL

• Pre-history 1991
  • IRIS GL, a proprietary Graphics Library by SGI
• OpenGL, an open standard for 3D
  • Focus: procedural hardware-accelerated 3D graphics
  • Governed by Architecture Review Board (ARB)
  • Extensibility planned into design
• Competition
  • Proprietary APIs (1991-1995)
  • PHIGS & PEX for X Window System (1992-1997)
  • Microsoft’s Direct3D (1998-)

Page 13: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL’s Pre-history

[Timeline figure, 1983-2008 = 25 years. Dates are for shipping commercial SGI implementation. Years marked: 1983, 1986, 1988, 1991, 1993; 1989 marks first work on GL 5.0 proposal.]

• IRIS GL 1. Window system: MEX
• IRIS GL 2. Window system: MEX. Operating system: UNIX
• IRIS GL 3. Window system: NeWS/X11. Operating system: IRIX 3.x
• IRIS GL 4. Window system: Native X11. Operating system: IRIX 4.3
• OpenGL 1.0. Window system: Native X11 with GLX. Operating system: IRIX 5.1

Page 14: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL’s Design Philosophy

• High-performance
  • Assumes hardware acceleration
• Defined by a specification
  • Rather than a de-facto implementation
• Rendering state machine
  • Procedural
  • Not a window system, not a scene graph
• No initial sub-setting
• Extensible
• Data type rich
• Cross-platform
  • Window system-independent core
  • X Window System, Microsoft Windows, OS/2, OS X, etc.
• Multi-language bindings
  • C, FORTRAN, etc.
• Not merely an API, rather a system

Page 15: SIGGRAPH Asia 2008 Modern OpenGL

Timeline of OpenGL’s Development

[Timeline figure; axis: 1992 1994 1996 1998 2000 2002 2004 2006 2008]

Versions: OpenGL 1.0 approved, OpenGL 1.1, OpenGL 1.2, Multitexture added (1.2.1), OpenGL 1.3, OpenGL 1.4, OpenGL 1.5, OpenGL 2.0, OpenGL 2.1, OpenGL 3.0

Milestones:
• SGI InfiniteReality
• OpenGL Utility Toolkit (GLUT) released
• Mesa3D open source
• Khronos controls OpenGL
• 1st GPU for PCs with single-chip transform & lighting for OpenGL (GeForce)
• NT 3.51 brings OpenGL to PCs
• OpenGL ES for embedded devices
• 1st commercial OpenGL implementation (DEC)

Page 16: SIGGRAPH Asia 2008 Modern OpenGL

Competitive 3D APIs

• OpenGL has always existed in competition with other APIs
  • Strengthened OpenGL by driving feature parity
• OpenGL’s competitive strengths:
  1. Cross platform, open process
  2. API stability, extensibility
  3. Clean initial design & specification

[Timeline figure; axis: 1992 1994 1996 1998 2000 2002 2004 2006 2008]
• Proprietary Unix workstation 3D APIs: XGL, Doré, Starbase, IRIS GL
• X Consortium 3D standard: PEX
• Microsoft Direct3D: DirectX 3, DirectX 5, DirectX 6, DirectX 7, DirectX 8, DirectX 9, DirectX 10

Page 17: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.0

• Immediate mode
• Vertex transformation and lighting
• Points, lines, polygons
• Stippling, wide points and lines
• Bitmaps, image rectangles, and pixel reads
• Pixel store and transfer
• 1D and 2D textures, fog, and scissor
• Display lists and evaluators
• RGBA and color index color models
• Color, depth, stencil, and accumulation buffers
• Selection and feedback modes
• Queries

Page 18: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL State Machine

• From OpenGL 3.0 specification, unchanged since 1.0

Page 19: SIGGRAPH Asia 2008 Modern OpenGL

SGI “Classic” Hardware View of OpenGL

• Entirely fixed-function, no programmability
• High-end SGI hardware manifested functionality in distinct chips

[Pipeline diagram, 1992. Legend: graphics data flow, memory operations, fixed-function unit, programmable unit]
Boxes: 3D Application or Game; OpenGL API; Graphics Hardware Boundary; Front End; Vertex Assembly; Vertex Transform & Lighting; Primitive Assembly, Clipping, Setup, and Rasterization; Texture & Fog; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface

Page 20: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.1

• Vertex arrays
• Texture objects
• Texture internal formats
• Texture sub-image updates
• Texture proxies
• Copy framebuffer-to-texture
• Polygon offset
• RGBA logical operations

Page 21: SIGGRAPH Asia 2008 Modern OpenGL

The Look of OpenGL 1.1

[Figure panels: SGI skyfly demo; stenciled shadow volumes; Ideas in Motion]

Page 22: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.2

• 3D textures
• Texture edge clamp wrap mode
• Texture level-of-detail clamping
• BGRA component order
• Packed pixel formats
• Imaging subset (optional)
• Normal rescaling
• Separate specular
• Vertex array draw elements range

Page 23: SIGGRAPH Asia 2008 Modern OpenGL

Akeley’s (Modernized) OpenGL Data Flow

[Data-flow diagram]
Blocks: client memory; vertex puller; vertex shading; rasterization & fragment shading; texture; raster operations; framebuffer; pixel unpack; pixel transfer; pixel pack
Annotations: blending, depth testing, stencil testing, accumulation; storage operations

Commands shown on the data paths:
• glVertex*, glColor*, glTexCoord*, etc.
• glDrawElements, glDrawArrays
• glTex{Sub}Image, glCopyTex{Sub}Image
• glDrawPixels, glBitmap, glCopyPixels
• glReadPixels / glCopyPixels / glCopyTex{Sub}Image
• selection / feedback / transform feedback

Page 24: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.2 Imaging Subset

[Pixel path diagram ending at Pixel Rectangle Rasterization; inputs: index pixels and RGBA pixels]
• Core functionality: Scale & Bias; Shift & Add; Look-up Table (RGBA-to-RGBA); Look-up Table (Index-to-RGBA)
• ARB_imaging subset: Color Table; Convolution (separable or general); Post-convolve Scale & Bias; Post-convolve Color Table; Color Matrix; Post-color matrix Scale & Bias; Post-color matrix Color Table; Histogram; Min-max
• Two paths in the diagram are labeled “discard”

Page 25: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.2.1

• Multi-texture (optional)

Page 26: SIGGRAPH Asia 2008 Modern OpenGL

Multi-texture Poster Child: Quake 2 Light Maps

[Figure: lightmaps only (modulate) decal only = combined scene]
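The "modulate" step in the figure can be sketched as a component-wise multiply of the decal color by the lightmap color. This is a minimal illustration only; the function name and the texel values are hypothetical and not taken from the slides.

```python
def modulate(decal, lightmap):
    """Component-wise product of two RGB colors with components in [0, 1]."""
    return tuple(d * l for d, l in zip(decal, lightmap))

# Hypothetical texels: a brick-colored decal under a half-intensity lightmap.
brick = (0.8, 0.4, 0.2)
half_lit = (0.5, 0.5, 0.5)
combined = modulate(brick, half_lit)
```

With multi-texture, both texels are fetched and combined in a single pass instead of blending two rendering passes.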

Page 27: SIGGRAPH Asia 2008 Modern OpenGL

GeForce 256 (NV10) View of OpenGL

• Vertex pulling (vertex buffer objects) via DMA
• Dual-texture, cube maps, and register combiners

[Pipeline diagram, 1999]
Boxes: 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Transform & Lighting; Primitive Assembly, Clipping, Setup, and Rasterization; Texture & Fog; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch

Page 28: SIGGRAPH Asia 2008 Modern OpenGL

Hardware Cube Maps

[Figure: rendered scene and dynamically created cube map image. Image credit: “Guts” GeForce 2 GTS demo, Thant Thessman]

Page 29: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.3

• Multi-texture (required now)
• Cube map texturing
• Compressed texture formats
• Texture border clamp
• Texture environment functions
  • Add, combine, dot product
• Multisample anti-aliasing
• Transpose matrix

Page 30: SIGGRAPH Asia 2008 Modern OpenGL

GeForce 3 & 4 Ti (NV2x) View of OpenGL

• Programmable vertex processing
• Highly configurable fragment processing

[Pipeline diagram, 2001]
Boxes: 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly, Clipping, Setup, and Rasterization; Multi-texture shaders & Combiners; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch

Page 31: SIGGRAPH Asia 2008 Modern OpenGL

Vertex Programmability

[Figure panels: paletted matrix skinning; twister vertex program; per-vertex cartoon shading]

Page 32: SIGGRAPH Asia 2008 Modern OpenGL

Configurable Fragment Processing

[Figure panels: bumpy shiny environment mapping; chromatic aberration; offset 2D bump mapping; depth sprites]

Page 33: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.4

• Automatic mipmap generation
• Shadow-mapping
  • Depth textures and shadow comparisons
• Texture level-of-detail bias
• Texture mirrored repeat wrap mode
• Multi-texture combination
• Fog coordinate
• Secondary color
• Configurable point size attenuation
• Color blending improvements
• Stencil wrap operations
• Window-space raster position specification

Page 34: SIGGRAPH Asia 2008 Modern OpenGL

Hardware Shadow Mapping

[Figure panels: without shadow mapping; with shadow mapping; depth map from light source’s view (darker is closer); light position marked]

• Key enablers: Projective Texturing (1.0) & Polygon Offset (1.1)

Page 35: SIGGRAPH Asia 2008 Modern OpenGL

Shadow Mapping Explained

[Figure: planar distance from light, compared (≤, less than or equals) against the depth map projected onto scene; true “un-shadowed” region shown green]
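The comparison in the figure can be sketched as a single per-fragment test: a fragment is un-shadowed when its distance from the light is less than or equal to the depth stored in the light's depth map. The function name and the `bias` parameter below are assumptions for illustration; `bias` merely stands in for the role polygon offset plays in avoiding self-shadowing.

```python
def is_lit(distance_from_light, depth_map_value, bias=0.0):
    """Shadow-map comparison for one fragment.

    distance_from_light: the fragment's planar distance from the light.
    depth_map_value: nearest-occluder depth read from the light's depth map.
    bias: hypothetical tolerance standing in for polygon offset.
    """
    return distance_from_light <= depth_map_value + bias

# A fragment on the nearest surface is lit; one behind an occluder is not.
on_surface = is_lit(0.5, 0.5)
behind_occluder = is_lit(0.7, 0.5)
```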

Page 36: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 1.5

• Vertex buffer objects (VBOs)
• Occlusion queries
• Generalized shadow mapping functions

Page 37: SIGGRAPH Asia 2008 Modern OpenGL

GeForce FX (NV3x) View of OpenGL

• Programmable fragment processing
• 16 texture units, IEEE 754 32-bit floating-point
• Vertex program branching

[Pipeline diagram, 2003]
Boxes: 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly, Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch

Page 38: SIGGRAPH Asia 2008 Modern OpenGL

Floating-point Fragment Programmability

Page 39: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL Fragment Program Flowchart

[Flowchart]
• Begin Fragment
• Initialize Parameters
  • Temporary Registers initialized to 0,0,0,0
  • Output Depth & Color Registers initialized to 0,0,0,1
• Fragment Program Instruction Loop
  • Fetch & Decode Next Instruction (from Fragment Program Instruction Memory)
  • Read Interpolants and/or Registers (Primitive Interpolants)
  • Map Input values: Swizzle, Negate, etc.
  • Texture Fetch Instruction?
    • yes: Compute Texture Address & Level-of-detail & Fetch Texels, then Filter Texels (Texture Images)
    • no: Perform Instruction Math / Operation
  • Write Output Register with Masking
  • More Instructions? (yes: continue the loop; no: exit)
• Emit Output Registers as Transformed Vertex
• End Fragment

Page 40: SIGGRAPH Asia 2008 Modern OpenGL

Key Trend: Configurability becomes Programmability

[Figure labels: Fixed-function; Simple Configurability; Complex Configurability; Programmable]

Page 41: SIGGRAPH Asia 2008 Modern OpenGL

Core OpenGL fragment texturing & coloring

[Data-flow diagram]
• From Primitive Assembly: Point Rasterization, Line Rasterization, Polygon Rasterization
• DrawPixels: Pixel Rectangle Rasterization
• Bitmap: Bitmap Rasterization
• Conventional Texture Fetching: Texture Unit 0, Texture Unit 1
• Texture Environment Application: Texture Unit 0, Texture Unit 1
• Color Sum
• Fog
• Coverage Application
• To raster operations

Page 42: SIGGRAPH Asia 2008 Modern OpenGL

NV1x OpenGL fragment texturing & coloring

[Data-flow diagram]
• From Primitive Assembly: Point Rasterization, Line Rasterization, Polygon Rasterization
• DrawPixels: Pixel Rectangle Rasterization
• Bitmap: Bitmap Rasterization
• Conventional Texture Fetching: Texture Unit 0, Texture Unit 1
• Texture Environment Application: Texture Unit 0, Texture Unit 1
• Register Combiners (GL_REGISTER_COMBINERS_NV enable): General Stage 0, General Stage 1, Final Stage
• Color Sum
• Fog
• Coverage Application
• To raster operations

Page 43: SIGGRAPH Asia 2008 Modern OpenGL

NV2x OpenGL fragment texturing & coloring

[Data-flow diagram]
• From Primitive Assembly: Point Rasterization, Line Rasterization, Polygon Rasterization
• DrawPixels: Pixel Rectangle Rasterization
• Bitmap: Bitmap Rasterization
• Conventional Texture Fetching: Texture Unit 0, Texture Unit 1, … Texture Unit 3
• Texture Shaders (GL_TEXTURE_SHADER_NV enable): Texture Shader 0, Texture Shader 1, … Texture Shader 3
• Texture Environment Application: Texture Unit 0, Texture Unit 1, … Texture Unit 3
• Register Combiners (GL_REGISTER_COMBINERS_NV enable): General Stage 0, General Stage 1, … General Stage 7, Final Combiner
• Color Sum
• Fog
• Coverage Application
• To raster operations

Page 44: SIGGRAPH Asia 2008 Modern OpenGL

NV3x OpenGL fragment texturing & coloring

[Data-flow diagram]
• From Primitive Assembly: Point Rasterization, Line Rasterization, Polygon Rasterization
• DrawPixels: Pixel Rectangle Rasterization
• Bitmap: Bitmap Rasterization
• Conventional Texture Fetching: Texture Unit 0, Texture Unit 1, … Texture Unit 3
• Texture Shaders (GL_TEXTURE_SHADER_NV enable): Texture Shader 0, Texture Shader 1, … Texture Shader 3
• Texture Environment Application: Texture Unit 0, Texture Unit 1, … Texture Unit 3
• Register Combiners (GL_REGISTER_COMBINERS_NV enable): General Stage 0, General Stage 1, … General Stage 7, Final Combiner
• Fragment Program (GL_FRAGMENT_PROGRAM_NV enable): Fragment Program Instruction 0 … Fragment Program Instruction 1023; !!FP1.0 or !!ARBfp1.0 programs
• Color Sum
• Fog
• Coverage Application
• To raster operations

Page 45: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 2.0

• Programmable shading
  • OpenGL Shading Language (GLSL)
• Multiple color buffer rendering targets
• Non-power-of-two texture dimensions
• Point sprites
• Separate blend equation
• Two-sided stencil testing

Page 46: SIGGRAPH Asia 2008 Modern OpenGL

GeForce 6 & 7 (NV4x/G7x) View of OpenGL

• Limited vertex texturing
• Fragment branching
• Multiple render targets & floating-point blending

[Pipeline diagram, 2004]
Boxes: 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly, Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch

Page 47: SIGGRAPH Asia 2008 Modern OpenGL

GeForce 8 & 9 (G8x/G9x) View of OpenGL

• Primitive (geometry) programs
• Parameter reads from buffer objects
• Transform feedback (stream out)

[Pipeline diagram, 2006]
Boxes: 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch; Parameter Buffer Read

Page 48: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL Pipeline Fixed-function Steps

• Much of functional pipeline remains fixed-function
  • Vital to maintaining performance and data flow
  • Hard to compete with hard-wired rasterization, Zcull, and pixel compression

[Pipeline diagram, 2006]
Boxes: GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch; Parameter Buffer Read

Page 49: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL Pipeline Programmable Domains

• New geometry shader domain for per-primitive programmable processing
• Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains

[Pipeline diagram, 2006. Callout: “Can be unified hardware!”]
Boxes: GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface; Attribute Fetch; Parameter Buffer Read

Page 50: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 2.1

• OpenGL Shading Language (GLSL) improvements
• Non-square matrices
• Pixel buffer objects (PBOs)
• sRGB color space texture formats

Page 51: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 3.0

• OpenGL Shading Language (GLSL) improvements
  • New texture fetches
  • True integer data types and operators
  • switch/case/default flow control statements
• Conditional rendering based on occlusion query results
• Transform feedback
• Vertex array objects
• Floating-point textures, color buffers, and depth buffers
• Half-precision vertex arrays
• Texture arrays
• Integer textures
• Red and red-green texture formats
  • Compressed red and red-green formats
• Framebuffer objects (FBOs)
• Packed depth-stencil pixel formats
• Per-color buffer clearing, blending, and masking
• sRGB color space color buffers
• Fine-grain buffer mapping and flushing

Page 52: SIGGRAPH Asia 2008 Modern OpenGL

Areas of 3.0 Functionality Improvement

• Programmability
  • Shader Model 4.0 features
  • OpenGL Shading Language (GLSL) 1.30
• Texturing
  • New texture representations and formats
• Framebuffer operations
  • Framebuffer objects
  • New formats
  • New copy (blit), clear, blend, and masking operations
• Buffer management
  • Non-blocking and fine-grain update of buffer object data stores
• Vertex processing
  • Vertex array configuration objects
  • Conditional rendering for occlusion culling
  • New half-precision vertex attribute formats
• Pixel processing
  • New half-precision external pixel formats

All Brand New Core Features

Page 53: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 3.0 Programmability

• Shader Model 4.0 additions
  • True signed & unsigned integer values
  • True integer operators: ^, &, |, <<, >>, %, ~
  • Texture additions
    • Texture arrays
    • Base texture size queries
    • Texel offsets to fetches
    • Explicit LOD and derivative control
    • Integer samplers
  • Interpolation modifiers: centroid, noperspective, and flat
  • Vertex array element number: gl_VertexID
• OpenGL Shading Language (GLSL) improvements
  • ## concatenation in pre-processor for macros
  • switch/case/default statements

Page 54: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 3.0 Texturing Functionality

• Texture representation
  • Texture arrays: indexed access to a set of 1D or 2D texture images
• Texture formats
  • Floating-point texture formats
    • Single-precision (32-bit, IEEE s23e8)
    • Half-precision (16-bit, s10e5)
  • Red & red/green texture formats
    • Intended as FBO framebuffer formats too
  • Compressed red & red/green texture formats
  • Shared exponent texture formats
  • Packed floating-point texture formats
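As a rough illustration of the half-precision (16-bit, s10e5) layout named above, the sketch below decodes a 16-bit pattern by hand: 1 sign bit, 5 exponent bits with an assumed bias of 15, and 10 mantissa bits. The function name is hypothetical; the result can be cross-checked against Python's own `struct` half-float format code `e`.

```python
import struct

def decode_half(bits):
    """Decode a 16-bit s10e5 pattern (1 sign, 5 exponent, 10 mantissa bits)."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Zero or denormal: no implied leading 1.
        return sign * (mantissa / 1024.0) * 2.0 ** -14
    if exponent == 31:
        # Infinity or NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normal number: implied leading 1, exponent bias of 15.
    return sign * (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)

def decode_half_reference(bits):
    """Same decode using the standard library, for comparison."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, the pattern 0x3C00 decodes to 1.0 and 0x7BFF to 65504.0, the largest finite half-precision value.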

Page 55: SIGGRAPH Asia 2008 Modern OpenGL

Texture Arrays

• Conventional texture = One logical pre-filtered image
• Texture array = index-able plurality of pre-filtered images
  • Rationale is fewer texture object binds when drawing different objects
  • No filtering between mipmap sets in a texture array
  • All mipmap sets in array share same format/border & base dimensions
  • Both 1D and 2D texture arrays
  • Require shaders, no fixed-function support
• Texture image specification
  • Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays
    • No new OpenGL commands for texture arrays
  • 3rd dimension specifies integer array index
    • No halving in 3rd dimension for mipmaps
    • So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
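The mipmap rule for 2D texture arrays (width and height halve per level, the array index dimension does not) can be sketched as follows. The function name is hypothetical and the code only computes level dimensions; it makes no OpenGL calls.

```python
def texture_array_mip_chain(width, height, layers):
    """List (width, height, layers) for every mipmap level of a 2D texture array.

    Width and height halve (rounding down, minimum 1) at each level;
    the layer count stays fixed because the 3rd dimension is an array index.
    """
    chain = [(width, height, layers)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        chain.append((width, height, layers))
    return chain

# The 64x128x17 example from the slide.
levels = texture_array_mip_chain(64, 128, 17)
```

Each entry corresponds to the size that would be passed to glTexImage3D for that level, with the depth argument carrying the number of array layers.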

Page 56: SIGGRAPH Asia 2008 Modern OpenGL

Texture Arrays Example

• Multiple skins packed in texture array
  • Motivation: binding to one multi-skin texture array avoids texture bind per object

[Figure: grid of skin images; one axis is texture array index (0 1 2 3 4), the other is mipmap level index (0 1 2 3 4)]

Page 57: SIGGRAPH Asia 2008 Modern OpenGL

Compact Floating-point Textures

• Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications

Page 58: SIGGRAPH Asia 2008 Modern OpenGL

Compact Floating-point Texture Formats

• Packed float format
  • No sign bit, independent exponents
  • 32 bits (bit 31 to bit 0): one component with 5-bit mantissa and 5-bit exponent, and two components each with 6-bit mantissa and 5-bit exponent
• Shared exponent format
  • No sign bit, shared exponent, no implied leading 1
  • 32 bits (bit 31 to bit 0): one 5-bit shared exponent and three 9-bit mantissas
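A minimal decoding sketch for the two 32-bit layouts follows. The bit positions (red in the low bits, exponent bias of 15) are assumptions based on the EXT_texture_shared_exponent and EXT_packed_float extension definitions as I understand them, not something the slide states; the function names are hypothetical.

```python
def decode_shared_exponent(word):
    """Decode a 32-bit shared-exponent texel into (r, g, b).

    Assumed layout: red bits 0-8, green bits 9-17, blue bits 18-26,
    shared exponent bits 27-31. No sign bit and no implied leading 1,
    so each value is mantissa * 2^(exponent - 15 - 9).
    """
    scale = 2.0 ** (((word >> 27) & 0x1F) - 15 - 9)
    return tuple(((word >> shift) & 0x1FF) * scale for shift in (0, 9, 18))

def decode_unsigned_float(bits, mantissa_bits):
    """Decode an unsigned float with a 5-bit exponent (bias 15)."""
    exponent = bits >> mantissa_bits
    mantissa = bits & ((1 << mantissa_bits) - 1)
    fraction = mantissa / float(1 << mantissa_bits)
    if exponent == 0:
        return fraction * 2.0 ** -14          # zero or denormal
    if exponent == 31:
        return float("inf") if mantissa == 0 else float("nan")
    return (1.0 + fraction) * 2.0 ** (exponent - 15)

def decode_packed_float(word):
    """Decode a 32-bit packed-float texel into (r, g, b).

    Assumed layout: red is 11 bits at bits 0-10, green is 11 bits at
    bits 11-21, blue is 10 bits at bits 22-31; exponents are independent.
    """
    r = decode_unsigned_float(word & 0x7FF, 6)
    g = decode_unsigned_float((word >> 11) & 0x7FF, 6)
    b = decode_unsigned_float((word >> 22) & 0x3FF, 5)
    return (r, g, b)
```

The shared-exponent form spends its bits on mantissa precision at the cost of one exponent for all three channels, while the packed form keeps an exponent per channel with shorter mantissas.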

Page 59: SIGGRAPH Asia 2008 Modern OpenGL

1- and 2-component Block Compression Scheme

• Basic 1-component block compression format
  • Borrowed from alpha compression scheme of S3TC 5

[Figure: 4x4 Pixel Decoded Block and Encoded Block. The encoded block holds 2 min/max values (8-bit A and 8-bit B, 16 bits) + the remaining bits, 64 bits total per block]

16 pixels x 8-bit/component = 128 bits decoded, so effectively 2:1 compression
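The sketch below decodes one such 64-bit block. It assumes the layout of the DXT5 alpha block as I understand it: two 8-bit endpoint values followed by sixteen 3-bit palette indices (48 bits, little-endian, first pixel in the lowest bits). The slide gives only the 16-bit and 64-bit totals, so the index packing and the function name are assumptions.

```python
def decode_red_block(block):
    """Decode an 8-byte 1-component block into 16 values in 0..255.

    block[0] and block[1] are the two 8-bit endpoints (A and B);
    block[2:8] holds sixteen 3-bit indices into an 8-entry palette.
    """
    a, b = block[0], block[1]
    if a > b:
        # Six values interpolated between the endpoints.
        palette = [a, b] + [((7 - i) * a + i * b) / 7.0 for i in range(1, 7)]
    else:
        # Four interpolated values plus explicit minimum and maximum.
        palette = [a, b] + [((5 - i) * a + i * b) / 5.0 for i in range(1, 5)]
        palette += [0.0, 255.0]
    bits = int.from_bytes(block[2:8], "little")
    return [palette[(bits >> (3 * pixel)) & 0x7] for pixel in range(16)]

# Compression ratio check from the slide: 16 pixels x 8 bits versus 64 bits.
ratio = (16 * 8) / (8 * 8)
```

A 2-component (red/green) block would presumably be two such blocks side by side, one per component.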

Page 60: SIGGRAPH Asia 2008 Modern OpenGL

Framebuffer Operations

• Framebuffer objects
  • Standardized framebuffer objects (FBOs) for rendering to textures and renderbuffers
    • Render-to-texture
  • Multisample renderbuffers for FBOs
• Framebuffer operations
  • Copies from one FBO to another, including multisample data
  • Per-color attachment color clears, blending, and write masking
• Framebuffer formats
  • Floating-point color buffers
  • Floating-point depth buffers
  • Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value
  • Rendering into sRGB color space framebuffers

Page 61: SIGGRAPH Asia 2008 Modern OpenGL

Framebuffer Object Example

• Depth peeling for correctly ordered transparency
  • Great render-to-texture application for FBOs

Page 62: SIGGRAPH Asia 2008 Modern OpenGL

Depth Peeling Behind the Scenes

• Depth buffer has closest fragment at all pixels
  • Save depth buffer
  • Render again, but use depth buffer as shadow map
    • Discard fragment in front of shadow map’s depth value
  • Effectively peels one layer of depth!
• Resulting color buffer is 2nd closest fragment
  • And depth buffer for 2nd closest fragments’ depth
• Now repeat peeling more layers
  • Use ping-pong depth buffer scheme
  • Use occlusion query to detect when no more fragments to peel
• Composite color layers front-to-back (or back-to-front)
  • Front-to-back peeling can be done during the peeling process
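The peeling loop can be simulated on the CPU for a single pixel. This is only a sketch of the idea under simplifying assumptions: the fragment list, the function name, and the use of Python lists in place of depth buffers are all hypothetical, and fragments at exactly equal depth are peeled together here.

```python
def peel_layers(fragments):
    """Order one pixel's fragments nearest-first by repeated peeling.

    fragments: list of (depth, color) tuples in arbitrary order.
    Each pass discards fragments at or in front of the previously saved
    depth, then keeps the nearest survivor (an ordinary depth test).
    """
    layers = []
    saved_depth = float("-inf")  # stands in for the saved depth buffer
    while True:
        survivors = [f for f in fragments if f[0] > saved_depth]
        if not survivors:
            # Plays the role of an occlusion query reporting no fragments.
            break
        nearest = min(survivors, key=lambda f: f[0])
        layers.append(nearest[1])
        saved_depth = nearest[0]  # ping-pong: this pass feeds the next one
    return layers

# Three transparent surfaces submitted out of order.
order = peel_layers([(0.5, "green"), (0.2, "red"), (0.8, "blue")])
```

The returned list is the front-to-back order in which the color layers would be composited.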

Page 63: SIGGRAPH Asia 2008 Modern OpenGL

Delicate Color Fidelity with sRGB

• Problem: PC display devices have non-linear (sRGB) display gamut, so delicate color shading looks wrong

[Figure: NVIDIA’s Adriana GeForce 8 Launch Demo]
• Conventional rendering (uncorrected color): unnaturally deep facial shadows
• Gamma correct (sRGB rendered): softer and more natural

Page 64: SIGGRAPH Asia 2008 Modern OpenGL

What is sRGB?
• A standard color space
  • Intended for monitors, printers, and the Internet
  • Created cooperatively by HP and Microsoft
  • Non-linear, roughly gamma of 2.2
  • Intuitively “encodes more dark values”
• OpenGL 2.1 already added sRGB texture formats
  • Texture fetch converts sRGB to linear RGB, then filters
  • Result takes more than 8-bit fixed-point to represent in shader
• 3.0 adds complementary sRGB framebuffer support
  • “sRGB correct blending” converts framebuffer sRGB to linear, blend with linear color from shader, then convert back to sRGB
  • Works with framebuffer objects (FBOs)
[Figure: sRGB chromaticity]
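For reference, the "roughly gamma of 2.2" curve can be written out. These are the standard sRGB transfer functions as I understand them from the sRGB extension specifications; the symbols $c_s$ (sRGB-encoded component), $c_l$ (linear component), $s$ (shader color), and $d$ (framebuffer color) are notation introduced here, not taken from the slides.

```latex
% sRGB to linear (applied on texture fetch and when reading the framebuffer)
c_l =
\begin{cases}
  \dfrac{c_s}{12.92}, & c_s \le 0.04045 \\[2ex]
  \left( \dfrac{c_s + 0.055}{1.055} \right)^{2.4}, & c_s > 0.04045
\end{cases}

% linear to sRGB (applied when writing the framebuffer)
c_s =
\begin{cases}
  12.92 \, c_l, & c_l < 0.0031308 \\[1ex]
  1.055 \, c_l^{1/2.4} - 0.055, & c_l \ge 0.0031308
\end{cases}

% sRGB-correct blending: decode, blend in linear space, re-encode
d' = \operatorname{encode}\bigl( \operatorname{blend}( s, \operatorname{decode}(d) ) \bigr)
```

As a worked value, an sRGB-encoded 0.5 decodes to about 0.214 in linear space, which is why half of the 8-bit codes end up describing the darkest fifth or so of linear intensities.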

Page 65: SIGGRAPH Asia 2008 Modern OpenGL

So why sRGB? Standard Windows Display is Not Gamma Corrected
• 25+ years of PC graphics, icons, and images depend on not gamma correcting displays
  • sRGB textures and color buffers compensate for this
[Figure: two desktop screenshots. Gamma 2.2: “expected” appearance of Windows desktop & icons, but 3D lighting too dark. Gamma 1.0 (linear color response): washed-out desktop appearance, but 3D lighting is correct.]

Page 66: SIGGRAPH Asia 2008 Modern OpenGL

Vertex Processing
• Vertex array configuration
  • Objects to manage vertex array configuration client state
  • Half-precision floating-point vertex array formats
• Vertex output streaming
  • Stream transformed vertex results into buffer object data stores
• Occlusion culling
  • Skip rendering based on occlusion query result

Page 67: SIGGRAPH Asia 2008 Modern OpenGL

Miscellaneous
• Pixel Processing
  • Half-precision floating-point pixel external formats
• Buffer Management
  • Non-blocking and fine-grain update of buffer object data stores

Page 68: SIGGRAPH Asia 2008 Modern OpenGL

ARB Extensions to OpenGL 3.0
• OpenGL 3.0 standard provides new ARB extensions
  • Extensions go beyond OpenGL 3.0
  • Standardized at same time as OpenGL 3.0
  • Support features in hardware today
• Specifically
  • ARB_geometry_shader4: provides per-primitive programmable processing
  • ARB_draw_instanced: gives shader access to instance ID
  • ARB_texture_buffer_object: allows buffer object to be sampled as a huge 1D unfiltered texture
• Shipping today
  • NVIDIA driver provides all three
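Because these three features are extensions rather than core, an application has to check for them before use. The sketch below shows one careful way to test a space-separated extension string for a whole-token match; a bare `strstr` would wrongly accept a name that is only a prefix of another. The helper name `has_extension` is hypothetical, and the string is passed in as a parameter so the example runs without a GL context; in a real program it would typically be the result of `glGetString(GL_EXTENSIONS)`.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if name appears as a whole space-separated token in ext_list,
 * 0 otherwise. ext_list is assumed to look like the classic
 * GL_EXTENSIONS string: names separated by single spaces. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL) return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) return 1;
        p += len;   /* keep scanning past this partial match */
    }
    return 0;
}
```

A caller might use it as `if (has_extension(exts, "GL_ARB_draw_instanced")) { ... }` and fall back to a non-instanced path otherwise.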

Page 69: SIGGRAPH Asia 2008 Modern OpenGL

Transform Feedback for Terrain Generation by Recursive Subdivision
• Geometry shaders + transform feedback
1. Render quads (use 4-vertex line adjacency primitive) from vertex buffer object
2. Fetch height field
3. Stream subdivided positions and normals to transform feedback “other” buffer object
4. Use buffer object as vertex buffer
5. Repeat, ping-pong buffer objects
Computation and data all stays on the GPU!

Page 70: SIGGRAPH Asia 2008 Modern OpenGL

Skin Deformation

• Capture & re-use geometric deformations

Transform feedback allows the GPU to calculate the interactive, deforming elastic skin of the frog

Page 71: SIGGRAPH Asia 2008 Modern OpenGL

Silhouette Edge Rendering
• Uses geometry shader
[Figure: complete mesh, passed through a silhouette edge detection geometry shader, yields silhouette edges]
• Useful for non-photorealistic rendering
• Looks like human sketching

Page 72: SIGGRAPH Asia 2008 Modern OpenGL

More Geometry Shader Examples
[Figure panels: shimmering point sprites; generate fins for lines; generate shells for fur rendering]

Page 73: SIGGRAPH Asia 2008 Modern OpenGL

Improved Interpolation Techniques
• Using geometry shader functionality
[Figure panels: quadratic normal interpolation; true quadrilateral rendering with mean value coordinate interpolation]

Page 74: SIGGRAPH Asia 2008 Modern OpenGL

“Fair” Quadrilateral Interpolation
glBegin(GL_QUADS);
  glColor3fv(red); glVertex3fv(lowerLeft);
  glColor3fv(green); glVertex3fv(lowerRight);
  glColor3fv(red); glVertex3fv(upperRight);
  glColor3fv(blue); glVertex3fv(upperLeft);
glEnd();
• Geometry shader actually operates on 4-vertex GL_LINES_ADJACENCY_ARB primitives instead of quads
[Figure panels: wrong, slash triangle split; wrong, backslash triangle split; better, mean value coordinates]
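Why both triangle splits are "wrong" shows up already at the quad's center. The sketch below evaluates the center color three ways for the corner colors used in the listing above. It assumes a rectangular quad, where the center sits on both diagonals and, by symmetry, mean value coordinates weight all four corners equally; it does not implement general mean value coordinates. All type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { float r, g, b; } Color;

/* Midpoint of two colors: linear interpolation along a triangle edge. */
static Color midpoint(Color a, Color b)
{
    Color c = { 0.5f * (a.r + b.r), 0.5f * (a.g + b.g), 0.5f * (a.b + b.b) };
    return c;
}

/* "Slash" split: the shared edge runs lower-left to upper-right, so the
 * center is interpolated from those two corners only. */
Color center_slash_split(Color ll, Color lr, Color ur, Color ul)
{
    (void)lr; (void)ul;   /* these corners have zero weight at the center */
    return midpoint(ll, ur);
}

/* "Backslash" split: the shared edge runs upper-left to lower-right. */
Color center_backslash_split(Color ll, Color lr, Color ur, Color ul)
{
    (void)ll; (void)ur;
    return midpoint(lr, ul);
}

/* Equal weights for all four corners, which is what a symmetric scheme
 * gives at the center of a rectangle. */
Color center_equal_weights(Color ll, Color lr, Color ur, Color ul)
{
    Color c = {
        0.25f * (ll.r + lr.r + ur.r + ul.r),
        0.25f * (ll.g + lr.g + ur.g + ul.g),
        0.25f * (ll.b + lr.b + ur.b + ul.b)
    };
    return c;
}
```

With red, green, red, blue at the corners, the slash split yields pure red at the center, the backslash split yields an even green/blue mix with no red at all, and only the equal-weight result uses every corner.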

Page 75: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL 2.x ARB Extensions
• Many OpenGL 3.0 features have corresponding ARB extensions for OpenGL 2.1 implementations to advertise
  • Helps get 3.0 functionality out sooner, rather than later
• New ARB extensions for 3.0 functionality
  • ARB_framebuffer_object: framebuffer objects (FBOs) for render-to-texture
  • ARB_texture_rg: red and red/green texture formats
  • ARB_map_buffer_range: non-blocking and fine-grain update of buffer object data stores
  • ARB_instanced_arrays: instance ID available to shaders
  • ARB_half_float_vertex: half-precision floating-point vertex array formats
  • ARB_framebuffer_sRGB: rendering into sRGB color space framebuffers
  • ARB_texture_compression_rgtc: compressed red and red/green texture formats
  • ARB_depth_buffer_float: floating-point depth buffers
  • ARB_vertex_array_object: objects to manage vertex array configuration client state

Page 76: SIGGRAPH Asia 2008 Modern OpenGL

Beyond OpenGL 3.0
OpenGL 3.0
• EXT_gpu_shader4
• NV_conditional_render
• ARB_color_buffer_float
• NV_depth_buffer_float
• ARB_texture_float
• EXT_packed_float
• EXT_texture_shared_exponent
• NV_half_float
• ARB_half_float_pixel
• EXT_framebuffer_object
• EXT_framebuffer_multisample
• EXT_framebuffer_blit
• EXT_texture_integer
• EXT_texture_array
• EXT_packed_depth_stencil
• EXT_draw_buffers2
• EXT_texture_compression_rgtc
• EXT_transform_feedback
• APPLE_vertex_array_object
• EXT_framebuffer_sRGB
• APPLE_flush_buffer_range (modified)
In GeForce 8, 9, & 2xx Series but not yet core
• EXT_geometry_shader4 (now ARB)
• EXT_bindable_uniform
• NV_gpu_program4
• NV_parameter_buffer_object
• EXT_texture_compression_latc
• EXT_texture_buffer_object (now ARB)
• NV_framebuffer_multisample_coverage
• NV_transform_feedback2
• NV_explicit_multisample
• NV_multisample_coverage
• EXT_draw_instanced (now ARB)
• EXT_direct_state_access
• EXT_vertex_array_bgra
• EXT_texture_swizzle
Plenty of proven OpenGL extensions for OpenGL Working Group to draw upon for OpenGL 3.1

Page 77: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL Version Evolution
• Now OpenGL is part of Khronos Group
  • Previously OpenGL’s evolution was governed by the OpenGL Architectural Review Board (ARB)
  • Now officially a Khronos working group
  • Khronos also standardizes OpenCL, OpenVG, etc.
• How OpenGL version updates happen
  • OpenGL participants propose extensions
  • Successful extensions are polished and incorporated into core
  • OpenGL 3.0 is great example of this process
    • Roughly 20 extensions folded into “core”
    • Just 3 of those previously unimplemented

Page 78: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL Extensions by Source
• 44% of extensions are “core” or multi-vendor
• Lots of vendors have initiated extensions
• Extending OpenGL is industry-wide collaboration
[Figure: pie chart of extensions by source, with slices from 29% down to 1%. Legend: Multi-vendor (EXT), Silicon Graphics (SGI, SGIS, SGIX), Architectural Review Board (ARB), NVIDIA (NV), ATI, Apple, Mesa3D (MESA), Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, Other]

Source: http://www.opengl.org/registry (Dec 2008)

Page 79: SIGGRAPH Asia 2008 Modern OpenGL

What’s Driving OpenGL Modernization?
• Human desire for visual intuition and entertainment
  • Particularly interactive video games
• Embarrassing parallelism of graphics
  • Particularly the hardware-amenable, latency-tolerant nature of rasterization
• Increasing semiconductor density

Page 80: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL’s Evolution: A Personal Retrospective
Kurt Akeley
Principal Researcher
Microsoft Research Silicon Valley

Page 81: SIGGRAPH Asia 2008 Modern OpenGL

A personal retrospective
• My background:
  • Silicon Graphics, 1982-2001
  • OpenGL, 1990-2004
• Today’s topics:
  • Computer architecture
  • Culture and process
• For a more complete coverage see:
  • https://graphics.stanford.edu/wikis/cs448-07-spring/
  • Mark Kilgard’s excellent course notes

Page 82: SIGGRAPH Asia 2008 Modern OpenGL

Jim Clark and the Geometry Engine
The Geometry Engine: A VLSI Geometry System for Graphics, Computer Graphics, Volume 16, Number 3 (Proceedings of SIGGRAPH 1982), p127-133, 1982

Page 83: SIGGRAPH Asia 2008 Modern OpenGL

Jim’s helpers: the Stanford gang
[Figure labels: IRIS GL; Geometry Engine; hardware front-end; hardware back-end]

Page 84: SIGGRAPH Asia 2008 Modern OpenGL

Success! (in 1995)

Page 85: SIGGRAPH Asia 2008 Modern OpenGL


Computer Architecture

Page 86: SIGGRAPH Asia 2008 Modern OpenGL

What is computer architecture?

• Architecture: “the minimal set of properties that determine what programs will run and what results they will produce”

• Implementation: “the logical organization of the [computer’s] dataflow and controls”

• Realization: “the physical structure embodying the implementation”

Page 87: SIGGRAPH Asia 2008 Modern OpenGL

Example: the analog clock
• Architecture
  • Circular dial divided into twelfths
  • Hour hand (short) and minute hand (long)
• Implementation
  • A weight, driving a pendulum, or
  • A spring, driving a balance wheel, or
  • A battery, driving an oscillator, or …
• Realization
  • Gear ratios, pendulum lengths, battery sizes, ...
[Figure: analog clock face numbered 1 to 12]
Example from Computer Architecture, Concepts and Evolution, Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997

Page 88: SIGGRAPH Asia 2008 Modern OpenGL

A useful distinction
• NVIDIA 8800
  • SIMD, or
  • SPMD?
[Figure: NVIDIA 8800 block diagram. Application, Data Assembler, Vertex Thread Issue, Setup / Rasterization / ZCull, Primitive Thread Issue, Fragment Thread Issue, and a Thread Processor feeding eight clusters of SP pairs, each with a TF unit and an L1 cache, backed by six L2 / FB partitions]
• Architecture:
  • SPMD
• Implementation:
  • SIMD
• Realization:
  • ASIC
SIMD = Single Instruction, Multiple Data
SPMD = Single Program, Multiple Data
ASIC = Application Specific Integrated Circuit
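The architecture/implementation split can be illustrated in plain C. This is only a toy model under my own assumptions, not a description of how the 8800 schedules work: `run_spmd` is what the programmer is promised (one scalar program per element), while `run_simd` produces the same results by stepping a group of `LANES` elements in lock-step, evaluating both sides of the branch and selecting with a mask. All names here are hypothetical.

```c
#include <assert.h>

#define LANES 4

/* The program as the architecture presents it: one scalar program per element. */
static int shade_one(int x)
{
    if (x < 0) return -x;   /* branch A */
    return 2 * x;           /* branch B */
}

/* SPMD view: every element independently runs the same program. */
void run_spmd(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) out[i] = shade_one(in[i]);
}

/* SIMD-style implementation: each group of LANES elements executes the same
 * instruction at the same time. Both branch sides are computed for the whole
 * group and a per-lane mask picks the result. n must be a multiple of LANES. */
void run_simd(const int *in, int *out, int n)
{
    assert(n % LANES == 0);
    for (int base = 0; base < n; base += LANES) {
        int mask[LANES], a[LANES], b[LANES];
        for (int l = 0; l < LANES; l++) mask[l] = in[base + l] < 0;
        for (int l = 0; l < LANES; l++) a[l] = -in[base + l];
        for (int l = 0; l < LANES; l++) b[l] = 2 * in[base + l];
        for (int l = 0; l < LANES; l++) out[base + l] = mask[l] ? a[l] : b[l];
    }
}
```

Both functions compute identical outputs, which is the point: programs written against the SPMD architecture cannot tell that the implementation underneath is SIMD, apart from speed.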

Page 89: SIGGRAPH Asia 2008 Modern OpenGL

The mainstream view
• Table of Contents:
  • Fundamentals
  • Instruction Sets
  • Pipelining
  • Advanced Pipelining and ILP
  • Memory-Hierarchy Design
  • Storage Systems
  • Interconnection Networks
  • Multiprocessors

Page 90: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL is an architecture
Different implementations
  Blaauw/Brooks: IBM 360 30/40/50/65/75, Amdahl
  OpenGL: SGI Indy/Indigo/InfiniteReality, NVIDIA GeForce, ATI Radeon, …
Compatibility
  Blaauw/Brooks: Code runs equivalently on all implementations
  OpenGL: Top-level goal; conformance tests, …
Intentional design
  Blaauw/Brooks: It’s an architecture, whether it was planned or not.
  OpenGL: Carefully planned, though mistakes were made
Configuration
  Blaauw/Brooks: Can vary amount of resource (e.g., memory)
  OpenGL: No feature sub-setting; configuration attributes (e.g., framebuffer)
Speed
  Blaauw/Brooks: Not a formal aspect of architecture
  OpenGL: No performance queries
Validity of inputs
  Blaauw/Brooks: No undefined operation
  OpenGL: All errors specified; no side effects; little undefined operation
Enforcement
  Blaauw/Brooks: When implementation errors are found, they are fixed.
  OpenGL: Specification rules!

Page 91: SIGGRAPH Asia 2008 Modern OpenGL

But OpenGL is an API (Application Programming Interface)
• Yes, Blaauw and Brooks talk about (computer) architecture as though it is always expressed as ISA (Instruction-Set Architecture)
• But …
  • API is just a higher-level programming interface
  • “Instruction-Set” Architecture implies other types of computer architectures (such as “API” Architecture)
  • OpenGL has evolved to include ISA-like interfaces (e.g., the interface below GLSL)

Page 92: SIGGRAPH Asia 2008 Modern OpenGL

We didn’t know …
• No mention in spec (even 3.0)
  • “We view OpenGL as a state …”
• First use in “ARB”
  • Architecture Review Board
  • Coined by Bill Glazier from “Palo Alto Architecture Review Board”
• First formal usage (I know of)
  • Mark J. Kilgard, Realizing OpenGL: two implementations of one architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, p.45-55, August 03-04, 1997, Los Angeles, California, United States.

Page 93: SIGGRAPH Asia 2008 Modern OpenGL

Fred is magnanimous

Page 94: SIGGRAPH Asia 2008 Modern OpenGL

What is implied by “programmable”?
• What does it mean to teach programming?
  • Does running a microwave oven count?
  • Does defining the geometry of a game “level” count?
  • Does specifying OpenGL modes count?
• This seems to be a somewhat open question
  • Butler Lampson couldn’t tell me.
  • Microsoft developers of teaching tools couldn’t tell me.
  • An online search wasn’t very helpful.
• Do we just “know it when we see it”?
  • Justice Potter Stewart’s definition of pornography

Page 95: SIGGRAPH Asia 2008 Modern OpenGL

My try at some formalization
• Key ideas:
  • Composition: choice of placement, sequence
  • Non-obvious: semantics are interesting and novel
  • Imperative: maybe there are other kinds of programming
“Composition, the organization of elemental operations into a non-obvious whole, is the essence of imperative programming.”
-- Kurt Akeley (Foreword to GPU Gems 3)

Page 96: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL has always been programmable
• Follows directly from being an “architecture”
  • OpenGL commands are instructions (API as an ISA)
  • They can be “composed” to create programs
  • Multi-pass rendering is the prototypical example
  • But Peercy et al. implemented a RenderMan shader compiler
  • Invariance was specified from the start (e.g., same fragments)
• We set out to enable “usage that we didn’t anticipate”
  • Obvious for a traditional ISA (e.g., IA32)
  • Not so obvious for a graphics API
  • Example: texture applies to all primitives, not just triangles

Page 97: SIGGRAPH Asia 2008 Modern OpenGL

Example multi-pass OpenGL “program”
Hidden-line rendering

// Pass 1: fill the depth buffer only, pushed back slightly
glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
// draw solid objects

// Pass 2: draw the outlines, depth tested against pass 1
glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3fv(linecolor);  // assumes linecolor is a GLfloat[3]
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
// draw solid objects again

// Restore state
glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);

Page 98: SIGGRAPH Asia 2008 Modern OpenGL

Example multi-pass OpenGL “program”
Silhouette rendering
Additions to the hidden-line algorithm (previous slide) are marked “added”

glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
// draw solid objects

glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3f(1, 1, 1);
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glEnable(GL_CULL_FACE);   // added
glCullFace(GL_FRONT);     // added
// draw solid objects again
// draw true edges (added, for a complete hidden-line drawing)

glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);
glDisable(GL_CULL_FACE);  // added

Page 99: SIGGRAPH Asia 2008 Modern OpenGL

Invariance

Corollary 1 Fragment generation is invariant with respect to the state values marked with in Rule 2.

Page 100: SIGGRAPH Asia 2008 Modern OpenGL


• Intended to capture complete sequence of operations

• Also inspired design changes

Page 101: SIGGRAPH Asia 2008 Modern OpenGL

[Figure: OpenGL pipeline block diagram with a vertex pipeline and a pixel pipeline. Blocks: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Framebuffer, Display, Pixel assembly (unpack), Pixel operations, Texture memory, Pixel pack]
• All primitives (including pixels) are rasterized
• All vertexes are treated equally (e.g., lighted)
• All fragments are treated equally (e.g., texture mapped and depth-buffered)
• Not a required implementation, but “abstraction distance” matters

Page 102: SIGGRAPH Asia 2008 Modern OpenGL


Culture and Process

Page 103: SIGGRAPH Asia 2008 Modern OpenGL

Suppose …
http://www.opengl.org/registry/
Name: ARB_texture_cube_map
Name Strings: GL_ARB_texture_cube_map
Notice: Copyright OpenGL Architectural Review Board, 1999.
Contact: Michael Gold, NVIDIA (gold 'at' nvidia.com)
Status: Complete. Approved by ARB on 12/8/1999
Version: Last Modified Date: December 14, 1999
Number: ARB Extension #7
Dependencies: None. Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
Overview: This extension provides a new texture generation scheme for cube map textures. Instead of the current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …

Page 104: SIGGRAPH Asia 2008 Modern OpenGL

Complete specification
Name
Name Strings
Notice
Contact
Status
Version
Number
Dependencies
Overview
Issues
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL Specification
Additions to Chapter 3 of the OpenGL Specification
Additions to Chapter 4 of the OpenGL Specification
Additions to Chapter 5 of the OpenGL Specification
Additions to Chapter 6 of the OpenGL Specification
Additions to the GLX Specification
Errors
New State (type, query mechanism, initial value, attribute set, specification section)
Usage Examples

Page 105: SIGGRAPH Asia 2008 Modern OpenGL

19 issues

The spec just linearly interpolates the reflection vectors computed per-vertex across polygons. Is there a problem interpolating reflection vectors in this way?

Probably. The better approach would be to interpolate the eye vector and normal vector over the polygon and perform the reflection vector computation on a per-fragment basis. Not doing so is likely to lead to artifacts because angular changes in the normal vector result in twice as large a change in the reflection vector as normal vector changes. The effect is likely to be reflections that become glancing reflections too fast over the surface of the polygon.

Note that this is an issue for REFLECTION_MAP_ARB, but not NORMAL_MAP_ARB.
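The "twice as large a change" remark follows from the reflection formula itself. The sketch below computes the per-vertex reflection vector as r = u - 2 n (n . u), which is my reading of how the reflection texture generation mode is defined, with u the unit eye-to-vertex vector and n the unit normal. The `Vec3` type and the function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Reflect the unit view vector u about the unit normal n:
 * r = u - 2 n (n . u). Both inputs are assumed to be normalized. */
Vec3 reflect3(Vec3 u, Vec3 n)
{
    float d = 2.0f * dot3(n, u);
    Vec3 r = { u.x - d * n.x, u.y - d * n.y, u.z - d * n.z };
    return r;
}
```

Looking straight down -z at a normal of (0,0,1) reflects back along +z. Tilting the normal by 45 degrees toward +x swings the reflection a full 90 degrees to +x, so linearly interpolating r across a polygon covers angle twice as fast as interpolating n would.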

Page 106: SIGGRAPH Asia 2008 Modern OpenGL

19 issues …

What happens if an (s,t,q) is passed to cube map generation that is close to (0,0,0), ie. a degenerate direction vector?

RESOLUTION: Leave undefined what happens in this case (but may not lead to GL interruption or termination).

Note that a vector close to (0,0,0) may be generated as a result of the per-fragment interpolation of (s,t,r) between vertices.
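To see why a near-zero vector is a problem, it helps to write out the face selection. The sketch below follows the major-axis rule as I understand it from the cube map specification: the coordinate with the largest magnitude picks the face, and the other two are divided by that magnitude. The exact zero vector has no major axis, so this sketch returns face -1 for it; that is a choice made here for illustration, since the specification leaves the result undefined. Ties between equal magnitudes are resolved in x, y, z order, which is also just this sketch's choice. All names are hypothetical.

```c
#include <assert.h>

/* face: 0..5 = +X, -X, +Y, -Y, +Z, -Z; -1 = degenerate direction. */
typedef struct { int face; float s, t; } CubeLookup;

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Map a direction (rx, ry, rz) to a cube face and 2D coordinates in [0,1]. */
CubeLookup cube_map_lookup(float rx, float ry, float rz)
{
    CubeLookup out = { -1, 0.0f, 0.0f };
    float ax = absf(rx), ay = absf(ry), az = absf(rz);
    float sc, tc, ma;

    if (ax >= ay && ax >= az) {
        ma = ax;
        if (ma == 0.0f) return out;          /* (0,0,0): no major axis */
        out.face = rx > 0.0f ? 0 : 1;
        sc = rx > 0.0f ? -rz : rz;
        tc = -ry;
    } else if (ay >= az) {
        ma = ay;
        out.face = ry > 0.0f ? 2 : 3;
        sc = rx;
        tc = ry > 0.0f ? rz : -rz;
    } else {
        ma = az;
        out.face = rz > 0.0f ? 4 : 5;
        sc = rz > 0.0f ? rx : -rx;
        tc = -ry;
    }

    /* Dividing by the major-axis magnitude is what blows up near (0,0,0). */
    out.s = 0.5f * (sc / ma + 1.0f);
    out.t = 0.5f * (tc / ma + 1.0f);
    return out;
}
```

For vectors that are tiny but non-zero the division still succeeds, yet the chosen face becomes very sensitive to noise in the inputs, which is consistent with the resolution above leaving the behavior undefined.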

Page 107: SIGGRAPH Asia 2008 Modern OpenGL

Trust and integrity
• Lots of collaboration during the initial design
  • But final decisions made by a small group
• SGI played fair
  • OpenGL 1.0 didn’t favor SGI equipment (our ports were late)
  • SGI obeyed all conformance rules
  • SGI didn’t adjust the spec to match our equipment
• The ARB avoided marketing tasks such as benchmarks
  • We stuck with technical design issues
• We documented rigorously
  • Specification, man pages, …

Page 108: SIGGRAPH Asia 2008 Modern OpenGL

Five Kinkos in Austin Texas
The OpenGL Graphics System: A Specification (Version 1.1)
Mark Segal, Kurt Akeley
Editor: Chris Frazier
Copyright © 1992-1997 Silicon Graphics, Inc.
This document contains unpublished information of Silicon Graphics, Inc.

Page 109: SIGGRAPH Asia 2008 Modern OpenGL

Extension facts
• 442 Vendor and “EXT” extension specifications
  • Vendor: specific to a single vendor
  • EXT: shared by two or more vendors
• 56 “ARB” extensions
  • Standardized, likely to be in the next spec revision
• Lots of text …

Source: OpenGL extension registry, December 2008

Page 110: SIGGRAPH Asia 2008 Modern OpenGL

“Specification” sizes
                        Lines      Words       Chars
56 ARB Extensions      48,674    263,908   2,221,347
All 442 Extensions    209,426  1,076,008   9,079,063
King James Bible      114,535    823,647   5,214,085
New Testament          27,319    188,430   1,197,812
Old Testament          86,783    632,515   3,998,303

Page 111: SIGGRAPH Asia 2008 Modern OpenGL

Beyond the specification
• The ARB (now replaced with Khronos)
  • Rules of order, secretary, IP, …
• The extension process
  • Categories, token syntax, spec templates, enums, registry, …
• Licensing
• Conformance
• …

Page 112: SIGGRAPH Asia 2008 Modern OpenGL

Summary

• Many mistakes made (see other presentations for lists)

• Created a sustainable culture that values quality and rigorous documentation

• Defined and evolved the architecture for interactive 3-D computer graphics

Page 113: SIGGRAPH Asia 2008 Modern OpenGL


Writing better OpenGL

Mark Kilgard

Principal System Software Engineer

NVIDIA

Page 114: SIGGRAPH Asia 2008 Modern OpenGL

Motivation
• Complex APIs and systems have pitfalls
  • After 17 years of designed evolution, OpenGL certainly has its share
• Normal documentation focus:
  • What can you do?
  • Rather than: What should you do?

Page 115: SIGGRAPH Asia 2008 Modern OpenGL

Communicating Vertex Data
• The way you learn OpenGL:
  • Immediate mode
    • glBegin, glColor3f, glVertex3f, glEnd
  • Straightforward: no ambiguity about what vertex data is
    • All vertex components are function parameters
• The problem: too function call intensive
  • And all vertex data must flow through CPU

Page 116: SIGGRAPH Asia 2008 Modern OpenGL

Example Scenario
• An OpenGL application has to render a set of rectangles
• Rectangle with its parameters
  • x, y, height, width, left color, right color, depth
[Figure: a rectangle anchored at (x,y) with its width and height, a left side color and a right side color, and a depth order running from 0.0 to 1.0]

Page 117: SIGGRAPH Asia 2008 Modern OpenGL

Scene Representation
• Each rectangle specified by following RectInfo structure:
typedef struct {
  GLfloat x, y, width, height;
  GLfloat depth_order;
  GLfloat left_side_color[3];  // red, green, then blue
  GLfloat right_side_color[3]; // red, green, then blue
} RectInfo;
• Array of RectInfo structures describes “scene”
  • Simplistic scene for sake of teaching

Page 118: SIGGRAPH Asia 2008 Modern OpenGL

Example Scene and Rendering Result
• Scene of 4 rectangles:
RectInfo rect_list[4] = {
  { 10, 20, 180, 140, 0.5, { 1, 1, 1 }, { 1, 0, 1 } },
  { 30, 40, 100, 60, 0.5, { 1, 0, 0 }, { 0, 0, 1 } },
  { 140, 60, 100, 80, 0.5, { 0, 0, 1 }, { 0, 1, 0 } },
  { 70, 120, 80, 60, 0.7, { 1, 1, 0 }, { 0, 1, 1 } },
};
• OpenGL-rendered result

Page 119: SIGGRAPH Asia 2008 Modern OpenGL

Immediate Mode Rectangle Rendering
• Given sized RectInfo array, render vertices of quads
void drawRectangles(int count, const RectInfo *list)
{
  glBegin(GL_QUADS);
  for (int i=0; i<count; i++) {  // for each rectangle
    const RectInfo *r = &list[i];

    // 1st vertex
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y, r->depth_order);
    // 2nd vertex
    glColor3fv(r->right_side_color);
    glVertex3f(r->x+r->width, r->y, r->depth_order);
    // 3rd vertex: right_side_color “sticks”
    glVertex3f(r->x+r->width, r->y+r->height, r->depth_order);
    // 4th vertex
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y+r->height, r->depth_order);
  }
  glEnd();
}

Page 120: SIGGRAPH Asia 2008 Modern OpenGL

Critique of Immediate Mode
• Advantages
  • Straightforward to code and debug
  • Easy-to-understand conceptual model
    • Building stream of vertices with OpenGL commands
  • Avoids driver & application copies of vertex data
  • Flexible, allowing totally dynamic vertex generation
• Disadvantages
  • Rendering continuously streams attributes through CPU
    • Pollutes CPU cache with vertex data
  • Function call intensive
  • Unable to saturate fast graphics hardware
    • CPUs just too slow
• Contrast with vertex array approach…

Page 121: SIGGRAPH Asia 2008 Modern OpenGL

Vertex Array Approach
• Step 1: Copy vertex attributes into vertex arrays
  • From: RectInfo array (CPU memory)
  • To: interleaved arrays of vertex attributes (CPU memory)
• Step 2: To render
  • Configure OpenGL vertex array client state
    • Use glEnableClientState, glVertexPointer, glColorPointer
  • Render quads based on indices into vertex arrays
    • Use glDrawArrays

Page 122: SIGGRAPH Asia 2008 Modern OpenGL

Vertex Array Format
• Interleave vertex attributes in color & position arrays
[Figure: interleaved layout. Vertex 0: color (red, green, blue) then position (x, y, z); vertex 1: the same. Each float = 4 bytes, so 24 bytes per vertex]
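The 24-byte stride and the attribute offsets can be checked with a struct that mirrors the interleaved layout. This is a small sketch assuming `GLfloat` is a 4-byte `float`, so plain `float` is used to keep it free of GL headers; `PackedVertex` and `attribute_byte_offset` are hypothetical names, not part of the slides' code.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: color first, then position, as in the figure. */
typedef struct {
    float color[3];     /* red, green, blue */
    float position[3];  /* x, y, z */
} PackedVertex;

enum {
    VERTEX_STRIDE   = sizeof(PackedVertex),              /* bytes between vertices */
    COLOR_OFFSET    = offsetof(PackedVertex, color),     /* bytes from vertex start */
    POSITION_OFFSET = offsetof(PackedVertex, position)
};

/* Byte offset of one attribute of one vertex from the start of the array. */
size_t attribute_byte_offset(size_t vertex_index, size_t attribute_offset)
{
    assert(attribute_offset < sizeof(PackedVertex));
    return vertex_index * sizeof(PackedVertex) + attribute_offset;
}
```

These are the same numbers that the stride and pointer arguments of the vertex array calls encode: a stride of 24 bytes, color at byte 0, and position at byte 12 (three floats in).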

Page 123: SIGGRAPH Asia 2008 Modern OpenGL

Step 1: Copy Rectangle Attributes to Vertex Arrays

// Needs <stdlib.h> for malloc and <string.h> for memcpy

void *initVarrayRectangles(int count, const RectInfo *list)

{

  // 6 floats per vertex, 4 vertices per rectangle

  void *varray = malloc(sizeof(GLfloat)*6*4*count);

  GLfloat *p = (GLfloat*) varray;

for (int i=0; i<count; i++, p+=24) {

const RectInfo *r = &list[i];

// quad vertex #1

memcpy(&p[0], r->left_side_color, sizeof(GLfloat)*3);

p[3] = r->x; p[4] = r->y; p[5] = r->depth_order;

// quad vertex #2

memcpy(&p[6], r->right_side_color, sizeof(GLfloat)*3);

p[9] = r->x+r->width; p[10] = r->y; p[11] = r->depth_order;

// quad vertex #3

memcpy(&p[12], r->right_side_color, sizeof(GLfloat)*3);

p[15] = r->x+r->width; p[16] = r->y+r->height; p[17] = r->depth_order;

// quad vertex #4

memcpy(&p[18], r->left_side_color, sizeof(GLfloat)*3);

p[21] = r->x; p[22] = r->y+r->height; p[23] = r->depth_order;

}

return varray;

}

Page 124: SIGGRAPH Asia 2008 Modern OpenGL

Step 2: Configure & Render from Vertex Arrays
void drawVarrayRectangles(int count, const RectInfo *list)
{
  void *varray = initVarrayRectangles(count, list);
  const GLfloat *p = (const GLfloat*) varray;
  const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
  glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
  glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
  glEnableClientState(GL_COLOR_ARRAY);
  glEnableClientState(GL_VERTEX_ARRAY);
  glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
  free(varray);
}

Page 125: SIGGRAPH Asia 2008 Modern OpenGL

Critique of Simplistic Vertex Array Rendering
• Advantages
  • Far fewer OpenGL commands issued
• Disadvantages
  • Every render with drawVarrayRectangles calls initVarrayRectangles
    • Allocates, initializes, & frees vertex array memory every render
  • Improve by separating vertex array construction from rendering

Page 126: SIGGRAPH Asia 2008 Modern OpenGL

Initialize Once, Render Many Approach

• This routine expects base pointer returned by initVarrayRectangles

void drawInitializedVarrayRectangles(int count, const void *varray)

{

const GLfloat *p = (const GLfloat*) varray;

const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats

glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);

glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);

// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!

glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);

}

Page 127: SIGGRAPH Asia 2008 Modern OpenGL

Client Memory Vertex Attribute Transfer
[Figure: data-flow diagram. The vertex array lives in application (client) memory. The CPU reads it from memory and writes command + vertex data into the command queue; the GPU fetches command + vertex data by DMA into its command processor, vertex puller, and hardware rendering pipeline. Vertex data travels through the CPU.]

Page 128: SIGGRAPH Asia 2008 Modern OpenGL

Vertex Buffer Object Vertex Attribute Pulling
[Figure: data-flow diagram. The vertex array lives in an OpenGL (vertex) buffer object. The CPU writes only command + vertex indices into the command queue; the GPU fetches command data by DMA, and vertex data also reaches the vertex puller by GPU DMA, so the CPU never reads the data. Blocks: application (client) memory, CPU, command queue, GPU command processor, vertex puller, hardware rendering pipeline.]

Page 129: SIGGRAPH Asia 2008 Modern OpenGL

Initializing Vertex Buffer Objects (VBOs)
• Once using vertex arrays, easy to switch to VBOs
  • Make the vertex array as before
  • Then bind to buffer object and copy data to the buffer

void initVarrayRectanglesInVBO(GLuint bufferName,

int count, const RectInfo *list)

{

char *varray = initVarrayRectangles(count, list);

const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats

const GLint numVertices = 4*count;

const GLsizeiptr bufferSize = stride*numVertices;

glBindBuffer(GL_ARRAY_BUFFER, bufferName);

glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);

free(varray);

}

Page 130: SIGGRAPH Asia 2008 Modern OpenGL

Rendering from Vertex Buffer Objects
• Once initialized, glBindBuffer to bind to buffer ahead of vertex array configuration
  • Send offsets instead of pointers

void drawVarrayRectanglesFromVBO(GLuint bufferName,

int count)

{

const char *base = NULL;

const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats

glBindBuffer(GL_ARRAY_BUFFER, bufferName);

glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));

glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));

// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!

glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);

}

Page 131: SIGGRAPH Asia 2008 Modern OpenGL

Understanding glBindBuffer
• Buffer object bindings are frequent point of confusion for programmers
  • What does glBindBuffer do really?
• Lots of buffer binding targets:
  • GL_ARRAY_BUFFER target, for vertex attribute arrays
    • Query with GL_ARRAY_BUFFER_BINDING
  • GL_ELEMENT_ARRAY_BUFFER target, for vertex indices, effectively topology
    • Query with GL_ELEMENT_ARRAY_BUFFER_BINDING
  • Each vertex array has its own buffer, query with
    • GL_VERTEX_ARRAY_BUFFER_BINDING
    • GL_COLOR_ARRAY_BUFFER_BINDING
    • GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING, etc.

Page 132: SIGGRAPH Asia 2008 Modern OpenGL

132

Bind and Query Buffer Targets

Buffer bind tokens (target tokens for glBindBuffer)
• GL_ARRAY_BUFFER
• GL_ELEMENT_ARRAY_BUFFER

Buffer query tokens (query tokens to glGetIntegerv)
• GL_ARRAY_BUFFER_BINDING
• GL_ELEMENT_ARRAY_BUFFER_BINDING
• GL_COLOR_ARRAY_BUFFER_BINDING
• GL_VERTEX_ARRAY_BUFFER_BINDING
• GL_FOG_COORD_ARRAY_BUFFER_BINDING
• GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING

Buffer query tokens (query tokens to glGetVertexAttribiv)
• GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING

Page 133: SIGGRAPH Asia 2008 Modern OpenGL

133

Latched Vertex Array Buffer Bindings

• Here’s the confusing part:

glBindBuffer(GL_ARRAY_BUFFER, 34);
glColorPointer(3, GL_FLOAT, color_stride, (void*)color_offset);

• The glBindBuffer doesn’t change any vertex array binding
  • The GL_ARRAY_BUFFER_BINDING state that glBindBuffer sets does not itself affect rendering
• It is the glColorPointer call that latches the array buffer binding to change the color array’s buffer binding!
• Same with all vertex array buffer bindings
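The latching rule can be illustrated with a toy state machine. This is a sketch of the behavior described on the slide, not real OpenGL: `ModelContext` and the `model...` functions are hypothetical stand-ins for the context state, glBindBuffer, and glColorPointer.

```c
#include <assert.h>

/* Toy model of the relevant context state (not real OpenGL). */
typedef struct {
    unsigned arrayBufferBinding;      /* what GL_ARRAY_BUFFER_BINDING reports */
    unsigned colorArrayBufferBinding; /* what GL_COLOR_ARRAY_BUFFER_BINDING reports */
} ModelContext;

/* Stand-in for glBindBuffer(GL_ARRAY_BUFFER, buffer):
   only the array buffer binding changes. */
static void modelBindArrayBuffer(ModelContext *ctx, unsigned buffer) {
    assert(ctx != 0);
    ctx->arrayBufferBinding = buffer;
}

/* Stand-in for glColorPointer(...): latches the current array buffer
   binding into the color array's own buffer binding. */
static void modelColorPointer(ModelContext *ctx) {
    assert(ctx != 0);
    ctx->colorArrayBufferBinding = ctx->arrayBufferBinding;
}
```

In this model, binding buffer 34 leaves the color array's binding untouched until the pointer call runs, and the same holds when binding zero again.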

Page 134: SIGGRAPH Asia 2008 Modern OpenGL

134

Binding Buffer Zero is Special

• By default, vertex arrays don’t access buffer objects
  • Instead client memory is accessed
• This is because
  • The initial buffer binding for a context is zero (this ensures compatibility with pre-VBO OpenGL)
  • And zero is special
    • Zero means access client memory
• You can always resume client memory vertex array access for a given array like this

glBindBuffer(GL_ARRAY_BUFFER, 0);  // use client memory
glColorPointer(3, GL_FLOAT, color_stride, color_pointer);

• Different treatment of the “pointer” parameter to vertex array specification commands
  • When the current array buffer binding is zero, the pointer value is a client memory pointer
  • When the current array buffer binding is non-zero (meaning it names a buffer object), the pointer value is “recast” as an offset from the beginning of the buffer
• Once again
  • The glBindBuffer(GL_ARRAY_BUFFER,0) call alone doesn’t change any vertex array buffer bindings
  • It takes a vertex array specification command such as glColorPointer to latch the zero

Page 135: SIGGRAPH Asia 2008 Modern OpenGL

135

Texture Coordinate Set Selector

• A selector in OpenGL is
  • A state variable that controls what state a subsequent command updates
  • Examples of commands that modify selectors
    • glMatrixMode, glActiveTexture, glClientActiveTexture
• A selector is different from latched state
  • Latched state is a specified value that is set (or “latched”) when a subsequent command is called
• Pitfall warning: glTexCoordPointer both
  • Relies on the glClientActiveTexture command’s selector
  • And latches the current array buffer binding for the selected texture coordinate vertex array
• Example

glBindBuffer(GL_ARRAY_BUFFER, 34);    // buffer value glTexCoordPointer latches
glClientActiveTexture(GL_TEXTURE3);   // selector glTexCoordPointer uses
glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);

Page 136: SIGGRAPH Asia 2008 Modern OpenGL

136

OpenGL’s Modern Buffer-centric Processing Model

[Figure: buffer-centric processing model. Pipeline stages: Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer. Buffer types: Vertex Array Buffer Object (VaBO), Array Element Buffer Object (VeBO), Transform Feedback Buffer (XBO), Parameter Buffer (PaBO), Bindable Uniform Buffer (BUB), Texture Buffer Object (TexBO), Pixel Unpack Buffer (PuBO), Pixel Pack Buffer (PpBO). Data paths: vertex data (glBegin, glDrawElements, etc.), texel data, pixel data (glDrawPixels, glTexImage2D, etc. and glReadPixels, etc.), parameter data (not ARB functionality yet).]

Page 137: SIGGRAPH Asia 2008 Modern OpenGL

137

Uses of OpenGL Buffer Objects

• Vertex uses (VBOs)
  • Input to GL: Vertex attribute buffer objects
    • Color, position, texture coordinate sets, etc.
  • Input to GL: Vertex element buffer objects
    • Indices
  • Output from GL: Transform feedback
    • Streaming vertex attributes out
• Texture uses (TexBOs)
  • Texturing from: Texture buffer objects
• Pixel uses (PBOs)
  • Output from GL: Pixel pack buffer objects
    • glReadPixels
  • Input to GL: Pixel unpack buffer objects
    • glDrawPixels, glBitmap, glTexImage2D, etc.
• Shader uses (PaBOs, UBOs)
  • Input to assembly program: Parameter buffer objects
  • Input to GLSL program: Bind-able uniform buffer objects

Key point: OpenGL buffers are containers for bytes; a buffer is not tied to any particular usage

Page 138: SIGGRAPH Asia 2008 Modern OpenGL

138

Continuum of OpenGL Usage

[Figure: continuum of usage styles along a “Tweak-able Performance” axis: Immediate mode, Client vertex arrays, Vertex buffer objects (VBOs), Display lists.]

Page 139: SIGGRAPH Asia 2008 Modern OpenGL

139

Mid-session break

15 minutes

Page 140: SIGGRAPH Asia 2008 Modern OpenGL

140

Implementing OpenGL

Mark Kilgard

Principal System Software Engineer

NVIDIA

Page 141: SIGGRAPH Asia 2008 Modern OpenGL

141

Topics in OpenGL Implementation

• Dual-core OpenGL driver operation
• What goes into a texture fetch?
  • You give me some texture coordinates
  • I give you back a color
  • Could it be any simpler?

Page 142: SIGGRAPH Asia 2008 Modern OpenGL

142

OpenGL Drivers for Multi-core CPUs

• Today dual-core processors in PCs are nearly ubiquitous
  • 4, 6, 8, and more cores are clearly coming
• How does an OpenGL implementation exploit this trend?
  • Answer: develop a dual-core OpenGL driver

Page 143: SIGGRAPH Asia 2008 Modern OpenGL

143

Dual-core OpenGL Driver Architecture

[Figure: dual-core driver architecture. The application has several threads (A, B, C, D, ..., including an audio thread that makes no OpenGL calls). For each of Context 1 and Context 2, an application rendering thread calls into the ICD’s app thread (tokenize thread), which feeds a circular command FIFO consumed by a worker thread (server thread): Worker thread 1 and Worker thread 2.]

Page 144: SIGGRAPH Asia 2008 Modern OpenGL

144

Dual-core Performance Results

• A well-behaved OpenGL application benefiting from a dual-core mode of OpenGL driver operation

[Chart: frames per second (axis from 0 to 250) by mode of OpenGL driver operation: Single core, Dual core, Null driver. Bar values are not recoverable from the text.]

Page 145: SIGGRAPH Asia 2008 Modern OpenGL

145

Good Dual-core Driver Practices

• General advice
  • Display lists execute on the driver’s worker thread!
  • You want to avoid situations where the application thread must “sync” with the driver thread
• Specific advice
  • Avoid OpenGL state queries
    • More on this later
  • Avoid querying OpenGL errors in production code
• Bad behavior is detected automatically and leads to exit from the dual-core mode
  • Back to the standard single-core driver mode of operation
  • “Do no harm”

Page 146: SIGGRAPH Asia 2008 Modern OpenGL

146

Consider an OpenGL texture fetch

• Seems very simple
  • Input: texture coordinates (s,t,r,q)
  • Output: some color (r,g,b,a)
• Just a simple function, written in Cg/HLSL:

uniform sampler2D decal : TEXUNIT2;
float4 texcoordset : TEXCOORD3;
float4 rgba = tex2D(decal, texcoordset.st);

• Compiles to a single instruction:

TEX o[COLR], f[TEX3], TEX2, 2D;

• Implementation is much more involved!

Page 147: SIGGRAPH Asia 2008 Modern OpenGL

147

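Anatomy of a Texture Fetch

[Figure: a texture coordinate vector and texture parameters enter Texel Selection, which emits texel offsets and combination parameters. The texel offsets fetch texel data from the texture images. Texel Combination uses the texel data and combination parameters to produce the filtered texel vector.]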

Page 148: SIGGRAPH Asia 2008 Modern OpenGL

148

Texture Fetch Functionality (1)

• Texture coordinate processing
  • Projective texturing (OpenGL 1.0)
  • Cube map face selection (OpenGL 1.3)
  • Texture array indexing (EXT_texture_array, OpenGL 3.0)
  • Coordinate scale: normalization (ARB_texture_rectangle)
• Level-of-detail (LOD) computation
  • Log of maximum texture coordinate partial derivative (OpenGL 1.0)
  • LOD clamping (OpenGL 1.2)
  • LOD bias (OpenGL 1.4)
  • Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)
• Wrap modes
  • Repeat, clamp (OpenGL 1.0)
  • Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3)
  • Mirrored repeat (OpenGL 1.4)
  • Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)
  • Wrap to adjacent cube map face
  • Region clamp & mirror (PlayStation 2)

Page 149: SIGGRAPH Asia 2008 Modern OpenGL

149

Texture Fetch Functionality (2)

• Filter modes
  • Minification / magnification transition (OpenGL 1.0)
  • Nearest, linear, mipmap (OpenGL 1.0)
  • 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)
  • Anisotropic (EXT_texture_filter_anisotropic)
  • Fixed-weights: Quincunx, 3x3 Gaussian
    • Used for multi-sample resolves
  • Detail texture magnification (SGIS_detail_texture)
  • Sharpen texture magnification (SGIS_sharpen_texture)
  • 4x4 filter (SGIS_texture_filter4)
  • Sharp-edge texture magnification (E&S Harmony)
  • Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)

Page 150: SIGGRAPH Asia 2008 Modern OpenGL

150

Texture Fetch Functionality (3)

• Texture formats
  • Uncompressed
    • Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)
    • Type: unsigned, signed (NV_texture_shader)
    • Normalized: fixed-point vs. integer (OpenGL 3.0)
  • Compressed
    • DXT compression formats (EXT_texture_compression_s3tc)
    • 4:2:2 video compression (various extensions)
    • 1- and 2-component compression (EXT_texture_compression_latc, OpenGL 3.0)
    • Other approaches: IDCT, VQ, differential encoding, normal maps, separable decompositions
  • Alternate encodings
    • RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)
    • Spherical harmonics
    • Sum of product decompositions

Page 151: SIGGRAPH Asia 2008 Modern OpenGL

151

Texture Fetch Functionality (4)

• Pre-filtering operations
  • Gamma correction (OpenGL 2.1)
    • Table: sRGB / arbitrary
  • Shadow map comparison (OpenGL 1.4)
    • Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5)
    • Needs “R” depth value per texel
  • Palette lookup (EXT_paletted_texture)
  • Thresholding
    • Color key
    • Generalized thresholding

Page 152: SIGGRAPH Asia 2008 Modern OpenGL

152

Texture Fetch Functionality (5)

• Optimizations
  • Level-of-detail weighting adjustments
  • Mid-maps (extra pre-filtered levels in-between existing levels)
• Unconventional uses
  • Bitmap textures for fonts with large filters (Direct3D 10)
  • Rip-mapping
  • Non-uniform texture border color
  • Clip-mapping (SGIX_clipmap)
  • Multi-texel borders
  • Silhouette maps (Pradeep Sen’s work)
    • Shadow mapping
    • Sharp piecewise linear magnification

Page 153: SIGGRAPH Asia 2008 Modern OpenGL

153

Phased Data Flow

• Must hide long memory read latency between Selection and Combination phases

[Figure: the texture coordinate vector and texture parameters enter Texel Selection. Its texel offsets trigger memory reads for samples from the texture images, while the combination parameters are FIFOed until the texel data arrives at Texel Combination.]

Page 154: SIGGRAPH Asia 2008 Modern OpenGL

154

What really happens?

• Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch
• Logically just one instruction

TXP o[COLR], f[TEX3], TEX2, 2D;

• Logically
  • Texel selection
  • Texel combination
• How many operations are involved?

Page 155: SIGGRAPH Asia 2008 Modern OpenGL

155

Medium-Level Dissection of a Texture Fetch

[Figure: stages of the fetch. Inputs: interpolated texture coords vector and texture parameters. Stages: convert texture coords to texel coords (floating-point scaling and combination); floor / frac of the texel coords into integer coords & fractional weights; convert texel coords to texel offsets (integer / fixed-point); texel offsets fetch texel data from the texture images; texel combination (integer / fixed-point texel intermediates) applies the combination parameters. Output: filtered texel vector.]

Page 156: SIGGRAPH Asia 2008 Modern OpenGL

156

Interpolation

• First we need to interpolate (s,t,r,q)
  • This is the f[TEX3] part of the TXP instruction
• Projective texturing means we want (s/q, t/q)
  • And possibly r/q if shadow mapping
• In order to correct for perspective, hardware actually interpolates
  • (s/w, t/w, r/w, q/w)
• If not projective texturing, could linearly interpolate inverse w (or 1/w)
  • Then compute its reciprocal to get w
    • Since 1/(1/w) equals w
  • Then multiply (s/w,t/w,r/w,q/w) times w
    • To get (s,t,r,q)
• If projective texturing, we can instead
  • Compute reciprocal of q/w to get w/q
  • Then multiply (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)

Observe: projective texturing is the same cost as perspective correction
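The two paths can be compared side by side in a few lines of C. This is a minimal sketch under the assumption that the inputs are the already-interpolated values (s/w, t/w, r/w, q/w) and, for the non-projective path, the interpolated 1/w; the type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { float s, t, r, q; } TexCoord4;

/* Projective path: one reciprocal of q/w, then one multiply per component.
   Input holds (s/w, t/w, r/w, q/w); output holds (s/q, t/q, r/q, 1). */
static TexCoord4 projectInterpolated(TexCoord4 overW) {
    assert(overW.q != 0.0f);
    float wOverQ = 1.0f / overW.q;  /* RCP */
    TexCoord4 out = { overW.s * wOverQ, overW.t * wOverQ,
                      overW.r * wOverQ, 1.0f };
    return out;
}

/* Non-projective path: one reciprocal of the interpolated 1/w, then one
   multiply per component. Output holds (s, t, r, q). */
static TexCoord4 perspectiveCorrect(TexCoord4 overW, float invW) {
    assert(invW != 0.0f);
    float w = 1.0f / invW;  /* RCP */
    TexCoord4 out = { overW.s * w, overW.t * w, overW.r * w, overW.q * w };
    return out;
}
```

Both functions spend exactly one reciprocal plus one multiply per component, which is the cost equivalence the slide points out.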

Page 157: SIGGRAPH Asia 2008 Modern OpenGL

157

Interpolation Operations

• Ax + By + C per scalar linear interpolation
  • 2 MADs
• One reciprocal to invert q/w for projective texturing
  • Or one reciprocal to invert 1/w for perspective texturing
• Then 1 MUL per component for s/w * w/q
  • Or s/w * w
• For (s,t) this means
  • 4 MADs, 2 MULs, & 1 RCP
  • (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP
• All floating-point operations

Page 158: SIGGRAPH Asia 2008 Modern OpenGL

158

Texture Space Mapping

• Have interpolated & projected coordinates
• Now need to determine which texels to fetch
  • Multiply (s,t) by (width,height) of the texture base level
  • Could convert (s,t) to fixed-point first
    • Or do the math in floating-point
  • Say the base texture is 256x256
    • So compute (s*256, t*256) = (u,v)

Page 159: SIGGRAPH Asia 2008 Modern OpenGL

159

Mipmap Level-of-detail Selection

• Tri-linear mip-mapping means computing the appropriate mipmap level
• Hardware rasterizes in 2x2 pixel entities
  • Typically called quad-pixels or just quads
  • Finite difference with neighbors (one-pixel separation) to get the change in u and v with respect to window space
    • Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y
    • Means 4 subtractions per quad (1 per pixel)
• Now compute an approximation to the gradient length
  • p = max(sqrt((∂u/∂x)^2 + (∂u/∂y)^2), sqrt((∂v/∂x)^2 + (∂v/∂y)^2))

Page 160: SIGGRAPH Asia 2008 Modern OpenGL

160

Level-of-detail Bias and Clamping

• Convert p length to power-of-two level-of-detail and apply LOD bias
  • λ = log2(p) + lodBias
• Now clamp λ to valid LOD range
  • λ’ = max(minLOD, min(maxLOD, λ))

Page 161: SIGGRAPH Asia 2008 Modern OpenGL

161

Determine Mipmap Levels and Level Filtering Weight

• Determine lower and upper mipmap levels
  • b = floor(λ’) is bottom mipmap level
  • t = floor(λ’+1) is top mipmap level
• Determine filter weight between levels
  • w = frac(λ’) is filter weight

Page 162: SIGGRAPH Asia 2008 Modern OpenGL

162

Determine Texture Sample Point

• Get (u,v) for selected top and bottom mipmap levels
  • Consider a level l which could be either level t or b
  • With (u,v) locations (ul,vl)
• Perform GL_CLAMP_TO_EDGE wrap modes
  • uw = max(1/(2*widthOfLevel(l)), min(1 - 1/(2*widthOfLevel(l)), u))
  • vw = max(1/(2*heightOfLevel(l)), min(1 - 1/(2*heightOfLevel(l)), v))
• Get integer location (i,j) within each level
  • (i,j) = (floor(uw*widthOfLevel(l)), floor(vw*heightOfLevel(l)))

[Figure: texture in (s,t) space showing the border and the edge texels]

Page 163: SIGGRAPH Asia 2008 Modern OpenGL

163

Determine Texel Locations

• Bilinear sample needs 4 texel locations
  • (i0,j0), (i0,j1), (i1,j0), (i1,j1)
• With integer texel coordinates
  • i0 = floor(i-1/2)
  • i1 = floor(i+1/2)
  • j0 = floor(j-1/2)
  • j1 = floor(j+1/2)
• Also compute fractional weights for bilinear filtering
  • a = frac(i-1/2)
  • b = frac(j-1/2)

Page 164: SIGGRAPH Asia 2008 Modern OpenGL

164

Determine Texel Addresses

• Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch
  • Assume bytesPerTexel = 4 bytes for RGBA8 texture
• Example
  • addr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l))
  • addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l))
  • addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l))
  • addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l))
• More complicated address schemes are needed for good texture locality!
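The linear addressing in the example amounts to one integer multiply-add per texel. Here is a minimal C sketch that returns the byte offset to add to the level's base pointer; `texelOffset` is a hypothetical name, and the row-major layout is the slide's simplification rather than what hardware with tiled or swizzled textures would use.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of texel (i, j) from the base of a level stored row by row. */
static size_t texelOffset(int i, int j, int width, int height,
                          size_t bytesPerTexel) {
    assert(i >= 0 && i < width);
    assert(j >= 0 && j < height);
    return bytesPerTexel * ((size_t)i + (size_t)j * (size_t)width);
}
```

With 4-byte RGBA8 texels in a 256x256 level, stepping one texel in i moves 4 bytes while stepping one texel in j moves 1024 bytes, which is why vertically adjacent texels have poor locality in this layout.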

Page 165: SIGGRAPH Asia 2008 Modern OpenGL

165

Initiate Texture Reads

• Initiate texture memory reads at the 8 texel addresses
  • addr00, addr01, addr10, addr11 for the upper level
  • addr00, addr01, addr10, addr11 for the lower level
• Queue the weights a, b, and w
  • Latency FIFO in hardware makes these weights available when texture reads complete

Page 166: SIGGRAPH Asia 2008 Modern OpenGL

166

Phased Data Flow

• Must hide long memory read latency between Selection and Combination phases

[Figure: the texture coordinate vector and texture parameters enter Texel Selection. Its texel offsets trigger memory reads for samples from the texture images, while the combination parameters are FIFOed until the texel data arrives at Texel Combination.]

Page 167: SIGGRAPH Asia 2008 Modern OpenGL

167

Texel Combination

• When texel reads are returned, begin filtering
  • Assume results are
    • Top texels: t00, t01, t10, t11
    • Bottom texels: b00, b01, b10, b11
• Per-component filtering math is a tri-linear filter
  • RGBA8 is four components
• result = (1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b01 + a*(1-b)*(1-w)*b10 + a*b*(1-w)*b11 + (1-a)*(1-b)*w*t00 + (1-a)*b*w*t01 + a*(1-b)*w*t10 + a*b*w*t11;
• 24 MADs per component, or 96 for RGBA
• Lerp-tree could do 14 MADs per component, or 56 for RGBA
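The direct sum and the lerp-tree give the same result; the tree simply shares work across seven linear interpolations. The sketch below shows both for one component in floating-point C, whereas the slides describe fixed-point hardware. The array order {00, 01, 10, 11} and all names are assumptions made for illustration.

```c
#include <assert.h>

/* One linear interpolation: (1-t)*x + t*y. */
static float lerpf(float x, float y, float t) {
    return (1.0f - t) * x + t * y;
}

/* Direct eight-term weighted sum; texels are ordered {00, 01, 10, 11}. */
static float trilinearDirect(const float bot[4], const float top[4],
                             float a, float b, float w) {
    assert(bot != 0 && top != 0);
    return (1-a)*(1-b)*(1-w)*bot[0] + (1-a)*b*(1-w)*bot[1]
         + a*(1-b)*(1-w)*bot[2]     + a*b*(1-w)*bot[3]
         + (1-a)*(1-b)*w*top[0]     + (1-a)*b*w*top[1]
         + a*(1-b)*w*top[2]         + a*b*w*top[3];
}

/* Lerp-tree: 3 lerps per level for the bilinear part, 1 across levels. */
static float trilinearLerpTree(const float bot[4], const float top[4],
                               float a, float b, float w) {
    assert(bot != 0 && top != 0);
    float botJ0 = lerpf(bot[0], bot[2], a);   /* along i at j0 */
    float botJ1 = lerpf(bot[1], bot[3], a);   /* along i at j1 */
    float topJ0 = lerpf(top[0], top[2], a);
    float topJ1 = lerpf(top[1], top[3], a);
    float botResult = lerpf(botJ0, botJ1, b); /* along j */
    float topResult = lerpf(topJ0, topJ1, b);
    return lerpf(botResult, topResult, w);    /* between mipmap levels */
}
```

Seven lerps at two multiply-adds each account for the 14 MADs per component quoted above.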

Page 168: SIGGRAPH Asia 2008 Modern OpenGL

168

Total Texture Fetch Operations

• Interpolation
  • 6 MADs, 3 MULs, & 1 RCP (floating-point)
• Texel selection
  • Texture space mapping
    • 2 MULs (fixed-point)
  • LOD determination (floating-point)
    • 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2
  • LOD bias and clamping (fixed-point)
    • 1 ADD, 1 MIN, 1 MAX
  • Level determination and level weighting (fixed-point)
    • 1 FLOOR, 1 ADD, 1 FRAC
  • Texture sample point
    • 4 MAXs, 4 MINs, 2 FLOORs (fixed-point)
  • Texel locations and bi-linear weights
    • 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point)
  • Addressing
    • 16 integer MADs (integer)
• Texel combination
  • 56 fixed-point MADs (fixed-point)

Page 169: SIGGRAPH Asia 2008 Modern OpenGL

169

Observations about the Texture Fetch

• Lots of ways to implement the math
• Lots of clever ways to be efficient
• Lots more texture operations not considered in this analysis
  • Compression
  • Anisotropic filtering
  • sRGB
  • Shadow mapping
• Arguably TEX instructions are the “world’s most CISC instructions”
  • Texture fetches are incredibly complex instructions
• A good deal of the GPU’s superiority over CPUs at graphics operations is attributable to TEX instruction efficiency
  • Good for compute too

Page 170: SIGGRAPH Asia 2008 Modern OpenGL

170

OpenGL’s Future Evolution

Mark Kilgard

Principal System Software Engineer

NVIDIA

Page 171: SIGGRAPH Asia 2008 Modern OpenGL

171

What drives OpenGL’s future?

• GPU graphics functionality
  • Tessellation & geometry amplification
• Ratio of GPU to single-core CPU performance
• Compatibility
  • Direct3Disms
  • OpenGLisms
  • Deprecation
• Compute support
  • OpenCL, CUDA, Stream processing
• Unconventional graphics devices

Page 172: SIGGRAPH Asia 2008 Modern OpenGL

172

Better Graphics Functionality

• Expect more graphics performance
  • Easy prediction
  • Rasterization nowhere near peaked
    • Ray tracing fans: GPUs make rays and triangles faster
      • Market still values triangles more than rays
• Expect more generalized graphics functionality
  • Trend for texture enhancements likely to continue

Page 173: SIGGRAPH Asia 2008 Modern OpenGL

173

Geometry Amplification

• Tessellation
  • Programmable hardware support coming
• True market demand probably not tessellation per se
  • Games want visual richness
    • Texture and shading have created much richness
      • Often “pixel richness” as substitute for geometry richness
  • Increasingly “visual richness” means geometric complexity
• Geometry Amplification may be a better term
  • Tessellation is one way to improve tessellation
    • Recognize the limits of bi-variate patches for representing geometry

Page 174: SIGGRAPH Asia 2008 Modern OpenGL

174

Programmable Tessellation

• Stunning real-time geometric detail + animation possible
• Programmable tessellation + vertex textured displacements

Page 175: SIGGRAPH Asia 2008 Modern OpenGL

175

Continuous Level-of-detail for Tessellation

[Figure: three renderings with increasing tessellation level-of-detail]

• Same patch mesh for all 3 scenes

Page 176: SIGGRAPH Asia 2008 Modern OpenGL

176

Adaptive Programmable Tessellation

Programmable level-of-detail determination allows more tessellation along silhouette edges

Page 177: SIGGRAPH Asia 2008 Modern OpenGL

177

Limits of Patch Tessellation

• What games tend to want
  • Here’s 8 vertices (bounding box), go draw a fire truck
  • Here’s a few vertices, go draw a tree

Page 178: SIGGRAPH Asia 2008 Modern OpenGL

178

Tessellation Not New to OpenGL

• At least three different bi-variate patch tessellation schemes have been added to OpenGL
  • Evaluators (OpenGL 1.0)
  • NV_evaluators (GeForce 3) [Moreton 2001]
    • water-tight
    • adaptive level-of-detail
    • forward differencing approach
  • ATI_pn_triangles Curved PN Triangles (Radeon) [Vlachos 2001]
    • tessellated triangle based on positions+normals
• None succeeded
  • Hard to integrate into art pipelines
  • Didn’t offer enough performance advantage

[Figure: GLUT’s wire-frame teapot]

Page 179: SIGGRAPH Asia 2008 Modern OpenGL

179

Ratio of CPU core-to-GPU Performance

• Well known computer architecture trends now
  • Single-threaded CPU performance trends are stalled
    • Multi-core is the CPU designers’ response
  • GPU performance continues on-trend
• What does this mean for graphics API design?
  • CPUs must generate more visually rich API command streams to saturate GPUs
  • Can’t just send more commands faster
    • Single-threaded CPUs can only do so much
  • So must send more powerful commands

Page 180: SIGGRAPH Asia 2008 Modern OpenGL

180

Déjà vu

• We’ve been here before
• Early 1980s: Graphics terminals used to be connected to minicomputers by slow speed interconnects
  • CPUs themselves far too slow for real-time rendering
• Resulting rendering model
  • Download scene database to graphics terminal
  • Adjust viewing and modeling parameters
  • Send “redraw scene” command

Page 181: SIGGRAPH Asia 2008 Modern OpenGL

181

What Happened

• Such “scene processor” hardware was not very flexible
  • Difficult to animate anything beyond rigid dynamics
• Eventually SGI and others matched CPUs and interconnects to graphics performance
  • Result was IRIS GL’s immediate mode
  • CPU fast enough to send geometry every frame
• OpenGL took this model
  • Over time added vertex arrays, vertex buffers, texturing, programmable shading, and more performance
• CPU performance became the limiter still
  • Better graphics driver tuning helped
  • Dual-core drivers help some more

Page 182: SIGGRAPH Asia 2008 Modern OpenGL

182

OpenGL’s Most Powerful Command

• Available since OpenGL 1.0
• Can render essentially anything OpenGL can render!
• Takes just one parameter
• The command

glCallList(GLuint displayListName);

• Power of display lists comes from
  • Playing back arbitrary compiled commands
  • Allowing for hierarchical calling of display lists
    • A display list can contain glCallList or glCallLists
  • Ability of application to re-define display lists
    • No editing, but can be re-defined

Page 183: SIGGRAPH Asia 2008 Modern OpenGL

183

Enhanced Display Lists

• OpenGL 1.0 display lists are too inflexible
  • Pixel & vertex data “compiled into” display lists
  • Binding objects always “by name”
    • Rather than “by reference”
• These problems can be fixed
  • Modern OpenGL supports buffers for transferring vertices and pixels
  • Compile commands into display lists that defer vertex and pixel transfers until execute-time
    • Rather than compile-time
  • Allow objects (textures, buffers, programs) to be bound “by reference” or “by name”

Page 184: SIGGRAPH Asia 2008 Modern OpenGL

184

Other Display List Enhancements

• Conditional display list execution
• Relaxed vertex index and command order
• Parallel construction of display lists by multiple threads

General insight: It is easier for the driver to optimize an application’s graphics command stream if it gets to
1) see the repetition in the command stream clearly
2) take time to analyze and optimize usage

Page 185: SIGGRAPH Asia 2008 Modern OpenGL

185

Conditional Display List Execution

• Today’s occlusion query
  • Application must “query” to learn occlusion result
    • Latency too great to respond
  • Application can use OpenGL 3.0’s conditional render capability
    • But that just skips vertex pulling, not state changes
• Conditional display list execution
  • Allow a glCallList to depend on the occlusion result from an occlusion query object
  • Allows in-band occlusion querying
  • Skip both vertex pulling and state changes

Page 186: SIGGRAPH Asia 2008 Modern OpenGL

186

Relaxed Vertex Index and Command Order

• OpenGL today always executes commands “in order”
  • Sequential requirement
• Provide compile-time specification of re-ordering allowances
  • Allows GL implementation to re-order
    • Vertex indices within display list’s vertex batch
    • Commands within display list
  • Key rule: the state vector a rendering command executes in must match the state it would have seen if the commands were executed sequentially
• Allow static or dynamic re-ordering
  • Static re-ordering needed for multi-pass invariances
• Past practice
  • IRIS Performer would sort rendering by state changes for performance
  • [Sander 2007] show substantial benefit for vertex ordering

Page 187: SIGGRAPH Asia 2008 Modern OpenGL

187

Parallel Display List Construction

• Today’s model
  • Single thread makes all OpenGL rendering calls
    • Minimizes GPU context switch overhead
    • Ties command generation rate to a single core’s CPU performance
• Enhanced display list model
  • Multiple threads can build display lists in parallel
  • Single thread still executes display lists
  • Countable semaphore objects used to synchronize hand-off of display lists built by other threads with the main rendering thread

Page 188: SIGGRAPH Asia 2008 Modern OpenGL

188

Rethinking Display Lists

• Display lists have been proposed for deprecation
  • Right as we really need them!
• Much more interesting to enhance display lists
  • Dual-core driver already off-loads display list traversal to driver’s thread
  • Multi-core driver could scan frequently executed display lists to optimize their order and error processing
    • Includes adding pre-fetching to avoid stalling CPU on cache misses for object accesses

Page 189: SIGGRAPH Asia 2008 Modern OpenGL

189

Direct3Disms

• Developing a shader-rich game title costs $$$
  • For top titles, often US$ 5,000,000+
  • Investment typically amortized over multiple platforms
    • Consoles are primary target, then PCs
  • PC version typically developed for Direct3D
  • Reality: OpenGL is often 3rd or worse priority
• API differences = porting & performance pitfalls
  • Stops or slows Direct3D-developed 3D content from working easily on OpenGL platforms

Page 190: SIGGRAPH Asia 2008 Modern OpenGL

190

Supporting Direct3D: Not New

• OpenGL has always supported multiple formats well
  • OpenGL’s plethora of pixel and vertex formats
  • Very first OpenGL extension: EXT_bgra
    • Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering
    • Made core functionality by OpenGL 1.2
• Many OpenGL extensions have embraced Direct3Disms
  • Secondary color
  • Fog coordinate
  • Point sprites

Page 191: SIGGRAPH Asia 2008 Modern OpenGL

191

Direct3D vs. OpenGL Coordinate System Conventions

• Window origin conventions
  • Direct3D = upper-left origin
  • OpenGL = lower-left origin
• Pixel center conventions
  • Direct3D 9 = pixel centers at integer locations
  • OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
• Clip space conventions
  • Direct3D = [-1,+1] for XY, [0,1] for Z
  • OpenGL = [-1,+1] range for XYZ
• Affects
  • How projection matrix is loaded
  • Fragment shaders that access the window position
  • Point sprites have upper-left texture coordinate origin
    • OpenGL already lets application choose lower-left or upper-left
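Each of the three conventions reduces to a one-line conversion. The helpers below are a hedged sketch in plain C with hypothetical names; they are not part of either API and assume continuous window coordinates and homogeneous clip-space z and w.

```c
#include <assert.h>

/* Window origin: convert a y coordinate between upper-left and
   lower-left origin conventions (the mapping is its own inverse). */
static float flipWindowY(float y, float windowHeight) {
    assert(windowHeight > 0.0f);
    return windowHeight - y;
}

/* Pixel centers: Direct3D 9 puts them at integers, OpenGL at half-pixels. */
static float d3d9PixelCenterToGL(float coord) {
    return coord + 0.5f;
}

/* Clip space: remap OpenGL clip z in [-w, +w] to Direct3D's [0, w]. */
static float glClipZToD3D(float zClip, float wClip) {
    return 0.5f * (zClip + wClip);
}
```

The z remap is the kind of adjustment that ends up folded into the projection matrix when moving content between the two APIs.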

Page 192: SIGGRAPH Asia 2008 Modern OpenGL

192

Direct3D vs. OpenGL Provoking Vertex Conventions

• Direct3D uses the “first” vertex of a triangle or line to determine which color is used for flat shading
• OpenGL uses the “last” vertex for lines, triangles, and quads
  • Except for polygon (GL_POLYGON) mode, which uses the first vertex

[Figure: an input triangle strip with per-vertex colors, flat shaded under each convention]

Direct3D 9: pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT);
OpenGL: glShadeModel(GL_FLAT);

Page 193: SIGGRAPH Asia 2008 Modern OpenGL

193

BGRA Vertex Array Order

• Direct3D 9’s most common usage for sending per-vertex colors is the 32-bit D3DCOLOR data type:
  • Red in bits 16:23
  • Green in bits 8:15
  • Blue in bits 0:7
  • Alpha in bits 24:31
• Laid out in memory (little-endian), this looks like BGRA order
• OpenGL assumes RGBA order for all vertex arrays
• Direct3Dism: the EXT_vertex_array_bgra extension allows:

glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);

Bit layout, from bit 31 down to bit 0: 8-bit alpha | 8-bit red | 8-bit green | 8-bit blue
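Why the packed word reads as BGRA in memory can be checked with a few shifts. This sketch uses hypothetical helper names and computes the byte at each little-endian memory offset arithmetically, so it does not depend on the endianness of the machine running it.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a color the way the slide describes D3DCOLOR:
   alpha 24:31, red 16:23, green 8:15, blue 0:7. */
static uint32_t packD3DColor(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Byte found at memory offset k when the word is stored little-endian. */
static uint8_t littleEndianByte(uint32_t word, unsigned k) {
    assert(k < 4);
    return (uint8_t)(word >> (8u * k));
}
```

Reading offsets 0 through 3 yields blue, green, red, alpha, which is the byte order GL_BGRA tells OpenGL to expect.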

Page 194: SIGGRAPH Asia 2008 Modern OpenGL

194

OpenGLisms

• Things about OpenGL’s operation that make it hard for non-OpenGL applications to port to OpenGL
• Examples
  • Selectors
  • Linked GLSL program objects

Page 195: SIGGRAPH Asia 2008 Modern OpenGL

195

Eliminating Selectors from OpenGL

• OpenGL has lots of selectors
  • Selectors set state that indicates what state subsequent commands will update
  • Already mentioned selectors: glClientActiveTexture
  • Other examples: glActiveTexture, glMatrixMode, glBindTexture, glBindBuffer, glUseProgram, glBindProgramARB
  • OpenGL is full of selectors
    • Partly OpenGL’s extensibility strategy
    • Partly because objects are bound into context
      • Bind-to-edit objects
      • Rather than edit-by-name
• Direct State Access extension: EXT_direct_state_access
  • Provides complete selector-free additional API for OpenGL
  • Shipping in NVIDIA’s 180.43 drivers

Page 196: SIGGRAPH Asia 2008 Modern OpenGL

196

Reasons to Eliminate Selectors

• Direct3D has an “edit-by-name” model of operation
  • Means Direct3D has no selectors
  • Having to manage selectors when porting Direct3D or console code to OpenGL is awkward
  • Requires deferring updates to minimize selector and object bind changes
• Layered libraries can’t count on selector state
  • To be safe when updating state controlled by selectors, such libraries must use this idiom:
    • Save selector, set selector, update state, restore selector
  • Bad for performance, particularly bad for dual-core drivers since queries are expensive
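The save/set/update/restore idiom can be sketched with a toy state machine. This is a model only, not the OpenGL API: every toy_ and library_ name is hypothetical, and the comments merely point at the real entry points each one is meant to resemble (glMatrixMode, glGetIntegerv, glLoadMatrixf, and glMatrixLoadfEXT from EXT_direct_state_access).

```c
#include <assert.h>
#include <string.h>

/* Toy context, NOT the OpenGL API: two matrices plus a selector that
   decides which one "load" commands update (compare glMatrixMode). */
enum toy_mode { TOY_MODELVIEW = 0, TOY_PROJECTION = 1, TOY_MODE_COUNT };

struct toy_context {
    enum toy_mode matrix_mode;         /* the selector */
    float matrix[TOY_MODE_COUNT][16];  /* state the selector picks from */
    int queries;                       /* how many selector read-backs happened */
};

/* Selector-based calls (compare glMatrixMode, glGetIntegerv, glLoadMatrixf). */
static void toy_matrix_mode(struct toy_context *c, enum toy_mode m)
{
    c->matrix_mode = m;
}

static enum toy_mode toy_get_matrix_mode(struct toy_context *c)
{
    c->queries++;  /* the slide's point: such queries are the expensive part */
    return c->matrix_mode;
}

static void toy_load_matrix(struct toy_context *c, const float m[16])
{
    memcpy(c->matrix[c->matrix_mode], m, sizeof c->matrix[0]);
}

/* Selector-free call: the target is an explicit parameter
   (compare glMatrixLoadfEXT in EXT_direct_state_access). */
static void toy_matrix_load_direct(struct toy_context *c, enum toy_mode mode,
                                   const float m[16])
{
    assert((int)mode >= 0 && (int)mode < TOY_MODE_COUNT);
    memcpy(c->matrix[mode], m, sizeof c->matrix[0]);
}

/* What a layered library must do when only selector-based calls exist. */
static void library_set_projection(struct toy_context *c, const float m[16])
{
    enum toy_mode saved = toy_get_matrix_mode(c);  /* save selector (a query) */
    toy_matrix_mode(c, TOY_PROJECTION);            /* set selector */
    toy_load_matrix(c, m);                         /* update state */
    toy_matrix_mode(c, saved);                     /* restore selector */
}

/* The same update with a direct-state-access style call: one call, no query. */
static void library_set_projection_dsa(struct toy_context *c, const float m[16])
{
    toy_matrix_load_direct(c, TOY_PROJECTION, m);
}
```

Both library functions leave the caller's selector unchanged, but only the first has to read it back. In this model the cost shows up as the queries counter; in a real driver it would be the synchronizing query the slide warns about.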

Page 197: SIGGRAPH Asia 2008 Modern OpenGL

GLSL Program Object Linking

• GLSL requires shader objects from different domains (vertex, geometry, fragment) to be linked into a single GLSL program object
  • Means you can’t mix-and-match shaders easily
• Other APIs don’t have this limitation
  • Direct3D
  • Prior OpenGL assembly language extensions
  • Consoles
• Having a “separate shader objects” extension could fix this problem

Page 198: SIGGRAPH Asia 2008 Modern OpenGL

Separate Shader Objects Example

• Combining different GLSL shaders at once

[Figure: grid of rendered tori combining different GLSL vertex shaders (wobbly torus, smooth torus) with different GLSL fragment shaders (specular brick bump mapping, red diffuse)]

Page 199: SIGGRAPH Asia 2008 Modern OpenGL

Deprecation

• Part of OpenGL 3.0 is a marking of features for deprecation

• LOTS of functionality is marked for deprecation

• I contend no real application today uses the non-deprecated subset of OpenGL—all apps would have to change due to deprecation

• Some vendors believe getting rid of features will make OpenGL better in some way

• NVIDIA does not believe in abandoning API compatibility this way

• OpenGL is part of a large ecosystem, so removing features this way undermines the substantial investment partners have made in OpenGL over the years

• API compatibility and stability are among OpenGL’s great strengths

Page 200: SIGGRAPH Asia 2008 Modern OpenGL

Synergy between OpenGL and OpenCL

• Complementary capabilities
  • OpenGL 3.0 = state-of-the-art, cross-platform graphics
  • OpenCL 1.0 = state-of-the-art, cross-platform compute
• Computation & Graphics should work together
  • Most natural way to intuit compute results is with graphics
  • When Compute is done on a GPU, there’s no need to “copy” the data to see it visualized
• Appendix B of OpenCL specification
  • Deals with sharing objects between OpenGL and OpenCL
• Called “GL” and “CL” from here on…

Page 201: SIGGRAPH Asia 2008 Modern OpenGL

Four Kinds of Shared Objects

[Diagram: OpenGL objects on the left, the call that shares each one, OpenCL objects on the right]

• OpenGL buffer object (GLuint bufferobj) → clCreateFromGLBuffer → OpenCL buffer object (cl_mem)
• OpenGL texture 2D object (GLenum target, GLuint texture, GLint miplevel) → clCreateFromGLTexture2D → OpenCL 2D image object (cl_mem)
• OpenGL texture 3D object (GLenum target, GLuint texture, GLint) → clCreateFromGLTexture3D → OpenCL 3D image object (cl_mem)
• OpenGL renderbuffer object (GLuint renderbuffer) → clCreateFromGLRenderbuffer → OpenCL 2D image object (cl_mem)

Page 202: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL / OpenCL Sharing

• Requirements for GL object sharing with CL
  • CL context must be created with an OpenGL context
  • Each platform-specific API will provide its appropriate way to create an OpenGL-compatible CL context
    • For WGL (Windows), CGL (OS X), GLX (X11/Linux), EGL (OpenGL ES), etc.
• Creating cl_mem for GL objects does two things
  1. Ensures CL has a reference to the GL objects
  2. Provides cl_mem handle to acquire GL object for CL’s use
• clRetainMemObject & clReleaseMemObject manage counted references to cl_mem objects

Page 203: SIGGRAPH Asia 2008 Modern OpenGL

Acquiring GL Objects for Compute Access

• Still must “enqueue acquire” GL objects for compute kernels to use them
  • Otherwise reading or writing GL objects with CL is undefined
• Enqueue acquire and release provide sequential consistency with GL command processing
• Enqueue commands for GL objects
  • clEnqueueAcquireGLObjects
    • Takes list of cl_mem objects for GL objects & list of cl_events that must complete before acquire
    • Returns a cl_event for this acquire operation
  • clEnqueueReleaseGLObjects
    • Takes list of cl_mem objects for GL objects & list of cl_events that must complete before release
    • Returns a cl_event for this release operation

Page 204: SIGGRAPH Asia 2008 Modern OpenGL

Unconventional OpenGL Deployments

• Conventional PC OpenGL products
  • Workstation PCs: Quadro
  • Consumer PCs: GeForce
• Unconventional
  • High-end Visualization: Quadro Plex Visual Computing Solution (VCS)
  • Embedded Applications
  • Handheld Devices
  • Game Consoles

Page 205: SIGGRAPH Asia 2008 Modern OpenGL

OpenGL in Context

A facilitated conversation

with Dr. Marc Levoy, Stanford University

Page 206: SIGGRAPH Asia 2008 Modern OpenGL

Questions?