Status – Week 207
Victor Moya

Post on 22-Dec-2015



Summary

Z Test box.
Z Compression.
Z Cache.
Stencil.
HZ Box.
HZ Test.
Traces.

Z Test box

Z Test box includes:
  Z cache.
  Z encoder (compress and reference value).
  Z decoder (decompress).
  Z test.
  Z update.
  Stencil test.
  Stencil update.

[Figure: Z Test box block diagram. Blocks: Fetch, Read, Stencil Test, Z Test, Stencil Update, Write, Z Cache, Enc (encoder), Dec (decoder). Labels: Fragments/Stamps (input and output), Reference Z value, Compressed Z Line/Block.]

Z Compression

ATI HOT 3D in Eurographics 2000.
8x8 pixel block (Z cache line).
DDPCM: differential differential pulse code modulation.
Two modes:
  ½ of original size.
  ¼ of original size.
Entropy encoder.
Entropy encoders?
  Huffman.
  Arithmetic encoder.

[Figure: 1D Z Compression. Labels: 8 input z values, rows of '-' (subtract) nodes, Entropy Encoder.]

2D Z Compression

[Figure: 2D Z Compression. Labels: 64 pixels, 2D DDPCM, Entropy Encoder, Packer.]
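The slides name DDPCM but do not give the predictor or the traversal order. As a minimal sketch, assuming a 1D run of integer Z values and plain second-order differences (the function names are illustrative, not from the ATI material), the idea can be written as:

```python
def ddpcm_encode(z):
    """Encode a 1D run of Z values as [z0, first delta, second-order deltas...]."""
    if len(z) < 2:
        return list(z)
    # First-order differences between neighbouring Z values.
    first = [z[i] - z[i - 1] for i in range(1, len(z))]
    # Second-order differences (differences of the differences).
    second = [first[i] - first[i - 1] for i in range(1, len(first))]
    return [z[0], first[0]] + second


def ddpcm_decode(code):
    """Invert ddpcm_encode by accumulating the deltas."""
    if len(code) < 2:
        return list(code)
    z = [code[0], code[0] + code[1]]
    delta = code[1]
    for dd in code[2:]:
        delta += dd            # rebuild the first-order difference
        z.append(z[-1] + delta)
    return z


# Z varies linearly across a planar triangle, so the second-order terms vanish.
ramp = [1000 + 7 * i for i in range(8)]
encoded = ddpcm_encode(ramp)
```

For a linear ramp everything after the first two entries is zero, which is the kind of input an entropy encoder can pack into very few bits.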

Z Compression

ATI patent application 20030038803.
  Two reference values MAX and MIN.
  Offset values.
  Windows.
  Other method I don’t understand yet …
S3 patent 6,411,295.
  Similar approach.
Others.

Z Compression

Method 1:
  MIN and MAX per cache line/block.
  1 bit flag per pixel/Z value telling which reference value to use.
  The offsets from the MIN or MAX reference values are stored in the compressed output.
  The offsets must be inside a window of T values (log2T = bits per offset) from MIN and MAX.

[Figure: Z range from Z = 0 to Z = 1 with Zmin (MIN) and Zmax (MAX) marked. Window limits: z = Zmin + T - 1 and z = Zmax - T + 1.]
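Method 1 can be sketched as follows. This is an illustration under assumptions, not the patented encoder: the function names, the fallback of returning None for a block that does not fit, and the convention that flag 0 selects MIN and flag 1 selects MAX are all hypothetical.

```python
def compress_minmax(block, offset_bits):
    """Return (zmin, zmax, flags, offsets), or None if the block does not fit."""
    window = 1 << offset_bits            # T values, so log2(T) bits per offset
    zmin, zmax = min(block), max(block)
    flags, offsets = [], []
    for z in block:
        if z - zmin < window:            # inside [Zmin, Zmin + T - 1]
            flags.append(0)              # assumed: 0 means "relative to MIN"
            offsets.append(z - zmin)
        elif zmax - z < window:          # inside [Zmax - T + 1, Zmax]
            flags.append(1)              # assumed: 1 means "relative to MAX"
            offsets.append(zmax - z)
        else:
            return None                  # outside both windows: keep uncompressed
    return zmin, zmax, flags, offsets


def decompress_minmax(zmin, zmax, flags, offsets):
    """Rebuild the Z values from the two references, the flags and the offsets."""
    return [zmin + o if f == 0 else zmax - o for f, o in zip(flags, offsets)]


block = [100, 101, 103, 900, 899, 100, 897, 102]
packed = compress_minmax(block, offset_bits=2)
```

With 2 offset bits each value costs 1 flag bit plus 2 offset bits, in addition to the two reference values stored once per block.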

Z Compression

Method 2:
  Z values are divided into upper and lower bits.
  Keep UMAX and UMIN.
  Calculate A = UMIN - 1, B = UMAX + 1.
  2-bit flag per pixel/Z value references the upper bits from { UMAX, UMIN, A, B }.
  Lower bits per pixel/Z value are stored in the compressed output.

[Figure: Z range from Z = 0 to Z = 1. Labels: Umin << a, Umax << a, Zmin, Zmax, A, B, Umin, Umin.]
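A literal reading of Method 2 can be sketched as below. The slides do not say how UMIN and UMAX are chosen, so this sketch takes them as parameters; the function names, the selector numbering and the None fallback are assumptions made for illustration.

```python
def compress_upper_lower(block, lower_bits, umin, umax):
    """Return (flags, lowers), or None if some value matches no candidate."""
    # Candidate upper-bit patterns; the list index is the assumed 2-bit flag.
    candidates = [umax, umin, umin - 1, umax + 1]    # UMAX, UMIN, A, B
    mask = (1 << lower_bits) - 1
    flags, lowers = [], []
    for z in block:
        upper = z >> lower_bits
        if upper not in candidates:
            return None                              # block is not compressible
        flags.append(candidates.index(upper))
        lowers.append(z & mask)                      # lower bits stored verbatim
    return flags, lowers


def decompress_upper_lower(flags, lowers, lower_bits, umin, umax):
    """Rebuild each Z from the selected upper bits and its stored lower bits."""
    candidates = [umax, umin, umin - 1, umax + 1]
    return [(candidates[f] << lower_bits) | lo for f, lo in zip(flags, lowers)]


block = [0x1234, 0x13FF, 0x1100, 0x1401]
packed = compress_upper_lower(block, lower_bits=8, umin=0x12, umax=0x13)
```

If UMIN and UMAX were the true extremes of the upper bits in the block, A and B would never be selected, so the references are presumably chosen some other way; the sketch leaves that choice to the caller.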

Z Compression

Reference values in the compressed output.
Compression flags on die.
  Useful for fast clear too.
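One way on-die flags could support fast clear is sketched here. The three states, the class name, the 64-value block size taken from the 8x8 cache line, and the fetch callback are assumptions for illustration; the slides only state that the flags are kept on die and help with fast clear.

```python
CLEARED, COMPRESSED, UNCOMPRESSED = 0, 1, 2   # assumed per-block flag states


class BlockFlags:
    """Per-block flag table standing in for the on-die flag memory."""

    def __init__(self, num_blocks, clear_z=0xFFFFFF):
        self.clear_z = clear_z
        self.flags = [UNCOMPRESSED] * num_blocks
        self.memory_reads = 0                 # counts accesses to Z buffer memory

    def fast_clear(self, clear_z):
        # Only the flags change; no Z buffer memory is written.
        self.clear_z = clear_z
        self.flags = [CLEARED] * len(self.flags)

    def read_block(self, index, fetch):
        if self.flags[index] == CLEARED:
            return [self.clear_z] * 64        # synthesized, no memory access
        self.memory_reads += 1
        return fetch(index)                   # fetch is a hypothetical memory read

    def write_block(self, index, compressed):
        self.flags[index] = COMPRESSED if compressed else UNCOMPRESSED


table = BlockFlags(num_blocks=4)
table.fast_clear(0xFFFFFF)
cleared = table.read_block(2, fetch=lambda i: [0] * 64)
```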

Z Cache

Normal cache?
Or ‘fetch’ cache?
Normal cache that supports a large number of active misses (miss on miss, miss on hit).
Or prefetching?

Z Cache

Fetch vs Prefetch.
  Fetch needs additional state (bits) per cache line.
  Fetch needs an additional port to the cache tag file.
  Fetch implies a large queue or stalls somewhere.
  Prefetch requires a predictor.
  Prefetch may request data that won’t be used (failed predictions).

Z Cache

Prefetching.
  Very easy to predict next data inside a triangle (large).
    Quite common (middle-small triangles).
  Easy to predict next data inside a tristrip or triangle list batch.
    Very common.
  Hard to predict next data between batches (or meshes)?
    But will happen rarely.

Z Cache

“Fetch cache”
  In fact prefetching.
  Texture Prefetching Architecture.
    Akeley course.
    Igehy, Eldridge, Proudfoot, Prefetching in a texture cache architecture.
      Not read yet.
  Slightly different concept:
    Our fetch cache accesses the tag file twice.
      But in simulation it is the same, as we are not charging for tag file accesses!!
    Change the mechanism so that fetch returns a pointer to the cache line.

[Figure: texture prefetching architecture. Blocks: Rasterizer, FIFO, Stall, Cache Tags, Request FIFO, Reorder Buffer, Texture Memory, Cache Data, Texture Filter, Texture Apply.]

Stencil

Stencil and Z share a 32 bit word per pixel:
  8/24.
  0/32.
  2x16 (Z only!!).
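The 8/24 split can be illustrated with a short packing sketch. The bit placement (Z in the upper 24 bits, stencil in the lower 8) and the function names are assumptions; the slides only give the field widths.

```python
def pack_z24_s8(z, stencil):
    """Pack a 24-bit Z and an 8-bit stencil into one 32-bit word."""
    assert 0 <= z < (1 << 24) and 0 <= stencil < (1 << 8)
    # Assumed layout: Z in bits 31..8, stencil in bits 7..0.
    return (z << 8) | stencil


def unpack_z24_s8(word):
    """Split a 32-bit word back into (z, stencil) under the same layout."""
    return word >> 8, word & 0xFF


word = pack_z24_s8(0xABCDEF, 0x12)
```

With this layout a Z-only comparison can ignore the low byte, and a stencil update only touches the low byte.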

Stencil

Stencil compression:
  If stencil is not active and is cleared:
    Remove stencil field from compressed data.
  If stencil is active or not cleared:
    Compress stencil?
      Independent of Z compression.
      Needs more compression flag bits.
      Which is the average stencil value? Or log2 of the value?
      How much can be saved? 8b to 1b, 2b, 4b? Is it worth it?

HZ Box

Hierarchical Z buffer.
  Number of levels?
  Size?
  On die?
Includes:
  Memory for storing the different levels.
  Update mechanism.
  Process requests and updates.

HZ Box

ATI model (from patents XXX, and XXX).
  2 levels.
    1st level is from original 8x8 blocks (Z cache line).
    2nd level is 2x2 (?) values from level 1.
  Update mechanism:
    Z Max (or Z Min) from the Z encoder (compressor) for an 8x8 block (cache line).
    Combining cache for level 2 (?).
    Write and update on eviction from combining cache (?).

HZ Test

Compares the incoming Z value from a graphic object to the reference Z value stored in one or more of the Hierarchical Z levels.
What can be tested:
  Triangle Z (or 3 vertex Z).
    Cull a whole triangle.
  Blocks of fragments:
    Good for recursive descent or tiled!!
    Large blocks to level 2.
    8x8 (or less) blocks to level 1.
  Stamps (2x2) or fragments:
    Against level 1 (slow access? fast update?).
    Against level 2 (fast access? slow update?).

[Figure labels: HZ L2, HZ L1.]
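The two-level scheme and the test can be sketched as below. This is a simplified model under assumptions: a "less than" depth test, level 1 holding the max Z per 8x8 block, level 2 holding the max over a 2x2 group of level 1 entries, and no combining cache. The class and method names are illustrative.

```python
class HierarchicalZ:
    """Two-level max-Z hierarchy; assumes a 'less than' depth test."""

    def __init__(self, blocks_x, blocks_y, far_z=0xFFFFFF):
        # blocks_x and blocks_y are assumed even so 2x2 groups tile exactly.
        self.l1 = [[far_z] * blocks_x for _ in range(blocks_y)]
        self.l2 = [[far_z] * (blocks_x // 2) for _ in range(blocks_y // 2)]

    def update(self, x, y, block_zmax):
        """Record the max Z of 8x8 block (x, y), as reported by the Z encoder."""
        self.l1[y][x] = block_zmax
        gx, gy = x // 2, y // 2
        # Level 2 keeps the max of the four level 1 entries it covers.
        self.l2[gy][gx] = max(self.l1[gy * 2 + j][gx * 2 + i]
                              for j in range(2) for i in range(2))

    def culled_l1(self, x, y, zmin):
        """True if geometry whose nearest depth is zmin is hidden in block (x, y)."""
        return zmin >= self.l1[y][x]

    def culled_l2(self, gx, gy, zmin):
        """Same test against the coarser level 2 entry (gx, gy)."""
        return zmin >= self.l2[gy][gx]


hz = HierarchicalZ(blocks_x=4, blocks_y=4)
hz.update(0, 0, 100)
hidden_in_block = hz.culled_l1(0, 0, zmin=150)
```

The test is conservative: a block is culled only when its nearest incoming depth is not in front of the farthest stored depth, so no visible fragment is discarded.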

Traces

I stalled Carlos’s work, so this is delayed until next week.

Web

I’m writing my web page.
GPU3D page?
  Public/private.