AMD R700 Series Processors
AMD R700 Series History
• The AMD 700 chipset series – also known as the AMD 7-Series Chipsets
• A set of chipsets designed by ATI for AMD Phenom processors
• General availability (GA): late 2007 to the end of 2008
CPU versus GPU
CPU
• Typically uses basic load instructions for data loads
• Processes instructions one at a time
• Located on the motherboard
GPU
• Typically uses texture-fetch and vertex-fetch instructions for data loads
• Processes hundreds of instructions simultaneously
• Typically located on an I/O card attached to the bus
AMD R700 Series Processor
– Design philosophy and rationale of the AMD R700, related to the good design policies studied in class
AMD R700 Instructions Control-flow
A program consists of two sections: control flow and clauses.
• Control flow instructions can initiate execution of the following:
 – ALU (by referring to an appropriate clause)
 – Texture-fetch
 – Vertex-fetch
• A clause is a homogeneous group of instructions of one of these types:
 – ALU
 – Texture-fetch
 – Vertex-fetch
 – Local data share
 – Memory read
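The program structure above can be modeled in a few lines. This is a minimal Python sketch, not AMD's toolchain. The class names (`ClauseKind`, `Instruction`, `Clause`, `ControlFlowInstruction`) and the `validate_program` helper are hypothetical, and the mnemonics are only labels. The sketch checks the one rule stated here: every instruction in a clause has the same type.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClauseKind(Enum):
    # The five clause types listed above.
    ALU = "alu"
    TEXTURE_FETCH = "tex"
    VERTEX_FETCH = "vtx"
    LOCAL_DATA_SHARE = "lds"
    MEMORY_READ = "mem_read"


@dataclass
class Instruction:
    kind: ClauseKind
    mnemonic: str  # illustrative label only


@dataclass
class Clause:
    instructions: list[Instruction] = field(default_factory=list)

    def is_homogeneous(self) -> bool:
        """True if every instruction in the clause has the same kind."""
        return len({inst.kind for inst in self.instructions}) <= 1


@dataclass
class ControlFlowInstruction:
    # Hypothetical model: one control flow instruction launches one clause.
    name: str
    clause: Clause


def validate_program(cf_program: list[ControlFlowInstruction]) -> list[str]:
    """Return the names of CF instructions whose clause mixes instruction kinds."""
    return [cf.name for cf in cf_program if not cf.clause.is_homogeneous()]


if __name__ == "__main__":
    alu = Clause([Instruction(ClauseKind.ALU, "MULADD"), Instruction(ClauseKind.ALU, "ADD")])
    mixed = Clause([Instruction(ClauseKind.ALU, "ADD"),
                    Instruction(ClauseKind.TEXTURE_FETCH, "SAMPLE")])
    program = [ControlFlowInstruction("cf0", alu), ControlFlowInstruction("cf1", mixed)]
    print(validate_program(program))  # ['cf1']
```

In this model the control flow layer only decides which clause runs next. Each clause can then be dispatched to a single kind of unit, and that is the point of the homogeneity rule.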
AMD R700 Registers
• 128 general-purpose registers
 – 128 bits wide
 – Organized as four 32-bit values
• 512 constant registers
 – 128 bits wide
 – Organized as four 32-bit values
• Address register
AMD R700 Registers
• Loop index
 – Initialized by software
 – Incremented by hardware on each iteration of a loop
• Integer constant register
 – 96 bits wide (3 x 32 bits)
 – GPU has read access
 – Main CPU has write access
 – Specified in the CF_CONST field of the CF_DWORD1 microcode format for the current LOOP* instruction
AMD R700 Addressing
Addressing modes:
• Absolute
• Loop-index-relative
• Relative
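Below is a hypothetical Python sketch of how the three addressing modes could resolve a register number. It assumes that loop-index-relative addressing adds the current loop index to the base index, and that relative addressing adds the address register value. The function names, the bounds check against the 128 GPRs, and the loop index increment of 1 per iteration are all assumptions made for illustration.

```python
from enum import Enum


class AddressMode(Enum):
    ABSOLUTE = "absolute"
    LOOP_INDEX_RELATIVE = "loop_index_relative"
    RELATIVE = "relative"  # assumed: offset taken from the address register


NUM_GPRS = 128  # from the register slide


def effective_gpr(index: int, mode: AddressMode,
                  loop_index: int = 0, address_reg: int = 0) -> int:
    """Resolve an operand's GPR number under one of the three modes (simplified model)."""
    if mode is AddressMode.ABSOLUTE:
        result = index  # relative to register zero
    elif mode is AddressMode.LOOP_INDEX_RELATIVE:
        result = index + loop_index
    elif mode is AddressMode.RELATIVE:
        result = index + address_reg
    else:
        raise ValueError(f"unknown mode {mode!r}")
    if not 0 <= result < NUM_GPRS:
        raise IndexError(f"GPR {result} is out of range")
    return result


def loop_gpr_indices(base: int, init: int, iterations: int) -> list[int]:
    """Software sets the initial loop index; the 'hardware' step adds 1 per iteration."""
    loop_index = init
    touched = []
    for _ in range(iterations):
        touched.append(effective_gpr(base, AddressMode.LOOP_INDEX_RELATIVE,
                                     loop_index=loop_index))
        loop_index += 1  # assumed increment of 1 on each loop iteration
    return touched
```

With these assumptions, `loop_gpr_indices(10, 0, 3)` walks registers 10, 11 and 12. This is how loop-index-relative addressing lets one loop body touch a different register on each pass.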
AMD R700 Operands
• Up to 3 source operands and 1 destination operand, all of which support absolute addressing, so each can be accessed relative to address zero
• Data types:
 – Float
 – Double
 – Half
 – Signed/unsigned integer
AMD R700 Operation Repertoire
Arithmetic operations on built-in integer, floating-point scalar, and vector data types:
• Add
• Subtract
• Multiply
• Divide
• Basic Linear Algebra Subroutines (BLAS)
• Linear Algebra Package (LAPACK)
• Fast Fourier Transform (FFT)
• Math transcendental functions
• Random number generator routines
• Stream processing backend for load balancing computations between the CPU and the stream processors
AMD R700 Features
Instructions operate on 32-bit or 64-bit IEEE floating-point values and signed/unsigned integers.
• Instruction set:
 – Control-flow
 – ALU clause
 – Vertex-fetch
 – Texture-fetch
 – Memory read
 – Data-share read/write
AMD R700 Instructions Memory Read
• Software-initiated with the VTX or VTX_TC instructions
• Fetch data from one of three types of buffers:
 – Scratch
 – Reduction
 – Scatter (general read/write)
• Reads of these buffer types can be intermixed within a clause of as many as 16 memory read instructions. Memory read instructions cannot share a clause with texture-fetch, vertex-fetch, or local data share instructions.
AMD R700 Instructions Data-Share Read/Write
• Software-initiated with the TEX control flow instruction
• Within the clause, LDS uses common instruction encodings:
 – MEM_DSR: reads
 – MEM_DSW: writes
Instructions in an LDS clause are issued sequentially. When a write instruction is followed by a read, all of the write data is posted before the read executes, so a data-share clause can use one location repeatedly to exchange data.
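The ordering guarantee described above can be illustrated with a single-threaded Python sketch. It assumes the simplest possible model: one shared array, with operations applied strictly in issue order. `LdsOp` and `run_lds_clause` are hypothetical names. "write" and "read" stand in for MEM_DSW and MEM_DSR but do not reproduce their real encodings, and real LDS traffic comes from many threads at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LdsOp:
    op: str                      # "write" (stands in for MEM_DSW) or "read" (MEM_DSR)
    address: int
    value: Optional[int] = None  # only used by writes


def run_lds_clause(ops: list[LdsOp], size: int = 16) -> list[int]:
    """Issue LDS operations strictly in order and return the values read.

    Because each write is fully posted before the next instruction issues,
    every read sees the most recent earlier write to the same address.
    """
    memory = [0] * size
    results = []
    for op in ops:
        if op.op == "write":
            memory[op.address] = op.value
        elif op.op == "read":
            results.append(memory[op.address])
        else:
            raise ValueError(f"unknown LDS op {op.op!r}")
    return results
```

For example, writing 5 to address 3, reading it, writing 9 to the same address, and reading again returns `[5, 9]`. The single location is reused to pass two different values.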
AMD R700 Instructions Vertex-fetch
• Software-initiated with the VTX or VTX_TC instruction
• Fetch vertices from the vertex buffer based on a GPR address
• A vertex-fetch clause is at most eight instructions long
[Figure: vertex-fetch microcode format, labeled with the relative byte offset of each word in memory]
AMD R700 Instructions Texture-fetch
• Software-initiated with the TEX instruction
• Consists of instructions that look up texture elements, known as texels, based on a GPR address or constant-fetch operations
• A texture-fetch clause is at most eight instructions long
[Figure: texture-fetch microcode format, labeled with the relative byte offset of each word in memory]
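The clause length limits given above (eight instructions for vertex-fetch and texture-fetch clauses, 16 for memory read clauses) could be respected by a code generator that splits a long run of same-type instructions into several clauses. The `split_into_clauses` helper below is a hypothetical Python sketch of that idea. It does not describe documented AMD compiler behavior.

```python
# Maximum clause lengths (instruction counts) as stated on the slides.
MAX_CLAUSE_LENGTH = {
    "vertex_fetch": 8,
    "texture_fetch": 8,
    "memory_read": 16,
}


def split_into_clauses(kind: str, instructions: list[str]) -> list[list[str]]:
    """Chunk a run of same-kind instructions into clauses no longer than the limit.

    Each returned chunk would need its own control flow instruction to launch it.
    """
    limit = MAX_CLAUSE_LENGTH[kind]
    return [instructions[i:i + limit] for i in range(0, len(instructions), limit)]
```

For instance, 20 vertex fetches split into clauses of 8, 8 and 4 instructions, while 20 memory reads need only two clauses (16 and 4).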
AMD R700 ALU Instructions
ALU instructions are organized as pairs of 32-bit double words (64 bits per instruction).
• OP2 instruction: the ALU_INST field uses a seven-bit opcode, with the high three bits set to 000b.
• OP3 instruction: at least one of the three high bits of the ALU_INST field is nonzero.
[Figure: OP2 and OP3 ALU instruction formats (a choice of 2 or 3 source operands), labeled with the byte offset of each double word]
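Taking the slide's description of ALU_INST at face value (a seven-bit field whose high three bits are 000b for OP2), the format and the number of source operands can be recovered with a single shift. This Python sketch works on the ALU_INST value alone. Where the field sits inside the two instruction double words is deliberately not modeled, and the function names are hypothetical.

```python
def alu_format(alu_inst: int) -> str:
    """Classify a 7-bit ALU_INST value as OP2 or OP3, following the slide's rule."""
    if not 0 <= alu_inst < (1 << 7):
        raise ValueError("ALU_INST is treated here as a 7-bit field")
    high_three = alu_inst >> 4  # bits 6..4 of the 7-bit value
    return "OP2" if high_three == 0 else "OP3"


def source_operand_count(alu_inst: int) -> int:
    # OP2 instructions take 2 source operands; OP3 instructions take 3.
    return 2 if alu_format(alu_inst) == "OP2" else 3
```

Because the high bits double as the format selector, the decoder needs no separate format flag. The trade-off is that OP2 opcodes are confined to the values whose top three bits are zero.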
AMD R700 ALU Instructions
The processor contains multiple sets of five scalar ALUs.
Four of the five are called ALU.[X, Y, Z, W] and perform scalar operations on as many as three 32-bit data elements. The fifth, ALU.Trans, can also perform transcendental operations.
Each 128-bit register holds four 32-bit elements in little-endian order.
[Figure: 128-bit register layout, from the most-significant element to the least-significant element]
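The register layout can be sketched in Python with the standard `struct` module. The sketch assumes that element X occupies the least-significant 32 bits and W the most-significant. The slide states little-endian order, but this element-to-lane mapping is an assumption. `vector_muladd` is a hypothetical helper in which each of the four vector slots applies one three-operand operation (a*b + c) to its own elements. It uses ordinary Python numbers rather than 32-bit IEEE arithmetic.

```python
import struct

LANES = ("x", "y", "z", "w")


def pack_register(x: int, y: int, z: int, w: int) -> int:
    """Pack four unsigned 32-bit elements into one 128-bit value, little-endian.

    Assumes x is stored in the least-significant 32 bits and w in the most.
    """
    raw = struct.pack("<4I", x, y, z, w)  # x's bytes come first (lowest address)
    return int.from_bytes(raw, "little")


def unpack_register(value: int) -> tuple[int, int, int, int]:
    """Split a 128-bit value back into its four 32-bit elements (x, y, z, w)."""
    return struct.unpack("<4I", value.to_bytes(16, "little"))


def vector_muladd(a, b, c):
    """Each slot X, Y, Z, W computes a*b + c on its own three elements."""
    return tuple(a[i] * b[i] + c[i] for i in range(len(LANES)))
```

Under these assumptions, `pack_register(1, 2, 3, 4)` equals `1 | 2 << 32 | 3 << 64 | 4 << 96`, and `unpack_register` reverses it exactly.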
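AMD R700 Procedure Calls
A control flow instruction occupies two 32-bit words. CF_DWORD1 (offset +4) contains these fields:
• Bit 31, BARRIER (sync barrier): controls whether the instruction can run in parallel with prior instructions
• Bit 30, WHOLE_QUAD_MODE (WQM): 1 = execute the instruction as if all pixels are active and valid
• Bits 29:23, CF_INST: the control flow instruction, e.g. CF_INST_JUMP executes a jump statement
• Bit 22, VALID_PIXEL_MODE (VPM): 1 = execute the instruction as if invalid pixels are inactive
• Bit 21, END_OF_PROGRAM: marks the end of the program
• Bit 19, COUNT_3: the MSB of the count field
• Bits 18:13, CALL_COUNT: amount to increment the call nesting counter by when executing a call statement (range 0-31); the call is skipped if the nesting depth + CALL_COUNT > 32
• Bits 12:10, COUNT: number of instruction slots to execute in the clause (values 1-16, with COUNT_3 as the MSB)
• Bits 9:8, COND: specifies how to evaluate the condition test for each pixel
• Bits 7:3, CF_CONST: control flow constant to use for flow control statements
• Bits 2:0, POP_COUNT: pop count
WHOLE_QUAD_MODE and VALID_PIXEL_MODE are mutually exclusive: at most one of them may be set to 1.
CF_DWORD0 (offset +0):
• Bits 31:0, ADDR: address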
Offsets +4 and +0 are relative to the byte address specified in the host-written PGM_START_* register. Texture and Vertex clauses must start on 16-byte aligned addresses.
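A Python sketch of decoding the CF_DWORD1 word follows. The bit-to-field mapping is a reading of the slide's bit diagram and should be checked against AMD's ISA reference before use. It also assumes that the 4-bit count (COUNT_3 followed by COUNT) stores the slot count minus one, which is how the stated range of 1-16 fits into four bits. The helper names are hypothetical.

```python
# Assumed CF_DWORD1 layout: name -> (high_bit, low_bit). Bit 20 is not described.
CF_DWORD1_FIELDS = {
    "POP_COUNT": (2, 0),
    "CF_CONST": (7, 3),
    "COND": (9, 8),
    "COUNT": (12, 10),
    "CALL_COUNT": (18, 13),
    "COUNT_3": (19, 19),
    "END_OF_PROGRAM": (21, 21),
    "VALID_PIXEL_MODE": (22, 22),
    "CF_INST": (29, 23),
    "WHOLE_QUAD_MODE": (30, 30),
    "BARRIER": (31, 31),
}

MAX_CALL_DEPTH = 32


def extract(word: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of a 32-bit word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode_cf_dword1(word: int) -> dict[str, int]:
    fields = {name: extract(word, hi, lo) for name, (hi, lo) in CF_DWORD1_FIELDS.items()}
    if fields["WHOLE_QUAD_MODE"] and fields["VALID_PIXEL_MODE"]:
        raise ValueError("WHOLE_QUAD_MODE and VALID_PIXEL_MODE are mutually exclusive")
    # COUNT_3 is the MSB of a 4-bit count; stored 0..15 is assumed to mean 1..16 slots.
    fields["clause_slots"] = ((fields["COUNT_3"] << 3) | fields["COUNT"]) + 1
    return fields


def call_is_skipped(nesting_depth: int, call_count: int) -> bool:
    # Per the slide: a call is skipped if depth + CALL_COUNT exceeds 32.
    return nesting_depth + call_count > MAX_CALL_DEPTH


def clause_start_ok(byte_address: int) -> bool:
    # Texture and vertex clauses must start on 16-byte aligned addresses.
    return byte_address % 16 == 0
```

Under this layout, a word with COUNT_3 = 1 and COUNT = 7 describes a full 16-slot clause, and a call made at nesting depth 30 with CALL_COUNT = 3 would be skipped.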
AMD R700: CISC or RISC
• CISC characteristics:
 – Number of operands per instruction (up to 3 source operands)
 – Complex set of operations in the ISA
 – Instructions work out of both on-chip and off-chip memory
• RISC characteristics:
 – Large number of registers
 – Separate instructions for load/store and data processing
Design Policies
The Good and the Bad
1. Simplicity favors regularity
The R700 series specializes in processing graphics instructions quickly and in parallel.
2. Smaller is faster
Not so good: it’s all about trade-offs.
3. Make the common case fast
The R700 series processes graphics efficiently at high speed.
4. Good design demands good compromise
It trades error handling for high speed.
Conclusion
Pros
• Multiple parallel stream processing units (SPUs)
• Each single-instruction, multiple-data (SIMD) pipeline maintains a separate interface to memory
• Speed
Cons
• Cost
• R700 programs do not support:
 – Exceptions
 – Interrupts
 – Errors
 – Any event that can interrupt pipeline operations
• Size of the circuit board
Conclusion
• The AMD R700 GPU is a specialized processing unit
• Depending on the application, the trade-offs can be worth it