power pc architecture


Upload: m-junaid-sultan

Post on 01-May-2017


Power PC Architecture

Nirmal Chhugani

Introduction

PowerPC (Performance Optimization With Enhanced RISC - Performance Computing) is a RISC architecture created in 1991 by the Apple-IBM-Motorola (AIM) alliance.

The original idea for the PowerPC architecture came from IBM's POWER architecture (introduced in the RISC System/6000), and PowerPC retains a high level of compatibility with it.

The intention was to build a high-performance, low-cost, superscalar processor.

History

The history of the PowerPC began in the late 1970s with IBM's 801 prototype chip, which implemented the RISC ideas of John Cocke (IBM Watson Research Lab), with further refinements of those ideas developed by David Patterson. 801-based cores were used in a number of IBM embedded products, eventually becoming the 16-register ROMP (Research Office Products Division Micro Processor, a 10 MHz RISC microprocessor designed by IBM in the early 1980s) used in the IBM RT, a computer workstation. The RT had disappointing performance, and IBM started a project to build the fastest processor on the market. The result was the POWER architecture, introduced with the RISC System/6000 in early 1990.

History: POWER architecture

The POWER architecture incorporated many RISC characteristics:
fixed-length instructions
register-to-register architecture
simple addressing modes
large general register file
three-operand instruction format

It also has features more characteristic of complex ISAs.

POWER Architecture

Designed to be superscalar - instructions are dispatched across three independent units: branch, fixed-point arithmetic, and floating-point. This allows out-of-order execution.

Compound instructions - load and store instructions can update the base register with the newly calculated effective address, eliminating the extra add instructions otherwise needed to increment the index during array traversals.
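As a rough illustration of the update form described above, the sketch below models a load-with-update (in the spirit of PowerPC's `lwzu`) walking an array. The helper names `load_word_update` and `sum_array`, the register dictionary, and the dictionary standing in for memory are all hypothetical, invented for this example.

```python
def load_word_update(regs, memory, rt, ra, disp):
    """Model of an update-form load: rt <- MEM[ra + disp]; ra <- ra + disp."""
    ea = (regs[ra] + disp) & 0xFFFFFFFF   # effective address, wrapped to 32 bits
    regs[rt] = memory[ea]                 # load the word at that address
    regs[ra] = ea                         # write the new address back to the base
    return ea


def sum_array(memory, base, count):
    """Sum `count` words starting at `base` without a separate index add."""
    # r4 starts one element before the array so the first update lands on it.
    regs = {"r3": 0, "r4": (base - 4) & 0xFFFFFFFF, "r5": 0}
    for _ in range(count):
        load_word_update(regs, memory, "r5", "r4", 4)
        regs["r3"] = (regs["r3"] + regs["r5"]) & 0xFFFFFFFF
    return regs["r3"], regs["r4"]


memory = {0x1000: 10, 0x1004: 20, 0x1008: 30}
total, last_address = sum_array(memory, 0x1000, 3)
```

Each loop iteration performs one load and one add for the running sum; the pointer increment is folded into the load itself.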

No delayed branches - instead, the POWER architecture uses a branch target buffer and the now well-known branch folding technique.

Branching technique - the POWER architecture has eight condition register fields that are set by compare instructions. One additional bit in the encoding of many arithmetic and logical instructions (the record bit, Rc) selects whether the instruction also records its result status in the condition register, so a later conditional branch can test it without a separate compare.

Shortfalls

The original POWER microprocessor, one of the first superscalar RISC implementations, was a high-performance, multi-chip design. IBM soon realized that it would need a single-chip microprocessor to scale its RS/6000 line from low-end to high-end machines, and work began on a single-chip POWER microprocessor called the RSC (RISC Single Chip). In early 1991 IBM realized that this design could potentially become a high-volume microprocessor used across the industry.

PowerPC Architecture

To maintain RS/6000 software compatibility, the PowerPC adapted the POWER architecture, with many changes made to provide a low-cost, single-chip, superscalar, multiprocessor-capable, 64-bit processor.

Several bit/field instructions that use three source operands were eliminated to avoid the need for extra register ports.
Complex string instructions were left out, consistent with the RISC philosophy.
Instructions whose operation depended on the value of a source operand were eliminated.
Extended-precision shifts and the POWER forms of integer multiply and divide-with-remainder (those using POWER's MQ register) were omitted.
Support was added for operation in both big-endian and little-endian modes.
Single- and double-precision floating-point arithmetic is supported.
The architecture is 64-bit, backward compatible with 32-bit.

PowerPC family

PowerPC 601:
medium-sized, medium-performance processor
includes a more sophisticated branch unit
can dispatch three instructions per cycle, out of order
up to 8 instructions per cycle can be fetched directly into an eight-entry instruction queue (IQ), where they are decoded before being dispatched to the execution core

Branch folding: The instruction queue is used for detecting and dealing with branches. The branch unit scans the bottom four entries of the queue, identifying branch instructions and determining their type (conditional or unconditional). When the branch unit has enough information to resolve the branch immediately (an unconditional branch, or a conditional branch whose condition depends on information already in the condition register), the branch instruction is simply deleted from the instruction queue and replaced with the instruction located at the branch target.

PowerPC 603:
smaller die size than the 601
smaller cache
can dispatch three instructions per cycle, out of order

The 604 and 620 microprocessors were developed later in the PowerPC product line. Both aimed for higher performance. The 604 is based on the 32-bit architecture, while the 620 implements the 64-bit architecture.

Current Status

PowerPC e200 - 32-bit Power Architecture microprocessor with speeds up to 600 MHz; ideal for embedded applications.
PowerPC e300 - similar to the e200, with speeds up to 667 MHz.
PowerPC e600 - speeds up to 2 GHz; ideal for high-performance routing and telecommunications applications.
POWER5 - IBM dual-core microprocessor.
POWER6 - IBM dual-core microprocessor. A notable difference from POWER5 is that the POWER6 executes instructions in order instead of out of order.
PowerPC G3 - used in Apple Macintosh computers such as the PowerBook G3, the multicolored iMacs, iBooks, and several desktops, including both the Beige and the Blue and White Power Macintosh G3s.
PowerPC G4 - a designation used by Apple Computer to describe a fourth generation of 32-bit PowerPC microprocessors.
PowerPC G5 - 64-bit Power Architecture processors.
Xenon - based on IBM's PowerPC ISA; used in the Xbox 360 game console.
Broadway - based on IBM's PowerPC ISA; used in the Nintendo Wii game console.
Blue Gene/L - dual-core PowerPC 440, 700 MHz, 2004.
Blue Gene/P - quad-core PowerPC 450, 850 MHz, 2007.

PowerPC ISA

A mix between SPARC (RISC) and Motorola (CISC).
Different implementation levels, so a chip for embedded solutions does not need to implement the full architecture.
Load/store architecture: operations are always performed on registers, and memory is accessed only through load and store instructions.
Offers a large number of mnemonics that increase the number of instructions available to the programmer without increasing the number of on-chip instructions.
Passes arguments using registers and the stack.
32-bit registers allow 4 gigabytes of virtual memory to be addressed.

Overall design

Integer Execution Unit
Floating Point Unit
Load/Store Unit (LSU)
Branch Execution Units
Memory Management Unit
Memory Unit
Cache

PowerPC Registers

PowerPC's application-level registers are broken into three categories: general-purpose, floating-point, and special-purpose registers.

General-purpose registers (GPRs) - r0 to r31
A flat scheme of 32 general-purpose registers.
Source and destination for all integer operations.
Address source for all load/store operations.
They also provide access to SPRs.
All GPRs are available for use with one exception: in certain instructions, GPR0 simply means the value 0, and no lookup is done for GPR0's contents.

Some of these registers have special tasks assigned to them:
r0        Volatile register which may be modified during function linkage
r1        Stack frame pointer, always valid
r2        System-reserved register
r3-r4     Volatile registers used for parameter passing and return values
r5-r10    Volatile registers used for parameter passing
r11-r12   Volatile registers which may be modified during function linkage
r13       Small data area pointer register
r14-r30   Registers used for local variables
r31       Used for local variables or "environment pointers"

Floating point registers

Floating-point registers (FPRs) - fr0 to fr31
32 floating-point registers with 64-bit precision.
Source and destination operands of all floating-point operations.
Can contain 32-bit and 64-bit signed and unsigned integer values, as well as single-precision and double-precision floating-point values.
FPRs also provide access to the FPSCR (Floating-Point Status and Control Register). The FPSCR captures status and exceptions resulting from floating-point operations, and also provides control bits for enabling specific exception types.
Instructions that load and store double-precision floating-point numbers transfer 64 bits of data without conversion.
Instructions that load single-precision floating-point numbers from memory convert them to double-precision format before storing them in the register.

f0        Volatile register
f1        Volatile register used for parameter passing and return values
f2-f8     Volatile registers used for parameter passing
f9-f13    Volatile registers
f14-f31   Registers used for local variables

Special-purpose registers (SPRs)

The Fixed-Point Exception Register (XER) - used for indicating conditions for integer operations, such as carries and overflows.

The Floating-Point Status and Control Register (FPSCR)- 32-bit register used to store the status and control of the floating-point operations.

The Count Register (CTR)- used to hold a loop count that can be decremented during the execution of branch instructions.
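The decrement-and-branch use of CTR can be sketched in a few lines. This is a behavioural model only, assuming a loop closed by a "decrement CTR, branch if nonzero" instruction (as PowerPC's `bdnz` does); the function name `run_counted_loop` is hypothetical.

```python
def run_counted_loop(iterations, body):
    """Model a loop closed by a decrement-and-branch-if-nonzero on CTR."""
    ctr = iterations                    # analogous to loading CTR before the loop
    executed = 0
    while True:
        body(executed)                  # the loop body runs first
        executed += 1
        ctr = (ctr - 1) & 0xFFFFFFFF    # the branch decrements CTR...
        if ctr == 0:                    # ...and falls through once it reaches zero
            break
    return executed


seen = []
count = run_counted_loop(4, seen.append)
```

Because the count lives in CTR, no general-purpose register or compare instruction is spent on loop control. In this model, as with a 32-bit counter that is decremented before it is tested, starting from 0 would wrap around instead of exiting immediately.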

The Condition Register (CR) - 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation: Less Than (LT), Greater Than (GT), Equal (EQ), and Summary Overflow (SO).
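A minimal sketch of how a compare result could be packed into one of the eight CR fields, assuming each field's bits run LT, GT, EQ, SO from most to least significant and that field 0 occupies the most significant four bits. The helper names `compare_signed` and `set_cr_field` are hypothetical.

```python
def compare_signed(a, b, so=0):
    """Return the 4-bit CR field (LT, GT, EQ, SO) for a signed compare of a and b."""
    lt = int(a < b)
    gt = int(a > b)
    eq = int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)


def set_cr_field(cr, field, value):
    """Place a 4-bit value into field 0..7 of a 32-bit CR (field 0 is most significant)."""
    shift = 28 - 4 * field
    mask = (0xF << shift) & 0xFFFFFFFF
    return (cr & ~mask & 0xFFFFFFFF) | ((value & 0xF) << shift)


cr = set_cr_field(0, 0, compare_signed(3, 7))    # 3 < 7 sets LT in field 0
cr = set_cr_field(cr, 7, compare_signed(5, 5))   # 5 == 5 sets EQ in field 7
```

Having eight independent fields lets several compare results stay live at once, so a branch can test an earlier comparison after later ones have been made.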

The Link Register (LR) contains the address to return to at the end of a function call.

Data Types

Data can be stored in either little-endian or big-endian byte order.
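The difference between the two byte orders is easy to see with Python's standard `struct` module; the value `0x12345678` is just an example chosen so that each byte is recognisable.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)       # most significant byte at the lowest address
little = struct.pack("<I", value)    # least significant byte at the lowest address
round_trip = struct.unpack(">I", big)[0]
```

The same four bytes appear in both encodings, only in reversed order, which is why data exchanged between machines of different endianness must be byte-swapped.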

Fixed-point data types include:
Unsigned byte, 8 bits
Unsigned halfword, 16 bits
Signed halfword, 16 bits
Unsigned word, 32 bits
Signed word, 32 bits
Unsigned doubleword, 64 bits
Byte strings, from 0 to 128 bytes in length

2's complement is used for negative values.
Floating-point data formats:
single-precision, 32 bits long (23-bit fraction + 8-bit exponent + 1 sign bit)
double-precision, 64 bits long (52-bit fraction + 11-bit exponent + 1 sign bit)
Characters are stored using 8-bit ASCII codes.
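The field widths quoted for the two floating-point formats can be checked by splitting an encoded value into its sign, exponent, and fraction. The sketch below uses Python's `struct` module and assumes the standard IEEE 754 layouts; `split_single` and `split_double` are hypothetical helper names.

```python
import struct


def split_single(x):
    """Split x into the (sign, exponent, fraction) fields of its 32-bit encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def split_double(x):
    """Split x into the (sign, exponent, fraction) fields of its 64-bit encoding."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


one_single = split_single(1.0)
neg_single = split_single(-2.5)
one_double = split_double(1.0)
```

The exponent fields are biased (127 for single precision, 1023 for double), which is why 1.0 does not have an exponent field of zero.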

Instruction types

Instruction Format

All instruction encodings are 32 bits in length. Bit numbering for PowerPC is the opposite of most other definitions: bit 0 is the most significant bit, and bit 31 is the least significant bit. Instructions are first decoded by the upper 6 bits, a field called the primary opcode. The remaining 26 bits contain fields for operand specifiers, immediate operands, and extended opcodes, and some of these may be reserved bits or fields.

Common instruction formats:

Format    Fields (bit positions)
D-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), immediate (16-31)
X-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), extended opcd (21-30)
A-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), src (21-25), extended opcd (26-30), Rc (31)
BD-form   opcd (0-5), BO (6-10), BI (11-15), BD (16-29), AA (30), LK (31)
I-form    opcd (0-5), LI (6-29), AA (30), LK (31)
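Because bit 0 is the most significant bit, extracting a field needs a small index conversion. The sketch below decodes a D-form word under that numbering; `bits` and `decode_d_form` are hypothetical helpers, and the immediate is treated as signed, which matches `addi` but not every D-form instruction.

```python
def bits(word, first, last):
    """Extract bits first..last of a 32-bit word, where bit 0 is the most significant."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_d_form(word):
    """Split a D-form word into primary opcode, two register fields, and an immediate."""
    imm = bits(word, 16, 31)
    if imm & 0x8000:                 # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcd": bits(word, 0, 5),
        "rt": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "imm": imm,
    }


fields = decode_d_form(0x3863FFFF)   # encoding of: addi r3, r3, -1
```

The primary opcode comes out as 14 (the `addi` opcode), both register fields as 3, and the immediate as -1.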

Instruction format

D-form - provides up to two registers as source operands, one immediate source, and up to two registers as target operands. Some variations of this instruction format use portions of the target and source register operand specifiers as immediate fields or as extended opcodes.

X-form- provides up to two registers as source operands and up to two target operands. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.

A-form- provides up to three registers as source operands, and one target operand. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.

BD-form - used by the conditional branch instruction. The BO field specifies the type of condition; the BI field specifies which CR bit is used as the condition; the BD field is used as the branch displacement. The AA bit specifies whether the branch is absolute or relative. The LK bit specifies whether the address of the next sequential instruction is saved in the Link Register as a return address for a subroutine call.

I-form- used by the unconditional branch instruction. Being unconditional, the BO and BI fields of the BD format are exchanged for additional branch displacement to form the LI instruction field. This instruction format also supports the AA and LK bits in the same fashion as the BD format.
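The interaction of the LI, AA, and LK fields can be sketched as follows. This is a simplified 32-bit model, assuming the 24-bit LI field has two zero bits appended and is then sign-extended; `sign_extend` and `branch_i_form` are hypothetical names.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def branch_i_form(word, cia):
    """Return (target, link) for an I-form branch located at address cia."""
    li = (word >> 2) & 0xFFFFFF                # bits 6-29: 24-bit LI field
    aa = (word >> 1) & 1                       # bit 30: absolute-address flag
    lk = word & 1                              # bit 31: link flag
    displacement = sign_extend(li << 2, 26)    # LI with 0b00 appended, sign-extended
    target = displacement if aa else cia + displacement
    link = cia + 4 if lk else None             # value saved in LR when LK = 1
    return target & 0xFFFFFFFF, link


call = branch_i_form(0x48000009, 0x1000)       # relative branch of +8 with LK set
back = branch_i_form(0x4BFFFFFC, 0x1000)       # relative branch of -4, no link
```

With LK set, the address of the following instruction (0x1004 here) is what a subroutine would later return to through the Link Register.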

Simplified PowerPC instruction set: http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/


Instruction formats (figures: A-Form, BD-Form, D-Form)

PowerPC Addressing Modes

Load/store architecture
Indirect: the instruction includes a 16-bit displacement to be added to a base register (which may be a GP register); the base register content can be replaced with the new address.
Indirect indexed: the instruction references a base register and an index register (both may be GP registers); the EA is the sum of their contents.

Branch address (target address calculation)
Absolute: TA = actual address
Relative: TA = current instruction address + displacement (sign-extended; a 24-bit LI field for unconditional branches or a 14-bit BD field for conditional branches, each with two zero bits appended)
Indirect:
Link Register: TA = (LR)
Count Register: TA = (CTR)

Arithmetic
Operands are in registers or part of the instruction.
Floating point is register only.
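The two load/store modes reduce to short address calculations. The sketch below is a 32-bit model that also applies the rule that register number 0 in the base position means the value 0; the function names and the list used as a register file are hypothetical.

```python
def ea_indirect(regs, ra, disp16):
    """Indirect: EA = (RA or 0) + sign-extended 16-bit displacement."""
    base = 0 if ra == 0 else regs[ra]    # register number 0 means the value 0 here
    disp = disp16 - 0x10000 if disp16 & 0x8000 else disp16
    return (base + disp) & 0xFFFFFFFF


def ea_indexed(regs, ra, rb):
    """Indirect indexed: EA = (RA or 0) + (RB)."""
    base = 0 if ra == 0 else regs[ra]
    return (base + regs[rb]) & 0xFFFFFFFF


regs = [0] * 32
regs[0] = 0xDEAD0000    # ignored when r0 is named as the base register
regs[4] = 0x2000
regs[5] = 0x10
```

For example, `ea_indirect(regs, 4, 0xFFFC)` yields 0x1FFC (a displacement of -4), while `ea_indirect(regs, 0, 8)` yields 8 regardless of what r0 holds.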

PowerPC function call conventions

Results from a function call are returned in GPR3, FPR1, or by passing a pointer to a structure as the implicit leftmost parameter.
Any parameters that do not fit into the designated registers are passed on the stack. In addition, enough space is allocated on the stack to hold all parameters, whether they are passed in registers or not.
The PowerPC run-time environment uses a grow-down stack that allocates space for a function's parameters, linkage information, and local variables.
The environment uses a single stack pointer without any frame pointer. To achieve this simplification, the PowerPC stack has a much more rigidly defined structure.
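As a deliberately simplified sketch of the register-then-stack rule, the function below assigns integer arguments to r3 through r10 in order and spills the rest. It ignores floating-point and structure arguments, and the name `assign_integer_args` is hypothetical.

```python
def assign_integer_args(args):
    """Place integer arguments in r3..r10 in order; any overflow goes on the stack."""
    arg_regs = [f"r{n}" for n in range(3, 11)]    # the eight parameter registers
    in_regs = dict(zip(arg_regs, args))           # zip stops at the shorter sequence
    on_stack = list(args[len(arg_regs):])         # whatever did not fit in registers
    return in_regs, on_stack


in_regs, on_stack = assign_integer_args(list(range(10)))
```

With ten arguments, the first eight land in r3 to r10 and the last two are passed on the stack.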

PowerPC G4e Pipelining

Seven-stage pipeline.

Superscalar microprocessor: allows multiple instructions to be executed in parallel.

Nine execution units:
BPU: Branch Processing Unit
VPU: Vector Permute Unit
VIU: Vector Integer Unit
VCIU: Vector Complex Integer Unit
VFPU: Vector Floating Point Unit
FPU: Floating Point Unit
IU: Integer Unit
CIU: Complex Integer Unit
LSU: Load/Store Unit

PowerPC G4e Pipeline Stages

Stages 1 and 2 - Instruction Fetch:

These two stages are both dedicated primarily to grabbing an instruction from the L1 cache.

The G4e can fetch four instructions per clock cycle from the L1 cache and send them on to the next stage.

Stage 3 - Decode/Dispatch:

Once an instruction has been fetched, it goes into a 12-entry instruction queue to be decoded. The G4e's decoder can dispatch up to three instructions per clock cycle to the next stage.

Stage 4 - Issue:

The G4e has three issue queues. The first is the Floating-Point Issue Queue (FIQ), which holds floating-point (FP) instructions that are waiting to be executed.

The second is the Vector Issue Queue (VIQ), which holds vector operations.

The third queue is the General Instruction Queue (GIQ), which holds everything else.

Once the instruction leaves its issue queue, it goes to the execution engine to be executed.

Stage 5 - Execute:

The instructions can pass out-of-order from their issue queues into their respective functional units and be executed.

Stage 6 and 7 - Complete and Write-Back :

In these two stages, the instructions are put back into the order in which they came into the processor, and their results are written back to the register file.
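The idea of executing out of order but completing in order can be shown with a toy model. It assumes every instruction starts executing at cycle 0 and differs only in latency, which is far simpler than the real G4e; the function name `simulate` is hypothetical.

```python
def simulate(latencies):
    """latencies[i] is the execution latency of instruction i, in program order.

    Returns (finish_order, retire_order).
    """
    # Instructions finish in order of latency; ties are broken by program order.
    finish_order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))
    retire_order = []
    done = set()
    next_to_retire = 0
    for i in finish_order:
        done.add(i)
        # Only the oldest unretired instruction may complete, so younger
        # instructions that finished early wait here.
        while next_to_retire in done:
            retire_order.append(next_to_retire)
            next_to_retire += 1
    return finish_order, retire_order


finish_order, retire_order = simulate([3, 1, 2])
```

Instruction 0 is the slowest, so instructions 1 and 2 finish first but are held until instruction 0 is done; all three then complete in program order.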

Design principles

Simplicity favors regularity
Standard 32-bit instruction format for all instructions: fixed-length instructions, register-to-register architecture, three-operand instruction format.

Smaller is faster
Three categories of registers, each handling specific instructions, so access time is presumably faster.

Make the common case fast
Integer and floating-point instructions.

Good design demands good compromises
To align with RISC principles, many instructions that required three source operands were eliminated.
Many complex instructions were curtailed to conform to RISC principles, but this is compensated by a large number of mnemonics that increase the number of instructions.

Pros and Cons

Instruction set
200 machine instructions.
More complex than most RISC machines, e.g. floating-point multiply-and-add instructions that take three input operands, and load and store instructions that may automatically update the index register to contain the just-computed target address.

Pipelined execution
More sophisticated than SPARC.

Input and output
Two different modes:
Direct-store segment: maps the virtual address space to an external address space.
Normal virtual memory access.

Permits a range of implementations, from low-cost controllers through high-performance processors.

References

http://www.ibm.com/developerworks/linux/library/l-powarch/
http://www.cresco.enea.it/LA1/cresco_sp14_ylichron/CBE-docs/PowerPC_Vers202_Book1_public.pdf
http://en.wikipedia.org/wiki/PowerPC
http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set
http://www.eecs.umich.edu/~stever/373/lecnotes2.pdf
http://www.devx.com/ibm/Article/20943