4-inst set architecture

Upload: bilal-hassan

Post on 05-Apr-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 4-Inst Set Architecture

    1/43

  • 8/2/2019 4-Inst Set Architecture

    2/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 2

    Instruction Set Architecture

    Instruction set architecture is based on thestructure of a computer i.e. the description ofthe CPU in terms of Registers, Addressabilityand various Arithmetic / Control and Storeoperations etc.

    Assembly / Machine language programmermust understand ISA of target processor toprogram for it.

  • 8/2/2019 4-Inst Set Architecture

    3/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 3

    Instruction Set Architecture

    instruction set

    High level language code : C, C++, Java,Fortan,

    hardware

    Assembly language code: architecture specific statements

    Machine language code: architecture specific bit patterns

    software

    compiler

    assembler

    The programs written in any higher level languageeventually get converted to assembly level containinginstructions in mnemonics of instruction set.

    The Assembler converts these into machine languagebefore execution

  • 8/2/2019 4-Inst Set Architecture

    4/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 4

    ISA Metrics

    Orthogonally All operand modes are available with any data type or

    instruction type.

    Completeness

    Support for a wide range of operations and target

    applications

    Regularity

    No overloading for the meanings of instruction fields

    Streamlined

    Resource needs easily determined

    Ease of assembly language programming

    Ease of implementation

  • 8/2/2019 4-Inst Set Architecture

    5/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 5

    Instruction Set Design Issues

    Instruction set design issues include: Where are operands stored?

    registers, memory, stack, accumulator

    How many explicit operands are there?

    0, 1, 2, or 3 How is the operand location specified?

    register, immediate, indirect, . . .

    What type & size of operands are supported?

    byte, int, float, double, string, vector. . . What operations are supported?

    add, sub, mul, move, compare . . .

  • 8/2/2019 4-Inst Set Architecture

    6/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 6

    Evolution of Instruction Sets

    Single Accumulator (EDSAC 1950)

    Accumulator + Index Registers

    (Manchester Mark I, IBM 700 series 1953)

    Separation of Programming Modelfrom Implementation

    High-level Language Based Concept of a Family

    (B5000 1963) (IBM 360 1964)

    General Purpose Register Machines

    Complex Instruction Sets Load/Store Architecture

    RISC

    (Vax, Intel 8086 1977-80) (CDC 6600, Cray 1 1963-76)

    (Mips,Sparc,88000,IBM RS6000, . . .1987+)

  • 8/2/2019 4-Inst Set Architecture

    7/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 7

    Classifying ISAs

    Accumulator (before 1960):1 address add A acc acc + mem[A]

    Stack (1960s to 1970s):0 address add tos tos + next

    Memory-Memory (1970s to 1980s):2 address add A, B mem[A] mem[A] + mem[B]3 address add A, B, C mem[A] mem[B] + mem[C]

    Register-Memory (1970s to present):2 address add R1, A R1 R1 + mem[A]

    load R1, A R1 mem[A]

    Register-Register (Load/Store) (1960s to present):3 address add R1, R2, R3 R1 R2 + R3

    load R1, R2 R1 mem[R2]store R1, R2 mem[R1] R2

  • 8/2/2019 4-Inst Set Architecture

    8/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 8

    Types of Addressing Modes (VAX)

    Addressing Mode Example Action

    1. Register direct Add R4, R3 R4

  • 8/2/2019 4-Inst Set Architecture

    9/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 9

    Types of Addressing Modes Intel Instruction Set

    Register Instructions involving data manipulation through registers

    Immediate Involves immediate values contained within the instructionDirect Transfer data to/from memory location to memory/ registerRegister Indirect Transfer a data byte/word to a location whose address is

    specified in a register e.g. [Bx]Use of Byte PTR, Word PTR, DWord PTR specifies boundary

    of data.Base + IndexIndirect

    MOV AX, [BX+SI]Relative MOV AX, (BX+4)Base relative

    plus index MOV AX, (BX+SI+4)Scaled Index MOV AX,[AX+4*BX]

  • 8/2/2019 4-Inst Set Architecture

    10/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 10

    Instruction Encoding

    Variable Size

    Instruction length varies based on opcode and addressspecifiers

    For example, VAX instructions vary between 1 and 53 bytes,while x86 instruction vary between 1 and 17 bytes.

    Good source code density, but difficult to decode and pipeline

    Fixed Size

    Only a single size for all instructions

    For example, DLX, MIPS, Power PC, Sparc all have 32 bitinstructions

    Not as good code density, but easier to decode and pipeline

    Hybrid Size

    Have multiple format lengths specified by the opcode

    For example, IBM 360/370

    Compromise between code density and ease in decoding

  • 8/2/2019 4-Inst Set Architecture

    11/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 11

    DLX Architecture

    Introduced by Hennessey and Patterson in 1990 Derived from many different instruction set architectures

    from MIPS, Sun, IBM, Intel, HP, AMD, etc.

    DLX is a typical RISC architecture.

    32-bit fixed length instructions 3 instruction formats

    Load/store architecture

    Simple branch conditions (no condition codes)

    DLX registers 32 32-bit general-purpose registers (R0 = 0)

    32 32-bit (or 16 64-bit) floating point registers

    Special purpose registers (e.g., FP Status and PC)

  • 8/2/2019 4-Inst Set Architecture

    12/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 12

    DLX Design Decisions

    DLX is based on the following design decisions Use general purpose registers with a load-store architecture

    Support commonly used addressing modes

    displacement, immediate, and register deferred

    Support simple instructions that occur frequently load, store, add, subtract, move, and, shift, compare

    equal, branch, jump, call, and return

    Support commonly required data sizes

    8 (byte), 16 (half word), and 32-bit (word) integers

    32 (float) and 64-bit (double) floating point Use fixed length instructions that are easy to decode

    Provide plenty of general purpose registers and separatefloating point registers

  • 8/2/2019 4-Inst Set Architecture

    13/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 13

    DLX Instruction Formats

    Op

    31 26 01516202125

    rs1 rd immediate

    Op

    31 26 025

    Op

    31 26 01516202125

    rs1 rs2

    offset added to PC

    rd

    (a) Register-Register (R-type) ADD R1, R2, R3

    561011

    (b) Register-Immediate (I-type) SUB R1, R2, #3

    (c) Jump / Call (J-type) JUMP end

    function

    (ALU immediate operations, loads and stores, conditional branch, jump )

    (jump, jump and link, trap and return from exception)

    (ALL reg. operations, read/write special registers and moves)

  • 8/2/2019 4-Inst Set Architecture

    14/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 14

    Intel 80x86 Integer Registers

  • 8/2/2019 4-Inst Set Architecture

    15/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 15

    X86 Operand Types

    x86 instructions typically have two operands,where one operand is both a source and adestination operand.

    Possible combinations include

    Source/destination type Second source typeRegister Register

    Register Immediate

    Register Memory

    Memory RegisterMemory Immediate

    No memory-memory or immediate-immediate

    Immediate can be 8, 16, or 32 bits

  • 8/2/2019 4-Inst Set Architecture

    16/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 16

    Intel 80x86 Floating Point Registers

  • 8/2/2019 4-Inst Set Architecture

    17/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 17

    80x86 Instructions

    Data movement

    (move, push, pop) Arithmetic and logic

    (logic ops, tests CCs, shifts, integer and decimal arithmetic)

    Control flow(branches, jumps, calls, returns)

    String instructions(move and compare)

    FP data movement(load, load const., store)

    Arithmetic instructions

    (add, subtract, multiply, divide, square root, absolute value) Comparisons

    (Result to Flag)

    Transcendental functions(sin, cos, log, etc.)

  • 8/2/2019 4-Inst Set Architecture

    18/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 18

    80x86 Instruction Format

    Instructions sizes vary from 1 to 17 bytes

  • 8/2/2019 4-Inst Set Architecture

    19/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 19

    Instruction Set 8088 / 8086 CPU

    FORMATS

    1. One Byte The instructions have implied data or registeroperands.The least significant three bits specify register if any

    2. Register to Register

    Two Byte instruction where first byte contains Opcode followed by widthand second operand has 2nd register and R/ M fields. Mod field is 11

    3. Register to / from Memory without displacement

    NOTE: W fields D1 gives Dir i.e. 0 Byte2 Reg is Source, 1 Byte 2 Reg is DestinationW fields D0 bit specifies whether it is a eight bit data of 16 bit data

    R/M field specifies one of 8 registers. The MOD field is 11 for Register, 00 for memorywithout displacement, 01 for memory with 8 bit displacement and 10 for 16 bit displacement

  • 8/2/2019 4-Inst Set Architecture

    20/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 20

  • 8/2/2019 4-Inst Set Architecture

    21/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 21

    4. Register to / from Memory with DisplacementOne or Two additional bytes specify displacement

    5. Immediate operand to RegisterIn this instruction the 7bits of first byte and bits 3-4 of secondbyte specify the op code. The last two bytes specify the data

  • 8/2/2019 4-Inst Set Architecture

    22/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 22

    6. Immediate Operand to Memory with 16 bit Displacement

    First two bytes specify the Opcode MOD and R/M as before followed bytwo bytes of displacement and two bytes of data

    Significance of OPCODE fields

  • 8/2/2019 4-Inst Set Architecture

    23/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 23

  • 8/2/2019 4-Inst Set Architecture

    24/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 24

  • 8/2/2019 4-Inst Set Architecture

    25/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 25

  • 8/2/2019 4-Inst Set Architecture

    26/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 26

  • 8/2/2019 4-Inst Set Architecture

    27/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 27

  • 8/2/2019 4-Inst Set Architecture

    28/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 28

  • 8/2/2019 4-Inst Set Architecture

    29/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 29

  • 8/2/2019 4-Inst Set Architecture

    30/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 30

    Graphics and Multimedia Instruction Set Extensions

    Several companies have extended their computers

    instruction sets to support graphics and multimediaapplications. Intels MMX Technology Intels Internet Streaming SIMD Extensions AMDs 3DNow! Technology

    Suns Visual Instruction Set Motorolas and IBMs AltiVec Technology

    These extensions improve the performance of Computer-aided design

    Internet applications Computer visualization Video games Speech recognition

  • 8/2/2019 4-Inst Set Architecture

    31/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 31

    MMX Instructions

    MMX Technology adds 57 new instructions tothe x86 architecture (Reference article on PII MMX)

    Some of these instructions include PADD(b, w, d) Packed addition PSUB(b, w, d) Packed subtraction PCMPE(b, w, d) Packed compare equal PMULLw Packed word multiply low PMULHw Packed word multiply high PMADDwd Packed word multiply-add PSRL(w, d, q) Pack shift right logical PACKSS(wb, dw) Pack data PUNPCK(bw, wd, dq) Unpack data PAND, POR, PXOR Packed logical operations

  • 8/2/2019 4-Inst Set Architecture

    32/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 32

    MMX Data Types

    MMX Technology supports operations on thefollowing 64-bit integer data types.

    Packed byte (eight 8-bit elements)

    Packed word (four 16-bit elements)

    Packed double word (two 32-bit elements)

    Packed quad word (one 64-bit elements)

  • 8/2/2019 4-Inst Set Architecture

    33/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 33

    SIMD Operations

    A3 A2 A1 A0

    B3 B2 B1 B0

    A3+B3 A2+B2 A1+B1 A0+B0

    MMX Technology allows a Single Instruction to work on Multiplepieces of Data (SIMD).

    Example: PADD[W]: Packed add word

    4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.

  • 8/2/2019 4-Inst Set Architecture

    34/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 34

    Saturating Arithmetic

    Both wrap-around and saturating ADD instructions are supported.

    With saturating arithmetic, results that overflow are set to thelargest value.

    Below are examples for both types

    PADD[W]: Packed wrap-around add PADDUS[W]: Packed saturating add

  • 8/2/2019 4-Inst Set Architecture

    35/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 35

    Pack and Unpack Instructions

    Pack and unpack instructions provideconversion between standard data types andpacked data types

    PACKSS[DW]: Packed Signed with Saturating Double to Packed Word

  • 8/2/2019 4-Inst Set Architecture

    36/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 36

    Multiply-Add Operations

    Many graphics applications require multiply-accumulate operations

    Vector Dot Products

    Matrix Multiplication

    Fast Fourier Transforms (FFTs)

    Filter implementations

    PMADDWD: Packed multiply-add word to double

  • 8/2/2019 4-Inst Set Architecture

    37/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 37

    Vector Dot Product of two 8 Byte vectors

    A dot product on two 8-element vector can be performed

    using 9 MMX instructions

    a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0

    a0*c0+..+ a7*c7

    With MMX 9 Instructions2 loads for one of the vectors

    Other vector is loaded by PMADD2 PMADDs,2 PADDs,2 shifts (if reqd. to fix precision)1 Store

  • 8/2/2019 4-Inst Set Architecture

    38/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 38

    Vector Dot Product of two 8 Byte vectors

    Without MMX 40 instructions

    16 Load8 Multiply8 Shift7 Add1 Store

    a0*c0+..+ a3*c3 a4*c4+..+ a7*c70 0

    a0*c0+..+ a7*c7

  • 8/2/2019 4-Inst Set Architecture

    39/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 39

    MMX Technology Summary

    MMX technology extends the Intel x86 architecture toimprove the performance of multimedia and graphicsapplications.

    Most MMX instructions can be executed in one clock cycle,so the performance improvement will be more dramatic than

    the simple ratio of instruction counts. It provides a speedup of 1.5 to 2.0 for certain applications.

    Only increase the chip area by about 5%.

  • 8/2/2019 4-Inst Set Architecture

    40/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 40

    MMX Technology Summary

    MMX instructions are hand-coded in assembly orimplemented as libraries to achieve high

    performance.

    MMX data types use the x86 floating point registers

    Makes it easy to handle context switches

    Makes it hard to perform MMX and floating pointinstructions at the same time

  • 8/2/2019 4-Inst Set Architecture

    41/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 41

    Internet Streaming SIMD Extensions

    ISSE introduced eight 128-bit data registers(called XMM registers)

    In 64-bit modes, they are available as 16 X 64-bit registers

    The 128-bit packed single-precision floating-point data type,allows four single-precision operations to be performed

    simultaneously

  • 8/2/2019 4-Inst Set Architecture

    42/43

    4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed42

    ISSE Data Type

    ISSE extensions introduced one new data type

    128-Bit Packed Single-Precision Floating-Point Data Type

    SSE 2 introduced five data types

  • 8/2/2019 4-Inst Set Architecture

    43/43

    43

    Internet Streaming SIMD Extensions

    Intels Internet Streaming SIMD Extensions (ISSE) Help improve the performance of video and 3D applications

    70 new instructions beyond MMX Technology

    Adds new 128-bit registers

    Provide the ability to perform parallel floating point operations

    Four parallel operations on 32-bit numbers

    Reciprocal and reciprocal root instructions - normalization

    Packed average instruction Motion compensation

    Provide data pre-fetch instructions

    Make certain applications 1.5 to 2.0 times faster.