architecture and assembly basics - 國立臺灣大學 · 2007-11-05 · architecture and assembly...

42
Architecture and Assembly Basics Computer Organization and Assembly Languages Yung-Yu Chuang 2007/10/29 with slides by Peng-Sheng Chen, Kip Irvine, Robert Sedgwick and Kevin Wayne

Upload: others

Post on 09-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Architecture and Assembly Basics

Computer Organization and Assembly Languages Yung-Yu Chuang 2007/10/29

with slides by Peng-Sheng Chen, Kip Irvine, Robert Sedgwick and Kevin Wayne

Page 2: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Announcements

• Midterm exam? Date? 11/12 (I prefer this one) or 11/19?

• 4 assignments plus midterm v.s. 5 assignments• Open-book

Page 3: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Basic architecture

Page 4: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Basic microcomputer design

• clock synchronizes CPU operations• control unit (CU) coordinates sequence of

execution steps• ALU performs arithmetic and logic operations

Central Processor Unit(CPU)

Memory StorageUnit

registers

ALU clock

I/ODevice

#1

I/ODevice

#2

data bus

control bus

address bus

CU

Page 5: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Basic microcomputer design

• The memory storage unit holds instructions and data for a running program

• A bus is a group of wires that transfer data from one part to another (data, address, control)

Central Processor Unit(CPU)

Memory StorageUnit

registers

ALU clock

I/ODevice

#1

I/ODevice

#2

data bus

control bus

address bus

CU

Page 6: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Clock

• synchronizes all CPU and BUS operations• machine (clock) cycle measures time of a single

operation• clock is used to trigger events

one cycle

1

0

• Basic unit of time, 1GHz→clock cycle=1ns• A instruction could take multiple cycles to

complete, e.g. multiply in 8088 takes 50 cycles

Page 7: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Instruction execution cycle

• Fetch• Decode• Fetch

operands• Execute• Store output

I-1 I-2 I-3 I-4

PC program

I-1instructionregister

op1op2

memory fetch

ALU

registers

writ

e

decode

execute

read

writ

e

(output)

registers

flags

program counterinstruction queue

Page 8: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Advanced architecture

Page 9: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Multi-stage pipeline

• Pipelining makes it possible for processor to execute instructions in parallel

• Instruction execution divided into discrete stages

S1 S2 S3 S4 S51

Cyc

les

StagesS6

23456789

101112

I-1

I-2

I-1

I-2

I-1

I-2

I-1

I-2

I-1

I-2

I-1

I-2

Example of a non-pipelined processor. For example, 80386. Many wasted cycles.

Page 10: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Pipelined execution

• More efficient use of cycles, greater throughput of instructions: (80486 started to use pipelining)

S1 S2 S3 S4 S51

Cyc

les

StagesS6

234567

I-1I-2 I-1

I-2 I-1I-2 I-1

I-2 I-1I-2 I-1

I-2

For k stages and n instructions, the number of required cycles is:

k + (n – 1)

compared to k*n

Page 11: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Wasted cycles (pipelined)

• When one of the stages requires two or more clock cycles, clock cycles are again wasted.

S1 S2 S3 S4 S51

Cyc

les

Stages

S6

234567

I-1I-2I-3

I-1I-2I-3

I-1I-2I-3

I-1

I-2 I-1I-1

89

I-3 I-2I-2

exe

1011

I-3I-3

I-1

I-2

I-3

For k stages and ninstructions, the number of required cycles is:

k + (2n – 1)

Page 12: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Superscalar

A superscalar processor has multiple execution pipelines. In the following, note that Stage S4 has left and right pipelines (u and v).

S1 S2 S3 u S51

Cyc

les

Stages

S6

234567

I-1I-2I-3I-4

I-1I-2I-3I-4

I-1I-2I-3I-4

I-1

I-3 I-1I-2 I-1

v

I-2

I-4

S4

89

I-3I-4

I-2I-3

10 I-4

I-2

I-4

I-1

I-3

For k states and ninstructions, the number of required cycles is:

k + n

Pentium: 2 pipelinesPentium Pro: 3

Page 13: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

More stages, better performance?

• Pentium 3: 10• Pentium 4: 20~31• Next-generation micro-architecture: 14• ARM7: 3

Page 14: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Pipeline hazardsadd r1, r2, #10 ; write r1sub r3, r1, #20 ; read r1

fetch decode reg ALU memory wb

fetch decode reg ALUstall

Page 15: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Pipelined branch behavior

fetch decode reg ALU memory wb

fetch decode reg ALU memory wb

fetch decode reg ALU memory wb

fetch decode reg ALU memory

fetch decode reg ALU

Page 16: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Reading from memory

• Multiple machine cycles are required when reading from memory, because it responds much more slowly than the CPU (e.g.33 MHz). The wasted clock cycles are called wait states.

Processor ChipProcessor Chip

L1 Data1 cycle latency

16 KB4-way assoc

Write-through32B lines

L1 Instruction16 KB, 4-way

32B lines

Regs.L2 Unified

128KB--2 MB4-way assocWrite-back

Write allocate32B lines

L2 Unified128KB--2 MB4-way assocWrite-back

Write allocate32B lines

MainMemory

Up to 4GB

MainMemory

Up to 4GB

Pentium III cache hierarchy

Page 17: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Cache memory

• High-speed expensive static RAM both inside and outside the CPU.– Level-1 cache: inside the CPU– Level-2 cache: outside the CPU

• Cache hit: when data to be read is already in cache memory

• Cache miss: when data to be read is not in cache memory. When? compulsory, capacity and conflict.

• Cache design: cache size, n-way, block size, replacement policy

Page 18: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Memory system in practice

Larger, slower, and cheaper (per byte)storage devices

registers

on-chip L1cache (SRAM)

main memory(DRAM)

local secondary storage (virtual memory)(local disks)

remote secondary storage(tapes, distributed file systems, Web servers)

off-chip L2cache (SRAM)

L0:

L1:

L2:

L3:

L4:

L5:

Smaller, faster, andmore expensive (per byte) storage devices

Page 19: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Multitasking

• OS can run multiple programs at the same time.• Multiple threads of execution within the same

program.• Scheduler utility assigns a given amount of CPU

time to each running program.• Rapid switching of tasks

– gives illusion that all programs are running at once– the processor must support task switching– scheduling policy, round-robin, priority

Page 20: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Trade-offs of instruction sets

• Before 1980, the trend is to increase instruction complexity (one-to-one mapping if possible) to bridge the gap. Reduce fetch from memory. Selling point: number of instructions, addressing modes. (CISC)

• 1980, RISC. Simplify and regularize instructions to introduce advanced architecture for better performance, pipeline, cache, superscalar.

high-level language machine codesemantic gap

compiler

C, C++Lisp, Prolog, Haskell…

Page 21: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

RISC

• 1980, Patternson and Ditzel (Berkeley),RISC• Features

– Fixed-length instructions– Load-store architecture– Register file

• Organization– Hard-wired logic– Single-cycle instruction– Pipeline

• Pros: small die size, short development time, high performance

• Cons: low code density, not x86 compatible

Page 22: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

CISC and RISC

• CISC – complex instruction set– large instruction set– high-level operations (simpler for compiler?)– requires microcode interpreter (could take a long

time)– examples: Intel 80x86 family

• RISC – reduced instruction set– small instruction set– simple, atomic instructions– directly executed by hardware very quickly– easier to incorporate advanced architecture design– examples: ARM (Advanced RISC Machines) and DEC

Alpha (now Compaq)

Page 23: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Instruction set design

• Number of addresses• Addressing modes• Instruction types

Page 24: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Instruction types

• Arithmetic and logic• Data movement• I/O (memory-mapped, isolated I/O) • Flow control

– Branches (unconditional, conditional)• set-then-jump (cmp AX, BX; je target)• Test-and-jump (beq r1, r2, target)

– Procedure calls (register-based, stack-based)• Pentium: ret; MIPS: jr• Register: faster but limited number of parameters• Stack: slower but more general

Page 25: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

T ← (T-1) OP TOP0

AC ← AC OP AOP A1

A ← A OP BOP A, B2

A ← B OP COP A, B, C3

operationinstructionNumber of addresses

A, B, C: memory or register locationsAC: accumulatorT: top of stackT-1: second element of stack

Page 26: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

SUB Y, A, B ; Y = A - B

MUL T, D, E ; T = D × EADD T, T, C ; T = T + CDIV Y, Y, T ; Y = Y / T

)( EDCBAY×+

−=3-address instructions

opcode A B C

Page 27: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

MOV Y, A ; Y = ASUB Y, B ; Y = Y - BMOV T, D ; T = D

MUL T, E ; T = T × EADD T, C ; T = T + CDIV Y, T ; Y = Y / T

)( EDCBAY×+

−=2-address instructions

opcode A B

Page 28: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

LD D ; AC = D

MUL E ; AC = AC × EADD C ; AC = AC + CST Y ; Y = ACLD A ; AC = ASUB B ; AC = AC – BDIV Y ; AC = AC / YST Y ; Y = AC

)( EDCBAY×+

−=1-address instructions

opcode A

Page 29: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

PUSH A ; APUSH B ; A, BSUB ; A-BPUSH C ; A-B, CPUSH D ; A-B, C, DPUSH E ; A-B, C, D, EMUL ; A-B, C, D× EADD ; A-B, C+(D× E)DIV ; (A-B) / (C+(D× E))POP Y

)( EDCBAY×+

−=0-address instructions

opcode

Page 30: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Number of addresses

• A basic design decision; could be mixed• Fewer addresses per instruction result in

– a less complex processor– shorter instructions– longer and more complex programs– longer execution time

• The decision has impacts on register usage policy as well– 3-address usually means more general-

purpose registers– 1-address usually means less

Page 31: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Addressing modes

• How to specify location of operands? Trade-off for address range, address flexibility, number of memory references, calculation of addresses

• Often, a mode field is used to specify the mode• Common addressing modes

– Implied – Immediate (st R1, 1)– Direct (st R1, A)– Indirect– Register (add R1, R2, R3)– Register indirect (sti R1, R2)– Displacement– Stack

Page 32: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Implied addressing

• No address field; operand is implied by the instruction CLC ; clear carry

• A fixed and unvarying address

instructionopcode

Page 33: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Immediate addressing

• Address field contains the operand value ADD 5; AC=AC+5

• Pros: no extra memory reference; faster

• Cons: limited range

instructionoperandopcode

Page 34: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Direct addressing

• Address field contains the effective address of the operandADD A; AC=AC+[A]

• single memory reference

• Pros: no additional address calculation

• Cons: limited address space

address A

operand

opcodeinstruction

Memory

Page 35: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Indirect addressing

• Address field contains the address of a pointer to the operandADD [A]; AC=AC+[[A]]

• multiple memory references

• Pros: large address space

• Cons: slower

address Aopcodeinstruction

operand

Memory

Page 36: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Register addressing

• Address field contains the address of a registerADD R; AC=AC+R

• Pros: only need a small address field; shorter instruction and faster fetch; no memory reference

• Cons: limited address space

Ropcodeinstruction

operand

Registers

Page 37: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Register indirect addressing

• Address field contains the address of the register containing a pointer to the operandADD [R]; AC=AC+[R]

• Pros: large address space

• Cons: extra memory reference

Ropcodeinstruction

Registersoperand

Memory

Page 38: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Displacement addressing

• Address field could contain a register address and an addressMOV EAX, [A+ESI*4]

• EA=A+[R×S] or vice versa

• Several variants– Base-offset: [EBP+8]– Base-index: [EBX+ESI]– Scaled: [T+ESI*4]

• Pros: flexible• Cons: complex

Ropcodeinstruction

Registersoperand

Memory

A

+

Page 39: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Displacement addressing

MOV EAX, [A+ESI*4]

• Often, register, called indexing register, is used for displacement.

• Usually, a mechanism is provided to efficiently increase the indexing register.

opcodeinstruction

Registersoperand

Memory

A

+

R

Page 40: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Effective address calculation

8

base index s displacement3 3 2 8 or 32A dummy format for one operand

registerfile

addershifter adder memory

Page 41: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Stack addressing

• Operand is on top of the stackADD [R]; AC=AC+[R]

• Pros: large address space

• Pros: short and fast fetch

• Cons: limited by FILO order

opcodeinstruction

Stack

implicit

Page 42: Architecture and Assembly Basics - 國立臺灣大學 · 2007-11-05 · Architecture and Assembly Basics Computer Organization and Assembly Languages ... high performance • Cons:

Addressing modes

Limited applicabilityNo memory refEA=stack topstack

ComplexityFlexibilityEA=A+[R]Displacement

Extra memory refLarge address spaceEA=[R]Register indirect

Limited address spaceNo memory refEA=RRegister

Multiple memory refLarge address spaceEA=[A]Indirect

Limited address spaceSimpleEA=ADirect

Limited operandNo memory refOperand=AImmediate

Limited instructionsFast fetchImplied

ConsProsMeaningMode