computer architecture lecture 3 basic fundamentals and instruction sets

57
Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

Upload: noah-cross

Post on 31-Dec-2015

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

Computer Architecture

Lecture 3

Basic Fundamentals

and

Instruction Sets

Page 2: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

2

The Task of a Computer Designer

1.1 Introduction 1.2 The Task of a Computer

Designer 1.3 Technology and Computer

Usage Trends 1.4 Cost and Trends in Cost 1.5 Measuring and Reporting

Performance 1.6 Quantitative Principles of

Computer Design 1.7 Putting It All Together: The

Concept of Memory Hierarchy

Evaluate ExistingEvaluate ExistingSystems for Systems for BottlenecksBottlenecks

Simulate NewSimulate NewDesigns andDesigns and

OrganizationsOrganizations

Implement NextImplement NextGeneration SystemGeneration System

TechnologyTrends

Benchmarks

Workloads

ImplementationComplexity

Page 3: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

3

Technology and Computer Usage Trends

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy

Similarly, Computer Architecture is about working within constraints:

• What will the market buy?

• Cost/Performance

• Tradeoffs in materials and processes

When building a Cathedral numerous very practical considerations need to be taken into account:

• available materials

• worker skills

• willingness of the client to pay the price.

Page 4: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

4

TrendsGordon Moore (Founder of Intel) observed in 1965 that the number of

transistors that could be crammed on a chip doubles every year.

This has CONTINUED to be true since then.Transistors Per Chip

1.E+03

1.E+04

1.E+05

1.E+06

1.E+07

1.E+08

1970 1975 1980 1985 1990 1995 2000 2005

4004

Power PC 601486

386

80286

8086

Pentium

Pentium Pro

Pentium II

Power PC G3

Pentium 3

Page 5: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

5

Measuring And Reporting Performance

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy

This section talks about:

1. Metrics – how do we describe in a numerical way the performance of a computer?

2. What tools do we use to find those metrics?

Page 6: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

6

Metrics

• Time to run the task (ExTime)– Execution time, response time, latency

• Tasks per day, hour, week, sec, ns … (Performance)

– Throughput, bandwidth

Plane

Boeing 747

BAD/Sud Concodre

Speed

610 mph

1350 mph

DC to Paris

6.5 hours

3 hours

Passengers

470

132

Throughput (pmph)

286,700

178,200

Page 7: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

7

Metrics - Comparisons

"X is n times faster than Y" means

ExTime(Y) Performance(X)

--------- = ---------------

ExTime(X) Performance(Y)

Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde

Page 8: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

8

Metrics - ComparisonsPat has developed a new product, "rabbit" about which she wishes to determine

performance. There is special interest in comparing the new product, rabbit to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

 

Performance Comparisons

 Product Transactions / second Seconds/ transaction Seconds to process transaction

Turtle 30 0.0333 3

Rabbit 60 0.0166 1

Which of the following statements reflect the performance comparison of rabbit and turtle?

 o Rabbit is 100% faster than turtle.o Rabbit is twice as fast as turtle.o Rabbit takes 1/2 as long as turtle.o Rabbit takes 1/3 as long as turtle.o Rabbit takes 100% less time than turtle.

o Rabbit takes 200% less time than turtle.o Turtle is 50% as fast as rabbit.o Turtle is 50% slower than rabbit.o Turtle takes 200% longer than rabbit.o Turtle takes 300% longer than rabbit.

Page 9: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

9

Metrics - Throughput

Compiler

Programming Language

Application

DatapathControl

Transistors Wires Pins

ISA

Function Units

(millions) of Instructions per second: MIPS(millions) of (FP) operations per second: MFLOP/s

Cycles per second (clock rate)

Megabytes per second

Answers per monthOperations per second

Page 10: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

10

Methods For Predicting Performance

• Benchmarks, Traces, Mixes• Hardware: Cost, delay, area, power estimation• Simulation (many levels)

– ISA, RT, Gate, Circuit• Queuing Theory• Rules of Thumb• Fundamental “Laws”/Principles

Page 11: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

11

Benchmarks

• First Round 1989– 10 programs yielding a single number (“SPECmarks”)

• Second Round 1992– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)

• Compiler Flags unlimited. March 93 of DEC 4000 Model 610:

spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”

wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200

nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas• Third Round 1995

– new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)

– “benchmarks useful for 3 years”– Single flag setting for all programs: SPECint_base95, SPECfp_base95

SPEC: System Performance Evaluation Cooperative

Page 12: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

12

BenchmarksCINT2000 (Integer Component of SPEC CPU2000):

Program Language What Is It164.gzip C Compression

175.vpr C FPGA Circuit Placement and Routing

176.gcc C C Programming Language Compiler

181.mcf C Combinatorial Optimization

186.crafty C Game Playing: Chess

197.parser C Word Processing

252.eon C++ Computer Visualization

253.perlbmk C PERL Programming Language

254.gap C Group Theory, Interpreter

255.vortex C Object-oriented Database

256.bzip2 C Compression

300.twolf C Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/

Page 13: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

13

BenchmarksCFP2000 (Floating Point Component of SPEC

CPU2000):Program Language What Is It168.wupwiseFortran 77 Physics / Quantum Chromodynamics171.swim Fortran 77 Shallow Water Modeling172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field173.applu Fortran 77 Parabolic / Elliptic Differential Equations177.mesa C 3-D Graphics Library178.galgel Fortran 90 Computational Fluid Dynamics179.art C Image Recognition / Neural Networks183.equake C Seismic Wave Propagation Simulation187.facerec Fortran 90 Image Processing: Face Recognition188.ammp C Computational Chemistry189.lucas Fortran 90 Number Theory / Primality Testing191.fma3d Fortran 90 Finite-element Crash Simulation 200.sixtrack Fortran 77 High Energy Physics Accelerator Design 301.apsi Fortran 77 Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/

Page 14: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

14

Benchmarks Sample Results For SpecINT2000

Base Base Base Peak Peak Peak

Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio

164.gzip 1400 277 505* 1400 270 518*

175.vpr 1400 419 334* 1400 417 336*

176.gcc 1100 275 399* 1100 272 405*

181.mcf 1800 621 290* 1800 619 291*

186.crafty 1000 191 522* 1000 191 523*

197.parser 1800 500 360* 1800 499 361*

252.eon 1300 267 486* 1300 267 486*

253.perlbmk 1800 302 596* 1800 302 596*

254.gap 1100 249 442* 1100 248 443*

255.vortex 1900 268 710* 1900 264 719*

256.bzip2 1500 389 386* 1500 375 400*

300.twolf 3000 784 382* 3000 776 387*

SPECint_base2000 438

SPECint2000 442

http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

Intel OR840(1 GHz Pentium III processor)

Page 15: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

15

BenchmarksPerformance Evaluation

• “For better or worse, benchmarks shape a field”• Good products created when have:

– Good benchmarks– Good ways to summarize performance

• Given sales is a function in part of performance relative to competition, investment in improving product as reported by performance summary

• If benchmarks/summary inadequate, then choose between improving product for real programs vs. improving product to get more sales;Sales almost always wins!

• Execution time is the measure of computer performance!

Page 16: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

16

Benchmarks

Management would like to have one number.

Technical people want more:

1. They want to have evidence of reproducibility – there should be enough information so that you or someone else can repeat the experiment.

2. There should be consistency when doing the measurements multiple times.

How to Summarize Performance

How would you report these results?

Computer A Computer B Computer C

Program P1 (secs) 1 10 20

Program P2 (secs) 1000 100 20

Total Time (secs) 1001 110 40

Page 17: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

17

Quantitative Principles of Computer Design

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy

Make the common case fast.Amdahl’s Law:

Relates total speedup of a system to the speedup of some portion of that system.

Page 18: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

18

Amdahl's Law

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected

Quantitative Design

tEnhancemenWithoutePerformanc

tEnhancemenWithePerformanc

tEnhancemenWithTimeExecution

tEnhancemenWithoutTimeExecutionESpeedup

__

__

___

___)(

Speedup due to enhancement E:

This fraction enhanced

Page 19: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

19

Quantitative Design

“Instruction Frequency”

Invest Resources where time is Spent!

CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count

n

iii ICPITimeCycleTimeCPU

1

**__

n

iii FCPICPI

1

* whereCountnInstructio

Ii

iF _

Number of instructions of type I.

Cycles Per Instruction

Page 20: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

20

Quantitative Design

Base Machine (Reg / Reg)

Op Freq Cycles CPI(i) (% Time)

ALU 50% 1 .5 (33%)

Load 20% 2 .4 (27%)

Store 10% 2 .2 (13%)

Branch 20% 2 .4 (27%)

Total CPI 1.5

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

Cycles Per Instruction

Page 21: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

21

Quantitative Design

Locality of Reference

Programs access a relatively small portion of the address space at any instant of time.

There are two different types of locality:

Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)

Page 22: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

22

The Concept of Memory Hierarchy

1.1 Introduction

1.2 The Task of a Computer Designer

1.3 Technology and Computer Usage Trends

1.4 Cost and Trends in Cost

1.5 Measuring and Reporting Performance

1.6 Quantitative Principles of Computer Design

1.7 Putting It All Together: The Concept of Memory Hierarchy

Fast memory is expensive.

Slow memory is cheap.

The goal is to minimize the price/performance for a particular price point.

Page 23: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

23

Memory Hierarchy

RegistersLevel 1 cache

Level 2Cache

Memory Disk

Typical Size

4 - 64 <16K bytes <2 Mbytes <16 Gigabytes

> 5 Gigabytes

Access Time

1 nsec 3 nsec 15 nsec 150 nsec 5,000,000 nsec

Bandwidth (in MB/sec)

10,000 – 50,000

2000 - 5000 500 - 1000 500 - 1000 100

Managed By

Compiler Hardware Hardware OS OS/User

Page 24: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

24

Memory Hierarchy

• Hit: data appears in some block in the upper level (example: Block X)

– Hit Rate: the fraction of memory access found in the upper level– Hit Time: Time to access the upper level which consists of

RAM access time + Time to determine hit/miss• Miss: data needs to be retrieve from a block in the lower level

(Block Y)– Miss Rate = 1 - (Hit Rate)– Miss Penalty: Time to replace a block in the upper level +

Time to deliver the block the processor• Hit Time << Miss Penalty (500 instructions on 21264!)

Page 25: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

25

Memory Hierarchy

RegistersLevel 1 cache

Level 2Cache

Memory Disk

What is the cost of executing a program if:• Stores are free (there’s a write pipe)• Loads are 20% of all instructions• 80% of loads hit (are found) in the Level 1 cache• 97 of loads hit in the Level 2 cache.

Page 26: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

26

The Instruction Set2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The MIPS Architecture

Bonus

Page 27: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

27

IntroductionThe Instruction Set Architecture is that portion of the machine visible

to the assembly level programmer or to the compiler writer.

1. What are the advantages and disadvantages of various instruction set alternatives.

2. How do languages and compilers affect ISA.

3. Use the DLX architecture as an example of a RISC architecture.

instruction set

software

hardware

Page 28: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

28

Classifying Instruction Set Architectures

Classifications can be by:

1. Stack/accumulator/register

2. Number of memory operands.

3. Number of total operands.

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 29: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

29

Instruction Set Architectures

Accumulator:

1 address add A acc acc + mem[A]

1+x address addx A acc acc + mem[A + x]

Stack:

0 address add tos tos + next

General Purpose Register:

2 address add A B EA(A) EA(A) + EA(B)

3 address add A B C EA(A) EA(B) + EA(C)

Load/Store:

0 Memory load R1, Mem1

load R2, Mem2

add R1, R2

1 Memory add R1, Mem2

Basic ISA Classes

ALU Instructions can have two or three operands.

ALU Instructions can have 0, 1, 2, 3 operands. Shown here are cases of 0 and 1.

Page 30: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

30

Instruction Set Architectures

Basic ISA Classes

Stack Accumulator Register

(Register-memory)

Register

(load-store)

Push A Load A Load R1, A Load R1, A

Push B Add B Add R1, B Load R2, B

Add Store C Store C, R1 Add R3, R1, R2

Pop C Store C, R3

The results of different address classes is easiest to see with the examples here, all of which implement the sequences for C = A + B.

Registers are the class that won out. The more registers on the CPU, the better.

Page 31: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

31

Instruction Set Architectures

Intel 80x86 Integer Registers

GPR0 EAX Accumulator

GPR1 ECX Count register, string, loop

GPR2 EDX Data Register; multiply, divide

GPR3 EBX Base Address Register

GPR4 ESP Stack Pointer

GPR5 EBP Base Pointer – for base of stack seg.

GPR6 ESI Index Register

GPR7 EDI Index Register

CS Code Segment Pointer

SS Stack Segment Pointer

DS Data Segment Pointer

ES Extra Data Segment Pointer

FS Data Seg. 2

GS Data Seg. 3

PC EIP Instruction Counter

Eflags Condition Codes

Page 32: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

32

Memory Addressing

Sections Include:

Interpreting Memory Addresses

Addressing Modes

Displacement Address Mode

Immediate Address Mode

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 33: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

33

Memory Addressing

What object is accessed as a function of the address and length?

Objects have byte addresses – an address refers to the number of bytes counted from the beginning of memory.

Little Endian – puts the byte whose address is xx00 at the least significant position in the word.

Big Endian – puts the byte whose address is xx00 at the most significant position in the word.

Alignment – data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.

Interpreting Memory Addresses

Page 34: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

34

Memory Addressing

Addressing Modes

This table shows the most common modes.

Addressing Mode Example Instruction

Meaning When Used

Register Add R4, R3 R[R4] <- R[R4] + R[R3] When a value is in a register.

Immediate Add R4, #3 R[R4] <- R[R4] + 3 For constants.

Displacement Add R4, 100(R1) R[R4] <- R[R4] +

M[100+R[R1] ]

Accessing local variables.

Register Deferred Add R4, (R1) R[R4] <- R[R4] +

M[R[R1] ]

Using a pointer or a computed address.

Absolute Add R4, (1001) R[R4] <- R[R4] + M[1001] Used for static data.

Page 35: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

35

Memory Addressing

Displacement Addressing Mode

How big should the displacement be?

For addresses that do fit in displacement size:Add R4, 10000 (R0)

For addresses that don’t fit in displacement size, the compiler must do the following:

Load R1, addressAdd R4, 0 (R1)

Depends on typical displaces as to how big this should be.

On both IA32 and DLX, the space allocated is 16 bits.

Page 36: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

36

Memory Addressing

Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:

a = b + 3;

if ( a > 17 )

goto Addr

At Assembler level:

Load R2, 3Add R0, R1, R2

Load R2, 17CMPBGT R1, R2

Load R1, AddressJump (R1)

Page 37: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

37

Operations In The Instruction Set

Sections Include:

Detailed information about types of instructions.

Instructions for Control Flow (conditional branches, jumps)

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 38: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

38

Operations In The Instruction Set

Arithmetic and logical and, add

Data transfer move, load

Control branch, jump, call

System system call, traps

Floating point add, mul, div, sqrt

Decimal add, convert

String move, compare

Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS

Operator Types

Page 39: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

39

Operations In The Instruction Set

Control Instructions Issues:• taken or not• where is the target • link return address• save or restore

Instructions that change the PC:• (conditional) branches, (unconditional) jumps• function calls, function returns• system calls, system returns

Control InstructionsConditional branches are 20%

of all instructions!!

Page 40: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

40

Operations In The Instruction Set

There are numerous tradeoffs:

Compare and branch

+ no extra compare, no state passed between instructions

-- requires ALU op, restricts code scheduling opportunities

Implicitly set condition codes Z, N, V, C

+ can be set ``for free''

-- constrains code reordering, extra state to save/restore

Explicitly set condition codes

+ can be set ``for free'', decouples branch/fetch from pipeline

-- extra state to save/restore

Control Instructions

There are numerous tradeoffs:

condition in general purpose register + no special state but uses up a register -- branch condition separate from branch

logic in pipeline some data for MIPS

> 80% branches use immediate data, > 80% of those zero

50% branches use == 0 or <> 0 compromise in MIPS

branch==0, branch<>0 compare instructions for all other

compares

Page 41: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

41

Operations In The Instruction Set

Link Return Address:

implicit register many recent architectures use this

+ fast, simple

-- s/w save register before next call, surprise traps?

explicit register + may avoid saving register

-- register must be specified

processor stack + recursion direct

-- complex instructions

Control Instructions

Save or restore state:

What state? function calls: registers system calls: registers, flags, PC, PSW, etc

Hardware need not save registers Caller can save registers in use

Callee save registers it will use Hardware register save

IBM STM, VAX CALLS Faster?

Many recent architectures do no register saving

Or do implicit register saving with register windows (SPARC)

Page 42: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

42

Type And Size of Operands

The type of the operand is usually encoded in the Opcode – a LDW implies loading of a word.

Common sizes are:Character (1 byte)

Half word (16 bits)

Word (32 bits)

Single Precision Floating Point (1 Word)

Double Precision Floating Point (2 Words)

Integers are two’s complement binary.

Floating point is IEEE 754.

Some languages (like COBOL) use packed decimal.

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 43: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

43

Encoding And Instruction SetThis section has to do with how an assembly level instruction is encoded into binary.

Ultimately, it’s the binary that is read and interpreted by the machine.

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 44: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

44

Encoding And Instruction Set

80x86 Instruction Encoding

for ( index = 0; index < iterations; index++ )

0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0

0040D3B6 EB 09 jmp main+0D1h (0040d3c1)

0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]

0040D3BB 83 C1 01 add ecx,1

0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx

0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]

0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]

0040D3C7 7D 15 jge main+0EEh (0040d3de)

long_temp = (*alignment + long_temp) % 47;

0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]

0040D3CC 8B 00 mov eax,dword ptr [eax]

0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]

0040D3D1 99 cdq

0040D3D2 B9 2F 00 00 00 mov ecx,2Fh

0040D3D7 F7 F9 idiv eax,ecx

0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx

0040D3DC EB DA jmp main+0C8h (0040d3b8)

Here’s some sample code that’s been disassembled.

It was compiled with the debugger option so is not

optimized.

This code was

produced using Visual

Studio

Page 45: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

45

Encoding And Instruction Set

80x86 Instruction Encoding

for ( index = 0; index < iterations; index++ )

00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]

00401006 33 D2 xor edx,edx

00401008 85 C9 test ecx,ecx

0040100A 7E 14 jle 00401020

0040100C 56 push esi

0040100D 57 push edi

0040100E 8B F1 mov esi,ecx

long_temp = (*alignment + long_temp) % 47;

00401010 8D 04 11 lea eax,[ecx+edx]

00401013 BF 2F 00 00 00 mov edi,2Fh

00401018 99 cdq

00401019 F7 FF idiv eax,edi

0040101B 4E dec esi

0040101C 75 F2 jne 00401010

0040101E 5F pop edi

0040101F 5E pop esi

00401020 C3 ret

Here’s some sample code that’s been disassembled.

It was compiled with optimization

This code was

produced using Visual

Studio

Page 46: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

46

Encoding And Instruction Set

80x86 Instruction Encoding

for ( index = 0; index < iterations; index++ )

0x804852f <main+143>: add $0x10,%esp

0x8048532 <main+146>: lea 0xfffffff8(%ebp),%edx

0x8048535 <main+149>: test %esi,%esi

0x8048537 <main+151>: jle 0x8048543 <main+163>

0x8048539 <main+153>: mov %esi,%eax

0x804853b <main+155>: nop

0x804853c <main+156>: lea 0x0(%esi,1),%esi

long_temp = (*alignment + long_temp) % 47;

0x8048540 <main+160>: dec %eax

0x8048541 <main+161>: jne 0x8048540 <main+160>

0x8048543 <main+163>: add $0xfffffff4,%esp

Here’s some sample code that’s been disassembled.

It was compiled with optimization

This code was

produced using gcc and gdb.

Note that the representation of the code is dependent on the compiler/debugger!

Page 47: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

47

Encoding And Instruction Set

80x86 Instruction Encoding

RegADD Disp.

34 8

postbyteSHL

6 8

V/w

2

Disp.

8

TEST

7

W

1

postbyte

8

Immediate

8

W

1 A Morass of disjoint encoding!!

Page 48: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

48

Encoding And Instruction Set

80x86 Instruction Encoding

CALLF Offset Segment Number

CondJE Disp.

44

8 16 16

8

postbyteMOV

6 8

D/w

2

Disp.

8

PUSH

5

Reg

3

Page 49: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

49

The Role of Compilers

Compiler goals: • All correct programs execute

correctly • Most compiled programs

execute fast (optimizations) • Fast compilation • Debugging support

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The DLX Architecture

Page 50: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

50

The Role of Compilers

Steps In Compilation

Parsing > intermediate representation Jump Optimization Loop Optimizations Register Allocation Code Generation > assembly code Common Sub Expression Procedure in-lining Constant Propagation Strength Reduction Pipeline Scheduling

Page 51: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

51

The Role of Compilers

Steps In Compilation

Optimization Name

Explanation % of the total number of optimizing

transformations

High Level At or near the source level; machine-independent

Not Measured

Local Within Straight Line Code 40%

Global Across A Branch 42%

Machine Dependent Depends on Machine Knowledge Not Measured

Page 52: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

52

The Role of Compilers

What compiler writers want:

• regularity • orthogonality • composability

Compilers perform a giant case analysis

• too many choices make it hard

Orthogonal instruction sets • operation, addressing mode,

data type

One solution or all possible solutions • 2 branch conditions eq, lt • or all six eq, ne, lt, gt, le, ge • not 3 or 4

There are advantages to having instructions that are primitives.

Let the compiler put the instructions together to make more complex sequences.

Page 53: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

53

The MIPS Architecture

MIPS is very RISC oriented.

MIPS will be used for many examples throughout the course.

2.1 Introduction

2.2 Classifying Instruction Set Architectures

2.3 Memory Addressing

2.4 Operations in the Instruction Set

2.5 Type and Size of Operands

2.6 Encoding and Instruction Set

2.7 The Role of Compilers

2.8 The MIPS Architecture

MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. We will look at the Pipeline concept in our next lecture.

The acronym RISC (pronounced risk), for reduced instruction set computer represents a CPU design strategy emphasizing the insight that simplified instructions which "do less" may still provide for higher performance if this simplicity can be utilized to make instructions execute very fast. Well known RISC families include DEC Alpha, ARC, ARM, AVR, MIPS, PA-RISC, Power Architecture (including PowerPC), and SPARC.

Page 54: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

54

The MIPS Architecture

MIPS Characteristics

32 bit byte addresses aligned Load/store only displacement

addressing Standard data types 3 fixed length formats 32 32 bit GPRs (r0 = 0) 16 64 bit (32 32 bit) FPRs FP status register No Condition Codes

Data transfer • load/store word, load/store byte/half

word signed? • load/store FP single/double • moves between GPRs and FPRs ALU • add/subtract signed? immediate? • multiply/divide signed? • and, or, xor immediate?, shifts: ll, rl,

ra immediate? • sets immediate?

There’s MIPS – 64 – the current arch.Standard datatypes 4 fixed length formats (8,16,32,64)32 64 bit GPRs (r0 = 0) 64 64 bit FPRs

Addressing Modes• Immediate• Displacement • (Register Mode used only for ALU)

Page 55: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

55

The MIPS Architecture

MIPS Characteristics

Control • branches == 0, <> 0 • conditional branch testing FP bit • jump, jump register • jump & link, jump & link register • trap, return from exception

Floating Point• add/sub/mul/div • single/double • fp converts, fp set

Page 56: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

56

The DLX Architecture

The DLX is a RISC processor architecture design by the principal designers of the MIPS and the Berkeley RISC designs, the two benchmark examples of RISC design. The DLX is essentially a cleaned up and simplified MIPS with a simple 32-bit load/store architecture. Intended primarily for teaching purposes, the DLX design is widely used in university-level computer architecture courses.

The next couple of lectures will use the MIPS and DLX architectures as examples to demonstrate concepts.

Page 57: Computer Architecture Lecture 3 Basic Fundamentals and Instruction Sets

57

End of Lecture