CSE 141 – Computer Architecture Summer Session I, 2005 Lecture 3 Performance and Single Cycle CPU Part 1 Pramod V. Argade

Upload: others

Post on 25-Mar-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CSE 141 – Computer Architecture Summer Session I, 2005 ...cseweb.ucsd.edu/classes/su05/cse141/su05_03.pdfPramod Argade UCSD CSE 141, Spring 2005 3 Summer Session I, 2005 CSE141 Course

CSE 141 – Computer Architecture
Summer Session I, 2005

Lecture 3
Performance and Single Cycle CPU Part 1
Pramod V. Argade


2Pramod Argade UCSD CSE 141, Spring 2005

CSE141: Introduction to Computer Architecture

Instructor: Pramod V. Argade ([email protected])

Office Hours:
– Tue. 7:30 - 9:00 (Center 105)
– Wed. 5:00 - 6:00 PM (HSS 1330)
– By Appointment

TAs: Chengmo Yang: [email protected] Rao: [email protected]

Lecture: Mon/Wed. 6:00 - 8:50 PM, HSS 1330

Textbook: Computer Organization & Design: The Hardware/Software Interface, 3rd Edition. Authors: Patterson and Hennessy

Web-page: http://www.cse.ucsd.edu/classes/su05/cse141


Summer Session I, 2005
CSE141 Course Schedule

| Lecture # | Date | Time | Room | Topic | Quiz topic | Homework Due |
|---|---|---|---|---|---|---|
| 1 | Mon. 6/27 | 6 - 8:50 PM | HSS 1330 | Introduction, Ch. 1; ISA, Ch. 2 | - | - |
| 2 | Wed. 6/29 | 6 - 8:50 PM | HSS 1330 | Arithmetic, Ch. 3 | ISA, Ch. 2 | #1 |
| - | Mon. 7/4 | No Class | | July 4th Holiday | - | - |
| 3 | Wed. 7/6 | 6 - 8:50 PM | HSS 1330 | Performance, Ch. 4; Single-cycle CPU, Ch. 5 | Arithmetic, Ch. 3 | #2 |
| 4 | Mon. 7/11 | 6 - 8:50 PM | HSS 1330 | Single-cycle CPU, Ch. 5 Cont.; Multi-cycle CPU, Ch. 5 | Performance, Ch. 4 | #3 |
| 5 | Tue. 7/12 | 7:30 - 8:50 PM | HSS 1330 | Multi-cycle CPU, Ch. 5 Cont. (July 4th make up class) | - | - |
| 6 | Wed. 7/13 | 6 - 8:50 PM | HSS 1330 | Single and Multicycle CPU Examples and Review for Midterm | Single-cycle CPU, Ch. 5 | - |
| 7 | Mon. 7/18 | 6 - 8:50 PM | HSS 1330 | Mid-term Exam; Exceptions | - | #4 |
| 8 | Tue. 7/19 | 7:30 - 8:50 PM | HSS 1330 | Pipelining, Ch. 6 (July 4th make up class) | - | - |
| 9 | Wed. 7/20 | 6 - 8:50 PM | HSS 1330 | Hazards, Ch. 6 | - | - |
| 10 | Mon. 7/25 | 6 - 8:50 PM | HSS 1330 | Memory Hierarchy & Caches, Ch. 7 | Hazards, Ch. 6 | #5 |
| 11 | Wed. 7/27 | 6 - 8:50 PM | HSS 1330 | Virtual Memory, Ch. 7; Course Review | Cache, Ch. 7 | #6 |
| 12 | Sat. 7/30 | TBD | TBD | Final Exam | - | - |


Announcements

Discussion Section for 141: Thursday, 7:30 - 8:30, WLH-2205

Office Hours:
– Tue. 7:30 - 9:00 (Center 105)
– Wed. 5:00 - 6:00 PM (HSS 1330)
– By Appointment

Reading Assignment
– Chapter 4. Assessing and Understanding Performance, Sections 4.1 - 4.6

Homework 3: Due Mon., July 11th in class
– 4.1, 4.2, 4.6, 4.7, 4.8, 4.9, 4.12, 4.19, 4.22, 4.45

Quiz
– When: Mon., July 11th, first 10 minutes of the class
– Topic: Performance, Chapter 4
– Need: Paper, pen


Performance Assessment

Example definitions of performance
– Passengers * MPH
– Cruising speed
– Passenger capacity
– (Passenger * MPH)/$: need more data!

Clearly formulate your definition of performance

| Airplane | Passenger Capacity | Cruising Range (miles) | Cruising Speed (m.p.h.) | Passenger Throughput (passenger x m.p.h.) |
|---|---|---|---|---|
| Boeing 777 | 375 | 4630 | 610 | 228,750 |
| Boeing 747 | 470 | 4150 | 610 | 286,700 |
| Douglas DC-8-50 | 146 | 8720 | 544 | 79,424 |
| Airbus A380 | 555 | 8000 | 645 | 357,975 |


Why worry about performance?

Learn to measure, report, and summarize
Make intelligent choices
See through the marketing hype

As a designer or purchaser, assess:
– which system has the best performance?
– which system has the lowest cost?
– which system has the highest performance/cost?

Formulate metrics for performance measurement
– How to report relative performance?

Understand the impact of architectural choices on performance


Computer Performance: TIME, TIME, TIME

Response Time (latency)
– How long does it take for my job to run?
– How long does it take to execute a job?
– How long must I wait for the database query?

Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?

If we upgrade a machine with a new processor, what do we decrease?

If we add a new machine to the lab, what do we increase?


Execution Time

    % time program
    ... program results ...
    90.7u 12.9s 2:39 65%

Elapsed time (2:39)
– Counts everything (disk and memory accesses, I/O, etc.)
– A useful number, but often not good for comparison purposes

Total CPU time (103.6 s)
– Doesn't count I/O or time spent running other programs
– Composed of system time (12.9 s) and user time (90.7 s)

Our focus: user CPU time
– Time spent executing the lines of code that are "in" our program
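The 65% figure in the `time` output can be reproduced from the other three numbers. The following is a minimal sketch, assuming the fields are user seconds, system seconds and elapsed time written as minutes:seconds; the function names are illustrative and not part of any tool.

```python
def elapsed_seconds(clock: str) -> int:
    """Convert an elapsed time written as 'minutes:seconds' into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def cpu_utilization(user_s: float, system_s: float, elapsed: str) -> float:
    """Fraction of elapsed (wall-clock) time that this job spent on the CPU."""
    return (user_s + system_s) / elapsed_seconds(elapsed)


# Numbers from the slide: 90.7u 12.9s 2:39
total_cpu_s = 90.7 + 12.9                      # total CPU time, about 103.6 s
utilization = cpu_utilization(90.7, 12.9, "2:39")
print(f"total CPU time = {total_cpu_s:.1f} s, utilization = {utilization:.0%}")
```

The remaining third of the elapsed time went to I/O and to other programs, which is why elapsed time alone is a poor basis for comparing processors.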


A Definition of Performance

For some program running on machine X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

“Machine X is n times faster than Y”:

$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$

Problem: Execution time is 12 seconds on machine A and 20 seconds on machine B
– A/B = 0.6, so A is 40% faster, or 1.4X faster, or B is 40% slower
– B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower

Need a precise definition: “X is n times faster than Y”, n > 1
– A is 1.67 times faster than B
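The precise definition reduces to a single ratio of execution times. A small sketch using the 12 s and 20 s figures from the problem; the function name is an illustrative choice.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that 'X is n times faster than Y'.

    Performance is 1 / execution time, so n = time_y / time_x.
    n > 1 means X is the faster machine.
    """
    return time_y / time_x


n = times_faster(time_x=12.0, time_y=20.0)   # machine A: 12 s, machine B: 20 s
print(f"A is {n:.2f} times faster than B")
```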


Clock Cycles

Operation of a conventional computer is controlled by clock “ticks”

Clock rate (frequency) = cycles per second (Hertz)
– MHz = million cycles per second
– GHz = giga (billion) cycles per second

Cycle time = time between ticks = seconds per cycle
– A 2 GHz clock => cycle time = 1/(2 x 10^9) s = 0.5 nanoseconds

Instead of reporting execution time in seconds, we often use clock cycles

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

[Figure: clock ticks marked along a time axis]
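The conversion between clock rate, cycle time and execution time can be sketched as follows. The 2 GHz clock comes from the slide; the cycle count of 4 billion is a made-up value used only for illustration.

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time in nanoseconds: the reciprocal of the clock rate."""
    return 1e9 / clock_rate_hz


def execution_time_s(cycles: float, clock_rate_hz: float) -> float:
    """seconds/program = cycles/program x seconds/cycle."""
    return cycles / clock_rate_hz


two_ghz_cycle = cycle_time_ns(2e9)              # 2 GHz clock from the slide
hypothetical_run = execution_time_s(4e9, 2e9)   # 4 billion cycles (assumed)
print(f"cycle time = {two_ghz_cycle} ns, run time = {hypothetical_run} s")
```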


How many cycles are required for a program?

Assumption: # of cycles = # of instructions

[Figure: timeline with equal slots for the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instruction]

Assumption is incorrect
– Typically, different instructions take different numbers of clock cycles
– Integer multiply instructions are multi-cycle
– Floating point instructions are multi-cycle

[Figure: timeline in which the 1st through 5th instructions occupy slots of different lengths]


Now that we understand cycles

A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds

We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction)


$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
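The CPU time equation is a product of three factors, which the sketch below evaluates directly. The program size, CPI and clock used in the example call are hypothetical values, not taken from the lecture.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1 billion instructions, CPI of 2.0,
# 0.5 ns cycle time (a 2 GHz clock).
example_time = cpu_time_seconds(1e9, 2.0, 0.5e-9)
print(f"CPU time = {example_time:.2f} s")
```

Halving any one factor while holding the other two fixed halves the CPU time, which is why each factor is treated as a separate lever in the slides that follow.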


CPI Example 2.1

Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
– Machine A has a clock cycle time of 10 ns and a CPI of 2.0
– Machine B has a clock cycle time of 20 ns and a CPI of 1.2

Which machine is faster for this program, and by how much?
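One way to work Example 2.1: both machines implement the same ISA and run the same program, so the instruction count is identical and cancels out of the comparison. Only CPI x cycle time matters. The sketch below uses the numbers given in the example; the function name is illustrative.

```python
def time_per_instruction_ns(cpi: float, cycle_time_ns: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ns


a_ns = time_per_instruction_ns(cpi=2.0, cycle_time_ns=10.0)   # machine A
b_ns = time_per_instruction_ns(cpi=1.2, cycle_time_ns=20.0)   # machine B

# Same instruction count on both machines, so the ratio of execution
# times equals the ratio of average times per instruction.
speedup_a_over_b = b_ns / a_ns
print(f"A: {a_ns:.0f} ns/instr, B: {b_ns:.0f} ns/instr, "
      f"A is {speedup_a_over_b:.1f} times faster than B")
```

Machine A wins despite its higher CPI because its cycle time is half that of machine B.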


Compiler Optimization: Example 2.2

A compiler designer is trying to decide between two code sequences for a particular machine with 3 instruction classes.
– CPI: Class A = 1, Class B = 2, and Class C = 3
– Code sequence X has 5 instructions: 2 of A, 1 of B, and 2 of C
– Code sequence Y has 6 instructions: 4 of A, 1 of B, and 1 of C

Which sequence will be faster? By how much?
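Example 2.2 can be worked by summing cycles per instruction class. Both sequences run on the same machine, so the clock cycle time is the same and the cycle counts can be compared directly. The sketch uses the class CPIs and instruction mixes from the example; the dictionary layout is an illustrative choice.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(instruction_counts: dict) -> int:
    """Sum of (instructions in class) x (CPI of class) over all classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in instruction_counts.items())


seq_x = {"A": 2, "B": 1, "C": 2}   # 5 instructions
seq_y = {"A": 4, "B": 1, "C": 1}   # 6 instructions

cycles_x = total_cycles(seq_x)
cycles_y = total_cycles(seq_y)
cpi_x = cycles_x / sum(seq_x.values())
cpi_y = cycles_y / sum(seq_y.values())

print(f"X: {cycles_x} cycles, CPI {cpi_x:.2f}")
print(f"Y: {cycles_y} cycles, CPI {cpi_y:.2f}")
print(f"Y is {cycles_x / cycles_y:.2f} times faster than X")
```

Sequence Y executes more instructions yet finishes sooner, which shows that instruction count alone does not determine performance.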


Execution time: Contributing Factors

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Improve performance => reduce execution time
– Reduce instruction count (Programmer, Compiler)
– Reduce cycles per instruction (ISA, Machine designer)
– Reduce clock cycle time (Hardware designer, Process engineer)

| | Instruction Count | CPI | Clock Rate |
|---|---|---|---|
| Program | X | | |
| Compiler | X | (X) | |
| ISA | X | X | |
| Organization | | X | X |
| Technology | | | X |


Performance

Performance is determined by program execution time

Do any of the other variables equal performance?
– # of cycles to execute program?
– # of instructions in program?
– # of cycles per second?
– average # of cycles per instruction?
– average # of instructions per second?

Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.


Performance Variation

| | Number of instructions | CPI | Clock Cycle Time |
|---|---|---|---|
| Same machine, different programs | different | similar | same |
| Same programs, different machines, same ISA | same | different | different |
| Same programs, different machines | somewhat different | different | different |

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$


Amdahl’s Law

The impact of a performance improvement is limited by the percentage of execution time affected by the improvement

$$\text{Execution time after improvement} = \frac{\text{Execution time affected}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$

Make the common case fast!
Amdahl’s law sets a limit on how much improvement can be made


Amdahl’s Law: Example 2.3

Example: A program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?

Is it possible to get the program to run 5 times faster?
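Example 2.3 can be worked by solving Amdahl's law for the improvement factor. The sketch below is one way to do that; the function names are illustrative, and `None` is used here to signal that no finite improvement reaches the target.

```python
def time_after(affected_s: float, unaffected_s: float, improvement: float) -> float:
    """Amdahl's law: affected time shrinks by 'improvement', the rest is unchanged."""
    return affected_s / improvement + unaffected_s


def required_improvement(total_s: float, affected_s: float, target_speedup: float):
    """Improvement factor needed on the affected part, or None if unreachable."""
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    remaining_s = target_time_s - unaffected_s   # time budget left for the affected part
    if remaining_s <= 0:
        return None                              # the unaffected part alone is too slow
    return affected_s / remaining_s


need_4x = required_improvement(100.0, 80.0, 4.0)   # multiply must be 16 times faster
need_5x = required_improvement(100.0, 80.0, 5.0)   # None: 20 s is untouched by the change
print(need_4x, need_5x)
```

The 20 seconds not spent in multiply set a floor on execution time, so the overall speedup approaches 5 only as the multiply time approaches zero.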


Metrics of Performance

[Figure: levels of a computer system, each paired with a typical performance metric]

System levels, top to bottom:
– Application
– Programming Language
– Compiler
– ISA
– Datapath, Control
– Function Units
– Transistors, Wires, Pins

Metrics shown alongside:
– Answers per month
– Useful operations per second
– (Millions) of instructions per second: MIPS
– (Millions) of (F.P.) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)

Each metric has a place and a purpose, and each can be misused


Benchmarks

Performance is best determined by running a real application
– Use programs typical of the expected workload
– Or, typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.

Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused

SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real programs and inputs
– can still be abused (Intel’s “other” bug)
– valuable indicator of performance (and compiler technology)


SPEC ‘89

Compiler “enhancements” and performance

[Figure: bar chart of SPEC performance ratio (vertical axis, 0 to 800) for the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp and tomcatv, comparing the compiler with the enhanced compiler]


SPEC ‘95*

| Benchmark | Description |
|---|---|
| go | Artificial intelligence; plays the game of Go |
| m88ksim | Motorola 88k chip simulator; runs test program |
| gcc | The Gnu C compiler generating SPARC code |
| compress | Compresses and decompresses file in memory |
| li | Lisp interpreter |
| ijpeg | Graphic compression and decompression |
| perl | Manipulates strings and prime numbers in the special-purpose programming language Perl |
| vortex | A database program |
| tomcatv | A mesh generation program |
| swim | Shallow water model with 513 x 513 grid |
| su2cor | Quantum physics; Monte Carlo simulation |
| hydro2d | Astrophysics; hydrodynamic Navier-Stokes equations |
| mgrid | Multigrid solver in 3-D potential field |
| applu | Parabolic/elliptic partial differential equations |
| turb3d | Simulates isotropic, homogeneous turbulence in a cube |
| apsi | Solves problems regarding temperature, wind velocity, and distribution of pollutant |
| fpppp | Quantum chemistry |
| wave5 | Plasma physics; electromagnetic particle simulation |

* SPEC CPU2000 consists of 12 integer and 14 floating point benchmarks


SPEC ‘95

Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?

[Figure: two plots, SPECint and SPECfp (vertical axes, 0 to 10) against clock rate in MHz (horizontal axes, 50 to 250), each comparing the Pentium and the Pentium Pro]


MIPS, MFLOPS etc.

• MIPS - million instructions per second

$$\text{MIPS} = \frac{\text{number of instructions executed in program}}{\text{execution time in seconds} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• MFLOPS - million floating point operations per second

$$\text{MFLOPS} = \frac{\text{number of floating point operations executed in program}}{\text{execution time in seconds} \times 10^6}$$

• Hard to relate MIPS/MFLOPS with execution time
• Highly program-dependent metrics
• Deceptive (See pages 78 - 79 in the text)
• MIPS would be higher for a program using simple instructions (e.g. a loop of NOPs!)

$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
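The sketch below illustrates why MIPS can be deceptive. It compares two hypothetical compiled versions of the same program on a hypothetical 500 MHz machine; every number in it is invented for illustration and does not come from the lecture or the textbook.

```python
def execution_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips_rating(instruction_count: float, time_s: float) -> float:
    """MIPS = instructions executed / (execution time in seconds x 10^6)."""
    return instruction_count / (time_s * 1e6)


CLOCK_HZ = 500e6  # assumed clock rate

# Version 1: many simple instructions. Version 2: fewer, more complex ones.
versions = {
    "simple": {"instructions": 10e6, "cpi": 1.0},
    "dense": {"instructions": 4e6, "cpi": 2.0},
}

results = {}
for name, v in versions.items():
    t = execution_time_s(v["instructions"], v["cpi"], CLOCK_HZ)
    results[name] = (t, mips_rating(v["instructions"], t))
    print(f"{name}: {t * 1e3:.0f} ms, {results[name][1]:.0f} MIPS")
```

Under these assumptions the "simple" version earns twice the MIPS rating yet takes longer to run, so ranking by MIPS would pick the slower program.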


RISC Processor: Example 2.4

Base Machine (Reg / Reg)

| Op | Freq | Cycles | CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 | | |
| Load | 20% | 5 | | |
| Store | 10% | 3 | | |
| Branch | 20% | 2 | | |

• What is the average CPI?
• What percentage of time is spent in each instruction class?
• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a cycle off the branch time?
• What if two ALU instructions could be executed at once?
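One way to work through Example 2.4 is to compute the frequency-weighted CPI for the base machine and for each proposed change. The sketch uses the frequencies and cycle counts from the table. It assumes that "two ALU instructions at once" halves the effective ALU cycles to 0.5, and that the clock cycle time and instruction count stay the same in every variant.

```python
BASE_MIX = {               # op: (frequency, cycles)
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def average_cpi(mix: dict) -> float:
    """Average CPI = sum over classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_shares(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


def with_cycles(mix: dict, op: str, cycles: float) -> dict:
    """Copy of the mix with one class's cycle count replaced."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


base_cpi = average_cpi(BASE_MIX)
shares = time_shares(BASE_MIX)
speedup_cache = base_cpi / average_cpi(with_cycles(BASE_MIX, "Load", 2.0))
speedup_branch = base_cpi / average_cpi(with_cycles(BASE_MIX, "Branch", 1.0))
speedup_dual_alu = base_cpi / average_cpi(with_cycles(BASE_MIX, "ALU", 0.5))

print(f"base CPI = {base_cpi:.2f}")
print({op: f"{share:.1%}" for op, share in shares.items()})
print(f"better cache: {speedup_cache:.3f}x, branch prediction: {speedup_branch:.3f}x, "
      f"dual ALU issue: {speedup_dual_alu:.3f}x")
```

Loads account for the largest share of time even though they are only 20% of instructions, so the cache improvement gives the largest speedup of the three options.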


Performance assessment is tricky

Performance is specific to particular program(s)
– Total execution time is a consistent summary of performance

For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count

Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance

You should not always believe everything you read! Read carefully!


Datapath and Control Design

The Five Classic Components of a Computer

[Figure: block diagram showing Input, Output, Memory, and the Processor, which contains Control and Datapath]


Two Types of Logic Components

Combinational
– Elements that operate on data values
– Produces the same output if given the same inputs

State Elements
– Contains internal storage
– May produce different output if given the same inputs
– Clock is used to determine when a state element should be written

[Figure: a Combinational Logic block with inputs A and B and output C = f(A,B); a State Element block with inputs A, B and clk and output C = f(A,B,state)]


Clock

Clock is a free running signal
– Fixed cycle time (period)
– Frequency = 1/(cycle time)
– Duty Cycle: (% high)/(% low), e.g. 50/50 duty cycle in the figure
– Jitter: uncertainty/wobble in the rising or falling edge

[Figure: clock waveform marking one Clock Cycle (Period), a Rising Edge and a Falling Edge]


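Storage Elements

D Latch
• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and its complement (_Q)

[Figure: D latch symbol with inputs D and C and outputs Q and _Q; timing diagram of D, C and Q]

Falling edge triggered D flip-flop
• Output changes only on the clock edge

[Figure: D flip-flop built from two D latches in series, with inputs D and C and outputs Q and _Q; timing diagram of D, C and Q]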
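The difference between a latch and an edge-triggered flip-flop can be seen in a small behavioral model. This is a sketch in Python, not a gate-level description, and the class names are illustrative. It assumes the flip-flop is built from two D latches in series with the second latch clocked by the inverted clock, which is the usual master-slave construction.

```python
class DLatch:
    """Level-sensitive latch: Q follows D while C is 1, and holds while C is 0."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, c: int) -> int:
        if c:
            self.q = d
        return self.q


class FallingEdgeDFlipFlop:
    """Master-slave pair: the master is open while C is 1, the slave while C is 0.

    The output can therefore change only when C goes from 1 to 0.
    """

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d: int, c: int) -> int:
        self.master.update(d, c)                       # samples D while the clock is high
        return self.slave.update(self.master.q, 1 - c)  # passes it on while the clock is low


ff = FallingEdgeDFlipFlop()
inputs = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]      # (D, C) pairs over time
trace = [ff.update(d, c) for d, c in inputs]
print(trace)
```

In the trace, D drops to 0 while the clock is low and Q does not follow; Q changes only after the clock has gone high and then fallen again.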


Edge-triggered Clocking

Values stored in the machine are updated on a clock edge
– The clock edge can be either rising or falling

By default a state element is written on every clock edge
– An explicit write control signal is required otherwise

Edge-triggered methodology allows, in the same clock cycle, to:
– read the contents of a register
– send the value through some combinational logic, and
– write the contents of the same or another register

Possible to have the same state element as input and output

[Figure: state element 1 feeding combinational logic feeding state element 2 within one clock cycle; a second diagram in which state element 1 is both the input and the output of the combinational logic]


CPU: Clocking

All storage elements are clocked by the same clock edge

[Figure: Clk waveform with Setup and Hold windows around each clock edge and a Don’t Care region between them; storage elements driven by the same CLK]


Register: A Storage Element
– Similar to the D Flip Flop except
  • N-bit input and output
  • Write Enable input
– Write Enable:
  • 0: Data Out will not change
  • 1: Data Out will become Data In (on the clock edge)

[Figure: register with N-bit Data In, N-bit Data Out, Write Enable and Clk inputs]


Register File

Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW

Register is selected by:
– RA selects the register to put on busA
– RB selects the register to put on busB
– RW selects the register to be written via busW when Write Enable is 1

Clock input (CLK)

[Figure: register file of 32 32-bit registers with 5-bit select inputs RW, RA and RB, 32-bit input busW, 32-bit outputs busA and busB, and Clk and Write Enable inputs]
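The register file interface described on this slide can be modelled in a few lines. This is a behavioral sketch with illustrative names: reads are treated as combinational (available immediately), and a write takes effect only when the modelled clock edge occurs with Write Enable set.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self, count: int = 32, width: int = 32) -> None:
        self.regs = [0] * count
        self.mask = (1 << width) - 1

    def read(self, ra: int, rb: int):
        """Combinational read: returns (busA, busB) for the selected registers."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        """On a clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & self.mask


rf = RegisterFile()
rf.clock_edge(rw=11, bus_w=0x1234, write_enable=1)   # stored
rf.clock_edge(rw=16, bus_w=0xFFFF, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(ra=11, rb=16)
print(hex(bus_a), hex(bus_b))
```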


Memory

Memory
– One input bus: Data In
– One output bus: Data Out

Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: Address selects the memory word to be written via the Data In bus

Clock input (CLK)
– The CLK input is a factor ONLY during a write operation
– During a read operation, memory behaves as a combinational logic block: Address valid => Data Out valid after “access time.”

[Figure: memory block with Address, 32-bit Data In, 32-bit Data Out, Write Enable and Clk]


Basic 4 x 2 Static RAM

[Figure: 4 x 2 static RAM built from eight D latches (each with D, C, Enable and Q), a 2-to-4 decoder driven by Address with outputs 0 to 3, a Write enable input, data inputs Din[1] and Din[0], and data outputs Dout[1] and Dout[0]]


Register Transfer Language (RTL)

Mechanism for describing the movement and manipulation of data between storage elements.

Example:
R[rd] ← R[rs] + R[rt]
PC ← PC + 4
R[rt] ← Mem[R[rs] + immediate]


A Simple Implementation of MIPS CPU

Simplified to contain only:
– Memory-reference instructions: lw, sw
– Arithmetic-logical instructions: add, sub, and, or, slt
– Control flow instructions: beq, j

Execution Time = Instructions * CPI * Cycle Time

Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction

We will design a single cycle processor:
– Advantage: One clock cycle per instruction
– Disadvantage: long cycle time


Arithmetic Instructions (R-Type)

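ADD, SUB, AND, OR, SLT

Example: add rd, rs, rt

e.g. add $t3, $s0, $s5
    REG[$t3] = REG[$s0] + REG[$s5]

| Field | op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0 |
| Width | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
| Example | | s0 | s5 | t3 | | |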
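The R-type layout can be checked by packing the fields of the example instruction into a 32-bit word. The sketch below relies on standard MIPS conventions that the slide does not list: register numbers $t3 = 11, $s0 = 16, $s5 = 21, opcode 0 for R-type instructions, and function code 0x20 for add. The helper names are illustrative.

```python
REG = {"$t3": 11, "$s0": 16, "$s5": 21}   # standard MIPS register numbers
ADD_FUNCT = 0x20                           # funct code for add; op is 0 for R-type


def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word: int):
    """Unpack a 32-bit word into (op, rs, rt, rd, shamt, funct)."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
        (word >> 6) & 0x1F,
        word & 0x3F,
    )


# add $t3, $s0, $s5  ->  rd = $t3, rs = $s0, rt = $s5
word = encode_r_type(0, REG["$s0"], REG["$s5"], REG["$t3"], 0, ADD_FUNCT)
print(f"{word:#010x}")
```

Note that the assembly operand order (rd, rs, rt) differs from the field order in the encoded word (rs, rt, rd).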


Load/Store Instructions (I-Type)

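LW, SW

Examples:
    lw rt, rs, imm16
    sw rt, rs, imm16

e.g. lw $s3, -4($s2)
    REG[$s3] = D-MEM[ REG[$s2] - 4 ]

| Field | op | rs | rt | immediate |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |
| Width | 6 bits | 5 bits | 5 bits | 16 bits |
| Example | | s2 | s3 | -4 |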


Branch (I-Type)

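Beq

Example: beq rs, rt, imm16

e.g. at address 0x4c: beq $s1, $t3, -12

    if( REG[$s1] == REG[$t3] ) {
        new_PC = old_PC + 4 - 12    # new_PC = 0x44
    } else {
        new_PC = old_PC + 4         # new_PC = 0x50
    }

| Field | op | rs | rt | displacement |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |
| Width | 6 bits | 5 bits | 5 bits | 16 bits |
| Example | | s1 | t3 | -3 |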
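The branch example stores a displacement of -3 words in the instruction, which corresponds to the -12 byte offset in the pseudocode. A short sketch of the next-PC computation using the values from the slide; the function names are illustrative.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc: int, displacement: int, taken: bool) -> int:
    """PC + 4 if not taken; PC + 4 + SignExt(displacement) * 4 if taken."""
    offset = sign_extend16(displacement) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFFFFFF


# beq at address 0x4c with a 16-bit displacement field of -3 (encoded as 0xFFFD)
taken_pc = next_pc(0x4C, 0xFFFD, taken=True)
fallthrough_pc = next_pc(0x4C, 0xFFFD, taken=False)
print(hex(taken_pc), hex(fallthrough_pc))
```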


Jump (J-Type)

J

Example: j Label

e.g.
    0x8000 0000          j Label
    …
    Label: 0x8444 4444   ADD $s3, $s5, $t1
    …

new_PC = 0x8444 4444

| Field | op | target address |
|---|---|---|
| Bits | 31-26 | 25-0 |
| Width | 6 bits | 26 bits |
| Example | | 0x111 1111 |
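The 26-bit target field in the example follows from the destination address. The sketch below assumes the standard MIPS rule, which the slide does not spell out: the field holds bits 27 to 2 of the destination, and the new PC takes its upper 4 bits from PC + 4. The function names are illustrative.

```python
def jump_field(destination: int) -> int:
    """26-bit target field: the destination word address with the upper 4 bits dropped."""
    return (destination >> 2) & 0x03FFFFFF


def jump_target(pc: int, field: int) -> int:
    """New PC = upper 4 bits of (PC + 4), then the 26-bit field, then two zero bits."""
    return ((pc + 4) & 0xF0000000) | (field << 2)


field = jump_field(0x84444444)            # Label address from the slide
new_pc = jump_target(0x80000000, field)   # j Label located at 0x8000 0000
print(hex(field), hex(new_pc))
```

Because only 26 bits are stored, a jump can reach any instruction within the same 256 MB region as the instruction that follows it.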


Single Cycle Implementation Datapath and Control

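Complete Execution of a Single Instruction

[Figure: instruction execution steps: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction. In the single cycle implementation all of them (I. Fetch, Decode, Op. Fetch, Execute, Store, Next PC) complete within one Clock Cycle]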


Components Required to Implement the ISA

Next PC generation
– Add 4 or extended 16-bit immediate to PC

Memory
– Instruction read
– Data read/write

Registers (32 x 32-bit)
– Read register rs
– Read register rt
– Write register rt or rd

Sign extend immediate operand

ALU to operate on the operands


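Datapath

Abstract / Simplified View:

[Figure: block diagram with the PC supplying the Address of the Instruction memory, the fetched Instruction supplying three Register # inputs to the Registers, the Registers exchanging Data with the ALU, and the Data memory with its Address and Data connections]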


CPU: Instruction Fetch

• RTL version of the instruction fetch step:
– Fetch the instruction: mem[PC]
– Update the program counter:
  • Sequential code: PC ← PC + 4
  • Branch and jump: PC ← “something else”

[Figure: the PC (clocked by Clk) drives the Address input of the Instruction Memory, which outputs the 32-bit Instruction Word; an Adder computes PC + 4]


CPU: Register-Register Operations (Add, Subtract etc.)

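R[rd] ← R[rs] op R[rt]    Example: addU rd, rs, rt
– Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction

| Field | op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0 |
| Width | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

[Figure: register file of 32 32-bit registers with 5-bit inputs Rw, Ra and Rb fed by Rd, Rs and Rt, Clk and RegWr inputs; 32-bit busA and busB feed the ALU (controlled by ALUctr), whose 32-bit Result returns on busW]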


CPU: Load Operations

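R[rt] ← Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16

| Field | op | rs | rt | immediate |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |
| Width | 6 bits | 5 bits | 5 bits | 16 bits |

[Figure: load datapath. Register file (Rw, Ra, Rb; busW, busA, busB; Clk, RegWr) with a RegDst mux choosing Rt or Rd as the write register; an Extender (ExtOp) widening imm16 from 16 to 32 bits; an ALUSrc mux selecting the second ALU operand; the ALU (ALUctr) producing the address Adr for the Data Memory (Data In, WrEn driven by MemWr, Clk); a W_Src mux selecting the value written back on busW]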


CPU: Store Operations

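Mem[R[rs] + SignExt[imm16]] ← R[rt]    Example: sw rt, rs, imm16

| Field | op | rs | rt | immediate |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |
| Width | 6 bits | 5 bits | 5 bits | 16 bits |

[Figure: store datapath. Same components as the load datapath: register file (Rw, Ra, Rb; busW, busA, busB; Clk, RegWr) with the RegDst mux (Rt or Rd), Extender (ExtOp) for imm16, ALUSrc mux, ALU (ALUctr) producing Adr, Data Memory (Data In, WrEn driven by MemWr, Clk) and the W_Src mux. For a store, the Rt register value is routed to the Data In port of the Data Memory]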


CPU: Datapath for Branching

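beq rs, rt, imm16    Datapath generates condition (equal)

| Field | op | rs | rt | immediate |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |
| Width | 6 bits | 5 bits | 5 bits | 16 bits |

[Figure: branch datapath. The register file outputs busA and busB (selected by Rs and Rt) feed an “Equal?” comparator that produces Cond. The PC (clocked by Clk) feeds one Adder that adds 4 and a second Adder that adds the PC Ext output for imm16; a Mux controlled by nPC_sel selects the next PC, which is the Instruction Address. PC Ext: sign extend to 32 bits and left shift by 2]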


CPU: Binary arithmetic for PC

• In theory, the PC is a 32-bit byte address into the instruction memory:
– Sequential operation: PC<31:0> = PC<31:0> + 4
– Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4

• The magic number “4” always comes up because:
– The 32-bit PC is a byte address
– And all our instructions are 4 bytes (32 bits) long

• In other words:
– The 2 LSBs of the 32-bit PC are always zeros
– There is no reason to have hardware to keep the 2 LSBs

• In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
– Sequential operation: PC<31:2> = PC<31:2> + 1
– Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
– In either case: Instruction Memory Address = PC<31:2> concat “00”
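The equivalence of the 32-bit and 30-bit formulations can be checked numerically. The sketch below reuses the branch at 0x4c with a displacement of -3 from the earlier beq example; the function names are illustrative.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc_32(pc: int, imm16: int, taken: bool) -> int:
    """32-bit byte-address form: PC + 4 (+ SignExt[Imm16] * 4 when the branch is taken)."""
    offset = sign_extend16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFFFFFF


def next_pc_30(pc30: int, imm16: int, taken: bool) -> int:
    """30-bit word-address form: PC<31:2> + 1 (+ SignExt[Imm16] when the branch is taken)."""
    offset = sign_extend16(imm16) if taken else 0
    return (pc30 + 1 + offset) & 0x3FFFFFFF


def instruction_address(pc30: int) -> int:
    """Instruction Memory Address = PC<31:2> concatenated with '00'."""
    return pc30 << 2


pc = 0x4C
addr_taken = instruction_address(next_pc_30(pc >> 2, -3, taken=True))
addr_sequential = instruction_address(next_pc_30(pc >> 2, -3, taken=False))
print(hex(addr_taken), hex(addr_sequential))
print(addr_taken == next_pc_32(pc, -3, True), addr_sequential == next_pc_32(pc, -3, False))
```

The 30-bit form needs no multiply-by-4 on the immediate and no storage for the two low bits, which is the hardware simplification the slide describes.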


Single Cycle Implementation

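Putting it all together
– Supports Arithmetic, Load/Store and Branch instructions

[Figure: complete single-cycle datapath. The PC drives the Read address of the Instruction memory, which outputs Instruction [31–0]. One Add unit computes PC + 4; a second Add unit combines it with the Shift left 2 output to form the branch target (Add ALU result), and a mux controlled by PCSrc selects the next PC. Registers: Read register 1 from Instruction [25–21], Read register 2 from Instruction [20–16], Write register selected by the RegDst mux from Instruction [20–16] or Instruction [15–11], Write data, Read data 1, Read data 2, RegWrite. Sign extend widens Instruction [15–0] from 16 to 32 bits. The ALUSrc mux selects the second ALU operand; the ALU produces ALU result and Zero, under ALU control driven by Instruction [5–0] and ALUOp. Data memory: Address, Write data, Read data, MemRead, MemWrite. The MemtoReg mux selects the value written back to the Registers]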
