TRANSCRIPT
Computer Architecture
Lecture 3
Basic Fundamentals
and
Instruction Sets
2
The Task of a Computer Designer
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
[Diagram: the design cycle — evaluate existing systems for bottlenecks → simulate new designs and organizations → implement next-generation system → repeat. Inputs to the cycle: technology trends, benchmarks, workloads, implementation complexity.]
3
Technology and Computer Usage Trends
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
When building a cathedral, numerous very practical considerations need to be taken into account:
• available materials
• worker skills
• willingness of the client to pay the price
Similarly, computer architecture is about working within constraints:
• What will the market buy?
• Cost/performance
• Tradeoffs in materials and processes
4
Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubles every year. This observation has continued to hold since then (Moore later revised the rate to doubling roughly every two years).
[Chart: transistors per chip, log scale 1.E+03 to 1.E+08, vs. year 1970–2005; data points include the 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium 3, Power PC 601, and Power PC G3.]
5
Measuring And Reporting Performance
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
This section talks about:
1. Metrics – how do we describe in a numerical way the performance of a computer?
2. What tools do we use to find those metrics?
6
Metrics
• Time to run the task (ExTime)– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph   3 hours       132          178,200
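The throughput column is just passengers × speed (passenger-miles per hour), which a few lines of Python can confirm:

```python
# Figures from the table above.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470},
    "BAC/Sud Concorde": {"speed_mph": 1350, "passengers": 132},
}

# Throughput (pmph) = passengers * speed.
throughput = {name: p["speed_mph"] * p["passengers"] for name, p in planes.items()}

assert throughput["Boeing 747"] == 286_700        # higher throughput
assert throughput["BAC/Sud Concorde"] == 178_200  # lower latency (faster trip)
```

The Concorde wins on latency (3 hours vs. 6.5), the 747 on throughput — the two metrics can rank the same systems differently.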
7
Metrics - Comparisons
"X is n times faster than Y" means
    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y)
Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
8
Metrics - Comparisons
Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used performance-engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:
Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1
Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
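Checking the claims with the transactions-per-second column:

```python
turtle_tps, rabbit_tps = 30, 60          # transactions per second, from the table

speedup = rabbit_tps / turtle_tps        # = ExTime(turtle) / ExTime(rabbit)
assert speedup == 2.0                    # "twice as fast", "100% faster"

# "Takes 1/2 as long" is the same statement inverted:
assert (1 / rabbit_tps) / (1 / turtle_tps) == 0.5
# "100% less time" would mean zero time, so that phrasing is wrong.
```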
9
Metrics - Throughput
[Diagram: levels of abstraction — Application, Programming Language, Compiler, ISA, Datapath/Control, Function Units, Transistors/Wires/Pins.]
Throughput can be quoted at any level:
• (millions of) instructions per second: MIPS
• (millions of) floating-point operations per second: MFLOP/s
• cycles per second (clock rate)
• megabytes per second
• operations per second
• answers per month
10
Methods For Predicting Performance
• Benchmarks, Traces, Mixes
• Hardware: cost, delay, area, power estimation
• Simulation (many levels) – ISA, RT, gate, circuit
• Queuing theory
• Rules of thumb
• Fundamental “Laws”/Principles
11
Benchmarks
• First Round, 1989
– 10 programs yielding a single number (“SPECmarks”)
• Second Round, 1992
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating-point programs)
• Compiler flags unlimited. From a March 93 report for the DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round, 1995
– new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
– “benchmarks useful for 3 years”
– single flag setting for all programs: SPECint_base95, SPECfp_base95
SPEC: System Performance Evaluation Cooperative
12
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
Program     Language   What Is It
164.gzip    C          Compression
175.vpr C FPGA Circuit Placement and Routing
176.gcc C C Programming Language Compiler
181.mcf C Combinatorial Optimization
186.crafty C Game Playing: Chess
197.parser C Word Processing
252.eon C++ Computer Visualization
253.perlbmk C PERL Programming Language
254.gap C Group Theory, Interpreter
255.vortex C Object-oriented Database
256.bzip2 C Compression
300.twolf C Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
13
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):
Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
14
Benchmarks Sample Results For SpecINT2000
                ---------- Base ----------    ---------- Peak ----------
Benchmarks      Ref Time  Run Time  Ratio     Ref Time  Run Time  Ratio
164.gzip 1400 277 505* 1400 270 518*
175.vpr 1400 419 334* 1400 417 336*
176.gcc 1100 275 399* 1100 272 405*
181.mcf 1800 621 290* 1800 619 291*
186.crafty 1000 191 522* 1000 191 523*
197.parser 1800 500 360* 1800 499 361*
252.eon 1300 267 486* 1300 267 486*
253.perlbmk 1800 302 596* 1800 302 596*
254.gap 1100 249 442* 1100 248 443*
255.vortex 1900 268 710* 1900 264 719*
256.bzip2 1500 389 386* 1500 375 400*
300.twolf 3000 784 382* 3000 776 387*
SPECint_base2000 438
SPECint2000 442
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
Intel OR840 (1 GHz Pentium III processor)
15
Benchmarks
Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
– good benchmarks
– good ways to summarize performance
• Since sales are partly a function of performance relative to the competition, vendors invest in improving the product as reported by the performance summary
• If the benchmarks/summary are inadequate, a vendor must choose between improving the product for real programs and improving it to get more sales; sales almost always win!
• Execution time is the measure of computer performance!
16
Benchmarks
Management would like to have one number.
Technical people want more:
1. They want to have evidence of reproducibility – there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.
How to Summarize Performance
How would you report these results?
Computer A Computer B Computer C
Program P1 (secs) 1 10 20
Program P2 (secs) 1000 100 20
Total Time (secs)     1001      110      40
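One way to summarize the table: total time weights one run of each program equally, while the normalized geometric mean (used by SPEC) is another common summary. A short sketch:

```python
from math import prod

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}  # seconds for P1, P2

totals = {m: sum(t) for m, t in times.items()}
assert totals == {"A": 1001, "B": 110, "C": 40}
# By total time (equal weight to one run of each program), C wins.
assert min(totals, key=totals.get) == "C"

# Another summary: geometric mean of speedups relative to machine A.
geomean = lambda xs: prod(xs) ** (1 / len(xs))
gm_B = geomean([a / b for a, b in zip(times["A"], times["B"])])
assert abs(gm_B - 1.0) < 1e-12   # B: 10x slower on P1, 10x faster on P2
```

Note the two summaries disagree about A vs. B: the geometric mean calls them equal, while total time strongly favors B. Which summary is right depends on the workload mix.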
17
Quantitative Principles of Computer Design
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Make the common case fast.
Amdahl’s Law:
Relates total speedup of a system to the speedup of some portion of that system.
18
Amdahl's Law
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
Quantitative Design

Speedup due to enhancement E:

Speedup(E) = Execution_Time_Without_Enhancement / Execution_Time_With_Enhancement
           = Performance_With_Enhancement / Performance_Without_Enhancement

If the enhancement applies to a fraction F of the task (the “fraction enhanced”) and speeds that portion up by a factor S:

Speedup(E) = 1 / ( (1 - F) + F / S )
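Amdahl’s Law as a one-line function, with F and S as defined above:

```python
def speedup(F, S):
    """Overall speedup when fraction F of the task is sped up by factor S."""
    return 1.0 / ((1.0 - F) + F / S)

# Speeding up half the task by 10x gives far less than 10x overall:
assert abs(speedup(0.5, 10) - 1 / 0.55) < 1e-12   # about 1.82x
# Even an effectively infinite speedup of half the task tops out near 2x:
assert abs(speedup(0.5, 1e12) - 2.0) < 1e-3
```

The second assertion is the key lesson: the unenhanced fraction (1 - F) bounds the total speedup, which is why the common case matters most.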
19
Quantitative Design
“Instruction Frequency”
Invest Resources where time is Spent!
CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count
CPU_Time = Cycle_Time × Σ (i = 1..n) CPI_i × IC_i

CPI = Σ (i = 1..n) CPI_i × F_i,   where F_i = IC_i / Instruction_Count

IC_i = number of instructions of type i; CPI_i = cycles per instruction of type i.
Cycles Per Instruction
20
Quantitative Design
Base Machine (Reg / Reg)
Op Freq Cycles CPI(i) (% Time)
ALU 50% 1 .5 (33%)
Load 20% 2 .4 (27%)
Store 10% 2 .2 (13%)
Branch 20% 2 .4 (27%)
Total CPI 1.5
Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.
Cycles Per Instruction
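Checking the table against the weighted-CPI formula from the previous slide:

```python
# Instruction mix from the table: (frequency F_i, cycles CPI_i)
mix = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

cpi = sum(f * c for f, c in mix.values())
assert abs(cpi - 1.5) < 1e-12            # matches the table's Total CPI

# Fraction of execution time per instruction type: F_i * CPI_i / CPI
time_share = {op: f * c / cpi for op, (f, c) in mix.items()}
assert round(time_share["ALU"], 2) == 0.33    # matches the "% Time" column
assert round(time_share["Load"], 2) == 0.27
```

Note that ALU instructions are half the instructions but only a third of the time — the time shares, not the frequencies, tell you where to invest.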
21
Quantitative Design
Locality of Reference
Programs access a relatively small portion of the address space at any instant of time.
There are two different types of locality:
Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)
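A toy direct-mapped cache simulation (all parameters are made up) shows why spatial locality pays off: sequential array access reuses each fetched block, while a large-stride pattern touches a new block every time.

```python
# A toy direct-mapped cache: 16 block frames, 4 words per block.
NUM_BLOCKS, BLOCK_SIZE = 16, 4

def hit_rate(addresses):
    cache = [None] * NUM_BLOCKS          # tag stored per block frame
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE       # neighbors share a block (spatial locality)
        frame, tag = block % NUM_BLOCKS, block // NUM_BLOCKS
        if cache[frame] == tag:
            hits += 1
        else:
            cache[frame] = tag           # miss: fill the frame
    return hits / len(addresses)

sequential = list(range(256))            # straight-line / array access
strided = [i * 64 for i in range(256)]   # every access touches a new block
assert hit_rate(sequential) == 0.75      # 3 of every 4 accesses hit
assert hit_rate(strided) == 0.0
```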
22
The Concept of Memory Hierarchy
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Fast memory is expensive.
Slow memory is cheap.
The goal is to get the best performance for the money: close to the speed of the fastest memory at close to the cost per byte of the cheapest, for a particular price point.
23
Memory Hierarchy
                     Registers       L1 cache    L2 cache    Memory     Disk
Typical size         4 - 64          <16 KB      <2 MB       <16 GB     >5 GB
Access time          1 nsec          3 nsec      15 nsec     150 nsec   5,000,000 nsec
Bandwidth (MB/sec)   10,000-50,000   2000-5000   500-1000    500-1000   100
Managed by           Compiler        Hardware    Hardware    OS         OS/User
24
Memory Hierarchy
• Hit: data appears in some block in the upper level (example: Block X)
– Hit Rate: the fraction of memory accesses found in the upper level
– Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data must be retrieved from a block in the lower level (Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on the Alpha 21264!)
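These terms combine into the standard average memory access time (AMAT) formula, sketched here with assumed numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level."""
    return hit_time + miss_rate * miss_penalty

# Assumed numbers: 1 ns hit time, 5% miss rate, 150 ns penalty to memory.
assert amat(1.0, 0.05, 150.0) == 8.5
```

Even a 5% miss rate makes the average access 8.5x the hit time — miss rate dominates because Hit Time << Miss Penalty.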
25
Memory Hierarchy
Registers → Level 1 cache → Level 2 cache → Memory → Disk
What is the cost of executing a program if:
• stores are free (there’s a write pipe)
• loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache?
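One way to work the question, assuming the access times from the hierarchy table (3 ns L1, 15 ns L2, 150 ns memory) and reading “97% of loads hit in the Level 2 cache” as a cumulative hit rate over all loads — both are assumptions, since the slide leaves them open:

```python
# Assumed access times, from the earlier hierarchy table.
L1_T, L2_T, MEM_T = 3.0, 15.0, 150.0

p_l1 = 0.80              # 80% of loads served by L1
p_l2 = 0.97 - 0.80       # a further 17% served by L2
p_mem = 1.0 - 0.97       # remaining 3% go to memory

avg_load = p_l1 * L1_T + p_l2 * L2_T + p_mem * MEM_T
assert abs(avg_load - 9.45) < 1e-9       # ns per load

# Loads are 20% of instructions; stores are free.
mem_time_per_instr = 0.20 * avg_load
assert abs(mem_time_per_instr - 1.89) < 1e-9
```

The 3% of loads that go all the way to memory contribute nearly half the average load time (4.5 of 9.45 ns) — the same lesson as AMAT.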
26
The Instruction Set
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus
27
Introduction
The Instruction Set Architecture is the portion of the machine visible to the assembly-level programmer or to the compiler writer.
1. What are the advantages and disadvantages of various instruction set alternatives?
2. How do languages and compilers affect the ISA?
3. We use the DLX architecture as an example of a RISC architecture.
[Diagram: the instruction set is the interface between software and hardware.]
28
Classifying Instruction Set Architectures
Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
29
Instruction Set Architectures
Accumulator:
  1 address     add A        acc ← acc + mem[A]
  1+x address   addx A       acc ← acc + mem[A + x]
Stack:
  0 address     add          tos ← tos + next
General Purpose Register:
  2 address     add A B      EA(A) ← EA(A) + EA(B)
  3 address     add A B C    EA(A) ← EA(B) + EA(C)
Load/Store:
  0 memory      load R1, Mem1
                load R2, Mem2
                add R1, R2
  1 memory      add R1, Mem2
Basic ISA Classes
ALU instructions can have two or three operands, and 0, 1, 2, or 3 of them can be memory operands; the load/store examples above show the 0- and 1-memory-operand cases.
30
Instruction Set Architectures
Basic ISA Classes
Stack     Accumulator   Register            Register
                        (register-memory)   (load-store)
Push A    Load A        Load R1, A          Load R1, A
Push B    Add B         Add R1, B           Load R2, B
Add       Store C       Store C, R1         Add R3, R1, R2
Pop C                                       Store C, R3
The results of the different address classes are easiest to see with the examples here, all of which implement the sequence C = A + B.
Registers are the class that won out; generally, the more registers on the CPU, the better.
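A sketch of how three of these classes evaluate C = A + B, with Python variables standing in for machine state (A = 5 and B = 7 are made-up values):

```python
def run_stack(mem):
    # Push A; Push B; Add; Pop C
    s = [mem["A"], mem["B"]]
    s.append(s.pop() + s.pop())
    mem["C"] = s.pop()

def run_accumulator(mem):
    # Load A; Add B; Store C
    acc = mem["A"]
    acc += mem["B"]
    mem["C"] = acc

def run_load_store(mem):
    # Load R1,A; Load R2,B; Add R3,R1,R2; Store C,R3
    r1, r2 = mem["A"], mem["B"]
    r3 = r1 + r2
    mem["C"] = r3

for run in (run_stack, run_accumulator, run_load_store):
    mem = {"A": 5, "B": 7, "C": 0}
    run(mem)
    assert mem["C"] == 12
```

All three produce the same result; they differ in instruction count, memory traffic, and how much state (stack, accumulator, registers) the hardware must provide.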
31
Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0 EAX Accumulator
GPR1 ECX Count register, string, loop
GPR2 EDX Data Register; multiply, divide
GPR3 EBX Base Address Register
GPR4 ESP Stack Pointer
GPR5 EBP Base Pointer – for base of stack seg.
GPR6 ESI Index Register
GPR7 EDI Index Register
CS Code Segment Pointer
SS Stack Segment Pointer
DS Data Segment Pointer
ES Extra Data Segment Pointer
FS Data Seg. 2
GS Data Seg. 3
PC EIP Instruction Counter
Eflags Condition Codes
32
Memory Addressing
Sections Include:
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
33
Memory Addressing
What object is accessed as a function of the address and length?
Objects have byte addresses – an address refers to the number of bytes counted from the beginning of memory.
Little Endian – puts the byte whose address is xx00 at the least significant position in the word.
Big Endian – puts the byte whose address is xx00 at the most significant position in the word.
Alignment – data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
Interpreting Memory Addresses
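Endianness can be demonstrated with Python’s struct module; the 32-bit value 0x0A0B0C0D below is arbitrary:

```python
import struct

value = 0x0A0B0C0D  # an arbitrary 32-bit word

# Little endian: the byte at the lowest address is the least significant.
assert struct.pack("<I", value) == b"\x0d\x0c\x0b\x0a"

# Big endian: the byte at the lowest address is the most significant.
assert struct.pack(">I", value) == b"\x0a\x0b\x0c\x0d"
```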
34
Memory Addressing
Addressing Modes
This table shows the most common modes.
Addressing Mode     Example Instruction   Meaning                         When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]          When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3              For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100+R[R1]]   Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]       Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]        Used for static data.
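The table’s semantics can be sketched in Python, with dictionaries standing in for the register file and memory (all register and memory contents below are made-up values):

```python
# Toy machine state.
R = {1: 100, 3: 7, 4: 10}           # register file
M = {100: 42, 200: 99, 1001: 5}     # memory

# Register:           Add R4, R3       -> R[4] + R[3]
assert R[4] + R[3] == 17
# Immediate:          Add R4, #3       -> R[4] + 3
assert R[4] + 3 == 13
# Displacement:       Add R4, 100(R1)  -> R[4] + M[100 + R[1]]
assert R[4] + M[100 + R[1]] == 109
# Register deferred:  Add R4, (R1)     -> R[4] + M[R[1]]
assert R[4] + M[R[1]] == 52
# Absolute:           Add R4, (1001)   -> R[4] + M[1001]
assert R[4] + M[1001] == 15
```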
35
Memory Addressing
Displacement Addressing Mode
How big should the displacement be?
For addresses that do fit in the displacement size:
  Add R4, 10000(R0)
For addresses that don’t fit in the displacement size, the compiler must do the following:
  Load R1, address
  Add R4, 0(R1)
How big the field should be depends on the displacements used by typical programs.
On both IA32 and DLX, the space allocated is 16 bits.
36
Memory Addressing
Immediate Address Mode
Used where we want to get to a numerical value in an instruction.
At the high level:
  a = b + 3;
  if ( a > 17 )
      goto Addr
At the assembler level:
  Load R2, 3
  Add R0, R1, R2
  Load R2, 17
  CMP
  BGT R1, R2
  Load R1, Address
  Jump (R1)
37
Operations In The Instruction Set
Sections Include:
Detailed information about types of instructions.
Instructions for Control Flow (conditional branches, jumps)
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
38
Operations In The Instruction Set
Arithmetic and logical and, add
Data transfer move, load
Control branch, jump, call
System system call, traps
Floating point add, mul, div, sqrt
Decimal add, convert
String move, compare
Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS
Operator Types
39
Operations In The Instruction Set
Control instruction issues:
• taken or not?
• where is the target?
• link (save) the return address
• save or restore state
Instructions that change the PC:
• (conditional) branches, (unconditional) jumps
• function calls, function returns
• system calls, system returns
Control Instructions
Conditional branches are 20% of all instructions!!
40
Operations In The Instruction Set
There are numerous tradeoffs:
Compare and branch
+ no extra compare, no state passed between instructions
-- requires ALU op, restricts code scheduling opportunities
Implicitly set condition codes Z, N, V, C
+ can be set ``for free''
-- constrains code reordering, extra state to save/restore
Explicitly set condition codes
+ can be set ``for free'', decouples branch/fetch from pipeline
-- extra state to save/restore
Control Instructions
There are numerous tradeoffs:
Condition in general-purpose register
+ no special state, but uses up a register
-- branch condition is separate from the branch logic in the pipeline
Some data for MIPS:
> 80% of branches use immediate data; of those, > 80% compare against zero
50% of branches use == 0 or <> 0
The compromise in MIPS: branch==0 and branch<>0 instructions, plus compare instructions for all other compares.
41
Operations In The Instruction Set
Link return address:
implicit register – many recent architectures use this
+ fast, simple
-- software must save the register before the next call; surprise traps?
explicit register
+ may avoid saving a register
-- register must be specified
processor stack
+ recursion handled directly
-- complex instructions
Control Instructions
Save or restore state:
What state? Function calls: registers. System calls: registers, flags, PC, PSW, etc.
Hardware need not save registers: the caller can save the registers in use, and the callee saves the registers it will use.
Hardware register save: IBM STM, VAX CALLS. Faster?
Many recent architectures do no register saving, or do implicit register saving with register windows (SPARC).
42
Type And Size of Operands
The type of the operand is usually encoded in the opcode – an LDW implies loading of a word.
Common sizes are:
Character (1 byte)
Half word (16 bits)
Word (32 bits)
Single precision floating point (1 word)
Double precision floating point (2 words)
Integers are two’s complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.
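The two’s complement and IEEE 754 encodings can be inspected directly with Python’s struct module:

```python
import struct

# Half word (16 bits), two's complement: -1 is all ones.
assert struct.pack(">h", -1) == b"\xff\xff"

# Word (32 bits), two's complement: -2 is all ones except the last bit.
assert struct.pack(">i", -2) == b"\xff\xff\xff\xfe"

# IEEE 754 single precision: 1.0 has biased exponent 127 -> 0x3F800000.
assert struct.pack(">f", 1.0) == b"\x3f\x80\x00\x00"

# Double precision occupies two words (8 bytes).
assert len(struct.pack(">d", 1.0)) == 8
```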
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
43
Encoding And Instruction Set
This section has to do with how an assembly-level instruction is encoded into binary.
Ultimately, it’s the binary that is read and interpreted by the machine.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
44
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0
0040D3B6 EB 09 jmp main+0D1h (0040d3c1)
0040D3B8 8B 4D F0 mov ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01 add ecx,1
0040D3BE 89 4D F0 mov dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0 mov edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8 cmp edx,dword ptr [ebp-8]
0040D3C7 7D 15 jge main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00 mov eax,dword ptr [eax]
0040D3CE 03 45 EC add eax,dword ptr [ebp-14h]
0040D3D1 99 cdq
0040D3D2 B9 2F 00 00 00 mov ecx,2Fh
0040D3D7 F7 F9 idiv eax,ecx
0040D3D9 89 55 EC mov dword ptr [ebp-14h],edx
0040D3DC EB DA jmp main+0C8h (0040d3b8)
Here’s some sample code that’s been disassembled. It was compiled with the debugger option, so it is not optimized. This code was produced using Visual Studio.
45
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
00401000 8B 0D 40 54 40 00 mov ecx,dword ptr ds:[405440h]
00401006 33 D2 xor edx,edx
00401008 85 C9 test ecx,ecx
0040100A 7E 14 jle 00401020
0040100C 56 push esi
0040100D 57 push edi
0040100E 8B F1 mov esi,ecx
long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11 lea eax,[ecx+edx]
00401013 BF 2F 00 00 00 mov edi,2Fh
00401018 99 cdq
00401019 F7 FF idiv eax,edi
0040101B 4E dec esi
0040101C 75 F2 jne 00401010
0040101E 5F pop edi
0040101F 5E pop esi
00401020 C3 ret
Here’s some sample code that’s been disassembled. It was compiled with optimization. This code was produced using Visual Studio.
46
Encoding And Instruction Set
80x86 Instruction Encoding
for ( index = 0; index < iterations; index++ )
0x804852f <main+143>: add $0x10,%esp
0x8048532 <main+146>: lea 0xfffffff8(%ebp),%edx
0x8048535 <main+149>: test %esi,%esi
0x8048537 <main+151>: jle 0x8048543 <main+163>
0x8048539 <main+153>: mov %esi,%eax
0x804853b <main+155>: nop
0x804853c <main+156>: lea 0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
0x8048540 <main+160>: dec %eax
0x8048541 <main+161>: jne 0x8048540 <main+160>
0x8048543 <main+163>: add $0xfffffff4,%esp
Here’s some sample code that’s been disassembled. It was compiled with optimization. This code was produced using gcc and gdb.
Note that the representation of the code is dependent on the compiler/debugger!
47
Encoding And Instruction Set
80x86 Instruction Encoding
[Diagram: three 80x86 instruction formats with field widths in bits —
ADD: opcode (4), Reg (3), Disp. (8);
SHL: opcode (6), V/w (2), postbyte (8), Disp. (8);
TEST: opcode (7), W (1), postbyte (8), Immediate (8).]
A morass of disjoint encoding!!
48
Encoding And Instruction Set
80x86 Instruction Encoding
[Diagram: more 80x86 instruction formats with field widths in bits —
CALLF: opcode (8), Offset (16), Segment Number (16);
JE: opcode (4), Cond (4), Disp. (8);
MOV: opcode (6), D/w (2), postbyte (8), Disp. (8);
PUSH: opcode (5), Reg (3).]
49
The Role of Compilers
Compiler goals:
• All correct programs execute correctly
• Most compiled programs execute fast (optimizations)
• Fast compilation
• Debugging support
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
50
The Role of Compilers
Steps In Compilation
• Parsing → intermediate representation
• Jump optimization
• Loop optimizations
• Common subexpression elimination
• Procedure in-lining
• Constant propagation
• Strength reduction
• Register allocation
• Pipeline scheduling
• Code generation → assembly code
51
The Role of Compilers
Steps In Compilation
Optimization Name    Explanation                                        % of total optimizing transformations
High Level           At or near the source level; machine-independent   Not measured
Local                Within straight-line code                          40%
Global               Across a branch                                    42%
Machine Dependent    Depends on machine knowledge                       Not measured
52
The Role of Compilers
What compiler writers want:
• regularity
• orthogonality
• composability
Compilers perform a giant case analysis:
• too many choices make it hard
Orthogonal instruction sets:
• operation, addressing mode, data type
One solution or all possible solutions:
• 2 branch conditions (eq, lt), or all six (eq, ne, lt, gt, le, ge) – not 3 or 4
There are advantages to having instructions that are primitives.
Let the compiler put the instructions together to make more complex sequences.
53
The MIPS Architecture
MIPS is very RISC oriented.
MIPS will be used for many examples throughout the course.
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. We will look at the Pipeline concept in our next lecture.
The acronym RISC (pronounced “risk”), for reduced instruction set computer, represents a CPU design strategy emphasizing the insight that simplified instructions which “do less” may still provide higher performance if this simplicity can be exploited to make instructions execute very fast. Well-known RISC families include DEC Alpha, ARC, ARM, AVR, MIPS, PA-RISC, Power Architecture (including PowerPC), and SPARC.
54
The MIPS Architecture
MIPS Characteristics
32-bit byte addresses, aligned
Load/store only; displacement addressing
Standard data types
3 fixed-length formats
32 32-bit GPRs (r0 = 0)
16 64-bit (32 32-bit) FPRs; FP status register
No condition codes
Data transfer
• load/store word, load/store byte/half word (signed?)
• load/store FP single/double
• moves between GPRs and FPRs
ALU
• add/subtract (signed? immediate?)
• multiply/divide (signed?)
• and, or, xor (immediate?); shifts: sll, srl, sra (immediate?)
• sets (immediate?)
There’s MIPS-64, the current architecture: standard data types, 4 fixed-length formats (8, 16, 32, 64), 32 64-bit GPRs (r0 = 0), 64 64-bit FPRs.
Addressing modes
• Immediate
• Displacement
• (Register mode used only for ALU)
55
The MIPS Architecture
MIPS Characteristics
Control
• branches == 0, <> 0
• conditional branch testing FP bit
• jump, jump register
• jump & link, jump & link register
• trap, return from exception
Floating point
• add/sub/mul/div
• single/double
• FP converts, FP set
56
The DLX Architecture
The DLX is a RISC processor architecture designed by the principal designers of the MIPS and Berkeley RISC designs, the two benchmark examples of RISC design. The DLX is essentially a cleaned-up and simplified MIPS with a simple 32-bit load/store architecture. Intended primarily for teaching purposes, the DLX design is widely used in university-level computer architecture courses.
The next couple of lectures will use the MIPS and DLX architectures as examples to demonstrate concepts.
57
End of Lecture