slide 1 scalar processor design phenomenal advances in its brief lifetime of 30+ years : x2/18mo in...

18
Slide 1 Scalar Processor Design Phenomenal advances in its brief lifetime of 30+ years: X2/18mo in 30yr. multi-GFLOPs processors, inspiring and facilitating major innovations in: Embedded microcontrollers (300 millions sold in 2000) Personal computers (150 millions sold in 2000) Advanced workstations Handheld and mobile devices Application and file servers (4 millions sold in 2000) Web servers for the Internet Low-cost supercomputers Large-scale computing clusters Well over one billion microprocessors shipped per year The amazing decades of the evolution of microprocessors: 1970-1980 1980-1990 1990-2000 2000-2010 Transistor count 2K-100K 100K-1M 1M-100M 100M-2B Clock frequency 0.1-3 MHz 3-30 MHz 30MHz-1GHz 1GHz-15GHz Instruction/ 0.1 IPC 0.1-0.9 IPC 0.9-1.9 IPC 1.9-2.9 IPC

Post on 21-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Slide 1

Scalar Processor Design• Phenomenal advances in its brief lifetime of 30+ years: X2/18mo in 30yr. multi-GFLOPs processors, inspiring and facilitating major innovations in:– Embedded microcontrollers (300 millions sold in 2000)– Personal computers (150 millions sold in 2000)– Advanced workstations– Handheld and mobile devices– Application and file servers (4 millions sold in 2000)– Web servers for the Internet– Low-cost supercomputers– Large-scale computing clusters– Well over one billion microprocessors shipped per year

• The amazing decades of the evolution of microprocessors:

1970-1980 1980-1990 1990-2000 2000-2010

Transistor count

2K-100K 100K-1M 1M-100M 100M-2B

Clock frequency

0.1-3 MHz 3-30 MHz 30MHz-1GHz

1GHz-15GHz

Instruction/cycle

0.1 IPC 0.1-0.9 IPC 0.9-1.9 IPC 1.9-2.9 IPC

Slide 2

Scalar Processor Design

Past (Milestones):– First electronic computer ENIAC in 1946: 18,000 vacuum tubes, 3,000 cubic feet, 20 2-foot 10-digit registers, 5 KIPs (thousand additions per second);

– First microprocessor (a CPU on a single IC chip) Intel 4004 in 1971: 2,300 transistors, 60 KIPs, $200;

– Virtual elimination of assembly language programming reduced the need for object-code compatibility;

– The creation of standardized, vendor-independent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture

– RISC instruction set architecture paved ways for drastic design innovations that focused on two critical performance techniques: instruction-level parallelism and use of caches

Slide 3

Scalar Processor Design

Present (State of the art): – Microprocessors approaching/surpassing 10 GFLOPS;– A high-end microprocessor (<$10K) today is easily more powerful than a supercomputer (>$10million) ten years ago;

– While technology advancement contributes a sustained annual growth of 35%, innovative computer design accounts for another 25% annual growth rate a factor of 15 in performance gains!(fig1.1)

– Three different computing markets (fig. 1.3):» Desktop Computing –- driven by price-performance (a few hundreds through over 10K);

» Servers – availability driven (distinguished from reliability), providing sustained high performance

» Embedded Computers – fastest growing portion of the computer market, real-time performance driven, and need to minimize memory and power, as well as ASIC

Slide 4

Scalar Processor Design

Present (State of the art): – The Task of the Computer Designer:

» Instruction Set Architecture (Traditional view of what Computer Architecture is), the boundary between software and hardware;

» Organization, high-level aspects of a computer’s design, such as the memory system, the bus structure, the internal design of CPU, based on a given instruction set architectrue;

» Hardware, the specifics of a machine, including the detailed logic design and the packaging technology of the machine.

Future (Technology Trends): – A truly successful instruction set architecture (ISA) should last for decades, however it takes an computer architect’s acute observation and knowledge of the rapidly changing technology, in order for the ISA to survive and cope with such changes:

Slide 5

Scalar Processor Design

» IC logic technology: transistor count on a chip grows at 55% annual rate (35% density growth rate + 10-20% die size growth) while device speed scales more slowly;

» Semiconductor DRAM: density grows at 60% annually while cycle time improves very slowly (decreasing one-third in ten years). Bandwidth per chip increases twice as fast as latency decreases;

» Magnetic dish technology: density increases at 100% annual rate since 1990 while access time improves at about a third every ten years; and

» Network technology: both latency and bandwidth have been improving, with more focus on bandwidth of late; the increasing importance of networking has led to faster improvement in performance than before—Internet bandwidth doubles every year in the U.S.

» Scaling of transistor performance: while transistor density increases quadratically with linear decrease in feature size, transistor performance increases roughly linearly with decrease in feature sizechallenge & opportunity for challenge & opportunity for computer designercomputer designer!

» Wires and power in IC: propagation delay and power needs?

Slide 6

Instruction Set Architecture

• ISA should reflect application characteristics: –Desktop computing is compute-intensive, thus focusing on features favoring Integer and FP ops;

–Server computing is data-intensive, focusing on integers and char-strings (yet FP ops are still standard in them)

–Embedded computing is time-sensitive, memory and power conciouse, thus focusing on code-density, real-time and media data streams.

• ISA has been defined as a contract between the software and hardware, or between the program and the machine, thus facilitating independent development of programs and machines.

Slide 7

Instruction Set Architecture

• Taxonomy of ISA:– StackStack: both operands are implicit on the top of the stack, a data structure in which items are accessed an a last in, first out fashion.

– AccumulatorAccumulator: : one operand is implicit in the accumulator, a special-purpose register.

– General Purpose RegisterGeneral Purpose Register: all operands are explicit in specified registers or memory locations. Depending on where operands are specified and stored, there are three different ISA groups:» Register-Memory: one operand in register and one in memory.Examples: IBM 360/370, Intel 80x86 family, Mototola 68000;

» Memory-Memory: both operands are in memory. Example: VAX.» Register=Register (load & store): all operands, except for those in load and store instructions, are in registers. Examples: SPARC (Sun Microsystems), MIPS, Precision Architecture (HP), PowerPC (IBM), Alpha (DEC).

Slide 8

Taxonomy of ISA: Examples Instruction Set Architecture

TOS

ALUALU ALU ALU

AccumulatorStack Reg. Set Reg. Set

Memory Memory Memory Memory

(a) Stack (b) Accumulator (c) Register-Memory (d) Reg-Reg/Load-Store

Push APush BAddPop C

Load AAdd BStore C

Load R1,AAdd R3,R1,BStore R3,C

Load R1,ALoad R2,BAdd R3,R1,R2Store R3,C

CA+B

ALU

Memory

Add C,A,B

(e) Memory-Memory

Slide 9

Dynamic and Static Interface

• Inherent in each ISA’s definition is an associated definition of an interface, called the dynamic and dynamic and static interfacestatic interface (DSI), that separates what is done statically at compile time versus what is done dynamically at run time.– A key issue in the design of an ISA is the placement of the DSI:

Program (Software)

Machine (Hardware)

Architecture (DSI)

Compiler complexity

Hardware complexity

Exposed to software

Hidden in hardware

“Static”

“Dynamic”

DEL ~CISC ~VLIW ~RISCHLL Program

DSI-1

DSI-2

DSI-3

Hardware

Slide 10

Processor Performance

• Performance Equation:– 1/Performance = Time/Program

= (Instuctions/Program)(Cycles/Instructions)(Time/Cycle)

= IC * CPI * CCT

Where: IC=Instruction Count;

CPI=Cycles per Instruction; CCT=Clock cycle time

• Performance Optimization: reducing one or more of the three factors (IC, CPI, CCT)

– IC: ISA dependent (CISC vs RISC); compiler dependent; dynamic elimination of redundant computation (computation reuse);

– CPI: ISA and instruction complexity dependent; microarchitecture dependent (pipelining and speculative execution of instructions);

– CCT: microarchitecture; pipelining; clock frequency; pipeline depth; can affect CPI;

Slide 11

Processor Performance• Performance Evaluation: functional (ISA) and performance functional (ISA) and performance

(CPI)(CPI)– Trace-driven simulation

– Execution-driven simulation

Physical execution with software instrumentation

TracesCycle-based performance

simulator

Physical execution with hardware instrumentation

Functional simulator

Trace generation

Trace-driven Timing simulation

Functional simulator(instruction interpretation)

Functional simulation

Trace storage

Cycle-based performance

simulator

Execution-driven Timing simulation

Execution trace

Checkpoint and control

Traces

Slide 12

Processor Performance• Performance Evaluation: Amdahl’s Law Amdahl’s Law

– Idealized pipeline execution profile

– Realistic pipeline execution profile

– Refined speedup equation :

1-g g

N

gg

Speedup

)1(

1

N

1Pip

elin

e D

epth

N

gggSpeedup

N

...

21

121

N

1

Pipeline stall Pipeline stall

enhanceSpeedup

ff

Speedup

)1(

1

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18