Post on 30-Oct-2014

ECE337 Fall 2009

Chapter 1

Introduction to Computer Architecture and Organization

Architecture & Organization 1

• Architecture is those attributes visible to the programmer (alternatively, those attributes that have a direct impact on the logical execution of a program)
  – Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
  – e.g. Is there a multiply instruction?

• Organization is how features are implemented
  – Control signals, interfaces, memory technology.
  – e.g. Is there a hardware multiply unit or is it done by repeated addition?
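The multiply example can be sketched in software. The two functions below are hypothetical illustrations, not part of the course material: both present the same "architecture" (a multiply operation that returns a × b) but differ in "organization" (repeated addition versus shift-and-add, which is closer to what a hardware multiplier does).

```python
def multiply_repeated_addition(a: int, b: int) -> int:
    """Multiply two non-negative integers using only addition."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    result = 0
    for _ in range(b):        # add a to the result b times
        result += a
    return result


def multiply_shift_add(a: int, b: int) -> int:
    """Multiply two non-negative integers by shifting and adding."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    result = 0
    while b:
        if b & 1:             # lowest bit of b set: add the shifted a
            result += a
        a <<= 1               # next bit of b is worth twice as much
        b >>= 1
    return result
```

A program calling either function sees the same result; only the number of steps taken differs, which is the sense in which organization is invisible to the programmer.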

Architecture & Organization 2

• All members of the Intel x86 family share the same basic architecture

• This gives code compatibility
  – At least backwards

• Organization differs between versions: same architecture, but different organization

Architecture & Organization 3

• Also the Intel Pentium and the AMD Athlon have nearly identical versions of the x86 instruction set architecture, but have totally different internal designs to implement the same architecture.

Why study computer architecture and organization?

• Acquire understanding and appreciation of a computer system’s functional components, their characteristics, their performance and their interactions

• Structure a program that runs more efficiently on a real machine

• Understand the trade-offs among various components in order to select the most cost-effective computer system

(Source: IEEE/ACM Computer Curricula 2001)

Languages, Levels, Virtual (hypothetical) Machines

[Figure: a multilevel model, with languages ranging from primitive, machine-oriented to complex, people-oriented]

Translator and interpreter

• Translator: translates instructions before execution
  – e.g. compiler, assembler

• Interpreter: interprets instructions during execution
  – e.g. Matlab software, OS software
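The difference between the two approaches can be sketched with a toy stack language. The instruction names (`push`, `add`, `mul`) and the program are made up for illustration: `interpret` carries out each instruction as it reads it, while `translate` converts the whole program into another language (a Python expression) before anything runs.

```python
PROGRAM = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]


def interpret(program):
    """Interpreter: execute each instruction as soon as it is read."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


def translate(program):
    """Translator: turn the whole program into a Python expression string."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(str(args[0]))
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


translated = translate(PROGRAM)   # translation happens first, once
result = eval(translated)         # execution happens afterwards
```

Both paths compute the same value; the translated form can be kept and run many times without repeating the translation step.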

Contemporary Multilevel Machines

A six-level computer. The support method for each level is indicated below it (along with the name of supporting program).

Figure annotations:

• Gates, registers

• Collection of registers, ALU (data path)

• Machine language instruction sets (opcode)

• Hybrid level: new instruction sets (system calls), memory organization, control of multiple processes, etc.

• System programmer

• Application programmer

Computer Generations

• First Generation: Vacuum Tubes (1945 – 1955)

• Second Generation: Transistors (1955 – 1965)

• Third Generation: Integrated Circuits (1965 – 1980)

• Fourth Generation: Very Large Scale Integration (1980 – ?)

Computer History - An Overview

http://library.thinkquest.org/18268/History/hist_m.htm

Vacuum Tubes

ENIAC - background

• Electronic Numerical Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
  – Too late for war effort
• Used until 1955

ENIAC - details

• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second

von Neumann machine

• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by control unit
• Institute for Advanced Study, Princeton
  – IAS machine
• Completed 1952
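The stored-program idea can be sketched as a tiny fetch-decode-execute loop. The instruction names (`LOAD`, `ADD`, `STORE`, `HALT`) and the single-accumulator design are assumptions made for illustration and are not the IAS instruction set; the point is that instructions and data sit in the same memory.

```python
def run(memory):
    """Run a program stored in `memory`; return the final memory contents."""
    accumulator = 0
    program_counter = 0
    while True:
        opcode, address = memory[program_counter]   # fetch from main memory
        program_counter += 1
        if opcode == "LOAD":                        # decode and execute
            accumulator = memory[address]
        elif opcode == "ADD":
            accumulator += memory[address]
        elif opcode == "STORE":
            memory[address] = accumulator
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Cells 0-3 hold the program, cells 4-6 hold data, all in one memory.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,   # cell 4: first operand
    3,   # cell 5: second operand
    0,   # cell 6: result is written here
]
run(memory)
```

After the run, cell 6 holds the sum of cells 4 and 5. Changing the program means writing different values into memory, not rewiring the machine.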

Structure of von Neumann machine

EDSAC

• the first practical stored-program computer

• built in 1949

• could complete 714 operations per second.

IAS machine- details

• Stored Program

• 4096 x 40 bit word memory
  – Binary number
  – 2 x 20 bit instructions
  – A 40-bit signed integer
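A minimal sketch of how two 20-bit instructions fit into one 40-bit word. It assumes each instruction is an 8-bit opcode followed by a 12-bit address; that split is not stated on the slide, though a 12-bit address does match the 4096-word memory. The function names and opcode values are hypothetical.

```python
OPCODE_BITS = 8      # assumed split of the 20-bit instruction
ADDRESS_BITS = 12    # 2**12 = 4096 addressable words
INSTRUCTION_BITS = OPCODE_BITS + ADDRESS_BITS


def pack_instruction(opcode: int, address: int) -> int:
    """Combine an opcode and an address into one 20-bit instruction."""
    if not 0 <= opcode < 2 ** OPCODE_BITS:
        raise ValueError("opcode does not fit in 8 bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 12 bits")
    return (opcode << ADDRESS_BITS) | address


def pack_word(left: int, right: int) -> int:
    """Place two 20-bit instructions side by side in a 40-bit word."""
    return (left << INSTRUCTION_BITS) | right


def unpack_word(word: int):
    """Split a 40-bit word back into two (opcode, address) pairs."""
    mask = 2 ** INSTRUCTION_BITS - 1
    halves = (word >> INSTRUCTION_BITS, word & mask)
    return [(h >> ADDRESS_BITS, h & (2 ** ADDRESS_BITS - 1)) for h in halves]


# Arbitrary opcode values, chosen only for the demonstration.
word = pack_word(pack_instruction(0x01, 500), pack_instruction(0x05, 501))
```

Unpacking `word` recovers the two (opcode, address) pairs in their original order.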

Transistors

• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon
• Invented 1947 at Bell Labs
• William Shockley et al.

TX-0 (1956)

• the first general-purpose, programmable computer built with transistors.

• created by MIT researchers.

PDP-1 (1960)

• The PDP-1 sold for $120,000.

• MIT wrote the first video game, Spacewar!, for it.

• A total of 50 were built.

Integrated Circuit - Microelectronics

• Literally - “small electronics”

• A computer is made up of gates, memory cells and interconnections among them

• These can be manufactured on a semiconductor

• e.g. silicon wafer

IBM 360 (1964)

• IBM 360 with different models

• a great variety of combinations of speed, memory, and power.

• all the models were compatible

• within two years, IBM was getting 1,000 orders each month.

Very Large Scale Integration

• 1981 - IBM PC

IBM's first PC ran on Intel's 4.77 MHz 8088 microprocessor. It came with Microsoft's MS-DOS operating system.

IBM PS/2 (1987)

• IBM's PS/2 was based on Intel's 80386 chip. At the same time, IBM introduced OS/2

• More than 1 million machines were sold by the end of the year.

Computer Generation

• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965-1971
  – Up to 100 devices on a chip
• Medium scale integration - 1965-1971
  – 100 - 3,000 devices on a chip
• Large scale integration - 1971-1979
  – 3,000 - 100,000 devices on a chip
• Very large scale integration - 1980-1991
  – 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration - 1991 onwards
  – Over 100,000,000 devices on a chip

Moore’s Law

• Gordon Moore: co-founder of Intel
• Number of transistors on a chip was doubling every year
• Since the 1970s development has slowed a little
  – Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher speed
• Smaller size gives increased flexibility for usage environment
• Reduced power and cooling requirements
• Fewer interconnections increase reliability (compared with solder connections)
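The effect of the doubling period can be sketched with a simple exponential model. The function name and the starting count are made up for illustration; real transistor counts follow the trend only approximately.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float) -> float:
    """Project a transistor count that doubles once per doubling period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return initial_count * 2 ** (years / doubling_period_years)


# Same hypothetical starting point, six years of growth, two doubling rates.
yearly = projected_transistors(1_000, years=6, doubling_period_years=1.0)
slower = projected_transistors(1_000, years=6, doubling_period_years=1.5)
```

Over the same six years, doubling every year gives six doublings while doubling every 18 months gives four, so a small change in the period makes a large difference in the projected count.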

Growth in CPU Transistor Count

The Computer Spectrum

A rough categorization of current computers

Better Performance Design: Speeding Microprocessor Up

• Pipelining

• On board cache

• Branch prediction

• Data flow analysis

• Speculative execution

Computer Architecture Challenge: Performance Balance

• Processor speed increased

• Memory capacity increased

• Memory speed lags behind processor speed

Mismatch at the interface between processor and main memory

Logic(CPU) and Memory Performance Gap

Solutions

• Increase number of bits retrieved at one time
  – Make DRAM “wider”
• Reduce frequency of memory access
  – More complex cache and cache on chip
• Increase interconnection bandwidth
  – High speed buses
  – Hierarchy of buses
    • The processor bus
    • The cache bus
    • The memory bus
    • The local I/O bus (VLB, PCI)
    • The standard I/O bus (ISA)
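Why a cache reduces the cost of memory access can be sketched with the usual average-access-time formula, hit time plus miss rate times miss penalty. The formula is standard textbook material rather than something stated on the slide, and the timings below are made-up numbers.

```python
def average_access_time(hit_time_ns: float, miss_rate: float,
                        miss_penalty_ns: float) -> float:
    """Average memory access time for a single-level cache."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: 1 ns cache hit, 100 ns extra to reach main memory.
with_good_cache = average_access_time(1.0, 0.05, 100.0)
with_poor_cache = average_access_time(1.0, 0.50, 100.0)
```

The fewer accesses that have to go all the way to main memory, the closer the average gets to the cache hit time.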

Typical I/O Device Data Rates

Improvements in Processor Chip

• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced

• Increase size and speed of caches
  – Dedicating part of processor chip
    • Cache access times drop significantly

• Change processor organization and architecture
  – Increase effective speed of execution
  – Parallelism (instruction-level)

Problems with Clock Speed and Logic Density

• Power
  – Power density increases with density of logic and clock speed
  – Dissipating heat

• RC delay (the signal delay of a wire)
  – Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
  – Delay increases as RC product increases
  – Wire interconnects thinner, increasing resistance
  – Wires closer together, increasing capacitance

• Solution:
  – More emphasis on organizational and architectural approaches

Solution 1: Increased Cache Capacity

• Typically two or three levels of cache between processor and main memory

• Chip density increased
  – More cache memory on chip

• Faster cache access

• Pentium chip devoted about 10% of chip area to cache

• Pentium 4 devotes about 50%

Solution 2: More Complex Execution Logic

• Enable parallel execution of instructions

• Pipeline works like assembly line
  – Different stages of execution of different instructions at same time along pipeline

• Superscalar allows multiple pipelines within single processor
  – Instructions that do not depend on one another can be executed in parallel
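The assembly-line effect can be sketched with an idealized cycle count. The model assumes every stage takes one cycle and that there are no stalls or hazards, which real pipelines do not achieve; the function names are made up for illustration.

```python
def unpipelined_cycles(instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then finish one instruction per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


def ideal_speedup(instructions: int, stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(
        instructions, stages)
```

For a long run of instructions the ideal speedup approaches the number of stages, since the one-time cost of filling the pipeline becomes negligible.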

Solution 3: Processor-level Parallelism – Multiple Cores

• Multiple processors on single chip
  – Large shared cache

• If software can use multiple processors, doubling number of processors almost doubles performance

• Use two simpler processors on the chip rather than one more complex processor

• With two processors, larger caches are justified
  – Power consumption of memory logic less than processing logic

• Example: IBM POWER4
  – Two cores based on PowerPC
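Why doubling the processors "almost" doubles performance can be sketched with Amdahl's law. The law is not named on the slide and is brought in here as a standard model; `parallel_fraction` is the share of the work that can be spread across processors.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("need at least one processor")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads running on two cores.
fully_parallel = amdahl_speedup(1.0, 2)
mostly_parallel = amdahl_speedup(0.95, 2)
half_parallel = amdahl_speedup(0.5, 2)
```

Only a fully parallel workload reaches the full factor of two; any serial portion pulls the speedup below it.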

Pentium Evolution (CISC) (1)

• 8080
  – first general purpose microprocessor
  – 8 bit data path
  – Used in first personal computer

• 8086
  – much more powerful
  – 16 bit
  – instruction queue, prefetch a few instructions
  – 8088 (8 bit external bus) used in first IBM PC

• 80286
  – 16 Mbyte memory addressable instead of just 1 Mbyte

• 80386
  – 32 bit (IA-32 architecture, also called x86 or x86-32)
  – Support for multitasking
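The 1 Mbyte and 16 Mbyte figures follow from the number of address bits. The slide does not give the widths; the sketch below assumes 20 address bits for the 8086 and 24 for the 80286, which are the widths that produce those sizes with byte addressing.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given width."""
    if address_bits < 0:
        raise ValueError("address width cannot be negative")
    return 2 ** address_bits


MBYTE = 2 ** 20

size_20_bit = addressable_bytes(20) // MBYTE   # assumed 8086 address width
size_24_bit = addressable_bytes(24) // MBYTE   # assumed 80286 address width
size_32_bit = addressable_bytes(32) // MBYTE   # a full 32-bit address
```

Each extra address bit doubles the reachable memory, so four more bits give sixteen times as much.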

Pentium Evolution (2)

• 80486
  – sophisticated powerful cache and instruction pipelining
  – built in math co-processor, offloading complex math operations from the main CPU

• Pentium
  – Superscalar
  – Multiple instructions executed in parallel

• Pentium Pro
  – Increased superscalar organization
  – Aggressive register renaming
  – branch prediction
  – data flow analysis
  – speculative execution

Pentium Evolution (3)

• Pentium II
  – MMX technology (graphics, video & audio processing)

• Pentium III
  – Additional floating point instructions for 3D graphics

• Pentium 4
  – Further floating point and multimedia enhancements

• Itanium
  – 64 bit
  – IA-64 Architecture

• Itanium 2
  – Hardware enhancements to increase speed

• See Intel web pages for detailed information on processors

Intel Computer Family

Moore’s law for (Intel) CPU chips.

PowerPC (RISC)

• 1975, 801 minicomputer project (IBM) RISC

• 1986, IBM first commercial RISC workstation product, RT PC. 2 MIPS.

• 1990, IBM RISC System/6000
  – RISC-like superscalar machine
  – referred to as POWER architecture

• IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)

• Result is PowerPC architecture
  – Derived from the POWER architecture
  – Superscalar RISC
  – Used in Apple Macintosh
  – Embedded chip applications

PowerPC Family (general-purpose) (1)

• 601 (G1):
  – Quickly bring PowerPC architecture to market
  – 32-bit machine

• 603 (G2):
  – Low-end desktop and portable computers
  – 32-bit
  – Comparable performance with 601
  – Lower cost and more efficient implementation

• 604 (G2):
  – Desktop and low-end servers
  – 32-bit machine
  – Much more advanced superscalar design and greater performance

• 620 (G2):
  – High-end servers
  – 64-bit architecture

PowerPC Family (2)

• 740/750 (G3):
  – Also known as G3 processor
  – Two levels of cache on chip

• G4:
  – Increases parallelism and internal speed

• G5:
  – Improvements in parallelism and internal speed
  – 64-bit organization

Internet Resources

• http://www.intel.com/
  – Search for the Intel Museum, click on online exhibit

• http://www.ibm.com

• PowerPC (IBM, Motorola)

• Intel Developer Home, http://developer.intel.com/design/index.htm

Example Computer Families

• Pentium 4 by Intel (CISC)
  – von Neumann architecture

• UltraSPARC III by Sun Microsystems (RISC)
  – von Neumann architecture

• The 8051 chip by Intel, used for embedded systems
  – Harvard architecture

Harvard architecture vs. von Neumann architecture

• Harvard architecture
  – Separate instruction and data memory
  – Can read an instruction and access data memory at the same time

• Von Neumann architecture
  – A single memory stores both instructions and data
  – Instruction fetch and data access cannot happen at the same time because they share the same bus system

Reading Assignment

Read Chapter 1 and review Appendices A & B
